### ANOVA: Analysis of Variance
Analysis of Variance (ANOVA) is a statistical method used to test whether there are significant differences between the means of two or more groups. ANOVA returns two parameters:

- **F-test score:** ANOVA assumes the means of all groups are the same (the null hypothesis), calculates how much the actual group means deviate from that assumption relative to the variation within each group, and reports the result as the F-test score. A larger score means a larger difference between the means.

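- **P-value:** The p-value tells us how statistically significant the calculated F-test score is. It is the probability of seeing a score at least this large if all the group means were actually equal.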

If our price variable is strongly associated with the variable we are analyzing, expect ANOVA to return a sizeable F-test score and a small p-value.
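To make the F-test score less of a black box, the sketch below computes it by hand as the ratio of between-group variance to within-group variance and compares the result with `stats.f_oneway`. The three toy samples and the helper name `one_way_f` are assumptions made for illustration; they are not part of the car dataset or of SciPy.

```python
import numpy as np
from scipy import stats

# Hypothetical toy samples, not taken from the car dataset.
groups = [
    np.array([10.0, 12.0, 11.0, 13.0]),
    np.array([20.0, 22.0, 21.0, 23.0]),
    np.array([11.0, 13.0, 12.0, 14.0]),
]

def one_way_f(samples):
    """Return the one-way ANOVA F statistic and p-value, computed by hand."""
    k = len(samples)                       # number of groups
    n = sum(len(s) for s in samples)       # total number of observations
    grand_mean = np.concatenate(samples).mean()
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    # Upper-tail probability of the F distribution with (k - 1, n - k) degrees of freedom.
    p = stats.f.sf(f, k - 1, n - k)
    return f, p

f_manual, p_manual = one_way_f(groups)
f_scipy, p_scipy = stats.f_oneway(*groups)
print('manual: F=', f_manual, ', P=', p_manual)
print('scipy:  F=', f_scipy, ', P=', p_scipy)
```

Both lines should report the same values up to floating-point error, which shows that a large F simply means the group means are spread far apart compared with the scatter inside each group.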

In [1]:
from scipy import stats
import pandas as pd

In [2]:
df = pd.read_csv('data/car_dataset.csv')

In [3]:
# Keep only the two columns of interest and group the prices by drive-wheel type.
grouped = df[['drive-wheels', 'price']].groupby('drive-wheels')

In [4]:
# One-way ANOVA across all three drive-wheel groups.
f_val, p_val = stats.f_oneway(
    grouped.get_group('fwd')['price'],
    grouped.get_group('rwd')['price'],
    grouped.get_group('4wd')['price']
)

print('ANOVA results: F=', f_val, ', P=', p_val)

ANOVA results: F= 67.95406500780399 , P= 3.3945443577151245e-23


This is a strong result: the large F-test score indicates that mean price differs substantially across the drive-wheel groups, and a p-value of almost 0 indicates that the difference is statistically significant. But does this mean every pair of groups differs this strongly? Testing each pair separately answers that question.

In [5]:
# Pairwise comparison: fwd vs rwd.
f_val, p_val = stats.f_oneway(
    grouped.get_group('fwd')['price'],
    grouped.get_group('rwd')['price']
)

print('ANOVA results: F=', f_val, ', P=', p_val)

ANOVA results: F= 130.5533160959111 , P= 2.2355306355677845e-23


In [6]:
# Pairwise comparison: 4wd vs rwd.
f_val, p_val = stats.f_oneway(
    grouped.get_group('4wd')['price'],
    grouped.get_group('rwd')['price']
)

print('ANOVA results: F=', f_val, ', P=', p_val)

ANOVA results: F= 8.580681368924756 , P= 0.004411492211225333


In [7]:
# Pairwise comparison: 4wd vs fwd.
f_val, p_val = stats.f_oneway(
    grouped.get_group('4wd')['price'],
    grouped.get_group('fwd')['price']
)

print('ANOVA results: F=', f_val, ', P=', p_val)

ANOVA results: F= 0.665465750252303 , P= 0.41620116697845666


The difference between 4wd and fwd is not statistically significant (P ≈ 0.42). Even so, the drive-wheels variable can still be useful for machine learning algorithms.
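The three pairwise cells above repeat the same call with different group names. As a minimal sketch, the same comparisons can be written as one loop over every pair of groups. The `toy` DataFrame below is a made-up stand-in so the snippet runs on its own; its prices are invented for illustration and are not the car dataset, and the names `prices` and `pairwise` are likewise assumptions.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

# Hypothetical stand-in for the car data; the prices are made up for illustration.
toy = pd.DataFrame({
    'drive-wheels': ['fwd'] * 4 + ['rwd'] * 4 + ['4wd'] * 4,
    'price': [7000, 8500, 9200, 7800,
              16500, 21000, 18300, 24500,
              8000, 9300, 11200, 7600],
})

# Map each drive-wheel type to its price series.
prices = {name: grp['price'] for name, grp in toy.groupby('drive-wheels')}

# Run a two-group ANOVA for every unordered pair of drive-wheel types.
pairwise = {}
for a, b in combinations(sorted(prices), 2):
    f_val, p_val = stats.f_oneway(prices[a], prices[b])
    pairwise[(a, b)] = (f_val, p_val)
    print(a, 'vs', b, ': F=', f_val, ', P=', p_val)
```

With the real data, replacing `toy` with `df` would reproduce the three pairwise results in a single cell.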