# ANOVA (Analysis Of Variance)
An ANOVA test is a way to find out whether the results of a survey or experiment are statistically significant. In other words, it helps you decide whether to reject the null hypothesis in favor of the alternative hypothesis.

# One Way ANOVA

A one way ANOVA compares the means of two or more independent (unrelated) groups using the F-distribution. The null hypothesis for the test is that all group means are equal. Therefore, a significant result means that at least one group mean differs from the others; the test alone does not say which one. Example of when to use a one way ANOVA:

Situation 1: You have a group of individuals randomly split into smaller groups and completing different tasks. For example, you might be studying the effects of tea on weight loss and form three groups: green tea, black tea, and no tea.
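
To make the role of the F-distribution concrete, here is a minimal sketch that computes the F statistic by hand (between-group variance divided by within-group variance) and checks it against `scipy.stats.f_oneway`. The three small samples are made-up numbers for illustration only; they are not taken from Diet.csv.

```python
import numpy as np
from scipy import stats

# Hypothetical toy measurements for three independent groups (not from Diet.csv).
samples = {
    "green tea": np.array([2.1, 3.4, 2.9, 3.8]),
    "black tea": np.array([1.2, 2.0, 1.7, 2.3]),
    "no tea": np.array([0.4, 1.1, 0.9, 0.6]),
}

values = list(samples.values())
k = len(values)                          # number of groups
n_total = sum(len(v) for v in values)    # total number of observations
grand_mean = np.concatenate(values).mean()

# Between-group and within-group sums of squares.
ss_between = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in values)
ss_within = sum(((v - v.mean()) ** 2).sum() for v in values)

# F is the ratio of the two mean squares; the p-value is the upper tail of F(k-1, N-k).
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

# scipy performs the same computation in one call.
f_scipy, p_scipy = stats.f_oneway(*values)

print(f_manual, f_scipy)
print(p_manual, p_scipy)
```

Both lines should print matching pairs, which shows that `f_oneway` is this same ratio of mean squares.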


In [25]:
import pandas as pd
from scipy import stats

In [26]:
data = pd.read_csv("C:\\Users\\karth\\Desktop\\Diet.csv") 

In [27]:
data

Unnamed: 0,Person,gender,Age,Height,Pre.weight,Diet,weight loss
0,25,Male,41,171,60,black tea,60.0
1,26,Male,32,174,103,black tea,103.0
2,1,Male,22,159,58,green tea,54.2
3,2,Male,46,192,60,green tea,54.0
4,3,Male,55,170,64,green tea,63.3
...,...,...,...,...,...,...,...
73,74,Female,35,183,83,no tea,80.2
74,75,Female,49,177,84,no tea,79.9
75,76,Female,28,164,85,no tea,79.7
76,77,Female,40,167,87,no tea,77.8


In [28]:
data_anova = data[['weight loss','Diet']]

In [29]:
groups = pd.unique(data_anova.Diet.values)

In [30]:
print(groups)

['black tea' 'green tea' 'no  tea' 'no tea']
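
The output above lists both `'no  tea'` (two spaces) and `'no tea'`, so pandas treats one intended group as two. A possible cleanup, sketched here on a made-up miniature frame rather than on Diet.csv, is to collapse repeated whitespace before grouping. If the same step were applied to the real data, the test further down would need to be re-run and its p-value could differ from the one printed in this notebook.

```python
import pandas as pd

# Hypothetical miniature frame that mimics the label problem (values are made up).
toy = pd.DataFrame({
    "Diet": ["black tea", "green tea", "no  tea", "no tea", " no tea "],
    "weight loss": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Collapse runs of whitespace to a single space and trim both ends.
toy["Diet"] = toy["Diet"].str.replace(r"\s+", " ", regex=True).str.strip()

groups = toy["Diet"].unique().tolist()
print(groups)
```

After normalization the three spelling variants of the control label fall into a single group.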


In [31]:
data_data = {group:data_anova['weight loss'][data_anova.Diet == group] for group in groups}

In [32]:
data_data

{'black tea': 0      60.0
 1     103.0
 16     60.1
 17     56.0
 18     57.3
 19     56.7
 20     55.0
 21     62.4
 22     60.3
 23     59.4
 24     62.0
 25     64.0
 26     63.8
 27     63.3
 28     72.7
 29     77.5
 55     66.8
 56     72.6
 57     69.2
 58     72.5
 59     72.7
 60     76.3
 61     73.6
 62     72.9
 63     71.1
 64     81.4
 65     75.7
 Name: weight loss, dtype: float64,
 'green tea': 2     54.2
 3     54.0
 4     63.3
 5     61.1
 6     62.2
 7     64.0
 8     65.0
 9     60.5
 10    68.1
 11    66.9
 12    70.5
 13    69.0
 14    68.4
 15    81.1
 45    71.6
 46    70.9
 47    69.5
 48    73.9
 49    71.0
 50    77.6
 51    79.1
 52    81.5
 53    81.9
 54    84.5
 Name: weight loss, dtype: float64,
 'no  tea': 30    53.0
 31    56.4
 32    60.6
 33    58.2
 34    58.2
 35    61.6
 36    60.2
 37    61.8
 38    63.0
 39    62.7
 40    71.1
 41    64.4
 42    68.9
 43    68.7
 44    71.0
 Name: weight loss, dtype: float64,
 'no tea': 66    68.5
 67    72.1
 6

In [36]:
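# Note: rows labelled 'no  tea' (two spaces) are stored under a separate key and are not included here.
F, p = stats.f_oneway(data_data['green tea'], data_data['black tea'], data_data['no tea'])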

In [37]:
print(p)

0.04349286498564399


In [38]:
if p < 0.05:
    print("Rejecting H0: at least one tea group has a different mean weight loss")
else:
    print("Failing to reject H0: no evidence that mean weight loss differs between tea groups")

Rejecting H0: at least one tea group has a different mean weight loss


# Two Way ANOVA
In a two way ANOVA, there are two independent variables (factors). Use a two way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables. In other words, if your experiment has a quantitative outcome and you have two categorical explanatory variables, a two way ANOVA is appropriate.

For example, you might want to find out whether gender and noise level interact in their effect on the number of questions solved.

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv("C:\\Users\\karth\\Desktop\\2 way.csv") 

In [3]:
data

Unnamed: 0,Gender,Level,ques
0,Male,Low Noise,10
1,Male,Low Noise,12
2,Male,Low Noise,11
3,Male,Low Noise,9
4,Female,Low Noise,12
5,Female,Low Noise,13
6,Female,Low Noise,10
7,Female,Low Noise,13
8,Male,High Noise,4
9,Male,High Noise,5


In [4]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [5]:
#perform two-way ANOVA
model = ols('ques ~ C(Gender) + C(Level) + C(Gender):C(Level)', data=data).fit()
sm.stats.anova_lm(model, typ=2)

Unnamed: 0,sum_sq,df,F,PR(>F)
C(Gender),20.166667,1.0,9.810811,0.00575844
C(Level),200.333333,2.0,48.72973,5.439849e-08
C(Gender):C(Level),16.333333,2.0,3.972973,0.03722434
Residual,37.0,18.0,,


Interpret the results.

    Fcal > Fcrit
    
For both Gender and Noise Level, Fcal > Fcrit at the 0.05 level (equivalently, PR(>F) is below 0.05), so we reject H0. Both factors have a statistically significant effect on the number of questions solved.

For the interaction effect Gender*Noise Level, Fcal > Fcrit as well (PR(>F) is about 0.037), so we reject H0. The effect of noise level on the number of questions solved therefore differs between genders.
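
The comparison of Fcal with Fcrit can be checked numerically. The sketch below takes the F values and degrees of freedom from the ANOVA table above and looks up the critical values with `scipy.stats.f.ppf`. The significance level of 0.05 is an assumption carried over from the one way example, and the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05      # assumed significance level
df_resid = 18     # residual degrees of freedom from the table

# (Fcal, degrees of freedom) for each term, copied from the ANOVA table.
table = {
    "C(Gender)": (9.810811, 1),
    "C(Level)": (48.72973, 2),
    "C(Gender):C(Level)": (3.972973, 2),
}

results = {}
for term, (f_cal, df_term) in table.items():
    # Critical value: the (1 - alpha) quantile of F(df_term, df_resid).
    f_crit = stats.f.ppf(1 - alpha, df_term, df_resid)
    # Upper-tail probability of the observed F, comparable to PR(>F).
    p_value = stats.f.sf(f_cal, df_term, df_resid)
    results[term] = (f_crit, p_value, f_cal > f_crit)
    print(f"{term}: Fcal={f_cal:.3f}, Fcrit={f_crit:.3f}, p={p_value:.4g}, reject H0={f_cal > f_crit}")
```

Comparing Fcal with Fcrit and comparing PR(>F) with the significance level are two views of the same decision, so they always agree.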