# One-way ANOVA

![](https://qph.fs.quoracdn.net/main-qimg-e9fc2e1fa443efd89fbb3f72fe6d7cac)
Source: [Four ways to conduct one-way ANOVA with Python](https://www.marsja.se/four-ways-to-conduct-one-way-anovas-using-python/)

In [7]:
import pandas as pd

In [3]:
data = pd.read_csv("./data/PlantGrowth.csv")
data.head()

| | Unnamed: 0 | weight | group |
|---|---|---|---|
| 0 | 1 | 4.17 | ctrl |
| 1 | 2 | 5.58 | ctrl |
| 2 | 3 | 5.18 | ctrl |
| 3 | 4 | 6.11 | ctrl |
| 4 | 5 | 4.5 | ctrl |


### Create some summary statistics

One-way ANOVA is a __parametric test__, so there are some assumptions we need to be aware of when relying on ANOVA's F-distribution:
* __Normality__ - the data in each group should be roughly normally distributed for the F-statistic to be reliable.
* __Equal Variance__ - each experimental condition should have roughly the same variance _(i.e., homogeneity of variance)_.
* __Independence__ - the observations should be independent, both between groups and within groups.
* The dependent variable should be measured on at least an interval scale.
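
The first two assumptions can be checked numerically before running the test. The following is a minimal sketch, assuming SciPy is installed; the `groups` dictionary holds made-up values for illustration only and is not the PlantGrowth data. It uses `scipy.stats.shapiro` for normality within each group and `scipy.stats.levene` for homogeneity of variance.

```python
from scipy import stats

# Hypothetical measurements for three groups (illustrative values only)
groups = {
    "a": [4.2, 5.1, 4.8, 5.5, 4.9, 5.0],
    "b": [4.0, 4.6, 4.4, 3.9, 5.2, 4.3],
    "c": [5.9, 5.4, 5.6, 6.1, 5.2, 5.7],
}

# Shapiro-Wilk per group: a small p-value suggests a departure from normality
shapiro_p = {}
for name, values in groups.items():
    _, p_value = stats.shapiro(values)
    shapiro_p[name] = p_value

# Levene's test across all groups: a small p-value suggests unequal variances
levene_stat, levene_p = stats.levene(*groups.values())

print(shapiro_p)
print(levene_p)
```

With samples this small both tests have little power, so the boxplot remains a useful companion check.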

In [4]:
# Create a boxplot of weight for each group
data.boxplot('weight', by='group', figsize=(12, 8))

# Weights of the control group only
ctrl = data['weight'][data.group == 'ctrl']

# Map each group label to its weights
grps = pd.unique(data.group.values)
d_data = {grp: data['weight'][data.group == grp] for grp in grps}

k = len(grps)  # number of conditions
N = len(data)  # total number of observations (conditions times participants)
n = data.groupby('group').size().iloc[0]  # participants in each condition (equal group sizes assumed)

The two treatment groups `trt1` and `trt2` appear to differ in weight from the control group `ctrl`. Additionally, `trt1` has outliers and is negatively skewed, while `trt2` is slightly positively skewed.
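
The visual impression from a boxplot can be backed up with per-group summary statistics. The sketch below assumes a frame with the same `weight` and `group` columns as `data`; the `toy` frame and its values are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Hypothetical frame with the same column names as the notebook's `data`
toy = pd.DataFrame({
    "weight": [4.0, 5.0, 6.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0],
    "group": ["ctrl"] * 3 + ["trt1"] * 3 + ["trt2"] * 3,
})

# Count, mean, standard deviation and median of weight for each group
summary = toy.groupby("group")["weight"].agg(["count", "mean", "std", "median"])
print(summary)
```

A mean noticeably below the median hints at negative skew, and a mean above the median at positive skew.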

### One Way ANOVA

In [6]:
from scipy import stats

F, p = stats.f_oneway(d_data['ctrl'], d_data['trt1'], d_data['trt2'])
F, p

(4.846087862380136, 0.0159099583256229)

SciPy's `f_oneway` returns only the F-value and the p-value, so next we need to add the `effect size` (e.g., _eta squared_) as well as the `Degrees of Freedom`.

In [9]:
df_between = k - 1
df_within = N - k
df_total = N - 1

### Calculating ANOVA 
![](https://i.stack.imgur.com/mMjTj.png)

First we calculate the __Sum of Squares Between (SSb)__, the __Sum of Squares Within (SSw)__ and the __Sum of Squares Total (SSt)__.
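
In symbols, the computational formulas used in the cells that follow can be written as below. The notation is chosen here for illustration and assumes equal group sizes: $T_j$ is the sum of the observations in group $j$, $G$ is the grand sum of all observations, $n$ is the number of observations per group and $N$ is the total number of observations.

$$
SS_b = \frac{\sum_j T_j^2}{n} - \frac{G^2}{N}, \qquad
SS_w = \sum_i y_i^2 - \frac{\sum_j T_j^2}{n}, \qquad
SS_t = \sum_i y_i^2 - \frac{G^2}{N} = SS_b + SS_w
$$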

In [16]:
# Sum of Squares Between (SSb) - variability due to differences between the group means

SSb = (sum(data.groupby('group').sum()['weight']**2)/n) - (data['weight'].sum()**2)/N
SSb

3.766340000000014

In [20]:
# Sum of Squares Within (SSw) - variability in the data due to differences within each group
sum_y_squared = sum([value**2 for value in data['weight'].values])
SSw = sum_y_squared - sum(data.groupby('group').sum()['weight']**2)/n
SSw

10.492090000000076

In [21]:
# Sum of Squares total
SSt = sum_y_squared - (data['weight'].sum()**2)/N
SSt

14.25843000000009
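
The shortcut formulas above rely on every group having the same size `n`. As a cross-check, the same three quantities can be computed from their definitions as squared deviations from means, which also works for unequal group sizes. This is a minimal pure-Python sketch; the `groups` values are hypothetical and chosen so the arithmetic is easy to follow.

```python
# Hypothetical groups with unequal sizes (illustrative values only)
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0, 3.0],
    "c": [5.0, 6.0, 7.0],
}

all_values = [y for values in groups.values() for y in values]
grand_mean = sum(all_values) / len(all_values)

ss_between = 0.0
ss_within = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    # Between: squared distance of the group mean from the grand mean, weighted by group size
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    # Within: squared distance of each observation from its own group mean
    ss_within += sum((y - group_mean) ** 2 for y in values)

# Total: squared distance of each observation from the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

print(ss_between, ss_within, ss_total)
```

The total always splits into the between and within parts, which is a handy sanity check on any manual calculation.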

In [22]:
MSb = SSb / df_between # Mean Square Between (MSb) = SSb / df_between

MSw = SSw / df_within # Mean Square Within (MSw)

F = MSb / MSw # F-value
F


4.846087862380118

To __reject the null hypothesis__ we check whether the _obtained F-value_ is above the critical value. We could look it up in an F-table based on `df_between` and `df_within`; however, we will just use `SciPy` to obtain the `p-value`.

In [25]:
# p-value
p = stats.f.sf(F, df_between, df_within)
p

0.015909958325623124

Since the p-value is below 0.05, we can reject the null hypothesis that all group means are equal.
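
The F-table lookup mentioned earlier can also be done in code. The sketch below assumes a significance level of 0.05 (the `alpha` name is introduced here for illustration) and reuses the degrees of freedom and F-value obtained in this notebook; `stats.f.ppf` is the inverse of the F-distribution's cumulative distribution function.

```python
from scipy import stats

alpha = 0.05                     # assumed significance level
df_between, df_within = 2, 27    # degrees of freedom from the notebook
F_obtained = 4.846087862380118   # F-value computed in the notebook

# Critical value: the point with probability alpha to its right under the null
F_critical = stats.f.ppf(1 - alpha, df_between, df_within)
reject_null = F_obtained > F_critical

print(F_critical, reject_null)
```

The critical value comes out at roughly 3.35, so comparing the obtained F-value against it leads to the same decision as comparing the p-value against 0.05.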

Next we calculate the effect size to see how practically significant this finding is. For this we can use the commonly used `eta-squared (η²)`.

In [27]:
eta_sqrd = SSb / SSt
eta_sqrd

0.264148296832119

`eta_squared` can be somewhat biased because it is based purely on sums of squares from the sample. No adjustment is made for the fact that we are aiming to estimate the effect size in the population. Thus we can use the less biased effect size measure `Omega squared`.

In [28]:
om_sqrd = (SSb - (df_between * MSw))/(SSt + MSw)
om_sqrd

0.20407884598997017
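
Both effect sizes depend only on the sums of squares and the degrees of freedom, so they can be wrapped in a small helper. The function name `effect_sizes` is hypothetical; the inputs are the rounded values printed earlier in this notebook.

```python
def effect_sizes(ss_between, ss_within, df_between, df_within):
    """Return (eta squared, omega squared) for a one-way ANOVA."""
    ss_total = ss_between + ss_within
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    # Omega squared corrects the numerator and denominator using MSw
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq

# Sums of squares and degrees of freedom as printed earlier in the notebook
eta_sq, omega_sq = effect_sizes(3.76634, 10.49209, 2, 27)
print(round(eta_sq, 3), round(omega_sq, 3))
```

Omega squared is always smaller than eta squared for the same data, which reflects the correction for sampling bias.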

The results from both SciPy and the manual calculation above can be reported in APA style: F(2, 27) = 4.846, p = .016, η² = .264. If you want to report omega squared: ω² = .204.
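
Building the report string by hand is error-prone, so it can help to format it in code. This is a small sketch with hypothetical helper names (`apa_decimal`, `apa_anova`); it drops the leading zero for statistics that cannot exceed 1 and uses three decimals to match the report above.

```python
def apa_decimal(value, digits=3):
    """Format a statistic bounded by 1 (p, eta squared) without the leading zero."""
    text = f"{value:.{digits}f}"
    return text[1:] if text.startswith("0.") else text


def apa_anova(F, p, df_between, df_within, eta_sq):
    """Build a one-way ANOVA report string from the computed statistics."""
    return (
        f"F({df_between}, {df_within}) = {F:.3f}, "
        f"p = {apa_decimal(p)}, η² = {apa_decimal(eta_sq)}"
    )


# Values taken from the outputs earlier in the notebook
report = apa_anova(4.846087862380118, 0.015909958325623124, 2, 27, 0.264148296832119)
print(report)
```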

### Using Statsmodels

In [30]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
 
mod = ols('weight ~ group',
                data=data).fit()
                
aov_table = sm.stats.anova_lm(mod, typ=2)
print(aov_table)

            sum_sq    df         F   PR(>F)
group      3.76634   2.0  4.846088  0.01591
Residual  10.49209  27.0       NaN      NaN


Here we can see the sum of squares, the degrees of freedom, the F-value and the p-value.

Also, no effect size was calculated when we used Statsmodels. To calculate `eta_squared` we can use the sums of squares from the table.

In [31]:
esq_sm = aov_table['sum_sq'].iloc[0] / (aov_table['sum_sq'].iloc[0] + aov_table['sum_sq'].iloc[1])
esq_sm

0.2641482968321193
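
The same calculation can be attached to the table as an extra column, which avoids indexing rows by position. The sketch below rebuilds a frame with the `sum_sq` and `df` values printed in the ANOVA table above rather than refitting the model; the name `table` and the `eta_sq` column are introduced here for illustration.

```python
import pandas as pd

# Sums of squares and degrees of freedom copied from the ANOVA table output
table = pd.DataFrame(
    {"sum_sq": [3.76634, 10.49209], "df": [2.0, 27.0]},
    index=["group", "Residual"],
)

# Each row's share of the total sum of squares
table["eta_sq"] = table["sum_sq"] / table["sum_sq"].sum()
print(table)
```

In a one-way design the `group` row of the new column is eta squared, and the `Residual` row is the unexplained share of the variance.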