# ANOVA

This notebook shows several examples of one-way and two-way ANOVA.

## One-way ANOVA

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. In Pingouin, the one-way ANOVA is implemented in the `anova` function. The ANOVA test has several assumptions that must be satisfied to provide accurate results:

- The samples must be independent (as opposed to repeated measurements within a single group; see `rm_anova`).
- Each sample should be normally distributed.
- The variances of the samples must all be equal (homoscedasticity).

*Note: Assumptions #2 and #3 can be checked using the `test_normality` and `test_homoscedasticity` functions.*

### Load data

For this first example, we are going to load the McClave (1991) dataset which compares the pain threshold of subjects as a function of their hair color.

In [1]:
import numpy as np
import pandas as pd
from pingouin.datasets import read_dataset

df = read_dataset('mcclave1991')

df.groupby('Hair color')['Pain threshold'].agg(['mean', 'std', 'count']).round(2)

Hair color,mean,std,count
Dark Blond,51.2,9.28,5
Dark Brunette,37.4,8.32,5
Light Blond,59.2,8.53,5
Light Brunette,42.5,5.45,4


### Run the ANOVA

The detailed ANOVA summary table includes the following columns:

- SS : sums of squares
- DF : degrees of freedom
- MS : mean squares (= SS / DF)
- F : F-value (test statistic)
- p-unc : uncorrected p-values
- np2 : partial eta-squared effect size \*

\* *In one-way ANOVA, partial eta-squared is the same as eta-squared and generalized eta-squared.*

In the example below, there is a significant main effect of hair color (F(3, 15) = 6.79, p = .004), so we can reject the null hypothesis that the groups have equal means.

In [2]:
from pingouin import anova

aov = anova(data=df, dv='Pain threshold', between='Hair color', detailed=True)
aov

,Source,SS,DF,MS,F,p-unc,np2
0,Hair color,1360.726316,3,453.575439,6.791407,0.004114,0.575962
1,Within,1001.8,15,66.786667,,,
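
The columns of this table are tied together by a few simple identities, which can be checked by hand. The sketch below copies SS and DF from the output above (the variable names are illustrative, not part of Pingouin) and recomputes MS, F and np2. It does not recompute the p-value, which requires the F distribution.

```python
# Values copied from the ANOVA table above
ss_between, df_between = 1360.726316, 3   # Source = Hair color
ss_within, df_within = 1001.8, 15         # Source = Within

# Degrees of freedom: k - 1 for the groups, N - k for the error term
n_groups, n_total = 4, 19                 # 5 + 5 + 5 + 4 observations
assert df_between == n_groups - 1
assert df_within == n_total - n_groups

ms_between = ss_between / df_between      # mean square = SS / DF
ms_within = ss_within / df_within
f_value = ms_between / ms_within          # test statistic
np2 = ss_between / (ss_between + ss_within)  # partial eta-squared

print(round(ms_between, 3), round(ms_within, 3))
print(round(f_value, 3), round(np2, 3))
```

The recomputed F and np2 should agree with the `F` and `np2` columns up to rounding.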


### Tukey post-hocs

Often, you will want to compute post-hoc tests to look at the pairwise differences between the groups. For one-way ANOVA, this can be achieved using the `pairwise_tukey` function.

In [3]:
from pingouin import pairwise_tukey

pairwise_tukey(data=df, dv='Pain threshold', between='Hair color')

,A,B,mean(A),mean(B),diff,SE,tail,T-val,p-tukey,efsize,eftype
0,Dark Blond,Dark Brunette,51.2,37.4,13.8,5.169,two-sided,2.67,0.074168,1.423,hedges
1,Dark Blond,Light Blond,51.2,59.2,-8.0,5.169,two-sided,-1.548,0.436903,-0.825,hedges
2,Dark Blond,Light Brunette,51.2,42.5,8.7,5.482,two-sided,1.587,0.416008,0.846,hedges
3,Dark Brunette,Light Blond,37.4,59.2,-21.8,5.169,two-sided,-4.218,0.003713,-2.248,hedges
4,Dark Brunette,Light Brunette,37.4,42.5,-5.1,5.482,two-sided,-0.93,0.769703,-0.496,hedges
5,Light Blond,Light Brunette,59.2,42.5,16.7,5.482,two-sided,3.046,0.036653,1.623,hedges
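
The `diff`, `SE` and `T-val` columns above are consistent with a standard error built from the pooled within-group mean square of the ANOVA. The following is a minimal sketch of that relationship, using the group means and sizes from the summary table; the helper name `pairwise_stats` is hypothetical, and the Tukey-adjusted p-value (which needs the studentized range distribution) is not recomputed here.

```python
import math

ms_within = 66.786667  # MS of the "Within" row in the ANOVA table
n = {"Dark Blond": 5, "Dark Brunette": 5, "Light Blond": 5, "Light Brunette": 4}
mean = {"Dark Blond": 51.2, "Dark Brunette": 37.4,
        "Light Blond": 59.2, "Light Brunette": 42.5}


def pairwise_stats(a, b):
    """Return (diff, SE, T) for groups a and b using the pooled error term."""
    diff = mean[a] - mean[b]
    se = math.sqrt(ms_within * (1 / n[a] + 1 / n[b]))
    return diff, se, diff / se


diff, se, t = pairwise_stats("Dark Brunette", "Light Blond")
print(round(diff, 1), round(se, 3), round(t, 3))  # -21.8 5.169 -4.218
```

Because the Light Brunette group has only four observations, any comparison involving it has a slightly larger standard error (5.482 instead of 5.169).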


### Power of the ANOVA

In some cases, it is useful to compute the power of the test, i.e. the probability of correctly rejecting the null hypothesis when it is false (higher power means a lower risk of a Type II error). This can be calculated from the ANOVA summary using the `anova_power` function.

In [4]:
from pingouin import anova_power

achieved_power = anova_power(eta=aov.loc[0, 'np2'], ntot=df.shape[0], ngroups=df['Hair color'].unique().size)
print(achieved_power)

0.973
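
Power calculations for ANOVA are commonly expressed in terms of Cohen's f rather than eta-squared. As a sketch of that conversion, the block below assumes the standard relations f² = η² / (1 − η²) and noncentrality λ = f² · N. Whether `anova_power` follows exactly these steps internally is an assumption here, and the variable names are illustrative.

```python
import math

np2 = 0.575962  # partial eta-squared from the ANOVA table
n_total = 19    # total number of observations (5 + 5 + 5 + 4)

f_squared = np2 / (1 - np2)          # Cohen's f squared
cohen_f = math.sqrt(f_squared)       # Cohen's f
noncentrality = f_squared * n_total  # lambda of the noncentral F distribution

print(round(cohen_f, 3), round(noncentrality, 2))
```

Turning λ into a power value then requires the noncentral F distribution with 3 and 15 degrees of freedom, which is outside the Python standard library.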


### Assumptions check

Finally, to check that (1) each sample is normally distributed and (2) the variances of the samples are all equal, we can use the `test_normality` and `test_homoscedasticity` functions, respectively. The cell below runs the normality test on each group separately.

In [5]:
from pingouin import test_normality
for group in df['Hair color'].unique():
    print(test_normality(df[df['Hair color'] == group]['Pain threshold'].values))

(True, 0.983)
(True, 0.664)
(True, 0.598)
(True, 0.324)
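
The cell above only covers normality. As a rough descriptive complement for the equal-variance assumption (not a replacement for a formal test such as `test_homoscedasticity`), one can compare the largest and smallest group variances using the standard deviations from the group summary table. This is a minimal sketch; the ratio is only a heuristic and no particular cutoff is implied.

```python
# Standard deviations copied from the group summary table
std = {"Dark Blond": 9.28, "Dark Brunette": 8.32,
       "Light Blond": 8.53, "Light Brunette": 5.45}

# Variance is the squared standard deviation
variances = {group: s ** 2 for group, s in std.items()}
ratio = max(variances.values()) / min(variances.values())

print(max(variances, key=variances.get), min(variances, key=variances.get))
print(round(ratio, 2))
```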


***

## One-way repeated measures ANOVA

The one-way repeated measures ANOVA is the equivalent of the one-way ANOVA, but for related, not independent groups. It is sometimes called *within-subjects ANOVA*. In Pingouin, the one-way repeated measures ANOVA is implemented in the `rm_anova` function. The repeated measures ANOVA test has several assumptions that must be satisfied to provide accurate results:

- Normality: for each level of the within-subjects factor, the dependent variable must have a normal distribution.
- Sphericity: difference scores computed between two levels of a within-subjects factor must have the same variance for the comparison of any two levels. (This assumption only applies if there are more than 2 levels of the independent variable.)
- Randomness: cases should be derived from a random sample, and scores from different participants should be independent of each other.
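
To make the sphericity assumption concrete, the sketch below computes the variance of the difference scores for every pair of levels. The scores are made up purely for illustration (three hypothetical levels, five hypothetical subjects) and are not taken from any dataset used in this notebook. Sphericity holds when these variances are roughly equal.

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores: one list per level, same subjects in the same order
scores = {
    "A": [5.0, 6.0, 7.0, 8.0, 9.0],
    "B": [6.0, 6.5, 9.0, 8.5, 11.0],
    "C": [7.0, 9.0, 8.0, 11.0, 12.0],
}

diff_variances = {}
for a, b in combinations(scores, 2):
    # Difference score of each subject between level a and level b
    diffs = [x - y for x, y in zip(scores[a], scores[b])]
    diff_variances[(a, b)] = variance(diffs)  # sample variance (n - 1)

for pair, var in diff_variances.items():
    print(pair, round(var, 3))
```

With only two levels there is a single set of difference scores, so there is nothing to compare and the assumption does not apply.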

### Load data

For this example, we are going to load the Bugs dataset (Ryan et al., 2013), in which participants were asked to rate their hostility (desire to kill) towards different types of insects. For this one-way repeated measures ANOVA, we focus only on the Disgustingness factor, so we first average the ratings within each subject and each level of Disgustingness:

In [6]:
df = read_dataset('ryan2013')
df = df.groupby(['Subject', 'Disgustingness']).agg({'Gender': 'first', 'DesireToKill': 'mean'}).reset_index()
df.head()

,Subject,Disgustingness,Gender,DesireToKill
0,1,High,Female,9.5
1,1,Low,Female,6.0
2,2,High,Female,10.0
3,2,Low,Female,10.0
4,3,High,Female,10.0


### Run the ANOVA

In [7]:
from pingouin import rm_anova

rm_anova(data=df, dv='DesireToKill', within='Disgustingness', detailed=True).round(3)

,Source,SS,DF,MS,F,p-unc,p-GG-corr,np2,sphericity,W-Mauchly,X2-Mauchly,DF-Mauchly
0,Disgustingness,27.485,1,27.485,12.044,0.001,0.001,0.116,False,1.0,-0.0,0
1,Error,209.952,92,2.282,,,,,,,,0
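
The degrees of freedom in this table also tell us how much data entered the analysis. As a sketch, assuming the usual repeated measures layout where the effect has k − 1 and the error term has (n − 1)(k − 1) degrees of freedom, the block below recovers the number of levels and subjects and re-derives F and np2 from the rounded values above. The variable names are illustrative.

```python
# Values copied from the rounded repeated measures ANOVA table above
ss_effect, df_effect = 27.485, 1
ss_error, df_error = 209.952, 92

n_levels = df_effect + 1                     # k - 1 = 1  ->  k = 2
n_subjects = df_error // (n_levels - 1) + 1  # (n - 1)(k - 1) = 92  ->  n = 93

f_value = (ss_effect / df_effect) / (ss_error / df_error)
np2 = ss_effect / (ss_effect + ss_error)     # partial eta-squared

print(n_levels, n_subjects, round(f_value, 2), round(np2, 3))
```

Since Disgustingness has only two levels (High and Low), the sphericity assumption does not apply here, so the Mauchly columns carry no useful information for this example.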
