## Python Repeated Measures ANOVA using Statsmodels AnovaRM

Based on: https://github.com/marsja/jupyter/blob/master/Python%20repeated%20measures%20ANOVA.ipynb

This code example shows how to carry out a repeated measures ANOVA using statsmodels `AnovaRM`. This notebook contains the code for the YouTube [Tutorial on how to carry out repeated measures anova](https://youtu.be/_X3g-dvlMF0) in Python. The video covers both one-way and two-way repeated measures ANOVA.

In [None]:
import pandas as pd
from statsmodels.stats.anova import AnovaRM

help(AnovaRM)

First we will load data in which each subject is measured under 2 conditions, and test whether the mean response differs between them.

In [None]:
df = pd.read_csv('data/rmAOV1way.csv')
df.head()

In [None]:
df.tail()

The columns are the subject identifier `Sub_id`, the dependent variable `rt`, and the independent variable `cond`, which has two levels (`df.cond.unique()` shows `noise` and `quiet`).

The original data source does not specify what the measurements represent, so to make the example concrete let's say that `rt` is how long it takes a subject to complete a puzzle and `cond` is the noise level. Each subject is given 2 different puzzles of the same type, and completes one in a quiet setting and one in a noisy setting.

We will perform ANOVA on these data to determine if there is a relationship between noise level and puzzle completion time.
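To make concrete what the repeated measures F test computes, here is a minimal sketch in plain Python. The numbers are made up for illustration and are not the tutorial data. It partitions the total sum of squares into condition, subject, and error parts, and also computes a paired t statistic on the within-subject differences. With only 2 conditions the two approaches are equivalent, so F equals t squared.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up completion times for 4 subjects (illustration only, not the tutorial data).
quiet = [10.0, 12.0, 11.0, 13.0]
noise = [13.0, 14.0, 15.0, 16.0]

n = len(quiet)  # number of subjects
k = 2           # number of conditions

grand_mean = mean(quiet + noise)
cond_means = [mean(quiet), mean(noise)]
subj_means = [mean(pair) for pair in zip(quiet, noise)]

# Partition the total sum of squares into condition, subject, and error parts.
ss_total = sum((x - grand_mean) ** 2 for x in quiet + noise)
ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
ss_error = ss_total - ss_cond - ss_subj

df_cond = k - 1
df_error = (k - 1) * (n - 1)
f_stat = (ss_cond / df_cond) / (ss_error / df_error)

# Paired t statistic on the within-subject differences.
diffs = [b - a for a, b in zip(quiet, noise)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

print(f"F({df_cond}, {df_error}) = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```

Removing the subject sum of squares from the error term is what distinguishes this from a between-groups ANOVA: stable differences between subjects no longer count as noise.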

In [None]:
aovrm = AnovaRM(df, 'rt', 'Sub_id', within=['cond'])
res = aovrm.fit()

print(res)

Based on this analysis, the P-value is so close to 0 that it rounds to zero in the printed table. This is very strong evidence that mean `rt` differs between the 2 conditions.
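When the printed summary does not show enough digits, the fitted result's `anova_table` attribute (a pandas DataFrame) can be read directly. The sketch below uses synthetic data as a stand-in for `rmAOV1way.csv`. The 20 subjects, the 100-unit shift between conditions, and the noise levels are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic stand-in for the tutorial data: 20 subjects, each measured
# once in 'quiet' and once in 'noise' (all numbers are made up).
rng = np.random.default_rng(0)
n_subjects = 20
subject_effect = rng.normal(0, 50, n_subjects)  # per-subject baseline shift

rows = []
for sub in range(n_subjects):
    for cond, shift in (("quiet", 0.0), ("noise", 100.0)):
        rt = 500.0 + subject_effect[sub] + shift + rng.normal(0, 20)
        rows.append({"Sub_id": sub + 1, "cond": cond, "rt": rt})
demo = pd.DataFrame(rows)

demo_res = AnovaRM(demo, "rt", "Sub_id", within=["cond"]).fit()

# anova_table holds the F value, degrees of freedom, and P-value per term.
table = demo_res.anova_table
p_value = table.loc["cond", "Pr > F"]
print(table)
print(f"p = {p_value:.3e}")  # scientific notation keeps very small values visible
```

The same lookup should work on `res` from the cell above, since `cond` is the only within-subject term there.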

Now we will analyze 2 within-subject variables at once, one with 2 levels and one with 3. This is called 2-way ANOVA.

In [None]:
df2way = pd.read_csv('data/rmAOV2way.csv')
df2way.head()

In [None]:
df2way.tail()

Let's say that in this experiment, as before, researchers give each subject a puzzle to complete in a quiet setting and a noisy setting. Additionally, in this version they also try 3 different levels of lighting: lights up, middle, and down. This gives 6 different treatments for each of the 60 subjects, for a total of 360 observations.

Our variables are: subject identifier `Sub_id`, puzzle completion time `rt`, noise level `iv1`, and lighting level `iv2`.
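According to its docstring, `AnovaRM` only supports fully balanced within-subject designs, so it is worth checking that every subject has exactly one observation in each of the 6 cells. The sketch below builds a design table of the same shape as the one described (60 subjects, 2 noise levels, 3 lighting levels). The level labels are assumptions and may not match the strings in `rmAOV2way.csv`.

```python
from itertools import product

import pandas as pd

subjects = range(1, 61)
noise_levels = ["noise", "quiet"]        # assumed labels for iv1
light_levels = ["up", "middle", "down"]  # assumed labels for iv2

# One row per subject x noise x light combination.
design = pd.DataFrame(
    list(product(subjects, noise_levels, light_levels)),
    columns=["Sub_id", "iv1", "iv2"],
)

cells_per_subject = design.groupby("Sub_id").size()
obs_per_cell = design.groupby(["Sub_id", "iv1", "iv2"]).size()

# Balanced means 6 cells for every subject and 1 observation in every cell.
is_balanced = bool((cells_per_subject == 6).all() and (obs_per_cell == 1).all())
print(len(design), is_balanced)
```

The same two `groupby` checks can be run on `df2way` before fitting. If a cell holds several observations per subject, `AnovaRM` accepts an `aggregate_func` argument (for example `'mean'`) to collapse them.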

We are going to test for 3 null hypotheses:

* H01: Variable iv1 (noise) has no effect on rt
* H02: Variable iv2 (light) has no effect on rt
* H03: Noise and light do not interact (the effect of one on rt does not depend on the level of the other).

Perform the analysis:

In [None]:
aovrm2way = AnovaRM(df2way, 'rt', 'Sub_id', within=['iv1', 'iv2'])
res2way = aovrm2way.fit()

print(res2way)

### Conclusion
Now we can look at our 3 P values for our 3 hypotheses.

* For H01, P is approximately 0.
* For H02, P is approximately 0.
* For H03, P is 0.159.

State what conclusions we can draw from these results.
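One way to organize the answer is to compare each P-value with a significance level. The sketch below assumes the conventional alpha of 0.05, which the text does not specify, and uses 0.0 as a stand-in for the two P-values reported as approximately 0.

```python
ALPHA = 0.05  # assumed significance level; the text does not fix one

# P-values as reported above; 0.0 stands in for "approximately 0".
p_values = {
    "H01 (iv1, noise)": 0.0,
    "H02 (iv2, light)": 0.0,
    "H03 (iv1:iv2 interaction)": 0.159,
}

decisions = {
    hypothesis: "reject" if p < ALPHA else "fail to reject"
    for hypothesis, p in p_values.items()
}

for hypothesis, decision in decisions.items():
    print(f"{hypothesis}: {decision} the null hypothesis at alpha = {ALPHA}")
```

Failing to reject H03 means the data give no evidence of an interaction. It does not prove that no interaction exists.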