# Two-Way ANOVA in Python

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

A two-way ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups that have been split on two factors.

The purpose of a two-way ANOVA is to determine how two factors impact a response variable, and to determine whether or not there is an interaction between the two factors on the response variable.
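
Written out, the model behind a two-way ANOVA with interaction is commonly expressed as shown below. The symbols are not taken from the text above and are introduced here only for illustration: $y_{ijk}$ is the response for the $k$-th observation at level $i$ of the first factor and level $j$ of the second, $\mu$ is the overall mean, $\alpha_i$ and $\beta_j$ are the two main effects, $(\alpha\beta)_{ij}$ is the interaction, and $\varepsilon_{ijk}$ is random error.

$$
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
$$

Under this notation, the ANOVA tests three null hypotheses: all $\alpha_i = 0$, all $\beta_j = 0$, and all $(\alpha\beta)_{ij} = 0$.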

**Example:** A botanist wants to know whether plant growth is influenced by sunlight exposure and watering frequency. The botanist plants 30 seeds and lets them grow for two months under different conditions of sunlight exposure and watering frequency.
After two months, the botanist records the height of each plant, in inches.


|Sunlight|Water|Height (in)|
|--------|-----|-----------|


Use the following steps to perform a two-way ANOVA to determine if watering frequency and sunlight exposure have a significant effect on plant growth, and to determine if there is any interaction effect between watering frequency and sunlight exposure.

## Step 1: Enter the data.

First, we’ll create a pandas DataFrame that contains the following three variables:

* **water:** how frequently each plant was watered: daily or weekly
* **sun:** how much sunlight exposure each plant received: low, medium, or high
* **height:** the height of each plant (in inches) after two months


In [9]:
df = pd.DataFrame({
    "water": np.repeat(['daily', 'weekly'], 15),
    "sun": np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
    "height": [6, 6, 6, 5, 6, 5, 5, 6, 4, 5, 6, 6, 7, 8, 7,
               3, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5, 6, 6, 7, 8]
})

In [10]:
df

| |water|sun|height|
|---|-----|---|------|
|0|daily|low|6|
|1|daily|low|6|
|2|daily|low|6|
|3|daily|low|5|
|4|daily|low|6|
|5|daily|med|5|
|6|daily|med|5|
|7|daily|med|6|
|8|daily|med|4|
|9|daily|med|5|
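
Before fitting the model, it can help to confirm that the design is balanced, that is, that every water/sun combination has the same number of plants. The following is a minimal sketch that rebuilds the same DataFrame so it runs on its own; `counts` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Same data as in Step 1, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "water": np.repeat(["daily", "weekly"], 15),
    "sun": np.tile(np.repeat(["low", "med", "high"], 5), 2),
    "height": [6, 6, 6, 5, 6, 5, 5, 6, 4, 5, 6, 6, 7, 8, 7,
               3, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5, 6, 6, 7, 8],
})

# Number of plants in each water x sun cell
counts = pd.crosstab(df["water"], df["sun"])
print(counts)
```

Every cell should contain 5 plants. In a balanced design like this one, Type I and Type II sums of squares agree, so the choice of `typ` between 1 and 2 in `anova_lm` should not change the table.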


## Step 2: Perform the two-way ANOVA.

In [16]:
import statsmodels.api as sm
from statsmodels.formula.api import ols


In [17]:
# Fit an ordinary least squares (OLS) model with both main effects and their interaction
model = ols('height ~ C(water) + C(sun) + C(water):C(sun)', data=df).fit()
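
In the formula interface, `C(water) * C(sun)` is shorthand for the two main effects plus their interaction, so it should give the same fit as the expanded formula above. The following small sketch checks this, rebuilding the data so it runs on its own; `model_long` and `model_short` are illustrative names.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Same data as in Step 1, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "water": np.repeat(["daily", "weekly"], 15),
    "sun": np.tile(np.repeat(["low", "med", "high"], 5), 2),
    "height": [6, 6, 6, 5, 6, 5, 5, 6, 4, 5, 6, 6, 7, 8, 7,
               3, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5, 6, 6, 7, 8],
})

# Expanded formula: main effects and interaction written out
model_long = ols("height ~ C(water) + C(sun) + C(water):C(sun)", data=df).fit()

# Shorthand: a * b expands to a + b + a:b
model_short = ols("height ~ C(water) * C(sun)", data=df).fit()

# Compare coefficients (aligned by label) and residual sums of squares
same_params = np.allclose(
    model_long.params, model_short.params[model_long.params.index]
)
same_ssr = np.isclose(model_long.ssr, model_short.ssr)
print(same_params, same_ssr)
```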

In [22]:
sm.stats.anova_lm(model,typ=2)

| |sum_sq|df|F|PR(>F)|
|---|------|--|---|------|
|C(water)|8.533333|1.0|16.0|0.000527|
|C(sun)|24.866667|2.0|23.3125|2e-06|
|C(water):C(sun)|2.466667|2.0|2.3125|0.120667|
|Residual|12.8|24.0| | |
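
Each F statistic in the table is the effect's mean square (`sum_sq / df`) divided by the residual mean square, and `PR(>F)` is the upper-tail probability of the F distribution at that value. The following minimal sketch recomputes both from the sums of squares printed above. It assumes SciPy is available (it is a dependency of statsmodels), and the variable names are illustrative.

```python
from scipy import stats

# Sums of squares and degrees of freedom copied from the ANOVA table above
effects = {
    "C(water)": (8.533333, 1),
    "C(sun)": (24.866667, 2),
    "C(water):C(sun)": (2.466667, 2),
}
ss_resid, df_resid = 12.8, 24

# Residual mean square: the denominator of every F statistic
ms_resid = ss_resid / df_resid

results = {}
for name, (ss, dof) in effects.items():
    f_stat = (ss / dof) / ms_resid               # effect mean square / residual mean square
    p_value = stats.f.sf(f_stat, dof, df_resid)  # P(F > f_stat) under the null hypothesis
    results[name] = (f_stat, p_value)
    print(f"{name}: F = {f_stat:.4f}, p = {p_value:.6f}")
```

Up to rounding in the copied sums of squares, the values should match the `F` and `PR(>F)` columns of the table.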


## Step 3: Interpret the results.

The table gives the following p-values:
* water: **p-value** = .000527
* sun: **p-value** = .000002
* water × sun interaction: **p-value** = .120667

Since the p-values for water and sun are both less than .05, both factors have a statistically significant effect on plant height.

Since the p-value for the interaction effect (.120667) is not less than .05, there is no statistically significant interaction effect between sunlight exposure and watering frequency.
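
One way to get a feel for the interaction result is to compare the cell means: if the gap between daily and weekly watering differed sharply across sunlight levels, that would point toward an interaction. The following is a minimal sketch, rebuilding the data so it runs on its own; `cell_means` and `gap` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same data as in Step 1, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "water": np.repeat(["daily", "weekly"], 15),
    "sun": np.tile(np.repeat(["low", "med", "high"], 5), 2),
    "height": [6, 6, 6, 5, 6, 5, 5, 6, 4, 5, 6, 6, 7, 8, 7,
               3, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5, 6, 6, 7, 8],
})

# Mean height in each sun x water cell
cell_means = df.pivot_table(
    index="sun", columns="water", values="height", aggfunc="mean"
)

# Daily-minus-weekly difference at each sunlight level
gap = cell_means["daily"] - cell_means["weekly"]

print(cell_means)
print(gap)
```

Daily-watered plants are taller on average at every sunlight level, and the gap narrows from low to high sunlight. The ANOVA above indicates that this variation in the gap is not statistically significant at the .05 level. Calling `cell_means.plot(marker="o")` on the same table would draw the corresponding interaction plot.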

### Two-way ANOVA with interactive effects
Once again, you're going to look at our dataset of **Olympic athletes**. As in previous exercises, you'll be looking at the variation in **athlete Weight**. You're going to look at athletes of either Sex competing in one of two Events: the 100 meter and 10,000 meter run. Have a look at these data in the boxplots below.

![](https://lh3.googleusercontent.com/ji3zeKi-oJZzi-hFuRnvUrG8WX63L_7kXzGXevl0K34fizJSdKXyIoS4-WzwsUeRQbuBiXHqAZAzAU17Rv4jbU6pIPolWBnO1MKuc4u-suVO9cMYDpmu-fe5APvFuemqdu3Vu3HEsyQ7vY7vWkZKi45o0R_4kuUbogHVphAocWVJ9IW7bZFI18q_Kq1fA_fuPIk7ScbatQw=w512)

This dataset is provided in your workspace as `athletes`. An ANOVA will allow you to work out which of these variables affect `Weight` and whether an interactive effect is present. As in the example above, `pandas` and `statsmodels.api` are assumed to be imported as `pd` and `sm`.

In [None]:
# Create your data, following the example above

In [None]:
# Run the ANOVA (fill in the blanks; `sm` is `statsmodels.api`, as imported in Step 2)
model = sm.formula.ols('Weight ~ ____ + ____ + ____:____', data = ____).fit()

# Extract our table
aov_table = sm.stats.anova_lm(____, typ=2)

# Print the table
print(____)
