# Module 5: Two-way ANOVA

## Renal function
Assume you are performing a study on renal function. You measure the urine volume (mL) of each subject one hour after they receive a treatment. The subjects are 9 males and 9 females; within each sex, 3 receive a sugar pill, 3 receive a salt pill, and 3 receive a caffeine pill. The collected data are stored in a .csv file.

Before starting the Python code, discuss briefly: what are the null hypotheses for this study?

In [None]:
import scipy.stats as stats
import numpy as np
import pandas as pd
import statsmodels.api as sm              # A new stats package - you'll find there are a lot
from statsmodels.formula.api import ols

df = pd.read_csv("../data/urine_volume_data.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   treatment  18 non-null     object
 1   sex        18 non-null     object
 2   volume     18 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 560.0+ bytes


## Calculating sum of squares
Assume the data passes the assumptions necessary to perform a two-way ANOVA. Fill out the table below:

| | Sum of squares (SS) | Degrees of freedom (DF) |
| --- | --- | --- |
| Total | | |
| Cells (groups) | | |
| Error (within-cells) | | |
| Factor A (treatment) | | |
| Factor B (sex) | | |
| A x B interaction | | |
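
The degrees-of-freedom column follows directly from the design described above (3 treatments, 2 sexes, 3 subjects per cell). The following is a minimal sketch using the standard formulas for a balanced two-way ANOVA; the variable names `a`, `b`, `n`, and `N` are chosen here for illustration and do not appear elsewhere in this notebook.

```python
# Design taken from the study description: 3 treatments x 2 sexes, 3 subjects per cell.
a = 3          # levels of Factor A (treatment)
b = 2          # levels of Factor B (sex)
n = 3          # subjects per cell (balanced design)
N = a * b * n  # total number of observations

df_total = N - 1            # all observations minus the grand mean
df_cells = a * b - 1        # number of cells minus one
df_error = N - a * b        # equivalently a * b * (n - 1)
df_A = a - 1
df_B = b - 1
df_AxB = (a - 1) * (b - 1)

print('Degrees of freedom:')
print('Total: %d' % df_total)
print('Cells: %d' % df_cells)
print('Error: %d' % df_error)
print('Factor A: %d' % df_A)
print('Factor B: %d' % df_B)
print('AxB interaction: %d' % df_AxB)
```

Two quick consistency checks: the cell DF should equal the sum of the Factor A, Factor B, and interaction DF, and the total DF should equal the cell DF plus the error DF.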

In [None]:
# statsmodels.api will calculate some of the SS for us. Calculate the rest.
model = ols('volume ~ C(treatment) + C(sex) + C(treatment):C(sex)', data=df).fit()
ss_results = sm.stats.anova_lm(model, typ=2)

ss_factorA = ss_results['sum_sq']['C(treatment)']
ss_factorB = ss_results['sum_sq']['C(sex)']
ss_AxB = ss_results['sum_sq']['C(treatment):C(sex)']
ss_error = ss_results['sum_sq']['Residual']
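# In a balanced design, the cell (group) SS is the sum of the factor and interaction SS
ss_groups = ss_factorA + ss_factorB + ss_AxB
# Total SS is the between-cell SS plus the within-cell (error) SS
ss_total = ss_groups + ss_error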

print('Sum of squares:')
print('Total: %.2f' % ss_total)
print('Groups: %.2f' % ss_groups)
print('Error: %.2f' % ss_error)
print('Factor A: %.2f' % ss_factorA)
print('Factor B: %.2f' % ss_factorB)
print('AxB interaction: %.2f' % ss_AxB)

Sum of squares:
Total: 7544.44
Groups: 6327.78
Error: 1216.67
Factor A: 5386.11
Factor B: 555.56
AxB interaction: 386.11
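
The cell above relies on two identities that hold for a balanced design: SS(total) = SS(cells) + SS(error) and SS(cells) = SS(A) + SS(B) + SS(AxB). As a rough check of where those numbers come from, the sketch below computes each sum of squares by hand with NumPy. The array `y` holds made-up values, not the study's data; it is only assumed to have the same 3 x 2 x 3 layout (treatment, sex, replicate).

```python
import numpy as np

# Hypothetical balanced data (NOT the urine volume data set):
# axis 0 = treatment (3 levels), axis 1 = sex (2 levels), axis 2 = replicate (3 per cell)
y = np.array([
    [[60, 65, 70], [55, 62, 58]],
    [[80, 85, 90], [70, 78, 74]],
    [[110, 120, 115], [95, 105, 100]],
], dtype=float)
a, b, n = y.shape

grand_mean = y.mean()
cell_means = y.mean(axis=2)     # shape (a, b)
A_means = y.mean(axis=(1, 2))   # one mean per treatment level
B_means = y.mean(axis=(0, 2))   # one mean per sex level

ss_total = ((y - grand_mean) ** 2).sum()
ss_cells = n * ((cell_means - grand_mean) ** 2).sum()
ss_error = ((y - cell_means[:, :, None]) ** 2).sum()
ss_A = b * n * ((A_means - grand_mean) ** 2).sum()
ss_B = a * n * ((B_means - grand_mean) ** 2).sum()
# Interaction: what is left of each cell mean after removing both main effects
interaction = cell_means - A_means[:, None] - B_means[None, :] + grand_mean
ss_AxB = n * (interaction ** 2).sum()

print('Total: %.2f' % ss_total)
print('Cells + Error: %.2f' % (ss_cells + ss_error))
print('Cells: %.2f' % ss_cells)
print('A + B + AxB: %.2f' % (ss_A + ss_B + ss_AxB))
```

The first two printed values should match each other, as should the last two. With unequal cell sizes these identities generally no longer hold, and the choice of `typ` in `anova_lm` starts to matter.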


Using your results from the part above, fill out the table below for α = 0.05. Which null hypotheses can you reject?

| | Mean sum of squares (MSS) | F-statistic | F-critical |
| --- | --- | --- | --- |
| Factor A | | | |
| Factor B | | | |
| AxB interaction | | | |
| Error (within cells) | | N/A | N/A |

In [None]:
# Use ss_results again - there's a lot in that data frame
mss_factorA = ss_results['sum_sq']['C(treatment)']/ss_results['df']['C(treatment)']
mss_factorB = ss_results['sum_sq']['C(sex)']/ss_results['df']['C(sex)']
mss_AxB = ss_results['sum_sq']['C(treatment):C(sex)']/ss_results['df']['C(treatment):C(sex)']
mss_error = ss_results['sum_sq']['Residual']/ss_results['df']['Residual']

print('Mean sum of squares:')
print('Factor A: %.2f' % mss_factorA)
print('Factor B: %.2f' % mss_factorB)
print('AxB interaction: %.2f' % mss_AxB)
print('Error: %.2f' % mss_error)

print('F-statistic:')
print('Factor A: %.2f' % ss_results['F']['C(treatment)'])
print('Factor B: %.2f' % ss_results['F']['C(sex)'])
print('AxB interaction: %.2f' % ss_results['F']['C(treatment):C(sex)'])

df_error = ss_results['df']['Residual']

alpha = 0.05
# Remember this function?
f_factorA = stats.f.ppf(1-alpha,ss_results['df']['C(treatment)'],df_error)
f_factorB = stats.f.ppf(1-alpha,ss_results['df']['C(sex)'],df_error)
f_AxB = stats.f.ppf(1-alpha,ss_results['df']['C(treatment):C(sex)'],df_error)

print('F-critical:')
print('Factor A: %.2f' % f_factorA)
print('Factor B: %.2f' % f_factorB)
print('AxB interaction: %.2f' % f_AxB)

Mean sum of squares:
Factor A: 2693.06
Factor B: 555.56
AxB interaction: 193.06
Error: 101.39
F-statistic:
Factor A: 26.56
Factor B: 5.48
AxB interaction: 1.90
F-critical:
Factor A: 3.89
Factor B: 4.75
AxB interaction: 3.89
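
To answer which null hypotheses can be rejected, compare each F-statistic with its F-critical value: reject when the statistic exceeds the critical value. The sketch below does this using the rounded F-statistics copied from the printed output above, with the error degrees of freedom (12) implied by the design. The `effects` dictionary and the p-value calculation via `stats.f.sf` are additions for illustration; the p-value comparison is an equivalent way to reach the same decision.

```python
from scipy import stats

alpha = 0.05
df_error = 12  # N - (number of cells) = 18 - 6

# effect name -> (F-statistic copied from the output above, numerator DF)
effects = {
    'Factor A (treatment)': (26.56, 2),
    'Factor B (sex)': (5.48, 1),
    'AxB interaction': (1.90, 2),
}

decisions = {}
for name, (f_stat, df_num) in effects.items():
    f_crit = stats.f.ppf(1 - alpha, df_num, df_error)  # upper-tail critical value
    p_value = stats.f.sf(f_stat, df_num, df_error)     # P(F >= f_stat) under the null
    decisions[name] = f_stat > f_crit
    verdict = 'reject H0' if decisions[name] else 'fail to reject H0'
    print('%s: F = %.2f, F-critical = %.2f, p = %.4f -> %s'
          % (name, f_stat, f_crit, p_value, verdict))
```

With these values, the null hypotheses for treatment and for sex are rejected, while the null hypothesis of no interaction is not.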
