In [1]:
import numpy as np 
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Two-Way ANOVA

#### Example
A physiologist was interested in learning whether smoking history and different types of stress tests influence the timing of a subject's maximum oxygen uptake, as measured in minutes. The researcher classified a subject's smoking history as either heavy smoking, moderate smoking, or non-smoking. He was interested in seeing the effects of three different types of stress tests — a test performed on a bicycle, a test on a treadmill, and a test on steps. The physiologist recruited 9 non-smokers, 9 moderate smokers, and 9 heavy smokers to participate in his experiment, for a total of $n = 27$ subjects. He then randomly assigned each of his recruited subjects to undergo one of the three types of stress tests.

Is there sufficient evidence at the $\alpha = 0.05$ significance level to conclude that smoking history has an effect on the time to maximum oxygen uptake? Is there sufficient evidence at the $\alpha = 0.05$ significance level to conclude that the type of stress test has an effect on the time to maximum oxygen uptake? And, is there evidence of an interaction between smoking history and the type of stress test? (Don't forget to define the null hypothesis $H_0$ and the alternative hypothesis $H_1$.)

## Hypotheses:
Main Effect of Smoking History
- $H_0: \mu_{\text{Non-Smoker}} = \mu_{\text{Moderate Smoker}} = \mu_{\text{Heavy Smoker}}$ (No difference in the timing of maximum oxygen uptake across smoking history groups)
- $H_1: \text{At least one group mean differs.}$ (At least one smoking history group differs in the timing of maximum oxygen uptake)

Main Effect of Stress Test Type
- $H_0: \mu_{\text{Bicycle}} = \mu_{\text{Treadmill}} = \mu_{\text{Step}}$ (No difference in the timing of maximum oxygen uptake across stress test types)
- $H_1: \text{At least one test mean differs.}$ (At least one stress test type differs in the timing of maximum oxygen uptake.)

Interaction Effect between Smoking History and Stress Test Type
- $H_0: \text{The effect of Smoking History on timing is the same across all stress test types.}$ (No interaction between smoking history and stress test type in affecting the timing of maximum oxygen uptake)
- $H_1: \text{The effect of Smoking History on timing differs by Stress Test type.} $ (The effect of smoking history on timing of maximum oxygen uptake depends on the type of stress test)

Independent Variables (Factors):
- Smoking Status (Nonsmoker, Moderate, Heavy)
- Exercise Test Type (Bicycle, Treadmill, Step Test)

Dependent Variable:
- Time to maximum oxygen uptake, in minutes (stored in the `Score` column)
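The hypotheses above can also be written in terms of an effects model. This is the standard balanced two-way model. The symbols $\tau_i$ (smoking history) and $\beta_j$ (stress test) are introduced only for this block and are not used elsewhere in the notebook:

$$
y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \qquad i, j, k = 1, 2, 3,
$$

Here $i$ indexes smoking history, $j$ the stress test, and $k$ the three subjects in each cell. The errors $\varepsilon_{ijk}$ are independent $N(0, \sigma^2)$, with $\sum_i \tau_i = \sum_j \beta_j = \sum_i (\tau\beta)_{ij} = \sum_j (\tau\beta)_{ij} = 0$. The three null hypotheses then read $H_0: \tau_1 = \tau_2 = \tau_3 = 0$, $H_0: \beta_1 = \beta_2 = \beta_3 = 0$, and $H_0: (\tau\beta)_{ij} = 0$ for all $i, j$.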

In [2]:
# Data: time to maximum oxygen uptake (minutes) by smoking history for the bicycle, treadmill, and step tests
# Bicycle Test
bicycle_nonsmoker = [12.8, 13.5, 11.2]
bicycle_moderate = [10.9, 11.1, 9.8]
bicycle_heavy = [8.7, 9.2, 7.5]

# Treadmill Test
treadmill_nonsmoker = [16.2, 18.1, 17.8]
treadmill_moderate = [15.5, 13.8, 16.2]
treadmill_heavy = [14.7, 13.2, 8.1]

# Step Test
step_nonsmoker = [22.6, 19.3, 18.9]
step_moderate = [20.1, 21.0, 15.9]
step_heavy = [16.2, 16.1, 17.8]

### Reorganize the data into long format (one row per subject)

In [3]:
data = {
    "Smoking": ["Nonsmoker"] * 3 + ["Moderate"] * 3 + ["Heavy"] * 3 +
               ["Nonsmoker"] * 3 + ["Moderate"] * 3 + ["Heavy"] * 3 +
               ["Nonsmoker"] * 3 + ["Moderate"] * 3 + ["Heavy"] * 3,
    "Test": ["Bicycle"] * 9 + ["Treadmill"] * 9 + ["Step"] * 9,
    "Score": [12.8, 13.5, 11.2, 10.9, 11.1, 9.8, 8.7, 9.2, 7.5,
              16.2, 18.1, 17.8, 15.5, 13.8, 16.2, 14.7, 13.2, 8.1,
              22.6, 19.3, 18.9, 20.1, 21.0, 15.9, 16.2, 16.1, 17.8]
}

# Convert to DataFrame
data = pd.DataFrame(data)
data

      Smoking       Test  Score
0   Nonsmoker    Bicycle   12.8
1   Nonsmoker    Bicycle   13.5
2   Nonsmoker    Bicycle   11.2
3    Moderate    Bicycle   10.9
4    Moderate    Bicycle   11.1
5    Moderate    Bicycle    9.8
6       Heavy    Bicycle    8.7
7       Heavy    Bicycle    9.2
8       Heavy    Bicycle    7.5
9   Nonsmoker  Treadmill   16.2
...
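The from-scratch computation in the next cell follows the usual balanced two-way decomposition. Take $A$ = smoking history and $B$ = stress test, with $a = b = 3$ levels and $n = 3$ subjects per cell ($N = 27$):

$$
\begin{aligned}
SS_A &= bn \sum_{i=1}^{a} (\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2, & df_A &= a - 1 = 2,\\
SS_B &= an \sum_{j=1}^{b} (\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot})^2, & df_B &= b - 1 = 2,\\
SS_{AB} &= n \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot})^2, & df_{AB} &= (a-1)(b-1) = 4,\\
SS_E &= \sum_{i,j,k} (y_{ijk} - \bar y_{ij\cdot})^2 = SS_T - SS_A - SS_B - SS_{AB}, & df_E &= ab(n-1) = 18,
\end{aligned}
$$

where $SS_T = \sum_{i,j,k} (y_{ijk} - \bar y_{\cdot\cdot\cdot})^2$. Each $F$ statistic is $MS_{\text{effect}} / MS_E$ with $MS = SS/df$. Its p-value is the upper-tail probability of the $F(df_{\text{effect}}, df_E)$ distribution.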


In [4]:
# Step 1: Work on a copy of the long-format data (`data` is already a DataFrame)
df = data.copy()

# Step 2: Calculate the overall (grand) mean
overall_mean = df['Score'].mean()

# Step 3: Calculate the sums of squares for each source of variation

# 3.1: Between groups (Smoking)
smoking_means = df.groupby('Smoking')['Score'].mean()
smoking_ss = sum(len(df[df['Smoking'] == level]) * (smoking_means[level] - overall_mean) ** 2 for level in smoking_means.index)

# 3.2: Between groups (Test)
test_means = df.groupby('Test')['Score'].mean()
test_ss = sum(len(df[df['Test'] == level]) * (test_means[level] - overall_mean) ** 2 for level in test_means.index)

# 3.3: Interaction (Smoking * Test)
interaction_ss = 0
for smoking_level in smoking_means.index:
    for test_level in test_means.index:
        subset = df[(df['Smoking'] == smoking_level) & (df['Test'] == test_level)]
        interaction_mean = subset['Score'].mean()
        interaction_ss += len(subset) * (interaction_mean - smoking_means[smoking_level] - test_means[test_level] + overall_mean) ** 2

# 3.4: Total sum of squares
total_ss = sum((df['Score'] - overall_mean) ** 2)

# 3.5: Error (Residual) sum of squares
error_ss = total_ss - smoking_ss - test_ss - interaction_ss

# Step 4: Calculate degrees of freedom (df)
df_smoking = len(smoking_means) - 1
df_test = len(test_means) - 1
df_interaction = df_smoking * df_test
df_error = len(df) - (df_smoking + df_test + df_interaction + 1)

# Step 5: Calculate Mean Squares (MS)
ms_smoking = smoking_ss / df_smoking
ms_test = test_ss / df_test
ms_interaction = interaction_ss / df_interaction
ms_error = error_ss / df_error

# Step 6: Calculate F-statistics
f_smoking = ms_smoking / ms_error
f_test = ms_test / ms_error
f_interaction = ms_interaction / ms_error

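# Step 7: Calculate p-values from the upper tail of the F distribution
# (st.f.sf is the survival function, 1 - cdf, and stays accurate for very small p-values)
p_smoking = st.f.sf(f_smoking, df_smoking, df_error)
p_test = st.f.sf(f_test, df_test, df_error)
p_interaction = st.f.sf(f_interaction, df_interaction, df_error)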

# Step 8: Create an ANOVA table
anova_table = pd.DataFrame({
    'Source': ['Smoking', 'Test', 'Smoking * Test', 'Error'],
    'SS': [smoking_ss, test_ss, interaction_ss, error_ss],
    'df': [df_smoking, df_test, df_interaction, df_error],
    'MS': [ms_smoking, ms_test, ms_interaction, ms_error],
    'F': [f_smoking, f_test, f_interaction, np.nan],
    'p-value': [p_smoking, p_test, p_interaction, np.nan]
})

print(anova_table)


           Source          SS  df          MS          F       p-value
0         Smoking   84.898519   2   42.449259  12.896703  3.347912e-04
1            Test  298.071852   2  149.035926  45.279284  9.472739e-08
2  Smoking * Test    2.814815   4    0.703704   0.213795  9.273412e-01
3           Error   59.246667  18    3.291481        NaN           NaN
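The code above obtains $SS_E$ by subtraction. As a cross-check, the error and interaction sums of squares can also be computed directly from the cell means, with the p-value taken from `scipy.stats.f.sf`, the upper-tail survival function. This is a sketch, and the names `oxygen`, `ss_e`, `ms_e`, `ss_ab`, `f_ab`, and `p_ab` are hypothetical:

```python
import pandas as pd
from scipy import stats

# Same 27 observations as above: 3 subjects per (Test, Smoking) cell
cells = {
    ("Bicycle", "Nonsmoker"): [12.8, 13.5, 11.2],
    ("Bicycle", "Moderate"): [10.9, 11.1, 9.8],
    ("Bicycle", "Heavy"): [8.7, 9.2, 7.5],
    ("Treadmill", "Nonsmoker"): [16.2, 18.1, 17.8],
    ("Treadmill", "Moderate"): [15.5, 13.8, 16.2],
    ("Treadmill", "Heavy"): [14.7, 13.2, 8.1],
    ("Step", "Nonsmoker"): [22.6, 19.3, 18.9],
    ("Step", "Moderate"): [20.1, 21.0, 15.9],
    ("Step", "Heavy"): [16.2, 16.1, 17.8],
}
oxygen = pd.DataFrame(
    [(smoking, test, y) for (test, smoking), ys in cells.items() for y in ys],
    columns=["Smoking", "Test", "Score"],
)

# Means broadcast back to every row with transform, so each observation carries its cell/margin mean
grand = oxygen["Score"].mean()
cell = oxygen.groupby(["Smoking", "Test"])["Score"].transform("mean")
row = oxygen.groupby("Smoking")["Score"].transform("mean")
col = oxygen.groupby("Test")["Score"].transform("mean")

# Error SS computed directly: each observation's deviation from its own cell mean
ss_e = float(((oxygen["Score"] - cell) ** 2).sum())
df_e = 3 * 3 * (3 - 1)                 # ab(n - 1) = 18
ms_e = ss_e / df_e

# Interaction SS from the same means; summing over rows already includes the factor n
ss_ab = float(((cell - row - col + grand) ** 2).sum())
f_ab = (ss_ab / 4) / ms_e              # df_AB = (a - 1)(b - 1) = 4
p_ab = stats.f.sf(f_ab, 4, df_e)       # upper-tail probability
print(f"SS_E = {ss_e:.6f}, MS_E = {ms_e:.6f}, SS_AB = {ss_ab:.6f}, F = {f_ab:.6f}, p = {p_ab:.6f}")
```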


### Compare the from-scratch results with the two-way ANOVA from statsmodels

In [5]:

# Fit two-way ANOVA model
model = ols("Score ~ Smoking*Test", data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

                  sum_sq    df          F        PR(>F)
Smoking        84.898519   2.0  12.896703  3.347912e-04
Test          298.071852   2.0  45.279284  9.472739e-08
Smoking:Test    2.814815   4.0   0.213795  9.273412e-01
Residual       59.246667  18.0        NaN           NaN


In [6]:
print(p_smoking < 0.05)
print(p_test < 0.05)
print(p_interaction < 0.05)


True
True
False


### Please explain your results at the $\alpha = 0.05$ significance level.

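- Smoking:

The p-value for smoking history ($\approx 3.3 \times 10^{-4}$) is much smaller than 0.05, so we reject $H_0$. There is a significant main effect of smoking history on the time to maximum oxygen uptake.

- Test:

The p-value for the stress test type ($\approx 9.5 \times 10^{-8}$) is also far below 0.05, so we reject $H_0$. The type of stress test has a significant main effect on the time to maximum oxygen uptake.

- Interaction:

The p-value for the interaction ($\approx 0.93$) is greater than 0.05, so we fail to reject $H_0$. There is no evidence of an interaction between smoking history and stress test type. The data are consistent with smoking history having the same effect on the time to maximum oxygen uptake under all three tests.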
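An interaction plot of the cell means gives a quick visual check on the interaction conclusion. If smoking history acts the same way under every test, the three lines should be roughly parallel. This is a minimal sketch using pandas' built-in plotting, and the names `oxygen` and `cell_means` are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Same 27 observations, in the same long format as the `data` DataFrame above
oxygen = pd.DataFrame({
    "Smoking": (["Nonsmoker"] * 3 + ["Moderate"] * 3 + ["Heavy"] * 3) * 3,
    "Test": ["Bicycle"] * 9 + ["Treadmill"] * 9 + ["Step"] * 9,
    "Score": [12.8, 13.5, 11.2, 10.9, 11.1, 9.8, 8.7, 9.2, 7.5,
              16.2, 18.1, 17.8, 15.5, 13.8, 16.2, 14.7, 13.2, 8.1,
              22.6, 19.3, 18.9, 20.1, 21.0, 15.9, 16.2, 16.1, 17.8],
})

# Cell means: rows = stress test, columns = smoking history (fixed display order)
cell_means = oxygen.pivot_table(index="Test", columns="Smoking", values="Score", aggfunc="mean")
cell_means = cell_means.loc[["Bicycle", "Treadmill", "Step"], ["Nonsmoker", "Moderate", "Heavy"]]
print(cell_means.round(2))

# One line per smoking group; roughly parallel lines are what "no interaction" looks like
ax = cell_means.plot(marker="o")
ax.set_xlabel("Stress test")
ax.set_ylabel("Mean time to maximum O2 uptake (min)")
ax.set_title("Smoking history × stress test")
plt.tight_layout()
```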


# General Factorial Design

The quality control department of a fabric finishing plant is studying the effect of several factors on the dyeing of cotton-synthetic cloth used to manufacture men's shirts. Three operators, three cycle times, and two temperatures were selected, and three small specimens of cloth were dyed under each set of conditions. The finished cloth was compared to a standard, and a numerical score was assigned. 

Define the levels of each factor as the following:
- Temperature takes values 300 and 350.
- Cycle Time takes values 40, 50, 60.
- Operator takes values 1, 2, 3.

Define the model for this experiment and estimate the parameters $\alpha_i, \beta_j, \gamma_k$ for the main effects, as well as the interaction effects in the model. Find a 95% confidence interval (significance level $0.05$) for each parameter. Then create the ANOVA table and explain your results.
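One way to write the model for this $2 \times 3 \times 3$ factorial with $n = 3$ replicates is the standard fixed-effects formulation. The interaction symbols below are the usual textbook notation:

$$
y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl},
$$

The indices are $i = 1, 2$ for temperature (300, 350), $j = 1, 2, 3$ for cycle time (40, 50, 60), $k = 1, 2, 3$ for the operator, and $l = 1, 2, 3$ for the specimen within a cell. The errors $\varepsilon_{ijkl}$ are independent $N(0, \sigma^2)$. Every effect sums to zero over each of its subscripts (e.g. $\sum_i \alpha_i = 0$ and $\sum_j (\alpha\beta)_{ij} = 0$). With $N = 54$ observations and $2 \cdot 3 \cdot 3 = 18$ cells, the error term has $18 \cdot (3 - 1) = 36$ degrees of freedom.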

### Data

In [7]:
data = {
    'Temperature': [300, 300, 300, 350, 350, 350] * 9,
    'CycleTime': [40] * 18 + [50] * 18 + [60] * 18,
    'Operator': [1, 2, 3] * 18,
    'Score': [
        23, 27, 31, 24, 38, 34, 
        24, 28, 32, 23, 36, 36, 
        25, 26, 29, 28, 35, 39,
        36, 34, 33, 37, 34, 34, 
        35, 38, 34, 39, 38, 36, 
        36, 39, 35, 35, 36, 31,
        28, 35, 26, 26, 36, 28, 
        24, 35, 27, 29, 37, 26, 
        27, 34, 25, 25, 34, 24
    ]
}

df = pd.DataFrame(data)


print(df)


    Temperature  CycleTime  Operator  Score
0           300         40         1     23
1           300         40         2     27
2           300         40         3     31
3           350         40         1     24
4           350         40         2     38
5           350         40         3     34
6           300         40         1     24
7           300         40         2     28
8           300         40         3     32
9           350         40         1     23
10          350         40         2     36
11          350         40         3     36
12          300         40         1     25
13          300         40         2     26
14          300         40         3     29
15          350         40         1     28
16          350         40         2     35
17          350         40         3     39
18          300         50         1     36
19          300         50         2     34
20          300         50         3     33
21          350         50         1     37
...
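The effect estimates and sum-of-squares formulas used below assume a balanced design, with the same number of specimens in every Temperature × CycleTime × Operator cell. A quick sketch to confirm that (the names `design` and `cell_counts` are hypothetical):

```python
import pandas as pd

# Rebuild the design columns exactly as in the Data cell (scores are not needed for this check)
design = pd.DataFrame({
    "Temperature": [300, 300, 300, 350, 350, 350] * 9,
    "CycleTime": [40] * 18 + [50] * 18 + [60] * 18,
    "Operator": [1, 2, 3] * 18,
})

# Number of specimens in every Temperature x CycleTime x Operator cell
cell_counts = design.groupby(["Temperature", "CycleTime", "Operator"]).size()
print(cell_counts.unstack("Operator"))
print("cells:", len(cell_counts), "| balanced:", cell_counts.nunique() == 1)
```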

### Calculations
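Under the sum-to-zero constraints, the least-squares estimates in a balanced design are differences of marginal means. A dot marks a subscript that has been averaged over. Note that the three-way estimate removes all three two-way interactions as well as the main effects:

$$
\begin{aligned}
\hat\mu &= \bar y_{\cdot\cdot\cdot\cdot}, \qquad
\hat\alpha_i = \bar y_{i\cdot\cdot\cdot} - \hat\mu, \qquad
\hat\beta_j = \bar y_{\cdot j\cdot\cdot} - \hat\mu, \qquad
\hat\gamma_k = \bar y_{\cdot\cdot k\cdot} - \hat\mu,\\
\widehat{(\alpha\beta)}_{ij} &= \bar y_{ij\cdot\cdot} - \hat\mu - \hat\alpha_i - \hat\beta_j, \qquad
\widehat{(\alpha\gamma)}_{ik} = \bar y_{i\cdot k\cdot} - \hat\mu - \hat\alpha_i - \hat\gamma_k, \qquad
\widehat{(\beta\gamma)}_{jk} = \bar y_{\cdot jk\cdot} - \hat\mu - \hat\beta_j - \hat\gamma_k,\\
\widehat{(\alpha\beta\gamma)}_{ijk} &= \bar y_{ijk\cdot} - \hat\mu - \hat\alpha_i - \hat\beta_j - \hat\gamma_k - \widehat{(\alpha\beta)}_{ij} - \widehat{(\alpha\gamma)}_{ik} - \widehat{(\beta\gamma)}_{jk}.
\end{aligned}
$$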

In [8]:
#overall mean
overall_mean = df['Score'].mean()
print("Overall Mean", overall_mean)

#main effects
# Temperature
temperature_means = df.groupby('Temperature')['Score'].mean()
alpha = {temp: temp_mean - overall_mean for temp, temp_mean in temperature_means.items()}

# Cycle Time
cycle_time_means = df.groupby('CycleTime')['Score'].mean()
beta = {cycle: cycle_mean - overall_mean for cycle, cycle_mean in cycle_time_means.items()}

# Operator
operator_means = df.groupby('Operator')['Score'].mean()
gamma = {op: op_mean - overall_mean for op, op_mean in operator_means.items()}

# interaction effects

# Interaction effect Temperature * Cycle Time
interaction_alpha_beta = {}
for temp in temperature_means.index:
    for cycle in cycle_time_means.index:
        subset = df[(df['Temperature'] == temp) & (df['CycleTime'] == cycle)]
        interaction_mean = subset['Score'].mean()
        interaction_alpha_beta[(temp, cycle)] = interaction_mean - alpha[temp] - beta[cycle] - overall_mean

# Interaction effect Temperature * Operator
interaction_alpha_gamma = {}
for temp in temperature_means.index:
    for op in operator_means.index:
        subset = df[(df['Temperature'] == temp) & (df['Operator'] == op)]
        interaction_mean = subset['Score'].mean()
        interaction_alpha_gamma[(temp, op)] = interaction_mean - alpha[temp] - gamma[op] - overall_mean

# Interaction effect  Cycle Time * Operator
interaction_beta_gamma = {}
for cycle in cycle_time_means.index:
    for op in operator_means.index:
        subset = df[(df['CycleTime'] == cycle) & (df['Operator'] == op)]
        interaction_mean = subset['Score'].mean()
        interaction_beta_gamma[(cycle, op)] = interaction_mean - beta[cycle] - gamma[op] - overall_mean

# Interaction effect Temperature * Cycle Time * Operator
interaction_alpha_beta_gamma = {}
for temp in temperature_means.index:
    for cycle in cycle_time_means.index:
        for op in operator_means.index:
            subset = df[(df['Temperature'] == temp) & (df['CycleTime'] == cycle) & (df['Operator'] == op)]
            interaction_mean = subset['Score'].mean()
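            # Three-way term: remove the grand mean, all main effects, and all two-way interactions
            interaction_alpha_beta_gamma[(temp, cycle, op)] = (
                interaction_mean - overall_mean
                - alpha[temp] - beta[cycle] - gamma[op]
                - interaction_alpha_beta[(temp, cycle)]
                - interaction_alpha_gamma[(temp, op)]
                - interaction_beta_gamma[(cycle, op)]
            )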

# Model Results
print("Main Effects for Temperature")
print(alpha)

print("Main Effects for Cycle Time")
print(beta)

print("Main Effects for Operator")
print(gamma)

print("Interaction Effects for Temperature * Cycle Time")
print(interaction_alpha_beta)


print("Interaction Effects for Temperature * Operator")
print(interaction_alpha_gamma)

print("Interaction Effects for Cycle Time * Operator")
print(interaction_beta_gamma)

print("Interaction Effects for Temperature * Cycle Time * Operator")
print(interaction_alpha_beta_gamma)


Overall Mean 31.555555555555557
Main Effects for Temperature
{300: -0.9629629629629655, 350: 0.9629629629629619}
Main Effects for Cycle Time
{40: -1.6666666666666679, 50: 4.0, 60: -2.3333333333333357}
Main Effects for Operator
{1: -2.4444444444444464, 2: 2.8888888888888857, 3: -0.4444444444444464}
Interaction Effects for Temperature * Cycle Time
{(300, 40): -1.7037037037037024, (300, 50): 0.9629629629629619, (300, 60): 0.7407407407407476, (350, 40): 1.7037037037037095, (350, 50): -0.9629629629629619, (350, 60): -0.7407407407407405}
Interaction Effects for Temperature * Operator
{(300, 1): 0.518518518518519, (300, 2): -0.5925925925925952, (300, 3): 0.07407407407407618, (350, 1): -0.5185185185185155, (350, 2): 0.5925925925925952, (350, 3): -0.07407407407407263}
Interaction Effects for Cycle Time * Operator
{(40, 1): -2.944444444444443, (40, 2): -1.1111111111111072, (40, 3): 4.055555555555557, (50, 1): 3.2222222222222285, (50, 2): -1.9444444444444429, (50, 3): -1.277777777777775, (60, 1):
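As a cross-check on the effect estimates, the sketch below recomputes every term in vectorized form with `groupby(...).transform("mean")` and sums the squared per-observation effects. For a balanced design these sums of squares should match the Type I values in the three-way ANOVA table later in this notebook. A = Temperature, B = CycleTime, and C = Operator, as in `p_A`, `p_B`, `p_C`. The helper `avg` and the name `fabric` are hypothetical:

```python
import pandas as pd

# Same 54 observations as in the Data cell (3 specimens per Temperature x CycleTime x Operator cell)
fabric = pd.DataFrame({
    "Temperature": [300, 300, 300, 350, 350, 350] * 9,
    "CycleTime": [40] * 18 + [50] * 18 + [60] * 18,
    "Operator": [1, 2, 3] * 18,
    "Score": [23, 27, 31, 24, 38, 34, 24, 28, 32, 23, 36, 36, 25, 26, 29, 28, 35, 39,
              36, 34, 33, 37, 34, 34, 35, 38, 34, 39, 38, 36, 36, 39, 35, 35, 36, 31,
              28, 35, 26, 26, 36, 28, 24, 35, 27, 29, 37, 26, 27, 34, 25, 25, 34, 24],
})
T, C, O = "Temperature", "CycleTime", "Operator"

def avg(*factors):
    """Mean Score within each combination of `factors`, broadcast back to every row."""
    return fabric.groupby(list(factors))["Score"].transform("mean")

mu = fabric["Score"].mean()
effects = {
    "A": avg(T) - mu,
    "B": avg(C) - mu,
    "C": avg(O) - mu,
    "AB": avg(T, C) - avg(T) - avg(C) + mu,
    "AC": avg(T, O) - avg(T) - avg(O) + mu,
    "BC": avg(C, O) - avg(C) - avg(O) + mu,
    # Three-way term: cell mean with every lower-order term removed
    "ABC": (avg(T, C, O) - avg(T, C) - avg(T, O) - avg(C, O)
            + avg(T) + avg(C) + avg(O) - mu),
    "Residual": fabric["Score"] - avg(T, C, O),
}

# Summing a term's squared per-row values gives that term's sum of squares
ss = {name: float((values ** 2).sum()) for name, values in effects.items()}
for name, value in ss.items():
    print(f"{name:>8}: {value:10.4f}")
```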

### Confidence Intervals

In [9]:
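# Residual variance (MSE) from the additive, main-effects-only fit below; the interaction
# variation stays in these residuals, so this MSE is larger than the full-model error mean square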
df2 = df.copy()
df2['Predicted'] = df2.apply(lambda row: overall_mean + alpha[row['Temperature']] + beta[row['CycleTime']] + gamma[row['Operator']], axis=1)
df2['Residual'] = df2['Score'] - df2['Predicted']
MSE = np.var(df2['Residual'], ddof=len(alpha) + len(beta) + len(gamma))

# Standard Errors
n = len(df2) / (len(alpha) * len(beta) * len(gamma))  
se_alpha = np.sqrt(MSE / (n * len(beta) * len(gamma)))
se_beta = np.sqrt(MSE / (n * len(alpha) * len(gamma)))
se_gamma = np.sqrt(MSE / (n * len(alpha) * len(beta)))
se_interaction = np.sqrt(MSE / n)

# 95% confidence level
significance_level = 0.05  
df_error = len(df2) - (len(alpha) + len(beta) + len(gamma) + 1)  # Degrees of freedom
t_crit = st.t.ppf(1 - significance_level / 2, df_error)  # t-value for CI

# CI for Main Effects
ci_alpha = {temp: (val - t_crit * se_alpha, val + t_crit * se_alpha) for temp, val in alpha.items()}
ci_beta = {cycle: (val - t_crit * se_beta, val + t_crit * se_beta) for cycle, val in beta.items()}
ci_gamma = {op: (val - t_crit * se_gamma, val + t_crit * se_gamma) for op, val in gamma.items()}

# CI for Interaction Effects
ci_interaction_alpha_beta = {key: (val - t_crit * se_interaction, val + t_crit * se_interaction) for key, val in interaction_alpha_beta.items()}
ci_interaction_alpha_gamma = {key: (val - t_crit * se_interaction, val + t_crit * se_interaction) for key, val in interaction_alpha_gamma.items()}
ci_interaction_beta_gamma = {key: (val - t_crit * se_interaction, val + t_crit * se_interaction) for key, val in interaction_beta_gamma.items()}
ci_interaction_alpha_beta_gamma = {key: (val - t_crit * se_interaction, val + t_crit * se_interaction) for key, val in interaction_alpha_beta_gamma.items()}

print("Confidence Intervals for Temperature")
for temp, ci in ci_alpha.items():
    print(f"Temperature {temp}: {ci}")

print("Confidence Intervals for Cycle Time")
for cycle, ci in ci_beta.items():
    print(f"Cycle Time {cycle}: {ci}")

print("Confidence Intervals for Operator")
for op, ci in ci_gamma.items():
    print(f"Operator {op}: {ci}")

print("Confidence Intervals for Interaction - Temperature * Cycle Time")
for key, ci in ci_interaction_alpha_beta.items():
    print(f"Temperature  {key[0]}, Cycle Time {key[1]}: {ci}")
    
print("Confidence Intervals for Interaction - Temperature * Operator")
for key, ci in ci_interaction_alpha_gamma.items():
    print(f"Temperature {key[0]}, Operator {key[1]}: {ci}")

print("Confidence Intervals for Interaction - Cycle Time * Operator")
for key, ci in ci_interaction_beta_gamma.items():
    print(f"Cycle Time  {key[0]}, Operator {key[1]}: {ci}")
    
print("Confidence Intervals for Interaction - Temperature * Cycle Time * Operator")
for key, ci in ci_interaction_alpha_beta_gamma.items():
    print(f"Temperature  {key[0]}, Cycle Time {key[1]}, Operator {key[2]}: {ci}")

Confidence Intervals for Temperature
Temperature 300: (-2.3743934998990572, 0.4484675739731263)
Temperature 350: (-0.44846757397312986, 2.3743934998990537)
Confidence Intervals for Cycle Time
Cycle Time 40: (-3.395308978104623, 0.06197564477128714)
Cycle Time 50: (2.271357688562045, 5.728642311437955)
Cycle Time 60: (-4.061975644771291, -0.6046910218953807)
Confidence Intervals for Operator
Operator 1: (-4.173086755882402, -0.7158021330064914)
Operator 2: (1.1602465774509307, 4.617531200326841)
Operator 3: (-2.1730867558824016, 1.2841978669935086)
Confidence Intervals for Interaction - Temperature * Cycle Time
Temperature  300, Cycle Time 40: (-5.937995314511977, 2.5305879071045725)
Temperature  300, Cycle Time 50: (-3.271328647845313, 5.197254573771237)
Temperature  300, Cycle Time 60: (-3.4935508700675273, 4.9750323515490225)
Temperature  350, Cycle Time 40: (-2.5305879071045654, 5.937995314511984)
Temperature  350, Cycle Time 50: (-5.197254573771237, 3.271328647845313)
Temperature  
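The intervals above have two limitations. They estimate $\sigma^2$ from the residuals of the additive (main-effects-only) fit, which leaves the interaction variation in the error term. They also use the standard error of a level mean rather than of an effect.

The sketch below is an alternative that assumes the full three-way model from the ANOVA table:

- $\sigma^2$ comes from the full-model error mean square ($SS_E = 118$ on 36 df).
- In a balanced design, each effect estimate has variance $\sigma^2 \cdot df_{\text{term}} / N$.

With these choices, the squared $t$ statistic for the temperature effect equals the ANOVA $F$ for Temperature. The Temperature interval then excludes 0, while the Operator 3 interval still contains it. The names `fabric` and `effect_ci` are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same 54 observations as in the Data cell
fabric = pd.DataFrame({
    "Temperature": [300, 300, 300, 350, 350, 350] * 9,
    "CycleTime": [40] * 18 + [50] * 18 + [60] * 18,
    "Operator": [1, 2, 3] * 18,
    "Score": [23, 27, 31, 24, 38, 34, 24, 28, 32, 23, 36, 36, 25, 26, 29, 28, 35, 39,
              36, 34, 33, 37, 34, 34, 35, 38, 34, 39, 38, 36, 36, 39, 35, 35, 36, 31,
              28, 35, 26, 26, 36, 28, 24, 35, 27, 29, 37, 26, 27, 34, 25, 25, 34, 24],
})
T, C, O = "Temperature", "CycleTime", "Operator"
N = len(fabric)                                        # 54 observations
a, b, c = (fabric[f].nunique() for f in (T, C, O))     # 2, 3, 3 levels

# sigma^2 from the full factorial model: deviations from the 18 cell means
cell_mean = fabric.groupby([T, C, O])["Score"].transform("mean")
dfe = N - a * b * c                                    # abc(n - 1) = 36
mse_full = float(((fabric["Score"] - cell_mean) ** 2).sum()) / dfe
t_crit = stats.t.ppf(0.975, dfe)

def effect_ci(estimate, term_df):
    """95% CI for one effect; in a balanced design Var(estimate) = sigma^2 * term_df / N."""
    half_width = t_crit * np.sqrt(mse_full * term_df / N)
    return estimate - half_width, estimate + half_width

mu = fabric["Score"].mean()
alpha_hat = fabric.groupby(T)["Score"].mean() - mu
beta_hat = fabric.groupby(C)["Score"].mean() - mu
gamma_hat = fabric.groupby(O)["Score"].mean() - mu
ci_temp = {lvl: effect_ci(e, a - 1) for lvl, e in alpha_hat.items()}
ci_cycle = {lvl: effect_ci(e, b - 1) for lvl, e in beta_hat.items()}
ci_oper = {lvl: effect_ci(e, c - 1) for lvl, e in gamma_hat.items()}

print(f"MSE = {mse_full:.6f} on {dfe} df, t = {t_crit:.4f}")
for label, cis in [("Temperature", ci_temp), ("Cycle Time", ci_cycle), ("Operator", ci_oper)]:
    for lvl, (lo, hi) in cis.items():
        print(f"{label} {lvl}: ({lo:.3f}, {hi:.3f})")
```

Interaction estimates can be passed to `effect_ci` in the same way. Use `term_df` equal to (a-1)(b-1), (a-1)(c-1), (b-1)(c-1), or (a-1)(b-1)(c-1).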

In [10]:
# Fit three-way ANOVA model
# Convert categorical variables to categorical type
df['Temperature'] = df['Temperature'].astype('category')
df['CycleTime'] = df['CycleTime'].astype('category')
df['Operator'] = df['Operator'].astype('category')

model = ols("Score ~ Temperature * CycleTime * Operator", data=df).fit()
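# The keyword is `typ`, not `type`: anova_lm ignores an unrecognized `type=3` and falls back
# to Type I, which is what the table below shows (it includes a mean_sq column).
# In this balanced design, Type I and Type II sums of squares coincide.
anova_table = sm.stats.anova_lm(model, typ=1)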
print(anova_table)

                                  df      sum_sq     mean_sq          F  \
Temperature                      1.0   50.074074   50.074074  15.276836   
CycleTime                        2.0  436.000000  218.000000  66.508475   
Operator                         2.0  261.333333  130.666667  39.864407   
Temperature:CycleTime            2.0   78.814815   39.407407  12.022599   
Temperature:Operator             2.0   11.259259    5.629630   1.717514   
CycleTime:Operator               4.0  355.666667   88.916667  27.127119   
Temperature:CycleTime:Operator   4.0   46.185185   11.546296   3.522599   
Residual                        36.0  118.000000    3.277778        NaN   

                                      PR(>F)  
Temperature                     3.933514e-04  
CycleTime                       8.141424e-13  
Operator                        7.438716e-10  
Temperature:CycleTime           1.001927e-04  
Temperature:Operator            1.938948e-01  
CycleTime:Operator              1.982473e-

In [11]:
significance_level = 0.05
p_A = anova_table.loc["Temperature", "PR(>F)"]
p_A < significance_level

True

In [12]:
p_B = anova_table.loc["CycleTime", "PR(>F)"]
p_B < significance_level

True

In [13]:
p_C = anova_table.loc["Operator", "PR(>F)"]
p_C < significance_level

True

In [14]:
p_AB = anova_table.loc["Temperature:CycleTime", "PR(>F)"]
p_AB < significance_level

True

In [15]:
p_AC = anova_table.loc["Temperature:Operator", "PR(>F)"]
p_AC < significance_level

False

In [16]:
p_BC = anova_table.loc["CycleTime:Operator", "PR(>F)"]
p_BC < significance_level

True

In [17]:
p_ABC = anova_table.loc["Temperature:CycleTime:Operator", "PR(>F)"]
p_ABC < significance_level

True

If a p-value is less than 0.05, we reject the corresponding null hypothesis and conclude that the effect is significant. For example, the cycle-time main effect $\beta_j$ is significant because p_B is less than 0.05. In contrast, the interaction between Temperature and Operator is NOT significant ($p \approx 0.19$). Every other term, including the three-way interaction, is significant at the 0.05 level. Because the three-way interaction is significant, the effect of each factor depends on the levels of the other two, so the main effects should be interpreted together with the interactions rather than on their own.

The confidence intervals give a range of plausible values for each parameter, that is, for the effects of Temperature, Cycle Time, Operator, and their interactions. A 95% interval that does not include 0 corresponds to rejecting $H_0$: parameter $= 0$ at the 0.05 level. If the interval contains 0, the effect could plausibly be zero, so that factor level might have no real impact. For example, the CI for Operator 3 is approximately (-2.17, 1.28). It includes 0, so the effect of Operator 3 is NOT significant.