# Module 7: ANOVA and Beyond
Analysis of Variance (ANOVA) tests whether the means of three or more independent groups differ significantly.

## 🎯 Learning Objectives
- Understand the purpose of ANOVA
- Perform One-Way and Two-Way ANOVA
- Interpret F-statistics and p-values
- Use post-hoc tests to investigate group differences

## 📊 What is ANOVA?
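ANOVA compares group means to test whether at least one of them differs from the others.

It does this by comparing the variance between groups (how far the group means lie from the overall mean) with the variance within groups (how much individual observations scatter around their own group mean). The ratio of the two is the F-statistic: a large F indicates that the group means are spread out more than within-group noise alone would explain.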

**Hypotheses:**
- Null Hypothesis ($H_0$): All group means are equal ($\mu_1 = \mu_2 = \dots = \mu_k$)
- Alternative Hypothesis ($H_1$): At least one group mean differs from the others
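
To make the between-group versus within-group comparison concrete, the sketch below computes the F-statistic by hand and checks it against `scipy.stats.f_oneway`. The data are hypothetical (three simulated groups with assumed means of 75, 80 and 70), and the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical data: three groups of 30 drawn with different true means
rng = np.random.default_rng(0)
groups = [rng.normal(loc, 10, 30) for loc in (75, 80, 70)]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares divide each sum of squares by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)  # upper-tail probability of F

# Cross-check against SciPy's implementation
f_scipy, p_scipy = f_oneway(*groups)
print(f"manual: F = {f_manual:.4f}, p = {p_manual:.6f}")
print(f"scipy:  F = {f_scipy:.4f}, p = {p_scipy:.6f}")
```

The two lines of output should agree, which shows that the F-statistic is the between-group mean square divided by the within-group mean square.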

## 🧪 One-Way ANOVA Example

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway

# Simulate scores for 3 teaching methods
np.random.seed(42)
method_A = np.random.normal(75, 10, 30)
method_B = np.random.normal(80, 10, 30)
method_C = np.random.normal(70, 10, 30)

# Perform One-Way ANOVA
f_stat, p_val = f_oneway(method_A, method_B, method_C)
print(f"F-statistic: {f_stat:.2f}, P-value: {p_val:.4f}")

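**Interpretation:** A low p-value (typically < 0.05) means the observed differences would be unlikely if all group means were equal, so we reject $H_0$. The test only says that at least one group mean differs; it does not say which one.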

## 📈 Visualizing Group Distributions

In [None]:
df = pd.DataFrame({
    'Score': np.concatenate([method_A, method_B, method_C]),
    'Method': ['A']*30 + ['B']*30 + ['C']*30
})
sns.boxplot(data=df, x='Method', y='Score')
plt.title('Scores by Teaching Method')
plt.show()

## 🔁 Two-Way ANOVA (with interaction)
Two-Way ANOVA models the response using two categorical independent variables (factors) and, optionally, the interaction between them.

Example: Does test score depend on both teaching method and instructor?

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

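# Build the full dataset: each method's 30 scores are split evenly between instructors X and Y.
# No instructor effect is simulated, so any Instructor or interaction signal here is noise.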
df = pd.DataFrame({
    'Score': np.concatenate([method_A, method_B, method_C]),
    'Method': ['A']*30 + ['B']*30 + ['C']*30,
    'Instructor': ['X']*15 + ['Y']*15 + ['X']*15 + ['Y']*15 + ['X']*15 + ['Y']*15
})

# Fit a linear model with both main effects and their interaction,
# then build a Type II ANOVA table from it
model = ols('Score ~ C(Method) + C(Instructor) + C(Method):C(Instructor)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
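
The interaction row of the ANOVA table asks whether the effect of one factor changes across the levels of the other. One informal way to see what that means is to look at the mean of each Method x Instructor cell. The sketch below does this on hypothetical data generated in the same way as the example above; `df_cells`, `cell_means` and `gap` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three methods with assumed means 75, 80, 70,
# each split evenly between instructors X and Y (no instructor effect simulated)
rng = np.random.default_rng(42)
scores = np.concatenate([rng.normal(m, 10, 30) for m in (75, 80, 70)])
df_cells = pd.DataFrame({
    'Score': scores,
    'Method': ['A'] * 30 + ['B'] * 30 + ['C'] * 30,
    'Instructor': (['X'] * 15 + ['Y'] * 15) * 3,
})

# Mean score for every Method x Instructor cell (rows: Method, columns: Instructor)
cell_means = df_cells.groupby(['Method', 'Instructor'])['Score'].mean().unstack()
print(cell_means.round(2))

# Instructor gap (Y minus X) within each method
gap = cell_means['Y'] - cell_means['X']
print(gap.round(2))
```

If the gap between instructors is roughly the same for every method, the factors act additively and the interaction term should be small. Gaps that differ strongly between methods are the pattern the interaction term is designed to detect, although with noisy data only the F-test can say whether the difference is more than chance.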

## 🔎 Post-hoc Testing
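If ANOVA is significant, post-hoc tests like Tukey's HSD can identify *which* groups differ. Tukey's HSD compares every pair of groups while controlling the family-wise error rate across all of those comparisons.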

In [None]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Compare every pair of methods; the default alpha=0.05 is the family-wise error rate
posthoc = pairwise_tukeyhsd(df['Score'], df['Method'])
print(posthoc)
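
The printed summary is convenient to read but awkward to filter or plot. As a rough sketch, the result object's `groupsunique`, `meandiffs`, `confint` and `reject` attributes can be collected into a DataFrame. The data below are hypothetical, and the `summary` table and its column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: three methods with assumed means 75, 80, 70
rng = np.random.default_rng(42)
scores = np.concatenate([rng.normal(m, 10, 30) for m in (75, 80, 70)])
labels = np.repeat(['A', 'B', 'C'], 30)

result = pairwise_tukeyhsd(scores, labels, alpha=0.05)

# Pairs are reported in upper-triangle order: (A, B), (A, C), (B, C)
groups = result.groupsunique
pairs = [(groups[i], groups[j])
         for i in range(len(groups))
         for j in range(i + 1, len(groups))]

summary = pd.DataFrame({
    'group1': [p[0] for p in pairs],
    'group2': [p[1] for p in pairs],
    'meandiff': result.meandiffs,      # difference in group means for each pair
    'lower': result.confint[:, 0],     # lower bound of the simultaneous interval
    'upper': result.confint[:, 1],     # upper bound of the simultaneous interval
    'reject': result.reject,           # True where the pair differs at alpha
})
print(summary)
```

A pair is flagged in `reject` exactly when its confidence interval excludes zero, so the table can be filtered with `summary[summary['reject']]` to list only the pairs that differ.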

## ✅ Practice Exercises
1. Simulate data for four groups and perform a One-Way ANOVA.
2. Visualize distributions with boxplots and interpret spread.
3. Add a second factor and run a Two-Way ANOVA.
4. Use post-hoc tests to identify which group means differ.
5. Reflect: What does the interaction term tell you in Two-Way ANOVA?