# Chapter 11: Analysis of Variance (ANOVA)

Learning Goals:
1. Understand the concept and applications of Analysis of Variance (ANOVA).
2. Learn about different types of ANOVA designs.
3. Explore the assumptions underlying ANOVA and how to assess them.
4. Learn how to perform ANOVA in Python and interpret the results.

## Introduction
Analysis of Variance (ANOVA) is a statistical technique used to compare the means of two or more groups or conditions. It tests whether the variability between group means is larger than would be expected from the variability within groups.

### Applications of ANOVA
ANOVA is commonly used in various scenarios, including:
- Comparing means across multiple treatment groups.
- Investigating the effects of one or more categorical factors on a continuous response variable.
- Assessing interactions between variables.

### One-Way ANOVA
The one-way ANOVA compares the means of two or more independent groups or conditions defined by a single categorical factor.

$H_0: \mu_1 = \mu_2 = \dots = \mu_k$, where $\mu_i$ is the population mean of group $i$ and $k$ is the number of groups

$H_A: \text{At least one } \mu_i \text{ is different}$


$F = \frac{MS_{groups}}{MS_{error}}$

### Calculate the F statistic

#### Partition the sum of squares

$SS_{total} = SS_{error} + SS_{groups}$

$n_i$ = sample size of group $i$, and $\bar{y}_i$ = mean of group $i$

N = total number of observations

Grand mean:
$\bar{Y} = \frac{\sum_i n_i \bar{y}_i}{N}$

where

$N = \sum_i n_i$

$SS_{groups} = \sum_i n_i(\bar{y}_i - \bar{Y})^2$

$SS_{error} = \sum_i s_i^2 (n_i - 1)$, where $s_i^2$ is the sample variance of group $i$
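The formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and reusing the three example groups from this chapter's one-way ANOVA example; variable names such as `ss_groups` are illustrative. It computes $SS_{groups}$ and $SS_{error}$ from the formulas, computes $SS_{total}$ directly from all observations, and confirms the partition $SS_{total} = SS_{error} + SS_{groups}$.

```python
import numpy as np

# Example data: the three groups used in this chapter's one-way ANOVA example
groups = [np.array([5, 6, 8, 7, 9]),
          np.array([4, 7, 6, 9, 8]),
          np.array([2, 4, 6, 5, 7])]

n = np.array([len(g) for g in groups])                 # n_i, size of each group
means = np.array([g.mean() for g in groups])           # group means ybar_i
variances = np.array([g.var(ddof=1) for g in groups])  # sample variances s_i^2

N = n.sum()                                            # total number of observations
grand_mean = (n * means).sum() / N                     # Ybar = sum(n_i * ybar_i) / N

ss_groups = (n * (means - grand_mean) ** 2).sum()      # between-group sum of squares
ss_error = (variances * (n - 1)).sum()                 # within-group sum of squares

# SS_total computed directly from every observation, to check the partition
all_obs = np.concatenate(groups)
ss_total = ((all_obs - grand_mean) ** 2).sum()

print(f"grand mean = {grand_mean:.2f}")
print(f"SS_groups = {ss_groups:.2f}, SS_error = {ss_error:.2f}, SS_total = {ss_total:.2f}")
print(np.isclose(ss_total, ss_groups + ss_error))
# grand mean = 6.20
# SS_groups = 14.80, SS_error = 39.60, SS_total = 54.40
# True
```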

#### Calculate mean squares

k = number of groups

$MS_{groups} = \frac{SS_{groups}}{df_{groups}}$

where

$df_{groups} = k - 1$

$MS_{error} = \frac{SS_{error}}{df_{error}}$

where

$df_{error} = N - k$
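Because the formulas only need $n_i$, $\bar{y}_i$, and $s_i^2$, the F statistic can be computed from summary statistics alone. The sketch below assumes SciPy's F distribution (`scipy.stats.f`, imported here as `f_dist`) for the upper-tail p-value; the summary values are those of the three example groups used in this chapter, and the result should match the `f_oneway` output shown later.

```python
from scipy.stats import f as f_dist

# Summary statistics for the three example groups used in this chapter:
# sample sizes n_i, group means, and sample variances s_i^2 (ddof = 1)
n = [5, 5, 5]
means = [7.0, 6.8, 4.8]
variances = [2.5, 3.7, 3.7]

k = len(n)                     # number of groups
N = sum(n)                     # total number of observations
grand_mean = sum(ni * m for ni, m in zip(n, means)) / N

ss_groups = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))
ss_error = sum(v * (ni - 1) for ni, v in zip(n, variances))

df_groups = k - 1              # degrees of freedom for groups
df_error = N - k               # degrees of freedom for error
ms_groups = ss_groups / df_groups
ms_error = ss_error / df_error

F = ms_groups / ms_error
p_value = f_dist.sf(F, df_groups, df_error)  # P(F_{k-1, N-k} >= observed F)

print(f"MS_groups = {ms_groups:.2f}, MS_error = {ms_error:.2f}")
print(f"F({df_groups}, {df_error}) = {F:.2f}, p = {p_value:.2f}")
# MS_groups = 7.40, MS_error = 3.30
# F(2, 12) = 2.24, p = 0.15
```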

### ANOVA table

| Source of variation | Sum of squares | df | Mean squares | F | P-value |
|---------------------|----------------|----|--------------|---|---------|
| Groups | SS<sub>groups</sub> | k - 1 | MS<sub>groups</sub> | F = MS<sub>groups</sub> / MS<sub>error</sub> | P(F<sub>k-1, N-k</sub> ≥ F) |
| Error | SS<sub>error</sub> | N - k | MS<sub>error</sub> | | |
| Total | SS<sub>total</sub> | N - 1 = df<sub>groups</sub> + df<sub>error</sub> | | | |
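The table can also be assembled programmatically. The following is a minimal sketch, assuming NumPy, pandas, and SciPy; `one_way_anova_table` is a hypothetical helper written for illustration, not a library function. It returns a DataFrame with the same rows and columns as the table above.

```python
import numpy as np
import pandas as pd
from scipy.stats import f as f_dist


def one_way_anova_table(groups):
    """Hypothetical helper: build a one-way ANOVA table as a DataFrame."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                  # number of groups
    N = sum(len(g) for g in groups)                  # total observations
    grand_mean = np.concatenate(groups).mean()

    ss_groups = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_groups, df_error = k - 1, N - k
    ms_groups, ms_error = ss_groups / df_groups, ss_error / df_error
    F = ms_groups / ms_error

    return pd.DataFrame(
        {
            "Sum of squares": [ss_groups, ss_error, ss_groups + ss_error],
            "df": [df_groups, df_error, df_groups + df_error],
            "Mean squares": [ms_groups, ms_error, np.nan],
            "F": [F, np.nan, np.nan],
            "P-value": [f_dist.sf(F, df_groups, df_error), np.nan, np.nan],
        },
        index=pd.Index(["Groups", "Error", "Total"], name="Source of variation"),
    )


table = one_way_anova_table([[5, 6, 8, 7, 9], [4, 7, 6, 9, 8], [2, 4, 6, 5, 7]])
print(table.round(3))
```

Empty cells in the printed table appear as `NaN`, mirroring the blank cells in the layout above.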


### Two-Way ANOVA
The two-way ANOVA examines the effects of two categorical independent variables (factors) on a continuous response, including the interaction between them.

### Assumptions of ANOVA
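Assumption of Independence
- Observations are independent within and between groups. This is ensured by the study design (for example, random sampling or random assignment) rather than checked with a test afterwards.

Assumption of Normality: the observations within each group (equivalently, the residuals) are approximately normally distributed.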
- Visual Assessment: Q-Q plots, histograms.
- Statistical Tests: Shapiro-Wilk test, Anderson-Darling test.
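A common way to apply the Shapiro-Wilk test in ANOVA is to pool the residuals (each observation minus its group mean) instead of testing each small group separately. This is a minimal sketch, assuming NumPy and `scipy.stats.shapiro`, applied to the example groups from this chapter's one-way ANOVA example.

```python
import numpy as np
from scipy.stats import shapiro

# Example groups from this chapter's one-way ANOVA example
groups = [np.array([5, 6, 8, 7, 9]),
          np.array([4, 7, 6, 9, 8]),
          np.array([2, 4, 6, 5, 7])]

# Residuals: each observation minus its own group mean, pooled across groups
residuals = np.concatenate([g - g.mean() for g in groups])

stat, p = shapiro(residuals)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
# A small p-value (e.g. < 0.05) is evidence against normality of the residuals;
# a large p-value does not prove normality, especially with small samples.
```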

Assumption of Homogeneity of Variances: all groups share the same population variance.
- Visual Assessment: Box plots, plots of residuals versus fitted values (group means).
- Statistical Test: Levene's test.
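Levene's test is itself a one-way ANOVA: it computes the F statistic on the absolute deviations of each observation from its group's center. The sketch below, assuming NumPy and SciPy, runs `scipy.stats.levene` (whose default `center='median'` is the Brown-Forsythe variant) on the chapter's example groups and reproduces the result with `f_oneway` on absolute deviations from the group medians.

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Example groups from this chapter's one-way ANOVA example
groups = [np.array([5, 6, 8, 7, 9]),
          np.array([4, 7, 6, 9, 8]),
          np.array([2, 4, 6, 5, 7])]

# Levene's test as implemented in SciPy (default center='median')
stat, p = levene(*groups)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")

# The same statistic: one-way ANOVA on |observation - group median|
abs_dev = [np.abs(g - np.median(g)) for g in groups]
F_dev, p_dev = f_oneway(*abs_dev)
print(np.isclose(stat, F_dev), np.isclose(p, p_dev))
```

A small p-value suggests the group variances differ; in that case a variant that does not assume equal variances is usually considered.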

### Performing ANOVA in Python

```python
from scipy.stats import f_oneway

# Example data (replace with your own data)
group1 = [5, 6, 8, 7, 9]
group2 = [4, 7, 6, 9, 8]
group3 = [2, 4, 6, 5, 7]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(group1, group2, group3)
print("One-Way ANOVA Results:")
print("F-statistic:", round(f_statistic, 2))
print("p-value:", round(p_value, 2))
```

```text
One-Way ANOVA Results:
F-statistic: 2.24
p-value: 0.15
```


```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data (replace with your own data)
data = {'A': ['A1', 'A2', 'A1', 'A2', 'A1', 'A2', 'A1', 'A2'],
        'B': ['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B2', 'B2'],
        'Value': [5, 7, 4, 6, 8, 10, 9, 11]}

# Create DataFrame
df = pd.DataFrame(data)

# Perform two-way ANOVA; A:B is the interaction term
# (the formula 'Value ~ A * B' expands to the same model)
model = ols('Value ~ A + B + A:B', data=df).fit()
anova_table = round(sm.stats.anova_lm(model), 3)
print("Two-Way ANOVA Results:")
print(anova_table)
```

```text
Two-Way ANOVA Results:
           df  sum_sq  mean_sq      F  PR(>F)
A         1.0     8.0      8.0  0.941   0.387
B         1.0     0.0      0.0  0.000   1.000
A:B       1.0     0.0      0.0  0.000   1.000
Residual  4.0    34.0      8.5    NaN     NaN
```
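The zero sums of squares for `B` and `A:B` can be understood from the cell means. This sketch, assuming pandas and reusing the same example data, tabulates the mean of `Value` for each combination of `A` and `B`; `diff_b1` and `diff_b2` are illustrative names for the A2 minus A1 difference at each level of B.

```python
import pandas as pd

# Same example data as the two-way ANOVA above
data = {'A': ['A1', 'A2', 'A1', 'A2', 'A1', 'A2', 'A1', 'A2'],
        'B': ['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B2', 'B2'],
        'Value': [5, 7, 4, 6, 8, 10, 9, 11]}
df = pd.DataFrame(data)

# Mean of Value in each cell (combination of factor levels)
cell_means = df.pivot_table(values='Value', index='A', columns='B', aggfunc='mean')
print(cell_means)

# Interaction check: does the A2 - A1 difference change between levels of B?
diff_b1 = cell_means.loc['A2', 'B1'] - cell_means.loc['A1', 'B1']
diff_b2 = cell_means.loc['A2', 'B2'] - cell_means.loc['A1', 'B2']
print("A2 - A1 at B1:", diff_b1, "| at B2:", diff_b2)
```

The cell means are 6.5 for A1 and 8.5 for A2 at both levels of B. Because the B1 and B2 columns have equal means, SS for `B` is zero; because the A2 minus A1 difference (2.0) is the same at B1 and B2, there is no interaction. The difference between A1 and A2 is real in these data but small relative to the residual variation, which is why `A` is not significant.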


### Interpreting ANOVA Results

One-Way ANOVA
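- Compare the p-value with a significance level (α) chosen before the analysis, e.g., 0.05.
- If the p-value is below α, reject the null hypothesis and conclude that at least one group mean differs from the others. A significant F test does not by itself show which groups differ.
- In the one-way example above, p = 0.15 > 0.05, so the null hypothesis of equal means is not rejected.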

Two-Way ANOVA
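- Examine the main effect of each factor (rows `A` and `B` in the statsmodels table).
- Assess the interaction effect between the factors (row `A:B`). If the interaction is significant, interpret the main effects with caution, because the effect of one factor then depends on the level of the other.
- Use the p-values (column `PR(>F)`) to decide which effects are significant at the chosen α. In the two-way example above, no effect is significant (the smallest p-value is 0.387).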

## End of chapter question

Using the Palmer Penguins data set, compare the weights of female penguins of all three species using a one-way ANOVA.