# ANOVA - Analysis of Variance
We will continue our discussion of linear models. Today we will learn ANOVA, a generalized way of comparing means across multiple groups.
Agenda today:
- Compare t-tests and ANOVA
- Learn how to calculate ANOVA and its details
- Implement ANOVA in Python


## Part I. T Test or ANOVA?
Suppose we want to test whether multiple groups differ on some measure. For example, we have collected mood data grouped by four types of weather (sunny, rainy, overcast, or snowy), and we want to find out whether mood differs across weather types. What test would you use?

A natural reaction would be to conduct multiple t-tests. However, that comes with drawbacks. With $k$ groups you would need $\frac{k(k-1)}{2}$ pairwise t-tests, which comes out to 6 tests for our 4 groups. More tests mean a higher chance of making at least one type I error. If each test uses a 5% significance level and the tests are independent, the probability of at least one false positive grows from 5% to $1 - 0.95^6 \approx 26.5\%$ (and can never exceed $6 \times 5\% = 30\%$). By conducting 6 tests and comparing the group means pairwise, we run a large risk of false positives. How, then, can we combat this? ANOVA!
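The inflation of the type I error rate can be checked numerically. The sketch below assumes the pairwise tests are independent (they are not exactly independent in practice, since they share data), so the result is an approximation. Note that the product of the number of tests and $\alpha$ is only an upper bound on the familywise error rate, not the rate itself.

```python
from math import comb

alpha = 0.05   # per-test significance level
k = 4          # number of groups (the four weather types)

# Number of pairwise t-tests: k(k-1)/2
m = comb(k, 2)

# Probability of at least one false positive across m tests,
# assuming the tests are independent
fwer = 1 - (1 - alpha) ** m

# Simple upper bound on the familywise error rate
upper_bound = min(1.0, m * alpha)

print(f"pairwise tests: {m}")
print(f"familywise error rate (independent tests): {fwer:.3f}")
print(f"upper bound (m * alpha): {upper_bound:.2f}")
```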

Instead of looking at each individual difference, ANOVA examines the ratio of the variance between groups to the variance within groups, and determines whether that ratio is large enough to be statistically significant.

#### T Test statistics 
## $$t = \frac{\bar x - \mu}{\frac{s}{\sqrt n}}$$

#### ANOVA - the F test
## $$F = \frac{MS_{between}}{MS_{within}}$$

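We can also say that the t-test is a special case of ANOVA in which we compare the means of only two groups. For an independent two-sample t-test with pooled variance, $F = t^2$ and the two tests give the same p-value.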
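The link between the two tests can be illustrated with a small sketch. The mood scores below are made up purely for illustration, and the comparison assumes the pooled-variance (equal variance) form of the two-sample t-test, which is what `scipy.stats.ttest_ind` uses by default.

```python
import numpy as np
from scipy import stats

# Made-up mood scores for two weather groups (illustrative only)
sunny = np.array([7.1, 6.8, 8.0, 7.5, 6.9, 7.7])
rainy = np.array([5.9, 6.2, 5.5, 6.8, 6.0, 5.7])

# Independent two-sample t-test with pooled variance
t_stat, t_p = stats.ttest_ind(sunny, rainy, equal_var=True)

# One-way ANOVA on the same two groups
f_stat, f_p = stats.f_oneway(sunny, rainy)

# With two groups, F equals t squared and the p-values coincide
print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")
print(f"t-test p = {t_p:.6f}, ANOVA p = {f_p:.6f}")
```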

Degrees of freedom for ANOVA, where $k$ is the number of groups and $N$ is the total number of observations:
- DFbetween = k - 1
- DFwithin = N - k
- DFtotal = N - 1
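The degrees of freedom pick out the F distribution that the test statistic is compared against. As a rough sketch, assuming a hypothetical study with 10 observations in each of the four weather groups (so $N = 40$), the critical value at the 5% level could be looked up with `scipy.stats.f`:

```python
from scipy import stats

k = 4         # number of groups (four weather types)
N = 40        # total observations (hypothetical: 10 per group)
alpha = 0.05  # significance level

df_between = k - 1
df_within = N - k
df_total = N - 1

# Critical value: reject the null hypothesis if the observed F exceeds it
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)

print(f"DF between = {df_between}, DF within = {df_within}, DF total = {df_total}")
print(f"critical F at alpha = {alpha}: {f_crit:.3f}")
```

Note that the between and within degrees of freedom always add up to the total.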

## Part II. Calculating ANOVA 
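In this section, we will learn how to calculate ANOVA without using any packages. Here $\bar X$ is the grand mean, $\bar X_i$, $s_i^2$ and $n_i$ are the mean, sample variance and size of group $i$, and $X_{ij}$ is the $j$-th observation in group $i$. All we need to calculate is: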

- $SS_b$ = $\sum_i n_i (\bar X_i - \bar X)^2$

- $SS_w$ = $\sum_i (n_i - 1) s_i^2$

- $SS_t$ = $\sum_i \sum_j (X_{ij} - \bar X)^2$

- $MS_b$ = $\frac{SS_b}{DF_b}$

- $MS_w$ = $\frac{SS_w}{DF_w}$

- $F$ = $\frac{MS_b}{MS_w}$
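The quantities above can be computed step by step, as in the sketch below. The mood scores are made up for illustration, and NumPy is used only for basic arithmetic. `scipy.stats.f_oneway` serves as a cross-check on the result rather than as part of the calculation.

```python
import numpy as np
from scipy import stats

# Made-up mood scores per weather type (illustrative only)
groups = {
    "sunny": np.array([7.1, 6.8, 8.0, 7.5, 6.9]),
    "rainy": np.array([5.9, 6.2, 5.5, 6.8, 6.0]),
    "overcast": np.array([6.4, 6.1, 6.9, 5.8, 6.6]),
    "snowy": np.array([6.7, 7.2, 6.0, 6.5, 7.0]),
}
samples = list(groups.values())
all_values = np.concatenate(samples)

k = len(samples)          # number of groups
N = all_values.size       # total number of observations
grand_mean = all_values.mean()

# Sums of squares: between groups, within groups, and total
ss_b = sum(g.size * (g.mean() - grand_mean) ** 2 for g in samples)
ss_w = sum((g.size - 1) * g.var(ddof=1) for g in samples)
ss_t = ((all_values - grand_mean) ** 2).sum()

# Mean squares and the F statistic
ms_b = ss_b / (k - 1)
ms_w = ss_w / (N - k)
f_manual = ms_b / ms_w

# p-value: upper tail of the F distribution with (k - 1, N - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, N - k)

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*samples)

print(f"SS_b + SS_w = {ss_b + ss_w:.4f}, SS_t = {ss_t:.4f}")
print(f"manual: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

A useful sanity check is that $SS_b + SS_w = SS_t$, so any two of the three sums of squares determine the third.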

import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt