# One-way ANOVA Test

$F = \dfrac{MSS_B}{MSS_W}$

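$MSS_W = \dfrac{ \sum_{g \in G} \sum_{i=1}^{n_g} ( X_{gi} - \bar{X}_{g} )^2 } {n - k}$

$n$ is the total number of observations across all groups

$k$ is the total number of groups

$MSS_B = \dfrac{\sum_{g \in G} n_g ( \bar{X}_g - \bar{X}_G )^2 } {k - 1}$

where $n_g$ is the number of observations in group $g$, $\bar{X}_g$ is the mean of group $g$, and $\bar{X}_G$ is the grand mean of all observations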

$df_B = k - 1$

$df_W = n - k$
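
The formulas above translate almost line for line into NumPy. The following is a minimal sketch rather than a reference implementation: the helper name `one_way_anova` and the toy groups are made up for illustration and are not the data set used in this notebook.

```python
import numpy as np

def one_way_anova(groups):
    """Hypothetical helper: return (F, df_B, df_W) for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares: n_g * (group mean - grand mean)^2
    ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_b, df_w = k - 1, n - k
    return (ss_b / df_b) / (ss_w / df_w), df_b, df_w

# Toy data (illustrative only): group means are 2, 3 and 6
F_toy, df_b_toy, df_w_toy = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(F_toy, df_b_toy, df_w_toy)
```

For the toy groups, $MSS_B = 26 / 2 = 13$ and $MSS_W = 6 / 6 = 1$, so $F = 13$ with $df_B = 2$ and $df_W = 6$.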

In [34]:
# Import the required Python packages
import pandas as pd
import numpy as np
from scipy.stats import f

In [24]:
# Observations per group; the first group has only 10 values, so it is padded with NaN
# Squared deviations use group means rounded to two decimals (7.4, 5.15, 3.23)
df = pd.DataFrame({
    "4PM to Midnight":  [7, 7, 6, 9, 8, 7, 6, 7, 8, 9, np.nan, np.nan, np.nan],
    "Midnight to 8AM":  [5, 6, 3, 5, 4, 6, 5, 4, 5, 5, 6, 7, 6],
    "8AM to 4PM":       [1, 3, 4, 3, 1, 1, 2, 6, 5, 4, 3, 4, 5],
    "(X_1 - \bar{X}_1)^2":  (np.array([7, 7, 6, 9, 8, 7, 6, 7, 8, 9, np.nan, np.nan, np.nan]) - 7.4) ** 2,
    "(X_2 - \bar{X}_2)^2":  (np.array([5, 6, 3, 5, 4, 6, 5, 4, 5, 5, 6, 7, 6]) - 5.15) ** 2,
    "(X_3 - \bar{X}_3)^2":  (np.array([1, 3, 4, 3, 1, 1, 2, 6, 5, 4, 3, 4, 5]) - 3.23) ** 2,
})
df

Unnamed: 0,4PM to Midnight,Midnight to 8AM,8AM to 4PM,(X_1 - ar{X}_1)^2,(X_2 - ar{X}_2)^2,(X_3 - ar{X}_3)^2
0,7.0,5,1,0.16,0.0225,4.9729
1,7.0,6,3,0.16,0.7225,0.0529
2,6.0,3,4,1.96,4.6225,0.5929
3,9.0,5,3,2.56,0.0225,0.0529
4,8.0,4,1,0.36,1.3225,4.9729
5,7.0,6,1,0.16,0.7225,4.9729
6,6.0,5,2,1.96,0.0225,1.5129
7,7.0,4,6,0.16,1.3225,7.6729
8,8.0,5,5,0.36,0.0225,3.1329
9,9.0,5,4,2.56,0.0225,0.5929


In [17]:
# Per-group counts and means, plus the overall count, grand mean and number of groups.
# Raw strings keep "\bar" literal instead of turning "\b" into a backspace.
df_summary = pd.DataFrame({
    "n_g": [df["4PM to Midnight"].count(), df["Midnight to 8AM"].count(), df["8AM to 4PM"].count()],
    "n_G": [df["4PM to Midnight"].count() + df["Midnight to 8AM"].count() + df["8AM to 4PM"].count(), np.nan, np.nan],
    r"\bar{X}_g": [df["4PM to Midnight"].mean(), df["Midnight to 8AM"].mean(), df["8AM to 4PM"].mean()],
    r"\bar{X}_G": [
        (df["4PM to Midnight"].sum() + df["Midnight to 8AM"].sum() + df["8AM to 4PM"].sum()) /
        (df["4PM to Midnight"].count() + df["Midnight to 8AM"].count() + df["8AM to 4PM"].count()), np.nan, np.nan],
    "k": [3, np.nan, np.nan],
}).transpose()

df_summary

Unnamed: 0,0,1,2
n_g,10.0,13.0,13.0
n_G,36.0,,
\bar{X}_g,7.4,5.153846,3.230769
\bar{X}_G,5.083333,,
k,3.0,,


# Hypothesis

$H_0: \mu_1 = \mu_2 = \mu_3$ (the population means of all three groups are equal)

$H_1$: at least one $\mu_g$ differs from the others

$\alpha = 0.05$

In [30]:
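# Within-group mean square: total of the squared deviations divided by df_W = n - k.
# The keys contain "\x08" (backspace) because "\b" was written in non-raw strings when df was built.
MSS_W = ( df["(X_1 - \x08ar{X}_1)^2"].sum() + df["(X_2 - \x08ar{X}_2)^2"].sum() + df["(X_3 - \x08ar{X}_3)^2"].sum() ) / (36 - 3)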
MSS_W

np.float64(1.7090969696969696)
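
The cell above relies on precomputed squared deviations from rounded group means. As a cross-check, the within-group sum of squares can also be obtained from the per-group sample variances, since $\sum_i (X_{gi} - \bar{X}_g)^2 = (n_g - 1)\, s_g^2$. The sketch below assumes that identity and rebuilds the three observation columns under a new, illustrative name (`shifts`); it is not part of the original calculation.

```python
import numpy as np
import pandas as pd

# Same observations as in df above; `shifts` is an illustrative name
shifts = pd.DataFrame({
    "4PM to Midnight": [7, 7, 6, 9, 8, 7, 6, 7, 8, 9, np.nan, np.nan, np.nan],
    "Midnight to 8AM": [5, 6, 3, 5, 4, 6, 5, 4, 5, 5, 6, 7, 6],
    "8AM to 4PM":      [1, 3, 4, 3, 1, 1, 2, 6, 5, 4, 3, 4, 5],
})

n_g = shifts.count()            # observations per group, NaN excluded
s2_g = shifts.var(ddof=1)       # unbiased sample variance per group, NaN skipped
ss_w = ((n_g - 1) * s2_g).sum() # within-group sum of squares
mss_w_exact = ss_w / (n_g.sum() - shifts.shape[1])  # divide by n - k
print(mss_w_exact)
```

With unrounded means the within-group sum of squares is 56.4, so the result is $56.4 / 33 \approx 1.7091$, which agrees with the value above to four decimal places.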

In [31]:
# Between-group mean square, using group means and grand mean rounded to two decimals
MSS_B = ( 10 * (7.40 - 5.08) ** 2
       + 13 * (5.15 - 5.08) ** 2
       + 13 * (3.23 - 5.08) ** 2 ) / (3 - 1)
MSS_B

49.19010000000001
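
Because the cell above plugs in means rounded to two decimals, the result carries a small rounding error. A minimal sketch of the same calculation with unrounded means follows; the dictionary `samples` and the variable names are illustrative and not taken from the notebook.

```python
import numpy as np

# Observations per group without the NaN padding (illustrative container)
samples = {
    "4PM to Midnight": np.array([7, 7, 6, 9, 8, 7, 6, 7, 8, 9], dtype=float),
    "Midnight to 8AM": np.array([5, 6, 3, 5, 4, 6, 5, 4, 5, 5, 6, 7, 6], dtype=float),
    "8AM to 4PM":      np.array([1, 3, 4, 3, 1, 1, 2, 6, 5, 4, 3, 4, 5], dtype=float),
}

grand_mean = np.concatenate(list(samples.values())).mean()

# n_g * (group mean - grand mean)^2, summed over groups, divided by k - 1
ss_b = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in samples.values())
mss_b_exact = ss_b / (len(samples) - 1)
print(mss_b_exact)
```

The unrounded value is close to 49.175, slightly below the 49.19 obtained with rounded means. The difference is too small to affect the conclusion.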

In [32]:
F = MSS_B / MSS_W
F

np.float64(28.781339427874375)

In [36]:
# Critical value of the F distribution at alpha = 0.05 with df_B = 2 and df_W = 33
f.ppf(1-0.05, 3 - 1, 36 - 3)

np.float64(3.2849176510382883)
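
Comparing the statistic with a critical value is equivalent to comparing a p-value with $\alpha$. The sketch below computes the p-value with the survival function `f.sf` of `scipy.stats.f`; the variable names are illustrative, and the F value is copied from the output above.

```python
from scipy.stats import f

F_stat = 28.781339427874375   # F statistic from the cell above
df_b, df_w = 3 - 1, 36 - 3    # degrees of freedom between and within groups
alpha = 0.05

# P(F_{df_b, df_w} >= F_stat): upper-tail probability under H_0
p_value = f.sf(F_stat, df_b, df_w)
print(p_value, p_value < alpha)
```

A p-value below $\alpha$ leads to the same decision as an F statistic above the critical value.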

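$F \approx 28.78$ exceeds the critical value $F_{0.05,\,2,\,33} \approx 3.28$.

So we reject $H_0$ at $\alpha = 0.05$: at least one group mean differs from the others.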
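
As a sanity check on the manual calculation, SciPy's `scipy.stats.f_oneway` runs the same one-way ANOVA directly on the raw observations. This is a sketch with illustrative variable names (`evening`, `night`, `day`); the NaN padding is dropped because each group is passed as its own list.

```python
from scipy.stats import f_oneway

evening = [7, 7, 6, 9, 8, 7, 6, 7, 8, 9]           # 4PM to Midnight
night   = [5, 6, 3, 5, 4, 6, 5, 4, 5, 5, 6, 7, 6]  # Midnight to 8AM
day     = [1, 3, 4, 3, 1, 1, 2, 6, 5, 4, 3, 4, 5]  # 8AM to 4PM

result = f_oneway(evening, night, day)
print(result.statistic, result.pvalue)
```

The statistic should come out near 28.77 rather than 28.78, because the manual calculation used group means rounded to two decimals. The decision to reject $H_0$ is unchanged.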