# 🔍 ANOVA - Analysis of Variance

* **One-Way ANOVA** is often simply referred to as **ANOVA.**

* One-Way ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups to determine whether at least one group mean differs significantly from the others.

* It is used when the independent variable (the factor) is categorical and the dependent variable is continuous.

* For example, we may want to compare the average test scores of students taught using different teaching methods (the factor here is the teaching method, and the test score is the continuous outcome).

* ANOVA examines the variability between the groups (how different the group means are) and compares it to the variability within each group (how much data points vary within each group).

* It calculates an F-statistic (the ratio of between-group variability to within-group variability), which tells us whether the differences between the group means are large enough to be statistically significant.
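For reference, the hypotheses and the F-statistic can be written out in standard one-way ANOVA notation (this notation is not from the original notes): $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, group means $\bar{x}_j$ and grand mean $\bar{x}$.

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_1: \text{at least one } \mu_j \text{ differs}$$

$$SS_{\text{between}} = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{x}\right)^2, \qquad SS_{\text{within}} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2$$

$$F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}$$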

Data:
```
Diet A: 5, 6, 7, 5, 6
Diet B: 8, 9, 6, 7, 8
Diet C: 2, 3, 4, 3, 2
```
Objective:

Test if the weight loss means for the three diets are significantly different.

In [1]:
import scipy.stats as stats

# Weight loss data for each diet
diet_A = [5, 6, 7, 5, 6]
diet_B = [8, 9, 6, 7, 8]
diet_C = [2, 3, 4, 3, 2]

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Conclusion based on p-value
if p_value < 0.05:  # Usually, 0.05 is used as the significance level
    print("There is a significant difference between the diets.")
else:
    print("No significant difference between the diets.")


F-statistic: 32.66666666666667
P-value: 1.3960053904813178e-05
There is a significant difference between the diets.


* We use `stats.f_oneway()` to perform the One-Way ANOVA. It returns the F-statistic and the p-value.
* The p-value is the probability of getting an F-statistic at least this large if all group means were actually equal (the null hypothesis). It helps us decide whether to reject that null hypothesis.
* If the p-value is less than 0.05, we reject the null hypothesis and conclude that at least one group mean differs from the others.
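The p-value from `f_oneway()` is the upper-tail probability of an F distribution with k - 1 and N - k degrees of freedom, evaluated at the observed F-statistic. Here is a minimal sketch that recomputes it for the diet data with `scipy.stats.f.sf` (the survival function). The variable names are illustrative.

```python
import scipy.stats as stats

# Weight loss data from the diet example above
diet_A = [5, 6, 7, 5, 6]
diet_B = [8, 9, 6, 7, 8]
diet_C = [2, 3, 4, 3, 2]
groups = [diet_A, diet_B, diet_C]

f_stat, p_value = stats.f_oneway(*groups)

# Degrees of freedom: k - 1 between groups, N - k within groups
k = len(groups)                          # 3 groups
n_total = sum(len(g) for g in groups)    # 15 observations
df_between = k - 1                       # 2
df_within = n_total - k                  # 12

# p-value = area in the upper tail of F(df_between, df_within) beyond the observed F
p_from_f_dist = stats.f.sf(f_stat, df_between, df_within)

print(f"p-value from f_oneway:       {p_value:.6e}")
print(f"p-value from F distribution: {p_from_f_dist:.6e}")
```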

### F-statistic
* The F-statistic tells us whether the differences between group averages are bigger than we’d expect from the variation within the groups alone.
* In the diet example the F-statistic is about 33. That is large: the differences between the diets are far bigger than the spread within each diet, so the diets very likely lead to different amounts of weight loss.
* If the F-statistic were close to 1, the between-group differences would be about what within-group noise alone produces, and the diets would likely have a similar effect.
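To make the F-statistic concrete, this sketch recomputes it by hand for the diet data using the standard one-way ANOVA sums of squares. It uses NumPy only, and the variable names are illustrative. The result should match the `stats.f_oneway()` value of about 32.67.

```python
import numpy as np

# Diet data from the example above
groups = [np.array([5, 6, 7, 5, 6]), np.array([8, 9, 6, 7, 8]), np.array([2, 3, 4, 3, 2])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group variation: how far each group mean lies from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: how far each value lies from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

print(f"SS_between={ss_between:.2f}, SS_within={ss_within:.2f}, F={f_manual:.4f}")
# SS_between=58.80, SS_within=10.80, F=32.6667
```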

Scenario:
You have three groups of students, each taught using a different method (Method 1, Method 2, and Method 3). After the course, you measure their test scores.
```
Method 1 scores: 85, 90, 88, 92, 85
Method 2 scores: 78, 82, 84, 80, 81
Method 3 scores: 91, 95, 94, 92, 90
```
Goal:
We want to see if the teaching methods lead to different average test scores, or if the differences in the scores happened by chance.

In [2]:
import scipy.stats as stats

# Test scores for each method
method_1 = [85, 90, 88, 92, 85]
method_2 = [78, 82, 84, 80, 81]
method_3 = [91, 95, 94, 92, 90]

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(method_1, method_2, method_3)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Conclusion based on p-value
if p_value < 0.05:  # Assuming 0.05 is our significance level
    print("There is a significant difference between the teaching methods.")
else:
    print("No significant difference between the teaching methods.")


F-statistic: 26.372340425531938
P-value: 4.0538055202862745e-05
There is a significant difference between the teaching methods.
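A significant ANOVA result says that at least one method's mean differs, but not which one. A quick first step is to compare the group means directly. The post hoc test later in these notes makes this comparison formal.

```python
import numpy as np

scores = {
    "Method 1": [85, 90, 88, 92, 85],
    "Method 2": [78, 82, 84, 80, 81],
    "Method 3": [91, 95, 94, 92, 90],
}

# Mean test score per teaching method, printed from highest to lowest
group_means = {name: float(np.mean(values)) for name, values in scores.items()}
for name, mean in sorted(group_means.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {mean:.1f}")
# Method 3: 92.4
# Method 1: 88.0
# Method 2: 81.0
```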


Example:

A firm wishes to compare 4 programmes for training workers to perform a certain task. 20 new employees are randomly assigned to these programmes, with 5 in each. At the end of the training, a test is conducted to see how quickly the trainees perform that task, and the number of times the task is performed per hour is recorded for each trainee. Perform ANOVA to check whether the programmes differ in effectiveness (assume a significance level of 0.05).

In [3]:
import scipy.stats as stats

# Number of times the task is performed per hour by the trainees in each program
program_A = [8, 9, 7, 8, 9]
program_B = [6, 7, 6, 7, 6]
program_C = [10, 11, 10, 12, 11]
program_D = [5, 5, 6, 4, 5]

# Perform One-Way ANOVA
f_stat, p_value = stats.f_oneway(program_A, program_B, program_C, program_D)

# Print the F-statistic and p-value
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Conclusion based on p-value
if p_value < 0.05:  # Significance level of 0.05
    print("There is a significant difference between the training programs.")
else:
    print("There is no significant difference between the training programs.")


F-statistic: 56.96969696969689
P-value: 9.2528147867333e-09
There is a significant difference between the training programs.
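Because the exercise fixes the significance level at 0.05, the decision can also be made the textbook way. Compare the F-statistic with the critical value of the F distribution with k - 1 = 3 and N - k = 16 degrees of freedom. Here is a minimal sketch using `scipy.stats.f.ppf`.

```python
import scipy.stats as stats

program_A = [8, 9, 7, 8, 9]
program_B = [6, 7, 6, 7, 6]
program_C = [10, 11, 10, 12, 11]
program_D = [5, 5, 6, 4, 5]
groups = [program_A, program_B, program_C, program_D]

f_stat, p_value = stats.f_oneway(*groups)

alpha = 0.05
k = len(groups)                          # 4 programmes
n_total = sum(len(g) for g in groups)    # 20 trainees
# Critical value: the F value that leaves probability alpha in the upper tail
f_crit = stats.f.ppf(1 - alpha, k - 1, n_total - k)

print(f"F-statistic: {f_stat:.2f}, critical value F(3, 16): {f_crit:.2f}")  # 56.97 vs about 3.24
if f_stat > f_crit:
    print("F exceeds the critical value: reject H0, the programmes differ.")
else:
    print("F does not exceed the critical value: do not reject H0.")
```

Both routes always agree: F exceeds the critical value exactly when the p-value is below alpha.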


In [9]:
import seaborn as sns
import scipy.stats as stats

# Load the Seaborn dataset
penguins = sns.load_dataset('penguins')

penguins.head()

| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |


In [10]:
# Drop rows with missing values in the 'bill_length_mm' and 'species' columns
penguins = penguins.dropna(subset=['bill_length_mm', 'species'])

In [11]:
# Extract bill length for each species
species_A = penguins[penguins['species'] == 'Adelie']['bill_length_mm']
species_B = penguins[penguins['species'] == 'Chinstrap']['bill_length_mm']
species_C = penguins[penguins['species'] == 'Gentoo']['bill_length_mm']
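One boolean mask per species works, but it gets repetitive as the number of groups grows. A common alternative is to let `groupby()` split the values. For the penguins data that would be `[v.to_numpy() for _, v in penguins.groupby('species')['bill_length_mm']]`. The sketch below applies the same pattern to the diet data from the first example, so it runs without downloading the dataset.

```python
import pandas as pd
import scipy.stats as stats

# Diet data from the first example, stored in long format (one row per observation)
df = pd.DataFrame({
    "diet": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "weight_loss": [5, 6, 7, 5, 6, 8, 9, 6, 7, 8, 2, 3, 4, 3, 2],
})

# One array per group (in sorted label order), without writing a mask for each level
samples = [values.to_numpy() for _, values in df.groupby("diet")["weight_loss"]]

f_stat, p_value = stats.f_oneway(*samples)
print(f"{len(samples)} groups, F = {f_stat:.4f}")  # 3 groups, F = 32.6667
```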


In [12]:
# Perform One-Way ANOVA
f_stat, p_value = stats.f_oneway(species_A, species_B, species_C)

# Print the F-statistic and p-value
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")



F-statistic: 410.6002550405077
P-value: 2.6946137388895484e-91


In [13]:
# Conclusion based on p-value
if p_value < 0.05:  # Significance level of 0.05
    print("There is a significant difference in bill length between species.")
else:
    print("There is no significant difference in bill length between species.")

There is a significant difference in bill length between species.


## Post Hoc Test
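* A post hoc test is run after a significant ANOVA result. ANOVA only tells us that at least one group mean differs; the post hoc test identifies which specific groups are significantly different from each other.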

* Example:

1) A pairwise comparison is a comparison between two separate groups (e.g., a comparison between the "Male" and "Female" groups).

2) Common post hoc tests check every possible pairwise comparison. For example, with the groups Male, Female, and Prefer not to say, they compare Male & Female, Male & Prefer not to say, and Female & Prefer not to say.
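The number of pairwise comparisons grows quickly with the number of groups. Running each comparison as a separate test at 0.05 also inflates the chance of at least one false positive, which is the problem procedures like Tukey's HSD correct for. The small sketch below uses the three example groups above. The error-rate formula treats the tests as independent, which is a simplifying assumption.

```python
from itertools import combinations

levels = ["Male", "Female", "Prefer not to say"]
pairs = list(combinations(levels, 2))
for a, b in pairs:
    print(f"{a} vs {b}")

# Rough chance of at least one false positive if every pair is tested separately at alpha = 0.05,
# assuming (for simplicity) that the tests are independent
alpha = 0.05
familywise_error = 1 - (1 - alpha) ** len(pairs)
print(f"{len(pairs)} comparisons -> family-wise error rate of about {familywise_error:.3f}")  # about 0.143
```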

In [15]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Perform Tukey's HSD test
tukey_result = pairwise_tukeyhsd(endog=penguins['bill_length_mm'], groups=penguins['species'], alpha=0.05)

# Print the result
print(tukey_result)

   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
  group1    group2  meandiff p-adj   lower  upper  reject
---------------------------------------------------------
   Adelie Chinstrap  10.0424    0.0  9.0249  11.06   True
   Adelie    Gentoo   8.7135    0.0  7.8672 9.5598   True
Chinstrap    Gentoo  -1.3289 0.0089 -2.3819 -0.276   True
---------------------------------------------------------


* The `pairwise_tukeyhsd()` function compares the mean `bill_length_mm` of every pair of penguin species (Adelie, Chinstrap, Gentoo). It keeps the family-wise error rate at `alpha=0.05`, shown as FWER=0.05 in the output header.

**Explanation:**

* For each pair of species, `meandiff` is the mean bill length of group2 minus that of group1, `p-adj` is the Tukey-adjusted p-value, and `lower`/`upper` bound the simultaneous 95% confidence interval for that difference.
* The `reject` column indicates whether the null hypothesis (that the two means are equal) is rejected (True) or not (False).
* In this example, every pair is True, so bill length differs significantly between every pair of species.
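Besides the printed table, the result object returned by `pairwise_tukeyhsd()` exposes its columns as arrays, which is useful when the decision feeds into further code. The minimal sketch below uses the diet data from the first example because it runs without downloading the penguins dataset. `groupsunique`, `meandiffs` and `reject` are attributes of statsmodels' `TukeyHSDResults`. The pair ordering assumed below, (0, 1), (0, 2), (1, 2), follows the rows of the summary table.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Diet data in long format: one value per observation plus its group label
weight_loss = np.array([5, 6, 7, 5, 6, 8, 9, 6, 7, 8, 2, 3, 4, 3, 2])
diet = np.repeat(["A", "B", "C"], 5)

result = pairwise_tukeyhsd(endog=weight_loss, groups=diet, alpha=0.05)

# Rebuild the (group1, group2) pairs in the same order as the summary rows
labels = result.groupsunique
pair_labels = [(labels[i], labels[j]) for i in range(len(labels)) for j in range(i + 1, len(labels))]

for (g1, g2), diff, rejected in zip(pair_labels, result.meandiffs, result.reject):
    print(f"{g1} vs {g2}: mean({g2}) - mean({g1}) = {diff:.2f}, reject H0: {rejected}")
```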

In [17]:
# Import the required libraries
import pandas as pd
from statsmodels.stats.multicomp import MultiComparison

# Number of tasks performed by each of the 5 trainees in the 4 programmes (one column per programme)
df_programme = pd.DataFrame({
    'programme_1': [9, 12, 14, 11, 13],
    'programme_2': [10, 6, 9, 9, 10],
    'programme_3': [12, 14, 11, 13, 11],
    'programme_4': [9, 8, 11, 7, 8]
})

# Stack the data and reset the index
stacked_data = df_programme.stack().reset_index()

# Rename the columns
stacked_data = stacked_data.rename(columns={'level_0': 'id', 'level_1': 'programme', 0: 'number_of_tasks'})
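`stack()` works, but `DataFrame.melt()` is a common alternative that produces the long format with named columns in one step. It omits the `id` column, and its rows are ordered programme by programme rather than trainee by trainee. A sketch:

```python
import pandas as pd

df_programme = pd.DataFrame({
    'programme_1': [9, 12, 14, 11, 13],
    'programme_2': [10, 6, 9, 9, 10],
    'programme_3': [12, 14, 11, 13, 11],
    'programme_4': [9, 8, 11, 7, 8]
})

# melt() turns each column into rows: one row per (programme, value) pair
long_data = df_programme.melt(var_name='programme', value_name='number_of_tasks')

print(long_data.shape)  # (20, 2)
programme_means = long_data.groupby('programme')['number_of_tasks'].mean()
print(programme_means)
```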


In [20]:
stacked_data.head()

| | id | programme | number_of_tasks |
|---|---|---|---|
| 0 | 0 | programme_1 | 9 |
| 1 | 0 | programme_2 | 10 |
| 2 | 0 | programme_3 | 12 |
| 3 | 0 | programme_4 | 9 |
| 4 | 1 | programme_1 | 12 |
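The notes go straight to the post hoc test for these programmes. Tukey's HSD is usually run after a one-way ANOVA on the same data has found a significant difference, so here is a minimal sketch of that preceding step on the same `df_programme` values.

```python
import pandas as pd
import scipy.stats as stats

df_programme = pd.DataFrame({
    'programme_1': [9, 12, 14, 11, 13],
    'programme_2': [10, 6, 9, 9, 10],
    'programme_3': [12, 14, 11, 13, 11],
    'programme_4': [9, 8, 11, 7, 8]
})

# One-way ANOVA across the four programme columns
f_stat, p_value = stats.f_oneway(*(df_programme[col] for col in df_programme.columns))

print(f"F-statistic: {f_stat:.2f}")  # about 7.04
print(f"P-value: {p_value:.4f}")
if p_value < 0.05:
    print("At least one programme differs; a post hoc test can show which ones.")
else:
    print("No significant difference between the programmes.")
```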


In [18]:
# Set up the data for comparison: the values and their group labels
multi_comp = MultiComparison(stacked_data['number_of_tasks'], stacked_data['programme'])

# Run Tukey's HSD post hoc test and print the summary table
print(multi_comp.tukeyhsd().summary())

     Multiple Comparison of Means - Tukey HSD, FWER=0.05      
   group1      group2   meandiff p-adj   lower   upper  reject
--------------------------------------------------------------
programme_1 programme_2     -3.0 0.0428 -5.9177 -0.0823   True
programme_1 programme_3      0.4 0.9788 -2.5177  3.3177  False
programme_1 programme_4     -3.2 0.0292 -6.1177 -0.2823   True
programme_2 programme_3      3.4 0.0197  0.4823  6.3177   True
programme_2 programme_4     -0.2 0.9972 -3.1177  2.7177  False
programme_3 programme_4     -3.6 0.0133 -6.5177 -0.6823   True
--------------------------------------------------------------


In [22]:
# Import the required libraries
import numpy as np
from scipy.stats import jarque_bera

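# Simulate 6,000 diameters as the sum of two independent normal samples (the sum is itself normal)
# No random seed is set, so the statistic and p-value will differ on every run
diameter_in_m = np.random.normal(loc=4, scale=0.7, size=6000) + np.random.normal(loc=5, scale=0.1, size=6000)

# Conduct the Jarque-Bera normality test (null hypothesis: the data are normally distributed)
stat, p = jarque_bera(diameter_in_m)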

# Print the result
print("The test statistic is", stat, "and its p-value is", p)


The test statistic is 0.20635326233093226 and its p-value is 0.9019676438144987
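The Jarque-Bera test checks whether a sample's skewness and kurtosis match those of a normal distribution. Its null hypothesis is normality, so the large p-value above means there is no evidence against normality. It does not prove the data are normal. The result is expected, because the sum of two independent normal variables is itself normal, here with mean 4 + 5 = 9 and variance 0.7² + 0.1² = 0.5. The minimal sketch below makes the run reproducible with a seeded generator. It also adds an exponential sample, chosen only as a clearly non-normal contrast.

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(42)  # fixed seed so every run gives the same numbers

# Same construction as above: sum of two independent normal samples
normal_sample = rng.normal(loc=4, scale=0.7, size=6000) + rng.normal(loc=5, scale=0.1, size=6000)
# Strongly right-skewed data, used only as a non-normal contrast
skewed_sample = rng.exponential(scale=1.0, size=6000)

p_values = {}
for name, sample in [("sum of normals", normal_sample), ("exponential", skewed_sample)]:
    stat, p = jarque_bera(sample)
    p_values[name] = p
    verdict = "reject normality" if p < 0.05 else "no evidence against normality"
    print(f"{name}: JB = {stat:.2f}, p = {p:.4g} -> {verdict}")
```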


## Chi Square Test

* The Chi-Square test checks if there is a relationship between two categorical variables or if the actual (observed) data matches what we expect (expected data).

* Goodness of Fit Test: Used to see if your observed data fits an expected distribution (e.g., checking if a die is fair).

* Test of Independence: Used to find out if two variables are related (e.g., if gender and snack preference are connected).

* **Question:** If you roll a fair die 60 times, you expect each number to appear about 10 times. If the observed counts differ from that, the Chi-Square goodness-of-fit test checks whether the differences are larger than chance alone would explain, i.e., whether there is evidence that the die is not fair.

In [23]:
from scipy.stats import chisquare

# Observed counts for faces 1 to 6 (60 rolls in total)
observed = [9, 11, 10, 8, 12, 10]

# Expected rolls for a fair die (equal chances for each side)
expected = [10, 10, 10, 10, 10, 10]


**Explanation:**
```
Observed results: [9, 11, 10, 8, 12, 10]

This means:
Side 1 came up 9 times
Side 2 came up 11 times
Side 3 came up 10 times
Side 4 came up 8 times
Side 5 came up 12 times
Side 6 came up 10 times
```

In [24]:
# Perform the Chi-Square test
chi_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# Print the results
print("Chi-Square Statistic:", chi_stat)
print("P-Value:", p_value)


Chi-Square Statistic: 1.0
P-Value: 0.9625657732472964
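The statistic can be reproduced by hand from $\chi^2 = \sum (O - E)^2 / E$. The p-value is the upper-tail area of a chi-square distribution with (number of categories - 1) = 5 degrees of freedom. A sketch:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([9, 11, 10, 8, 12, 10])
expected = np.full(6, 10.0)

# Sum of squared deviations, each scaled by its expected count
chi_stat = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1
p_value = chi2.sf(chi_stat, dof)

print(f"Chi-Square Statistic: {chi_stat:.4f}")  # 1.0000
print(f"Degrees of freedom: {dof}")              # 5
print(f"P-Value: {p_value:.4f}")                 # 0.9626
```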


In [25]:
# Interpret the result
if p_value < 0.05:
    print("The die is not fair.")
else:
    print("No evidence that the die is unfair.")

No evidence that the die is unfair.
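The notes above also describe the test of independence, but it is not worked through in code. Here is a minimal sketch with `scipy.stats.chi2_contingency`. It uses a hypothetical 2 x 3 table of gender by snack preference whose counts were invented purely for illustration and are not real survey results.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts (invented for illustration): rows = gender, columns = snack preference
#                    Chips  Fruit  Chocolate
observed = np.array([[20,   10,    15],    # Male
                     [10,   25,    20]])   # Female

chi_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"Chi-Square Statistic: {chi_stat:.3f}")
print(f"Degrees of freedom: {dof}")  # (2 - 1) * (3 - 1) = 2
print(f"P-Value: {p_value:.4f}")
print("Expected counts if gender and snack preference were independent:")
print(expected.round(2))

if p_value < 0.05:
    print("Gender and snack preference appear to be related.")
else:
    print("No evidence that gender and snack preference are related.")
```

The expected counts keep the observed row and column totals; the test measures how far the observed table departs from them.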
