
# Notebook 9: ANOVA (Analysis of Variance)
## 1. Introduction

ANOVA is used when you want to compare the means of 3 or more groups.

Instead of running multiple pairwise t-tests (each extra test raises the chance of at least one false positive), ANOVA uses a single test for an overall difference.
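To make the point about error concrete, here is a minimal sketch of how the chance of at least one false positive grows with the number of pairwise t-tests. It assumes each test is run at α = 0.05 and, as a simplification, that the tests are independent (pairwise tests that share groups are not strictly independent, so treat the numbers as approximate). The function name `familywise_error_rate` is made up for this illustration.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one false positive over all pairwise tests.

    Assumes the pairwise tests are independent, which is a simplification.
    """
    n_tests = comb(n_groups, 2)  # number of distinct group pairs
    return 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    rate = familywise_error_rate(k)
    print(f"{k} groups -> {comb(k, 2)} t-tests, error rate ~ {rate:.3f}")
```

Under these assumptions, three groups already push the overall error rate to roughly 0.14 instead of 0.05.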

Example: Comparing exam scores across 3 different teaching methods.

## 2. Theory

**Null Hypothesis (H₀):** All group means are equal.

**Alternative Hypothesis (H₁):** At least one group mean is different.

**Key idea:**

ANOVA looks at the ratio of between-group variance to within-group variance.

Test statistic = F-value.

An F close to 1 is what you would expect if H₀ is true. The larger F is, the stronger the evidence that the group means differ.

**Formula (conceptual):**

$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$
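The conceptual ratio can be spelled out with standard one-way ANOVA notation. The symbols below are introduced here for illustration and are not used elsewhere in this notebook: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, group means $\bar{x}_i$, and grand mean $\bar{x}$.

$$
F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
  = \frac{\left. \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 \right/ (k - 1)}
         {\left. \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \right/ (N - k)}
$$

Under H₀, and assuming independent, normally distributed observations with equal group variances, this statistic follows an F distribution with $k - 1$ and $N - k$ degrees of freedom.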

## 3. Example in Python

In [None]:
from scipy import stats

# Example: exam scores from 3 teaching methods
method_A = [85, 89, 88, 75, 90]
method_B = [78, 81, 74, 77, 80]
method_C = [92, 95, 88, 91, 94]

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(method_A, method_B, method_C)

# Print the results
print("F-statistic:", f_stat)
print("P-value:", p_value)

F-statistic: 14.06883365200765
P-value: 0.0007141257666666541


The results of the ANOVA test indicate whether there is a statistically significant difference between the means of the groups you are comparing.

*   **F-statistic:** This is the test statistic, the ratio of between-group variance to within-group variance. A value near 1 is what you would expect if the group means were equal. In your case, F ≈ 14.07 means the variation between the groups is about 14 times the variation within them.
*   **P-value:** This is the probability of observing an F-statistic at least as large as the one calculated, assuming the null hypothesis (all group means are equal) is true. A small p-value (typically less than 0.05) suggests that you should reject the null hypothesis. Your p-value of 0.000714 is very small, indicating strong evidence against the null hypothesis.

**Conclusion:** With a p-value of 0.000714 (much less than 0.05), you can reject the null hypothesis. At least one teaching method has a mean exam score that differs significantly from the others; the test by itself does not say which one.
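The F-statistic reported for the teaching-method data can be reproduced by hand, which shows what the "between" and "within" parts of the ratio actually measure. This is a sketch using only the standard library; the variable names (`ss_between`, `ms_within`, and so on) are chosen for this illustration.

```python
from statistics import mean

groups = {
    "A": [85, 89, 88, 75, 90],
    "B": [78, 81, 74, 77, 80],
    "C": [92, 95, 88, 91, 94],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)
k = len(groups)            # number of groups
n_total = len(all_scores)  # total number of observations

ss_between = 0.0  # spread of the group means around the grand mean
ss_within = 0.0   # spread of the scores around their own group mean
for scores in groups.values():
    group_mean = mean(scores)
    ss_between += len(scores) * (group_mean - grand_mean) ** 2
    ss_within += sum((x - group_mean) ** 2 for x in scores)

ms_between = ss_between / (k - 1)       # between-group variance
ms_within = ss_within / (n_total - k)   # within-group variance
f_manual = ms_between / ms_within

print("F (by hand):", round(f_manual, 4))
```

The result, about 14.0688, agrees with the value returned by `stats.f_oneway`. The p-value then comes from the F distribution with 2 and 12 degrees of freedom.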

## 4. Practice Problems

1. Create datasets for 4 different diets and compare weight loss.
2. Test if the average daily screen time differs between 3 age groups (teen, adult, senior).

Use real or simulated data and try `stats.f_oneway()`.

In [None]:
import numpy as np

# Simulate weight loss data for 4 diets
# Assuming some differences in means and a bit of variation
np.random.seed(42) # for reproducibility

diet_A_loss = np.random.normal(loc=5, scale=2, size=20) # Average 5 lbs loss
diet_B_loss = np.random.normal(loc=7, scale=2.5, size=20) # Average 7 lbs loss
diet_C_loss = np.random.normal(loc=6, scale=2, size=20) # Average 6 lbs loss
diet_D_loss = np.random.normal(loc=9, scale=3, size=20) # Average 9 lbs loss

print("Simulated Weight Loss Data:")
print("Diet A:", diet_A_loss)
print("Diet B:", diet_B_loss)
print("Diet C:", diet_C_loss)
print("Diet D:", diet_D_loss)

Simulated Weight Loss Data:
Diet A: [5.99342831 4.7234714  6.29537708 8.04605971 4.53169325 4.53172609
 8.15842563 6.53486946 4.06105123 6.08512009 4.07316461 4.06854049
 5.48392454 1.17343951 1.55016433 3.87542494 2.97433776 5.62849467
 3.18395185 2.1753926 ]
Diet B: [10.66412192  6.43555925  7.16882051  3.43812953  5.63904319  7.27730647
  4.12251606  7.93924505  5.49840328  6.27076563  5.49573347 11.63069546
  6.96625694  4.35572268  9.05636228  3.94789088  7.52215899  2.10082469
  3.67953488  7.49215309]
Diet C: [7.47693316 6.34273656 5.76870344 5.39779261 3.04295602 4.56031158
 5.07872246 8.11424445 6.68723658 2.47391969 6.64816794 5.22983544
 4.646156   7.22335258 8.06199904 7.86256024 4.32156495 5.38157525
 6.66252686 7.95109025]
Diet D: [ 7.56247729  8.44302307  5.68099508  5.41138013 11.43757747 13.06872009
  8.78396964 12.01059869 10.08490808  7.06464074 10.08418682 13.6141097
  8.89252188 13.69393097  1.14076469 11.46570751  9.2611412   8.10297795
  9.27528233  3.03729326]


In [None]:
from scipy import stats

# Perform one-way ANOVA on the diet data
f_stat_diet, p_value_diet = stats.f_oneway(diet_A_loss, diet_B_loss, diet_C_loss, diet_D_loss)

# Print the results
print("ANOVA results for Diet and Weight Loss:")
print("F-statistic:", f_stat_diet)
print("P-value:", p_value_diet)

ANOVA results for Diet and Weight Loss:
F-statistic: 10.85242391762891
P-value: 5.1693218189125e-06
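The diet comparison above gives p ≈ 5.2e-06, far below 0.05, so H₀ is rejected for the simulated diets. The screen-time practice problem can be approached the same way. The sketch below is one possible setup, not a reference solution: the group means, the common standard deviation, and the sample size of 30 are all made-up values, and it uses NumPy's `default_rng` generator rather than `np.random.seed`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded for reproducibility

# Hypothetical daily screen time in hours; means and spread are invented
teen = rng.normal(loc=7.0, scale=1.5, size=30)
adult = rng.normal(loc=5.5, scale=1.5, size=30)
senior = rng.normal(loc=4.0, scale=1.5, size=30)

# One-way ANOVA across the three age groups
f_stat_screen, p_value_screen = stats.f_oneway(teen, adult, senior)

print("ANOVA results for Age Group and Screen Time:")
print("F-statistic:", f_stat_screen)
print("P-value:", p_value_screen)
```

Try shrinking the gaps between the `loc` values or lowering `size` to see how the F-statistic and p-value respond.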


## 5. Conclusion

ANOVA tests whether the means of three or more groups differ significantly.

If p-value < 0.05 → reject H₀ → at least one group mean differs from the others.

If p-value ≥ 0.05 → fail to reject H₀ → not enough evidence of a difference (this does not prove the means are equal).
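The decision rule can be written as a small helper. This is only a sketch: the function name `anova_decision` is invented here, and the 0.05 threshold is the conventional default used throughout this notebook rather than a fixed requirement.

```python
def anova_decision(p_value, alpha=0.05):
    """Return the decision about H0 at significance level alpha."""
    if p_value < alpha:
        return "reject H0: at least one group mean differs"
    return "fail to reject H0: not enough evidence of a difference"


# p-values reported earlier in this notebook
print("Teaching methods:", anova_decision(0.0007141257666666541))
print("Diets:", anova_decision(5.1693218189125e-06))
```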

ANOVA is used in medicine, education, A/B testing, and other fields where several groups are compared at once.