# 🧪 4.4 Statistical Testing

This notebook introduces statistical hypothesis testing for comparing groups in nutrition data.

**Objectives**:
- Perform t-tests and ANOVA.
- Interpret p-values and effect sizes.
- Apply tests to `vitamin_trial.csv`.

**Context**: Statistical tests assess whether an observed difference, such as vitamin D levels between trial groups, is larger than chance alone would plausibly produce.

<details><summary>Fun Fact</summary>
Statistical tests are like a hippo weighing its snacks: measuring differences matters! 🦛
</details>

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, f_oneway  # t-test and one-way ANOVA

print('Environment ready.')

## Data Preparation

Load `vitamin_trial.csv` and split by `Group`.

In [None]:
# Load the dataset by name via the course toolkit
df = fns.get_dataset('vitamin_trial')

# Split by group
control = df[df['Group'] == 'Control']['Vitamin_D']
treatment = df[df['Group'] == 'Treatment']['Vitamin_D']
print(f'Control mean: {round(control.mean(), 1)}, Treatment mean: {round(treatment.mean(), 1)}')

Control mean: 10.2, Treatment mean: 15.3


## T-Test

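Perform an independent two-sample t-test to compare mean Vitamin D levels between the Control and Treatment groups. Setting `equal_var=True` gives Student's t-test, which assumes both groups have the same variance; `equal_var=False` gives Welch's t-test, which does not.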

In [3]:
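# Perform an independent two-sample t-test (Student's, equal variances assumed).
# Treatment is passed first so that a positive t means the treatment mean is higher.
t_stat, p_value = ttest_ind(treatment, control, equal_var=True)
print(f'T-test: t={round(t_stat, 1)}, p-value={p_value:.1e}')  # Display results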

T-test: t=12.5, p-value=1.3e-20
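
A very small p-value says the difference is unlikely to be due to chance alone, but it does not say how large the difference is. An effect size such as Cohen's d fills that gap by expressing the mean difference in units of the pooled standard deviation. The sketch below is one common way to compute it; the helper `cohens_d` and the toy values are illustrative assumptions, not part of `fns_toolkit` or `vitamin_trial.csv`.

```python
import numpy as np
from scipy.stats import ttest_ind


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Made-up illustrative values, NOT taken from vitamin_trial.csv
toy_control = [9.8, 10.5, 10.1, 11.0, 9.6, 10.4]
toy_treatment = [14.9, 15.6, 15.2, 16.1, 14.7, 15.5]

t_stat, p_value = ttest_ind(toy_treatment, toy_control, equal_var=True)
d = cohens_d(toy_treatment, toy_control)
print(f't={t_stat:.2f}, p={p_value:.2e}, d={d:.2f}')
```

To try it on the trial data, call `cohens_d(treatment, control)` with the two series created in Data Preparation. A widely used rule of thumb treats d around 0.2 as small, 0.5 as medium, and 0.8 as large, though what counts as meaningful depends on the nutritional context.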


## Exercise 1: ANOVA Test

Perform an ANOVA test to compare `Vitamin_D` across `Outcome` groups (Normal, Improved). Interpret the results in a Markdown cell.

**Guidance**: Use `f_oneway()` with groups split by `df['Outcome']`.
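
As a hint for the calling pattern only, here is a minimal sketch on a made-up DataFrame. The values are invented for illustration and are not from `vitamin_trial.csv`; the column names simply mirror the ones used in this notebook.

```python
import pandas as pd
from scipy.stats import f_oneway, ttest_ind

# Made-up illustrative data, NOT the real trial dataset
toy = pd.DataFrame({
    'Outcome': ['Normal', 'Normal', 'Normal', 'Improved', 'Improved', 'Improved'],
    'Vitamin_D': [10.1, 9.7, 10.6, 15.2, 14.8, 15.9],
})

# One array of measurements per outcome group
groups = [g['Vitamin_D'].to_numpy() for _, g in toy.groupby('Outcome')]

# f_oneway takes each group as a separate positional argument
f_stat, p_value = f_oneway(*groups)
print(f'ANOVA: F={f_stat:.1f}, p-value={p_value:.1e}')

# With exactly two groups, one-way ANOVA matches the equal-variance t-test (F = t squared)
t_stat, _ = ttest_ind(groups[0], groups[1], equal_var=True)
print(f't squared = {t_stat ** 2:.1f}')
```

The same `groupby` pattern extends to three or more groups, which is where ANOVA is needed rather than a t-test.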

**Answer**:

My ANOVA code and interpretation is...

## Conclusion

You’ve learned to apply statistical tests to compare nutrition data groups.

**Next Steps**: Explore regression modelling in 4.5.

**Resources**:
- [SciPy Stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
- [Statistical Testing Guide](https://www.datacamp.com/community/tutorials/statistics-python-tutorial)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)