# 🧪 Statistical Testing with Hippos

In this session, we’ll go beyond describing and plotting the data: we’ll test whether differences between groups are **statistically significant**.

**Objectives:**
- Understand what hypothesis testing means
- Run a t-test and an ANOVA
- Use chi-squared tests for categorical data
- Fit a simple linear regression model
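
To make the first objective concrete, here is a minimal sketch of the logic behind a hypothesis test, written as a permutation test. The numbers are invented for illustration (they are not the hippo data), and `permutation_p_value` is a hypothetical helper, not a library function. The idea: if group membership did not matter, shuffling the labels should often produce a difference as large as the observed one.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, i.e. reassign group labels at random
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Share of shuffles at least as extreme as the real data
    return extreme / n_permutations

# Hypothetical weights in kg, invented for illustration only
group_a = [1500, 1620, 1580, 1710, 1650, 1590]
group_b = [1380, 1420, 1350, 1460, 1400, 1440]

p_perm = permutation_p_value(group_a, group_b)
print(f"Permutation p-value: {p_perm:.4f}")
```

A small p-value here means that random relabelling rarely reproduces the observed gap. The tests below answer the same kind of question using theoretical distributions instead of shuffling.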

## 📥 Load Data and Stats Tools

In [None]:
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv('https://raw.githubusercontent.com/ggkuhnle/data-analysis/main/data/hippos_cleaned.csv')  # change this path if you use a local copy
df.head()

## 📏 T-test – Weight by Sex

In [None]:
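# Split the weights into one Series per sex
male = df.loc[df['Sex'] == 'Male', 'Weight_kg']
female = df.loc[df['Sex'] == 'Female', 'Weight_kg']

# Independent two-sample t-test (SciPy's default assumes equal variances in both groups)
t_stat, p_value = stats.ttest_ind(male, female)
print(f"T-statistic: {t_stat:.3f}, P-value: {p_value:.3f}")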
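
The default t-test above pools the variance of both groups. If the groups might differ in spread, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below uses synthetic data with made-up means and standard deviations, not the hippo file, and also computes Cohen's d as a rough effect size; all variable names ending in `_demo` are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for two weight groups (parameters are assumptions)
male_demo = rng.normal(loc=1600, scale=150, size=40)
female_demo = rng.normal(loc=1400, scale=80, size=35)

# Student's t-test pools the variances; Welch's t-test does not
t_student, p_student = stats.ttest_ind(male_demo, female_demo)
t_welch, p_welch = stats.ttest_ind(male_demo, female_demo, equal_var=False)

# Cohen's d: difference in means divided by the pooled standard deviation
n_m, n_f = len(male_demo), len(female_demo)
pooled_sd = np.sqrt(
    ((n_m - 1) * male_demo.var(ddof=1) + (n_f - 1) * female_demo.var(ddof=1))
    / (n_m + n_f - 2)
)
cohens_d = (male_demo.mean() - female_demo.mean()) / pooled_sd

print(f"Student: t={t_student:.3f}, p={p_student:.4f}")
print(f"Welch:   t={t_welch:.3f}, p={p_welch:.4f}")
print(f"Cohen's d: {cohens_d:.2f}")
```

Reporting an effect size alongside the p-value helps separate "is there a difference?" from "how big is it?".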

## 📚 ANOVA – Height by Habitat

In [None]:
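# One-way ANOVA: fit a linear model with Habitat as a categorical predictor,
# then build the ANOVA table (Type II sums of squares)
model = smf.ols('Height_cm ~ C(Habitat)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table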

## 🔢 Chi-Squared – Sex and Habitat

In [None]:
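# Count hippos in each Sex/Habitat combination, then test whether the two variables are independent
contingency = pd.crosstab(df['Sex'], df['Habitat'])
chi2, p, dof, expected = stats.chi2_contingency(contingency)
print(f"Chi-squared: {chi2:.2f}, P-value: {p:.3f}")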
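
The chi-squared test relies on the expected counts not being too small; a common rule of thumb asks for at least 5 per cell. The sketch below inspects the `expected` array on a made-up 2×3 table (the counts and habitat labels are invented) and computes Cramér's V as a measure of association strength. Note that for a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical counts, invented for illustration only
demo_counts = pd.DataFrame(
    [[18, 12, 10], [14, 16, 20]],
    index=["Female", "Male"],
    columns=["Lake", "River", "Swamp"],
)

chi2_demo, p_demo, dof_demo, expected_demo = stats.chi2_contingency(demo_counts)

# Rule-of-thumb check: smallest expected cell count
min_expected = expected_demo.min()

# Cramér's V: chi-squared scaled by sample size and table dimensions
n_total = demo_counts.to_numpy().sum()
cramers_v = np.sqrt(chi2_demo / (n_total * (min(demo_counts.shape) - 1)))

print(f"Chi-squared: {chi2_demo:.2f}, dof: {dof_demo}, p-value: {p_demo:.3f}")
print(f"Smallest expected count: {min_expected:.2f}")
print(f"Cramér's V: {cramers_v:.3f}")
```

If some expected counts are very small, merging sparse categories or using an exact test may be more appropriate.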

## 📈 Regression – Predict Weight from Height

In [None]:
# Ordinary least squares: model Weight_kg as a linear function of Height_cm
reg_model = smf.ols('Weight_kg ~ Height_cm', data=df).fit()
print(reg_model.summary())
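
The full summary is dense, so it can help to pull out the few numbers you need. The sketch below fits the same kind of model on synthetic data (the true slope of 10 kg per cm and the noise level are assumptions made up for this demo) and extracts the slope, its 95% confidence interval, R-squared, and a prediction for a hypothetical 160 cm hippo.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Synthetic data: weight rises by about 10 kg per cm, plus noise
height = rng.normal(155, 10, 60)
weight = 10 * height - 100 + rng.normal(0, 50, 60)
demo_reg_data = pd.DataFrame({"Height_cm": height, "Weight_kg": weight})

demo_reg = smf.ols("Weight_kg ~ Height_cm", data=demo_reg_data).fit()

# Estimated change in weight (kg) per extra cm of height
slope = demo_reg.params["Height_cm"]
# 95% confidence interval for the slope
ci_low, ci_high = demo_reg.conf_int().loc["Height_cm"]
# Proportion of variance in weight explained by height
r_squared = demo_reg.rsquared

# Predicted weight for a new, hypothetical animal
new_hippo = pd.DataFrame({"Height_cm": [160]})
predicted = demo_reg.predict(new_hippo).iloc[0]

print(f"Slope: {slope:.2f} kg/cm (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"R-squared: {r_squared:.3f}")
print(f"Predicted weight at 160 cm: {predicted:.0f} kg")
```

The same attributes (`params`, `conf_int()`, `rsquared`, `predict`) are available on `reg_model` from the cell above.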

## ✅ Summary – What You Learned
- Used `scipy.stats` for t-tests and chi-squared
- Used `statsmodels` for ANOVA and regression
- Learned that a p-value tells us how likely a difference at least as large as the one observed would be if chance alone were at work (that is, if the null hypothesis were true)

Next time: we wrap it all up with a mini-project – your chance to explore your own hippo (or other) data! 🦛