# 🧪 Statistical Testing with Hippos (Enhanced)

In this enhanced session, we'll add visualisation and interpretation to our statistical testing.

**Objectives:**
- Run t-tests, ANOVA, and chi-squared tests
- Fit a linear regression model
- Use plots to interpret results
- Show 95% confidence intervals


## 📥 Load Data and Stats Tools

In [None]:
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/ggkuhnle/data-analysis/main/data/hippos_cleaned.csv')
df.head()

## 📏 T-test – Weight by Sex

In [None]:
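# Split weights by sex, dropping missing values so they cannot turn the result into NaN
male = df.loc[df['Sex'] == 'Male', 'Weight_kg'].dropna()
female = df.loc[df['Sex'] == 'Female', 'Weight_kg'].dropna()

# Student's t-test (scipy's default, which assumes equal variances in both groups)
t_stat, p_value = stats.ttest_ind(male, female)
print(f"T-statistic: {t_stat:.3f}, P-value: {p_value:.3f}")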

In [None]:
sns.catplot(data=df, x='Sex', y='Weight_kg', kind='strip', jitter=True)
plt.title('Hippo Weight by Sex')
plt.ylabel('Weight (kg)')
plt.show()
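
The t-test above assumes both sexes have similar variance in weight, and it reports a p-value but no interval. As a rough sketch of one alternative, the cell below runs Welch's t-test (`equal_var=False`) and computes a 95% confidence interval for the difference in means by hand. The arrays `male_demo` and `female_demo` hold made-up numbers, and `welch_ci` is a hypothetical helper written for this illustration; neither comes from the hippo dataset.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical weights (kg) standing in for the male and female groups
rng = np.random.default_rng(42)
male_demo = rng.normal(loc=1600, scale=150, size=40)
female_demo = rng.normal(loc=1400, scale=100, size=35)

def welch_ci(a, b, level=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal variances."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Squared standard error contributed by each group
    var_a, var_b = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (var_a + var_b) ** 2 / (var_a ** 2 / (a.size - 1) + var_b ** 2 / (b.size - 1))
    margin = stats.t.ppf(0.5 + level / 2, dof) * se
    diff = a.mean() - b.mean()
    return diff - margin, diff + margin

t_welch, p_welch = stats.ttest_ind(male_demo, female_demo, equal_var=False)
ci_low, ci_high = welch_ci(male_demo, female_demo)
print(f"Welch t: {t_welch:.3f}, p: {p_welch:.3g}")
print(f"95% CI for the difference in means: [{ci_low:.1f}, {ci_high:.1f}] kg")
```

To try this on the real data, pass the `male` and `female` series to `welch_ci`. An interval that excludes zero corresponds to a Welch p-value below 0.05.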

## 📚 ANOVA – Height by Habitat

In [None]:
# One-way ANOVA fitted as a linear model, with Habitat as a categorical predictor
model = smf.ols('Height_cm ~ C(Habitat)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table

In [None]:
sns.boxplot(data=df, x='Habitat', y='Height_cm')
plt.title('Hippo Height by Habitat')
plt.ylabel('Height (cm)')
plt.show()

## 🔢 Chi-Squared – Sex and Habitat

In [None]:
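# Cross-tabulate the number of hippos in each Sex/Habitat combination
contingency = pd.crosstab(df['Sex'], df['Habitat'])
# expected holds the counts we would see if Sex and Habitat were independent
chi2, p, dof, expected = stats.chi2_contingency(contingency)
print(f"Chi-squared: {chi2:.2f}, dof: {dof}, P-value: {p:.3f}")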
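
The p-value alone says nothing about how strong an association is, or whether the test's approximation is trustworthy. As a sketch of two common follow-up checks, the cell below inspects the expected counts (a widely used rule of thumb wants them all at 5 or more) and computes Cramér's V as an effect size. The counts and habitat labels in `demo_counts` are invented for illustration. Note that `chi2_contingency` applies Yates' continuity correction by default, but only when the table has one degree of freedom (a 2×2 table).

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical counts; the real table comes from pd.crosstab(df['Sex'], df['Habitat'])
demo_counts = pd.DataFrame(
    [[18, 12, 10], [14, 16, 20]],
    index=['Female', 'Male'],
    columns=['Lake', 'River', 'Swamp'],
)

chi2_demo, p_demo, dof_demo, expected_demo = stats.chi2_contingency(demo_counts)

# Put the expected counts back into a labelled table for easier reading
expected_table = pd.DataFrame(
    expected_demo, index=demo_counts.index, columns=demo_counts.columns
)
print(expected_table.round(1))

# Rule of thumb: the chi-squared approximation is shaky if expected counts fall below 5
min_expected = expected_table.to_numpy().min()
print(f"Smallest expected count: {min_expected:.1f}")

# Cramér's V: effect size from 0 (no association) to 1 (perfect association)
n_obs = demo_counts.to_numpy().sum()
cramers_v = np.sqrt(chi2_demo / (n_obs * (min(demo_counts.shape) - 1)))
print(f"Cramér's V: {cramers_v:.3f}")
```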

## 📈 Regression – Predict Weight from Height

In [None]:
reg_model = smf.ols('Weight_kg ~ Height_cm', data=df).fit()
print(reg_model.summary())

In [None]:
sns.lmplot(data=df, x='Height_cm', y='Weight_kg', ci=95, line_kws={'color': 'red'})
plt.title('Regression: Weight vs Height with 95% CI')
plt.show()
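
The shaded band in the plot is a 95% confidence interval for the fitted line, and the same information is available as numbers. The sketch below, on made-up data, uses `conf_int` for the coefficient intervals and `get_prediction(...).summary_frame()` for intervals at new heights. The simulated slope, the noise level, and the three example heights are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical heights and weights with a made-up linear relationship
rng = np.random.default_rng(1)
height = rng.uniform(130, 170, size=60)
demo = pd.DataFrame({
    'Height_cm': height,
    'Weight_kg': 10 * height + rng.normal(0, 50, size=60),
})

demo_reg = smf.ols('Weight_kg ~ Height_cm', data=demo).fit()

# 95% confidence intervals for the intercept and slope
coef_ci = demo_reg.conf_int(alpha=0.05)
coef_ci.columns = ['ci_lower', 'ci_upper']
print(coef_ci)

# Predictions at new heights: one interval for the mean, one for a single new animal
new_heights = pd.DataFrame({'Height_cm': [140, 150, 160]})
pred = demo_reg.get_prediction(new_heights).summary_frame(alpha=0.05)
print(pred[['mean', 'mean_ci_lower', 'mean_ci_upper', 'obs_ci_lower', 'obs_ci_upper']])
```

The `mean_ci_*` columns correspond to the band drawn by `lmplot`. The `obs_ci_*` columns give the wider prediction interval for an individual animal, which also includes the scatter around the line.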

## ✅ Summary – What You Learned
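- T-tests compare means between two groups; ANOVA extends the comparison to three or more groups
- Chi-squared tests assess whether two categorical variables are associated
- Regression models predict a numerical outcome from a predictor, and the fitted line comes with a 95% confidence interval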
- Visualisation helps interpret results effectively

You are now ready for the mini-project, or a deeper dive into distributions, transformations, and model interpretation! 🦛