# 🎓 Mini-Project – Explore Your Own Dataset

This is your final session! You’ll use everything you’ve learned so far to analyse and present a small dataset.
You can use the hippo data provided or bring your own (optional!).

**What to do:**
- Load and explore the data
- Clean or tidy it (if needed)
- Summarise and describe key features
- Compare groups or test a simple hypothesis
- Create at least one visualisation
- Write a short conclusion

## 🐾 Step 1 – Load the Data

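```python
import pandas as pd

# Replace 'your-username' with the account hosting the course data,
# or point read_csv at your own file instead (e.g. 'my_data.csv')
df = pd.read_csv('https://raw.githubusercontent.com/your-username/data-analysis/main/data/hippos_cleaned.csv')
df.head()  # show the first five rows
```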
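If you bring your own data, start by checking its size, column names and types before going further. The sketch below reads a tiny CSV from a string (via `io.StringIO`) so it runs without a download. The values are made up for illustration, the column names `Sex`, `Weight_kg` and `Height_cm` are assumed to match the hippo file, and the table is called `demo` so it won't overwrite your real `df`.

```python
import io

import pandas as pd

# Made-up example rows, for illustration only (not real hippo measurements).
# In practice you would pass a file path or URL to pd.read_csv instead.
csv_text = """Sex,Weight_kg,Height_cm
Male,1500,150
Female,1300,140
Male,1600,155
Female,1250,138
"""

demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)          # (number of rows, number of columns)
print(list(demo.columns))  # column names
print(demo.dtypes)         # data type pandas inferred for each column
```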

## 🧹 Step 2 – Clean or Prepare the Data

```python
# Check column types and how many non-missing values each column has
df.info()

# Count missing values in each column
df.isna().sum()
```
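Real data often has inconsistent labels (e.g. `' female'` or `'MALE'`) or numbers stored as text. Here is one way to tidy those problems with pandas, on a made-up messy table. Which fixes you need (trimming and title-casing labels, converting text to numbers, dropping incomplete rows) is an assumption about *your* data, so check the output of the cell above first.

```python
import numpy as np
import pandas as pd

# Made-up messy data, for illustration only
demo = pd.DataFrame({
    "Sex": ["Male", " female", "MALE", "Female", None],
    "Weight_kg": ["1500", "1300", "unknown", "1250", "1400"],
    "Height_cm": [150, 140, 155, np.nan, 145],
})

# Standardise labels: strip stray spaces and use Title Case ('Male', 'Female')
demo["Sex"] = demo["Sex"].str.strip().str.title()

# Convert weights to numbers; anything unparseable (e.g. 'unknown') becomes NaN
demo["Weight_kg"] = pd.to_numeric(demo["Weight_kg"], errors="coerce")

# Drop rows missing any of the columns needed for the analysis
clean = demo.dropna(subset=["Sex", "Weight_kg", "Height_cm"])

print(clean)
print(clean.isna().sum())  # should now be all zeros
```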

## 📊 Step 3 – Describe and Summarise

```python
# Overall summary statistics for the numeric columns
print(df.describe())

# Mean weight and height for each sex
df.groupby('Sex')[['Weight_kg', 'Height_cm']].mean()
```
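Means alone can hide a lot: a group might be tiny, or very spread out. `groupby(...).agg(...)` puts group sizes, spread and medians side by side. A sketch on made-up numbers:

```python
import pandas as pd

# Made-up example data, for illustration only
demo = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Weight_kg": [1500, 1300, 1600, 1250, 1550, 1350],
})

# Count, mean, standard deviation and median of weight for each sex
summary = demo.groupby("Sex")["Weight_kg"].agg(["count", "mean", "std", "median"])
print(summary.round(1))
```

Swap `"Weight_kg"` for any other numeric column in your own data.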

## 🧪 Step 4 – Compare Groups

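```python
from scipy import stats

# Weight of each group, with missing values removed
# (ttest_ind returns NaN if either group contains NaN)
male = df.loc[df['Sex'] == 'Male', 'Weight_kg'].dropna()
female = df.loc[df['Sex'] == 'Female', 'Weight_kg'].dropna()

# Welch's t-test: compares the two means without assuming equal variances
stats.ttest_ind(male, female, equal_var=False)
```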
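A p-value tells you whether a difference is likely to be due to chance, not how big it is. A common companion is an effect size such as Cohen's d: the difference in means divided by a pooled standard deviation. The sketch below runs Welch's t-test and computes d on made-up weights. `cohens_d` is a hypothetical helper written here, not a SciPy function, and the 0.05 cut-off is a convention rather than a rule.

```python
import numpy as np
from scipy import stats

# Made-up weights (kg), for illustration only
demo_male = np.array([1500, 1600, 1550, 1580, 1520])
demo_female = np.array([1300, 1250, 1350, 1320, 1280])

# Welch's t-test: compares two means without assuming equal variances
result = stats.ttest_ind(demo_male, demo_female, equal_var=False)


def cohens_d(a, b):
    """Hypothetical helper: difference in means divided by the pooled SD."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * np.var(a, ddof=1)
                  + (n_b - 1) * np.var(b, ddof=1)) / (n_a + n_b - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


d = cohens_d(demo_male, demo_female)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}, d = {d:.2f}")

if result.pvalue < 0.05:
    print("The difference in mean weight is statistically significant at the 5% level.")
else:
    print("No statistically significant difference at the 5% level.")
```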

## 📈 Step 5 – Visualise

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Box plot of height for each sex
sns.boxplot(data=df, x='Sex', y='Height_cm')
plt.title('Height by Sex')
plt.ylabel('Height (cm)')
plt.show()
```
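Step 4 tested weight, so it helps to plot that same variable too. This sketch uses plain matplotlib on made-up data and saves the figure to a file (`weight_by_sex.png` is an arbitrary name) so it also works outside a notebook; inside Jupyter, `plt.show()` as above is enough.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a screen; you can drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up example data, for illustration only
demo = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Weight_kg": [1500, 1300, 1600, 1250, 1550, 1350],
})

groups = ["Female", "Male"]
weights = [demo.loc[demo["Sex"] == g, "Weight_kg"].to_numpy() for g in groups]

fig, ax = plt.subplots()
ax.boxplot(weights)  # one box per group, drawn at x = 1, 2, ...
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(groups)
ax.set_xlabel("Sex")
ax.set_ylabel("Weight (kg)")
ax.set_title("Weight by Sex")
fig.savefig("weight_by_sex.png", dpi=100)
plt.close(fig)
```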

## 📝 Step 6 – Conclude

```python
# Write your observations here as comments:
# - What did you notice?
# - Were any results surprising?
# - What would you explore next if you had more time?
```

## ✅ Optional Challenges
- Try a regression model: `Weight_kg ~ Height_cm`
- Bring in a third variable (e.g. habitat, if your dataset records it)
- Turn your analysis into a short presentation
- Try using a different dataset (NDNS/FFQ/etc.)
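For the regression challenge, a straight-line fit of weight on height is one place to start. This sketch uses `scipy.stats.linregress` rather than a formula interface such as statsmodels' `ols('Weight_kg ~ Height_cm', data=df)`, and the data are made up.

```python
import pandas as pd
from scipy import stats

# Made-up example data, for illustration only
demo = pd.DataFrame({
    "Height_cm": [138, 140, 145, 150, 155],
    "Weight_kg": [1250, 1300, 1400, 1500, 1600],
})

# Least-squares fit of Weight_kg = intercept + slope * Height_cm
fit = stats.linregress(demo["Height_cm"], demo["Weight_kg"])

print(f"slope = {fit.slope:.1f} kg per cm")
print(f"intercept = {fit.intercept:.1f} kg")
print(f"R^2 = {fit.rvalue ** 2:.3f}")  # share of weight variation explained by height

# Predicted weight for a hypothetical 148 cm hippo
print(f"predicted weight at 148 cm: {fit.intercept + fit.slope * 148:.0f} kg")
```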

## 🦛 Congratulations!
You’ve completed the Python for Data Analysis course. 🎉

You now know how to:
- Write Python code
- Clean and reshape messy data
- Summarise, plot, and analyse datasets
- Understand and test statistical differences

*Python is like a river – you’ve learned to swim in it. Now go explore!* 🐾