# 📊 Exploring Hippo Data – Summarise and Compare

In this session, you'll learn how to group data and compute simple summaries. You'll also build a first version of a classic **Table 1**: a descriptive summary of the dataset, broken down by group.

**Objectives:**
- Group data with `groupby()`
- Calculate means, counts, and distributions
- Compare groups (e.g. Male vs Female hippos)
- Create a 'Table 1'-style summary

## 📥 Load the Clean Hippo Dataset

In [None]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/ggkuhnle/data-analysis/main/data/hippos_cleaned.csv')  # replace with your GitHub path
df

## 👀 Quick Look at the Data

In [None]:
df.info()      # prints column names, dtypes, and non-null counts
df.describe()  # summary statistics for the numeric columns

## 📏 Average Weight by Sex

In [None]:
df.groupby('Sex')['Weight_kg'].mean()
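
To see what `groupby()` is doing, it can help to try it on a tiny table where the answer is easy to check by hand. The sketch below uses a made-up DataFrame (the numbers are invented for illustration and are not taken from the hippo dataset) and assumes the same `Sex` and `Weight_kg` column names. It also shows that rows with a missing group label are left out of the result by default.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Sex': ['Female', 'Female', 'Male', 'Male', None],
    'Weight_kg': [1400.0, 1600.0, 1700.0, 1900.0, 1500.0],
})

# Split the rows by Sex, then average Weight_kg within each group.
# The row with a missing Sex does not form a group of its own.
mean_by_sex = toy.groupby('Sex')['Weight_kg'].mean()
print(mean_by_sex)
```

If the missing labels matter for your analysis, `toy['Sex'].isna().sum()` is one quick way to count them before grouping.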

## 🧠 Multiple Summaries by Group

In [None]:
df.groupby('Sex').agg({
    'Weight_kg': ['mean', 'std'],
    'Height_cm': ['mean', 'std'],
})
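
Passing a dictionary of lists to `agg()` gives a table with two levels of column names: the variable on top and the statistic underneath. The sketch below, again on invented toy data, shows one way to pick out a single value from such a table and one way to flatten the column names. The flattened names such as `Weight_kg_mean` are just a naming choice made here.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Sex': ['Female', 'Female', 'Male', 'Male'],
    'Weight_kg': [1400.0, 1600.0, 1700.0, 1900.0],
    'Height_cm': [140.0, 150.0, 150.0, 160.0],
})

stats = toy.groupby('Sex').agg({
    'Weight_kg': ['mean', 'std'],
    'Height_cm': ['mean', 'std'],
})

# Each column is a (variable, statistic) pair, so select it with a tuple
female_mean_weight = stats.loc['Female', ('Weight_kg', 'mean')]

# Optional: join the two levels into single names like 'Weight_kg_mean'
flat = stats.copy()
flat.columns = ['_'.join(col) for col in stats.columns]
print(flat)
```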

## 📋 Value Counts – Habitat Distribution

In [None]:
df['Habitat'].value_counts()
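
Raw counts are often easier to compare as proportions. As a minimal sketch, `value_counts(normalize=True)` returns the share of each category instead of the count. The habitat labels below are invented for illustration and may not match the categories in the real dataset.

```python
import pandas as pd

# Toy habitat column, invented for illustration only
habitat = pd.Series(['River', 'River', 'Lake', 'River', 'Swamp'], name='Habitat')

counts = habitat.value_counts()                # number of rows per category
shares = habitat.value_counts(normalize=True)  # fraction of rows per category

print(counts)
print(shares.round(2))
```

By default, missing values are not counted; passing `dropna=False` includes them as their own category.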

## 🔍 Crosstab – Habitat by Sex

In [None]:
pd.crosstab(df['Habitat'], df['Sex'])
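
Two optional arguments to `pd.crosstab` are useful when comparing groups of different sizes: `margins=True` adds row and column totals, and `normalize='columns'` turns each column into proportions that sum to 1. The sketch below uses invented toy data with the same `Habitat` and `Sex` column names.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Habitat': ['River', 'River', 'Lake', 'Lake', 'River', 'Lake'],
    'Sex': ['Female', 'Male', 'Female', 'Female', 'Male', 'Male'],
})

# Counts with an extra 'All' row and column holding the totals
with_totals = pd.crosstab(toy['Habitat'], toy['Sex'], margins=True)

# Within each sex, the share of hippos found in each habitat
col_share = pd.crosstab(toy['Habitat'], toy['Sex'], normalize='columns')

print(with_totals)
print(col_share.round(2))
```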

## 🧾 Build a Simple Table 1

In [None]:
summary = df.groupby('Sex').agg(
    Weight_mean=('Weight_kg', 'mean'),
    Weight_sd=('Weight_kg', 'std'),
    Height_mean=('Height_cm', 'mean'),
    Height_sd=('Height_cm', 'std'),
    Count=('Name', 'count'),  # number of hippos with a non-missing Name
)
summary
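
Published Table 1s usually report each variable as "mean (SD)" with one column per group. The sketch below shows one possible way to reshape a summary like the one above into that layout. It runs on invented toy data, and the helper `mean_sd` and the row labels are hypothetical names chosen for this example.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'Name': ['Ada', 'Bea', 'Cal', 'Dov'],
    'Sex': ['Female', 'Female', 'Male', 'Male'],
    'Weight_kg': [1400.0, 1600.0, 1700.0, 1900.0],
    'Height_cm': [140.0, 150.0, 150.0, 160.0],
})

summary = toy.groupby('Sex').agg(
    Weight_mean=('Weight_kg', 'mean'),
    Weight_sd=('Weight_kg', 'std'),
    Height_mean=('Height_cm', 'mean'),
    Height_sd=('Height_cm', 'std'),
    Count=('Name', 'count'),
)

def mean_sd(mean, sd):
    """Format a mean and standard deviation as 'mean (SD)' with one decimal."""
    return f'{mean:.1f} ({sd:.1f})'

# One row per variable, one column per group
table1 = pd.DataFrame({
    'n': summary['Count'].astype(str),
    'Weight (kg), mean (SD)': [
        mean_sd(m, s) for m, s in zip(summary['Weight_mean'], summary['Weight_sd'])
    ],
    'Height (cm), mean (SD)': [
        mean_sd(m, s) for m, s in zip(summary['Height_mean'], summary['Height_sd'])
    ],
}, index=summary.index).T

print(table1)
```

The values are stored as text here because the table is meant for reading rather than for further calculation; keep the numeric `summary` around if you need the numbers again.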

## ✅ Summary – What You Learned
- Used `groupby()` and `agg()` to summarise data
- Compared average weight and height between groups
- Counted and crosstabbed categorical values
- Built a summary table (Table 1)

Next time, we’ll dive into how to **visualise** some of these differences using beautiful, informative plots! 🦛