# 🧹 Working with Tables: Meet the Hippos
Today, we’ll work with a messy real-world dataset (it includes hippos!).
We’ll learn how to:
- Read data into a table using `pandas`
- Inspect and clean data
- Handle formatting problems and inconsistent values
- Deal with a common `.str` error

## 📥 Load the Data

In [None]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/ggkuhnle/data-analysis/main/data/messy_hippos.csv')  # update this if hosted elsewhere
df

## 🔍 Quick Look

In [None]:
df.info()
df.describe(include='all')

## ✨ Clean Up Column Names

In [None]:
df.columns = df.columns.str.strip()
df = df.rename(columns={'Weight (kg)': 'Weight_kg'})
df.head()
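
To see why stripping the headers matters, here is a minimal sketch on a made-up table. The column names and values are hypothetical and are not taken from `messy_hippos.csv`.

```python
import pandas as pd

# Hypothetical toy table: headers carry stray spaces, as hand-edited CSVs often do
toy = pd.DataFrame({' Name ': ['Hilda'], 'Weight (kg) ': ['1,500']})

# Before stripping, the padded header is a different key from the clean one
has_name_before = 'Name' in toy.columns

toy.columns = toy.columns.str.strip()
toy = toy.rename(columns={'Weight (kg)': 'Weight_kg'})

print(has_name_before)      # False: ' Name ' is not 'Name'
print(list(toy.columns))    # ['Name', 'Weight_kg']
```

Note that the `rename` only finds `'Weight (kg)'` because the whitespace was stripped first; `rename` ignores keys that don't match by default, so running the two steps in the other order would leave the column untouched without any error.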

## 🧽 Clean Text Columns

In [None]:
df['Name'] = df['Name'].str.strip().str.capitalize()
df['species'] = df['species'].str.strip().str.lower().str.replace('hippos', 'hippo')
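
The same chain can be tried on a small stand-alone Series to watch each step do its job. The values below are invented for illustration and only assume the kind of mess (padding, mixed case, plural forms) that the cell above targets.

```python
import pandas as pd

# Hypothetical values showing typical inconsistencies
species = pd.Series(['  Hippo', 'HIPPOS ', 'hippo'])

cleaned = (
    species.str.strip()                                  # drop leading/trailing whitespace
           .str.lower()                                  # unify case
           .str.replace('hippos', 'hippo', regex=False)  # collapse the plural into the singular
)

print(cleaned.tolist())   # ['hippo', 'hippo', 'hippo']
```

The order matters: lowercasing before the replacement is what lets `'HIPPOS '` match the lowercase pattern.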

## ⚠️ A Common Error: `.str` on Non-Strings

In [None]:
# .str only exists on text (object/string) columns. If pandas has stored
# Weight_kg as numbers, the line below raises an AttributeError.
# Uncomment to try it:
# df['Weight_kg'].str.replace(',', '')

### ❗ Why this happens
`.str` only works on columns that hold text. On a numeric column it raises an `AttributeError`; on a mixed column it runs, but silently turns every non-string value into NaN.
We can avoid both problems by converting everything to a string **before** cleaning.
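
Both failure modes can be reproduced on toy data. This is a minimal sketch with made-up values (`numeric` and `mixed` are hypothetical Series, not columns of the hippo dataset).

```python
import pandas as pd

# Hypothetical data, not from messy_hippos.csv
numeric = pd.Series([1500.0, 1480.0])        # dtype float64
mixed = pd.Series(['1,500', 1480.0, None])   # dtype object: text, a number, a missing value

# Case 1: a numeric column has no .str accessor at all
try:
    numeric.str.replace(',', '')
    raised = False
except AttributeError:
    raised = True

# Case 2: an object column accepts .str, but non-strings quietly become NaN
silent = mixed.str.replace(',', '', regex=False)

# Converting to string first keeps the numeric value as text
safe = mixed.astype(str).str.replace(',', '', regex=False)

print(raised)                # True
print(silent.isna().sum())   # 2 -> the float was lost along with the missing value
print(safe.iloc[1])          # '1480.0' survives as text
```

Case 2 is the more dangerous one in practice, because nothing fails and valid numbers simply disappear.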

## ✅ Fix the Weight Column Properly

In [None]:
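# Cast to text, drop thousands separators and spaces, then parse; unparseable values become NaN
df['Weight_kg'] = pd.to_numeric(df['Weight_kg'].astype(str).str.replace(',', '', regex=False).str.strip(), errors='coerce')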

## 🧮 Fix the Height Column

In [None]:
# Convert in place so the table doesn't end up with both 'height_cm' and 'Height_cm'
df['height_cm'] = pd.to_numeric(df['height_cm'], errors='coerce')
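
`errors='coerce'` turns anything it cannot parse into NaN without a warning, so it can be worth checking what was lost. The sketch below uses invented values (`raw` is a hypothetical Series) and compares missingness before and after the conversion.

```python
import pandas as pd

# Hypothetical raw heights: two parse cleanly, one is text, one is missing
raw = pd.Series(['150', '148.5', 'unknown', None])

heights = pd.to_numeric(raw, errors='coerce')

# Values that were present in the raw data but failed to parse
newly_missing = raw.notna() & heights.isna()

print(raw[newly_missing].tolist())   # ['unknown']
```

Entries flagged this way are candidates for a manual fix rather than a silent drop.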

## 🌍 Tidy Up the Habitat

In [None]:
df['habitat'] = df['habitat'].str.strip().str.capitalize()

## 🧾 The Cleaned Dataset

In [None]:
df

## ✅ Summary
- Read a messy dataset into `pandas` and cleaned it
- Fixed inconsistent text formatting
- Converted mixed-type number columns to numeric safely
- Learned how to debug the `.str` accessor error like a pro 🐾