# Titanic Data Explorer - Guided Practice

## Unit 2, Day 2
**Instructor:** Abishek Ganesh

---

## How This Notebook Works

**Part A: Instructor-Led Walkthrough**  
We'll work through these examples together as a class. Follow along and run each cell with me.

**Part B: Breakout Room Exercises**  
You'll work in groups to complete these exercises. Talk through the problems together!

---

### Today's Skills
By the end of this notebook, you will be able to:
1. Load a CSV file into a Pandas DataFrame
2. Explore and understand your data
3. Select specific columns and rows
4. Filter data based on conditions
5. Perform basic calculations on your data

---

# Part A: Instructor-Led Walkthrough

Follow along with Abishek as we work through these examples together.

## A.1 - Loading Data

First, we import our libraries and load the Titanic dataset.

In [None]:
# Import our data science libraries
import numpy as np
import pandas as pd

print("Libraries imported successfully!")

In [None]:
# Load the Titanic dataset into a DataFrame
df = pd.read_csv('../datasets/titanic.csv')

# Quick check - what's the shape?
print(f"Dataset loaded! Shape: {df.shape}")
print(f"That's {df.shape[0]} rows and {df.shape[1]} columns")

In [None]:
# Look at the first 5 rows
df.head()
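
If the CSV file is not available at the path above, the same loading steps can be practiced on a small inline table. This is a minimal sketch, assuming a made-up three-row sample (not the real Titanic file) that is read through `io.StringIO`; the name `sample` is only for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample, NOT the real Titanic data
csv_text = """Name,Age,Fare
Alice,29,7.25
Bob,,71.28
Cara,4,16.70
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (rows, columns) -> (3, 3)
print(list(sample.columns))  # column names as a plain list
print(sample.head(2))        # first 2 rows instead of the default 5
```

The empty `Age` field for Bob is read as a missing value (`NaN`), which is the same thing `isnull()` looks for later in this notebook.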

## A.2 - Exploring Data

Before doing anything else, we need to understand what data we have.

In [None]:
# info() shows columns, data types, and non-null counts
df.info()

In [None]:
# describe() gives quick statistics on numerical columns
df.describe()

In [None]:
# isnull().sum() counts missing values in each column
df.isnull().sum()
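
To see what `isnull().sum()` returns without the full dataset, here is a small sketch on a made-up table (the `toy` DataFrame and its values are assumptions for illustration only). The result is a Series with one count per column, so a single column's count can be looked up by name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with two missing ages
toy = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Age": [29.0, np.nan, 4.0, np.nan],
    "Fare": [7.25, 71.28, 16.70, 8.05],
})

missing = toy.isnull().sum()  # Series: one count per column
print(missing)
print(missing["Age"])         # missing values in one column -> 2

# describe() only counts the non-missing values
print(toy["Age"].describe()["count"])
```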

## A.3 - Selecting Columns

We can grab specific columns from our DataFrame.

In [None]:
# Select a SINGLE column with single brackets
# This returns a Series (one column of values with its row labels)
df['Name'].head()

In [None]:
# Select MULTIPLE columns with double brackets
# (the inner brackets are a Python list of column names)
# This returns a DataFrame (a mini-table)
df[['Name', 'Age', 'Fare']].head()
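
The difference between single and double brackets can be checked directly with `type()`. A minimal sketch on a made-up two-row table (the `toy` DataFrame is an assumption, not the Titanic data):

```python
import pandas as pd

# Hypothetical toy table
toy = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [29, 41],
    "Fare": [7.25, 71.28],
})

one_col = toy["Name"]             # single brackets -> Series
two_cols = toy[["Name", "Fare"]]  # list of names -> DataFrame
still_df = toy[["Name"]]          # a one-item list still gives a DataFrame

print(type(one_col).__name__)     # Series
print(type(two_cols).__name__)    # DataFrame
print(one_col.shape, still_df.shape)
```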

## A.4 - Selecting Rows

We use `iloc` to select rows by their integer position, counting from 0.

In [None]:
# Get a single row (the first passenger, index 0)
df.iloc[0]

In [None]:
# Get a range of rows (first 5 passengers)
df.iloc[0:5]
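
`iloc` also accepts negative positions and a second selector for columns. The sketch below uses a made-up four-row table (the `toy` DataFrame is an assumption for illustration) to show a single row, the last row, a slice, and a rows-and-columns selection.

```python
import pandas as pd

# Hypothetical toy table
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Age": [29, 41, 4, 35],
})

third_row = toy.iloc[2]        # third row: position 2, since counting starts at 0
last_row = toy.iloc[-1]        # negative positions count from the end
middle = toy.iloc[1:3]         # positions 1 and 2; the end position is excluded
block = toy.iloc[0:2, [0, 1]]  # first 2 rows, columns at positions 0 and 1

print(third_row["Name"])       # Cara
print(last_row["Name"])        # Dan
print(len(middle))             # 2
print(list(block.columns))     # ['Survived', 'Name']
```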

## A.5 - Filtering Data

This is where Pandas gets powerful! We can filter rows based on conditions.

In [None]:
# Filter for passengers who survived (Survived == 1)
survivors = df[df['Survived'] == 1]

print(f"Total passengers: {len(df)}")
print(f"Survivors: {len(survivors)}")
survivors.head()

In [None]:
# Filter for female passengers
females = df[df['Sex'] == 'female']

print(f"Female passengers: {len(females)}")
females.head()
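
The condition inside the brackets is itself a Series of `True`/`False` values, and the filter keeps only the rows marked `True`. The same pattern works with `<` and `>`. A minimal sketch on a made-up table (the `toy` DataFrame is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with one missing age
toy = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Sex": ["female", "male", "female", "male"],
    "Age": [29.0, np.nan, 4.0, 35.0],
})

mask = toy["Age"] < 18  # one True/False value per row
print(mask.tolist())    # [False, False, True, False]

children = toy[mask]    # keep only the rows where mask is True
print(len(children))    # 1
```

Note that a missing age compares as `False`, so rows with no recorded age are left out of the filtered result.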

## A.6 - Basic Operations

We can perform calculations on our data.

In [None]:
# Calculate the average fare
avg_fare = df['Fare'].mean()
print(f"Average fare: ${avg_fare:.2f}")

In [None]:
# Count total survivors using sum()
# Since Survived is 0 or 1, summing gives us the count!
total_survivors = df['Survived'].sum()
print(f"Total survivors: {total_survivors}")

In [None]:
# Count with a filter using len()
female_count = len(df[df['Sex'] == 'female'])
print(f"Number of female passengers: {female_count}")
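
Two details are worth seeing on a small example: `mean()` skips missing values by default, and summing a `True`/`False` condition gives the same count as `len()` on the filtered table. A sketch on a made-up table (the `toy` DataFrame is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with one missing age
toy = pd.DataFrame({
    "Sex": ["female", "male", "female"],
    "Age": [30.0, np.nan, 10.0],
    "Survived": [1, 0, 1],
})

avg_age = toy["Age"].mean()                   # NaN is skipped: (30 + 10) / 2
female_len = len(toy[toy["Sex"] == "female"]) # count rows after filtering
female_sum = (toy["Sex"] == "female").sum()   # True counts as 1, False as 0

print(avg_age)                 # 20.0
print(female_len, female_sum)  # 2 2
```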

---

# Part B: Breakout Room Exercises

Now it's your turn! Work with your group to complete these exercises.

**Important:** Talk through the problems together. If you get stuck, discuss with your teammates before asking for help.

---

## Section 1: Load & Explore (Warm-Up)

Let's start fresh and load the data into a new variable.

### Exercise 1.1
Load the Titanic dataset into a DataFrame called `df_titanic`

In [None]:
# Your code here


### Exercise 1.2
How many rows and columns are in the dataset? Use `.shape`

In [None]:
# Your code here


### Exercise 1.3
What are the column names? Use `.columns`

In [None]:
# Your code here


### Exercise 1.4
How many missing values are in the 'Age' column?

In [None]:
# Your code here


### Checkpoint 1
Run this cell to check your answers!

In [None]:
# Checkpoint 1 - Run this to verify your work
print("Checking your answers...")
print()

# Check that df_titanic exists and has correct shape
assert 'df_titanic' in dir(), "Did you create a variable called df_titanic?"
assert df_titanic.shape == (891, 12), f"Shape should be (891, 12), got {df_titanic.shape}"
print("✅ df_titanic loaded correctly!")
print(f"   Rows: {df_titanic.shape[0]}")
print(f"   Columns: {df_titanic.shape[1]}")
print(f"   Missing Age values: {df_titanic['Age'].isnull().sum()} (expected: 177)")

---

## Section 2: Selecting Columns

Practice grabbing specific columns from your DataFrame.

### Exercise 2.1
Select just the 'Age' column and display the first 10 values

In [None]:
# Your code here


### Exercise 2.2
Select just the 'Fare' column and display the first 5 values

In [None]:
# Your code here


### Exercise 2.3
Select the 'Name', 'Sex', and 'Survived' columns together (as a mini DataFrame). Display the first 5 rows.

In [None]:
# Your code here


### Exercise 2.4
Create a new DataFrame called `passenger_info` with these columns: Name, Age, Sex, Pclass

In [None]:
# Your code here


### Checkpoint 2
Run this cell to check your answers!

In [None]:
# Checkpoint 2 - Run this to verify your work
print("Checking your answers...")
print()

assert 'passenger_info' in dir(), "Did you create a variable called passenger_info?"
assert passenger_info.shape == (891, 4), f"passenger_info shape should be (891, 4), got {passenger_info.shape}"
assert list(passenger_info.columns) == ['Name', 'Age', 'Sex', 'Pclass'], "Columns should be Name, Age, Sex, Pclass"

print("✅ passenger_info created correctly!")
print(f"   Shape: {passenger_info.shape}")
print()
print("Great job on Section 2!")

---

## Section 3: Selecting Rows

Practice using `iloc` to grab specific rows.

### Exercise 3.1
Get the 10th passenger's information using `iloc` (remember: Python starts counting at 0!)

In [None]:
# Your code here


### Exercise 3.2
Get the passengers at positions 20 through 24, inclusive (5 passengers total). Remember that the end of an `iloc` slice is excluded.

In [None]:
# Your code here


### Exercise 3.3
Get the LAST passenger's information using `iloc`

In [None]:
# Your code here


### Exercise 3.4
Get the first 3 rows and only columns at positions 1, 3, and 4 (Survived, Name, Sex)

In [None]:
# Your code here


### Checkpoint 3
Run this cell to check your answers!

In [None]:
# Checkpoint 3 - Run this to verify your work
print("Checking your answers...")
print()

# Check the 10th passenger (index 9)
tenth = df_titanic.iloc[9]
print(f"✅ The 10th passenger is: {tenth['Name']}")

# Check the last passenger
last = df_titanic.iloc[-1]
print(f"✅ The last passenger is: {last['Name']}")
print()
print("Great job on Section 3!")

---

## Section 4: Filtering Data

This is where Pandas really shines! Filter your data based on conditions.

### Exercise 4.1
Find all passengers who survived (Survived == 1). How many are there?

In [None]:
# Your code here


### Exercise 4.2
Find all female passengers. How many are there?

In [None]:
# Your code here


### Exercise 4.3
Find all first class passengers (Pclass == 1). How many are there?

In [None]:
# Your code here


### Exercise 4.4
Find all passengers under 18 years old (children). How many?

In [None]:
# Your code here


### Exercise 4.5
Find all passengers who paid more than $50 for their fare. How many?

In [None]:
# Your code here


### Checkpoint 4
Run this cell to check your answers!

In [None]:
# Checkpoint 4 - Run this to verify your work
print("Checking your answers...")
print()

survivors_count = len(df_titanic[df_titanic['Survived'] == 1])
females_count = len(df_titanic[df_titanic['Sex'] == 'female'])
first_class_count = len(df_titanic[df_titanic['Pclass'] == 1])
children_count = len(df_titanic[df_titanic['Age'] < 18])
high_fare_count = len(df_titanic[df_titanic['Fare'] > 50])

print(f"✅ Survivors: {survivors_count}")
print(f"✅ Female passengers: {females_count}")
print(f"✅ First class passengers: {first_class_count}")
print(f"✅ Children (under 18): {children_count}")
print(f"✅ Passengers who paid > $50: {high_fare_count}")
print()
print("Great job on Section 4!")

---

## Section 5: Basic Operations

Let's do some calculations on our data!

### Exercise 5.1
What is the average age of all passengers? Use `.mean()`

In [None]:
# Your code here


### Exercise 5.2
What is the average fare paid?

In [None]:
# Your code here


### Exercise 5.3
What is the maximum fare anyone paid? Use `.max()`

In [None]:
# Your code here


### Exercise 5.4
What is the total number of survivors? Use `.sum()` on the Survived column

In [None]:
# Your code here


### Exercise 5.5
What percentage of passengers survived? 

Think about it: The Survived column has 0s and 1s. What does the mean of 0s and 1s tell you?

In [None]:
# Your code here


### Checkpoint 5
Run this cell to check your answers!

In [None]:
# Checkpoint 5 - Run this to verify your work
print("Checking your answers...")
print()

avg_age = df_titanic['Age'].mean()
avg_fare = df_titanic['Fare'].mean()
max_fare = df_titanic['Fare'].max()
total_survivors = df_titanic['Survived'].sum()
survival_rate = df_titanic['Survived'].mean() * 100

print(f"✅ Average age: {avg_age:.1f} years")
print(f"✅ Average fare: ${avg_fare:.2f}")
print(f"✅ Maximum fare: ${max_fare:.2f}")
print(f"✅ Total survivors: {total_survivors}")
print(f"✅ Survival rate: {survival_rate:.1f}%")
print()
print("Great job on Section 5!")

---

## Section 6: Putting It Together

These exercises combine multiple skills. Work through them step by step!

### Exercise 6.1
Find the average age of survivors only.

**Steps:**
1. Filter for survivors
2. Calculate the mean of the Age column

In [None]:
# Your code here


### Exercise 6.2
Find the average fare paid by first class passengers.

**Steps:**
1. Filter for Pclass == 1
2. Calculate the mean of the Fare column

In [None]:
# Your code here


### Exercise 6.3
Count how many female passengers survived.

**Steps:**
1. Filter for female passengers
2. From those, filter for survivors (or combine both conditions)
3. Count with `len()`
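
Step 2 can be done either way. The sketch below shows both options on a made-up table with hypothetical columns `Team` and `Passed` (not the Titanic data), so the pattern still has to be adapted for this exercise. When combining, each condition needs its own parentheses, and pandas uses `&` rather than Python's `and`.

```python
import pandas as pd

# Hypothetical toy table, unrelated to the Titanic columns
toy = pd.DataFrame({
    "Team": ["red", "blue", "red", "red"],
    "Passed": [1, 1, 0, 1],
})

# Option 1: filter twice, one condition at a time
red = toy[toy["Team"] == "red"]
red_passed = red[red["Passed"] == 1]

# Option 2: combine both conditions in a single filter
both = toy[(toy["Team"] == "red") & (toy["Passed"] == 1)]

print(len(red_passed), len(both))  # 2 2
```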

In [None]:
# Your code here


### Final Checkpoint
Run this cell to check your final answers!

In [None]:
# Final Checkpoint - Run this to verify your work
print("Checking your answers...")
print()

# Calculate expected answers
survivors = df_titanic[df_titanic['Survived'] == 1]
avg_age_survivors = survivors['Age'].mean()

first_class = df_titanic[df_titanic['Pclass'] == 1]
avg_fare_first = first_class['Fare'].mean()

female_survivors = df_titanic[(df_titanic['Sex'] == 'female') & (df_titanic['Survived'] == 1)]
female_survivor_count = len(female_survivors)

print(f"✅ Average age of survivors: {avg_age_survivors:.1f} years")
print(f"✅ Average fare for first class: ${avg_fare_first:.2f}")
print(f"✅ Female survivors: {female_survivor_count}")
print()
print("="*50)
print("🎉 Congratulations! You've completed all exercises!")
print("="*50)

---

## Wrap-Up

**Today you learned how to:**
- Load CSV data into a Pandas DataFrame
- Explore data with `head()`, `info()`, `describe()`, `isnull()`
- Select columns (single and multiple)
- Select rows with `iloc`
- Filter data with conditions
- Perform calculations with `mean()`, `sum()`, `max()`

**These are the core skills you'll use in almost every data science project!**

---

*Great work today!*