# Visualizations — Bar, Line, Pivot, and Box Plots
In this lab you will learn to create simple data visualizations using the Titanic dataset. You'll work with bar plots, line plots, pivot tables, and box plots using `matplotlib` and `pandas`. Exercises are designed for beginners with minimal programming experience and focus on practical, interpretable visualizations.

## Dataset
We use the Titanic dataset, loaded from a local CSV if one is present and otherwise from seaborn's built-in copy:

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
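# Load dataset: prefer a local CSV, fall back to seaborn's built-in copy
# (the rest of the lab assumes seaborn's lowercase column names: pclass, survived, sex, fare, ...)
try:
    df = pd.read_csv('data/titanic.csv')
except FileNotFoundError:
    df = sns.load_dataset('titanic')
df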

## Learning objectives
- Create bar plots to compare categorical data
- Create line plots to show trends over continuous variables
- Build and visualize pivot tables
- Create box plots to display distributions by group
- Customize plots with titles, labels, and colors
- Interpret visual patterns to answer data questions

## High-level agenda
1. Setup and imports (5 min)
2. Bar plots — survival by class (10 min)
3. Line plots — fare over passenger order (10 min)
4. Pivot tables — cross-tabulation of class and sex (15 min)
5. Box plots — fare distribution by class (10 min)
6. Combining visualizations to answer questions (10 min)

## Exercise 1 — Bar plots: Counting survivors by class
Goal: Visualize the count of survivors (and non-survivors) grouped by passenger class.

A bar plot is useful for comparing values across categories. In this case, we want to see how many passengers survived in each class.

First, we need to organize the data into rows and columns. A pivot table is a tool that reorganizes data this way. Think of it like a spreadsheet where:
- Rows represent one category (e.g., class: 1, 2, 3)
- Columns represent another category (e.g., survived: 0 or 1)
- Cells show a summary (e.g., count, average, etc.)

We'll use `df.pivot_table()` to create a simple pivot table, then plot it as a bar chart.
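
To make the rows/columns/cells idea concrete, here is a minimal sketch on a tiny table that can be checked by hand. The `toy` DataFrame and its six passengers are invented for illustration and are not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical toy data: six made-up passengers, not real Titanic records
toy = pd.DataFrame({
    "pclass":   [1, 1, 2, 3, 3, 3],
    "survived": [1, 0, 1, 0, 0, 1],
    "fare":     [80.0, 60.0, 20.0, 8.0, 7.5, 9.0],
})

# Rows = class, columns = survival status, cells = number of passengers
toy_counts = toy.pivot_table(
    index="pclass",
    columns="survived",
    aggfunc="size",   # count the rows in each (class, survived) group
    fill_value=0,     # show 0 instead of NaN for combinations with no rows
)
print(toy_counts)
```

In this toy table nobody in class 2 died, so that cell would be empty; `fill_value=0` turns it into a 0.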

In [None]:
# Step 1: Create a pivot table counting passengers by class and survival status
survival_by_class = df.pivot_table(
    values=None,  # no column to aggregate: 'size' just counts rows
    index='pclass',
    columns='survived',
    aggfunc='size',
    fill_value=0
)
print(survival_by_class)

In [None]:
# Step 2: Create bar plot from the pivot table
# (.plot() opens its own figure, so pass figsize here instead of calling plt.figure())
survival_by_class.plot(kind='bar', figsize=(8, 5))
plt.title('Survival Counts by Class')
plt.xlabel('Class')
plt.ylabel('Count')
plt.legend(['Did not survive', 'Survived'])
plt.show()

- `df.pivot_table()` reorganizes data into rows and columns
- `values=None` means no particular column is summarized; counting rows does not need one
- `index='pclass'` puts class (1, 2, 3) on rows
- `columns='survived'` puts survival status (0, 1) on columns
- `aggfunc='size'` counts how many rows fall in each cell (unlike `'count'`, it also includes rows with missing values)
- `.plot(kind='bar')` creates a bar plot from the reorganized data
- `plt.title()`, `plt.xlabel()`, `plt.ylabel()` add labels
- `plt.legend()` replaces the raw column labels (0, 1) with readable names, given in column order
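
Passing a list to `plt.legend()` relies on the labels being in the same order as the columns. As an alternative sketch, the columns can be renamed before plotting so the legend text comes from the data itself. The numbers in `toy_counts` below are made up and only mimic the shape of `survival_by_class`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts in the same shape as survival_by_class (made-up numbers)
toy_counts = pd.DataFrame(
    {0: [10, 20, 40], 1: [30, 15, 12]},
    index=pd.Index([1, 2, 3], name="pclass"),
)

# Rename the 0/1 columns so the legend is built from meaningful names
labeled = toy_counts.rename(columns={0: "Did not survive", 1: "Survived"})

ax = labeled.plot(kind="bar", figsize=(8, 5))
ax.set_title("Survival Counts by Class (toy data)")
ax.set_xlabel("Class")
ax.set_ylabel("Count")
plt.show()
```

With this approach, reordering or dropping a column cannot silently mislabel the bars.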

### Your turn
Modify the code to create a pivot table and bar plot of passenger survival counts by sex (instead of by class). Hint: change `index='pclass'` to `index='sex'` and keep `columns='survived'`. Remember to update the title and x-axis label.

In [None]:
# your code

## Exercise 2 — Line plots: Fare vs passenger order
Goal: Visualize how fare values vary across the dataset using a line plot.

A line plot shows trends or patterns over a continuous or ordered variable. Here, we'll plot the fare for each passenger in the order the rows appear in the dataset. Row order is not a time axis, so read this plot for the overall range of fares and for occasional spikes rather than for a trend.

In [None]:
# Create a simple line plot
plt.figure(figsize=(12, 5))
plt.plot(df['fare'].values, linewidth=1)
plt.title('Fare Amount Across Passengers')
plt.xlabel('Passenger Index')
plt.ylabel('Fare (£)')  # Titanic fares were priced in British pounds
plt.grid(True, alpha=0.3)
plt.show()
# Alternative: plot only passengers who survived
survived = df[df['survived'] == 1]
plt.figure(figsize=(12, 5))
plt.plot(survived['fare'].values, label='Survived', linewidth=2, color='green')
plt.title('Fare for Passengers Who Survived')
plt.xlabel('Passenger Index (survivors only)')
plt.ylabel('Fare (£)')  # Titanic fares were priced in British pounds
plt.legend()
plt.grid(True, alpha=0.7)
plt.show()

- `df['fare'].values` extracts the fare column as a NumPy array (just the numbers, without the pandas index)
- `plt.plot()` creates a line plot
- `plt.grid(True, alpha=0.3)` adds a grid for easier reading; `alpha=0.3` draws it at 30% opacity, so the lines stay faint
- `linewidth` controls the thickness of the line
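
Why use `.values` at all? A filtered pandas Series keeps its original row labels, and matplotlib uses those labels as x positions. The sketch below, on five invented passengers, plots the same fares both ways so the difference is visible. The `toy` data and variable names are illustrative only.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data: five made-up passengers
toy = pd.DataFrame({
    "survived": [0, 1, 0, 1, 1],
    "fare":     [7.25, 71.28, 8.05, 53.10, 11.13],
})
survivor_fares = toy.loc[toy["survived"] == 1, "fare"]

fig, ax = plt.subplots(figsize=(8, 4))
# Passing the Series: x positions are the original row labels (1, 3, 4)
(line_series,) = ax.plot(survivor_fares, marker="o", label="Series (x = row label)")
# Passing .values: x positions restart at 0, 1, 2
(line_values,) = ax.plot(survivor_fares.values, marker="s", label=".values (x = 0, 1, 2)")
ax.set_xlabel("x position")
ax.set_ylabel("Fare")
ax.legend()
plt.show()
```

Using `.values` is what makes the x-axis of the survivors plot run over consecutive positions, matching its "survivors only" label.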

### Your turn
- Create a line plot for passengers who did NOT survive. Compare it visually to the survivors plot.
- Bonus: Create two line plots on the same figure for survived (one color) and did not survive (another color).

In [None]:
# your code


## Exercise 3 — Pivot tables: Cross-tabulation of class and sex
Goal: Create a pivot table to see the distribution of passengers by class and sex, and then visualize it.

A pivot table reorganizes data to show counts or sums by two categories. Here, we'll see how many passengers of each sex were in each class. We can then create a heatmap or bar plot to visualize the pivot table.

In [None]:
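# Create pivot table: count passengers by class and sex
# ('count' tallies the non-missing 'survived' values in each cell, i.e. one per passenger;
#  'sum' would instead add up the 1s and give the number of survivors)
pivot = df.pivot_table(values='survived', index='pclass', columns='sex', aggfunc='count', fill_value=0)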
print(pivot)

In [None]:
# Visualize as heatmap (requires seaborn)
import seaborn as sns
plt.figure(figsize=(6, 4))
sns.heatmap(pivot, annot=True, fmt='d', cmap='Blues', cbar_kws={'label':'Count'})
plt.title('Passenger Count: Class vs Sex')
plt.ylabel('Class')
plt.xlabel('Sex')
plt.show()

- `df.pivot_table()` creates a pivot table
- `values='survived'` tells pivot_table which column to aggregate
- `index='pclass'` puts class on rows
- `columns='sex'` puts sex on columns
- `aggfunc='count'` counts the non-missing `survived` values in each cell, which gives the number of passengers
- `sns.heatmap()` visualizes the pivot table as a colored grid
  - `pivot` - data to display, dataframe to plot
  - `annot=True` - show the values inside each cell
  - `fmt='d'` - Format of the displayed values ('d' = integer, '.2f' = 2 decimal places, '.0%' = percentage)
  - `cmap='Blues'` - Color palette ('Blues', 'Reds', 'RdYlGn', 'viridis', etc.)
  - `cbar_kws={'label':'Count'}` - Customize the color bar legend; 'label' adds a title to it
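
The choice of `aggfunc` decides what each cell means. As a small sketch on invented data, the two calls below differ only in `aggfunc`: `'count'` gives the number of passengers, while `'sum'` adds up the 0/1 values and gives the number of survivors. The `toy` DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: six made-up passengers
toy = pd.DataFrame({
    "pclass":   [1, 1, 1, 3, 3, 3],
    "sex":      ["female", "female", "male", "female", "male", "male"],
    "survived": [1, 1, 0, 0, 0, 1],
})

# How many passengers are in each (class, sex) cell?
passengers = toy.pivot_table(
    values="survived", index="pclass", columns="sex",
    aggfunc="count", fill_value=0,
)

# How many of them survived? (sum of the 0/1 values)
survivors = toy.pivot_table(
    values="survived", index="pclass", columns="sex",
    aggfunc="sum", fill_value=0,
)

print(passengers)
print(survivors)
```

When a plot is titled "Passenger Count", the table behind it should come from `'count'` (or `'size'`), not `'sum'`.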

### Your turn
- Are there more males or females? In which class?
- Create a pivot table with `aggfunc='mean'` instead of `'count'`. Because `survived` is coded 0 or 1, the mean of each cell is the survival rate for that class and sex.

In [None]:
# your code

### Your turn
- Create a bar plot (instead of heatmap) from the pivot table using `.plot(kind='bar')`.

In [None]:
# your code

## Exercise 4 — Box plots: Fare distribution by class
Goal: Create a box plot to compare fare distributions across passenger classes.

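A box plot shows the distribution of a numeric variable across groups. The box spans the first to the third quartile (the middle 50% of the data), and the line inside marks the median. Whiskers extend from the box toward the smallest and largest values, and points far beyond the box are drawn individually as potential outliers. Box plots are useful for spotting outliers and comparing distributions between groups.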

![alt text](image.png)
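
The numbers behind a box plot can be computed directly. The sketch below does so for seven invented fares, assuming matplotlib's default rule that points more than 1.5 times the interquartile range away from the box are drawn as outliers (other tools may use a different rule).

```python
import pandas as pd

# Hypothetical toy fares (made-up numbers), including one very expensive ticket
fares = pd.Series([5, 7, 8, 10, 12, 15, 80])

q1 = fares.quantile(0.25)   # bottom edge of the box
median = fares.median()     # line inside the box
q3 = fares.quantile(0.75)   # top edge of the box
iqr = q3 - q1               # height of the box (interquartile range)

# Default outlier rule: anything beyond 1.5 * IQR from the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = fares[(fares < lower_fence) | (fares > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```

For these toy values the box runs from 7.5 to 13.5 with the median at 10, and the fare of 80 lies above the upper fence of 22.5, so it would appear as a separate point.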

In [None]:
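# Simple box plot
# (df.boxplot() creates its own figure, so set figsize there instead of calling plt.figure())
df.boxplot(column='fare', by='pclass', figsize=(8, 5))
plt.title('Fare Distribution by Class')
plt.suptitle('')  # remove the automatic "Boxplot grouped by pclass" title
plt.xlabel('Class')
plt.ylabel('Fare (£)')  # Titanic fares were priced in British pounds
plt.show()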
# Alternative: using seaborn (often looks nicer)
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='pclass', y='fare')
plt.title('Fare Distribution by Class')
plt.xlabel('Class')
plt.ylabel('Fare (£)')  # Titanic fares were priced in British pounds
plt.show()

- `df.boxplot(column='fare', by='pclass')` creates box plots of fare grouped by class
- `sns.boxplot()` is an alternative using seaborn (more customizable)
- `x='pclass'` puts class on the x-axis
- `y='fare'` puts fare on the y-axis

### Your turn
- Which class has the highest median fare?
- Create a box plot of age by sex (instead of fare by class). What do you observe?

In [None]:
# your code

### Your turn
- Create a box plot of fare by sex. Are fares similar for male and female passengers?

In [None]:
# your code


---

## Reflection

Congratulations on completing the visualization lab! Here are some questions to reflect on:

1. What was the most surprising finding from the Titanic data?
2. Which visualization type (bar, line, heatmap, box plot) was most useful for understanding the data?
3. How would you explain survival patterns to someone who hasn't seen the data?

Feel free to explore the data further and create your own visualizations!