# Exploratory Data Analysis on the Titanic Data Set  
By: Ilyas Ustun  
Tools: **pandas • matplotlib • seaborn**  
Target: 10 Pandas questions + 10 Visualization questions  
Difficulty: Beginner → Intermediate

In [None]:
# 0. Imports & Data Load
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="whitegrid")
df = sns.load_dataset("titanic")
df.head()

---
## PART 1 – Pandas (10 questions)

### Question 1  
How many passengers are in the data set?

In [None]:
num_passengers = ...
num_passengers

### Question 2  
What percentage of passengers survived?  
(Hint: use `value_counts(normalize=True)`.)
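
As a rough illustration of the hint, the sketch below runs `value_counts(normalize=True)` on a made-up 0/1 Series. The data and the names `flags`, `shares`, and `pct_yes` are hypothetical and are not part of the Titanic data set.

```python
import pandas as pd

# Hypothetical toy data: 1 = yes, 0 = no (not the Titanic data).
flags = pd.Series([1, 0, 0, 1, 0], name="flag")

# normalize=True returns proportions instead of raw counts.
shares = flags.value_counts(normalize=True)

# Look up the proportion for label 1 and express it as a percentage.
pct_yes = shares.loc[1] * 100
print(round(pct_yes, 1))
```

The same pattern should carry over to any binary column, with the label passed to `.loc` chosen to match the category of interest.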

In [None]:
survival_rate = ...
survival_rate

### Question 3  
Compute the average age of passengers grouped by sex.

In [None]:
avg_age_by_sex = ...
avg_age_by_sex

### Question 4  
Create a frequency table (cross-tab) of survival by passenger class (`pclass`).
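
For readers who have not used `pd.crosstab` before, here is a minimal sketch on a made-up frame. The columns `group` and `outcome` are hypothetical stand-ins, not columns of the Titanic data.

```python
import pandas as pd

# Hypothetical toy data with one grouping column and one 0/1 outcome.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "outcome": [1, 0, 0, 0, 1],
})

# Rows come from the first argument, columns from the second;
# each cell counts how often that combination occurs.
table = pd.crosstab(toy["group"], toy["outcome"])
print(table)
```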

In [None]:
surv_by_class = ...
surv_by_class

### Question 5  
Which embarkation port (`embark_town`) had the highest mortality rate?
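
One possible way to think about a "rate per group" is as the mean of a 0/1 indicator within each group. The sketch below shows this on made-up data; the columns `site` and `passed` and the derived `failed` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 1 = passed, 0 = failed.
toy = pd.DataFrame({
    "site": ["north", "north", "south", "south", "south"],
    "passed": [1, 0, 1, 1, 0],
})

# The mean of a 0/1 indicator within a group is the rate for that group.
fail_rate = (
    toy.assign(failed=1 - toy["passed"])
       .groupby("site")["failed"]
       .mean()
       .sort_values(ascending=False)
)

# idxmax returns the group label with the highest rate.
worst_site = fail_rate.idxmax()
print(fail_rate)
print(worst_site)
```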

In [None]:
mortality_by_port = (
    ...
)
mortality_by_port

### Question 6  
Fill missing ages with the median age of passengers in the same passenger class (`pclass`).  
(Hint: define a helper and use `apply` or `transform`.)
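
The `transform` route mentioned in the hint can be sketched as follows on a made-up frame. The columns `group` and `value` are hypothetical; this is one way to do a group-wise fill, not necessarily the intended solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per group.
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [10.0, np.nan, 30.0, 1.0, 3.0, np.nan],
})

# transform("median") returns a Series aligned to the original rows,
# holding each row's group median (NaN values are skipped).
group_median = toy.groupby("group")["value"].transform("median")

# fillna with an aligned Series fills each gap from the matching row.
toy["value_filled"] = toy["value"].fillna(group_median)
print(toy)
```

Because `transform` keeps the original index, no merge or manual loop is needed to line the medians up with the rows.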

In [None]:
df['age_filled'] = ...
df['age_filled'].isna().sum()      # should be 0

### Question 7  
Create a new column `family_size` = `sibsp` + `parch` + 1 (the passenger).  
Then show the five largest families aboard.

In [None]:
df['family_size'] = ...
largest_families = ...
largest_families

### Question 8  
Compute the fare paid per person in each family (i.e., `fare / family_size`) and add it as a new column `fare_per_person`. This is an element-wise column division, so no `groupby` is needed.

In [None]:
df['fare_per_person'] = ...
df[['fare', 'family_size', 'fare_per_person']].head()

### Question 9  
Create a pivot table that shows the average survival rate for each combination of `sex` and `pclass`.
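
As a small illustration of `pivot_table` with a mean aggregation, consider the made-up frame below. The columns `kind`, `level`, and `flag` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two categorical keys and a 0/1 flag.
toy = pd.DataFrame({
    "kind": ["x", "x", "y", "y", "x", "y"],
    "level": [1, 2, 1, 2, 1, 1],
    "flag": [1, 0, 0, 1, 0, 1],
})

# The mean of a 0/1 flag in each (kind, level) cell is the rate for that cell.
rates = toy.pivot_table(values="flag", index="kind", columns="level", aggfunc="mean")
print(rates)
```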

In [None]:
pivot_surv = ...
pivot_surv

### Question 10  
Sort the data by `fare_per_person`, descending, and display the top 10 rows.

In [None]:
top_spenders = ...
top_spenders

---
## PART 2 – Visualization (10 questions)

### Visualization 1  
Make a bar chart of the absolute number of survivors vs. non-survivors.

In [None]:
plt.figure(figsize=(4,3))

plt.show()

### Visualization 2  
Draw a histogram (or KDE) of passenger ages, overlaying survived vs. not survived.

In [None]:
plt.figure(figsize=(6,4))

plt.show()

### Visualization 3  
Create a boxplot comparing fares across passenger classes.

In [None]:
plt.figure(figsize=(5,4))

plt.show()

### Visualization 4  
Build a violin plot of age by sex and survival status (use `hue`).

In [None]:
plt.figure(figsize=(6,4))

plt.show()

### Visualization 5  
Scatter plot: Age vs. Fare, colored by survival and sized by family size.

In [None]:
plt.figure(figsize=(7,5))

plt.show()

### Visualization 6  
Create a stacked percentage bar chart of survival rate within each passenger class.

In [None]:
crosstab = pd.crosstab(df['pclass'], df['survived'], normalize='index')

plt.show()

### Visualization 7  
Heatmap of the correlation matrix for numeric variables.
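
Note that recent pandas versions raise an error when `corr()` meets non-numeric columns, so it helps to select the numeric columns first. A minimal sketch on made-up data (the columns `height`, `weight`, and `label` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with two numeric columns and one text column.
toy = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "label": ["p", "q", "r", "s"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()
print(corr)
```

The resulting square DataFrame is the kind of input that `sns.heatmap` accepts, for example with `annot=True` to print the coefficients in each cell.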

In [None]:
plt.figure(figsize=(6,4))

plt.show()

### Visualization 8  
FacetGrid: Plot survival status across embarkation towns with separate panels for sex.

In [None]:
g = sns.catplot(...)
# `figure` is the preferred accessor; `fig` is deprecated in recent seaborn releases
g.figure.suptitle('Embarkation Port vs. Survival by Sex', y=1.02)
plt.show()

### Visualization 9  
Plot the cumulative percentage of total fare revenue contributed by passengers, ordered from highest to lowest `fare` (Pareto curve).
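
The quantity on the y-axis of a Pareto curve is a running total divided by the grand total. A minimal sketch on made-up numbers (the Series `amounts` is hypothetical):

```python
import pandas as pd

# Hypothetical toy amounts (not Titanic fares).
amounts = pd.Series([10.0, 40.0, 30.0, 20.0])

# Order from largest to smallest, then renumber the rows 0..n-1.
ordered = amounts.sort_values(ascending=False).reset_index(drop=True)

# Running total as a percentage of the grand total.
cum_pct = ordered.cumsum() / ordered.sum() * 100
print(cum_pct)
```

Plotting `cum_pct` against its index (or against the percentage of rows) should then give the Pareto curve; the last value is 100 by construction.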

In [None]:
df_sorted = df.sort_values('fare', ascending=False).reset_index(drop=True)

plt.show()

### Visualization 10  
Build a mosaic (or alternative) plot that shows the relationship between `class`, `sex`, and `survived`.

In [None]:
g = sns.catplot(...)
# `figure` is the preferred accessor; `fig` is deprecated in recent seaborn releases
g.figure.suptitle('Survival by Class & Sex', y=1.02)
plt.show()

---
### 🎯 What to submit
1. The filled-in notebook (`.ipynb`).  
2. Brief write-ups (1–3 sentences each) summarizing insights from at least five of the visualizations.

Happy analyzing!