# Titanic Survival Patterns

**What this notebook shows**
- End-to-end exploratory analysis (loading, cleaning, EDA)
- Clear visual storytelling and interpretation

**Data**
- Passenger records are read from `titanic.csv`, which is expected in the notebook's working directory.

In [None]:
# import all the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

##### Now load the data file _titanic.csv_ into a pandas DataFrame

In [None]:
df = pd.read_csv("titanic.csv")

**Q1a (2.5 points).** Show the first 10 lines from the data file

In [None]:
df.head(10)

**Q1b (2.5 points).** Show the last 10 lines from the data file

In [None]:
df.tail(10)

**Q2 (5 points).** Output the column names and their types

In [None]:
df.dtypes

**Q3 (5 points).** Give the details of the passenger with ID 523

In [None]:
df[df["PassengerId"] == 523]

**Q4a (2.5 points).** How many female survivors were there?

In [None]:
df[(df["Sex"] == "female") & (df["Survived"] == 1)].shape[0]

**Q4b (2.5 points).** How many male survivors were there?

In [None]:
df[(df["Sex"] == "male") & (df["Survived"] == 1)].shape[0]
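
The two survivor counts can also be obtained in a single pass with `groupby`. The sketch below uses a handful of made-up rows (not taken from `titanic.csv`) and assumes `Survived` holds only 0 and 1, so that summing it counts survivors.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from titanic.csv.
toy = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "female"],
    "Survived": [1, 0, 1, 1, 0],
})

# Assumption: Survived is coded 0/1, so the sum within each group
# equals the number of survivors in that group.
survivors_by_sex = toy.groupby("Sex")["Survived"].sum()
print(survivors_by_sex)
```

On the real data the same call would be `df.groupby("Sex")["Survived"].sum()`.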

**Q5 (5 points).** What was the mean age of women that survived?

In [None]:
women_survivors = df[(df["Sex"] == "female") & (df["Survived"] == 1)]
women_survivors["Age"].mean()
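
Note that `Series.mean()` skips missing values by default (`skipna=True`), so passengers without a recorded age do not enter the average. A small sketch with made-up ages shows the effect:

```python
import numpy as np
import pandas as pd

# Made-up ages for illustration only; one value is missing.
toy_ages = pd.Series([20.0, np.nan, 40.0])

# The NaN is ignored: the mean is taken over the two known ages.
mean_known = toy_ages.mean()

# With skipna=False a single missing value makes the result NaN.
mean_strict = toy_ages.mean(skipna=False)
print(mean_known, mean_strict)
```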

**Q6 (5 points).** What was the average fare in the first class?

In [None]:
df.loc[df["Pclass"] == 1, "Fare"].mean()

**Q7 (5 points).** Give a histogram of the distribution of ages of all passengers on the Titanic  

Bin the ages into the following bins and get the frequency distribution:  
- 0 to 9
- 10 to 19
- 20 to 29
- 30 to 39
- 40 to 49
- 50 to 59
- 60 to 69
- 70 and above  

After you obtain the frequency distribution, plot the histogram. Make sure you label the axes appropriately.


In [None]:
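# Left-closed decade bins: [0, 10), [10, 20), ..., [70, inf)
age_bins = [0, 10, 20, 30, 40, 50, 60, 70, np.inf]
age_labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]

# Frequency distribution (missing ages are excluded)
ages = df["Age"].dropna()
age_counts = pd.cut(ages, bins=age_bins, right=False, labels=age_labels).value_counts(sort=False)
print(age_counts)

# Plot the binned counts as bars; this avoids drawing an infinitely wide last bin
age_counts.plot(kind="bar", edgecolor="black", width=1.0)
plt.xlabel("Age (years)")
plt.ylabel("Number of Passengers")
plt.title("Distribution of Ages on the Titanic")
plt.xticks(rotation=0)
plt.show()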
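
How `pd.cut` treats bin edges decides which bin a boundary age such as 10 or 70 lands in. The sketch below, using made-up ages chosen to sit on or near the edges, shows one way to get left-closed decade bins with `right=False`; the labels are illustrative names, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Made-up ages on or near bin edges; not taken from titanic.csv.
ages = pd.Series([0.0, 9.5, 10.0, 69.0, 70.0, 80.0])

edges = [0, 10, 20, 30, 40, 50, 60, 70, np.inf]
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]

# right=False makes each bin closed on the left: [0, 10), [10, 20), ...
binned = pd.cut(ages, bins=edges, right=False, labels=labels)

# sort=False keeps the bins in age order instead of ordering by count.
counts = binned.value_counts(sort=False)
print(counts)
```

With the default `right=True`, the bins would be `(0, 10]`, `(10, 20]`, and so on, which places an age of exactly 10 in the first bin and drops an age of exactly 0.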

**Q8 (5 points).** How many passengers do not have an age listed?

In [None]:
df["Age"].isna().sum()
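
The same idea extends to every column at once: calling `isna().sum()` on the whole DataFrame returns one missing-value count per column. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from titanic.csv.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# isna() marks each missing cell as True; sum() counts the Trues per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```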

**Q9 (5 points).** Give a histogram of the distribution of fares of all passengers on the Titanic  

Bin the fares into the following bins and get the frequency distribution:
- 1 to 10
- 11 to 20
- 21 to 30
- 31 to 40
- 41 to 50
- 51 to 60
- 61 to 70
- 71 to 80
- 81 to 90
- 91 to 100
- 101 and above  

Then plot the frequency in each of the bins as a histogram. Label the axes.

In [None]:
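# Right-closed bins: [1, 10], (10, 20], ..., (100, max fare]
fare_bins = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, df["Fare"].max()]
fare_labels = ["1-10", "11-20", "21-30", "31-40", "41-50", "51-60",
               "61-70", "71-80", "81-90", "91-100", "101+"]

# Frequency distribution; include_lowest=True keeps a fare of exactly 1,
# while fares below 1 fall outside the requested bins and are not counted
fare_counts = pd.cut(df["Fare"], bins=fare_bins, labels=fare_labels,
                     include_lowest=True).value_counts(sort=False)
print(fare_counts)

# Plot the same counts so the chart matches the printed table
fare_counts.plot(kind="bar", edgecolor="black", width=1.0)
plt.xlabel("Fare")
plt.ylabel("Number of Passengers")
plt.title("Distribution of Fares on the Titanic")
plt.xticks(rotation=45)
plt.show()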

**Q10 (5 points).** There were three passenger classes: 1, 2, and 3. Draw a histogram showing the survival numbers (by gender) in the three passenger classes.

You will compute the total number of survivors in each of the six categories:  
- Passenger class 1 - Male
- Passenger class 1 - Female
- Passenger class 2 - Male
- Passenger class 2 - Female
- Passenger class 3 - Male
- Passenger class 3 - Female  

Now draw a histogram with these six numbers. Label the axes appropriately. Color-code the male and female survivors differently.

In [None]:
# Count survivors in each (class, sex) group once and reuse the result
survival_counts = df[df["Survived"] == 1].groupby(["Pclass", "Sex"]).size()
print(survival_counts)

# unstack() turns Sex into columns, so each class gets one bar per sex,
# and pandas draws each column in its own color
survival_counts.unstack().plot(kind="bar", stacked=False)
plt.xlabel("Passenger Class")
plt.ylabel("Number of Survivors")
plt.title("Survivors by Class and Gender")
plt.xticks(rotation=0)
plt.legend(title="Gender")
plt.show()
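
The grouped bar chart depends on `unstack()` reshaping the per-group counts from a long Series into a class-by-sex table. The sketch below illustrates that step on made-up survivor rows (not taken from `titanic.csv`); `fill_value=0` is an optional addition that guards against a class/sex pair with no survivors.

```python
import pandas as pd

# Made-up survivor rows for illustration only; not taken from titanic.csv.
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "male", "female", "female", "male", "male"],
})

# size() returns one count per (Pclass, Sex) pair, indexed by a MultiIndex.
long_counts = toy.groupby(["Pclass", "Sex"]).size()

# unstack() moves the inner level (Sex) into columns; fill_value=0 covers
# pairs with no rows, such as class 2 males in this toy data.
wide_counts = long_counts.unstack(fill_value=0)
print(wide_counts)
```

Each column of the wide table becomes one colored bar series when passed to `.plot(kind="bar")`.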