# Titanic Data Analysis

## This notebook analyses passenger data from the RMS Titanic, which sank in the North Atlantic Ocean on 15 April 1912 during its maiden voyage.

### Import necessary packages and the dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

titanic = pd.read_csv('../Titanic.csv')

---

## 1) Average price of a ticket categorised by the gender and age of the passengers.

### Create a function to round off numbers to the nearest 10.

In [None]:
def round_10(num):
	if num % 10 < 5:
		return int(num // 10) * 10
	else:
		return int(num // 10 + 1) * 10
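
As a quick sanity check, the sketch below calls `round_10` on a few sample values. The inputs are made up for illustration and are not taken from the dataset. Note that a value ending in exactly 5 is rounded up.

```python
def round_10(num):
    # Same logic as the function defined above
    if num % 10 < 5:
        return int(num // 10) * 10
    else:
        return int(num // 10 + 1) * 10

# Illustrative inputs only; these are not values from the dataset
samples = [4, 5, 14, 25.0, 38.0]
rounded = [round_10(value) for value in samples]
print(rounded)  # [0, 10, 10, 30, 40]
```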

### Create the **age_fare** DataFrame by taking the columns *[Sex, Age, Fare]* from the **titanic** DataFrame and dropping the rows with missing values.

In [None]:
age_fare = titanic[
	['Sex', 'Age', 'Fare']
].dropna()

### Round off the age values to the nearest 10 (using the function defined previously) and update the *[Age]* column of the **age_fare** DataFrame.

In [None]:
age_fare['Age'] = age_fare['Age'].round(0).apply(round_10)
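
To illustrate the two-step rounding, the sketch below applies the same expression to a small Series of ages. The ages are assumptions made up for this example. Each age is first rounded to a whole number of years and then binned to the nearest 10.

```python
import pandas as pd

def round_10(num):
    # Same logic as the function defined earlier in the notebook
    if num % 10 < 5:
        return int(num // 10) * 10
    else:
        return int(num // 10 + 1) * 10

# Made-up ages for illustration; not taken from the dataset
ages = pd.Series([22.0, 38.0, 4.0, 0.42, 35.0])

# Round to whole years first, then to the nearest 10
binned = ages.round(0).apply(round_10)
print(binned.tolist())  # [20, 40, 0, 0, 40]
```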

### Group the **age_fare** DataFrame by *[Age, Sex]*, calculate the average fare to 2 decimal places and update the *[Fare]* column.

In [None]:
age_fare = age_fare.groupby(['Age', 'Sex']).mean().round(2)
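
The grouping step can be tried on a tiny hand-made table. This is only a sketch under the assumption that the data has the same three columns; the rows are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the Titanic dataset
sample = pd.DataFrame({
    'Sex': ['male', 'male', 'female', 'female'],
    'Age': [20, 20, 20, 30],
    'Fare': [7.25, 8.05, 71.28, 53.10],
})

# One row per (Age, Sex) pair, holding the mean fare of that group
mean_fare = sample.groupby(['Age', 'Sex']).mean().round(2)
print(mean_fare)
```

The result has a two-level index (*Age*, then *Sex*) and a single *[Fare]* column, which is the shape the later renaming and plotting steps rely on.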

### Rename the index levels and their labels to more appropriate names.

In [None]:
age_fare.index.names = ['Age', 'Gender']

age_fare.rename(
	index={
		'male': 'Male',
		'female': 'Female'
	},
	inplace=True
)

### Plot the **age_fare** DataFrame as a bar graph.

In [None]:
figure = age_fare.unstack().plot(
	title='Average fare by gender and age',
	ylabel='Fare (in USD)',
	y='Fare',
	kind='bar',
	rot=0
)
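
The `unstack()` call is what lets the bars be grouped by age: it moves the inner index level (the gender) into the columns, so each gender becomes its own series of bars. A minimal sketch on made-up values, with no plotting involved:

```python
import pandas as pd

# Made-up average fares for illustration; not results from the dataset
index = pd.MultiIndex.from_product(
    [[20, 30], ['Female', 'Male']],
    names=['Age', 'Gender']
)
sample = pd.DataFrame({'Fare': [30.0, 20.0, 45.0, 25.0]}, index=index)

# The inner level (Gender) moves from the index into the columns
wide = sample.unstack()
print(wide)
```

After unstacking, the rows are indexed by *Age* alone and the columns are the pairs `('Fare', 'Female')` and `('Fare', 'Male')`, which is why `y='Fare'` selects both genders for the plot.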

### Analysis

Women between the ages of **20** and **60** (rounded off to the nearest 10) paid more for their tickets on average than men of the same age.

![Age-Fare Graph](../graphs/age_fare.png)

---

## 2) Average price of a ticket categorised by the gender of the passengers and their passenger class.

### Create the **fare_gender** DataFrame by taking the *[Fare, Sex, Pclass]* columns from the **titanic** DataFrame and grouping them by *[Sex, Pclass]*.

### Also calculate the average of the *[Fare]* column for each group, rounded to 2 decimal places.

In [None]:
fare_gender = titanic[
	['Fare', 'Sex', 'Pclass']
].dropna().groupby(['Sex', 'Pclass']).mean().round(2)

### Rename the index levels and their labels to more appropriate names.

In [None]:
fare_gender.index.names = ['Gender', 'Passenger Class']

fare_gender.rename(
	index={
		'female': 'Female',
		'male': 'Male',
		1: 'First',
		2: 'Second',
		3: 'Third'
	},
	inplace=True
)
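
A single dictionary is enough for this rename because, on a MultiIndex, the `index` mapping is applied to the labels of every level, and labels without an entry are left unchanged. A small sketch with made-up values:

```python
import pandas as pd

# Made-up average fares for illustration; not results from the dataset
index = pd.MultiIndex.from_product(
    [['female', 'male'], [1, 2]],
    names=['Sex', 'Pclass']
)
sample = pd.DataFrame({'Fare': [100.0, 20.0, 60.0, 19.0]}, index=index)

# Rename the levels, then map the labels of both levels in one call
sample.index.names = ['Gender', 'Passenger Class']
renamed = sample.rename(
    index={'female': 'Female', 'male': 'Male', 1: 'First', 2: 'Second'}
)
print(renamed.index.tolist())
```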

### Plot the **fare_gender** DataFrame as a bar graph.

In [None]:
figure = fare_gender.unstack().plot(
	title='Average fare by gender and passenger class',
	ylabel='Fare (in USD)',
	y='Fare',
	kind='bar',
	rot=0
)

### Add labels to each bar in the graph.

In [None]:
for container in figure.containers:
	figure.bar_label(container)
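
The loop works because the plot call returns a matplotlib `Axes`, whose `containers` attribute holds one group of bars per plotted column; `bar_label` then annotates each bar with its height. The sketch below shows this on made-up numbers and assumes a matplotlib version that provides `Axes.bar_label` (3.4 or later).

```python
import pandas as pd

# Made-up values for illustration; not results from the dataset
sample = pd.DataFrame(
    {'Female': [100.0, 20.0], 'Male': [60.0, 19.0]},
    index=['First', 'Second']
)

ax = sample.plot(kind='bar', rot=0)

# One container per column; bar_label returns the Text objects it creates
labels = []
for container in ax.containers:
    labels.extend(text.get_text() for text in ax.bar_label(container))

print(labels)  # ['100', '20', '60', '19']
```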

### Analysis

- In **First** Class, the average price of a ticket for women was **$38.90** more than for men.

- In **Second** and **Third** Class, the average price of a ticket was around the same for both men and women.

![Fare-Gender Graph](../graphs/fare_gender.png)

---

## 3) Number of people boarding from each city categorised by the gender of the passengers.

### Take the columns *[Embarked, Sex]* from the **titanic** DataFrame, group them by *[Embarked, Sex]* and compute the size of each group.

### Store the resultant Series in a dictionary with the key **Count** and convert this dictionary to a DataFrame called **gender_embark**.

In [None]:
gender_embark = pd.DataFrame(
	{
		'Count': titanic[
			['Embarked', 'Sex']
		].dropna().groupby(
			['Embarked', 'Sex']
		).size()
	}
)
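
The counting step can be checked on a few invented rows. The sketch also shows `Series.to_frame` as one possible alternative to wrapping the Series in a dictionary; this is a suggestion, not the approach used in the notebook.

```python
import pandas as pd

# Made-up rows for illustration; one row has a missing port
sample = pd.DataFrame({
    'Embarked': ['S', 'S', 'C', 'S', None],
    'Sex': ['male', 'female', 'female', 'male', 'male'],
})

grouped_size = sample[
    ['Embarked', 'Sex']
].dropna().groupby(['Embarked', 'Sex']).size()

# Same construction as in the notebook
counts = pd.DataFrame({'Count': grouped_size})

# Alternative spelling that gives the same DataFrame
alternative = grouped_size.to_frame('Count')

print(counts)
```

The row with the missing port is dropped before grouping, so it does not contribute to any count.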

### Rename the index levels and their labels to more appropriate names.

In [None]:
gender_embark.index.names = ['City', 'Gender']

gender_embark.rename(
	index={
		'C': 'Cherbourg',
		'Q': 'Queenstown (Cobh)',
		'S': 'Southampton',
		'female': 'Female',
		'male': 'Male'
	},
	inplace=True
)

### Plot the **gender_embark** DataFrame as a bar graph.

In [None]:
figure = gender_embark.unstack().plot(
	title='Number of people boarding from each city by gender',
	ylabel='Number of people',
	y='Count',
	kind='bar',
	rot=0
)

### Add labels to each bar in the graph.

In [None]:
for container in figure.containers:
	figure.bar_label(container)

### Analysis

- In **Cherbourg** and **Queenstown (Cobh)**, the numbers of male and female passengers were almost the same.

- In **Southampton**, there were **238** more male passengers than female passengers.

![Gender-Embark Graph](../graphs/gender_embark.png)

---

## 4) Percentage of survivors categorised by the gender of the passengers and their passenger class.

### Create the **survivors_class** DataFrame by taking the *[Sex, Pclass, Survived]* columns from the **titanic** DataFrame and grouping them by *[Pclass, Sex]*.

### Also calculate the average of the *[Survived]* column to 4 decimal places and multiply it by 100, so that the **survivors_class** DataFrame holds the survival rate of each group as a percentage.

In [None]:
survivors_class = titanic[
	['Sex', 'Pclass', 'Survived']
].dropna().groupby(
	['Pclass', 'Sex']
).mean().round(4) * 100
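
Taking the mean gives a survival rate on the assumption that *[Survived]* is encoded as 0 (did not survive) and 1 (survived), which the percentage calculation implies: the mean of a group is then the fraction of survivors in it. A sketch on invented rows:

```python
import pandas as pd

# Made-up rows for illustration: 3 of 4 women and 1 of 4 men survive
sample = pd.DataFrame({
    'Sex': ['female'] * 4 + ['male'] * 4,
    'Pclass': [1] * 8,
    'Survived': [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column is the fraction of ones; scale it to a percentage
rate = sample.groupby(['Pclass', 'Sex']).mean().round(4) * 100
print(rate)
```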

### Rename the index levels, their labels and the *[Survived]* column to more appropriate names.

In [None]:
survivors_class.index.names = ['Passenger Class', 'Gender']

survivors_class.rename(
	columns={
		'Survived': 'Survived (in %)'
	},
	index={
		'female': 'Female',
		'male': 'Male',
		1: 'First',
		2: 'Second',
		3: 'Third'
	},
	inplace=True
)

### Plot the **survivors_class** DataFrame as a bar graph.

In [None]:
figure = survivors_class.unstack().plot(
	title='Survivors by gender and class',
	xlabel='Passenger class',
	ylabel='Survived (in %)',
	y='Survived (in %)',
	kind='bar',
	rot=0
)

### Add labels to each bar in the graph.

In [None]:
for container in figure.containers:
	figure.bar_label(container)

### Analysis

- In **First** Class, **96.81%** of women and **36.89%** of men survived the accident.

- In **Second** Class, **92.11%** of women and **15.74%** of men survived the accident.

- In **Third** Class, **50%** of women and **13.54%** of men survived the accident.

![Survivors-Class Graph](../graphs/survivors_class.png)