# Analyzing the Titanic Data

## Introductory words

This **project** is part of the **[Udacity data analyst nanodegree](https://www.udacity.com/course/data-analyst-nanodegree--nd002)**. This analysis is done by **[Guillaume Simler](https://github.com/guillaumesimler)** as part of his nanodegree's graduation.

For more information, please have a look at the related **[GitHub repo](https://github.com/guillaumesimler/nanodap1)**.

## Some discussions about the facts

Everybody knows the story of the RMS Titanic, the "unsinkable" ocean liner that sank, and her catastrophic end. If you do not, please have a look at the [Wikipedia page](https://en.wikipedia.org/wiki/RMS_Titanic).

As she was considered unsinkable, it was assumed there was no need for lifeboat capacity matching at least the number of passengers and crew. This error proved **fatal**.
In fact, you would need far more capacity, as the sinking of the [Costa Concordia](https://en.wikipedia.org/wiki/Costa_Concordia_disaster) showed.

One last thing about the Titanic:
> *Built by Irishmen. Sunk by Englishmen*

## 1. Loading modules & files 

In [1]:
# Import Modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Import Data 

passenger_df = pd.read_csv('titanic-data.csv')

#### Testing the data loading

In [3]:
passenger_df.head(7)

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |


In [32]:
# Number of records in the data set

len(passenger_df.index)

891

### First Analysis of the data

Several points need to be checked after this first look at the data:
- Some passengers have no **Cabin** number. In this small sample, it seems to be linked to the class: the third-class passengers seem to have no numbered cabin. **This needs to be checked!**
- Some passengers have no age recorded. **How frequently this happens needs to be checked!**
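
Both points come down to counting missing values per column. A minimal sketch of that check, assuming a small stand-in frame built from three columns of the seven rows shown by `head(7)`; the names `sample_df`, `missing_counts` and `missing_share` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Hypothetical stand-in for passenger_df: three columns of the seven rows shown by head(7)
sample_df = pd.DataFrame({
    'Pclass': [3, 1, 3, 1, 3, 3, 1],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0],
    'Cabin': [None, 'C85', None, 'C123', None, None, 'E46'],
})

# Count the missing entries per column
missing_counts = sample_df.isnull().sum()

# Express the same information as a share of the rows
missing_share = sample_df.isnull().mean()

print(missing_counts)
print(missing_share.round(2))
```

Applied to the full `passenger_df`, the same two calls would show how often **Age** and **Cabin** are missing across the whole data set.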


#### Check the cabin issue

In [33]:
# Select the passengers with no cabin number

passengers_without_cabins = passenger_df.loc[pd.isnull(passenger_df['Cabin'])]

In [63]:
# Check if there is a link between the absence of cabin and the class

pwoc_nb = passengers_without_cabins.groupby('Pclass').size()

print('The number of passengers by class without a cabin number is')
print(pwoc_nb)


The number of passengers by class without a cabin number is
Pclass
1     40
2    168
3    479
dtype: int64


In [65]:
# Check the number of passengers with a cabin number

pwc_nb = passenger_df.groupby('Pclass').size() - pwoc_nb

print('The number of passengers by class with a cabin number is')
print(pwc_nb)

The number of passengers by class with a cabin number is
Pclass
1    176
2     16
3     12
dtype: int64
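
The raw counts are easier to compare as a share per class. A minimal sketch, assuming the two sets of counts printed above are re-entered by hand; the names `without_cabin`, `with_cabin` and `cabin_share` are illustrative and not part of the original notebook:

```python
import pandas as pd

# Counts per class as printed above, re-entered by hand (illustrative names)
without_cabin = pd.Series({1: 40, 2: 168, 3: 479})
with_cabin = pd.Series({1: 176, 2: 16, 3: 12})

# Share of passengers in each class who have a cabin number
cabin_share = with_cabin / (with_cabin + without_cabin)

print(cabin_share.round(3))
```

Roughly 81% of first-class passengers have a cabin number, against about 9% in second class and 2% in third class, which supports the link with the class suspected earlier.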


In [68]:
# Check that both counts add up to the total number of passengers

(pwc_nb.sum() + pwoc_nb.sum()) == len(passenger_df.index)

True
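
As an alternative to the subtraction, the same split could be obtained in a single pass with `pd.crosstab`. This is only a sketch on a stand-in frame built from the seven rows shown by `head(7)`; `sample_df`, `cabin_status` and `cabin_table` are illustrative names, not part of the original notebook:

```python
import pandas as pd

# Hypothetical stand-in for passenger_df: two columns of the seven rows shown by head(7)
sample_df = pd.DataFrame({
    'Pclass': [3, 1, 3, 1, 3, 3, 1],
    'Cabin': [None, 'C85', None, 'C123', None, None, 'E46'],
})

# Label each passenger by whether a cabin number is present
cabin_status = sample_df['Cabin'].notnull().map(
    {True: 'with cabin', False: 'without cabin'})

# Cross-tabulate class against cabin status
cabin_table = pd.crosstab(sample_df['Pclass'], cabin_status)

print(cabin_table)

# Same sanity check as above: every passenger is counted exactly once
print(cabin_table.values.sum() == len(sample_df.index))
```

Because every row falls into exactly one cell of the table, the total check holds by construction.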