# TITANIC SURVIVOR DATA ANALYSIS

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

**Reading DATA using Pandas**
We are using the read_csv function to read the dataset into a pandas data frame.


In [None]:
df = pd.read_csv('../input/titanic-survivor/titanic.csv')
df.head()

**DESCRIPTION OF THE ATTRIBUTES:**

*   Pclass: Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
*   Survived: Survival (0 = No; 1 = Yes)
*   Name: Name
*   Sex: Sex
*   SibSp: Number of Siblings/Spouses Aboard
*   Parch: Number of Parents/Children Aboard
*   Ticket: Ticket Number
*   Fare: Passenger Fare (British Pound)
*   Cabin: Cabin
*   Embarked: Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)













**Handling Null Values:**

The data set has missing values in some rows and columns. There are two common ways to handle these missing values:
1. Dropping the entire row or column
2. Replacing the value with the mean of the column.
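
The two options above can be sketched on a small toy frame. The values below are made up for illustration and are not taken from the Titanic data; the names `toy`, `dropped`, and `filled` are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not from the Titanic dataset).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# Method 1: drop every row that contains at least one missing value.
dropped = toy.dropna(axis=0)

# Method 2: replace each missing value with the mean of its column.
filled = toy.fillna(toy.mean(numeric_only=True))

print(dropped.shape)             # rows without any missing value
print(filled.isnull().sum())     # no missing values remain
```

Dropping loses whole rows, while filling keeps every row at the cost of inventing values, so the choice depends on how much data is missing.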

In [None]:
df.isnull().sum()

In [None]:
plt.plot(df.columns, [df[column].isnull().sum() for column in df]);

In [None]:
print(df.shape)

**Separating the columns that have more than 35% of their values missing**

In [None]:
drop_column = df.isnull().sum()[df.isnull().sum() > (35/100 * df.shape[0])]
drop_column


In [None]:
drop_column.index
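
An equivalent way to express the 35% rule is to take the mean of the null mask, which gives the missing fraction per column directly. This is a minimal sketch on a toy frame with made-up values; `missing_fraction`, `threshold`, `sparse_columns`, and `trimmed` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not from the Titanic dataset).
toy = pd.DataFrame({
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Fraction of missing values in each column (mean of a boolean mask).
missing_fraction = toy.isnull().mean()

# Columns whose missing fraction exceeds the threshold.
threshold = 0.35
sparse_columns = missing_fraction[missing_fraction > threshold].index.tolist()

# Drop those columns without modifying the original frame.
trimmed = toy.drop(columns=sparse_columns)
print(sparse_columns)
print(trimmed.columns.tolist())
```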

**Dropping the column**

In [None]:
df.drop(drop_column.index, axis=1, inplace=True)

In [None]:
df.isnull().sum()

In [None]:
df.hist(figsize = (10, 10));

**Filling the remaining null values with mean of the column**

In [None]:
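# numeric_only=True skips string columns, which recent pandas versions refuse to average
df.fillna(df.mean(numeric_only=True), inplace=True)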
df.isnull().sum()

**Embarked still has null values because it holds strings, and the mean of string values is undefined.**

In [None]:
df["Embarked"].describe()

**We are filling in the null values in Embarked with the most frequent value in the column.**

In [None]:
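# Assign back instead of using inplace on a column selection, which newer pandas versions warn about
df['Embarked'] = df['Embarked'].fillna('S')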
df.isnull().sum()
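
Instead of hard-coding the most frequent value, it can be looked up with `Series.mode()`. The sketch below uses a toy series with made-up values; `embarked`, `most_frequent`, and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy series with hypothetical values (not from the Titanic dataset).
embarked = pd.Series(["S", "C", np.nan, "S", "Q", np.nan], name="Embarked")

# mode() ignores missing values by default and returns the most frequent
# value(s); take the first one in case of ties.
most_frequent = embarked.mode()[0]

# Fill the gaps with that value.
filled = embarked.fillna(most_frequent)
print(most_frequent)
print(filled.tolist())
```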

**Finding the correlation**

In [None]:
# Restrict to numeric columns; string columns such as Name cannot be correlated
df.corr(numeric_only=True)

*   SibSp: Number of Siblings/Spouses Aboard
*   Parch: Number of Parents/Children Aboard

We combine them into a new column named FamilySize.





In [None]:
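# Total number of relatives aboard (siblings/spouses plus parents/children)
df['FamilySize'] = df["SibSp"] + df["Parch"]
df.drop(["SibSp", "Parch"], inplace=True, axis=1)
# Restrict to numeric columns; string columns cannot be correlated
df.corr(numeric_only=True)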

**FamilySize does not correlate strongly with the survival rate.**


Now checking if being alone on the ship affected the survival rate.

In [None]:
df['Sex'].value_counts().plot(kind='bar', rot=0, edgecolor='black', linewidth=1.2, color=['C0', 'C1'])
plt.title('Number of Males and Females Aboard')
plt.ylabel('Total Passengers')
plt.xlabel('Passenger Gender')
plt.tight_layout()
plt.show()

In [None]:
# 1 if the passenger has no relatives aboard, 0 otherwise (FamilySize is never negative)
df["Alone"] = (df["FamilySize"] == 0).astype(int)
df.head()

In [None]:
df.groupby(['Alone'])['Survived'].mean()


0 = Not Alone

1 = Alone

So a passenger travelling alone had a lower chance of survival.
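
When comparing group means like this, it can help to look at the group sizes as well, since a mean over a handful of passengers is less reliable. A minimal sketch on toy data with made-up values (the name `summary` is illustrative):

```python
import pandas as pd

# Toy frame with hypothetical values (not from the Titanic dataset).
toy = pd.DataFrame({
    "Alone":    [0, 0, 0, 1, 1, 1, 1],
    "Survived": [1, 1, 0, 0, 0, 1, 0],
})

# Survival rate and number of passengers in each group.
summary = toy.groupby("Alone")["Survived"].agg(["mean", "count"])
print(summary)
```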

**A possible reason is that passengers travelling with family more often belonged to a higher class and were prioritized over others.**


In [None]:
df[['Alone', 'Fare']].corr()

So a passenger who was not alone was more likely to have paid a higher ticket price.

In [None]:
plt.hist(df.Age, bins=8, linewidth=1, edgecolor='black')
plt.title('Titanic Passenger Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

In [None]:
# Encode Sex as 0 = male, 1 = female
df['Sex'] = (df['Sex'] != 'male').astype(int)
df.groupby(['Sex'])['Survived'].mean()

**This shows that a female passenger had a higher chance of survival.**

**It suggests that female passengers were prioritized over male passengers.**

In [None]:
df.plot(x = "Sex", y = "Survived", kind = "hist");

In [None]:
df.groupby(['Embarked'])['Survived'].mean()

**This shows that people who embarked from Cherbourg had a higher chance of survival.**



> **CONCLUSION**


*   Female passengers were prioritized over male passengers.
*   Passengers travelling with their families had a higher chance of survival.
*   Passengers who boarded at Cherbourg survived in a higher proportion than the others.
*   Passengers from the upper class appear to have had a higher chance of survival. A class hierarchy may have been followed while saving the passengers.
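
The class-related point could be checked with the same group-by pattern used for Sex and Embarked. The sketch below assumes a frame with `Pclass` and `Survived` columns as described in the attribute list; the helper `survival_rate_by` and the toy values are hypothetical and only illustrate the pattern.

```python
import pandas as pd

def survival_rate_by(frame, column):
    """Return the mean of Survived for each distinct value in `column`."""
    return frame.groupby(column)["Survived"].mean()

# Toy frame with hypothetical values (not from the Titanic dataset).
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})

rates = survival_rate_by(toy, "Pclass")
print(rates)
```

On the real data, `survival_rate_by(df, "Pclass")` would give the survival rate per passenger class.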









