**IMPORTING LIBRARIES**

In [1]:
import pandas as pd
import numpy as np

**READING DATA**

In [2]:
df = pd.read_csv('/content/train.csv')  # read_csv already returns a DataFrame
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


**CLEANING DATA**

In [3]:
df.shape

(891, 12)

In [4]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

Cabin is missing in 687 of the 891 rows (about **77%**), so we remove the column. More generally, we drop every column in which more than 35% of the values are NaN (missing).

In [5]:
x = df.isnull().sum()
drop_col = x[x > (35/100 * df.shape[0])]
drop_col

Cabin    687
dtype: int64

In [6]:
drop_col.index

Index(['Cabin'], dtype='object')
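
The 35% cut-off above is expressed as a row count. As a minimal sketch, the same selection can be written with per-column missing fractions, which also makes the exact percentage visible. The `toy` frame, its values, and the names `missing_frac`, `threshold` and `cols_to_drop` are made up for illustration and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Titanic data (values are made up for illustration).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# Fraction of missing values per column: the mean of the boolean null mask.
missing_frac = toy.isnull().mean()

# Columns whose missing fraction exceeds the 35% threshold.
threshold = 0.35
cols_to_drop = missing_frac[missing_frac > threshold].index.tolist()

print(missing_frac)
print(cols_to_drop)
```

In this toy frame only `Cabin` (three of four values missing) crosses the threshold, while `Age` (one of four) stays.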

In [7]:
df.drop(drop_col.index,axis=1,inplace=True)
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Embarked         2
dtype: int64

The remaining columns have far fewer missing values.
The NaN values in the Age column are replaced by the mean of all the ages.

In [8]:
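# numeric_only=True skips text columns such as Name, which newer pandas versions refuse to average
df.fillna(df.mean(numeric_only=True), inplace=True)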
df.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Embarked       2
dtype: int64
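
The cell above fills every numeric column with its own mean, although only Age has gaps. A minimal sketch of restricting the imputation to a single column, which makes the intent explicit, is shown below. The `toy` frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one text column and one numeric column containing a gap.
toy = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Age": [20.0, np.nan, 40.0],
})

# Series.mean() ignores NaN by default, so the mean here is taken over 20 and 40.
toy["Age"] = toy["Age"].fillna(toy["Age"].mean())

print(toy)
```

Only `Age` is touched; the text column is left as it was.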

'S' (Southampton) is the most frequent value in the Embarked column (644 of the 889 known values).
We replace the two NaN values with 'S'.

In [9]:
df['Embarked'].describe()

count     889
unique      3
top         S
freq      644
Name: Embarked, dtype: object

In [10]:
# Assign the result back; inplace=True on a selected column is deprecated in recent pandas
df['Embarked'] = df['Embarked'].fillna('S')
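
The fill value 'S' is hard-coded in the cell above. As a hedged sketch, it could instead be derived from the data with `Series.mode()`, so the cell keeps working if the data changes. The `toy` frame and the name `most_common` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy column with two missing ports of embarkation (values are made up).
toy = pd.DataFrame({"Embarked": ["S", "C", np.nan, "S", "Q", np.nan]})

# mode() ignores NaN and returns a Series (ties are possible), so take the first entry.
most_common = toy["Embarked"].mode()[0]

# Fill the gaps and assign the result back to the column.
toy["Embarked"] = toy["Embarked"].fillna(most_common)

print(most_common)
print(toy["Embarked"].tolist())
```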

In [11]:
df.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Embarked       0
dtype: int64

The data is now clean and ready for analysis.

**ANALYSING DATA**

In [12]:
# numeric_only=True (pandas 1.5+) restricts the correlation to numeric columns
df.corr(numeric_only=True)

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
PassengerId,1.0,-0.005007,-0.035144,0.033207,-0.057527,-0.001652,0.012658
Survived,-0.005007,1.0,-0.338481,-0.069809,-0.035322,0.081629,0.257307
Pclass,-0.035144,-0.338481,1.0,-0.331339,0.083081,0.018443,-0.5495
Age,0.033207,-0.069809,-0.331339,1.0,-0.232625,-0.179191,0.091566
SibSp,-0.057527,-0.035322,0.083081,-0.232625,1.0,0.414838,0.159651
Parch,-0.001652,0.081629,0.018443,-0.179191,0.414838,1.0,0.216225
Fare,0.012658,0.257307,-0.5495,0.091566,0.159651,0.216225,1.0


We combine the columns for the number of siblings/spouses aboard (SibSp) and the number of parents/children aboard (Parch) into a single column, FamilySize.

In [13]:
df['FamilySize'] = df['SibSp'] + df['Parch']
df.drop(['SibSp','Parch'] , axis=1 , inplace=True)
df.corr(numeric_only=True)  # numeric_only=True (pandas 1.5+) skips the text columns

Unnamed: 0,PassengerId,Survived,Pclass,Age,Fare,FamilySize
PassengerId,1.0,-0.005007,-0.035144,0.033207,0.012658,-0.040143
Survived,-0.005007,1.0,-0.338481,-0.069809,0.257307,0.016639
Pclass,-0.035144,-0.338481,1.0,-0.331339,-0.5495,0.065997
Age,0.033207,-0.069809,-0.331339,1.0,0.091566,-0.248512
Fare,0.012658,0.257307,-0.5495,0.091566,1.0,0.217138
FamilySize,-0.040143,0.016639,0.065997,-0.248512,0.217138,1.0


We see that FamilySize is barely correlated with survival (about 0.017).

So we go to the next step: instead of looking at the size of the family, we only check whether the passenger had any family member aboard or was travelling alone.

For that, we create a column Alone, which is 1 if the passenger had no family member aboard and 0 if the passenger had at least one.

In [14]:
df['Alone'] = (df['FamilySize'] == 0).astype(int)  # 1 = no family aboard, 0 = at least one family member
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,Ticket,Fare,Embarked,FamilySize,Alone
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,A/5 21171,7.25,S,1,0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,PC 17599,71.2833,C,1,0
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,STON/O2. 3101282,7.925,S,0,1
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,113803,53.1,S,1,0
4,5,0,3,"Allen, Mr. William Henry",male,35.0,373450,8.05,S,0,1


In [15]:
df.groupby(['Alone'])['Survived'].mean()

Alone
0    0.505650
1    0.303538
Name: Survived, dtype: float64

The survival rate is about 20 percentage points **higher** (51% versus 30%) for passengers who had **family** aboard.
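
A gap between two rates can be read as an absolute difference (percentage points) or as a relative increase, and the two numbers differ a lot here. The short sketch below works both out from the rates printed by the groupby above; the variable names are illustrative.

```python
# Survival rates copied from the groupby output above.
rate_with_family = 0.505650  # Alone == 0
rate_alone = 0.303538        # Alone == 1

# Absolute difference, in percentage points.
abs_diff = rate_with_family - rate_alone

# Relative increase over the rate of passengers travelling alone.
rel_diff = abs_diff / rate_alone

print(f"absolute: {abs_diff:.1%} points, relative: {rel_diff:.1%}")
```

The absolute gap is roughly 20 points, while the relative increase over the alone group is roughly two thirds.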

In [16]:
df[['Alone','Fare']].corr()

Unnamed: 0,Alone,Fare
Alone,1.0,-0.271832
Fare,-0.271832,1.0


Alone and Fare are negatively correlated (about -0.27): passengers who **had family aboard** (not alone) tended to pay **higher** fares.

In [17]:
df['Sex'] = (df['Sex'] != 'male').astype(int)  # encode Sex as 0 = male, 1 = female
df.groupby(['Sex'])['Survived'].mean()

Sex
0    0.188908
1    0.742038
Name: Survived, dtype: float64

**Women** had a much **higher** survival rate than men (74% versus 19%), which is consistent with women being prioritised during the evacuation.

In [18]:
df.groupby(['Embarked'])['Survived'].mean()

Embarked
C    0.553571
Q    0.389610
S    0.339009
Name: Survived, dtype: float64

Passengers who **embarked at Cherbourg (C)** had the **highest** survival rate (55%), compared with 39% for Queenstown (Q) and 34% for Southampton (S).

In [19]:
df['Young_Age'] = (df['Age'] < 20).astype(int)  # 1 = younger than 20
df.groupby(['Young_Age'])['Survived'].mean()

Young_Age
0    0.361761
1    0.481707
Name: Survived, dtype: float64

In [20]:
df['Too_Young_Age'] = (df['Age'] < 10).astype(int)  # 1 = younger than 10
df.groupby(['Too_Young_Age'])['Survived'].mean()

Too_Young_Age
0    0.366707
1    0.612903
Name: Survived, dtype: float64

Children and teenagers had a **higher** survival rate than older passengers: 61% for those under 10 and 48% for those under 20, against roughly 36% for everyone else.
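
The two flags above overlap, since every passenger under 10 is also under 20. As a hedged sketch, `pd.cut` can split ages into disjoint bands so that each group is compared on its own. The `toy` frame, its values, and the band labels are assumptions made for illustration.

```python
import pandas as pd

# Toy ages and outcomes (made up for illustration).
toy = pd.DataFrame({
    "Age": [4, 8, 15, 19, 25, 40, 62],
    "Survived": [1, 1, 0, 1, 0, 1, 0],
})

# right=False gives the bands [0, 10), [10, 20) and [20, inf),
# matching the Age < 10 and Age < 20 cut-offs used above.
bins = [0, 10, 20, float("inf")]
labels = ["child", "teen", "adult"]
toy["AgeBand"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# Survival rate per disjoint age band.
rates = toy.groupby("AgeBand", observed=True)["Survived"].mean()
print(rates)
```

With disjoint bands, the rate for teenagers is no longer pulled up by the children included in the under-20 flag.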

**CONCLUSIONS**

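* Passengers who had family aboard had a higher survival rate than those travelling alone (51% versus 30%).
* Female passengers had a higher survival rate than male passengers (74% versus 19%).
* Passengers who embarked at Cherbourg had the highest survival rate (55%); those who embarked at Southampton had the lowest (34%).
* Younger passengers (Age < 20) had a higher survival rate than older passengers.
* Passengers who paid higher fares or travelled in a higher class had a higher survival rate (correlation of Survived with Fare is 0.26 and with Pclass is -0.34). This suggests that wealthier passengers may have been favoured while people were being saved from the sinking Titanic.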