# <center style="color:#0178BD; font-family:cursive;"> Table of Contents</center>

1) [Introduction](#introduction)
  * 1.1 [Context](#context)
  * 1.2 [Data Dictionary](#data-dictionary)
 
  
2) [Preparation](#preparation)
  * 2.1 [Importing Packages](#importing-packages)
  * 2.2 [Understanding the Data](#understanding-data)
  * 2.3 [Cleaning the Data](#cleaning-data)
  
  
3) [Exploratory Data Analysis](#eda)
   * 3.1 [Univariate Analysis](#univariate-analysis)
        * 3.1.1 [Survived Count Plot](#survived-countplot)
        * 3.1.2 [Other Count Plots](#other-countplots)
        * 3.1.3 [Distribution Plots](#distribution-plots)
   * 3.2 [Multivariate Analysis](#multivariate-analysis)
        * 3.2.1 [Count Plots wrt Survived](#countplots-wrt-survived)
        * 3.2.2 [Count Plots wrt Others](#countplots-wrt-others)
        * 3.2.3 [Distribution wrt Survived](#distribution-wrt-survived)
        * 3.2.4 [Age and Fare Plots](#age-fare-plots)
        * 3.2.5 [3D Scatter Plot](#3d-scatterplot)
        
        
4) [Machine Learning](#machine-learning)
   * 4.1 [Data Preprocessing](#data-preprocessing)
   * 4.2 [Model Building](#model-building)
        * 4.2.1 [Logistic Regression](#logistic-regression)
        * 4.2.2 [Decision Tree Classifier](#decision-tree)
        * 4.2.3 [Random Forest Classifier](#random-forest)
        * 4.2.4 [Gradient Boosting Classifier](#gradient-boost)
        * 4.2.5 [XG Boost Classifier](#xg-boost)
   * 4.3 [Comparing Models](#comparing-models)
        
        
5) [Acknowledgements](#acknowledgements)

<a id="introduction"></a>
# <center style="color:#0178BD; font-family:cursive;"> Introduction</center>

<a id="context"></a>
<span style="font-size:18px; color:#368DC5; font-family:cursive;"> 1.1 Context</span>

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
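
One way to make this claim concrete is to compare survival rates across groups. The sketch below uses a small made-up DataFrame (the rows are illustrative, not real passenger records) and assumes only the `Sex`, `Pclass`, and `Survived` columns from the data dictionary.

```python
import pandas as pd

# Toy rows for illustration only; these are not real passenger records.
toy = pd.DataFrame({
    "Sex":      ["male", "male", "female", "female", "male", "female"],
    "Pclass":   [3, 1, 1, 3, 3, 2],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# The mean of a 0/1 column is the fraction of ones, i.e. the survival rate.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

Running the same two `groupby` calls on the real training data would show how far the groups actually differ.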

<a id="data-dictionary"></a>
<span style="font-size:18px; color:#368DC5; font-family:cursive;"> 1.2 Data Dictionary</span>

`Age` : Age in Years

`Sex` : Sex

`Pclass` : Ticket Class (1 = 1st, 2 = 2nd, 3 = 3rd)

`Embarked` : Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

`SibSp` : Number of siblings / spouses aboard

`Parch` : Number of parents / children aboard

`Fare` : Passenger Fare

`Survived` : Survived = 1, Died = 0




<a id="preparation"></a>
# <center style="color:#0178BD; font-family:cursive;"> Preparation</center>

<a id="importing-packages"></a>
<span style="font-size:18px; color:#368DC5; font-family:cursive;"> 2.1 Importing Packages</span>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings
from termcolor import colored
warnings.filterwarnings("ignore")

<a id="understanding-data"></a>
<span style="font-size:18px; color:#368DC5; font-family:cursive;"> 2.2 Understanding the Data</span>

In [None]:
df = pd.read_csv('../input/titanic/train.csv')

In [None]:
df

In [None]:
df.isnull().sum()

<a id="cleaning-data"></a>
<span style="font-size:18px; color:#368DC5; font-family:cursive;"> 2.3 Cleaning the Data</span>

In [None]:
## Dropping Unnecessary Columns
drop_cols = ['PassengerId','Cabin', 'Ticket', 'Name']
df.drop(drop_cols, axis = 1, inplace = True)

In [None]:
#### Handling Missing Values

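## Fill missing AGE with Median
## (assign back instead of fillna(inplace=True) on a column selection, which newer pandas versions warn about)
df['Age'] = df['Age'].fillna(df['Age'].median())

## Fill missing EMBARKED with Mode
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])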

In [None]:
df.isnull().sum()
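
The imputation step can be checked on a tiny example. This is a minimal sketch on a made-up frame (the values are assumptions for illustration), showing a median fill for a numeric column and a mode fill for a categorical one.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps; the values are made up for illustration.
toy = pd.DataFrame({
    "Age":      [22.0, np.nan, 30.0, 40.0],
    "Embarked": ["S", "C", None, "S"],
})

# Median for the numeric column: less sensitive to a few extreme values than the mean.
age_median = toy["Age"].median()
toy["Age"] = toy["Age"].fillna(age_median)

# Mode for the categorical column: mode() returns a Series, so take its first entry.
embarked_mode = toy["Embarked"].mode()[0]
toy["Embarked"] = toy["Embarked"].fillna(embarked_mode)

print(toy)
print(toy.isnull().sum())
```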

In [None]:
df

In [None]:
df.drop(df[(df['Fare'] > 270)].index, inplace=True)
df
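
The cell above removes rows with very large fares by index label. As a rough sketch on made-up fares, the same filter can be written with a boolean mask; the name `threshold` is introduced here only for illustration.

```python
import pandas as pd

# Made-up fares; 512.33 stands in for an extreme value.
toy = pd.DataFrame({"Fare": [7.25, 71.28, 512.33, 8.05]})

threshold = 270  # same cutoff as the cell above

# Variant 1: drop rows by index label, as in the notebook.
dropped = toy.drop(toy[toy["Fare"] > threshold].index)

# Variant 2: keep rows with a boolean mask.
kept = toy[toy["Fare"] <= threshold]

print(dropped.equals(kept))
```

The two variants agree here, but they would differ if `Fare` contained missing values: a NaN fails both comparisons, so variant 1 keeps the row and variant 2 drops it.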

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#368DC5;
           font-size:110%;
           font-family:Verdana;
           letter-spacing:0.5px">

<p style="padding: 10px;
              color:white;">
Since we have dropped the unnecessary columns, filled the missing values, and removed the extreme Fare outliers, we are now ready for Exploratory Data Analysis.
</p>
</div>

<a id="eda"></a>
# <center style="color:#0178BD; font-family:cursive;"> Exploratory Data Analysis</center>

In [None]:
## Constants
color_palettes = ['#3c79e1', '#4c6b7d', '#e5e9ec', '#1f2c2c', '#20ADD0', '#4A8670']
background_color = "#97CADB"
font = 'cursive'

<a id="univariate-analysis"></a>
## <center style="color:#0178BD; font-family:cursive;"> 3.1 Univariate Analysis</center>

<a id="survived-countplot"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.1.1 Survived Count Plot</span>

In [None]:
fig = plt.figure(figsize=(18,6))
gs = fig.add_gridspec(1,2)
gs.update(wspace=0.3, hspace=0.15)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])


color_palette = ["#f56476","#ff8811","#ff0040","#ff7f6c"]
fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color)

# Title of the plot
ax0.text(0.5,0.5,"Survived Count Plot\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#001B48')

ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.tick_params(left=False, bottom=False)

# Survived Count
ax1.text(0.45,560,"Survived",fontsize=14, fontweight='bold', fontfamily='serif', color="#000000")
ax1.grid(color='#001B48', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax1, data=df, x = 'Survived',palette = color_palettes)
ax1.set_xlabel("")
ax1.set_ylabel("")
ax1.set_xticklabels(["Died(0)","Survived(1)"])

ax0.spines["top"].set_visible(False)
ax0.spines["left"].set_visible(False)
ax0.spines["bottom"].set_visible(False)
ax0.spines["right"].set_visible(False)
ax1.spines["top"].set_visible(False)
ax1.spines["left"].set_visible(False)
ax1.spines["right"].set_visible(False)

<a id="other-countplots"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.1.2 Other Count Plots</span>

In [None]:
fig = plt.figure(figsize=(18,12))
gs = fig.add_gridspec(2,2)
gs.update(wspace=0.5, hspace=0.25)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])
ax2 = fig.add_subplot(gs[1,0])
ax3 = fig.add_subplot(gs[1,1])


fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color) 
ax3.set_facecolor(background_color)

# Title of the plot
ax0.spines["bottom"].set_visible(False)
ax0.spines["left"].set_visible(False)
ax0.spines["top"].set_visible(False)
ax0.spines["right"].set_visible(False)
ax0.tick_params(left=False, bottom=False)
ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.text(0.5,0.5,
         'Count plot for various\n categorical features\n_________________',
         horizontalalignment='center',
         verticalalignment='center',
         fontsize=18, fontweight='bold',
         fontfamily='cursive',
         color="#000000")

# Sex Count
ax1.text(0.4, 650, 'Sex', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax1.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax1,data=df,x='Sex',palette=color_palettes)
ax1.set_xlabel("")
ax1.set_ylabel("")

# Pclass Count
ax2.text(0.7, 550, 'Pclass', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax2.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax2,data=df,x='Pclass',palette=color_palettes)
ax2.set_xlabel("")
ax2.set_ylabel("")

# Embarked Count
ax3.text(0.7, 700, 'Embarked', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax3.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax3,data=df,x='Embarked',palette=color_palettes)
ax3.set_xlabel("")
ax3.set_ylabel("")

for s in ["top","right","left"]:
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)

<a id="distribution-plots"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.1.3 Distribution Plots</span>

In [None]:
fig = plt.figure(figsize=(18, 30))
gs = fig.add_gridspec(8,2)
gs.update(wspace=0.5, hspace=0.5)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])
ax2 = fig.add_subplot(gs[1,0])
ax3 = fig.add_subplot(gs[1,1])
ax4 = fig.add_subplot(gs[2,0])
ax5 = fig.add_subplot(gs[2,1])
ax6 = fig.add_subplot(gs[3,0])
ax7 = fig.add_subplot(gs[3,1])


fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color)
ax3.set_facecolor(background_color)
ax4.set_facecolor(background_color)
ax5.set_facecolor(background_color) 
ax6.set_facecolor(background_color) 
ax7.set_facecolor(background_color)

# Age title
ax0.text(0.5,0.5,"Distribution of Age\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax0.spines["bottom"].set_visible(False)
ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.tick_params(left=False, bottom=False)

# Age
ax1.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.histplot(ax=ax1,x=df['Age'],color="#3c79e1",kde=True)
ax1.set_xlabel("")
ax1.set_ylabel("")

# Fare title
ax2.text(0.5,0.5,"Distribution of Fare\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax2.spines["bottom"].set_visible(False)
ax2.set_xticklabels([])
ax2.set_yticklabels([])
ax2.tick_params(left=False, bottom=False)

# Fare
ax3.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.histplot(ax=ax3,x=df['Fare'],color="#3c79e1", kde=True)
ax3.set_xlabel("")
ax3.set_ylabel("")

# SibSp title
ax4.text(0.5,0.5,"Distribution of SibSp\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax4.spines["bottom"].set_visible(False)
ax4.set_xticklabels([])
ax4.set_yticklabels([])
ax4.tick_params(left=False, bottom=False)

# SibSp
ax5.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.boxplot(ax=ax5,x=df['SibSp'],color="#34495E")
ax5.set_xlabel("")
ax5.set_ylabel("")

# Parch title
ax6.text(0.5,0.5,"Distribution of Parch\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax6.spines["bottom"].set_visible(False)
ax6.set_xticklabels([])
ax6.set_yticklabels([])
ax6.tick_params(left=False, bottom=False)

# Parch
ax7.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.boxenplot(ax=ax7,x=df['Parch'],color="#34495E")
ax7.set_xlabel("")
ax7.set_ylabel("")


for i in ["top","left","right"]:
    ax0.spines[i].set_visible(False)
    ax1.spines[i].set_visible(False)
    ax2.spines[i].set_visible(False)
    ax3.spines[i].set_visible(False)
    ax4.spines[i].set_visible(False)
    ax5.spines[i].set_visible(False)
    ax6.spines[i].set_visible(False)
    ax7.spines[i].set_visible(False)

<a id="multivariate-analysis"></a>
## <center style="color:#0178BD; font-family:cursive;"> 3.2 Multivariate Analysis</center>

<a id="countplots-wrt-survived"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.2.1 Count Plots wrt Survived</span>

In [None]:
fig = plt.figure(figsize=(18,12))
gs = fig.add_gridspec(2,2)
gs.update(wspace=0.5, hspace=0.25)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])
ax2 = fig.add_subplot(gs[1,0])
ax3 = fig.add_subplot(gs[1,1])


fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color) 
ax3.set_facecolor(background_color)

# Title of the plot
ax0.spines["bottom"].set_visible(False)
ax0.spines["left"].set_visible(False)
ax0.spines["top"].set_visible(False)
ax0.spines["right"].set_visible(False)
ax0.tick_params(left=False, bottom=False)
ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.text(0.5,0.5,
         'Count plots wrt\n Survived\n_________________',
         horizontalalignment='center',
         verticalalignment='center',
         fontsize=22, fontweight='bold',
         fontfamily='cursive',
         color="#000000")

# Sex Count
ax1.text(0.4, 480, 'Sex', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax1.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax1,data=df,x='Sex', hue='Survived',palette=color_palettes)
ax1.set_xlabel("")
ax1.set_ylabel("")

# Pclass Count
ax2.text(0.7, 390, 'Pclass', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax2.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax2,data=df,x='Pclass', hue='Survived', palette=color_palettes)
ax2.set_xlabel("")
ax2.set_ylabel("")

# Embarked Count
ax3.text(0.7, 450, 'Embarked', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax3.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax3,data=df,x='Embarked', hue='Survived',palette=color_palettes)
ax3.set_xlabel("")
ax3.set_ylabel("")

for s in ["top","right","left"]:
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)

<a id="countplots-wrt-others"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.2.2 Count Plots wrt Others</span>

In [None]:
fig = plt.figure(figsize=(18,12))
gs = fig.add_gridspec(2,2)
gs.update(wspace=0.5, hspace=0.25)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])
ax2 = fig.add_subplot(gs[1,0])
ax3 = fig.add_subplot(gs[1,1])


fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color) 
ax3.set_facecolor(background_color)

# Title of the plot
ax0.spines["bottom"].set_visible(False)
ax0.spines["left"].set_visible(False)
ax0.spines["top"].set_visible(False)
ax0.spines["right"].set_visible(False)
ax0.tick_params(left=False, bottom=False)
ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.text(0.5,0.5,
         'Count plots wrt\n Others\n_________________',
         horizontalalignment='center',
         verticalalignment='center',
         fontsize=22, fontweight='bold',
         fontfamily='cursive',
         color="#000000")

# Embarked and Pclass
ax1.text(0.4, 380, 'Embarked and Pclass', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax1.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax1,data=df,x='Embarked', hue='Pclass',palette=color_palettes)
ax1.set_xlabel("")
ax1.set_ylabel("")

# Pclass and Sex Count
ax2.text(0.7, 370, 'Pclass and Sex', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax2.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax2,data=df,x='Pclass', hue='Sex', palette=color_palettes)
ax2.set_xlabel("")
ax2.set_ylabel("")

# Embarked and Sex Count
ax3.text(0.7, 460, 'Embarked and Sex', fontsize=14, fontweight='bold', fontfamily='cursive', color="#000000")
ax3.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.countplot(ax=ax3,data=df,x='Embarked', hue='Sex',palette=color_palettes)
ax3.set_xlabel("")
ax3.set_ylabel("")

for s in ["top","right","left"]:
    ax1.spines[s].set_visible(False)
    ax2.spines[s].set_visible(False)
    ax3.spines[s].set_visible(False)

<a id="distribution-wrt-survived"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.2.3 Distribution wrt Survived</span>

In [None]:
fig = plt.figure(figsize=(18, 30))
gs = fig.add_gridspec(8,2)
gs.update(wspace=0.5, hspace=0.5)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])
ax2 = fig.add_subplot(gs[1,0])
ax3 = fig.add_subplot(gs[1,1])
ax4 = fig.add_subplot(gs[2,0])
ax5 = fig.add_subplot(gs[2,1])
ax6 = fig.add_subplot(gs[3,0])
ax7 = fig.add_subplot(gs[3,1])

fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color)
ax3.set_facecolor(background_color)
ax4.set_facecolor(background_color)
ax5.set_facecolor(background_color) 
ax6.set_facecolor(background_color) 
ax7.set_facecolor(background_color)

# Age title
ax0.text(0.5,0.5,"Distribution of Age\naccording to\n Survival\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax0.spines["bottom"].set_visible(False)
ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.tick_params(left=False, bottom=False)

# Age
ax1.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.kdeplot(ax=ax1, data=df, x='Age',hue="Survived", fill=True,palette=["#ff8811","#3339FF"], alpha=.5, linewidth=0)
ax1.set_xlabel("")
ax1.set_ylabel("")

# Fare title
ax2.text(0.5,0.5,"Distribution of Fare\naccording to\n Survival\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax2.spines["bottom"].set_visible(False)
ax2.set_xticklabels([])
ax2.set_yticklabels([])
ax2.tick_params(left=False, bottom=False)

# Fare
ax3.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.kdeplot(ax=ax3, data=df, x='Fare',hue="Survived", fill=True,palette=["#ff8811","#3339FF"], alpha=.5, linewidth=0)
ax3.set_xlabel("")
ax3.set_ylabel("")

# SibSp title
ax4.text(0.5,0.5,"Distribution of SibSp\naccording to\n Survival\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax4.spines["bottom"].set_visible(False)
ax4.set_xticklabels([])
ax4.set_yticklabels([])
ax4.tick_params(left=False, bottom=False)

# SibSp
ax5.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.kdeplot(ax=ax5, data=df, x='SibSp',hue="Survived", fill=True,palette=["#ff8811","#3339FF"], alpha=.5, linewidth=0)
ax5.set_xlabel("")
ax5.set_ylabel("")

# Parch title
ax6.text(0.5,0.5,"Distribution of Parch\naccording to\n Survival\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax6.spines["bottom"].set_visible(False)
ax6.set_xticklabels([])
ax6.set_yticklabels([])
ax6.tick_params(left=False, bottom=False)

# Parch
ax7.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.kdeplot(ax=ax7, data=df, x='Parch',hue="Survived", fill=True,palette=["#ff8811","#3339FF"], alpha=.5, linewidth=0)
ax7.set_xlabel("")
ax7.set_ylabel("")


for i in ["top","left","right"]:
    ax0.spines[i].set_visible(False)
    ax1.spines[i].set_visible(False)
    ax2.spines[i].set_visible(False)
    ax3.spines[i].set_visible(False)
    ax4.spines[i].set_visible(False)
    ax5.spines[i].set_visible(False)
    ax6.spines[i].set_visible(False)
    ax7.spines[i].set_visible(False)

<a id="age-fare-plots"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.2.4 Age and Fare Plots </span>

In [None]:
fig = plt.figure(figsize=(18, 30))
gs = fig.add_gridspec(8,2)
gs.update(wspace=0.5, hspace=0.5)
ax0 = fig.add_subplot(gs[0,0])
ax1 = fig.add_subplot(gs[0,1])
ax2 = fig.add_subplot(gs[1,0])
ax3 = fig.add_subplot(gs[1,1])


fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 
ax1.set_facecolor(background_color) 
ax2.set_facecolor(background_color)
ax3.set_facecolor(background_color)


# ScatterPlot Title
ax0.text(0.5,0.5,"Distribution of Age and Fare\naccording to\n Survival\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax0.spines["bottom"].set_visible(False)
ax0.set_xticklabels([])
ax0.set_yticklabels([])
ax0.tick_params(left=False, bottom=False)

# Age and Fare ScatterPlot
ax1.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
sns.scatterplot(ax=ax1, data=df, x='Age',y='Fare', hue="Survived",palette=["#ff8811","#3339FF"], alpha=.5, linewidth=0)
ax1.set_xlabel("Age")
ax1.set_ylabel("Fare")


# Density Plot Title
ax2.text(0.5,0.5,"Age and Fare\n Density Plot\n___________",
        horizontalalignment = 'center',
        verticalalignment = 'center',
        fontsize = 18,
        fontweight='bold',
        fontfamily='cursive',
        color='#000000')
ax2.spines["bottom"].set_visible(False)
ax2.set_xticklabels([])
ax2.set_yticklabels([])
ax2.tick_params(left=False, bottom=False)

# Age and Fare Density Plot
ax3.grid(color='#000000', linestyle=':', axis='y', zorder=0,  dashes=(1,5))
# No hue is set for this plot, so a palette would have no effect
sns.kdeplot(ax=ax3, data=df, x='Age', y="Fare", fill=True)
ax3.set_xlabel("Age")
ax3.set_ylabel("Fare")


for i in ["top","left","right"]:
    ax0.spines[i].set_visible(False)
    ax1.spines[i].set_visible(False)
    ax2.spines[i].set_visible(False)
    ax3.spines[i].set_visible(False)

<a id="3d-scatterplot"></a>
<span style="font-size:16px; color:#368DC5; font-family:cursive;"> 3.2.5  3D Scatter Plot</span>

In [None]:
fig = px.scatter_3d(df, x='Fare', y='Age', z='Sex',
              color='Survived',size_max=18,color_continuous_scale=['#e5e9ec', '#3c79e1'])
fig.update_layout({"template":"plotly_dark"})
fig.show()

For Males:

* Most who `Survived` were below the age of 10.
* Many `Survived` for whom `Fare` was between 50 and 150.

For Females:

* `Age` is not a very important factor here, but `Fare` is.
* Almost all `Survived` for whom `Fare` was greater than 50.
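
Observations like these, read off a 3D plot, can be cross-checked with a grouped table. The sketch below uses made-up rows, and the `Child` and `HighFare` columns are hypothetical helpers (age below 10, fare above 50) chosen to mirror the thresholds mentioned above.

```python
import pandas as pd

# Made-up rows for illustration; not real passenger records.
toy = pd.DataFrame({
    "Sex":      ["male", "male", "male", "female", "female", "female"],
    "Age":      [4, 35, 50, 8, 30, 60],
    "Fare":     [30.0, 8.0, 80.0, 20.0, 90.0, 60.0],
    "Survived": [1, 0, 1, 0, 1, 1],
})

# Hypothetical helper flags matching the thresholds in the observations.
toy["Child"] = toy["Age"] < 10
toy["HighFare"] = toy["Fare"] > 50

# Survival rate and group size for every observed combination.
summary = (
    toy.groupby(["Sex", "Child", "HighFare"])["Survived"]
       .agg(["mean", "count"])
)
print(summary)
```

The `count` column matters as much as the `mean`: a rate computed from only a handful of passengers is weak evidence.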

In [None]:
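# Sex and Embarked are still text at this point, so restrict the correlation to numeric columns
df_corr = df.select_dtypes(include='number').corr().transpose()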
df_corr

In [None]:
fig = plt.figure(figsize=(10,10))
gs = fig.add_gridspec(1,1)
gs.update(wspace=0.3, hspace=0.15)
ax0 = fig.add_subplot(gs[0,0])
fig.patch.set_facecolor(background_color) 
ax0.set_facecolor(background_color) 

# df_corr = df[['Age', 'Fare', 'SibSp', 'Parch', 'Survived']].corr().transpose()
mask = np.triu(np.ones_like(df_corr))
ax0.text(2,-0.1,"Correlation Matrix",fontsize=22, fontweight='bold', fontfamily='cursive', color="#000000")
sns.heatmap(df_corr,mask=mask,fmt=".1f",annot=True)
plt.show()
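
The `mask` passed to the heatmap hides the redundant half of the symmetric correlation matrix. Here is a small sketch of what `np.triu` produces, using a made-up three-column frame.

```python
import numpy as np
import pandas as pd

# Made-up numeric columns for illustration.
toy = pd.DataFrame({
    "Age":   [22, 38, 26, 35],
    "Fare":  [7.25, 71.28, 7.93, 53.10],
    "SibSp": [1, 1, 0, 1],
})
corr = toy.corr()

# np.triu keeps the upper triangle (diagonal included) and zeroes the rest.
# seaborn.heatmap does not draw cells where the mask is True, so only the
# strictly lower triangle remains visible.
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)
```

Because the diagonal is part of the mask, the self-correlations (always 1) are hidden as well.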

<a id="machine-learning"></a>
# <center style="color:#0178BD; font-family:cursive;"> Machine Learning</center>

<a id="data-preprocessing"></a>
## <center style="color:#0178BD; font-family:cursive;"> 4.1 Data Preprocessing</center>

In [None]:
df.head()

In [None]:
df['Sex'] = df['Sex'].apply(lambda x: 1 if x == 'male' else 0)
df['Embarked'] = df['Embarked'].map({'S' : 0, 'C': 1, 'Q': 2})
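
A dictionary-based encoding silently produces NaN for any category that is missing from the dictionary. The sketch below, on made-up rows, applies the same mappings and adds a null check; `sex_codes` and `port_codes` are names introduced here for illustration.

```python
import pandas as pd

# Made-up rows for illustration.
toy = pd.DataFrame({
    "Sex":      ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

sex_codes = {"female": 0, "male": 1}
port_codes = {"S": 0, "C": 1, "Q": 2}

encoded = toy.assign(
    Sex=toy["Sex"].map(sex_codes),
    Embarked=toy["Embarked"].map(port_codes),
)

# Series.map returns NaN for values missing from the dictionary,
# so a null check catches unexpected categories.
assert not encoded.isnull().any().any()
print(encoded)
```

Note that integer codes for `Embarked` imply an ordering (S < C < Q) that has no real meaning; one-hot encoding, for example with `pd.get_dummies`, is a common alternative.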

In [None]:
df.head()

In [None]:
test_df = pd.read_csv('../input/titanic/test.csv')
test_df

In [None]:
## Same steps for test data
## Dropping Unnecessary Columns
drop_cols = ['PassengerId','Cabin', 'Ticket', 'Name']
test_df.drop(drop_cols, axis = 1, inplace = True)


#### Handling Missing Values

## Fill missing AGE and FARE with Median
test_df['Age'] = test_df['Age'].fillna(test_df['Age'].median())
test_df['Fare'] = test_df['Fare'].fillna(test_df['Fare'].median())

## Fill missing EMBARKED with Mode
test_df['Embarked'] = test_df['Embarked'].fillna(test_df['Embarked'].mode()[0])

In [None]:
test_df['Sex'] = test_df['Sex'].apply(lambda x: 1 if x == 'male' else 0)
test_df['Embarked'] = test_df['Embarked'].map({'S' : 0, 'C': 1, 'Q': 2})

In [None]:
test_df.sample(5)

In [None]:
test_df.isna().any()
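
Since the training and test data go through the same cleaning and encoding steps, those steps could be collected in one function. This is only a sketch: `preprocess` is a hypothetical helper, not part of the notebook, and it is shown on a made-up frame.

```python
import numpy as np
import pandas as pd

def preprocess(frame):
    """Apply the notebook's cleaning and encoding steps to a copy of `frame`."""
    # drop() returns a new frame, so the caller's data is left untouched.
    out = frame.drop(columns=["PassengerId", "Cabin", "Ticket", "Name"],
                     errors="ignore")
    for col in ["Age", "Fare"]:
        out[col] = out[col].fillna(out[col].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out["Sex"] = (out["Sex"] == "male").astype(int)
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    return out

# Made-up rows for illustration.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name":        ["A", "B", "C"],
    "Sex":         ["male", "female", "female"],
    "Age":         [20.0, np.nan, 40.0],
    "Fare":        [7.0, 8.0, np.nan],
    "Embarked":    ["S", None, "S"],
})
clean = preprocess(toy)
print(clean)
```

Keeping the steps in one place makes it harder for the training and test pipelines to drift apart.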

<a id="model-building"></a>
## <center style="color:#0178BD; font-family:cursive;"> 4.2 Model Building</center>

In [None]:
## X and y

X = df.drop('Survived', axis = 1)
y = df['Survived']

In [None]:
## Train Test Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 42)
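
To see what an 80/20 split produces, here is a small sketch on synthetic data. It also passes `stratify`, which the split above does not use; that is an optional variation shown for comparison, not the notebook's setting.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 30% positives. Illustrative only.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 30 + [0] * 70)

# stratify=y_toy keeps the class ratio the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

Fixing `random_state` makes the split reproducible, so accuracy figures can be compared across runs.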

<a id="logistic-regression"></a>
<span style="font-size:20px; color:#368DC5; font-family:cursive;"> 4.2.1 Logistic Regression </span>

In [None]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train, y_train)


from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

lr_acc = accuracy_score(y_test, lr.predict(X_test))

print(f"Training Accuracy of Logistic Regression is {accuracy_score(y_train, lr.predict(X_train))}")
print(f"Test Accuracy of Logistic Regression is {lr_acc}")

print(f"Confusion Matrix :- \n {confusion_matrix(y_test, lr.predict(X_test))}")
print(f"Classification Report :- \n {classification_report(y_test, lr.predict(X_test))}")
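
The accuracy, precision, and recall in these reports all derive from the confusion matrix. The sketch below rebuilds a 2x2 matrix by hand from made-up labels to show the relationship; it is an illustration, not how scikit-learn computes it internally.

```python
import numpy as np

# Made-up labels and predictions for illustration.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Rows are true classes, columns are predicted classes,
# the same layout scikit-learn's confusion_matrix uses.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted survivors who survived
recall = tp / (tp + fn)           # share of actual survivors who were found

print(cm)
print(accuracy, precision, recall)
```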

<a id="decision-tree"></a>
<span style="font-size:20px; color:#368DC5; font-family:cursive;"> 4.2.2 Decision Tree Classifier </span>

In [None]:
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier()
dtc.fit(X_train, y_train)

dtc_acc = accuracy_score(y_test, dtc.predict(X_test))

print(f"Training Accuracy of Decision Tree Classifier is {accuracy_score(y_train, dtc.predict(X_train))}")
print(f"Test Accuracy of Decision Tree Classifier is {dtc_acc} \n")

print(f"Confusion Matrix :- \n{confusion_matrix(y_test, dtc.predict(X_test))}\n")
print(f"Classification Report :- \n {classification_report(y_test, dtc.predict(X_test))}")

<a id="random-forest"></a>
<span style="font-size:20px; color:#368DC5; font-family:cursive;"> 4.2.3 Random Forest Classifier</span>

In [None]:
from sklearn.ensemble import RandomForestClassifier

rd_clf = RandomForestClassifier()
rd_clf.fit(X_train, y_train)

rd_clf_acc = accuracy_score(y_test, rd_clf.predict(X_test))

print(f"Training Accuracy of Random Forest Classifier is {accuracy_score(y_train, rd_clf.predict(X_train))}")
print(f"Test Accuracy of Random Forest Classifier is {rd_clf_acc} \n")

print(f"Confusion Matrix :- \n{confusion_matrix(y_test, rd_clf.predict(X_test))}\n")
print(f"Classification Report :- \n {classification_report(y_test, rd_clf.predict(X_test))}")

<a id="gradient-boost"></a>
<span style="font-size:20px; color:#368DC5; font-family:cursive;"> 4.2.4 Gradient Boosting Classifier</span>

In [None]:
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier()
gb.fit(X_train, y_train)

gb_acc = accuracy_score(y_test, gb.predict(X_test))

print(f"Training Accuracy of Gradient Boosting Classifier is {accuracy_score(y_train, gb.predict(X_train))}")
print(f"Test Accuracy of Gradient Boosting Classifier is {gb_acc} \n")

print(f"Confusion Matrix :- \n{confusion_matrix(y_test, gb.predict(X_test))}\n")
print(f"Classification Report :- \n {classification_report(y_test, gb.predict(X_test))}")

<a id="xg-boost"></a>
<span style="font-size:20px; color:#368DC5; font-family:cursive;"> 4.2.5 XG Boost Classifier</span>

In [None]:
from xgboost import XGBClassifier

xgb = XGBClassifier(booster = 'gbtree', learning_rate = 0.1, max_depth = 5, n_estimators = 180)
xgb.fit(X_train, y_train)

xgb_acc = accuracy_score(y_test, xgb.predict(X_test))

print(f"Training Accuracy of XgBoost is {accuracy_score(y_train, xgb.predict(X_train))}")
print(f"Test Accuracy of XgBoost is {xgb_acc} \n")

print(f"Confusion Matrix :- \n{confusion_matrix(y_test, xgb.predict(X_test))}\n")
print(f"Classification Report :- \n {classification_report(y_test, xgb.predict(X_test))}")

<a id="comparing-models"></a>
## <center style="color:#0178BD; font-family:cursive;"> 4.3 Comparing Models</center>

In [None]:
models = pd.DataFrame({
    'Model' : ['Logistic Regression', 'Decision Tree Classifier', 'Random Forest Classifier',
             'Gradient Boosting Classifier', 'XgBoost'],
    'Score' : [lr_acc, dtc_acc, rd_clf_acc, gb_acc, xgb_acc]
})


models.sort_values(by = 'Score', ascending = False)

In [None]:
plt.figure(figsize = (10, 5))
sns.set_style('darkgrid')

sns.barplot(x = 'Score', y = 'Model', data = models)
plt.show()

In [None]:
xgb_prediction = xgb.predict(test_df)
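# Reuse the sample submission for its PassengerId column and overwrite Survived
# (this relies on test.csv and gender_submission.csv listing passengers in the same order)
submission = pd.read_csv('../input/titanic/gender_submission.csv')
submission['Survived'] = xgb_prediction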
submission.to_csv('Submission.csv', index = False)
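
Before uploading, it can help to sanity-check the shape of the submission. The sketch below builds one from stand-in values (`passenger_ids` and `predictions` are made up here; in the notebook they would come from the test file and the model).

```python
import numpy as np
import pandas as pd

# Stand-ins for illustration; not real model output.
passenger_ids = pd.Series([892, 893, 894], name="PassengerId")
predictions = np.array([0, 1, 0])

submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Survived": predictions.astype(int),
})

# One prediction per passenger, and only the two valid labels.
assert len(submission) == len(passenger_ids)
assert submission["Survived"].isin([0, 1]).all()
print(submission)
```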

<a id="acknowledgements"></a>
# <center style="color:#0178BD; font-family:cursive;"> Acknowledgements</center>

EDA Inspiration from:

https://www.kaggle.com/code/namanmanchanda/pima-indian-diabetes-eda-and-prediction/notebook

https://www.kaggle.com/code/shubhamksingh/create-beautiful-notebooks-formatting-tutorial/notebook



<a id="the-end"></a>
# <center style="color:#0178BD; font-family:cursive;"> THE END</center>