
**Project Description**

The Titanic problem is based on the sinking of the ‘unsinkable’ ship Titanic in April 1912. The dataset describes individual passengers: their age, sex, number of siblings and spouses aboard, port of embarkation, and whether or not they survived the disaster.
Given these features, the task is to predict whether an arbitrary passenger on the Titanic would have survived the sinking.

## Attribute Information
**PassengerId**- Unique ID of the passenger

**Pclass**- Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)

**Survived**- Survived (0 = No; 1 = Yes)

**Name**- Name of the passenger

**Sex**- Sex of the passenger (male, female)

**Age**- Age of the passenger

**SibSp**- Number of Siblings/Spouses Aboard

**Parch**- Number of Parents/Children Aboard

**Ticket**- Ticket Number

**Fare**- Passenger Fare (British pound)

**Cabin**- Cabin

**Embarked**- Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)



Dataset Link-
https://github.com/FlipRoboTechnologies/ML-Datasets/blob/main/Titanic/titanic_train.csv



https://github.com/nallagondu/ML-Datasets/blob/main/Titanic/titanic_train.csv



In [None]:
!pip install -q requests xlrd
import pandas as pd
import requests
import numpy as np


In [None]:
# Raw GitHub URL of the CSV file (the "blob" page URL returns HTML, not CSV)


url = "https://raw.githubusercontent.com/nallagondu/ML-Datasets/main/Titanic/titanic_train.csv"
df = pd.read_csv(url)
df


In [None]:
df.head(10)

**Prepare Dataset**

In [None]:
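def norm_name(x):
    # Strip punctuation and quotes from each space-separated token of the name
    return " ".join([v.strip(",()[].\"'") for v in x.split(" ")])

def ticket_no(x):
    # Last space-separated token of the ticket (usually the ticket number)
    return x.split(" ")[-1]

def ticket_item(x):
    # Ticket prefix joined with "_", or "NONE" when the ticket has no prefix
    items = x.split(" ")
    if len(items) == 1:
        return "NONE"
    return "_".join(items[0:-1])

def preprocess(df):
    # Work on a copy so the raw DataFrame is left untouched
    df = df.copy()
    df['Name'] = df['Name'].apply(norm_name)
    df['ticket_no'] = df['Ticket'].apply(ticket_no)
    df['ticket_item'] = df['Ticket'].apply(ticket_item)
    return df

preprocesseddata_df = preprocess(df)

preprocesseddata_df.head(20)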
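
To make the ticket parsing above concrete, here is a small standalone sketch that applies the same splitting rules to a few ticket strings. The helpers are restated so the snippet runs on its own, and the sample strings are illustrative assumptions about what the `Ticket` column can look like.

```python
# Same rules as ticket_no / ticket_item in the cell above, restated here
# so this snippet is self-contained. The sample strings are illustrative.
def ticket_no(x):
    return x.split(" ")[-1]

def ticket_item(x):
    items = x.split(" ")
    if len(items) == 1:
        return "NONE"
    return "_".join(items[0:-1])

samples = ["A/5 21171", "113803", "STON/O 2. 3101282", "LINE"]
parsed = [(ticket_item(t), ticket_no(t)) for t in samples]

for ticket, (prefix, number) in zip(samples, parsed):
    print(f"{ticket!r:22} prefix={prefix!r:12} number={number!r}")
```

Note that a ticket made of a single non-numeric token ends up in `ticket_no` unchanged, which is why a numeric check on that column is needed before casting it to an integer.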



In [None]:
preprocesseddata_df.columns


In [None]:
features_df = list(preprocesseddata_df.columns)

In [None]:
preprocesseddata_df = preprocesseddata_df.drop('Ticket', axis=1)
#preprocesseddata_df = preprocesseddata_df.drop("PassengerId",axis=1)
preprocesseddata_df

In [None]:
preprocesseddata_df.isnull().sum()

In [None]:
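# Assign the result back: calling fillna(inplace=True) on a selected column is deprecated in recent pandas
preprocesseddata_df['Age'] = preprocesseddata_df['Age'].fillna(preprocesseddata_df['Age'].median())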

In [None]:
preprocesseddata_df['Cabin'] = preprocesseddata_df['Cabin'].fillna('Unknown')

In [None]:
preprocesseddata_df['Embarked'] = preprocesseddata_df['Embarked'].fillna(preprocesseddata_df['Embarked'].mode()[0])

In [None]:
preprocesseddata_df.isnull().sum()
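
The three imputation rules used above (median for `Age`, a placeholder string for `Cabin`, most frequent value for `Embarked`) can be sketched on a tiny made-up frame. The `toy` values below are assumptions chosen only to show the behaviour, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per imputation rule (values are made up)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 26.0],
    "Cabin": ["C85", None, None, "E46"],
    "Embarked": ["S", "C", None, "S"],
})

# assign() returns a new frame, so the original stays untouched
filled = toy.assign(
    Age=toy["Age"].fillna(toy["Age"].median()),                  # numeric: median
    Cabin=toy["Cabin"].fillna("Unknown"),                        # text: placeholder
    Embarked=toy["Embarked"].fillna(toy["Embarked"].mode()[0]),  # categorical: mode
)

print(filled)
print(filled.isnull().sum())
```

The median is less sensitive to a few very old or very young passengers than the mean, which is one common reason to prefer it for a skewed column such as age.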

In [None]:
df = preprocesseddata_df.copy()
df

In [None]:
df['Age'].mean()

In [None]:
df['Age'].sum()

In [None]:
df['Survived'].count()

In [None]:
# Convert Sex into numerical values
# male = 1 and female = 0

df['Sex'] = df['Sex'].map({'male': 1, 'female': 0 })
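
One thing worth keeping in mind with this encoding: `Series.map` with a dict does not raise on values missing from the dict, it returns NaN for them. A minimal sketch on made-up values (the `sex` Series below is illustrative, including the deliberately mis-cased entry):

```python
import pandas as pd

# Illustrative values; "Male" is deliberately capitalised to show the pitfall
sex = pd.Series(["male", "female", "Male"])

encoded = sex.map({"male": 1, "female": 0})
print(encoded)

# Counting NaNs after the mapping is a cheap sanity check
unmapped = int(encoded.isna().sum())
print("unmapped values:", unmapped)
```

If the count is non-zero, normalising the strings first (for example with `.str.lower()`) is one possible remedy.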


In [None]:
df.describe()

In [None]:
df['Age'].describe()

In [None]:
df.isnull()

In [None]:
df.dtypes

In [None]:
non_numeric = df[~df['ticket_no'].str.isnumeric()]
non_numeric

In [None]:
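# Keep only rows whose ticket number is all digits; .copy() avoids SettingWithCopyWarning on later assignments
df = df[df['ticket_no'].str.isnumeric()].copy()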
df.head(10)

**All rows with a non-numeric ticket number have been removed**

In [None]:
df.dtypes

In [None]:
df['ticket_no'] = df['ticket_no'].astype(int)


In [None]:
df.loc[:, 'ticket_no'] = pd.to_numeric(df['ticket_no'], errors = 'coerce').fillna(0).astype(int)
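
The cells above combine two different ways of handling non-numeric ticket numbers: dropping the rows, and coercing bad values to a default. The sketch below contrasts them on a small made-up Series (the values are illustrative).

```python
import pandas as pd

# Illustrative ticket numbers; "LINE" cannot be parsed as an integer
tickets = pd.Series(["21171", "LINE", "3101282"], name="ticket_no")

# Option A: keep only all-digit rows, then cast (rows are dropped)
kept = tickets[tickets.str.isnumeric()].astype(int)

# Option B: keep every row and replace unparseable values with 0
coerced = pd.to_numeric(tickets, errors="coerce").fillna(0).astype(int)

print(kept.tolist())
print(coerced.tolist())
```

Once the rows have been filtered as in option A, the coercion in option B has nothing left to fix, so applying both in sequence is harmless but redundant.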

In [None]:
df.dtypes

In [None]:
df.head()

In [None]:
df.duplicated().any()

In [None]:
df.describe()

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(14,10))
qty = df['Survived'].value_counts()
sns.barplot(x=qty.index,y=qty.values,order=qty.index,palette='Dark2')
plt.title("Survived Feature Distribution",fontsize = 14)
for index,value in enumerate(qty.values):
  plt.text(index,value, value, fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
sns.barplot(x="Sex", y="Survived", data= df)
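# Survival rate (%) for women (Sex == 0) and men (Sex == 1); print both, since a cell only displays its last expression
print("Women survived (%):", df["Survived"][df["Sex"] == 0].value_counts(normalize = True)[1]*100)
print("Men survived (%):", df["Survived"][df["Sex"] == 1].value_counts(normalize = True)[1]*100)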

In [None]:
sns.barplot(x="Sex", y="Age", data= df)
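# Survival rates by sex, printed again for reference next to the age plot
print("Women survived (%):", df["Survived"][df["Sex"] == 0].value_counts(normalize = True)[1]*100)
print("Men survived (%):", df["Survived"][df["Sex"] == 1].value_counts(normalize = True)[1]*100)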

In [None]:
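# Pclass feature
sns.barplot(x="Pclass", y="Survived", data= df)
# Survival rate (%) per passenger class; indexing Survived with the Pclass values themselves would look up row labels, not classes
df.groupby("Pclass")["Survived"].mean()*100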


In [None]:
df.head(10)

In [None]:
#SibSp  Feature
sns.barplot(x="SibSp", y="Survived", data= df)
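# Print each rate, since a cell only displays its last expression
print("SibSp = 1 survived (%):", df["Survived"][df["SibSp"] == 1].value_counts(normalize = True)[1]*100)
print("SibSp = 2 survived (%):", df["Survived"][df["SibSp"] == 2].value_counts(normalize = True)[1]*100)
print("SibSp = 3 survived (%):", df["Survived"][df["SibSp"] == 3].value_counts(normalize = True)[1]*100)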

In [None]:
#Parch  Feature
sns.barplot(x="Parch", y="Survived", data= df)
df["Survived"][df["Parch"] == 1].value_counts(normalize = True)[1]*100


In [None]:
df.describe()

In [None]:
facet_grid = sns.FacetGrid(df, hue = 'Survived', aspect = 4)
facet_grid.map(sns.kdeplot,'Age',fill = True)
facet_grid.set(xlim=(0,df['Age'].max()))
facet_grid.add_legend()
plt.show()

In [None]:
facet_grid = sns.FacetGrid(df, hue = 'Survived', aspect = 4)
facet_grid.map(sns.kdeplot,'Age',fill = True)
facet_grid.set(xlim=(0,df['Age'].max()))
facet_grid.add_legend()
plt.xlim(10,70)

In [None]:
facet_grid = sns.FacetGrid(df, hue = 'Survived', aspect = 4)
facet_grid.map(sns.kdeplot,'Age',fill = True)
facet_grid.set(xlim=(0,df['Age'].max()))
facet_grid.add_legend()
plt.xlim(5,70)

In [None]:
facet_grid = sns.FacetGrid(df, hue = 'Survived', aspect = 4)
facet_grid.map(sns.kdeplot,'Age',fill = True)
facet_grid.set(xlim=(0,df['Age'].max()))
facet_grid.add_legend()
plt.xlim(10,50)

In [None]:
df.info()

In [None]:
# Survived is 0/1, so sum/len gives the fraction of survivors (not a percentage)
Women = df.loc[df.Sex == 0]['Survived']
womens_Sur_rate = sum(Women)/len(Women)
print("Fraction of women who survived:", womens_Sur_rate)


In [None]:
Mens = df.loc[df.Sex == 1]['Survived']
Mens_Sur_rate = sum(Mens)/len(Mens)
print("Fraction of men who survived:", Mens_Sur_rate)

**Fraction of women who survived: 0.7420382165605095 (about 74%); fraction of men who survived: 0.18848167539267016 (about 19%)**
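
The same per-group survival fractions can be obtained in one step with `groupby`. A minimal sketch on a made-up frame, assuming the 0 = female, 1 = male encoding used in this notebook (the `toy` rows are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up rows: Sex uses the notebook's encoding (0 = female, 1 = male)
toy = pd.DataFrame({
    "Sex":      [0, 0, 0, 0, 1, 1, 1, 1],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})

# The mean of a 0/1 column is the fraction of ones in each group
rates = toy.groupby("Sex")["Survived"].mean()
print(rates)
```

Because `Survived` only takes the values 0 and 1, the group mean equals `sum(group) / len(group)`, which is the calculation done by hand in the two cells above.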



In [None]:
#modeling
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier,ExtraTreeClassifier
from sklearn.ensemble import RandomForestClassifier,ExtraTreesClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split



In [None]:
y = df['Survived']
features = ['Pclass','Sex', 'SibSp', 'Parch']
X = df[features]
# train_test_split returns the train part before the test part for each input
X_train,X_test,y_train,y_test = train_test_split(X,y, test_size = 0.2,random_state = 43)
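
Since the unpacking order of `train_test_split` is easy to get wrong, a quick shape check can help. The sketch below uses a small synthetic array (an assumption for illustration) and shows that, for each input, the training part comes first and the test part second.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 samples, 2 features (illustrative only)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Order of the returned values: X_train, X_test, y_train, y_test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=43
)

print("train:", X_tr.shape, "test:", X_te.shape)
```

With `test_size=0.2`, the test split should hold about 20% of the rows; if the larger part ends up in the variable named "test", the names have been swapped.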


In [None]:
RFC_model = RandomForestClassifier(n_estimators=100,max_depth=5,random_state = 1)
# Fit on the training split only, so the rows in X_test stay unseen by the model
RFC_model.fit(X_train,y_train)

In [None]:
RFC_Pred = RFC_model.predict(X_test)
output = pd.DataFrame({'PassengerId': df.loc[X_test.index,'PassengerId'], 'Survived': RFC_Pred})
print(output)
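
The notebook stops at printing predictions. A natural next step is to score them against the held-out labels. The sketch below does this on synthetic data (an assumption, so it runs without the Titanic file), using the same model settings as above and `accuracy_score` from scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the four features; the label depends only on column 1
rng = np.random.default_rng(0)
X_syn = rng.integers(0, 3, size=(200, 4))
y_syn = (X_syn[:, 1] == 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=43
)

# Same hyperparameters as RFC_model in the notebook
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X_tr, y_tr)  # train on the training split only

acc = accuracy_score(y_te, model.predict(X_te))
print(f"hold-out accuracy: {acc:.3f}")
```

On the real data the equivalent call would be `accuracy_score(y_test, RFC_Pred)`, which is only meaningful when the model was not fitted on the test rows.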