# Titanic

**The Challenge**

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question “What sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).


In [43]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data
df = pd.read_csv('./data/train.csv')

# Drop the columns that are not used as features
data = df.drop(['PassengerId', 'Name', 'Ticket', 'Fare', 'Cabin', 'Embarked'], axis=1)

# Filling missing values in Age column
data.fillna({'Age': data['Age'].mean()}, inplace=True)

# Changing Sex column to binary, male=0 and female=1
data['Sex'] = data['Sex'].map({'male':0, 'female':1})

# Creating a new column FamilySize
data['FamilySize'] = data['SibSp'] + data['Parch']

# Creating a new column IsAlone
data['IsAlone'] = 0
data.loc[data['FamilySize'] == 0, 'IsAlone'] = 1

# scaling the data
# scaler = StandardScaler()
# data[['Age']] = scaler.fit_transform(data[['Age']])

# Splitting the data and target
X = data.drop('Survived', axis=1)
y = data['Survived']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train.head(10)

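|     | Pclass | Sex | Age       | SibSp | Parch | FamilySize | IsAlone |
|-----|--------|-----|-----------|-------|-------|------------|---------|
| 331 | 1      | 0   | 45.5      | 0     | 0     | 0          | 1       |
| 733 | 2      | 0   | 23.0      | 0     | 0     | 0          | 1       |
| 382 | 3      | 0   | 32.0      | 0     | 0     | 0          | 1       |
| 704 | 3      | 0   | 26.0      | 1     | 0     | 1          | 0       |
| 813 | 3      | 1   | 6.0       | 4     | 2     | 6          | 0       |
| 118 | 1      | 0   | 24.0      | 0     | 1     | 1          | 0       |
| 536 | 1      | 0   | 45.0      | 0     | 0     | 0          | 1       |
| 361 | 2      | 0   | 29.0      | 1     | 0     | 1          | 0       |
| 29  | 3      | 0   | 29.699118 | 0     | 0     | 0          | 1       |
| 55  | 1      | 0   | 29.699118 | 0     | 0     | 0          | 1       |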
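
The `FamilySize` and `IsAlone` columns in the output above come from two short derivations. The sketch below repeats them on a tiny made-up frame so the logic can be checked by eye. The frame `toy` and its values are hypothetical and are not taken from `train.csv`.

```python
import pandas as pd

# Hypothetical three-passenger frame; the values are made up for illustration.
toy = pd.DataFrame({
    "SibSp": [0, 1, 4],   # siblings/spouses aboard
    "Parch": [0, 0, 2],   # parents/children aboard
})

# Same derivation as in the cell above: relatives aboard, not counting the passenger.
toy["FamilySize"] = toy["SibSp"] + toy["Parch"]

# A passenger with no relatives aboard is flagged as travelling alone.
toy["IsAlone"] = (toy["FamilySize"] == 0).astype(int)

print(toy)
```

The boolean comparison followed by `astype(int)` gives the same result as initialising the column to 0 and overwriting the matching rows with `.loc`.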


## Logistic Regression Model

In [44]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create a logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Predict the target values
y_pred = model.predict(X_test)

# Classification report
cr = classification_report(y_test, y_pred)
print(cr)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f'\nConfusion matrix:\n {cm}')


              precision    recall  f1-score   support

           0       0.81      0.88      0.84       105
           1       0.80      0.72      0.76        74

    accuracy                           0.81       179
   macro avg       0.81      0.80      0.80       179
weighted avg       0.81      0.81      0.81       179

Accuracy: 0.8100558659217877

Confusion matrix:
 [[92 13]
 [21 53]]
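
To connect the confusion matrix to the classification report, the figures for class 1 (survived) can be recomputed by hand. This is a minimal sketch that uses only the four counts printed above. It relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
# Confusion matrix printed above: rows are true classes, columns are predictions.
cm = [[92, 13],
      [21, 53]]

tn, fp = cm[0]  # true class 0: correctly rejected, wrongly predicted as survived
fn, tp = cm[1]  # true class 1: missed survivors, correctly predicted survivors

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision_1 = tp / (tp + fp)   # of those predicted to survive, how many did
recall_1 = tp / (tp + fn)      # of those who survived, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.4f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
```

The rounded values agree with the class 1 row of the report: 0.80 precision, 0.72 recall, and 0.76 F1.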


## Random Forest Classifier

In [45]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.metrics import confusion_matrix

# Create a random forest model
rf_model = RandomForestClassifier()

# Train the model
rf_model.fit(X_train, y_train)

# Predict the target values
y_pred = rf_model.predict(X_test)


# Classification report
cr = classification_report(y_test, y_pred)
print(cr)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f'\nConfusion matrix:\n {cm}')


              precision    recall  f1-score   support

           0       0.83      0.87      0.85       105
           1       0.80      0.76      0.78        74

    accuracy                           0.82       179
   macro avg       0.82      0.81      0.81       179
weighted avg       0.82      0.82      0.82       179

Accuracy: 0.8212290502793296

Confusion matrix:
 [[91 14]
 [18 56]]
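
`RandomForestClassifier()` is created above without a `random_state`, so the scores can shift slightly from one run to the next. The sketch below shows one way to make a forest repeatable by fixing the seed. It uses synthetic data from `make_classification` as a stand-in (an assumption, so that the block runs without `train.csv`), and the names `X_demo`, `rf_a`, and `rf_b` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); in the notebook, use X_train, y_train, X_test.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Two forests with the same seed draw the same bootstrap samples and feature subsets.
rf_a = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)
rf_b = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

pred_a = rf_a.predict(X_te)
pred_b = rf_b.predict(X_te)
same_predictions = bool((pred_a == pred_b).all())
print(f"Identical predictions with a fixed seed: {same_predictions}")
```

Passing the same `random_state` in the notebook cell would make the reported figures reproducible, although the exact numbers would likely differ from the unseeded run shown above.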


## Linear SVC

In [46]:
from sklearn.svm import LinearSVC

# Create a linear support vector classifier
svm_model = LinearSVC()

# Train the model
svm_model.fit(X_train, y_train)

# Predict the target values
y_pred = svm_model.predict(X_test)

# Classification report
cr = classification_report(y_test, y_pred)
print(cr)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f'\nConfusion matrix:\n {cm}')


              precision    recall  f1-score   support

           0       0.81      0.88      0.84       105
           1       0.80      0.70      0.75        74

    accuracy                           0.80       179
   macro avg       0.80      0.79      0.79       179
weighted avg       0.80      0.80      0.80       179

Accuracy: 0.8044692737430168

Confusion matrix:
 [[92 13]
 [22 52]]
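
Linear SVMs are generally sensitive to feature scale, and the `StandardScaler` lines in the first cell are commented out. One possible way to add scaling without leaking test statistics into training is to wrap the scaler and the classifier in a pipeline. The sketch below does that on synthetic data (an assumption, so that the block is self-contained); `scaled_svm` and `X_demo` are illustrative names. Whether scaling helps on the Titanic features is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (an assumption); in the notebook, use the existing split.
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is fitted on the training fold only, then applied to the test fold.
scaled_svm = make_pipeline(StandardScaler(), LinearSVC())
scaled_svm.fit(X_tr, y_tr)

demo_accuracy = scaled_svm.score(X_te, y_te)
print(f"Accuracy on the synthetic hold-out set: {demo_accuracy:.3f}")
```

In the notebook itself, `scaled_svm.fit(X_train, y_train)` followed by `scaled_svm.predict(X_test)` would slot into the cell above in place of the bare `LinearSVC()`.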


## XGBoost

In [47]:
import xgboost as xgb

# Create an XGBoost classifier
xgb_model = xgb.XGBClassifier()

# Train the model
xgb_model.fit(X_train, y_train)

# Predict the target values
y_pred = xgb_model.predict(X_test)

# Print the classification report
cr = classification_report(y_test, y_pred)
print(cr)

# Print the accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Print the confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f'\nConfusion matrix:\n {cm}')


              precision    recall  f1-score   support

           0       0.82      0.89      0.85       105
           1       0.82      0.73      0.77        74

    accuracy                           0.82       179
   macro avg       0.82      0.81      0.81       179
weighted avg       0.82      0.82      0.82       179

Accuracy: 0.8212290502793296

Confusion matrix:
 [[93 12]
 [20 54]]
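
The four models can be compared side by side from their confusion matrices alone. The sketch below recomputes accuracy and class 1 recall from the counts printed in each section. The helper `summarize` is a hypothetical name introduced for this illustration.

```python
# Confusion matrices copied from the outputs above (rows: true class, columns: predicted).
confusion_matrices = {
    "Logistic regression": [[92, 13], [21, 53]],
    "Random forest":       [[91, 14], [18, 56]],
    "Linear SVC":          [[92, 13], [22, 52]],
    "XGBoost":             [[93, 12], [20, 54]],
}

def summarize(cm):
    """Return accuracy and recall for class 1 (survived) from a 2x2 matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {"accuracy": (tn + tp) / total, "recall_survived": tp / (tp + fn)}

summary = {name: summarize(cm) for name, cm in confusion_matrices.items()}

# Print the models from highest to lowest accuracy.
for name, s in sorted(summary.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name:<20} accuracy={s['accuracy']:.4f} "
          f"recall(survived)={s['recall_survived']:.3f}")
```

On this split, random forest and XGBoost tie on accuracy (147 of 179 correct), and the gap to the other two models is two or three passengers, so a single 179-row hold-out set is probably too small to rank them reliably.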
