<a href="https://colab.research.google.com/github/langegang/hello-world/blob/master/hello.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Survivability on the Titanic

## Prepare data

Import modules and Titanic dataset

In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv('data.csv')

In [None]:
data.replace('?', np.nan, inplace=True)
data = data.astype({"age": np.float64, "fare": np.float64})

Relate the data to **survivability**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

fig, axs = plt.subplots(ncols=5, figsize=(30,5))
sns.violinplot(x="survived", y="age", hue="sex", data=data, ax=axs[0])
sns.pointplot(x="sibsp", y="survived", hue="sex", data=data, ax=axs[1])
sns.pointplot(x="parch", y="survived", hue="sex", data=data, ax=axs[2])
sns.pointplot(x="pclass", y="survived", hue="sex", data=data, ax=axs[3])
sns.violinplot(x="survived", y="fare", hue="sex", data=data, ax=axs[4])

Calculate correlations

In [None]:
data.replace({'male': 1, 'female': 0}, inplace=True)

In [None]:
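# numeric_only=True (pandas >= 1.5) skips any non-numeric columns,
# which would otherwise raise an error in pandas 2.x
data.corr(numeric_only=True).abs()[["survived"]]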

Combine sibsp (siblings, spouses) and parch (parents, children) into a single binary 'relatives' column (1 if the passenger had any relatives aboard, 0 otherwise), then check the correlations again

In [None]:
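# 1 if the passenger travelled with at least one relative, else 0
data['relatives'] = ((data['sibsp'] + data['parch']) > 0).astype(int)
# numeric_only=True (pandas >= 1.5) skips any non-numeric columns
data.corr(numeric_only=True).abs()[["survived"]]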

In [None]:
data = data[['sex','pclass','age','relatives','fare','survived']].dropna()

## Train and evaluate a model

Divide the dataset into training and test sets, then standardize the features

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data[['sex','pclass','age','relatives','fare']], data.survived, test_size=0.2, random_state=0)

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(x_train)
# Reuse the mean and std learned from the training set; do not refit on test data
X_test = sc.transform(x_test)
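
To illustrate why the scaler is usually fitted on the training split only, here is a minimal sketch in plain NumPy. It mimics what `StandardScaler` computes (subtract the mean, divide by the population standard deviation). The toy values are hypothetical and are not taken from `data.csv`.

```python
import numpy as np

# Toy feature column (hypothetical values, not taken from data.csv)
train = np.array([1.0, 2.0, 3.0, 4.0])
test = np.array([2.0, 6.0])

# "fit": learn the statistics from the training split only
mean, std = train.mean(), train.std()

# "transform": apply the training statistics to both splits
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

# Refitting on the test split rescales it with its own statistics instead,
# so the same raw value no longer maps to the same scaled value
test_refit = (test - test.mean()) / test.std()

print(test_scaled)
print(test_refit)
```

With the training statistics, a raw value of 2.0 maps to the same scaled value in both splits. After refitting on the test split it does not, so the model would see test features on a different scale from the one it was trained on.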

Create and train a Gaussian Naïve Bayes classifier

In [None]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)

Evaluate the trained model on the test data (prints the accuracy)

In [None]:
from sklearn import metrics
predict_test = model.predict(X_test)
print(metrics.accuracy_score(y_test, predict_test))

## Use a neural network to improve accuracy

In [None]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

Add the layers of the neural network

- The first layer takes 5 inputs, one per feature: sex, pclass, age, relatives, and fare.
- The last layer has a single output, since one value (a probability between 0 and 1 from the sigmoid activation) is enough to indicate whether a passenger would survive.
- The middle layer is kept at 5 units for simplicity; this is an arbitrary choice, and other sizes would also work.
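
To make the layer sizes more concrete, the following is a rough NumPy sketch of what a stack of fully connected layers with these dimensions computes. It is not the Keras implementation: the helper names (`dense`, `relu`, `sigmoid`), the random weights, and the three example rows are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, weights, bias, activation):
    """One fully connected layer: activation(x @ weights + bias)."""
    return activation(x @ weights + bias)

# Layer sizes: 5 input features -> 5 units -> 5 units -> 1 output
sizes = [5, 5, 5, 1]

# Small random weights stand in for trained values (illustrative only)
params = [
    (rng.uniform(-0.05, 0.05, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

x = rng.normal(size=(3, 5))  # 3 hypothetical passengers, 5 scaled features each
h = x
for i, (w, b) in enumerate(params):
    # relu on the hidden layers, sigmoid on the final layer
    act = sigmoid if i == len(params) - 1 else relu
    h = dense(h, w, b, act)

# Each layer has n_in * n_out weights plus n_out biases
n_params = sum(w.size + b.size for w, b in params)

print(h.shape)   # one value per passenger
print(n_params)  # total number of trainable values
```

Each row of `h` is a single number between 0 and 1, which can be read as a survival probability. The parameter count, (5·5 + 5) + (5·5 + 5) + (5·1 + 1) = 66, is the kind of total a model summary reports for layers of these sizes.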




In [None]:
model.add(Dense(5, kernel_initializer='uniform', activation='relu', input_dim=5))
model.add(Dense(5, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

In [None]:
model.summary()

Compile and train the new model

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=50)

In [None]:
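# predict_classes was removed from recent Keras/TensorFlow releases;
# threshold the sigmoid probabilities at 0.5 to get class labels instead
y_pred = (model.predict(X_test) > 0.5).astype("int32")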
print(metrics.accuracy_score(y_test, y_pred))
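
A network with a sigmoid output returns one probability per row rather than a class label, so the probabilities have to be thresholded before accuracy can be computed. The sketch below shows that step on made-up numbers; the probabilities, the labels, and the 0.5 cutoff are assumptions for illustration.

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1), as a model might return
probs = np.array([[0.91], [0.12], [0.50], [0.73]])
y_true = np.array([1, 0, 1, 0])  # hypothetical ground-truth labels

# Strictly greater than 0.5 counts as "survived"; flatten to shape (n_samples,)
y_hat = (probs > 0.5).astype("int32").ravel()

# Accuracy is the fraction of predictions that match the true labels
accuracy = (y_hat == y_true).mean()

print(y_hat)
print(accuracy)
```

The 0.5 cutoff is a convention, not a requirement; a different threshold trades false positives against false negatives.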