# Section 1-1 - Basic Neural Network - Titanic

In the previous section, we simply iterated through randomly generated matrices and chose the best-performing one. We now build on this approach by reducing loss systematically via stochastic gradient descent. In particular, we'll be using TensorFlow, an open-source library developed by Google, and Keras, a high-level wrapper on top of TensorFlow.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from time import time

np.random.seed(1337)

df = pd.read_csv('data/titanic.csv')

In [2]:
df_train = df.iloc[:712, :]

df_train = df_train.drop(['Name', 'Ticket', 'Cabin', 'Embarked'], axis=1)

age_mean = df_train['Age'].mean()
df_train['Age'] = df_train['Age'].fillna(age_mean)
df_train['Sex'] = df_train['Sex'].map({'female': 0, 'male': 1}).astype(int)

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
scaler = StandardScaler()

X_train = scaler.fit_transform(df_train[features].values)
y_train = df_train['Survived'].values
y_train_onehot = pd.get_dummies(df_train['Survived']).values

In [3]:
df_test = df.iloc[712:, :]

df_test = df_test.drop(['Name', 'Ticket', 'Cabin', 'Embarked'], axis=1)

df_test['Age'] = df_test['Age'].fillna(age_mean)
df_test['Sex'] = df_test['Sex'].map({'female': 0, 'male': 1}).astype(int)

X_test = scaler.transform(df_test[features].values)
y_test = df_test['Survived'].values

## Benchmark

In [4]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=0, verbose=3)
model = model.fit(X_train, y_train)

y_prediction = model.predict(X_test)
print("\naccuracy %s" % (np.sum(y_prediction == y_test) / float(len(y_test))))

building tree 1 of 10
building tree 2 of 10
building tree 3 of 10
building tree 4 of 10
building tree 5 of 10
building tree 6 of 10
building tree 7 of 10
building tree 8 of 10
building tree 9 of 10
building tree 10 of 10

accuracy 0.832402234637


[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:    0.0s finished
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:    0.0s finished


## 1-layer Neural Network

Instead of building a linear stack of layers by hand with NumPy, we'll implement our model using Keras. We initialize the model, add a layer that takes input vectors of length 6 and outputs vectors of length 2, and finally add a softmax layer. We configure the learning process in the compilation step by specifying the optimizer, loss function and performance metric.
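To make the layer description above concrete, here is a minimal sketch of what a 6-to-2 dense layer followed by softmax computes for a single passenger, along with the categorical cross-entropy loss. It is written in plain Python so it runs without dependencies; the input vector and weights are made-up values for illustration, not anything learned by the Keras model.

```python
import math

def dense(x, W, b):
    """Affine map: one output per column of W (len(x) inputs, len(b) outputs)."""
    return [sum(x_i * W[i][j] for i, x_i in enumerate(x)) + b[j]
            for j in range(len(b))]

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_onehot, p):
    """Loss for one example: minus the log-probability of the true class."""
    return -sum(t * math.log(q) for t, q in zip(y_onehot, p))

# Hypothetical values: one standardized passenger (6 features) and made-up weights.
x = [0.8, 1.0, -0.5, 0.4, -0.5, -0.3]
W = [[0.1, -0.1], [-0.4, 0.4], [0.0, 0.2], [0.3, -0.3], [0.1, 0.0], [-0.2, 0.2]]
b = [0.0, 0.0]

probs = softmax(dense(x, W, b))
# True class is column 0, i.e. Survived == 0 in the pd.get_dummies column order.
loss = categorical_crossentropy([1, 0], probs)
print(probs, loss)
```

Training amounts to adjusting `W` and `b` so that this loss, averaged over the training examples, goes down.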

Stochastic gradient descent works by changing the weights gradually in the 'direction' that decreases the average loss. In other words, a particular weight is increased if increasing it decreases the loss, and decreased if increasing it increases the loss. TensorFlow does the heavy lifting by efficiently handling these numerical computations under the hood. A simple example of gradient descent is illustrated in the Appendix.
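As a rough illustration of the update rule described above, the sketch below fits a single weight to toy data with per-sample gradient steps. The data, learning rate and epoch count are arbitrary choices for this example and are unrelated to the settings Keras uses for the Titanic model.

```python
import random

random.seed(0)

# Toy data generated from y = 3 * x, so the ideal weight is 3.0 (illustrative only).
data = [(x, 3.0 * x) for x in [-2.0, -1.0, 0.5, 1.0, 2.0]]

w = 0.0              # initial weight
learning_rate = 0.05

for epoch in range(50):
    random.shuffle(data)           # "stochastic": visit samples in random order
    for x, y in data:
        error = w * x - y          # prediction minus target
        grad = 2.0 * error * x     # d/dw of the squared error (w*x - y)**2
        w -= learning_rate * grad  # step against the gradient

print(w)
```

When the gradient is negative (increasing `w` would lower the loss) the weight goes up, and when it is positive the weight goes down, which is the rule stated in the paragraph above.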

In [5]:
from keras.models import Sequential
from keras.layers import Dense, Activation

start = time()

model = Sequential()
model.add(Dense(input_dim=6, output_dim=2))
model.add(Activation("softmax"))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.fit(X_train, y_train_onehot)

print('\ntime taken %s seconds' % str(time() - start))

Using TensorFlow backend.


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

time taken 0.753010988235 seconds


In [6]:
y_prediction = model.predict_classes(X_test)
print("\n\naccuracy %s" % (np.sum(y_prediction == y_test) / float(len(y_test))))

 32/179 [====>.........................] - ETA: 0s

accuracy 0.826815642458


We notice that the loss decreases steadily as the model 'learns' from the data. The loss had not yet levelled off by the final epoch, however, which suggests that it could be reduced further.

## 2-layer Neural Network

In [7]:
start = time()

model = Sequential()
model.add(Dense(input_dim=6, output_dim=100))
model.add(Dense(output_dim=2))
model.add(Activation("softmax"))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.fit(X_train, y_train_onehot)

print('\ntime taken %s seconds' % str(time() - start))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

time taken 0.86990904808 seconds


In [8]:
y_prediction = model.predict_classes(X_test)
print("\n\naccuracy %s" % (np.sum(y_prediction == y_test) / float(len(y_test))))

 32/179 [====>.........................] - ETA: 0s

accuracy 0.826815642458


The loss 'flattens out' more than in the 1-layer example. The accuracy, however, remains at 82.7%.
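One possible contributing factor, offered as an observation rather than something measured here: a Keras `Dense` layer applies no nonlinearity unless an activation is specified, so stacked `Dense` layers with nothing between them compose into a single linear map before the softmax. The sketch below checks this on made-up weights in plain Python (biases are omitted for brevity; with biases the composition is still a single affine map).

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical small weights: 3 inputs -> 4 hidden units -> 2 outputs.
W1 = [[0.5, -1.0, 0.25, 2.0],
      [1.5, 0.0, -0.5, 1.0],
      [-2.0, 0.75, 1.0, 0.5]]
W2 = [[1.0, -1.0],
      [0.5, 2.0],
      [-1.5, 0.25],
      [2.0, 1.0]]
x = [[1.0, -2.0, 0.5]]  # one input row

two_layers = matmul(matmul(x, W1), W2)  # pass through both layers in turn
one_layer = matmul(x, matmul(W1, W2))   # single layer whose weights are W1 times W2

print(two_layers, one_layer)
```

Both computations give the same scores, so a model built this way can represent only what a single linear layer can. Adding a nonlinear activation between the layers would be one way to change that.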

## 3-layer Neural Network

In [9]:
start = time()

model = Sequential()
model.add(Dense(input_dim=6, output_dim=100))
model.add(Dense(output_dim=100))
model.add(Dense(output_dim=2))
model.add(Activation("softmax"))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.fit(X_train, y_train_onehot)

print('\ntime taken %s seconds' % str(time() - start))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

time taken 1.18338799477 seconds


In [10]:
y_prediction = model.predict_classes(X_test)
print("\n\naccuracy %s" % (np.sum(y_prediction == y_test) / float(len(y_test))))

 32/179 [====>.........................] - ETA: 0s

accuracy 0.826815642458


While we're able to reduce loss on the training set a little further, the accuracy in all three cases is 82.7%. One likely reason is that the dataset is small, so there is not much for the model to 'learn' from (or, for that matter, to predict on). We'll apply the techniques developed so far to a much larger dataset in the next section.