# Task 3: Neural Networks

Multi-class Classification: Your goal is to predict a discrete value y (0, 1, 2, 3 or 4) based on a vector x.

Potential approaches / tools to consider: Neural networks / Deep Learning (Theano, TensorFlow, Torch, Lasagne)

In [112]:
import tensorflow 
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD
from keras.callbacks import History, Callback
from keras.utils import np_utils, to_categorical
from sklearn import model_selection
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

%matplotlib inline

#### Data Import

In [136]:
train = pd.read_hdf("data/train.h5", "train")
train_labels = train['y'].as_matrix()
train_data = train.ix[:, 1:].astype(float).as_matrix()
test_data = pd.read_hdf("data/test.h5", "test").as_matrix()

#### Split Data into Train and Validation Set 

In [137]:
X_train, X_val, y_train, y_val = model_selection.train_test_split(train_data, train_labels, 
                                                                    test_size=0.33, random_state=42)

In [138]:
y_train.shape

(30367,)

#### Convert Labels

In [139]:
labels_cat = to_categorical(train_labels, num_classes=5)
y_train_cat = to_categorical(y_train, num_classes=5)
y_val_cat = to_categorical(y_val, num_classes=5)

### Neural Network Model: Baseline Model

The function below creates a baseline neural network, a simple, fully connected network with one hidden layer that contains 100 neurons. The hidden layer uses a rectifier activation function which is a good practice. The output value with the largest value will be taken as the class predicted by the model.

The **network topology** can be summarised by: 
*100 inputs -> [100 hidden nodes] -> 5 outputs* 

In [140]:
model = Sequential()
model.add(Dense(64, input_dim=100, kernel_initializer='normal', 
                activation='relu'))
model.add(Dense(5, kernel_initializer='normal', activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])

In [141]:
model.fit(train_data, labels_cat, epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fb7d0759438>

In [142]:
model.predict(test_data, batch_size=64)

array([[  9.62454021e-01,   2.68303696e-03,   4.30534966e-03,
          6.78525481e-04,   2.98790243e-02],
       [  6.39653355e-02,   7.73741817e-03,   1.56437233e-01,
          1.07634207e-02,   7.61096597e-01],
       [  6.65887725e-04,   9.10675408e-06,   4.07338934e-03,
          2.61279172e-04,   9.94990289e-01],
       ..., 
       [  3.93874245e-03,   4.11773226e-06,   2.94927275e-04,
          9.95600462e-01,   1.61666161e-04],
       [  1.50187657e-06,   1.59144520e-06,   9.98832643e-01,
          1.16423471e-03,   3.89841981e-09],
       [  8.94114899e-04,   9.49363597e-03,   4.99720842e-01,
          3.02307599e-04,   4.89589125e-01]], dtype=float32)

### Multilayer Perceptron (MLP) for multi-class softmax classification

In [143]:
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.

model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.fit(X_train, y_train_cat, epochs=20, batch_size=128)
# score = model.evaluate(X_test, y_test_cat, batch_size=128)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7fb7f03aeb38>

In [144]:
score

[0.52640881792718075, 0.8274386573134318]

### Theory and Model Design

**Activation function** of a node defines the output of that node given an input or set of inputs.
- softmax
- elu
- softpuls
- softsign
- relu
- tanh
- sigmoid
- hard_sigmoid
- linear

Fully connected layers are defined using the *Dense* class. We can specify the number of neurons in the layer as the first argument, the initialization method as the second argument as *init* and specify the activation function using the activation argument.

In [145]:
model = Sequential()
model.add(Dense(32, input_dim=100, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [148]:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

In [151]:
history = History()
model.fit(X_train, y_train_cat, epochs=20, batch_size=128)


ValueError: Error when checking model target: expected dense_54 to have shape (None, 1) but got array with shape (30367, 5)

In [150]:
plt.plot(history.epoch, history.history["acc"])

AttributeError: 'History' object has no attribute 'epoch'

In [50]:
scores = model.evaluate(X_train, y_train_cat)



'categorical_crossentropy'

In [41]:
predictions = model.predict(X_test)

In [45]:
rounded = [np.argmax(x) for x in predictions]

14957

In [47]:
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test, rounded)

In [48]:
acc

0.92705756501972325