### Multilayer Perceptron in Keras

A simple multilayer perceptron (MLP) that classifies MNIST handwritten-digit images, to showcase basic Keras functionality.

In [64]:
import os
import struct
import numpy as np
 
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    labels_path = os.path.join(path, 
                               '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path, 
                               '%s-images-idx3-ubyte' % kind)
        
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', 
                                 lbpath.read(8))
        labels = np.fromfile(lbpath, 
                             dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack(">IIII", 
                                               imgpath.read(16))
        images = np.fromfile(imgpath, 
                             dtype=np.uint8).reshape(len(labels), 784)
        images = ((images / 255.) - .5) * 2
 
    return images, labels
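The loader depends on the IDX file layout. Each file starts with a big-endian header: the magic number, the item count and, for image files, the row and column counts. The raw bytes follow. The magic number is 2049 for label files. The sketch below parses a tiny, made-up label file held in memory. It uses `np.frombuffer` because `np.fromfile` needs a real file. It also checks that the pixel rescaling in `load_mnist` maps the range [0, 255] onto [-1, 1].

```python
import io
import struct

import numpy as np

# A made-up, in-memory IDX label file: big-endian header (magic number 2049,
# item count 3) followed by one unsigned byte per label.
labels_bytes = struct.pack('>II', 2049, 3) + bytes([5, 0, 4])

buf = io.BytesIO(labels_bytes)
magic, n = struct.unpack('>II', buf.read(8))   # '>' = big-endian, 'I' = uint32
labels = np.frombuffer(buf.read(), dtype=np.uint8)
print(magic, n, labels)  # 2049 3 [5 0 4]

# Same rescaling as in load_mnist: 0 maps to -1, 127.5 to 0 and 255 to 1.
pixels = np.array([0.0, 127.5, 255.0])
scaled = ((pixels / 255.) - .5) * 2
print(scaled)
```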

In [65]:
X_train, y_train = load_mnist('image_datasets', kind='train')
print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1]))

Rows: 60000, columns: 784


In [66]:
X_test, y_test = load_mnist('image_datasets', kind='t10k')
print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1]))

Rows: 10000, columns: 784


Data transformations

In [67]:
# Standardization: subtract the per-pixel training mean, divide by the global training standard deviation
mean_vals = np.mean(X_train, axis=0)
std_val = np.std(X_train)

In [68]:
X_train_centered = (X_train - mean_vals)/std_val
X_test_centered = (X_test - mean_vals)/std_val
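The cell above centers each feature (pixel) on its own training mean but divides by a single scalar standard deviation. The test set is transformed with the training statistics, so no test information leaks into preprocessing. The toy arrays below are stand-ins for `X_train` and `X_test` and show the shapes involved.

A single global scale has a likely benefit here, though this is an inference rather than something stated above. MNIST has border pixels that are constant across the training set, and dividing those by a per-pixel standard deviation would mean dividing by zero.

```python
import numpy as np

# Toy stand-ins for X_train / X_test: 4 training and 2 test samples, 3 features.
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=2.0, scale=3.0, size=(4, 3))
X_te = rng.normal(loc=2.0, scale=3.0, size=(2, 3))

mean_vals = np.mean(X_tr, axis=0)  # one mean per feature (column), shape (3,)
std_val = np.std(X_tr)             # one scalar std over all training values

# Broadcasting subtracts the per-feature mean from every row; the test set
# reuses the training statistics.
X_tr_c = (X_tr - mean_vals) / std_val
X_te_c = (X_te - mean_vals) / std_val

print(mean_vals.shape, np.ndim(std_val))      # (3,) 0
print(np.allclose(X_tr_c.mean(axis=0), 0.0))  # True
```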

In [69]:
del X_train, X_test

In [70]:
print(X_train_centered.shape, y_train.shape)
print(X_test_centered.shape, y_test.shape)

(60000, 784) (60000,)
(10000, 784) (10000,)


In [71]:
import tensorflow as tf
import tensorflow.contrib.keras as keras

In [72]:
np.random.seed(1)
tf.set_random_seed(1)

In [73]:
# One-hot encode the target labels, i.e. the digit classes 0-9:
y_train_onehot = keras.utils.to_categorical(y_train)
print('First 3 labels: ', y_train[:3])
print('\nFirst 3 labels: (one-hot):\n', y_train_onehot[:3])

First 3 labels:  [5 0 4]

First 3 labels: (one-hot):
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
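One-hot encoding can be reproduced in plain NumPy by indexing an identity matrix with the labels. In the sketch below, `num_classes=10` is passed explicitly. When `num_classes` is omitted, `keras.utils.to_categorical` infers it as `max(y) + 1`, so the call above relies on the training labels containing a 9. `argmax` reverses the encoding.

```python
import numpy as np

y = np.array([5, 0, 4])   # the first three training labels printed above
num_classes = 10          # digits 0-9

# Row i of the identity matrix is the one-hot vector for class i.
onehot = np.eye(num_classes, dtype=np.float32)[y]
print(onehot)

# argmax recovers the integer labels from the one-hot rows.
print(np.argmax(onehot, axis=1))  # [5 0 4]
```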


Model Creation

In [74]:
# Create an empty Sequential model; layers are stacked onto it in order:
model = keras.models.Sequential()

In [75]:
# First hidden layer - NB: input_dim must equal the number of input features (784 pixels per image)
model.add(
    keras.layers.Dense(
        units=50,
        input_dim=X_train_centered.shape[1], 
        kernel_initializer='glorot_uniform', 
        bias_initializer='zeros', 
        activation='tanh'))

In [76]:
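# Second hidden layer - NB: only the first layer needs input_dim (Sequential infers it for later layers);
# if given, it must equal the number of units in the previous layer
model.add(
    keras.layers.Dense(
        units=50,
        input_dim=50, # number of units in the previous layer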
        kernel_initializer='glorot_uniform', 
        bias_initializer='zeros', 
        activation='tanh'))

In [77]:
# Output layer - NB: the number of units equals the number of classes,
# i.e. the number of columns of the one-hot encoded y matrix
model.add(
    keras.layers.Dense(
        units=y_train_onehot.shape[1], # 10 classes
        input_dim=50, # number of units in the previous layer
        kernel_initializer='glorot_uniform', 
        bias_initializer='zeros', 
        activation='softmax'))
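The stack above maps 784 inputs to 50 tanh units, then to 50 more tanh units, then to 10 softmax outputs. The NumPy sketch below reproduces that forward pass for checking shapes and parameter counts; it is not the Keras internals.

- **Initialization:** `glorot_uniform` is re-implemented by hand as U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
- **Inputs:** random noise stands in for images.
- **Parameters:** the total counts one weight per input-unit pair plus one bias per unit. This should match what `model.summary()` reports for the three Dense layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def glorot_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

sizes = [784, 50, 50, 10]  # input -> hidden -> hidden -> output
weights = [glorot_uniform(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]  # bias_initializer='zeros'

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    h = np.tanh(X @ weights[0] + biases[0])     # first hidden layer
    h = np.tanh(h @ weights[1] + biases[1])     # second hidden layer
    return softmax(h @ weights[2] + biases[2])  # class probabilities

X = rng.normal(size=(5, 784))  # 5 random "images", only for shape checking
probs = forward(X)
print(probs.shape)                         # (5, 10)
print(np.allclose(probs.sum(axis=1), 1.0))  # True

# Trainable parameters: 39250 + 2550 + 510
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(n_params)  # 42310
```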

In [78]:
# Set up the optimizer: stochastic gradient descent with momentum and learning-rate decay
sgd_optimizer = keras.optimizers.SGD(lr=0.001, decay=1e-7, momentum=0.9)
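The `decay` argument of this SGD optimizer is a learning-rate decay, not weight decay. Keras 2.x SGD shrinks the learning rate as lr_t = lr / (1 + decay * t). Here t counts mini-batch updates, not epochs. With momentum, each weight moves by v = momentum * v - lr_t * grad.

The NumPy sketch below re-creates that update rule under the assumption that it matches the Keras 2.x SGD described here. Later Keras releases rename `lr` to `learning_rate` and handle decay through learning-rate schedule objects. The quadratic loss used below is a toy chosen only to exercise the update.

```python
import numpy as np

lr0, decay, momentum = 0.001, 1e-7, 0.9  # the values passed to SGD above

def decayed_lr(iteration):
    # Time-based decay: lr_t = lr0 / (1 + decay * t), t = number of updates so far
    return lr0 / (1.0 + decay * iteration)

# 54,000 training samples / batch_size 100 = 540 updates per epoch;
# over 50 epochs that is 27,000 updates.
updates = (54000 // 100) * 50
print(updates, decayed_lr(updates))  # lr has shrunk by only about 0.27%

# Three momentum steps on the toy loss f(w) = 0.5 * w**2, whose gradient is w.
w, v = 1.0, 0.0
for t in range(3):
    g = w
    v = momentum * v - decayed_lr(t) * g
    w = w + v
print(w)
```

With `decay=1e-7`, the learning rate barely changes over the first run. So in this setup the decay term has little practical effect.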

In [79]:
# Compile the model with the optimizer and the loss function:
model.compile(optimizer=sgd_optimizer, loss='categorical_crossentropy')

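Categorical cross-entropy is the multiclass generalization of the logistic-regression cost function (log loss). For each sample, it is the negative log of the probability the model assigns to the true class, averaged over the samples.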
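The loss can be written in a few lines of NumPy. The sketch below clips probabilities to avoid `log(0)`; that clipping is an assumption of the sketch rather than a claim about Keras internals. The example then checks that with two classes the loss reduces to the binary log loss of logistic regression.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum_k y_k * log(p_k); clipping avoids log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # mean of -log(0.7) and -log(0.8)

# Two classes: identical to the logistic-regression loss for label 1.
p = 0.7  # predicted probability of the positive class
two_class = categorical_crossentropy(np.array([[0.0, 1.0]]), np.array([[1 - p, p]]))
binary = -(1 * np.log(p) + (1 - 1) * np.log(1 - p))
print(np.isclose(two_class, binary))  # True
```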

In [80]:
#Fitting the model:
history = model.fit(X_train_centered, y_train_onehot, batch_size=100, epochs=50, verbose=1, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/50
...
Epoch 50/50
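The counts in the log come from `validation_split=0.1`. Keras holds out the last 10% of the training arrays, taken before any shuffling, which is why 6,000 of the 60,000 samples are used for validation. With `batch_size=100`, each epoch then performs 540 gradient updates. The sketch below reproduces that arithmetic with index arrays only.

```python
import numpy as np

n_samples, validation_split, batch_size = 60000, 0.1, 100

# The validation set is the tail of the data, split off before shuffling.
split_at = int(n_samples * (1.0 - validation_split))
idx = np.arange(n_samples)
train_idx, val_idx = idx[:split_at], idx[split_at:]
print(len(train_idx), len(val_idx))  # 54000 6000

# Gradient updates per epoch (a final partial batch would count as one more).
steps_per_epoch = int(np.ceil(len(train_idx) / batch_size))
print(steps_per_epoch)  # 540
```

Because the split is not random, it is worth making sure the data is not sorted by class before calling `fit`. MNIST's training file is not sorted that way, so the held-out tail here contains a mix of digits.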


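The model converges to a training loss of 0.07 and a validation loss of 0.11. Let's see whether different hyperparameters do better. Note that calling `fit()` again continues training the same model from its current weights rather than starting from a freshly initialized one.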

In [81]:
# Fit again with a smaller batch size and more epochs (training resumes from the current weights):
history = model.fit(X_train_centered, y_train_onehot, batch_size=50, epochs=60, verbose=1, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/60
...
Epoch 60/60


The validation loss reached its minimum around the 39th epoch, when the training loss was 0.02. The gap between the two losses indicates mild overfitting. More importantly, the validation loss did not improve on the first run.
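Finding the epoch with the lowest validation loss is easy to automate. In Keras, the per-epoch values are stored in `history.history['val_loss']`. `keras.callbacks.EarlyStopping(monitor='val_loss', patience=...)` can also stop training once the loss stops improving.

The sketch below uses made-up validation losses, not this notebook's run. `best_epoch` and `stop_epoch` are hypothetical helpers. The stopping rule is a simplified version of patience-based early stopping: stop after `patience` epochs without a new minimum.

```python
import numpy as np

def best_epoch(val_losses):
    # 1-based epoch with the lowest validation loss, and that loss.
    i = int(np.argmin(val_losses))
    return i + 1, val_losses[i]

def stop_epoch(val_losses, patience):
    # Epoch at which training would halt after `patience` epochs without improvement.
    best, wait = np.inf, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up losses for illustration only.
val_losses = [0.30, 0.18, 0.12, 0.11, 0.115, 0.12, 0.13]
print(best_epoch(val_losses))               # (4, 0.11)
print(stop_epoch(val_losses, patience=2))   # 6
```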

Further hyperparameter tuning would be needed to finalize the model. For example, we could change the momentum, the learning-rate decay, the number of hidden units and the learning rate. Next, we compute some model evaluation metrics.

In [82]:
# Predicted class = index of the highest softmax probability
y_train_pred = np.argmax(model.predict(X_train_centered, verbose=0), axis=1)

In [83]:
correct_preds = np.sum(y_train == y_train_pred, axis=0)

In [84]:
train_acc = correct_preds / y_train.shape[0]
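For a softmax output layer, the predicted class is the column with the highest probability. Accuracy is the fraction of predictions that match the true labels, which `np.mean` on a boolean array computes in one step. The probabilities and labels below are made up to illustrate the computation.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples over 3 classes, plus true labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
y_true = np.array([0, 1, 2, 1])

y_pred = np.argmax(probs, axis=1)  # [0 1 2 0]

# Equivalent to np.sum(y_true == y_pred) / y_true.shape[0]
acc = np.mean(y_true == y_pred)
print(y_pred, acc)  # [0 1 2 0] 0.75
```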

In [85]:
print('First 3 predictions: ', y_train_pred[:3])

First 3 predictions:  [5 0 4]


In [86]:
print('Training accuracy: {}%'.format(train_acc*100))

Training accuracy: 99.605%


In [87]:
# Predicted class = index of the highest softmax probability
y_test_pred = np.argmax(model.predict(X_test_centered, verbose=0), axis=1)

In [88]:
correct_preds = np.sum(y_test == y_test_pred, axis=0)

In [89]:
test_acc = correct_preds / y_test.shape[0]

In [90]:
print('Test Accuracy: %.2f%%' %(test_acc*100))

Test Accuracy: 96.21%
