## Network Architecture for the EMNIST Dataset

In [1]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras import callbacks
from keras import regularizers
from scipy import io as sio

Using TensorFlow backend.

In [2]:
batch_size = 128
num_classes = 26
epochs = 20

In the cell below, the EMNIST Letters dataset is loaded and assigned to variables that hold the training and test sets.

In [3]:
mat = sio.loadmat('emnist-letters.mat')
data = mat['dataset']
x_train = data['train'][0,0]['images'][0,0]
y_train = data['train'][0,0]['labels'][0,0]
x_test = data['test'][0,0]['images'][0,0]
y_test = data['test'][0,0]['labels'][0,0]

Each image is flattened into a 784-element vector (28 × 28 pixels), converted to `float32`, and divided by 255, the maximum value of a byte, so that every input feature lies between 0.0 and 1.0.
Scaling the inputs this way lets the default learning rate (and other default hyperparameters) work reasonably well and keeps the cost within a reasonable range.

In [4]:
x_train = x_train.reshape(124800, 784)
x_test = x_test.reshape(20800, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
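
As a quick sanity check of the flatten-and-scale step, the sketch below applies the same operations to a small synthetic batch. The array `fake_images` and its size are made up for illustration and only stand in for the real EMNIST arrays.

```python
import numpy as np

# Hypothetical stand-in for EMNIST images: 4 random 28x28 uint8 images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 0      # make sure both extremes are present
fake_images[0, 0, 1] = 255

# Flatten each image to 784 values, switch to float32, scale to [0, 1].
flat = fake_images.reshape(len(fake_images), -1).astype('float32')
flat /= 255

print(flat.shape, flat.dtype, flat.min(), flat.max())
```

Passing `-1` as the second dimension lets NumPy infer the 784 columns, which avoids hard-coding the number of samples as the cell above does.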

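Convert the class vector (of integers) to a binary (one-hot) class matrix, which has one column per class and the same number of rows as the input. `num_classes+1` columns are requested, giving 27 columns for 26 letters, which suggests that the letter labels start at 1 rather than 0 and leave column 0 unused. The output is used with the categorical cross-entropy loss function.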

In [5]:
y_train = keras.utils.to_categorical(y_train, num_classes+1)
y_test = keras.utils.to_categorical(y_test, num_classes+1)
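
To make the encoding concrete, here is a minimal NumPy sketch of a one-hot conversion. It is not Keras's implementation of `to_categorical`; the helper `one_hot` and the sample labels are hypothetical, and the sketch assumes, as the `num_classes+1` above suggests, that the letter labels run from 1 to 26.

```python
import numpy as np

def one_hot(labels, width):
    """Turn integer labels into a (len(labels), width) matrix of 0s and 1s."""
    labels = np.asarray(labels).ravel()          # accepts (n,) or (n, 1) input
    encoded = np.zeros((labels.size, width), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

num_classes = 26
labels = np.array([[1], [26], [3]])              # made-up labels in 1..26
encoded = one_hot(labels, num_classes + 1)

print(encoded.shape)            # (3, 27)
print(encoded.argmax(axis=1))   # recovers the original labels
```

With labels in 1..26, a width of 26 would fail on the label 26 with an index error, which is why the extra column is needed under this assumption.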

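Create the training and validation sets after one-hot encoding: the last 20,800 rows of the training data (the same size as the test set) are held out for validation, and the remaining 104,000 rows are used for training.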

In [6]:
val_start = x_train.shape[0] - x_test.shape[0]
x_val = x_train[val_start:x_train.shape[0],:]
y_val = y_train[val_start:x_train.shape[0]]
x_train = x_train[0:val_start,:]
y_train = y_train[0:val_start]
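
The same tail split can be written as a small helper. This is only a sketch on toy arrays; the function name `holdout_tail` and the data are hypothetical.

```python
import numpy as np

def holdout_tail(x, y, n_val):
    """Split off the last n_val rows of x and y as a validation set."""
    split = x.shape[0] - n_val
    return (x[:split], y[:split]), (x[split:], y[split:])

# Toy data: 10 samples with 2 features each.
x = np.arange(20, dtype='float32').reshape(10, 2)
y = np.arange(10)

(x_tr, y_tr), (x_va, y_va) = holdout_tail(x, y, n_val=3)
print(x_tr.shape, x_va.shape)   # (7, 2) (3, 2)
```

A tail split does not shuffle the data, so it only gives a representative validation set if the rows are not ordered by class; whether that holds for this file is not checked here.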

Early stopping parameters:
+ `monitor` is the quantity to watch, here the validation loss (given by 'val_loss'); training ends when this performance measure stops improving
+ `mode` set to 'auto' infers the direction from the name of the monitored quantity (given by `monitor`); it is min for 'val_loss', max for 'val_acc'
+ `verbose` set to 1 shows the epoch in which training stopped
+ `patience` is the number of epochs without improvement after which training is stopped
+ `min_delta` is the minimum change in the monitored quantity that counts as an improvement
+ `baseline` is the value the monitored quantity has to reach; training will stop if the model doesn't show improvement over the baseline (`None` disables this check)

In [7]:
early_stopping = callbacks.EarlyStopping(monitor='val_loss', 
                                         mode='auto', 
                                         verbose=1,
                                         patience=5, 
                                         min_delta=0.05, 
                                         baseline=None)
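
The interaction of `patience` and `min_delta` is easier to see in a small simulation. The function below is a simplified sketch of the stopping rule for a quantity that should decrease, not Keras's actual code; the name `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.05):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improved by more than min_delta: remember it, reset the counter.
            best = loss
            wait = 0
        else:
            # Not enough improvement: count the epoch against patience.
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: large gains at first, then tiny ones.
losses = [0.90, 0.60, 0.50, 0.48, 0.47, 0.46, 0.46, 0.45, 0.45, 0.45]
print(early_stopping_epoch(losses))   # 8
```

In this made-up run the loss keeps falling slightly after epoch 3, but never by more than 0.05, so five such epochs in a row end training at epoch 8. A `min_delta` of 0.05 is therefore a fairly strict threshold once the loss curve flattens.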

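Create a linear stack of three `Dense` layers: a 512-unit layer that takes the 784 input features, a hidden layer, and a softmax output layer. Dropout with a rate of 0.2 follows each of the first two layers.
The hidden layer has 2048 neurons and uses L2 regularization. (Hidden-layer activations are compared in the **Observations** section below.)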

In [8]:
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(2048, activation='relu',
                kernel_regularizer=regularizers.l2(0.001)))
model.add(Dropout(0.2))
model.add(Dense(num_classes+1, activation='softmax'))
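
The parameter counts that `model.summary()` reports for this stack can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per output, and dropout layers add no parameters. The helper `dense_params` below is a hypothetical name used only for this arithmetic check.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Input width followed by the output width of each Dense layer.
layer_sizes = [784, 512, 2048, 27]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)        # [401920, 1050624, 55323]
print(sum(counts))   # 1507867
```

Most of the roughly 1.5 million parameters sit in the 512-to-2048 connection, which is also the layer that carries the L2 penalty.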

In [9]:
model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[early_stopping])

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 2048)              1050624   
_________________________________________________________________
dropout_2 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 27)                55323     
=================================================================
Total params: 1,507,867
Trainable params: 1,507,867
Non-trainable params: 0
_________________________________________________________________

Train on 104000 samples, validate on 20800 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epo

In [10]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.4151294681659112
Test accuracy: 0.9102884531021118


## Observations

+ Running `mnist_mlp.py` produces an accuracy ranging from 98.2% to 98.4%, which is very close to the accuracy reported in the GitHub repository.

+ With the validation set, the highest accuracy was obtained with ReLU activation, marginally above 90%. (The validation set was used to run the program multiple times for verification.)

+ Results with the validation set passed as the model's `validation_data`:

| Hidden Layer Activation | Accuracy | Loss | Epochs |
| --- | --- | --- | --- |
| ReLU | 90.53% | 0.4206 | 9 (early stopping) |
| Softmax | 85.28% | 0.7445 | 20 |
| tanh | 89.95% | 0.4135 | 10 (early stopping) |
| Sigmoid | 90.31% | 0.4109 | 16 (early stopping) |

+ Results with the test set passed as the model's `validation_data`:

| Hidden Layer Activation | Accuracy | Loss | Epochs |
| --- | --- | --- | --- |
| ReLU | 90.08% | 0.4409 | 11 (early stopping) |
| Softmax | 85.78% | 0.7524 | 20 |
| tanh | 90.20% | 0.3934 | 13 (early stopping) |
| Sigmoid | 89.82% | 0.4275 | 13 (early stopping) |