I have made several adaptations to the MNIST NN code to improve evaluation and to help with tuning its parameters.

------- FIRST, I have incorporated Stratified Cross-validation -----------------
Repeated runs of any given NN configuration (layers, parameters) give quite variable results.
This makes it difficult to compare configurations without repeating each test several times.
Stratified cross-validation provides these repeats in a structured way: every sample is used for testing exactly once, and every fold keeps the same class proportions as the full data set.
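
To compare configurations, the per-fold accuracies can be reduced to a mean and a spread. The following is a minimal sketch that uses the fold accuracies reported at the end of this notebook as sample data; the helper name `summarise` is hypothetical, and using the sample standard deviation as the measure of spread is an assumption.

```python
import numpy as np

# Per-fold accuracies copied from the output at the end of this notebook.
fold_scores = np.array([
    0.9743102758896441, 0.9799040191961608, 0.9865, 0.99,
    0.993099309930993, 0.9938993899389938, 0.9949979991996799,
])

def summarise(scores):
    """Return (mean, sample standard deviation) of a 1-D array of scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)

mean, std = summarise(fold_scores)
print('accuracy: %.4f +/- %.4f over %d folds' % (mean, std, len(fold_scores)))
```

Two configurations whose means differ by less than this spread are probably not distinguishable from a single run of each.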

NOTE: as I am using cross-validation, I have merged the MNIST training and test sets into a single data set (70,000 samples).
7-fold cross-validation is then used to evaluate the model (approximately 10,000 test samples per fold).
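
To illustrate what stratification does, here is a NumPy-only sketch that deals each class's samples evenly across 7 folds. It is not scikit-learn's `StratifiedKFold` implementation; `stratified_folds` is a hypothetical helper, and the labels are synthetic stand-ins for the 70,000 merged MNIST labels.

```python
import numpy as np

def stratified_folds(labels, n_splits, seed=0):
    """Assign each sample to one of n_splits folds, one class at a time."""
    rng = np.random.RandomState(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal this class's samples round-robin so every fold gets an equal share.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of

# Synthetic labels: 10 classes with slightly uneven frequencies (an assumption).
rng = np.random.RandomState(7)
labels = rng.choice(10, size=70000,
                    p=[0.10, 0.11, 0.10, 0.10, 0.10, 0.09, 0.10, 0.10, 0.10, 0.10])
fold_of = stratified_folds(labels, n_splits=7)

for k in range(7):
    test = np.flatnonzero(fold_of == k)
    share_of_ones = np.mean(labels[test] == 1)
    print('fold', k + 1, '-', len(test), 'test samples,',
          'class 1 share: %.4f' % share_of_ones)
```

Because each class is split separately, the fold sizes land close to, but not always exactly on, 10,000, which matches the slightly uneven test-set sizes seen in the results.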

------- SECOND, I have also parameterised the number of hidden layers -----------

The model below applies 2 hidden layers, each followed by a dropout layer with rate 0.2.
Testing various configurations (1 to 4 hidden layers, a single final dropout layer, etc.) has shown this to be the best-performing configuration.
It currently uses the default hyper-parameters for the Adam optimizer (further hyper-parameter testing to follow).
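
As a sanity check on the parameterised architecture, the number of trainable parameters can be worked out by hand. This is a small sketch assuming 64-unit hidden layers, one learned PReLU slope per unit and a 10-way softmax output, as in the code below; the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

def prelu_params(n_units):
    """One learned slope per unit (the Keras PReLU default)."""
    return n_units

def count_params(n_inputs=784, n_hidden_layers=2, width=64, n_classes=10):
    total, n_in = 0, n_inputs
    for _ in range(n_hidden_layers):
        # Dropout contributes no parameters.
        total += dense_params(n_in, width) + prelu_params(width)
        n_in = width
    return total + dense_params(n_in, n_classes)  # softmax output layer

print(dense_params(784, 64))  # first hidden Dense layer
print(dense_params(64, 64))   # second hidden Dense layer
print(count_params())         # whole network
```

The two per-layer values can be compared against the `Param #` column of the model summary in the output.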

In [1]:
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.layers import Input, Dense, Activation, Dropout, PReLU

from keras.optimizers import Adam
from keras.models import Model
from keras.utils import np_utils

from sklearn.model_selection import StratifiedKFold

batch_size = 128
nb_classes = 10
nb_epoch = 20
n_hidden_layers = 2
seed = 7

(X, y), (X_test, y_test) = mnist.load_data()

X = X.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X = np.concatenate((X, X_test), axis=0) #merge into a single sample-set for Cross-validation
X = X.astype('float32')
X /= 255

Y = np.concatenate((y, y_test), axis=0) #merge into a single target-set for Cross-validation
y = np_utils.to_categorical(Y, nb_classes) #convert class vectors to binary class matrices
                                           #(for use by NN model - NOTE class vectors still needed for folding)
inputs = Input(shape=(784,))
x = inputs
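for _ in range(n_hidden_layers):
    x = Dense(64)(x)          # fully connected layer, 64 units
    x = PReLU()(x)            # non-linearity with a learned slope per unit
    x = Dropout(rate=0.2)(x)  # drop 20% of activations during training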

predictions = Dense(nb_classes, activation='softmax')(x)
    
model = Model(inputs=inputs, outputs=predictions)
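# Explicit optimizer object, kept for later hyper-parameter tuning (not yet passed to compile)
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.0, amsgrad=False)
model.compile(optimizer='adam',  # string alias: Adam with the Keras default hyper-parameters
              loss='categorical_crossentropy',
              metrics=['accuracy'])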

model.summary()

scores = []
fold = 0
k_fold = StratifiedKFold(n_splits=7, shuffle=True, random_state=seed)
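# NOTE: the same model instance is reused for every fold, so each fit() call
# continues training from the weights left by the previous fold.
for train, test in k_fold.split(X, Y):
    model.fit(X[train], y[train],
              batch_size=batch_size, epochs=nb_epoch,  # 'epochs' replaces the deprecated 'nb_epoch'
              verbose=0, validation_data=(X[test], y[test]))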
    score = model.evaluate(X[test], y[test], verbose=0)
    scores.append(score[1])
    fold += 1
    print('Score - fold', fold, ":", score[1], "--", len(test), "indices")

print('Average score:', np.mean(scores))

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                50240     
_________________________________________________________________
p_re_lu_1 (PReLU)            (None, 64)                64        
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                4160      
_________________________________________________________________
p_re_lu_2 (PReLU)            (None, 64)                64        
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0         
__________



Score - fold 1 : 0.9743102758896441 -- 10004 indices
Score - fold 2 : 0.9799040191961608 -- 10002 indices
Score - fold 3 : 0.9865 -- 10000 indices
Score - fold 4 : 0.99 -- 10000 indices
Score - fold 5 : 0.993099309930993 -- 9999 indices
Score - fold 6 : 0.9938993899389938 -- 9999 indices
Score - fold 7 : 0.9949979991996799 -- 9996 indices
Average score: 0.9875301420222102
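
The fold scores rise steadily from 0.974 to 0.995. A likely explanation is that `model` is built once, outside the loop, so later folds are tested on samples the network has already trained on in earlier folds, which would inflate the average. A minimal sketch of the usual remedy, building a fresh model for every fold, follows. `cross_validate`, `build_model` and `MajorityClassModel` are hypothetical names, and the stand-in model exists only so that the sketch runs without Keras.

```python
import numpy as np

class MajorityClassModel:
    """Stand-in with Keras-style fit/evaluate; predicts the commonest training class."""
    def fit(self, X, y, **kwargs):
        self.majority = np.bincount(y.argmax(axis=1)).argmax()
        return self

    def evaluate(self, X, y, verbose=0):
        acc = float(np.mean(y.argmax(axis=1) == self.majority))
        return [float('nan'), acc]  # [loss, accuracy]; loss is not modelled here

def cross_validate(build_model, X, y, folds, **fit_kwargs):
    """Train a new model on each (train, test) index pair; return fold accuracies."""
    scores = []
    for train, test in folds:
        model = build_model()  # fresh weights for every fold
        model.fit(X[train], y[train], **fit_kwargs)
        scores.append(model.evaluate(X[test], y[test], verbose=0)[1])
    return np.array(scores)

# Tiny synthetic data set: 12 samples, 3 one-hot classes, class 0 the most common.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
X = np.zeros((12, 4), dtype='float32')
y = np.eye(3)[labels]
idx = np.arange(12)
folds = [(idx[idx % 3 != k], idx[idx % 3 == k]) for k in range(3)]

scores = cross_validate(MajorityClassModel, X, y, folds, batch_size=128, epochs=20)
print(scores, scores.mean())
```

With Keras, `build_model` would be a function wrapping the layer-building and `compile` calls from the cell above, and `folds` would be `k_fold.split(X, Y)`.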
