<a href="https://colab.research.google.com/github/gothchico/handwritten-digit-recognition/blob/master/test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Machine Learning - Homework IV </h1>

### Sarah Seifi, ist197658, sarah.seifi@tecnico.ulisboa.pt
### Mahima Raut, ist197569, cs17b112@smail.iitm.ac.in


In this notebook, different neural network architectures are used to achieve the highest performance on the test set of the famous [MNIST](http://yann.lecun.com/exdb/mnist/) data set. 


1.   First, the use of Feed-Forward and Convolutional Neural Networks is explored.
2.   Secondly, the networks are explored using different numbers of hidden layers.
3.   As a last step, the impact of different regularization methods is assessed.



# **Importing Libraries**


* [tensorflow](https://www.tensorflow.org/): the neural network library
* [tensorflow_datasets](https://www.tensorflow.org/datasets): provides the datasets that we will use
* [numpy](https://numpy.org/): we will use it to store the data in array format for visualization
* [sklearn](https://scikit-learn.org/): provides a [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) implementation that we will use for visualization
* [matplotlib](https://matplotlib.org/): plotting library for visualization



In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

import numpy as np

import sklearn.decomposition
import matplotlib.pyplot as plt

# Load the data

The MNIST data set is widely used for introducing image classification problems. It has the following characteristics:

*   70k examples of handwritten digits
*   Image size: 28x28
*   1 channel
*   10 classes: [0-9]

In [None]:
mnist_data, mnist_info = tfds.load('mnist', with_info=True)

In [None]:
print(mnist_info)

Now, the data set is converted into NumPy arrays:

In [None]:
mnist_train_x = np.asarray([instance['image']/255 for instance in tfds.as_numpy(mnist_data['train'])])
mnist_train_y = np.asarray([instance['label'] for instance in tfds.as_numpy(mnist_data['train'])])

mnist_test_x = np.asarray([instance['image']/255 for instance in tfds.as_numpy(mnist_data['test'])])
mnist_test_y = np.asarray([instance['label'] for instance in tfds.as_numpy(mnist_data['test'])])
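As a rough illustration of what the division by 255 does, the sketch below applies the same scaling to a synthetic batch of images. The array `fake_images` is a hypothetical stand-in generated with NumPy, not real MNIST data, and the cast to `float32` is an optional assumption to save memory rather than something the cells above do.

```python
import numpy as np

# Hypothetical stand-in for four MNIST images: uint8 pixels, shape 28x28x1.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28, 1), dtype=np.uint8)

# Dividing by 255 maps the pixel range [0, 255] to [0, 1] and yields float64.
scaled = fake_images / 255

# Optional: float32 halves the memory footprint of the float64 result.
scaled32 = scaled.astype(np.float32)

print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
print(scaled32.nbytes, scaled.nbytes)
```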

# Models

Here, some networks are created and trained to approach the problem posed by the MNIST dataset.

## Feed-Forward Neural Network

Feed-Forward Neural Networks (FFNNs), also known as multilayer perceptrons, are the foundation of most deep learning models. CNNs are a specialized kind of feed-forward network, while RNNs extend the architecture with feedback connections. These networks are mostly used for supervised machine learning tasks, where labeled examples of the target function are available.

The main goal of an FFNN is to approximate some function f*. For example, a regression function y = f*(x) maps an input x to a value y. An FFNN defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation.
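To make the mapping y = f(x; θ) concrete, it can be written out for a network with a single tanh hidden layer and a softmax output, the kind of model built later in this notebook. The symbols W₁, b₁, W₂ and b₂ are notation introduced here for illustration only:

$$
f(x; \theta) = \operatorname{softmax}\big(W_2 \tanh(W_1 x + b_1) + b_2\big), \qquad \theta = \{W_1, b_1, W_2, b_2\}
$$

Training then amounts to adjusting the entries of θ so that f(x; θ) is as close as possible to f*(x) on the training examples.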

These networks are called feed-forward because information flows only in the forward direction, from the input through the hidden layers to the output. If feedback connections were added, for example from the last hidden layer back to the first, the result would be a recurrent neural network.

FFNNs are needed in addition to linear machine learning models because linear models can only represent linear functions, whereas neural networks can also represent non-linear ones. The hidden layers introduce non-linearity and change the representation of the data so that the target function can be approximated better.
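The role of the non-linearity can be checked numerically. The following is a minimal sketch on random data (all arrays are hypothetical and unrelated to MNIST): two stacked layers without an activation collapse into a single linear map, while inserting a tanh between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))    # 5 hypothetical inputs with 4 features
W1 = rng.normal(size=(4, 8))   # weights of a first "hidden" layer
W2 = rng.normal(size=(8, 3))   # weights of a second layer

# Two linear layers in sequence are equivalent to one layer with weights W1 @ W2.
two_linear = (x @ W1) @ W2
one_linear = x @ (W1 @ W2)
collapses = np.allclose(two_linear, one_linear)

# With a tanh activation in between, the composition is no longer that linear map.
with_tanh = np.tanh(x @ W1) @ W2
differs = not np.allclose(with_tanh, one_linear)

print(collapses, differs)
```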

---

As a first step, a baseline FFNN without hidden layers is built. The images are flattened before being passed to the Dense output layer:

In [None]:
mnist_baseline_model = tf.keras.Sequential(name='mnist_baseline')
mnist_baseline_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_baseline_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_baseline_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_baseline_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_baseline_model.summary()
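For intuition, the computation performed by this baseline (flatten, one dense layer with softmax, sparse categorical cross-entropy) can be sketched in plain NumPy. This is an illustrative re-implementation under simplifying assumptions, not the Keras code path; the helper names, the toy batch and the zero-initialised weights are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    # Pick the predicted probability of the true class for every example.
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

# Hypothetical toy batch: two blank 28x28x1 images with integer labels.
images = np.zeros((2, 28, 28, 1))
labels = np.array([3, 7])

# Zero weights, chosen only so that the result is easy to verify by hand.
W = np.zeros((28 * 28, 10))
b = np.zeros(10)

flat = images.reshape(len(images), -1)   # what the Flatten layer does
probs = softmax(flat @ W + b)            # what the Dense softmax layer does
loss = sparse_categorical_crossentropy(probs, labels)
print(flat.shape, loss)
```

With all-zero weights every class receives probability 0.1, so the loss equals ln(10), which is the value an untrained 10-class classifier typically starts near.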

The model can now be trained using part of the training set for validation, in order to control when to stop the training phase:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_baseline_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_baseline_model_train = mnist_baseline_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)
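The stopping rule can be illustrated with a simplified model of the patience logic. The function below is a hypothetical sketch, not the Keras implementation: it assumes that a higher monitored value is better (as with `val_accuracy`) and that only strict improvements reset the counter.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    stop_epoch is None if the patience is never exhausted.
    """
    best = float('-inf')
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation accuracies, one per epoch.
scores = [0.90, 0.92, 0.91, 0.92, 0.915]
print(early_stopping_epoch(scores, patience=3))
```

In this toy run, training stops at the fifth epoch, while the best score was reached at the second one. This is why the checkpoint of the best epoch, rather than the final weights, is loaded before evaluating on the test set.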

The evolution of the performance on the training and validation data is evaluated by plotting the results:



In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_baseline_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_baseline_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_baseline_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_baseline_model_train.history['val_accuracy'], '-g', label='Validation')

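# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)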
plt.show()

As a last step, the best model for the validation data is loaded and evaluated on the test set:

In [None]:
mnist_baseline_model.load_weights('mnist_baseline_best.h5')
loss, acc = mnist_baseline_model.evaluate(mnist_test_x, mnist_test_y)
print('Accuracy: {}'.format(acc))

## Convolutional Neural Networks

To create our CNN, instead of feeding the flattened image directly to the output layer, we first pass the image through a convolutional layer followed by a max pooling operation, and only then flatten the result. Since we are dealing with 2D data, we will use the [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layers.

In the convolutional layer, the *filters* parameter defines the number of kernels or filters used in the layer. The *kernel_size* parameter defines the size of the kernels. If only one number is provided, the kernel is assumed to be square. The stride values default to one, but can be changed using the *strides* parameter. Also, we can use same padding, by setting the *padding* parameter to 'same'.

For the pooling operation, we define the size of the pooling window using the *pool_size* parameter. Similarly to the *kernel_size* parameter of the convolutional layer, if only one number is provided, the window is assumed to be square. Additionally, the *strides* parameter defaults to the size of the pooling window. That is, there is no overlap.
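The effect of these parameters on the feature-map size can be worked out by hand. The helpers below are a hypothetical sketch of the usual output-size formulas for 'same' and 'valid' padding, applied to a 28x28 input with a 4x4 kernel, 16 filters and a 2x2 pooling window as in the model that follows.

```python
def conv_output_size(size, kernel_size, stride=1, padding='valid'):
    # 'same' padding keeps ceil(size / stride) positions along each dimension.
    if padding == 'same':
        return -(-size // stride)
    # 'valid' padding only uses positions where the kernel fits entirely.
    return (size - kernel_size) // stride + 1

def pool_output_size(size, pool_size, stride=None):
    # The stride defaults to the window size, so the windows do not overlap.
    stride = pool_size if stride is None else stride
    return (size - pool_size) // stride + 1

after_conv = conv_output_size(28, kernel_size=4, padding='same')
after_pool = pool_output_size(after_conv, pool_size=2)
flattened = after_pool * after_pool * 16  # 16 filters

print(after_conv, after_pool, flattened)
```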

In [None]:
mnist_conv_model = tf.keras.Sequential(name='mnist_cnn')
mnist_conv_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_conv_model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=4, activation='relu', padding='same', name='convolution'))
mnist_conv_model.add(tf.keras.layers.MaxPool2D(pool_size=2, name='pooling'))
mnist_conv_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_conv_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_conv_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_conv_model.summary()

The model is trained using the prior approach:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_conv_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_conv_model_train = mnist_conv_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)

Again, the outcome is visualized:

In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_conv_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_conv_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_conv_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_conv_model_train.history['val_accuracy'], '-g', label='Validation')

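# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)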
plt.show()

As a last step, the model is evaluated on the test set, and it is verified that the performance is higher than without the convolutional layers.

In [None]:
mnist_conv_model.load_weights('mnist_conv_best.h5')
loss, acc = mnist_conv_model.evaluate(mnist_test_x, mnist_test_y)
print('Accuracy: {}'.format(acc))

# Explore networks with a different number of hidden layers

As a next step, networks with different numbers of hidden layers are explored. For this, FFNNs with one and two hidden layers are implemented. Afterwards, a CNN with two convolutional layers is presented.

## Feed-Forward Network with One Hidden Layer

A FFNN with one hidden layer is set up:

In [None]:
mnist_ffnn_1_model = tf.keras.Sequential(name='mnist_ffnn_1')
mnist_ffnn_1_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_ffnn_1_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_ffnn_1_model.add(tf.keras.layers.Dense(256,name='hidden_layer1',activation='tanh'))
mnist_ffnn_1_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_ffnn_1_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_ffnn_1_model.summary()

Now we can train the model. We will still use part of the training set for validation, in order to control when to stop the training phase:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_ffnn_1_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_ffnn_1_model_train = mnist_ffnn_1_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)

Let's see how the performance evolved on the training and validation data:

In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_ffnn_1_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_ffnn_1_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_ffnn_1_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_ffnn_1_model_train.history['val_accuracy'], '-g', label='Validation')

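# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)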
plt.show()

Now let's load the best model for the validation data and evaluate it on the test set:

In [None]:
mnist_ffnn_1_model.load_weights('mnist_ffnn_1_best.h5')
loss, acc = mnist_ffnn_1_model.evaluate(mnist_test_x, mnist_test_y)
print('Accuracy: {}'.format(acc))

Accuracy: 0.9729999899864197


## Feed-Forward Network with Two Hidden Layers

Now, a FFNN with two hidden layers is implemented:

In [None]:
mnist_ffnn_2_model = tf.keras.Sequential(name='mnist_ffnn_2')
mnist_ffnn_2_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_ffnn_2_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_ffnn_2_model.add(tf.keras.layers.Dense(256,name='hidden_layer1',activation='tanh'))
mnist_ffnn_2_model.add(tf.keras.layers.Dense(256,name='hidden_layer2',activation='tanh'))
mnist_ffnn_2_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_ffnn_2_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_ffnn_2_model.summary()

Again, we use callbacks for stopping the training phase and finally train the model:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_ffnn_2_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_ffnn_2_model_train = mnist_ffnn_2_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)

Visualizing the evolution:

In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_ffnn_2_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_ffnn_2_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_ffnn_2_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_ffnn_2_model_train.history['val_accuracy'], '-g', label='Validation')

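# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)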
plt.show()

Although accuracy improves significantly when one hidden layer is introduced compared with the baseline model, we do not observe a substantial improvement when the number of hidden layers in the FFNN is increased from one to two. On average, each epoch takes approximately twice as long with two hidden layers as with one:

In [None]:
mnist_baseline_model.load_weights('mnist_baseline_best.h5')
loss, acc = mnist_baseline_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 0 hidden layers; Accuracy: {}'.format(acc))

mnist_ffnn_1_model.load_weights('mnist_ffnn_1_best.h5')
loss, acc = mnist_ffnn_1_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 1 hidden layer; Accuracy: {}'.format(acc))

mnist_ffnn_2_model.load_weights('mnist_ffnn_2_best.h5')
loss, acc = mnist_ffnn_2_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 2 hidden layers; Accuracy: {}'.format(acc))
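To put the three feed-forward models side by side, their trainable parameter counts can be computed by hand from the layer sizes used above (784 inputs, 256 units per hidden layer, 10 outputs). The helper names are hypothetical; the totals should match what `summary()` reports for each model.

```python
def dense_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output unit.
    return n_in * n_out + n_out

def ffnn_params(layer_sizes):
    # Sum the dense-layer parameters over consecutive pairs of layer sizes.
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

baseline = ffnn_params([28 * 28, 10])
one_hidden = ffnn_params([28 * 28, 256, 10])
two_hidden = ffnn_params([28 * 28, 256, 256, 10])

print(baseline, one_hidden, two_hidden)
```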

| Number of Hidden Layers | Result |
|---|---|
| 0 | Only capable of representing linearly separable functions or decisions. |
| 1 | Can approximate any function that contains a continuous mapping from one finite space to another. |
| 2+ | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy. Might lead to overfitting in some cases. |

In short, an FFNN with two hidden layers can, in principle, represent functions of any shape. For a problem of this size, the results above give little reason to use more than two hidden layers, and for many practical problems one hidden layer is already sufficient.

## CNN with Two Hidden Layers 

For this CNN, two convolutional layers are used, each followed by a max pooling layer:

In [None]:
mnist_cnn_2_model = tf.keras.Sequential(name='mnist_cnn_2')
mnist_cnn_2_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_cnn_2_model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=4, activation='relu', padding='same', name='convolution1'))
mnist_cnn_2_model.add(tf.keras.layers.MaxPool2D(pool_size=2, name='pooling1'))
mnist_cnn_2_model.add(tf.keras.layers.Conv2D(filters=25, kernel_size=3, activation='relu', padding='same', name='convolution2'))
mnist_cnn_2_model.add(tf.keras.layers.MaxPool2D(pool_size=3, name='pooling2'))
mnist_cnn_2_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_cnn_2_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_cnn_2_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_cnn_2_model.summary()
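The parameter count of this model can also be traced by hand. This is a sketch with hypothetical helper names; the spatial sizes assume 'same' padding for the convolutions and non-overlapping pooling windows, as configured above.

```python
def conv_params(kernel_size, channels_in, filters):
    # Each filter has kernel_size x kernel_size x channels_in weights plus one bias.
    return kernel_size * kernel_size * channels_in * filters + filters

def pool_size_out(size, pool_size):
    # Non-overlapping windows; incomplete windows at the border are dropped.
    return (size - pool_size) // pool_size + 1

size = 28                               # input images are 28x28x1
conv1 = conv_params(4, 1, 16)           # 'same' padding keeps 28x28
size = pool_size_out(size, 2)           # 28 -> 14
conv2 = conv_params(3, 16, 25)          # 'same' padding keeps 14x14
size = pool_size_out(size, 3)           # 14 -> 4
flat_features = size * size * 25        # input size of the output layer
dense = flat_features * 10 + 10         # weights plus biases

total = conv1 + conv2 + dense
print(conv1, conv2, flat_features, dense, total)
```

Under these assumptions the two-layer CNN has about as many trainable parameters as the baseline without hidden layers, because the pooling layers shrink the flattened feature vector from 784 to 400 values.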

The model is trained using the prior approach:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_cnn_2_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_cnn_2_model_train = mnist_cnn_2_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)

Again, the outcome is visualized:

In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_cnn_2_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_cnn_2_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_cnn_2_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_cnn_2_model_train.history['val_accuracy'], '-g', label='Validation')

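# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)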
plt.show()

Finally, we evaluate this model on the test set and compare the results with the CNN with one convolutional layer and with the baseline model (no convolutional layer):

In [None]:
mnist_baseline_model.load_weights('mnist_baseline_best.h5')
loss, acc = mnist_baseline_model.evaluate(mnist_test_x, mnist_test_y)
print('Baseline (no convolutional layer); Accuracy: {}'.format(acc))

mnist_conv_model.load_weights('mnist_conv_best.h5')
loss, acc = mnist_conv_model.evaluate(mnist_test_x, mnist_test_y)
print('CNN with 1 convolutional layer; Accuracy: {}'.format(acc))

mnist_cnn_2_model.load_weights('mnist_cnn_2_best.h5')
loss, acc = mnist_cnn_2_model.evaluate(mnist_test_x, mnist_test_y)
print('CNN with 2 convolutional layers; Accuracy: {}'.format(acc))

# Regularization Methods: General L2 Regularization and Dropout

When adding layers to the network, we can also include regularization in those layers using three different parameters: kernel_regularizer, bias_regularizer, and activity_regularizer. The first applies regularization to the weights of the layer, the second to its bias, and the last to its output. Keras also has implementations of multiple regularizers. 
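As a rough illustration of what an L2 kernel regularizer contributes, the sketch below computes the penalty as the regularization factor times the sum of the squared weights and adds it to a data loss. The weight matrix and the data-loss value are hypothetical numbers chosen for easy checking, and the helper is not the Keras implementation.

```python
import numpy as np

def l2_penalty(weights, factor):
    # L2 penalty: factor times the sum of the squared weights.
    return factor * np.sum(np.square(weights))

# Hypothetical 2x2 weight matrix of a layer.
weights = np.array([[1.0, -2.0],
                    [0.5, 0.0]])

penalty = l2_penalty(weights, factor=0.01)   # 0.01 * (1 + 4 + 0.25)
data_loss = 0.30                             # hypothetical cross-entropy value
total_loss = data_loss + penalty

print(penalty, total_loss)
```

Because the penalty grows with the square of each weight, minimising the total loss pushes large weights towards zero. It also means the loss reported during training includes the penalty, so losses of regularized and unregularized models are not directly comparable.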

We take the best-performing FFNN and the best-performing CNN and apply L2 and dropout regularization, respectively, to see whether the performance improves.

## L2 Regularization on FFNN

Taking the FFNN with two hidden layers and applying L2 regularization to the weights of both hidden layers:

In [None]:
mnist_ffnn_2_l2_model = tf.keras.Sequential(name='mnist_ffnn_2_l2')
mnist_ffnn_2_l2_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_ffnn_2_l2_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_ffnn_2_l2_model.add(tf.keras.layers.Dense(256,name='hidden_layer1',activation='tanh', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
mnist_ffnn_2_l2_model.add(tf.keras.layers.Dense(256,name='hidden_layer2',activation='tanh',kernel_regularizer=tf.keras.regularizers.l2(0.01)))
mnist_ffnn_2_l2_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_ffnn_2_l2_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_ffnn_2_l2_model.summary()

Callbacks and Training:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_ffnn_2_l2_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_ffnn_2_l2_model_train = mnist_ffnn_2_l2_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)

Plotting the loss and accuracy on the training and validation sets:

In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_ffnn_2_l2_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_ffnn_2_l2_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_ffnn_2_l2_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_ffnn_2_l2_model_train.history['val_accuracy'], '-g', label='Validation')

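# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)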
plt.show()

The plots show that the regularized model converges faster and with fewer oscillations. Its test accuracy can now be compared with that of the previous feed-forward models:

In [None]:
mnist_baseline_model.load_weights('mnist_baseline_best.h5')
loss, acc = mnist_baseline_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 0 hidden layers; Accuracy: {}'.format(acc))

mnist_ffnn_1_model.load_weights('mnist_ffnn_1_best.h5')
loss, acc = mnist_ffnn_1_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 1 hidden layer; Accuracy: {}'.format(acc))

mnist_ffnn_2_model.load_weights('mnist_ffnn_2_best.h5')
loss, acc = mnist_ffnn_2_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 2 hidden layers; Accuracy: {}'.format(acc))

# Load the checkpoint of the L2-regularized model, not that of the unregularized one
mnist_ffnn_2_l2_model.load_weights('mnist_ffnn_2_l2_best.h5')
loss, acc = mnist_ffnn_2_l2_model.evaluate(mnist_test_x, mnist_test_y)
print('FFNN with 2 hidden layers and L2 regularization; Accuracy: {}'.format(acc))

## Dropout Regularization on CNN
Taking the CNN with two convolutional layers and applying dropout after the second pooling layer:

In [None]:
mnist_cnn_2_drop_model = tf.keras.Sequential(name='mnist_cnn_2_drop')
mnist_cnn_2_drop_model.add(tf.keras.layers.Input(mnist_info.features['image'].shape))
mnist_cnn_2_drop_model.add(tf.keras.layers.Conv2D(filters=16, kernel_size=4, activation='relu', padding='same', name='convolution1'))
mnist_cnn_2_drop_model.add(tf.keras.layers.MaxPool2D(pool_size=2, name='pooling1'))
mnist_cnn_2_drop_model.add(tf.keras.layers.Conv2D(filters=25, kernel_size=3, activation='relu', padding='same', name='convolution2'))
mnist_cnn_2_drop_model.add(tf.keras.layers.MaxPool2D(pool_size=3, name='pooling2'))
mnist_cnn_2_drop_model.add(tf.keras.layers.Dropout(0.5, name='dropout'))
mnist_cnn_2_drop_model.add(tf.keras.layers.Flatten(name='flatten'))
mnist_cnn_2_drop_model.add(tf.keras.layers.Dense(mnist_info.features['label'].num_classes, activation='softmax', name='output'))
mnist_cnn_2_drop_model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
mnist_cnn_2_drop_model.summary()

Let's train the model:

In [None]:
earlystop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=10, verbose=1)
checkpoint = tf.keras.callbacks.ModelCheckpoint('mnist_cnn_2_drop_best.h5', monitor='val_accuracy', verbose=1, save_best_only=True)

mnist_cnn_2_drop_model_train = mnist_cnn_2_drop_model.fit(mnist_train_x, mnist_train_y, validation_split=0.2, callbacks=[earlystop,checkpoint], epochs=10000, batch_size=256)

Looking at the evolution, we can see that the performance of the model on the training data is now lower, since dropout randomly disables units during training but not during evaluation.
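The difference between training and evaluation behaviour can be sketched with a small NumPy version of inverted dropout. This is a hypothetical helper for illustration, not the Keras layer itself: during training each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate); at inference the input passes through unchanged.

```python
import numpy as np

def dropout(x, rate, training, rng):
    # At inference time dropout is the identity function.
    if not training:
        return x
    # Keep each unit with probability (1 - rate) and rescale the kept units
    # so that the expected value of the output matches the input.
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(1000)   # hypothetical layer output

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out), (train_out == 0).mean(), infer_out.mean())
```

During training roughly half of the activations are removed in every step, so the network has to classify from an incomplete representation. That is consistent with lower training metrics even when validation performance is unaffected.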

In [None]:
fig, (loss_ax, acc_ax) = plt.subplots(1, 2, figsize=(20,7))

loss_ax.set_title('Loss')
loss_ax.plot(mnist_cnn_2_drop_model_train.history['loss'], '-r', label='Train')
loss_ax.plot(mnist_cnn_2_drop_model_train.history['val_loss'], '-g', label='Validation')

acc_ax.set_title('Accuracy')
acc_ax.plot(mnist_cnn_2_drop_model_train.history['accuracy'], '-r', label='Train')
acc_ax.plot(mnist_cnn_2_drop_model_train.history['val_accuracy'], '-g', label='Validation')

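# plt.legend() only affects the last axes, so each subplot gets its own legend
loss_ax.legend(loc=1)
acc_ax.legend(loc=4)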
plt.show()

And we can assess the performance on the test set:

In [None]:
mnist_baseline_model.load_weights('mnist_baseline_best.h5')
loss, acc = mnist_baseline_model.evaluate(mnist_test_x, mnist_test_y)
print('Baseline (no convolutional layer); Accuracy: {}'.format(acc))

mnist_conv_model.load_weights('mnist_conv_best.h5')
loss, acc = mnist_conv_model.evaluate(mnist_test_x, mnist_test_y)
print('CNN with 1 convolutional layer; Accuracy: {}'.format(acc))

mnist_cnn_2_model.load_weights('mnist_cnn_2_best.h5')
loss, acc = mnist_cnn_2_model.evaluate(mnist_test_x, mnist_test_y)
print('CNN with 2 convolutional layers; Accuracy: {}'.format(acc))

mnist_cnn_2_drop_model.load_weights('mnist_cnn_2_drop_best.h5')
loss, acc = mnist_cnn_2_drop_model.evaluate(mnist_test_x, mnist_test_y)
print('CNN with 2 convolutional layers and dropout; Accuracy: {}'.format(acc))