# VGG type convolutional neural network (CNN) classifier

## Notes

* **Reference** : Simonyan, K. and Zisserman, A. (2015) '*Very Deep Convolutional Networks for Large-Scale Image Recognition*'. arXiv:1409.1556.


## Create a VGG type convolutional neural network (CNN) classifier in Keras

### Import required Python libraries



In [0]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Activation
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.optimizers import SGD

### Import and shape the dataset

* the data should typically be pre-processed and shaped before being imported
* the dataset, comprising a set of **input:output pairs**, is typically split into a **seen** dataset used for *training* and *validating* the neural network, and an **unseen** dataset used for *testing* the performance of the trained neural network, for example with: <br>
sklearn.model_selection.train_test_split
* the output class of each sample needs to be **one-hot encoded** for classification applications
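
As a rough NumPy-only sketch of the two preparation steps listed above, the following splits a toy dataset into seen and unseen parts and one-hot encodes the class labels. The toy data, the 80/20 ratio and the fixed seed are illustrative assumptions rather than values from this notebook, which relies on library helpers for both steps.

```python
import numpy as np

# Toy dataset (assumed for illustration): 10 samples, 4 features, 3 classes
rng = np.random.default_rng(seed=0)
inputs = rng.random((10, 4))
labels = np.array([0, 2, 1, 1, 0, 2, 2, 0, 1, 0])

# Shuffle the sample indices, then hold back 20% as the unseen test set
indices = rng.permutation(len(inputs))
num_test = int(0.2 * len(inputs))
test_idx, seen_idx = indices[:num_test], indices[num_test:]

inputs_seen, labels_seen = inputs[seen_idx], labels[seen_idx]
inputs_test, labels_test = inputs[test_idx], labels[test_idx]

# One-hot encode: row i of the identity matrix is the encoding of class i
num_classes = 3
labels_seen_onehot = np.eye(num_classes)[labels_seen]

print(inputs_seen.shape, inputs_test.shape)   # (8, 4) (2, 4)
print(labels_seen_onehot.shape)               # (8, 3)
```

Each one-hot row contains a single 1 in the column of the true class, so applying an argmax to a row recovers the original integer label.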

In [0]:
print('* Importing and shaping the data *')
print()

mnist = tf.keras.datasets.mnist  # load mnist dataset from tensorflow
(input_train, output_train_class), (input_test, output_test_class) = mnist.load_data()

print('input_train (original): ', input_train.shape)
print('input_test (original): ', input_test.shape)
print()

input_train = input_train.reshape(input_train.shape[0], 28, 28, 1)  # add an extra (channel) dimension to array
input_test = input_test.reshape(input_test.shape[0], 28, 28, 1)

input_train = input_train / 255.0  # max normalise the image data to the range [0, 1]
input_test = input_test / 255.0

output_train_class_onehot = tf.keras.utils.to_categorical(output_train_class, 10)  # create one-hot encoded class
output_test_class_onehot = tf.keras.utils.to_categorical(output_test_class, 10)

output_class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']    # class names string

print('input_train : ', input_train.shape)
print('output_train_class : ', output_train_class.shape)
print('output_train_class_onehot : ', output_train_class_onehot.shape)
print()
print('input_test : ', input_test.shape)
print('output_test_class : ', output_test_class.shape)
print('output_test_class_onehot : ', output_test_class_onehot.shape)
print()
print('output_class_names : ', output_class_names)
print()

item_id = 5

print('item_id : ', item_id)
print('output_train_class [item_id] : ', output_train_class[item_id])
print('output_train_class_onehot [item_id] : ', output_train_class_onehot[item_id, :])

plt.imshow(input_train[item_id, :, :, 0], cmap=plt.cm.binary)
plt.title('input_train [' + str(item_id) + ']')
plt.grid(False)  # explicitly switch the grid off (None toggles the current state)
plt.xticks([])
plt.yticks([])
plt.show()

### Define the network hyperparameters

* **hyperparameters** are the variables which determine the network structure and how the network is trained
* structural hyperparameters: number of hidden layers, number of nodes in each layer...
* training hyperparameters: learning rate, dropout ratio, number of epochs...
* hyperparameters are set before training

In [0]:
optimizer_type = SGD(learning_rate=0.2)  # optimisation algorithm: SGD stochastic gradient descent
loss = 'categorical_crossentropy'  # loss (cost) function to be minimised by the optimiser
metrics = ['categorical_accuracy']  # network accuracy metric to be determined after each epoch
dropout_ratio = 0.0  # fraction of nodes in the hidden layer to drop out at random during training
validtrain_split_ratio = 0.2  # fraction of the seen dataset to be put aside for validation, rest is for training
max_epochs = 40  # maximum number of epochs to be iterated
batch_size = 500   # batch size for the training data set
batch_shuffle = True   # shuffle the training data prior to batching before each epoch
num_hidden_nodes = 256  # number of nodes in hidden fully connected layer

### Define the network architecture

* the network is defined with Keras' *functional* model  [[Link]](https://keras.io/models/model/)
* Keras' *sequential* model can also be used but is limited to simpler architectures  [[Link]](https://keras.io/models/sequential/)
* can specify the type of each layer, for example dense (fully connected), convolutional, dropout etc. [[Link]](https://keras.io/layers/about-keras-layers/)
* can specify the activation function to be used in each layer, for example sigmoid, relu etc. [[Link]](https://keras.io/activations/)
* **softmax** activation, also known as *softargmax* or *normalized exponential function*, is typically used for the final layer of a classifier network to normalise its output into a probability distribution of the classes
* network weights are typically initialised with random values
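
To make the softmax bullet above concrete, here is a minimal NumPy sketch of the function. The input scores are assumed values chosen for illustration, and subtracting the maximum before exponentiating is a common numerical-stability trick rather than something taken from this notebook.

```python
import numpy as np

def softmax(logits):
    """Normalise a vector of raw scores into a probability distribution."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])      # example raw outputs of a final layer (assumed values)
probabilities = softmax(logits)

print(probabilities)          # every entry lies strictly between 0 and 1
print(probabilities.sum())    # sums to 1 (up to floating-point rounding)
```

The largest raw score always receives the largest probability, which is why an argmax over the softmax output gives the predicted class.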


In [0]:
input_shape = (28, 28, 1)
inputs = Input(shape=input_shape)

down_01 = Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1), padding='same')(inputs)
down_01 = Activation('relu')(down_01)
down_01 = Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1), padding='same')(down_01)
down_01 = Activation('relu')(down_01)

down_01_pool = MaxPooling2D((2, 2), strides=(2, 2))(down_01)   # maxpool downsampled to 14x14x16

down_02 = Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same')(down_01_pool)
down_02 = Activation('relu')(down_02)
down_02 = Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same')(down_02)
down_02 = Activation('relu')(down_02)

down_02_pool = MaxPooling2D((2, 2), strides=(2, 2))(down_02)   # maxpool downsampled to 7x7x32

flatten = Flatten()(down_02_pool)   # 1568 nodes

dense_01 = Dense(num_hidden_nodes)(flatten)
dense_01 = Activation('sigmoid')(dense_01)
dense_01 = Dropout(dropout_ratio)(dense_01)

dense_02 = Dense(10)(dense_01)
outputs = Activation('softmax')(dense_02)

### Compile the network

* compile the defined network architecture with the stated **optimizer algorithm** [[Link]](https://keras.io/optimizers/), **loss (cost) function**  [[Link]](https://keras.io/losses/), and **accuracy metrics** [[Link]](https://keras.io/metrics/) using: .compile()
* print network architecture using: .summary()
* create and save a schematic image of the network architecture using: tensorflow.keras.utils.plot_model()
* the schematic image is saved to the runtime disk; remember to download it to the local machine before the runtime is terminated
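
The layer shapes and parameter counts reported by the summary can be cross-checked by hand. The sketch below does this for the architecture defined above, assuming 256 hidden nodes; the helper functions `conv2d_params` and `dense_params` are hypothetical names introduced only for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_nodes, out_nodes):
    # one weight per input-output pair plus one bias per output node
    return (in_nodes + 1) * out_nodes

# 'same' padding with stride 1 keeps height and width; each 2x2 max-pool halves them
size = 28
size_after_pool_1 = size // 2                  # 14
size_after_pool_2 = size_after_pool_1 // 2     # 7
flatten_nodes = size_after_pool_2 ** 2 * 32    # 7 * 7 * 32 = 1568

layer_params = {
    'conv 1a': conv2d_params(3, 3, 1, 16),     # 160
    'conv 1b': conv2d_params(3, 3, 16, 16),    # 2320
    'conv 2a': conv2d_params(3, 3, 16, 32),    # 4640
    'conv 2b': conv2d_params(3, 3, 32, 32),    # 9248
    'dense 1': dense_params(flatten_nodes, 256),   # 401664
    'dense 2': dense_params(256, 10),          # 2570
}
total_params = sum(layer_params.values())

print(flatten_nodes)   # 1568
print(total_params)    # 420602
```

Almost all of the trainable parameters sit in the first dense layer, which is typical of VGG-type networks where a large flattened feature map feeds a fully connected layer.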

In [0]:
print()
print('* Compiling the network model *')
print()

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=optimizer_type, loss=loss, metrics=metrics)

# display a summary of the compiled neural network

model.summary()  # prints the summary itself; wrapping it in print() would also print 'None'
print()

# create and save a schematic image of the network architecture

from tensorflow.keras.utils import plot_model
from IPython.display import Image

print('Graphical schematic of the compiled network')
print()

plot_model(model, show_shapes=True, show_layer_names=True, to_file='model.png')
Image(filename='model.png')

### Train the neural network with the training dataset

* the **seen** dataset is split into **training** and **validation** subsets
* the training set can be broken down into **batches**
* the network weights are updated after each training batch by back propagation using the **optimiser** algorithm to minimise the **loss (cost) function**
* one **epoch** is one training cycle of all training batches
* after all training batches have been processed in an epoch, the network is tested with the validation data set and the resulting loss function and accuracy metrics are displayed
* the training data can be shuffled and rebatched before each epoch
* training continues until the stated maximum number of epochs has been reached or an early stopping criterion has been satisfied, for example when the validation loss (cost) function begins to increase
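
The bookkeeping described above can be sketched in plain Python. The numbers are the hyperparameter values set earlier in this notebook; the `should_stop` function is a hypothetical patience-based rule written for illustration and is not what this notebook uses (no early stopping is configured here), and the exact rounding Keras applies when splitting off the validation set is assumed.

```python
import math

num_seen = 60000            # MNIST training images loaded above
validation_ratio = 0.2
batch_size = 500
max_epochs = 40

num_validation = round(num_seen * validation_ratio)   # 12000 samples held out for validation
num_training = num_seen - num_validation              # 48000 samples used for weight updates
batches_per_epoch = math.ceil(num_training / batch_size)
max_weight_updates = batches_per_epoch * max_epochs

print(batches_per_epoch)    # 96
print(max_weight_updates)   # 3840


def should_stop(val_losses, patience=3):
    """Hypothetical early-stop rule: stop when the validation loss has not
    improved on its best earlier value for `patience` consecutive epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

print(should_stop([0.9, 0.5, 0.4, 0.41, 0.42, 0.43]))  # True
print(should_stop([0.9, 0.5, 0.4, 0.39, 0.38, 0.37]))  # False
```

In Keras the same idea is available as the `tf.keras.callbacks.EarlyStopping` callback, which can be passed to `.fit()` through its `callbacks` argument.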

In [0]:
print('* Training the compiled network *')
print()

history = model.fit(input_train, output_train_class_onehot,
                    batch_size=batch_size,
                    epochs=max_epochs,
                    validation_split=validtrain_split_ratio,
                    shuffle=batch_shuffle)

print()
print('Training completed')
print()

### Plot the training history of the network

* useful for seeing the convergence of the training, oscillations of the cost function between local minima, and the presence of overfitting
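
One simple way to read overfitting off such curves is to locate the epoch with the lowest validation loss and check whether the gap between validation and training loss keeps growing afterwards. The sketch below uses made-up loss values for illustration only; they are not results from this notebook.

```python
import numpy as np

# Made-up loss curves for illustration only (not results from this notebook)
train_loss = np.array([0.90, 0.50, 0.30, 0.20, 0.12, 0.08])
val_loss = np.array([0.95, 0.60, 0.45, 0.40, 0.43, 0.49])

best_epoch = int(np.argmin(val_loss))   # epoch with the lowest validation loss
gap = val_loss - train_loss             # generalisation gap at each epoch

print(best_epoch)          # 3
print(gap[best_epoch:])    # the gap keeps widening after the best epoch
```

With a real training run, the same two arrays would come from `history.history['loss']` and `history.history['val_loss']`.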

In [0]:
# model loss

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss : ' + loss)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='best')
plt.show()
plt.close()

# model accuracy metric

plt.plot(np.array(history.history[metrics[0]]))
plt.plot(np.array(history.history['val_' + metrics[0]]))
plt.title('Model accuracy metric : ' + metrics[0])
plt.ylabel('Accuracy metric')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='best')
plt.show()
plt.close()

### Evaluate the trained network performance on the unseen test dataset

* the performance of the trained network on unseen test data can be assessed using: .evaluate()
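
The two numbers returned by the evaluation are the loss and the accuracy metric chosen at compile time. As a hedged NumPy sketch of what categorical cross-entropy and categorical accuracy measure, using assumed toy values (Keras additionally clips the probabilities to avoid taking the log of zero, which is omitted here):

```python
import numpy as np

# Toy one-hot targets and predicted probabilities (assumed values, 3 samples x 3 classes)
targets_onehot = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1]], dtype=float)
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.4, 0.3]])

# Categorical cross-entropy: mean over samples of -log(probability assigned to the true class)
true_class_probability = np.sum(targets_onehot * predictions, axis=1)
loss_value = -np.mean(np.log(true_class_probability))

# Categorical accuracy: fraction of samples whose most probable class is the true class
accuracy_value = np.mean(np.argmax(predictions, axis=1) == np.argmax(targets_onehot, axis=1))

print(loss_value)
print(accuracy_value)   # 2 of the 3 toy samples are classified correctly
```

The third toy sample is misclassified and also contributes the largest share of the loss, because the network assigned a low probability to its true class.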

In [0]:
print('* Evaluating the performance of the trained network on the unseen test dataset *')
print()

evaluate_model = model.evaluate(x=input_test, y=output_test_class_onehot)
loss_metric = evaluate_model[0]
accuracy_metric = evaluate_model[1]

print()
print('Accuracy - ' + metrics[0] + ': %0.3f'%accuracy_metric)
print('Loss - ' + loss + ': %0.3f'%loss_metric)

### Create and display the test set classification report

* provides in-depth statistics of the test data predictions provided by the trained neural network
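
The per-class statistics in such a report are precision, recall, F1 score and support. The following sketch computes them by hand for assumed toy labels, to show what each column means; it is not a replacement for the library function and does not guard against classes that are never predicted.

```python
import numpy as np

# Toy true and predicted labels for 3 classes (assumed values)
true_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
pred_labels = np.array([0, 1, 1, 1, 2, 2, 2, 2, 0])

report = {}
for c in np.unique(true_labels):
    true_positives = np.sum((pred_labels == c) & (true_labels == c))
    precision = true_positives / np.sum(pred_labels == c)   # correct among predicted as c
    recall = true_positives / np.sum(true_labels == c)      # correct among actually c
    f1 = 2 * precision * recall / (precision + recall)      # harmonic mean of the two
    support = int(np.sum(true_labels == c))                 # number of true samples of c
    report[int(c)] = (precision, recall, f1, support)

for c, (precision, recall, f1, support) in report.items():
    print(f'class {c}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} support={support}')
```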

In [0]:
from sklearn.metrics import classification_report

output_predict_class_onehot = model.predict(input_test)
output_predict_class = np.argmax(output_predict_class_onehot, axis=1)

print('* Test set classification report *')
print()
print(classification_report(output_test_class, output_predict_class,
                            target_names=output_class_names))

### Display the test set confusion probability matrix

* a useful way of seeing which classes the trained network mixes up
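
The row normalisation behind a confusion probability matrix can be illustrated on a small example. The counts below are assumed toy values for a 3-class problem, with rows as true classes and columns as predicted classes.

```python
import numpy as np

# Toy 3-class confusion matrix (assumed counts): rows are true classes, columns are predictions
counts = np.array([[8, 2, 0],
                   [1, 6, 3],
                   [0, 0, 5]])

# Divide each row by its total so that every row sums to 100%
row_totals = counts.sum(axis=1, keepdims=True)
percentages = 100.0 * counts / row_totals

print(percentages)
```

Each row then reads as "of all samples that truly belong to this class, what percentage was assigned to each predicted class", so the diagonal entries are the per-class recall in percent.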

In [0]:
print('* Confusion probability matrix *')
print()

import itertools

from sklearn.metrics import confusion_matrix


# use a separate variable name so the imported confusion_matrix function is not overwritten
confusion_counts = confusion_matrix(output_test_class, output_predict_class)  # confusion matrix (raw counts)

confusion_probability_matrix = confusion_counts.astype('float') / \
                               confusion_counts.sum(axis=1)[:, np.newaxis]  # row normalisation of confusion matrix
confusion_probability_matrix = confusion_probability_matrix * 100.0  # confusion probability matrix

plt.imshow(confusion_probability_matrix, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Normalised confusion matrix')    
plt.colorbar(label='%')
plt.clim(0, 100)
tick_marks = np.arange(len(output_class_names))
plt.xticks(tick_marks, output_class_names, rotation=0)
plt.yticks(tick_marks, output_class_names)
fmt = '.1f'
thresh = confusion_probability_matrix.max() / 2.0
for i, j in itertools.product(range(confusion_probability_matrix.shape[0]), range(confusion_probability_matrix.shape[1])):
    plt.text(j, i, format(confusion_probability_matrix[i, j], fmt),
             horizontalalignment='center',
             color='white' if confusion_probability_matrix[i, j] > thresh else 'black')    
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.grid(False)  # explicitly switch the grid off (None toggles the current state)
plt.show()
plt.close()

### Predict the class of a given input

* might need to reshape the input to match the network input shape
* need to apply an *argmax* to the estimated probability distribution provided by the trained network to define the predicted class
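
Both steps can be sketched without a trained network. Below, a placeholder image is given the batch and channel dimensions the network expects, and an argmax is applied to a made-up probability vector standing in for the network output; all values are assumptions for illustration.

```python
import numpy as np

image = np.zeros((28, 28))                       # placeholder single greyscale image
batch = image[np.newaxis, :, :, np.newaxis]      # add batch and channel dimensions

print(batch.shape)   # (1, 28, 28, 1)

# Made-up probability vector standing in for the network output, shape (1, 10)
probabilities = np.array([[0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]])
predicted_class = int(np.argmax(probabilities[0]))   # index of the most probable class

print(predicted_class)   # 3
```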

In [0]:
print('* Predicting the class of a given input *')
print()

test_id = 58

input_predict = np.zeros(shape=(1, 28, 28, 1))  # create numpy array of required dimensions for network input

input_predict[0, :, :, 0] = input_test[test_id, :, :, 0]  # reshaping test input image

output_predict_class_onehot = model.predict(input_predict)  # softmax distribution of predicted class

output_predict_class = np.argmax(output_predict_class_onehot[0])  # predicted class of input

print('test_id : ', test_id)
print()
print('output_predict_class_onehot [test_id]: \n\n', output_predict_class_onehot)
print()
print('sum[output_predict_class_onehot [test_id]] : ', np.sum(output_predict_class_onehot))  # should be = 1.0
print()
print('output_test_class_onehot [test_id] : ', output_test_class_onehot[test_id])
print()
print('output_test_class [test_id] : ', output_test_class[test_id])
print()
print('output_predict_class [test_id] : ', output_predict_class)
print()

plt.imshow(input_test[test_id, :, :, 0], cmap=plt.cm.binary)
plt.title('input_test [' + str(test_id) + ']')
plt.grid(False)  # explicitly switch the grid off (None toggles the current state)
plt.xticks([])
plt.yticks([])
plt.show()
