Code derived/modified from section 6.3 of 'Introduction to Deep Learning' by Sandro Skansi and
https://www.analyticsvidhya.com/blog/2020/02/learn-image-classification-cnn-convolutional-neural-networks-3-datasets/

**Load required libraries**

In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
(train_samples, train_labels), (test_samples, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


**Prepare the training & testing splits (reshape, scale pixel values, one-hot encode labels)**

In [2]:
train_samples = train_samples.reshape(train_samples.shape[0], 28, 28, 1) # each digit is a 28x28 image with a single (grayscale) channel
test_samples = test_samples.reshape(test_samples.shape[0], 28, 28, 1)
train_samples = train_samples.astype('float32')
test_samples = test_samples.astype('float32')
train_samples = train_samples/255 # scale pixel values from [0, 255] to [0, 1]
test_samples = test_samples/255

c_train_labels = np_utils.to_categorical(train_labels, 10) # labels in one-hot encoding format
c_test_labels = np_utils.to_categorical(test_labels, 10)
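
To make the one-hot step concrete, here is a minimal NumPy sketch of what a call like `to_categorical(labels, 10)` produces. The helper name `one_hot` is hypothetical and is not part of Keras; it only illustrates the encoding.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

example = one_hot([2, 0, 9], 10)
print(example[0])  # the digit 2 has its 1 at index 2
```

Each row contains exactly one 1, so the row sums are all 1 and the position of the 1 recovers the original digit.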

**Find out the size/dimensions of training and testing splits**

In [3]:
print(train_samples.shape) # 60,000 samples, each a 28x28x1 image (one channel, i.e. grayscale rather than colour)
print(test_samples.shape)
print(c_train_labels.shape)
print(c_test_labels.shape)
print(c_test_labels[1]) # label of one test sample, to show how labels look in one-hot encoding

(60000, 28, 28, 1)
(10000, 28, 28, 1)
(60000, 10)
(10000, 10)
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]


**Create an empty neural network and add layers to it**

In [4]:
convnet = Sequential() # empty network

# gets 98% or above accuracy on testing data with 20 epochs and mini-batch size of 128
convnet.add(Convolution2D(25, (3, 3), activation='relu', padding='valid')) # convolution layer with 25 filters, each of size 3x3
convnet.add(Flatten())
convnet.add(Dense(100, activation='relu')) # fully connected hidden layer
convnet.add(Dense(10, activation='softmax')) # output layer
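
As a rough check on the architecture, the layer sizes can be worked out by hand. The sketch below assumes a 3x3 kernel with stride 1 and no padding, as the comment on the convolution layer describes; the helper `conv_output_size` is hypothetical and not part of Keras.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Spatial output size of a convolution with 'valid' (no) padding."""
    return (input_size - kernel_size) // stride + 1

side = conv_output_size(28, 3)             # 28x28 input, 3x3 kernel -> 26x26 feature maps
conv_params = (3 * 3 * 1 + 1) * 25         # 9 weights + 1 bias per filter, 25 filters
flat_units = side * side * 25              # Flatten: 26 * 26 * 25 values per sample
dense1_params = flat_units * 100 + 100     # weights + biases of the hidden layer
dense2_params = 100 * 10 + 10              # weights + biases of the output layer
total_params = conv_params + dense1_params + dense2_params

print(side, flat_units, total_params)
```

Almost all of the trainable parameters sit in the first fully connected layer, because the flattened feature maps are large.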

**Set required parameters (loss, optimiser, accuracy metric)**

In [10]:
#convnet.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
convnet.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']) # try alternate loss function & optimiser
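
For intuition about the loss chosen above, categorical cross-entropy for a single sample is the negative log of the probability assigned to the true class. The following is a minimal pure-Python sketch; the function below is a hypothetical re-implementation for illustration, and the clipping constant `eps` is an assumption rather than a value taken from the notebook.

```python
import math

def categorical_crossentropy(target, predicted, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted probability vector."""
    # Clip probabilities away from 0 and 1 so that log() stays finite (assumed eps).
    clipped = [min(max(p, eps), 1.0 - eps) for p in predicted]
    return -sum(t * math.log(p) for t, p in zip(target, clipped))

target = [0, 0, 1]                                              # true class is index 2
loss_good = categorical_crossentropy(target, [0.1, 0.2, 0.7])   # confident and correct
loss_bad = categorical_crossentropy(target, [0.7, 0.2, 0.1])    # confident and wrong
print(loss_good, loss_bad)
```

The loss grows as the probability assigned to the true class shrinks, which is why a confident wrong prediction is penalised much more heavily than a confident correct one.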

**Train network (i.e. perform learning from training data)**

(specify mini-batch size and number of epochs)

In [11]:
convnet.fit(train_samples, c_train_labels, batch_size=128, epochs=20, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fcc16caac90>
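
The mini-batch size and number of epochs together determine how many weight updates the training run performs. A small sketch of that arithmetic, assuming every sample is used once per epoch and the final, smaller batch is kept:

```python
import math

num_samples = 60000   # size of the MNIST training split
batch_size = 128
epochs = 20

steps_per_epoch = math.ceil(num_samples / batch_size)            # batches per epoch
last_batch = num_samples - (steps_per_epoch - 1) * batch_size    # size of the final batch
total_updates = steps_per_epoch * epochs                         # weight updates overall

print(steps_per_epoch, last_batch, total_updates)
```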

**Evaluate the trained network/model on the testing data**

In [12]:
metrics = convnet.evaluate(test_samples, c_test_labels, verbose=1)
print()
print("%s: %.2f%%" % (convnet.metrics_names[1], metrics[1]*100))


accuracy: 98.25%
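
The accuracy figure is the fraction of samples whose most probable predicted class matches the true class. A minimal NumPy sketch on made-up toy data (the arrays below are illustrative only and are not taken from MNIST; the helper `accuracy` is hypothetical):

```python
import numpy as np

def accuracy(one_hot_labels, probabilities):
    """Fraction of rows where the most probable class equals the labelled class."""
    true_classes = np.argmax(one_hot_labels, axis=1)
    predicted_classes = np.argmax(probabilities, axis=1)
    return float(np.mean(true_classes == predicted_classes))

# Toy example with 4 samples and 3 classes.
labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype="float32")
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.4, 0.3],   # wrong: true class is 2, predicted class is 1
                  [0.1, 0.6, 0.3]], dtype="float32")

toy_accuracy = accuracy(labels, probs)
print("%.2f%%" % (toy_accuracy * 100))
```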


**Print and observe network predictions against actual target values**

In [13]:
predictions = convnet.predict(test_samples) # raw network predictions

In [14]:
n=100
print(c_test_labels[n], predictions[n]) # print the actual label (one-hot encoding) and the network prediction (probability estimates) for a selected sample
print(sum(predictions[n])) # the 10 probabilities for a sample should sum to 1 (up to floating-point rounding)

[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] [1.6743588e-08 2.3836468e-11 1.5365907e-10 3.4680220e-10 7.6226511e-15
 4.2004308e-10 1.0000000e+00 3.0282728e-11 5.6916438e-10 2.7087721e-13]
1.000000018287654
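
To turn a probability vector into a single predicted digit, take the index of the largest entry. The sketch below reuses the values printed above for sample 100; the tolerance passed to `np.isclose` is an assumption chosen to absorb floating-point rounding.

```python
import numpy as np

# Probability estimates copied from the output printed for sample n=100.
probs = np.array([1.6743588e-08, 2.3836468e-11, 1.5365907e-10, 3.4680220e-10,
                  7.6226511e-15, 4.2004308e-10, 1.0000000e+00, 3.0282728e-11,
                  5.6916438e-10, 2.7087721e-13])

predicted_digit = int(np.argmax(probs))                 # index of the largest probability
total = float(probs.sum())                              # slightly above 1 due to rounding
sums_to_one = bool(np.isclose(total, 1.0, atol=1e-6))   # equal to 1 within tolerance

print(predicted_digit, sums_to_one)
```

The predicted digit agrees with the position of the 1 in the one-hot label, and the small excess over 1 in the sum is rounding error rather than a fault in the softmax output.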
