# Loading libraries & Setup

Importing NumPy, MNIST, Keras and Matplotlib.

In [1]:
# NumPy - mathematical functions on multi-dimensional arrays and matrices
import numpy as np
# MNIST - digital database of handwritten digits
import mnist

# Matplotlib - plotting library to create graphs and charts
import matplotlib.pyplot as plt
# import matplotlib.image as mpimg

# Keras - Python open-source neural-network library
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

Using TensorFlow backend.


# Preparing training data

Loading the training images and labels from MNIST. We keep a copy of a few original images so that we can display some digits later.

In [2]:
train_images = mnist.train_images()
train_labels = mnist.train_labels()

Let's look at the shape of the data

In [3]:
train_images.shape

(60000, 28, 28)

We have 60000 images. Each image is 28 x 28 pixels, represented as a 28 by 28 matrix of numbers.

Before we normalise our training data, let's grab some images for display.

In [4]:
train_images_display = train_images[:5]

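Let's normalise and flatten our training images. Dividing by 255 and subtracting 0.5 maps the pixel values from [0, 255] to [-0.5, 0.5], and flattening turns each image into a single vector. This is a better format to use for our Neural Network training.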

In [5]:
# Normalize the images.
train_images = (train_images / 255) - 0.5

# Flatten the images - each 28 x 28 image becomes a vector of 784 values.
train_images = train_images.reshape((-1, 784))

print('Our training array has a shape ' + str(train_images.shape))
print('Each element (image) has a shape ' + str(train_images[0].shape))

Our training array has a shape (60000, 784)
Each element (image) has a shape (784,)
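
As a quick check of the normalisation and flattening steps above, here is a minimal sketch on a small synthetic array. The array `fake_images` is a made-up stand-in for the MNIST data and is not part of the original notebook.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 "images" of 28 x 28 pixels with values 0..255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Same normalisation as above: [0, 255] -> [-0.5, 0.5].
normalized = (fake_images / 255) - 0.5

# Same flattening as above: each 28 x 28 image becomes a vector of 784 values.
flattened = normalized.reshape((-1, 784))

print(flattened.shape)
print(flattened.min() >= -0.5, flattened.max() <= 0.5)
```

The `-1` in `reshape` lets NumPy work out the number of images from the total number of values.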


# Let's print some digits

Printing some digits using Matplotlib.

In [None]:
# Display images - it takes a lot of memory so skip if you can
f = plt.figure(figsize=(10,5))
columns = 5
images = train_images_display[:20]
# Number of rows needed to fit all images (ceiling division).
rows = -(-len(images) // columns)
for i, image in enumerate(images):
    # subplot expects integer grid sizes, so we avoid float division here
    plt.subplot(rows, columns, i + 1)
    # imshow displays array-like images
    plt.imshow(image)

plt.show()
f.clear()
plt.close(f)

# Building the model

Creating an Artificial Neural Network model with Keras. Our Neural Network takes an input of 784 values and has two hidden layers of 64 nodes each and one output layer of 10 nodes.

We create our model with Sequential, which groups a linear stack of layers into a single object. The Sequential model is suitable when each layer has exactly one input and one output. It does not allow you to create models that share layers or have multiple inputs or outputs.

We are using Dense layers in our NN, which are regular densely-connected NN layers. Densely-connected means that each neuron in a layer receives input from all the neurons in the previous layer.

The input_shape specifies the dimension of the input data. In our example it is a 28 by 28 pixel image that we have flattened to (784,).
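
To make "densely-connected" concrete, the sketch below counts the trainable parameters for the layer sizes used in this notebook's model (784 inputs, then Dense layers of 64, 64 and 10 units). It assumes each Dense layer keeps its bias term, which is the Keras default; the helper `dense_param_count` is a hypothetical name introduced only for this illustration.

```python
# Layer sizes taken from the model in this notebook: 784 inputs -> 64 -> 64 -> 10.
layer_sizes = [784, 64, 64, 10]

def dense_param_count(n_inputs, n_units):
    """Weights (one per input per unit) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Pair each layer's input size with its number of units.
params_per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(params_per_layer)       # [50240, 4160, 650]
print(sum(params_per_layer))  # 55050
```

Once the model is built, these numbers can be compared with the output of `model.summary()`.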

The output layer must have one node for each class we want to predict. In our example the result is a digit from 0 to 9, which means we need 10 nodes in the output layer.

We are using the softmax activation function for the output layer. It converts a vector of real numbers to a vector of categorical probabilities. The elements of the output vector are in the range (0, 1) and sum to 1.
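
A minimal NumPy sketch of softmax may help here. It is an illustration of the formula, not the Keras implementation, and the `scores` values are made up for the example.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D array, shifted by max(z) for numerical stability."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Hypothetical raw scores for 3 classes.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print(probs)        # one probability per class
print(probs.sum())  # the probabilities add up to 1 (up to rounding)
```

The largest score always receives the largest probability, so the ordering of the classes is preserved.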

How are we going to get a result from 0 to 9? During training we convert our labels to arrays using the Keras to_categorical function, and the output from our Neural Network will be in the same format.

This means that 1, for example, will be represented as [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.], 2 will be represented as [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.], and the result for each digit will look like this: [1.9625609e-06 6.5977215e-09 1.7479050e-04 2.1603969e-03 1.0921193e-08 1.1998044e-05 9.2225836e-12 9.9754709e-01 2.6972772e-05 7.6780409e-05]. Each value is the probability of the corresponding digit. Here the largest value (9.9754709e-01) is at index 7, so the most probable result is 7.
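
The label encoding and the reading of the output vector can be sketched with NumPy alone. Indexing the identity matrix is used here as a stand-in that should give the same values as `to_categorical(labels, num_classes=10)`; that equivalence is an assumption of this sketch, not something the notebook runs.

```python
import numpy as np

labels = np.array([1, 2])  # the two example labels from the text

# Row i of the 10 x 10 identity matrix is the one-hot vector for digit i.
one_hot = np.eye(10)[labels]
print(one_hot)

# The example network output quoted above.
output = np.array([1.9625609e-06, 6.5977215e-09, 1.7479050e-04, 2.1603969e-03,
                   1.0921193e-08, 1.1998044e-05, 9.2225836e-12, 9.9754709e-01,
                   2.6972772e-05, 7.6780409e-05])

# argmax returns the index of the largest probability, i.e. the predicted digit.
predicted_digit = int(np.argmax(output))
print(predicted_digit)  # 7
```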

The other layers in our Neural Network use the ReLU activation function. ReLU is one of the most commonly used activation functions in neural networks. You can experiment and learn more about different activation functions in the Keras documentation (https://keras.io/api/layers/activations/).
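
For reference, ReLU simply replaces negative values with 0 and leaves positive values unchanged. A minimal NumPy sketch, with made-up input values:

```python
import numpy as np

def relu(x):
    """ReLU: keep positive values, replace negative values with 0."""
    return np.maximum(0, x)

# Hypothetical inputs covering negative, zero and positive values.
values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(values)

print(activated.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```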


In [7]:
model = Sequential([
  Dense(64, activation='relu', input_shape=(784,)),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])

# Compiling the Model

Once the model is created we need to compile it. We specify two arguments: the optimizer and the loss function.

The optimizer is the algorithm that adjusts the weights of the network during training in order to reduce the loss. We will use the optimizer that implements the Adam algorithm. Adam is a stochastic gradient descent method (https://keras.io/api/optimizers/adam/). It is one of the most popular optimizers and usually performs well.

The loss function computes the quantity that a model should aim to minimize during training (https://keras.io/api/losses/). We will be using the categorical_crossentropy function.
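
To give a feel for what this loss measures, here is a minimal NumPy sketch of categorical cross-entropy for a single sample. It illustrates the formula only; the Keras version works on batches and differs in detail. The `eps` clipping value and both prediction vectors are assumptions made up for this example.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.eye(10)[7]  # one-hot vector for the digit 7

# Hypothetical predictions: one confident and correct, one spread evenly.
confident = np.full(10, 0.01)
confident[7] = 0.91
uniform = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)

print(loss_confident)  # small loss: most probability is on the correct digit
print(loss_uniform)    # larger loss: the model is only guessing
```

Only the probability assigned to the correct digit contributes to the loss, so the loss falls as that probability approaches 1.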

We are also specifying the metrics that are displayed while the fit() function is training our model. In our project we will use "accuracy" to see how the accuracy changes after each epoch.

In [8]:
model.compile(
  optimizer='adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Training the model (Option 1)

We will now train our model using the model.fit() function.

We provide the training images and the labels to the function. The labels (dependent variables) are converted from a vector of integers to a binary class matrix. An epoch is one pass over the entire dataset, and batch_size is the number of images processed before each update of the weights. You can experiment with the number of epochs and the batch size to optimise the training results.
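
As a small worked example of how epochs and batch size interact, the sketch below counts the weight updates for the values used in this notebook (60000 training images, batch_size=32, epochs=5). The variable names are introduced only for this illustration.

```python
import math

num_images = 60000  # size of the MNIST training set in this notebook
batch_size = 32
epochs = 5

# Each step processes one batch and updates the weights once.
steps_per_epoch = math.ceil(num_images / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 9375
```

A larger batch size means fewer updates per epoch, while a smaller one means more updates.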

In [9]:
model.fit(
  train_images, # training data
  to_categorical(train_labels), # training targets
  epochs=5,
  batch_size=32,
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x7f92f3b4b810>

# Saving the model (Option 1)

Once our model is trained we can save its weights to a file. Note that save_weights stores only the weights, not the architecture of the model.

In [None]:
model.save_weights('model.h5')

# Loading the model (Option 2)

Saving the weights enables us to load them later without training the model again. Because only the weights are stored, the model must first be built with the same architecture as above before calling load_weights.

In our project we can train the model quickly, but training complex Neural Network models can take hours or days, so saving a model is a useful thing to do.

In [None]:
# Load the model's saved weights.
model.load_weights('model.h5')

# Preparing testing data

Let's prepare some testing data so we can test the performance of our model.

We take the test images and labels from the mnist library, normalize the test image values in the same way as the training images, and flatten the image matrices.

In [10]:
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
test_images = (test_images / 255) - 0.5

# Flatten the images.
test_images = test_images.reshape((-1, 784))

# Testing the model

Here we put our model to the test.

In [14]:
result = model.evaluate(
  test_images,
  to_categorical(test_labels)
)

print(model.metrics_names[0] + ': ' + str(result[0]))
print(model.metrics_names[1] + ': ' + str(result[1]))

loss: 0.10312977911322378
accuracy: 0.9657999873161316


If the results are not satisfactory, we can go back and reconfigure our network by increasing the number of layers or nodes.

We can also go back to the training step and the model.fit() function, change the number of epochs or batch_size, and see if we get better results.

# Making predictions

Finally, let's make some predictions using the first five elements of the testing data.

In [15]:
# Predict on the first 5 test images.
predictions = model.predict(test_images[:5])

# Print our model's predictions.
print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]

# Check our predictions against the ground truths.
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]
[7 2 1 0 4]
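
The comparison above can also be turned into a number. The sketch below computes the fraction of matching predictions for the five values printed by the cell, which is the same idea as the accuracy metric reported earlier; the arrays are typed in by hand here rather than taken from the model.

```python
import numpy as np

# Values printed by the cell above.
predicted = np.array([7, 2, 1, 0, 4])
actual = np.array([7, 2, 1, 0, 4])

# Fraction of predictions that match the ground truth.
accuracy = np.mean(predicted == actual)
print(accuracy)  # 1.0
```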
