# Digit recognition with the MNIST dataset

This notebook investigates the classification and identification of handwritten digits using a neural network.<br/>
The MNIST dataset is first used to train the network and then to test the network's performance in recognising a digit.<br/>
Once training has completed, a single image from the dataset is passed to the network, and the result is displayed alongside the expected digit.<br/>
![Mnist Image](https://corochann.com/wp-content/uploads/2017/02/mnist_plot-800x600.png)
<cite>Image source https://corochann.com/wp-content/uploads/2017/02/mnist_plot-800x600.png</cite>


## Packages needed for the program to run

The following packages need to be imported to create the network and load the images into memory:
* The keras package, used for creating the network
* The gzip package, used for unzipping the dataset images and labels
* The numpy package, used for converting the dataset into numpy arrays
* The sklearn preprocessing package, used for binary encoding the label of each digit
* The random package, used to pick random test images (it is imported later, in the cell that uses it)



In [1]:
# Importing the packages 
import keras as kr
import numpy as np
import sklearn.preprocessing as pre
import gzip

Using TensorFlow backend.


## Building the neural network
To begin we need to initialise the network using the sequential model.<br/>
This allows us to add layers as we need them. <br/>
These layers can be tweaked to increase performance.<br/>
We will investigate this later in the notebook.



In [2]:
# Initialise the neural network
model = kr.models.Sequential()

## Adding the layers to the network
Layers are added to the network with the model's add method, using the layer classes from keras.<br/>
The neurons are densely connected, meaning that every input neuron is connected to every neuron in the middle layer, and every neuron in the middle layer is connected to every neuron in the output layer.

![Neural Network](https://cdn-images-1.medium.com/max/800/1*jYhgQ4I_oFdxgDD-AOgV1w.png)
<cite>Image source https://cdn-images-1.medium.com/max/800/1*jYhgQ4I_oFdxgDD-AOgV1w.png</cite>

* In the code segment below, the units argument sets the number of neurons in the middle layer; in this case there are 1000 neurons.<br/>
* The activation argument sets the activation function; in this case [relu activation](https://keras.io/activations/) is used. ReLU passes positive values through unchanged, so its gradient does not shrink for large inputs the way it does for saturating functions such as sigmoid. This usually speeds up training without a loss of performance.

* The final argument, input_dim, sets the number of input neurons the network has. In the example below it is set to 784 because each MNIST image is 28 x 28 pixels, stored as one byte per pixel.
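
To make these sizes concrete, here is a minimal NumPy sketch of what a dense layer with ReLU computes. It is an illustration only, not how Keras implements the layer: the random weights, the zero biases, and the names `weights`, `biases`, `relu` and `image` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the notebook: 784 inputs feeding 1000 hidden neurons.
n_inputs, n_hidden = 784, 1000

# Hypothetical parameters, randomly initialised for illustration only.
weights = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
biases = np.zeros(n_hidden)

def relu(z):
    # ReLU keeps positive values and replaces negative values with zero.
    return np.maximum(z, 0.0)

# One fake flattened image with values between zero and one.
image = rng.random((1, n_inputs))

# Dense layer: weighted sum of every input for every neuron, then ReLU.
hidden = relu(image @ weights + biases)

print(hidden.shape)                # (1, 1000)
print(weights.size + biases.size)  # 785000 trainable parameters
```

Every one of the 784 inputs has a weight to each of the 1000 neurons, which is where the 784 x 1000 weights (plus 1000 biases) come from.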


In [3]:
# Add a hidden (middle) layer with 1000 neurons, fed by 784 inputs.
# There are 784 inputs because each image is 28 x 28 pixels, one byte per pixel.
model.add(kr.layers.Dense(units=1000, activation='relu', input_dim=784))


## Output layer
The output layer has ten neurons, one for each digit label (zero to nine) in the dataset. The results from the middle layer are passed to the output layer, where the softmax activation turns them into ten values that sum to one. During training these values are compared with the encoded label of the image that was sent in.<br/>
The closer the value for the correct digit is to one, the better the network is performing on that image.<br/>
As this process repeats, gradient descent adjusts the weights so that the loss moves down the slope towards a minimum.<br/>
Training ends when all of the epochs have completed; epochs are explained later in this notebook.
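
As a rough illustration of how the ten output values relate to a prediction, the sketch below applies softmax to a set of made-up scores. The `scores` array and the `softmax` helper are assumptions for this example and are not taken from the trained network.

```python
import numpy as np

def softmax(z):
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from the ten output neurons (digits 0 to 9).
scores = np.array([1.0, 0.5, 0.1, 2.0, 0.3, 6.0, 0.2, 0.4, 1.5, 0.7])

probabilities = softmax(scores)

# The ten values behave like probabilities: all positive, summing to one.
print(probabilities.round(3))
# The index of the largest value is the predicted digit.
print(probabilities.argmax())  # 5
```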


In [4]:
# Add ten neurons to the output layer
model.add(kr.layers.Dense(units=10, activation='softmax'))

## Building the model

The compile method is used to build the model from the layers and connections specified in the cells above.<br/>
* The first argument sets the loss function. [categorical_crossentropy](https://keras.io/losses/) measures how far the predicted values are from the binary representation of each digit's label. These binary labels are created with pre.LabelBinarizer(), discussed later in this notebook.
* The second argument, optimizer, is set to the [stochastic gradient descent optimizer](https://keras.io/optimizers/). Passing the string 'sgd' uses the optimizer's default settings, including its default learning rate.
* The final argument, [metrics](https://keras.io/metrics/), sets which performance measures are reported during training; here the accuracy is reported alongside the loss.
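
The loss can be sketched by hand for a single image. The function below is a simplified NumPy version written for this example; the clipping value `eps` and the two prediction vectors are assumptions, and the Keras implementation may differ in details such as how it averages over a batch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip the predictions so that log(0) can never occur.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Only the entry for the correct digit contributes, as y_true is 0 elsewhere.
    return -np.sum(y_true * np.log(y_pred))

# Binary (one-hot) label for the digit five.
y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])

# A hypothetical confident, correct prediction.
confident = np.full(10, 0.01)
confident[5] = 0.91

# A hypothetical prediction that spreads its belief evenly.
unsure = np.full(10, 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident, loss_unsure)
```

The confident prediction gives the smaller loss, which is why the loss falls as the value for the correct digit approaches one.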

In [5]:
# Build the graph.
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

## Opening the files in .gz format

As discussed in my previous [mnist notebook](https://github.com/kevgleeson78/Emerge-tech-assign/blob/master/Mnist%20Dataset.ipynb) the gzipped files are opened and read using the gzip package.


In [6]:
# Open the gzipped files and read as bytes.
with gzip.open('data/train-images-idx3-ubyte.gz', 'rb') as f:
    train_img = f.read()

with gzip.open('data/train-labels-idx1-ubyte.gz', 'rb') as f:
    train_lbl = f.read()

## Reading in the data into memory

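All 60000 images and labels are then stored in their respective variables.<br/>
The first 16 bytes of the image file and the first 8 bytes of the label file are header bytes, so they are skipped.<br/>
The ~ operator inverts each grey scale byte (0 becomes 255 and 255 becomes 0), and we then divide by 255 to convert each value to the range zero to one.<br/>
These scaled values are then used as the inputs to the neural network.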
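
The byte handling can be tried on a tiny made-up buffer without downloading the dataset. In the sketch below the header is just 16 zero bytes and the two 2 x 2 "images" are invented values; `np.frombuffer` is used here as an alternative to `np.array(list(...))`, not as the notebook's own method.

```python
import numpy as np

# Fake file contents: 16 header bytes followed by two 2 x 2 "images".
header = bytes(16)
pixels = bytes([0, 255, 128, 64, 255, 0, 0, 255])
raw = header + pixels

# Skip the header and interpret the remaining bytes as unsigned 8-bit values.
data = np.frombuffer(raw, dtype=np.uint8, offset=16).reshape(2, 2, 2)

# ~ flips every bit of a uint8 value, which is the same as 255 - value.
inverted = ~data

# Dividing by 255.0 scales the values to the range zero to one.
scaled = inverted / 255.0

print(inverted[0])
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Because of the inversion, background pixels (0 in the file) end up as 1.0, which matches the array of ones printed for the first image further down.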


In [7]:
# read in all images and labels into memory
train_img = ~np.array(list(train_img[16:])).reshape(60000, 28, 28).astype(np.uint8) / 255.0
train_lbl =  np.array(list(train_lbl[8:])).astype(np.uint8)

## Flattening the data into a single array
The data is converted from a three-dimensional array of shape (60000, 28, 28) to a two-dimensional array of shape (60000, 784), so that the 28 * 28 = 784 bytes of each image are stored sequentially in a single row.<br/>
This is done so that each byte of an image maps one-to-one onto a neuron in the network's input layer.
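
A small sketch of the flattening, using a made-up stack of three images filled with a running counter, shows where each pixel ends up. The array contents and the chosen `row` and `col` are assumptions for illustration.

```python
import numpy as np

# Hypothetical stack of 3 images of 28 x 28 pixels, filled with a counter.
images = np.arange(3 * 28 * 28).reshape(3, 28, 28)

# Flatten each image into a single row of 784 values.
flat = images.reshape(3, 784)

# Pixel (row, col) of an image lands at position row * 28 + col in its row.
row, col = 5, 7
print(flat.shape)                                      # (3, 784)
print(flat[1, row * 28 + col] == images[1, row, col])  # True
```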


In [8]:
# Flatten the array so the inputs can be mapped to the input neurons
inputs = train_img.reshape(60000, 784)
inputs[0:1]

array([[1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.        , 1.        ,
        1.        , 1.        , 1.        , 1.  

## Encoding the data
The label data is encoded so that each digit is represented in binary format as a vector of ten values.
First we set up the encoder using the LabelBinarizer class.<br/>
The fit function takes the training labels as an argument. As the labels range from zero to nine, the (encoder.fit) function learns ten categories, so the encodings of the ten digits together form a 10 x 10 matrix.


In [9]:
# encode the labels into binary format
encoder = pre.LabelBinarizer()
# get the size of the array needed for each category
encoder.fit(train_lbl)

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

## Transforming the labels
The labels are then transformed to a binary vector based on the decimal value of the label.<br/>
Each number is transformed as follows:
* (0) 1000000000
* (1) 0100000000
* (2) 0010000000

And so on until we reach the number nine, which is 0000000001.<br/>


In the below example the number five has the representation of '0 0 0 0 0 1 0 0 0 0'
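
The same encoding can be reproduced with plain NumPy, which may help to show what the encoder is doing. This is a sketch under the assumption that the labels are the integers zero to nine; the `labels` array is made up, and `np.eye` is a stand-in for LabelBinarizer rather than its implementation.

```python
import numpy as np

# Hypothetical label values.
labels = np.array([5, 0, 4, 1, 9])

# Row i of the 10 x 10 identity matrix is the binary vector for digit i.
one_hot = np.eye(10, dtype=int)[labels]

print(one_hot[0])  # [0 0 0 0 0 1 0 0 0 0]

# Taking the index of the 1 in each row recovers the original labels.
decoded = one_hot.argmax(axis=1)
print(decoded)     # [5 0 4 1 9]
```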


In [10]:
# encode each label to be used as binary outputs
outputs = encoder.transform(train_lbl)
# print out the integer value and the new representation of the number
print(train_lbl[0], outputs[0])

5 [0 0 0 0 0 1 0 0 0 0]


### Full example
Below is a full view of the matrix.

In [11]:
# print out each array
for i in range(10):
    print(i, encoder.transform([i]))

0 [[1 0 0 0 0 0 0 0 0 0]]
1 [[0 1 0 0 0 0 0 0 0 0]]
2 [[0 0 1 0 0 0 0 0 0 0]]
3 [[0 0 0 1 0 0 0 0 0 0]]
4 [[0 0 0 0 1 0 0 0 0 0]]
5 [[0 0 0 0 0 1 0 0 0 0]]
6 [[0 0 0 0 0 0 1 0 0 0]]
7 [[0 0 0 0 0 0 0 1 0 0]]
8 [[0 0 0 0 0 0 0 0 1 0]]
9 [[0 0 0 0 0 0 0 0 0 1]]


## Training the model
We are now ready to begin training the network to recognise the images.<br/>
The training set of 60000 images is passed to the network's first layer of 784 neurons.<br/>
Model parameters:
1. The flattened training images are sent as input
2. The encoded training labels are supplied as the expected output
3. Epochs is the number of times the full set of 60000 images will be processed
4. The batch size sets the number of images that will be sent to the network as one unit
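
The relationship between these numbers can be worked out with a few lines of arithmetic. The sketch assumes one weight update per batch, which is the usual behaviour for mini-batch gradient descent.

```python
import math

n_images = 60000   # size of the training set
batch_size = 100   # images sent to the network as one unit
epochs = 20        # full passes over the training set

# Number of batches (and weight updates) in one epoch.
updates_per_epoch = math.ceil(n_images / batch_size)

# Total number of weight updates over the whole training run.
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 600
print(total_updates)      # 12000
```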



In [12]:
# Start the training
# Pass the flattened images as inputs and the encoded labels as expected outputs
# The epochs value is the number of passes over the full training set
# The batch_size value is the number of images sent to the network at one time
model.fit(inputs, outputs, epochs=20, batch_size=100)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x22452d4d908>

## Results of the training

The above result shows that the network is getting approx 93.6% of the training images correct after 20 epochs.<br/>
With each epoch the performance increases slightly and the loss converges towards zero (the optimal value).

Below we train for a further 10 epochs to see if the performance increases. Please note that the previous cell must have been run first so that the learned weights carry over. Rerun all cells to be certain.



In [13]:
# Continue the training from the current weights
# Pass the flattened images as inputs and the encoded labels as expected outputs
# The epochs value is the number of passes over the full training set
# The batch_size value is the number of images sent to the network at one time
model.fit(inputs, outputs, epochs=10, batch_size=100)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x22452c8d048>

## Result

As we can see, there has been an increase of almost 1.5 percentage points in accuracy. We will now train for a further 20 epochs to see if the accuracy increases further.


In [14]:
# Continue the training from the current weights
# Pass the flattened images as inputs and the encoded labels as expected outputs
# The epochs value is the number of passes over the full training set
# The batch_size value is the number of images sent to the network at one time
model.fit(inputs, outputs, epochs=20, batch_size=100)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x22452d4d828>

## Result
By running a further 20 epochs we have reached approx 96.5%, a further increase of about 1.5 percentage points in accuracy.<br/>
This suggests that with these parameters the accuracy could reach around 98% if enough further epochs were run, although this has not been tested here.<br/>
For the purpose of testing the network with the test images, this level of accuracy will suffice.


## Testing the network with test images

The test images and labels are unzipped and stored in memory using the same methods as the training images and labels.<br/>

A single image can then be sent to the network to see if it is identifying the number correctly.


In [15]:
# open the gzipped test images and labels
with gzip.open('data/t10k-images-idx3-ubyte.gz', 'rb') as f:
    test_img = f.read()

with gzip.open('data/t10k-labels-idx1-ubyte.gz', 'rb') as f:
    test_lbl = f.read()

# Store each image and label into memory
test_img = ~np.array(list(test_img[16:])).reshape(10000, 784).astype(np.uint8) / 255.0
test_lbl =  np.array(list(test_lbl[ 8:])).astype(np.uint8)

## Show the performance results

The result below shows that 9622 of the 10000 test images have been identified correctly. This is 96.22%, which is close to the final accuracy of the training output.

In [16]:
# Show the total number of correct images identified out of 10000 test images
(encoder.inverse_transform(model.predict(test_img)) == test_lbl).sum()

9622
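
The counting step can be sketched on a small made-up example. The `probs` and `labels` arrays below are invented; `argmax` is used in place of `encoder.inverse_transform`, on the assumption that the classes are the integers zero upwards, in which case both pick the index of the largest value.

```python
import numpy as np

# Hypothetical network outputs for four images over three classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
# Hypothetical true labels for the same four images.
labels = np.array([0, 1, 2, 1])

# The predicted class is the index of the largest value in each row.
predicted = probs.argmax(axis=1)

# Comparing with the labels gives True or False per image; sum counts the True values.
correct = (predicted == labels).sum()
accuracy = correct / len(labels)

print(correct, accuracy)  # 3 0.75

# The same calculation for the figure reported in this notebook.
print(9622 / 10000)       # 0.9622
```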

## Passing an image to the network
In the example below the image at index 128 of the test set is passed to the network for identification.<br/>
The result is then printed out as an array for us to examine.<br/>
The index with the highest value within this array represents the number that has been picked by the network.<br/>
In this case the number identified by the network is eight, as the ninth position in the array (index 8) holds the highest value.



In [17]:
test = model.predict(test_img[128:129])
# Print the array of predicted values
print(test)

[[7.7011225e-07 2.8283930e-07 1.2899119e-04 3.1284147e-03 4.0528944e-06
  2.1652140e-04 1.4169835e-07 7.7335981e-06 9.9639517e-01 1.1795169e-04]]


## Printing out the results.

We can get the array index with the highest value by using the argmax function.<br/>
This returns the index of the highest value in the array; in this case it is eight.<br/>
Additionally, the label for the image can be accessed using the same index as used for the test image.
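
The step can be checked without a trained model by copying the array printed above into NumPy. The variable names `probs`, `digit` and `confidence` are chosen for this sketch only.

```python
import numpy as np

# The result array printed above for the image at index 128.
probs = np.array([[7.7011225e-07, 2.8283930e-07, 1.2899119e-04, 3.1284147e-03,
                   4.0528944e-06, 2.1652140e-04, 1.4169835e-07, 7.7335981e-06,
                   9.9639517e-01, 1.1795169e-04]])

# argmax along axis 1 returns the index of the largest value in each row.
digit = int(probs.argmax(axis=1)[0])

# The value at that index is how strongly the network favours the digit.
confidence = probs[0, digit]

print(digit)       # 8
print(confidence)  # 0.99639517
```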

In [18]:
# Get the index of the maximum value from the machine predictions
pred_result = test.argmax(axis=1)

print("The machine prediction is : =>> ",  pred_result)
print("The actual number is : =>> ", test_lbl[128:129])

The machine prediction is : =>>  [8]
The actual number is : =>>  [8]


## Testing some more images 

Below we will test twenty more images selected at random to see if the network is performing as expected.

In [19]:
# Random int adapted from https://stackoverflow.com/questions/3996904/generate-random-integers-between-0-and-9
from random import randint
for i in range(20):
    print("Test Number : ", i+1,"\n")
    x = randint(0, 9999)
    print("The random index: ", x, "\n")
    print("The result array: ")
    test = model.predict(test_img[x:x+1])
    # Print the array of predicted values
    print(test, "\n")
    # Get the index of the maximum value from the machine predictions
    pred_result = test.argmax(axis=1)

    print("The machine prediction is : =>> ",  pred_result)
    print("The actual number is : =>> ", test_lbl[x:x+1])
    print("##############################################")


Test Number :  1 

The random index:  9585 

The result array: 
[[9.9914372e-01 2.0904394e-09 5.7525869e-04 2.0829431e-05 8.3112656e-07
  4.2966708e-06 5.6888615e-07 4.2362924e-05 1.7303921e-06 2.1024706e-04]] 

The machine prediction is : =>>  [0]
The actual number is : =>>  [0]
##############################################
Test Number :  2 

The random index:  3455 

The result array: 
[[7.5665518e-09 9.9746394e-01 1.1411760e-04 1.1792567e-03 3.8292470e-05
  1.9217863e-05 1.6589125e-05 3.6634746e-04 6.1073987e-04 1.9149524e-04]] 

The machine prediction is : =>>  [1]
The actual number is : =>>  [1]
##############################################
Test Number :  3 

The random index:  6709 

The result array: 
[[1.7422532e-04 1.1418350e-06 5.2828265e-05 5.7321080e-07 5.2959567e-05
  3.7038210e-03 1.6073321e-04 1.6202266e-07 9.9534059e-01 5.1296729e-04]] 

The machine prediction is : =>>  [8]
The actual number is : =>>  [8]
##############################################
Test Number :  4

## Result
The above output shows that the network has all of the numbers correct on this run.<br/>
However, as 9622 of the 10000 test images were identified correctly (96.22%), it will on average get about 4 predictions wrong out of every 100 test images.
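
A short calculation shows how likely a run of twenty correct predictions is at this accuracy. It is a sketch that assumes each randomly chosen image is correct independently with the overall test accuracy, which is a simplification.

```python
# Figures from the test set result earlier in this notebook.
correct, total = 9622, 10000

accuracy = correct / total
errors_per_100 = 100 * (1 - accuracy)

# Chance that 20 independently chosen images are all identified correctly.
p_all_20_correct = accuracy ** 20

print(round(errors_per_100, 2))    # 3.78
print(round(p_all_20_correct, 2))
```

Under this assumption a run of twenty with no mistakes happens a little under half of the time, so an occasional wrong prediction in the loop above is to be expected.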
