
# Digit Recognition


## Introduction

This notebook explains the digit recognition script that I wrote for the MNIST dataset.
In the previous notebook, I went through how to read in the MNIST dataset and how to save the images to the computer's file system. I have reused some of that code in this script, so this notebook focuses on the parts of the script that are new.

Artificial neural networks are computing systems modelled loosely on the biological neural networks found in animals. If you wish to know more about the theory behind neural networks, please visit [Wikipedia](https://en.wikipedia.org/wiki/Artificial_neural_network).

A neural network of the kind used here has an input layer, one or more hidden layers and an output layer. Data comes in through the input layer, each hidden layer applies an activation function to it, and the result leaves through the output layer. The particular kind of neural network we will be working with is called a "fully connected", or "dense", network because each node of each layer is connected to every node in the next layer. This will make more sense when I start to go through the program.

![](img/neural.jpeg)
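To make "fully connected" a little more concrete, here is a minimal NumPy sketch of what a single dense layer computes. It is not part of the script: the function names `dense_forward` and `relu` and the tiny layer sizes are my own choices for illustration.

```python
import numpy as np

def relu(z):
    """Replace negative values with zero."""
    return np.maximum(z, 0.0)

def dense_forward(x, weights, bias, activation):
    """One fully connected layer: every input feeds every neuron."""
    return activation(x @ weights + bias)

# Illustrative sizes only: 4 inputs feeding 3 neurons.
rng = np.random.default_rng(0)
x = rng.random((1, 4))             # one sample with 4 input values
weights = rng.normal(size=(4, 3))  # one weight per (input, neuron) pair
bias = np.zeros(3)                 # one bias per neuron

hidden = dense_forward(x, weights, bias, relu)
print(hidden.shape)  # (1, 3)
```

The weight matrix has one entry for every input/neuron pair, which is exactly what "each node is connected to every node in the next layer" means.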


## Imports

To use the TensorFlow and Keras imports, I needed to set up a TensorFlow environment using Anaconda. For more information on this, please visit [TensorFlow environment setup](https://towardsdatascience.com/setup-an-environment-for-machine-learning-and-deep-learning-with-anaconda-in-windows-5d7134a3db10).

In [1]:
import gzip ### For reading in the gzip files.
import numpy as np ### For work with arrays.
import tensorflow as tf ### For using keras.


In [3]:
import keras as kr ### For building and operating the model.
import sklearn.preprocessing as pre ### For preprocessing, such as encoding the labels before training.
import sys ### For IO and system exit(0).
import cv2 ### For reading and writing images.
from PIL import Image ### For manipulating images (RGB to grayscale).

### Import the SaveImages file

This file is a simple Python script that can save a selection of 20 images to the file system. It uses the same code as seen in the MNIST notebook, so I won't go through the code again. It has a single function called saveImages, which may be used later depending on the user's input.

In [4]:
import SaveImages as save

## Define the functions

First, I am going to define the functions that I need throughout the script. Some of these functions contain code from previous notebooks, so I will give only a brief explanation of those and a more in-depth explanation of the others.

In [5]:
def readTrainImages():
    with gzip.open('data/train-images-idx3-ubyte.gz', 'rb') as f: ### Opens the train images file.
        file_content = f.read() ### Loads the bytes from the file into the file_content variable.
    return file_content

def readTrainLabels():
    with gzip.open('data/train-labels-idx1-ubyte.gz', 'rb') as f: ### Use gzip to open the labels file.
        labels = f.read() ### Read the bytes from the file into the 'labels' variable.
    return labels

def readTestImages():
    with gzip.open('data/t10k-images-idx3-ubyte.gz', 'rb') as f:
        test_img = f.read() ### Read in the 10000 test images
    return test_img

def readTestLabels():
    with gzip.open('data/t10k-labels-idx1-ubyte.gz', 'rb') as f:
        test_lbl = f.read() ### Read in the corresponding 10000 test labels
    return test_lbl

The above four functions each read one of the four files in the data folder. The code in these functions was explained in the MNIST dataset notebook. They can now be called from anywhere in the script, any time we need access to the data in those files.
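The functions above return raw bytes, so a later step has to turn them into arrays. As a rough sketch of that step, assuming the standard IDX layout (a 16-byte header for image files and an 8-byte header for label files), something like the following would work. The helper names `images_from_idx_bytes` and `labels_from_idx_bytes` are hypothetical and do not appear in the script.

```python
import struct
import numpy as np

def images_from_idx_bytes(raw):
    """Convert the raw bytes of an IDX3 image file into an (n, rows*cols) array."""
    # Header: magic number, image count, rows, columns (big-endian 32-bit ints).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX3 image file")
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(count, rows * cols)

def labels_from_idx_bytes(raw):
    """Convert the raw bytes of an IDX1 label file into a 1-D array of labels."""
    # Header: magic number and label count (big-endian 32-bit ints).
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError("not an IDX1 label file")
    return np.frombuffer(raw, dtype=np.uint8, offset=8)[:count]
```

With the real files, `images_from_idx_bytes(readTrainImages())` would give one row of 784 pixel values per image.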

In [8]:
def createModelRelu():
    model = kr.models.Sequential() ### Start a neural network, building it by layers.
    model.add(kr.layers.Dense(units=600, activation='linear', input_dim=784)) ### Add a hidden layer with 600 neurons, fed by an input layer of 784 values.
    model.add(kr.layers.Dense(units=400, activation='relu')) ### Add a 400 neuron hidden layer using the 'relu' activation function.

    model.add(kr.layers.Dense(units=10, activation='softmax')) ### Add a 10 neuron output layer.
    return model

def createModelSigmoid():
    model = kr.models.Sequential() ### Start a neural network, building it by layers.
    model.add(kr.layers.Dense(units=600, activation='linear', input_dim=784)) ### Add a hidden layer with 600 neurons, fed by an input layer of 784 values.
    model.add(kr.layers.Dense(units=400, activation='sigmoid')) ### Add a 400 neuron hidden layer using the 'sigmoid' activation function.

    model.add(kr.layers.Dense(units=10, activation='softmax')) ### Add a 10 neuron output layer.
    return model

def createModelTanh():
    model = kr.models.Sequential() ### Start a neural network, building it by layers.
    model.add(kr.layers.Dense(units=600, activation='linear', input_dim=784)) ### Add a hidden layer with 600 neurons, fed by an input layer of 784 values.
    model.add(kr.layers.Dense(units=400, activation='tanh')) ### Add a 400 neuron hidden layer using the 'tanh' activation function.

    model.add(kr.layers.Dense(units=10, activation='softmax')) ### Add a 10 neuron output layer.
    return model

The above functions define how we want our models to be created, more specifically, which activation function we want to use in our model.

### What is an activation function?

An activation function is a mathematical function applied to the weighted sum of a neuron's inputs to produce that neuron's output. For more information about the mathematics behind activation functions, please visit [Activation functions explained](https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f).

I have given the user the choice between three common activation functions: 'relu', 'sigmoid', and 'tanh'. I have done this so that the user can compare the results and see that different activation functions produce slightly different models.
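For reference, the three functions can be written in a few lines of NumPy. This is only a sketch of the standard definitions, not the code Keras runs internally, and the sample input `z` is made up.

```python
import numpy as np

def relu(z):
    """Keep positive values, replace negative values with 0."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Squash any value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squash any value into the range (-1, 1)."""
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])  # made-up pre-activation values
for name, fn in (("relu", relu), ("sigmoid", sigmoid), ("tanh", tanh)):
    print(name, fn(z))
```

The main difference to notice is the output range: 'relu' is unbounded above, while 'sigmoid' and 'tanh' squash their input into a fixed interval.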

The kr.models.Sequential() call tells Keras that we will be adding layers to this model one after another.

The first two kr.layers.Dense() calls add the hidden layers. For each one we specify the number of neurons and the activation function, and for the first one we also specify the input dimension (784 in this case, one value per pixel of a 28\*28 image). This implies that the input layer has 784 values, and because the layers are dense, each of them is connected to every neuron in the next layer.
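One consequence of connecting everything to everything is that the number of trainable values grows quickly. The sketch below counts them for the layer sizes used above, assuming each dense layer has one weight per connection plus one bias per neuron (the Keras default for Dense). The helper `dense_param_count` is my own and is not part of the script.

```python
def dense_param_count(layer_sizes):
    """Return the number of weights and biases in a stack of dense layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights plus one bias per neuron
    return total

# 784 inputs -> 600 -> 400 -> 10 outputs, as in the create functions above.
print(dense_param_count([784, 600, 400, 10]))  # 715410
```

Most of those values sit in the first layer (784 \* 600 weights), which is typical for dense networks that take raw pixels as input.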

The last model.add() call specifies the number of output neurons we want. In our case it is ten because a digit can only take a value from 0 to 9, giving 10 possible values.
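To show how ten output neurons become a single digit, here is a small NumPy sketch of the 'softmax' step followed by picking the most likely class. The scores are invented for illustration, and in the script itself the encoder handles the conversion back to a label.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Made-up raw scores for one image, one score per digit 0 to 9.
scores = np.array([[0.1, 0.2, 0.1, 3.0, 0.3, 0.1, 0.2, 0.1, 0.4, 0.2]])
probs = softmax(scores)
digit = int(np.argmax(probs, axis=1)[0])  # index of the largest probability
print(digit)  # 3
```

The index of the largest probability is the predicted digit, which is why the output layer needs exactly one neuron per possible digit.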



### PredictAll() function

In [9]:
def predictAll(encoder, model, test_img, test_lbl):
    print((encoder.inverse_transform(model.predict(test_img)) == test_lbl).sum()) ### Print out the amount of digits it correctly predicts by comparing it to the corresponding element in the labels file

    for i in range(10):
        result = (encoder.inverse_transform(model.predict(test_img[i:i+1]))) ### Output the first 10 elements of the return array and labels array so the user can visualize it better.
        print("PREDICTION: " + str(result))
        print("ACTUAL: " + str(test_lbl[i]))

This function will be called later, once the model is built and trained. It takes four parameters: the encoder (which I will talk about later), the model itself, the test image array and the test label array. With those in hand, we call model.predict on the test image array and compare each prediction with the corresponding element of the label array. The comparison returns an array of boolean values, one per image (10,000 in total), and calling .sum() on it counts how many of them are true, that is, how many digits were predicted correctly.

The second part of the function simply outputs the first 10 values of the labels array alongside the first 10 values predicted by the model. This is just so that the user can see that the model sometimes gets a number wrong, but mostly gets it right.
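The counting trick is easier to see on a small example. The sketch below uses two made-up arrays of ten labels in place of the real model output and label file.

```python
import numpy as np

# Made-up values standing in for the model's predictions and the true labels.
predicted = np.array([7, 2, 1, 0, 4, 1, 4, 9, 6, 9])
actual = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])

matches = predicted == actual     # boolean array, True where they agree
correct = int(matches.sum())      # True counts as 1, False as 0
accuracy = correct / len(actual)  # fraction of correct predictions
print(correct, accuracy)  # 9 0.9
```

Dividing the count by the number of labels, as in the last step, gives the accuracy as a fraction, which is often easier to compare between models than the raw count.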

### Read an image from the file system

In [13]:
def readImageAsBytes(imageName): ### Function to read in a png file from the file system.
    b = cv2.imread(imageName + ".png") ### Read in the file. It will be an array of shape 28*28*3 (three colour channels, in BGR order).

    if b is None:
        print("File does not exist or is not an image file. Try again") ### Error message to tell the user why it failed.
        sys.exit(0)

    gray = cv2.cvtColor(b,cv2.COLOR_BGR2GRAY) ### Change the BGR values to gray scale values so the network can process them. The array is now of shape 28*28.
    img = np.array(gray).reshape(784,1)/255 ### Reshape the image to an array of shape 784*1 and scale the values to between 0 and 1.

    return img ### Return that array.

The above function will be called in the event that the user wants to test the model against an actual png image they have in the file system. If they select this option, the program will call the saveImages() function I spoke about earlier to save 20 png images of digits to their device. They can choose any one of those and pass its name into this function for processing.

cv2.imread is an OpenCV function that reads in the image from the file system. It returns None when the file does not exist or cannot be decoded as an image, so it is followed by some simple error checking: in that case the script prints an error message and exits via sys.exit(0).

The gray = cv2.cvtColor(b,cv2.COLOR_BGR2GRAY) call converts the image from colour (which OpenCV stores in BGR order) to gray scale. This means that instead of an array of shape 28\*28\*3, we have a 28\*28 array, because gray scale is represented with one value per pixel whereas colour is represented using three. This is important when it comes to fitting this array into our model.

The img = np.array(gray).reshape(784,1)/255 line then reshapes the 28\*28 array into a 784\*1 array because our model has 784 input neurons. We then divide each value by 255 so that each value is a number between 0 and 1 for the activation function to process.
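The same three steps (colour to gray scale, reshape, scale) can be tried without OpenCV or an image file. The sketch below is my own NumPy approximation: it uses the usual luminance weights (0.299 for red, 0.587 for green, 0.114 for blue) and keeps the result as floats, whereas OpenCV also rounds the gray values back to whole numbers. The function name `bgr_to_model_input` is hypothetical.

```python
import numpy as np

def bgr_to_model_input(bgr):
    """Turn a (28, 28, 3) BGR uint8 image into a (784, 1) float array in [0, 1]."""
    blue, green, red = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    gray = 0.114 * blue + 0.587 * green + 0.299 * red  # one value per pixel
    return gray.reshape(784, 1) / 255                  # flatten and scale

# A synthetic all-white image stands in for a png read from disk.
white = np.full((28, 28, 3), 255, dtype=np.uint8)
img = bgr_to_model_input(white)
print(img.shape)  # (784, 1)
```

A white pixel ends up as roughly 1 and a black pixel as 0, which is the range the model expects.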

### PredictFile() function

In [12]:
def predictFile(b, model, encoder): ### Function to predict the contents of a file passed from the file system.
    bArray = np.array(list(b)).reshape(1, 784).astype(np.uint8) ### Reshape the array to [1*784] so it fits the network(784 input neurons).

    result = (encoder.inverse_transform(model.predict(bArray))) ### Create a result.
    print("PREDICTION: " + str(result)) ### Output prediction
    print("================= SINGLE FILE PREDICTED ======================")

This function will be called when the user chooses to test the model against a png file on the system and has already loaded the image into the program with the readImageAsBytes(imageName) function. This function takes three parameters and is quite similar to the predictAll() function. The three parameters are b (the image array that was read in), the model and the encoder (which I will discuss later).

The bArray = np.array(list(b)).reshape(1, 784).astype(np.uint8) line converts the b array to shape 1\*784 so that the model can read each value into a separate input neuron. We then call model.predict(bArray) and output the predicted result to the console. Note that readImageAsBytes() has already scaled the pixels to the range 0 to 1, so the cast to np.uint8 truncates every value below 1 to 0. Casting to a float type such as np.float32 instead would preserve the scaled values.
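The effect of the cast in predictFile() on values that are already between 0 and 1 can be checked in isolation. The sketch below uses a handful of made-up pixel values and compares a cast to np.uint8 with a cast to np.float32.

```python
import numpy as np

# Made-up pixel values, already scaled to the range 0 to 1.
scaled = np.array([0.0, 0.25, 0.5, 0.99, 1.0])

as_uint8 = scaled.astype(np.uint8)    # fractional part is discarded
as_float = scaled.astype(np.float32)  # scaled values are kept

print(as_uint8)  # [0 0 0 0 1]
print(as_float)
```

Only a pixel that was exactly 255 before scaling survives the integer cast, so a float type is the safer choice when the input has already been divided by 255.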

### PassFile() function

In [14]:
def passFile(model, encoder): ### Function to allow the user to pass a selected file to the network.
    fileName = input("Please enter the name of the file you wish to test.(Just the file name, no extensions)") ### Instruct the user how to proceed.
    theImageFile = readImageAsBytes(fileName) ### Turn the image into a gray scale array of the correct format.
    predictFile(theImageFile, model, encoder) ### Predict the result.
    

This is the function that calls the two functions listed just above it. It is called when the user chooses to test the model against a file on the system, and it asks them to input the file name. The name is then passed to readImageAsBytes(imageName), which returns an array representing the image. That array is passed to predictFile(b, model, encoder), which tests the image against the model. The model and encoder are passed into this function at run time and are simply passed on to the predictFile(b, model, encoder) me