# Digit Recognition Notebook

Author: Kevin Delassus - G00270791

### Notebook Purpose
The purpose of this notebook is to explain how the script file [digitrec.py](digitrec.py) works and to discuss its performance.

### Introduction

There are many challenges associated with detecting handwritten digits. Humans, for example, recognize digits effortlessly, helped by the primary visual cortex (also known as V1) in each hemisphere of the brain. Each primary visual cortex contains about 140 million neurons with tens of billions of connections between them, and human vision involves not just V1 but an entire series of visual cortices doing progressively more complex image processing. Recognizing handwritten digits isn't easy; rather, we humans are stupendously, astoundingly good at making sense of what our eyes show us. Nearly all of that work is done unconsciously, so we don't usually appreciate how tough a problem our visual systems solve.

![image](https://alleninstitute.org/media/filer_public/74/44/74443675-6280-49a1-8362-61cecb90681c/neurons_all_16_large_blackbg-reid.jpeg)

Programming a neural network makes this problem of digit detection easier to solve. Similar to how a child learns to recognise objects, we need to show an algorithm a large number of pictures of different digits (the MNIST training set alone contains 60,000) before it is able to generalize and make predictions for images it has never seen before.

Computers see images differently from humans: they only see numbers. A grayscale image can be represented as a 2-dimensional array of numbers, where each number is the intensity of one pixel.
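As a small illustration of this idea, the sketch below builds a made-up 5×5 grayscale "image" of the digit 1 with NumPy. The pixel values are invented for the example; a real MNIST image is a 28×28 array of the same kind, with values from 0 (black background) to 255 (white stroke).

```python
import numpy as np

# A hypothetical 5x5 grayscale image of the digit "1".
# 0 = black background, 255 = white stroke (MNIST stores light digits on a dark background).
image = np.array([
    [0,   0, 255,   0, 0],
    [0, 255, 255,   0, 0],
    [0,   0, 255,   0, 0],
    [0,   0, 255,   0, 0],
    [0, 255, 255, 255, 0],
], dtype=np.uint8)

print(image.shape)  # (rows, columns) -> (5, 5)
print(image[1, 2])  # intensity of the pixel in row 1, column 2 -> 255
```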

![image](https://cdn-images-1.medium.com/max/1600/1*ccVO7341XIh7GfvzQS1IGw.png)

I used Keras to create my neural network. [Keras](https://keras.io/) is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. Another good reason for using Keras is that it provides the MNIST dataset through keras.datasets.mnist, which downloads the data and loads it as NumPy arrays, so we don't need to decompress the training and test files ourselves.

I used a Convolutional Neural Network rather than a simple fully connected neural network because it is more accurate on image data such as MNIST. This is explained in more detail below.

### Convolutional Neural Networks

A Convolutional Neural Network is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.

Convolutional Neural Networks have a different architecture than regular Neural Networks. Regular Neural Networks transform an input by putting it through a series of hidden layers. Every layer is made up of a set of neurons, and each layer is fully connected to all neurons in the layer before. Finally, the last fully connected layer, the output layer, represents the predictions.

Convolutional Neural Networks are a bit different. First of all, the layers are organised in 3 dimensions: width, height and depth. Further, each neuron in a layer is not connected to all the neurons in the previous layer, but only to a small region of it. Lastly, the final output is reduced to a single vector of probability scores, organized along the depth dimension.
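To make the idea of local connectivity concrete, here is a minimal NumPy sketch of a single 2D convolution with stride 1 and no padding. It is an illustration only (Keras implements Conv2D far more efficiently and across many channels); the function name conv2d_valid and the toy input are hypothetical. Each output value is computed from one small patch of the input, and the same kernel weights are reused at every position, so a 5×5 filter has 25 weights where a fully connected neuron looking at a whole 28×28 image would need 784.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ("valid").

    Like Keras' Conv2D, this computes a cross-correlation (the kernel is not flipped).
    Each output value depends only on one kh x kw patch of the input.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=float)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw]   # small local region of the input
            out[r, c] = np.sum(patch * kernel)  # same weights reused at every position
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])      # toy 2x2 filter
print(conv2d_valid(image, kernel))                # 3x3 output, every entry is -5.0
```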

### digitrec.py Explained

**Please note that the code snippets below are not intended to be run. Certain parts of the code have been removed in order to explain the core parts of the script. If you intend to run the code, please use the digitrec.py script file.**

#### Structure
I wanted to structure the script so that you do not need to keep re-training the model. When you start the script, it asks whether you wish to train a model on the MNIST dataset or test an existing saved model. This is possible because a trained model is saved to disk (its architecture as JSON and its weights as HDF5). If you decide to train a model, the script automatically goes into the test phase afterwards.
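The start-up flow described above can be sketched as a small dispatch function. This is a hypothetical outline, not the code in digitrec.py: train_and_save, load_and_test and run are placeholder names, and the answer is hard-coded instead of being read from the user.

```python
# Hypothetical sketch of the start-up flow described above. The function names
# are placeholders, not the ones used in digitrec.py.

def train_and_save():
    # Placeholder: build and fit the CNN, then write models/model.json and models/model.h5.
    return "trained"

def load_and_test():
    # Placeholder: rebuild the model from models/model.json + models/model.h5 and predict an image.
    return "tested"

def run(answer):
    """'train' trains (and saves) a model and then tests; any other answer only tests."""
    steps = []
    if answer.strip().lower() == "train":
        steps.append(train_and_save())
    steps.append(load_and_test())  # every path ends in the test phase
    return steps

# In an interactive script the answer would come from input(); it is hard-coded here.
answer = "train"
print(run(answer))  # ['trained', 'tested']
```

Keeping training and testing in separate functions is what lets the saved model be reused without re-training.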

#### Imports

```python
# Imports
import keras as kr
from keras.datasets import mnist
from tkinter import filedialog
from tkinter import *
import numpy as np
import cv2
import matplotlib.pyplot as plt
from keras.models import model_from_json
import os
```


#### Loading MNIST Dataset
The step below loads the MNIST dataset into four arrays: train_img, which contains all the training images; train_lbl, which contains all the training labels; test_img, which contains all the test images; and test_lbl, which contains all the test labels.

```python
# Load data from keras.datasets
(train_img, train_lbl), (test_img, test_lbl) = mnist.load_data()
```

#### Reshaping data
The datasets are 3D arrays. The training dataset's shape is (60000, 28, 28) and the testing dataset's shape is (10000, 28, 28).
The input shape that a CNN expects is a 4D array (batch, height, width, channels). Channels indicate whether the image is grayscale or colour: we are using grayscale images, so we use 1 channel; for colour (RGB) images we would use 3. The code below reshapes our inputs.

We also scale the images and one hot encode the labels. Each pixel in our datasets holds a value between 0 and 255, so we divide by 255 to scale it to the range 0–1. Each label (a digit from 0 to 9) is converted into a vector of 10 values that is all zeros except for a 1 at the position of that digit.

```python
# Reshape to the expected CNN format: (samples, height, width, channels)
train_img = train_img.reshape(train_img.shape[0], train_img.shape[1], train_img.shape[2], 1).astype('float32')
test_img = test_img.reshape(test_img.shape[0], test_img.shape[1], test_img.shape[2], 1).astype('float32')

# Scale pixel values from 0-255 to 0-1
train_img /= 255
test_img /= 255

# One hot encode the labels into vectors of length 10
train_lbl = kr.utils.to_categorical(train_lbl, 10)
test_lbl = kr.utils.to_categorical(test_lbl, 10)
```
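The reshaping, scaling and one hot encoding steps above can be sketched in plain NumPy. The one_hot helper is a hypothetical stand-in that mirrors what kr.utils.to_categorical does for integer labels, and the random images only stand in for the real MNIST arrays.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Plain-NumPy equivalent of to_categorical for integer labels (hypothetical helper)."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0  # a single 1 at each label's position
    return encoded

# Two hypothetical 28x28 grayscale images with values in 0..255.
images = np.random.randint(0, 256, size=(2, 28, 28)).astype("float32")
images = images.reshape(images.shape[0], 28, 28, 1) / 255.0  # add channel axis, scale to [0, 1]

print(images.shape)     # (2, 28, 28, 1)
print(one_hot([3, 0]))  # row 0 has its 1 in column 3, row 1 in column 0
```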

#### Building the Convolutional Neural Network
Below I am building a Convolutional Neural Network.
- The first layer is a Conv2D (2D convolution) layer with 32 filters/output channels, each of size 5×5, and a ReLU activation function. As the first layer it also receives the input, so it expects images with the structure outlined above (height, width, channels).
- The second layer is a MaxPooling layer. MaxPooling down-samples its input by keeping only the largest value in each 2×2 window, which makes the learned features less sensitive to small shifts and helps reduce over-fitting. It also shrinks the feature maps, which reduces the number of parameters in the later fully connected layer and therefore the training time.
- The third layer is another Conv2D layer with 32 filters/output channels, each of size 3×3, and a ReLU activation function.
- The fourth layer is another MaxPooling layer.
- The fifth layer is a regularization layer called Dropout. It is configured to randomly set 20% of its inputs to zero during training in order to reduce overfitting.
- The sixth layer, called Flatten, converts the 3D output of the previous layers (height, width, channels) into a 1D vector. This allows the output to be processed by standard fully connected layers.
- The seventh layer is a fully connected (Dense) layer with 128 neurons and a ReLU activation function.
- The eighth and final layer is an output layer with 10 neurons that uses a softmax activation function, so each neuron gives the probability of one class (digit). Softmax is used because this is a multi-class classification problem; for binary classification we would typically use a sigmoid activation function instead.

```python
model = kr.models.Sequential()

# Two convolution + max-pooling blocks
model.add(kr.layers.Conv2D(32, (5, 5), input_shape=(train_img.shape[1], train_img.shape[2], 1), activation='relu'))
model.add(kr.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(kr.layers.Conv2D(32, (3, 3), activation='relu'))
model.add(kr.layers.MaxPooling2D(pool_size=(2, 2)))
# Regularization, then flatten to a vector for the fully connected layers
model.add(kr.layers.Dropout(0.2))
model.add(kr.layers.Flatten())
model.add(kr.layers.Dense(128, activation='relu'))
# One output neuron per digit class
model.add(kr.layers.Dense(10, activation='softmax'))
```
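The layer list above determines the shape of the data at each stage. The sketch below works through those shapes and the number of trainable parameters by hand, assuming the Keras defaults used in the code (stride 1 and "valid" padding for Conv2D, and pool strides equal to the pool size for MaxPooling2D). Dropout and Flatten have no parameters. Keras's model.summary() should report the same totals for this architecture.

```python
def conv_out(size, kernel):
    # "valid" padding, stride 1 (the Keras Conv2D defaults used above)
    return size - kernel + 1

def pool_out(size, pool):
    # MaxPooling2D with strides equal to pool_size and "valid" padding
    return size // pool

side = 28
side = conv_out(side, 5)  # Conv2D 32 @ 5x5 -> 24
side = pool_out(side, 2)  # MaxPooling 2x2  -> 12
side = conv_out(side, 3)  # Conv2D 32 @ 3x3 -> 10
side = pool_out(side, 2)  # MaxPooling 2x2  -> 5
flat = side * side * 32   # Flatten         -> 800

params = {
    "conv1": 5 * 5 * 1 * 32 + 32,     # weights + one bias per filter
    "conv2": 3 * 3 * 32 * 32 + 32,
    "dense128": flat * 128 + 128,
    "dense10": 128 * 10 + 10,
}
print(side, flat)            # 5 800
print(sum(params.values()))  # 113898
```

Most of the parameters sit in the first Dense layer, which is why the pooling layers that shrink the feature maps matter for training time.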

#### Compiling the Model
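To compile the model I used categorical_crossentropy as the loss function because this is a multi-class classification problem with one hot encoded labels. I used Adam as the optimizer, which adapts the learning rate of each weight during training. I used accuracy as the metric so that the model's performance is reported during training and evaluation; the metric is only reported and does not affect how the weights are updated.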

```python
# Compiling of the Model
model.compile(loss='categorical_crossentropy', optimizer=kr.optimizers.Adam(), metrics=['accuracy'])
```
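For intuition about the loss, here is a NumPy sketch of a softmax output followed by categorical cross-entropy, the quantity the optimizer minimizes. The three-class logits are made up for the example, and the eps clip is an assumption added to avoid log(0); Keras handles this internally in a similar way.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_labels, probs, eps=1e-7):
    # Mean over the batch of -sum(y * log(p)); eps avoids log(0).
    probs = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(one_hot_labels * np.log(probs), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1]])  # hypothetical scores for 3 classes
labels = np.array([[1.0, 0.0, 0.0]])  # the true class is 0
p = softmax(logits)
print(p.sum())                            # probabilities sum to 1
print(categorical_crossentropy(labels, p))  # equals -log(p[0, 0])
```

Because the labels are one hot, only the probability assigned to the true class contributes to the loss, so the loss falls as that probability approaches 1.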

#### Training the Model
The model is fitted over a user-defined number of epochs, and the weights are updated after every batch of 200 training images. For the notebook we set the number of epochs to 10. The test data is used as the validation dataset, allowing you to see the skill of the model as it trains.

```python
# Fit the model
epochsNum = 10
model.fit(train_img, train_lbl, validation_data=(test_img, test_lbl), epochs=epochsNum, batch_size=200)
```
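With 60,000 training images and a batch size of 200, the number of weight updates follows from simple arithmetic; the sketch below just spells it out using the values from the cell above.

```python
import math

train_size = 60000  # MNIST training images
batch_size = 200    # batch_size passed to model.fit
epochs = 10         # epochsNum

steps_per_epoch = math.ceil(train_size / batch_size)  # weight updates per epoch
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 300 3000
```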

#### Store Model
I decided it would be a good idea to store the model on file. This gives the user the option to reuse the model without having to re-train it. To store the model I use Keras's to_json() method for the architecture and save_weights() for the weights. Two files are created: model.json, and model.h5, which is an [HDF5](https://support.hdfgroup.org/HDF5/whatishdf5.html) file. Both files are stored within a folder called models.

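```python
# Make sure the models folder exists before writing to it
os.makedirs("models", exist_ok=True)

# Serialize the model architecture to JSON
model_json = model.to_json()
with open("models/model.json", "w") as json_file:
    json_file.write(model_json)
# Serialize the weights to HDF5
model.save_weights("models/model.h5")
print("Saved model to disk")
```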

#### Evaluation of the Model
The test dataset is used to evaluate the model, and after evaluation the test loss and test accuracy are printed. I achieved an accuracy of 99%.

```python
# Evaluation of the model
metrics = model.evaluate(test_img, test_lbl, verbose=0)
print("Metrics(Test loss & Test Accuracy): ")
print(metrics)
```
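The accuracy figure printed by evaluate can be reproduced from the model's outputs: with one hot labels it is the fraction of images whose highest-probability class matches the true class. The sketch below uses invented probabilities for four images, one of which is misclassified; the accuracy helper is a hypothetical name.

```python
import numpy as np

def accuracy(one_hot_labels, probs):
    """Fraction of samples whose highest-probability class matches the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(one_hot_labels, axis=1)))

labels = np.eye(10)[[7, 2, 1, 0]]         # hypothetical one hot labels for digits 7, 2, 1, 0
probs = np.full((4, 10), 0.01)
probs[[0, 1, 2, 3], [7, 2, 1, 5]] = 0.91  # the last image is predicted as 5, i.e. wrong
print(accuracy(labels, probs))            # 0.75
```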

#### Making Predictions
I wanted to be able to test my own handwritten digits efficiently without having to change the script code. To do this I used [tkinter](https://wiki.python.org/moin/TkInter) to create a GUI for selecting the image to be tested. The image below is a snapshot of the GUI. To create the test images I used [GIMP](https://www.gimp.org/), a free and open source raster graphics editor used for image retouching and editing, free-form drawing, converting between different image formats, and more specialized tasks.

![alt text](notebook-images/ImageSelectBox.png)

#### Load Saved Model
To load the model I do the opposite of what we did to save it. First I load model.json and use it to create a model. Then I load the weights into the new model.

```python
# Load the JSON and create the model
with open('models/model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# Load the weights into the new model
loaded_model.load_weights("models/model.h5")
print("Loaded model from disk")
```

#### Open Dialog Box
Using tkinter I created a dialog to select an image to try to predict. The dialog shows JPEG files by default, with an "all files" option for other formats.

```python
# Asking user to enter own image for testing
root = Tk()
root.testImage = filedialog.askopenfilename(initialdir="C:\\", title="Select Image", filetypes=(("jpeg files", "*.jpg"), ("all files", "*.*")))
print(root.testImage)
```

#### Read Image and Resize
The selected image is then read in as grayscale and resized to 28x28 using OpenCV.
Like the training and test images, the image to predict needs to be reshaped to the expected CNN format.
The last step in preparing the image before prediction is to scale its pixel values to the range 0–1.
The image can now be passed in to be predicted.

```python
# Read the image as a single grayscale channel, like the MNIST images
imgFile = cv2.imread(root.testImage, cv2.IMREAD_GRAYSCALE)
# Resize to 28x28 and reshape to the expected CNN format: (1, 28, 28, 1)
img = cv2.resize(imgFile, (28, 28))
arr = img.reshape(-1, 28, 28, 1).astype('float32')

# Scale pixel values from 0-255 to 0-1
arr /= 255
```
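The preparation steps above can also be written as a small, testable NumPy function. This is a sketch, not the code in digitrec.py: the name to_model_input is hypothetical, it assumes the image has already been read as grayscale and resized to 28×28, and the optional invert flag is an assumption for digits drawn dark-on-light in an editor such as GIMP, since MNIST digits are light strokes on a dark background.

```python
import numpy as np

def to_model_input(gray_28x28, invert=False):
    """Turn one 28x28 grayscale array (0..255) into a (1, 28, 28, 1) float batch.

    invert=True is a hypothetical option for digits drawn dark-on-light,
    because MNIST digits are light-on-dark.
    """
    img = np.asarray(gray_28x28, dtype="float32")
    if img.shape != (28, 28):
        raise ValueError("expected a 28x28 grayscale image, got %s" % (img.shape,))
    if invert:
        img = 255.0 - img
    return (img / 255.0).reshape(1, 28, 28, 1)  # scale to [0, 1], add batch and channel axes

white_canvas = np.full((28, 28), 255, dtype=np.uint8)  # hypothetical blank white drawing
batch = to_model_input(white_canvas, invert=True)
print(batch.shape, batch.max())  # (1, 28, 28, 1) 0.0
```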

#### Make Prediction on Image
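Below we make the prediction using the model that was loaded from file. Calling predict on the image returns one probability per class, and the index of the highest probability (found with np.argmax) is the predicted digit. The result is then stored in the prediction variable. (The older predict_classes method did the same for this model, but it has been removed from recent versions of Keras.)

```python
# Making prediction: index of the most probable class for each image in the batch
result = np.argmax(loaded_model.predict(arr), axis=-1)
prediction = result[0]
```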
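For reference, picking the class from the softmax output is just an argmax over the last axis. The probability vector below is invented for the example.

```python
import numpy as np

# Hypothetical softmax output of model.predict for a batch of one image.
probs = np.array([[0.01, 0.02, 0.03, 0.80, 0.02, 0.02, 0.03, 0.03, 0.02, 0.02]])

# Index of the most probable class for each image in the batch.
classes = np.argmax(probs, axis=-1)
prediction = int(classes[0])
print(prediction)  # 3
```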

#### Displaying Result
Finally, the selected image is displayed using matplotlib.pyplot, with the predicted result shown above it as the title.

```python
# Displaying prediction
print("Class: ", prediction)

# Showing image and predicted result
plt.imshow(imgFile, cmap='gray')
plt.title(prediction)
plt.show()
```

#### Previous Result
![alt text](notebook-images/result.png)

## Conclusion

To conclude, Convolutional Neural Networks are among the most effective approaches for building image classification and recognition software. They offer higher performance than other types of neural network models on this kind of task: the Convolutional Neural Network in this notebook reaches about 99% accuracy on MNIST, whereas a traditional fully connected model is generally expected to score lower.

### END