# Digit Recognition Notebook

This notebook explains what the ```digitrec.py``` file does and how it does it.

## Introduction
The file contains a class called _DigitRecognition_ and a starter script that passes values to the class through command line arguments.

The class trains a chosen neural network or classifier to recognize handwritten digits from images. Once the network is trained, it tries to predict a digit from a provided source.

This application can be built into automated processes, as it is completely configurable through command line arguments.

## Image Dataset
The MNIST dataset is used to train the model. More about MNIST and how the dataset is read in can be found [here](./mnist-dataset.ipynb).

The dataset is acquired from MNIST's website, or it can be placed into the ```data``` folder beside the ```digitrec.py``` file.
The program detects whether the files exist in the ```data``` folder. If they don't exist, they are downloaded.
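
A minimal sketch of that check-then-download step, using only the standard library. The names `BASE_URL`, `FILES`, `missing_files` and `ensure_dataset` are hypothetical, and the base URL is an assumption about where the MNIST files are hosted; `digitrec.py` may do this differently.

```python
import os
import urllib.request

# Assumption: location of the MNIST files; adjust if the host differs.
BASE_URL = "http://yann.lecun.com/exdb/mnist/"

# The four gzip files that make up the MNIST dataset.
FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def missing_files(data_dir, names=FILES):
    """Return the dataset files that are not present in data_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(data_dir, n))]


def ensure_dataset(data_dir="data"):
    """Download only the files that are missing from data_dir."""
    os.makedirs(data_dir, exist_ok=True)
    for name in missing_files(data_dir):
        urllib.request.urlretrieve(BASE_URL + name, os.path.join(data_dir, name))
```

Calling `ensure_dataset()` before training would leave existing files untouched and fetch only the missing ones.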

Each file is opened up and stored internally as a 2D array for later use.

Each image is stored as a flat array of 784 values (28\*28 pixels), normalized to be between 0 and 1. This helps the neural network train more accurately, as it only has to deal with small numbers.
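
As an illustration, the flattening and normalization could look like the sketch below. It assumes NumPy is available and relies on the MNIST image file layout (a 16-byte big-endian header followed by one byte per pixel). The function names are hypothetical and not taken from `digitrec.py`.

```python
import gzip
import struct

import numpy as np


def parse_idx_images(raw):
    """Turn the raw bytes of an MNIST image file into a (count, 784) float array."""
    # Header: magic number, image count, rows, columns (big-endian 32-bit ints).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an MNIST image file")
    pixels = np.frombuffer(raw, dtype=np.uint8, count=count * rows * cols, offset=16)
    # Flatten each image to one row and scale 0..255 down to 0..1.
    return pixels.reshape(count, rows * cols).astype(np.float32) / 255.0


def load_images(path):
    """Read a gzip-compressed MNIST image file from disk."""
    with gzip.open(path, "rb") as f:
        return parse_idx_images(f.read())
```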

I could have used the dataset provided by Keras, but I decided to reuse the code developed in the MNIST notebook.

## Neural networks and classifiers

* [K-Nearest-Neighbors](https://medium.com/@adi.bronshtein/a-quick-introduction-to-k-nearest-neighbors-algorithm-62214cea29c7)
* [Keras](https://keras.io/) custom network.
* [GaussianNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html)
* [MLPClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)
* [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)

### K-Nearest-Neighbors(KNN)
KNN is one of the most commonly used machine learning algorithms. It can be used for classification or regression.
The algorithm looks at the k training values nearest to the input and assigns the input to the group that most of those neighbours belong to.
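
The idea can be sketched in a few lines of NumPy. This is a simplified illustration using Euclidean distance and a majority vote, not the classifier the program actually uses; `knn_predict` is a hypothetical name.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, sample, k=3):
    """Classify sample by a majority vote among its k nearest training points."""
    # Euclidean distance from the sample to every training point.
    distances = np.linalg.norm(train_x - sample, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # The most frequent label among those neighbours wins.
    votes = Counter(train_y[nearest].tolist())
    return votes.most_common(1)[0][0]
```
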
### Keras
Keras is a neural network API. The library can run on top of TensorFlow, CNTK, or Theano. The API allows the building of custom and highly configurable neural networks, which can run on either CPU or GPU.

The network used consists of two layers:
1. A dense input layer consisting of 1568 neurons, using [relu](https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning) as its activation function
2. A dense output layer consisting of 10 neurons, using [softmax](https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax) as its activation function

The input data is the 60000 images from MNIST in a (60000,784) shape.


The label values are converted to a binary matrix in one-hot format. This is a requirement of the loss function, which is ```categorical_crossentropy```. Each label is replaced by a list of ten values consisting of 0s and a single 1. The 1 is placed at the index that equals the original value, e.g. if the value was 9 then the 1 is at index 9, the last position.
This format works well together with ```softmax```.
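
A small NumPy sketch of one-hot encoding for digit labels 0 to 9, where the 1 sits at the index equal to the label. The helper name `one_hot` is hypothetical; Keras also ships `keras.utils.to_categorical` for the same purpose.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Convert integer labels into rows of 0s with a single 1 at the label's index."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded
```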

The output is a set of 10 numbers. The index of the highest number is the predicted digit. To find it, NumPy's [argmax](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.argmax.html) is used.
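
To illustrate how the output is read, the sketch below applies a plain NumPy softmax to ten made-up scores and picks the index of the largest probability. Both function names are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()


def predicted_digit(probabilities):
    """The index of the highest probability is the predicted digit."""
    return int(np.argmax(probabilities))
```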

RMSProp is used as the optimizer function. It trained the fastest of the optimizers tested.
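
Put together, the network described above might be declared as in the following sketch. It assumes the Keras bundled with TensorFlow 2.x; the function name `build_model` is hypothetical and the actual code in `digitrec.py` may differ.

```python
# Assumption: Keras is imported from TensorFlow 2.x.
from tensorflow import keras


def build_model():
    """Two dense layers: 1568 relu neurons, then 10 softmax outputs."""
    model = keras.Sequential([
        keras.Input(shape=(784,)),                     # one flattened 28*28 image
        keras.layers.Dense(1568, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),  # one output per digit
    ])
    model.compile(
        optimizer="rmsprop",
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call to `model.fit` with the (60000,784) image array and the one-hot labels.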

The above setup was designed to be quick and accurate. It could be more accurate if it was built as a convolutional neural network (CNN) rather than a traditional multilayer perceptron (MLP). A convolutional neural network is designed for pictures, as its layers can take 2D arrays ([conv2d](https://keras.io/layers/convolutional/)) and then flatten them out into a single array.

I did not use a CNN as my computer would have taken hours to train it. However, implementations of both are available from the Keras team:
* [CNN](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py)
* [MLP](https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py)

### Experimenting with Keras to find the optimal setup
I tried using the ```sigmoid``` activation function on the input layer, but it produced slower learning.

I tried to add more layers. Neither fully connected nor dropout layers increased accuracy; they only affected learning speed (mainly slowing it down).

The number of repetitions (epochs) strongly affects the learning accuracy. With 1 epoch, the accuracy stopped at about 0.92.
20 epochs is ideal, as after that the network's accuracy doesn't seem to improve any further.

The batch size is a factor that strongly affects learning speed. If the batch size is too low, it acts like a bottleneck and slows down the learning process in exchange for a minimal accuracy gain in the first epoch. If it is too large, the learning is quicker, but the number of epochs has to be increased as the accuracy gained in each epoch drops. 128 was chosen as the batch size, as the learning is still quick and a smaller number of epochs is enough.
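
One way to see the trade-off is to count how many weight updates a single epoch performs for a given batch size. The helper below is only illustrative arithmetic and its name is hypothetical.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of batches, and so weight updates, in one pass over the data."""
    return math.ceil(num_samples / batch_size)


# With the 60000 MNIST training images:
print(updates_per_epoch(60000, 32))    # 1875 small, slow steps
print(updates_per_epoch(60000, 128))   # 469 steps
print(updates_per_epoch(60000, 1024))  # 59 steps, so less learning per epoch
```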

I tested every optimization function mentioned in [this](https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f) article. RMSProp performed the best of all of them.

## Digit recognition from input picture
The program is able to recognise a single digit from a picture.
The picture is preprocessed for a better match:
* If the size is larger than 28\*28 pixels, it is resized to be exactly 28\*28
* The picture is converted to grayscale
* As most handwriting is on white paper, the picture is inverted so the white paper is represented as 0s
* Once the picture is converted into an array, it is normalised to be between 0 and 1
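
The steps above can be sketched with Pillow and NumPy, assuming both are installed. The function name `preprocess` is hypothetical, and the sketch assumes the picture is at least 28\*28 pixels, since the list does not say how smaller pictures are handled.

```python
import numpy as np
from PIL import Image, ImageOps


def preprocess(img):
    """Turn a PIL image into a flat array of values between 0 and 1."""
    # Shrink anything larger than 28*28 to exactly 28*28.
    if img.width > 28 or img.height > 28:
        img = img.resize((28, 28))
    # Grayscale: a single 0..255 value per pixel.
    img = img.convert("L")
    # Invert so that white paper becomes 0 and dark ink becomes high values.
    img = ImageOps.invert(img)
    # Flatten and normalise to the 0..1 range.
    return np.asarray(img, dtype=np.float32).reshape(-1) / 255.0
```

A file on disk would be passed in as `preprocess(Image.open(path))`.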


## Code structure
The main functionality of the program is organised into a single class. This class contains methods for:
* Dataset download/open
* Dataset unzipping from gzip format
* Processor for images and labels in the dataset
* Model chooser
* Method for training a model
* Method for reading in an input image or folder
* Method to predict a number with the trained up model
* Save model into a file and load previous model

### [argparse](https://docs.python.org/3.6/library/argparse.html#module-argparse)
```argparse``` is used for reading in the command line arguments for the class.
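
A sketch of how such a parser could be declared, covering the options whose form is visible in the usage line under "How to use". The function name `build_parser` is hypothetical. ```--savemodel``` and ```--loadmodel``` are left out because this document does not show whether they take a value.

```python
import argparse


def build_parser():
    """Build a parser for the documented digitrec.py options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="digitrec.py")
    parser.add_argument(
        "--model",
        choices=["keras", "knn", "mlpc", "gaussian", "svc"],
        default="keras",
        help="The model to use",
    )
    # Simple on/off flags: present means True, absent means False.
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--checkaccuracy", action="store_true")
    parser.add_argument("--limit", action="store_true")
    # Path to an image, or to a directory of images.
    parser.add_argument("--image")
    return parser


args = build_parser().parse_args(["--model", "knn", "--limit"])
print(args.model, args.limit, args.image)
```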

## How to use
usage: digitrec.py [-h] [--model {keras,knn}] [--verbose]
                           [--checkaccuracy] [--limit] [--image IMAGE]

Optional arguments:
*  ```-h```, ```--help```    Show this help message and exit
*  ```--model {keras,knn,mlpc,gaussian,svc}``` The model to use. One of: keras, knn, mlpc, gaussian, svc. Default is keras
*  ```--verbose```           If the flag exists, extra information is provided about the MNIST files
*  ```--checkaccuracy```     If the flag exists, the trained model is checked for accuracy
*  ```--limit```             If the flag exists, the model uses only 1000 records to train and test. This does not apply to keras!
*  ```--savemodel```         Save the trained model into a file to speed up the next run of the application.
*  ```--loadmodel```         Load a trained model from a file. This will disregard the `--model` argument
*  ```--image```             Path to an image to recognise the number from. It can also be a directory path with images in it. If a directory path is supplied, the trailing / has to be omitted

The output of the application can be piped into a file the following way:
```python digitrec.py > output.txt```

## Performance

The MNIST dataset is initialised in under 0.5 seconds, including reading and parsing. My custom Keras network trains in approximately 100s and is 98% accurate.

The measurements were taken with the following config:
* i5-4750 @ 3.2GHz processor
* 8GB memory
* Samsung SSD

The application uses the CPU for computations.


## Room for improvement
The program could be improved in a few ways:
* Change to GPU computation
* Change the Keras network to a [convolutional neural network](https://en.wikipedia.org/wiki/Convolutional_neural_network)