# Digit Recognition notebook
![MNIST](https://localab.jp/wp-content/uploads/2017/07/MNIST.png)

***

![Keras](https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png)

***

![MNIST](https://achintavarna.files.wordpress.com/2017/11/mnist_2layers.png?w=634)

***

**[https://keras.io/](https://keras.io/)**

*keras*

***


# Digit Recognition Description 
Alright, so what does the Python file `digitrec.py` actually do? It uses a convolutional neural network (CNN) to take in an image specified by the user and returns the digit it thinks the image shows, based on a trained model. CNNs have been extremely successful at identifying faces, objects and traffic signs. In this case we are using a CNN to identify a handwritten digit by training on the MNIST dataset.

## Line by Line
Initially we import all the necessary libraries and then load the MNIST data from Keras, splitting it into training and testing arrays that hold 60000 and 10000 images respectively, each 28 x 28 pixels.

The datasets are 3D arrays. The training dataset shape is (60000, 28, 28) and the testing dataset shape is (10000, 28, 28).

The CNN expects a 4D array consisting of the batch size, height, width and channels (1 channel, since the images are grayscale), so a channel dimension has to be added to each dataset.

We then scale the values down by dividing each pixel by 255, as each pixel ranges from 0 to 255. This puts every value in the range 0 to 1, which makes the data much easier to deal with.
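
The reshape and scaling steps described above can be sketched as follows. This is a minimal illustration rather than the exact code in `digitrec.py`: it uses a small random array as a stand-in for the MNIST data so it runs without downloading anything, and the variable name `x_train` is an assumption.

```python
import numpy as np

# Stand-in for the MNIST training array: 8-bit grayscale images, shape (N, 28, 28).
# Random values are used here purely for illustration.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)

# Add a trailing channel axis so the shape becomes (N, 28, 28, 1).
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)

# Convert to float and scale pixel values from [0, 255] down to [0.0, 1.0].
x_train = x_train.astype("float32") / 255.0

print(x_train.shape)  # (6, 28, 28, 1)
```

The same two steps would be applied to the testing array so that both datasets are in the format the network expects.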

## Global variables
After that we define some global variables:
1. number_of_classes:

This value represents how many outputs our CNN has. In this case we are recognising 10 different digits, 0 to 9.

2. epochs:

The number of times we loop over the data. One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE.

3. batch_size:

The number of training examples in a single batch. You can’t pass the entire dataset into the neural net at once, so you divide the dataset into a number of batches (sets or parts).
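
To make the relationship between epochs and batches concrete, here is a small piece of arithmetic. It assumes the figures given elsewhere in this write-up (60000 training images, 5 epochs) and treats the 200 images mentioned in the fitting section as the batch size.

```python
import math

# Figures taken from this write-up; 200 is assumed to be the batch size.
num_train_images = 60000
batch_size = 200
epochs = 5

# One epoch is one full pass over the data, split into batches.
batches_per_epoch = math.ceil(num_train_images / batch_size)

# Each batch triggers one forward and backward pass (one weight update).
total_updates = batches_per_epoch * epochs

print(batches_per_epoch)  # 300
print(total_updates)      # 1500
```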

In this project we are using one-hot encoding, which converts each label into a simple binary vector that contains exactly one '1' while the rest of the elements are '0'.

So the number 5 would be represented as: [0,0,0,0,0,1,0,0,0,0]
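
A minimal pure-Python sketch of one-hot encoding is shown here. The helper name `one_hot` is hypothetical; in a Keras project this step is commonly done for a whole label array with `keras.utils.to_categorical`, though which call `digitrec.py` uses is not shown here.

```python
def one_hot(label, number_of_classes=10):
    """Return a list with a 1 at index `label` and 0 everywhere else."""
    if not 0 <= label < number_of_classes:
        raise ValueError("label must be between 0 and number_of_classes - 1")
    return [1 if i == label else 0 for i in range(number_of_classes)]


print(one_hot(5))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```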

## Model Summary

Here is a summary of the Convolution model we are using.

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    conv2d_1 (Conv2D)            (None, 24, 24, 32)        832
    _________________________________________________________________
    max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0
    _________________________________________________________________
    conv2d_2 (Conv2D)            (None, 10, 10, 32)        9248
    _________________________________________________________________
    max_pooling2d_2 (MaxPooling2 (None, 5, 5, 32)          0
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 5, 5, 32)          0
    _________________________________________________________________
    flatten_1 (Flatten)          (None, 800)               0
    _________________________________________________________________
    dense_1 (Dense)              (None, 128)               102528
    _________________________________________________________________
    dense_2 (Dense)              (None, 10)                1290
    _________________________________________________________________

    Total params: 113,898
    Trainable params: 113,898
    Non-trainable params: 0
    _________________________________________________________________
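
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes a 5 x 5 kernel for the first convolution and a 3 x 3 kernel for the second, with no padding and 2 x 2 pooling. These kernel sizes are not stated in the summary; they are inferred because they are the values that match the output shapes and parameter counts shown.

```python
def conv_output_size(size, kernel, stride=1):
    """Width/height after a convolution with no padding."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


size = 28                                    # MNIST images are 28 x 28
size = conv_output_size(size, kernel=5)      # conv2d_1 -> 24
conv1 = conv_params(5, in_channels=1, filters=32)
size = size // 2                             # max_pooling2d_1 -> 12
size = conv_output_size(size, kernel=3)      # conv2d_2 -> 10
conv2 = conv_params(3, in_channels=32, filters=32)
size = size // 2                             # max_pooling2d_2 -> 5
flat = size * size * 32                      # flatten_1 -> 800
dense1 = dense_params(flat, 128)
dense2 = dense_params(128, 10)

print(conv1, conv2, dense1, dense2)          # 832 9248 102528 1290
print(conv1 + conv2 + dense1 + dense2)       # 113898
```

Pooling, dropout and flatten layers have no weights of their own, which is why they show 0 parameters.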

    
## What does this mean?
To train and test a deep learning CNN model, each input image is passed through a series of convolution layers with filters (kernels), pooling layers and fully connected (FC) layers, and a softmax function is then applied to classify the object with probabilistic values between 0 and 1. The figure below shows the complete flow of a CNN as it processes an input image and classifies the object based on those values.

![CNN](https://cdn-images-1.medium.com/max/800/1*XbuW8WuRrAY5pC4t-9DZAQ.jpeg)

## Convolution Layer

Convolution is the first layer to extract features from an input image. Convolution preserves the relationship between pixels by learning image features using small squares of input data. It is a mathematical operation that takes two inputs, such as an image matrix and a filter (or kernel).
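
The operation can be sketched in plain Python. This is an illustrative implementation with stride 1 and no padding, not the code Keras runs internally; like most deep learning libraries it slides the filter without flipping it (strictly speaking a cross-correlation). The small matrices are made-up example values.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output = []
    for row in range(image_h - kernel_h + 1):
        out_row = []
        for col in range(image_w - kernel_w + 1):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5 x 5 input with a 3 x 3 filter gives a 3 x 3 feature map, which is the same rule that turns a 28 x 28 image into a 24 x 24 map when a 5 x 5 filter is used.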

![CNN](https://cdn-images-1.medium.com/max/800/1*kYSsNpy0b3fIonQya66VSQ.png)

We will see below how the network works for an input ‘8’.

![8](https://ujwlkarn.files.wordpress.com/2016/08/conv_all.png?w=748)

Note that this example uses a different network from the model summarized above. Its input image contains 1024 pixels (32 x 32 image) and the first Convolution layer (Convolution Layer 1) is formed by convolution of six unique 5 × 5 (stride 1) filters with the input image. As seen, using six different filters produces a feature map of depth six.

Convolutional Layer 1 is followed by Pooling Layer 1 that does 2 × 2 max pooling (with stride 2) separately over the six feature maps in Convolution Layer 1. In an interactive visualization such as the tool linked at the end of this section, you can move your mouse pointer over any pixel in the Pooling Layer and observe the 2 x 2 grid it forms in the previous Convolution Layer. You’ll notice that the pixel having the maximum value (the brightest one) in the 2 x 2 grid makes it to the Pooling layer.

Pooling Layer 1 is followed by sixteen 5 × 5 (stride 1) convolutional filters that perform the convolution operation. This is followed by Pooling Layer 2 that does 2 × 2 max pooling (with stride 2). These two layers use the same concepts as described above.

We then have three fully-connected (FC) layers. There are:

    120 neurons in the first FC layer
    100 neurons in the second FC layer
    10 neurons in the third FC layer corresponding to the 10 digits – also called the Output layer

Notice how, in the figure below, each of the 10 nodes in the output layer is connected to all 100 nodes in the 2nd Fully Connected layer (hence the name Fully Connected).

Also, note how the only bright node in the Output Layer corresponds to ‘8’. This means that the network correctly classifies our handwritten digit (a brighter node denotes that the output from it is higher, i.e. 8 has the highest probability among all the digits).

![output layer](https://ujwlkarn.files.wordpress.com/2016/08/final.png?w=748)

Here is a good visualization tool: http://scs.ryerson.ca/~aharley/vis/conv/


## Pooling Layer

Pooling layers reduce the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map but retains the important information. Spatial pooling can be of different types:

    Max Pooling
    Average Pooling
    Sum Pooling

Max pooling takes the largest element from each window of the rectified feature map. Average pooling takes the average of the elements in each window instead, and sum pooling takes the sum of all the elements in the window.

![pooling](https://adeshpande3.github.io/assets/MaxPool.png)
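
The three pooling types differ only in how each window is reduced to a single number, which the following sketch tries to show. The function name `pool2d` and the example values are illustrative assumptions; in Keras the max variant is provided by the `MaxPooling2D` layer.

```python
def pool2d(feature_map, size=2, stride=2, reducer=max):
    """Reduce each size x size window of `feature_map` to one value."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reducer(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

print(pool2d(feature_map))                # max pooling: [[6, 8], [3, 4]]
print(pool2d(feature_map, reducer=mean))  # average pooling: [[3.25, 5.25], [2.0, 2.0]]
print(pool2d(feature_map, reducer=sum))   # sum pooling: [[13, 21], [8, 8]]
```

With a 2 x 2 window and stride 2, each dimension is halved, which matches the 24 to 12 and 10 to 5 steps in the model summary.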


## Dropout Layer

Dropout refers to ignoring a randomly chosen set of units (i.e. neurons) during the training phase. By “ignoring”, I mean these units are not considered during a particular forward or backward pass. This helps to prevent over-fitting.
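
A rough sketch of the idea is given here, using what is often called inverted dropout: units are zeroed at random during training and the surviving units are scaled up by 1 / (1 - rate), which is also how the Keras `Dropout` layer is documented to behave. The rate of 0.25 is a made-up value; the rate used in `digitrec.py` is not stated here.

```python
import random


def dropout(values, rate, training=True, rng=None):
    """Zero each value with probability `rate` during training.

    Kept values are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    Outside training the values pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]


activations = [1.0] * 1000
dropped = dropout(activations, rate=0.25, rng=random.Random(0))
unchanged = dropout(activations, rate=0.25, training=False)

print(sum(1 for v in dropped if v == 0.0) / len(dropped))  # roughly 0.25
print(unchanged == activations)                            # True
```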

## Fully Connected Layer
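After the convolution and pooling steps, we have a pooled feature map. Before it can be fed into the fully connected layers, we literally flatten the pooled feature map into a single column, like in the image below. In our model this turns the (5, 5, 32) output into a vector of 800 values, which is then connected to a dense layer of 128 neurons.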
![](https://www.superdatascience.com/wp-content/uploads/2018/08/CNN_Step3_Img1.png)
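
Flattening can be illustrated with a plain NumPy reshape. The batch of two samples below is a made-up stand-in; only the (5, 5, 32) shape comes from the model summary.

```python
import numpy as np

# Hypothetical batch of 2 pooled outputs, each with shape (5, 5, 32).
pooled = np.arange(2 * 5 * 5 * 32, dtype="float32").reshape(2, 5, 5, 32)

# Keep the batch axis and collapse everything else into one long vector.
flattened = pooled.reshape(pooled.shape[0], -1)

print(flattened.shape)  # (2, 800)
```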

## Last Layer 
This is the output layer, with 10 neurons (the number of output classes), and it uses the softmax activation function. Each neuron gives the probability of its class. Softmax is used because this is a multi-class classification problem; if it were a binary classification we would use the sigmoid activation function instead.
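
As a small illustration of what softmax does, the sketch below turns 10 raw scores into probabilities that sum to 1 and picks the most likely digit. The scores are invented for the example; subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs for the digits 0 to 9.
scores = [0.5, 1.0, 0.1, 0.2, 0.0, 0.3, 0.4, 0.6, 4.0, 0.7]
probabilities = softmax(scores)
predicted_digit = probabilities.index(max(probabilities))

print(predicted_digit)  # 8
```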

## Compile & Fitting the model
To compile the model, I used categorical_crossentropy as the loss function and Adam as the optimizer, and added the accuracy metric so that accuracy is reported during training. Fitting the model basically means training it: I am using 5 epochs and a batch size of 200 images. The test data is used as the validation dataset.
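
To give a feel for what the categorical cross-entropy loss measures, here is a minimal single-sample sketch in plain Python. It is an illustration of the formula, not the Keras implementation, and the predicted probabilities are invented; the small `eps` clamp is an assumption added to avoid taking the log of zero.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(true * log(predicted))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot label for the digit 5.
y_true = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# A confident, correct prediction and an unsure one (made-up values).
confident = [0.01, 0.01, 0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01]
unsure = [0.1] * 10

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

The confident, correct prediction gives the smaller loss, and training adjusts the weights to push the loss down across all batches.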

## Evaluation & saving
The test dataset is used to evaluate the model, and after evaluation the test loss and test accuracy metrics are printed. I then save the model locally so it can be used later without having to train it again.

I also test a few images in a folder called 'images', looping over each of them and printing what the model thinks each image is.

## User specified image test
When the user runs the Python code in the terminal, they can specify a particular image they would like to test, as long as the image is in the images folder.
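
One possible way to handle this is sketched below. How `digitrec.py` actually reads the image name is not shown here, so the use of `argparse`, the positional argument `image` and the helper name `build_image_path` are all assumptions; only the `images` folder name comes from this write-up.

```python
import argparse
import os

IMAGES_DIR = "images"  # folder name used in this write-up


def build_image_path(argv=None):
    """Parse the image file name and return its path inside the images folder."""
    parser = argparse.ArgumentParser(description="Classify a digit image.")
    parser.add_argument("image", help="file name of an image inside the images folder")
    args = parser.parse_args(argv)
    # basename() drops any directory parts so the lookup stays inside images/.
    return os.path.join(IMAGES_DIR, os.path.basename(args.image))


# Example with an explicit argument list; a real script would call
# build_image_path() with no arguments to read from the command line.
print(build_image_path(["seven.png"]))
```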

# End