# Handwritten digit detection with a DNN

In this lesson, we will do more machine learning. We will configure, train, and use a *Deep Neural Network* (DNN) to recognize handwritten digits. This is a step up from the face detection in the last lesson, because this code is able to recognize *which* digit an image of a handwritten character represents. The previous lesson only found faces; it did not recognize who the faces belonged to.

The code below will use TensorFlow and Keras, Python modules designed for machine learning applications. We will still use `skimage` to display images, so our first cell is the familiar configuration code.

In [None]:
# one-time imports and configuration
import skimage.io

import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 150

from matplotlib import pyplot as plt

For this lesson, we will use the *Modified National Institute of Standards and Technology* (MNIST) database. According to this <a href="https://en.wikipedia.org/wiki/MNIST_database">Wikipedia article</a>, the MNIST database is "... a large database of handwritten digits that is commonly used for training various image processing systems." The same article has a sample image showing some of the scanned digits:

<img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" alt="Sample MNIST digits">

Our first step is to access this database. Luckily, the `tensorflow` module has convenience methods to allow us to access the data, as shown in the next cell.



In [None]:
import tensorflow as tf

# reference to the mnist dataset object
mnist = tf.keras.datasets.mnist

# load the data, save the data in training and testing X and y
(X_train, y_train), (X_test, y_test) = mnist.load_data()


The `mnist.load_data()` call downloads the dataset to our environment and returns the data in four parts, which we have labeled `X_train`, `y_train`, `X_test`, and `y_test`. Let us explain what those four parts are.

In machine-learning-speak, an uppercase `X` represents the things our code is trying to learn how to recognize; in this case, `X` represents 28x28 pixel grayscale images, each of which is an image of a single handwritten digit.

A lowercase `y` represents the *labels* associated with each of the items in `X`. In MNIST, `y` contains the actual digit represented by each of the images in `X`. 

We will deal with `test` versus `train` momentarily. Right now, let us look at a single image from `X` and its associated label from `y`. The next cell selects a single image from `X_train` (the one at index 120 in the `X_train` list) and displays it, using familiar `skimage` tools. 



In [None]:
img = X_train[120]
skimage.io.imshow(img)
plt.show()

You can see that the image is a (rather sloppy) handwritten '2'. You will also note that the image is inverted, with white "ink" on a black background. This is done to make training of the DNN easier; we will not delve into the reasons why here. But, if we want to see the digit image in a more familiar way, it is easy enough to do so!

The next cell shows how to invert the pixels in the image, using the `skimage.util` `invert()` function. Displaying the inverted image should result in a more familiar image of a handwritten '2'.

In [None]:
import skimage.util

# invert and display the image so it looks more natural to 
# human eyes
img = skimage.util.invert(img)
skimage.io.imshow(img)
plt.show()
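
For intuition, inverting an 8-bit grayscale image amounts to subtracting each pixel value from 255. The sketch below checks this on a tiny made-up array (not MNIST data) using plain NumPy; the name `tiny` and its values are purely illustrative.

```python
import numpy as np

# A made-up 2x3 "image" of 8-bit pixels (0 = black, 255 = white)
tiny = np.array([[0, 64, 128],
                 [192, 255, 10]], dtype=np.uint8)

# For uint8 data, inversion maps each pixel p to 255 - p
inverted = 255 - tiny

print(inverted)
```

Dark pixels become light and light pixels become dark, which is why the white-on-black digit turns into the familiar black-on-white one.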

The DNN we will construct is supposed to classify these kinds of images -- so, for the image above, it should recognize it as the number two. The `y` lists contain the actual numbers associated with each of the images. In the next cell we verify this by printing the value and type of the `y` value associated with the handwritten two we displayed above.

In [None]:
print('Type of y_train[120]:', type(y_train[120]))
print('Value of y_train[120]:', y_train[120])

As we can see, the label value for the image above is an integer (an unsigned, 8-bit integer to be precise) with the value 2. 

Now, let us turn our attention to the significance of the `_test` and `_train` suffixes in the data returned by `mnist.load_data()`.

A DNN is "trained" by showing it thousands of images and their associated labels, and tweaking the values of hundreds of thousands of numbers in a complicated structure that mimics how neurons in human brains work. Eventually, the DNN "learns" the training data, so that it is able to recognize the data in the training set. 

But, we run the risk of creating a DNN that is too well trained on the training data -- that is, it might be able to recognize the elements of the training set, but not do well at all on unfamiliar data. We want our DNN to work on images it has never seen before. 

That is where the test set comes in. We reserve some of our images and labels at the outset, and do not use these for training. Instead, we use the (unfamiliar) data in the test set to evaluate the DNN after it has been trained.

So, `X_train` and `y_train` are the training sets used to train the DNN, while `X_test` and `y_test` are used to evaluate how good our model is after it has been trained. 

The next cell looks at the sizes of each of these sets.

In [None]:
print('Size of X_train and y_train:', len(X_train), len(y_train))
print('Size of X_test and y_test:', len(X_test), len(y_test))

We can see that we have 60,000 training images and labels, and 10,000 testing images and labels. This is one of the things that makes training DNNs difficult -- the need for large quantities of labeled data. In other words, human beings had to create the labels for these 70,000 sample images, so that the labeled images can be used to train a DNN. 
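
MNIST arrives already divided into training and testing sets, but the idea of holding data back is easy to try by hand. The following is a minimal sketch on made-up data, assuming an 80/20 split; the arrays `X` and `y` here are random stand-ins, not the MNIST arrays.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Made-up stand-ins for images and labels: 100 tiny 4x4 "images"
X = rng.integers(0, 256, size=(100, 4, 4))
y = rng.integers(0, 10, size=100)

# Shuffle the indices, then reserve the last 20% for testing
order = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = order[:split], order[split:]

X_tr, y_tr = X[train_idx], y[train_idx]
X_te, y_te = X[test_idx], y[test_idx]

print('Training examples:', len(X_tr))
print('Testing examples:', len(X_te))
```

Because the two index sets never overlap, nothing in the held-out portion is seen during training.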

The next step is to construct and train the DNN. First, we change the representation of our images from integer numbers in $\left[0, 255\right]$ to floating point numbers in $\left[0, 1\right]$. This is another numerical trick that helps the DNN work better. 

In [None]:
# Scale the feature data to be between 0 and 1. 
X_train, X_test = X_train / 255.0, X_test / 255.0
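
To see what this scaling does to individual pixel values, here is a small sketch on a made-up array; `pixels` is illustrative and not taken from MNIST.

```python
import numpy as np

# Made-up 8-bit pixel values spanning the full range
pixels = np.array([0, 51, 255], dtype=np.uint8)

# Dividing by a float converts the result to floating point
scaled = pixels / 255.0

print(scaled.dtype, scaled)
```

The smallest possible pixel value maps to 0.0 and the largest to 1.0, with everything else landing proportionally in between.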


## Creating and training the DNN

Now, we define the structure of our DNN. A DNN is structured in a series of layers. Inputs (small images of handwritten digits in our case) go into the input layer, and on the other end, the classification of the input (a number in $\left[0, 9\right]$) appears. The particulars of the layers are well beyond the scope of this lesson, but you can see from the code below that we have four layers in our DNN, and that the input layer has $28 \times 28 = 784$ input nodes, one for each pixel in an image we're trying to classify. There are two "hidden" layers that form sort of a funnel -- see how the number of nodes goes from 784 to 512 to 256? The final layer has 10 output nodes, one for each of the digits in $\left[0, 9\right]$. Conceptually, one of those output nodes will be "activated" after we feed an image into the DNN, showing us which digit the network thinks was contained in the image.

In [None]:

# Define the DNN model:
#    784-node input layer
#    512-node hidden layer with ReLU activation
#    256-node hidden layer with ReLU activation
#    10-node output layer using softmax activation
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
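
As a rough check on the "hundreds of thousands of numbers" mentioned earlier, we can count the adjustable values in a network with these layer widths. This sketch assumes ordinary fully connected layers, in which every input node connects to every output node and each output node has one extra bias value.

```python
# Layer widths as described above: input, two hidden layers, output
layer_sizes = [28 * 28, 512, 256, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # One weight per (input, output) pair plus one bias per output node
    params = n_in * n_out + n_out
    print(f'{n_in:>4} -> {n_out:<4} {params:>8,} parameters')
    total += params

print(f'Total trainable parameters: {total:,}')
```

Calling `model.summary()` on the model defined above should report the same total.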

Our next step is to convert the model into something that can be quickly trained, by compiling it. In this step we include parameters that specify how the DNN will be trained and how its performance during training will be evaluated. Again, the specifics of the choices in the next code cell are beyond our scope here. 

In [None]:
# Compile the model into an executable form
# optimizer = method of optimizing the weights
# loss = function used to evaluate error
# metrics = measures reported during training and evaluation
model.compile(optimizer='sgd', 
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])
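
Without going into the details, the idea behind the `sparse_categorical_crossentropy` loss can be sketched in a few lines: for each image, the loss is the negative logarithm of the probability the network assigned to the correct digit, and "sparse" means the labels are plain integers. The probabilities below are made up for illustration; they are not output from our model.

```python
import numpy as np

# Made-up softmax outputs for two images: each row sums to 1 and
# holds one probability per digit 0-9
probs = np.array([
    [0.01, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.01],
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10],
])
# "Sparse" labels are plain integers rather than one-hot vectors
labels = np.array([2, 7])

# Per-example loss: negative log of the probability given to the true digit
losses = -np.log(probs[np.arange(len(labels)), labels])

print('Per-example loss:', losses)
print('Mean loss:', losses.mean())
```

A confident, correct prediction (the first row) gives a small loss, while an undecided one (the second row) gives a much larger loss.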

Now, we get to train the model! This will take a while, probably several minutes. As the model trains, you will be able to watch its degree of accuracy -- how frequently it is able to correctly classify training images -- increase.

In [None]:
# Train the model; an epoch is a complete training pass
# through the entire dataset. 
model.fit(X_train, y_train, epochs=10)
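
To get a feel for how much work the training involves, we can estimate how many times the weights are adjusted. This sketch assumes the Keras default batch size of 32, since the `fit()` call above does not set one; with a different batch size the numbers change accordingly.

```python
import math

n_train = 60_000   # number of training images
batch_size = 32    # assumed: the Keras default when none is passed to fit()
epochs = 10        # matches the fit() call above

# Each batch of images produces one adjustment of the weights
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print('Weight updates per epoch:', steps_per_epoch)
print('Weight updates overall:', total_updates)
```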

When we ran the training, we consistently created DNNs with over 97% accuracy on the training data. But, how well does the DNN do on the test data? We can use the test images and labels to see, as shown in the next cell.

In [None]:
# code to evaluate the trained DNN, using the test data
print(model.metrics_names, model.evaluate(X_test, y_test))

Our model achieved almost 97% accuracy on the test data! 
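
The accuracy figure itself is simple to compute: it is the fraction of images for which the most probable class matches the label. Here is a minimal sketch on made-up predictions, using three classes instead of ten only to keep the arrays short.

```python
import numpy as np

# Made-up predicted probabilities for four images over three classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
true_labels = np.array([0, 1, 2, 1])

# The predicted class is the index of the largest probability in each row
predicted = probs.argmax(axis=1)

# Accuracy is the fraction of predictions that match the labels
accuracy = (predicted == true_labels).mean()

print('Predicted:', predicted)
print('Accuracy:', accuracy)
```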

## Using the trained DNN on a new image

Now, let's try to put the trained DNN to use. In the next cell, we access an image of a hand-drawn digit -- a number drawn on paper and then photographed -- and display it. 

In [None]:
# load and display original image
img = skimage.io.imread('https://i.imgur.com/cV7uAMx.jpg')
skimage.io.imshow(img)
plt.show()

We have to turn this into an image suitable for our DNN: we need to crop to select only the '8' from the image, convert it to grayscale, resize it so the image is 28x28 pixels in size, and invert it. The next cell does this, based on some coordinates we derived from an image editing program.

In [None]:
# crop to include just the 8
img = img[1117:1761, 501:1157, :]

# convert to grayscale
import skimage.color
img = skimage.color.rgb2gray(img)

# resize image to 28 x 28 pixels
from skimage.transform import resize
img = resize(img, (28, 28), anti_aliasing = True)

# invert the image
img = skimage.util.invert(img)

Let's see how our image looks now!

In [None]:
# display our new image
skimage.io.imshow(img)
plt.show()

Now we can use the DNN, and see if it properly categorizes the image! We do that by calling the `model.predict()` method. The parameter we pass in is our image, which we have to reshape just a bit in order to match the data we trained the DNN on.

In particular, if we execute the command `X_train.shape`, we see the output `(60000, 28, 28)`, which means the training images were actually stored in a 3D array, where each "sheet" of the array is a complete image. We need to make our image into a similar shape, by calling `img.reshape(1, 28, 28)`. That makes the image into a 3D array, where the first -- and only -- sheet has our image. 

The `model.predict()` call will return an array of predictions, in this case, showing the probability that the image is associated with a particular digit. In the case of this exercise, the indices of the probabilities correspond to the digits. The next cell shows how to fetch the predicted digit out of that array.

In [None]:
predictions = model.predict(img.reshape(1, 28, 28))
print('Predictions array:', predictions)
print('DNN thinks your image is:', predictions.argmax())
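
The reshaping and the `argmax()` lookup can be tried without a trained model. The sketch below uses a blank stand-in image and a made-up predictions array with the same layout described above (one row per image, one probability per digit); the names `single` and `fake_predictions` are illustrative.

```python
import numpy as np

# A made-up 28x28 image standing in for the prepared digit
single = np.zeros((28, 28))

# Add a leading axis so the shape matches (number of images, 28, 28)
batch = single.reshape(1, 28, 28)
print(single.shape, '->', batch.shape)

# A made-up predictions array for one image: one row, ten probabilities
fake_predictions = np.array(
    [[0.0, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.05]])

# argmax() over the whole array gives a flat index, which equals the
# digit here only because there is a single row
print('Most likely digit:', fake_predictions.argmax())

# With several images, take the argmax along each row instead
print('Per-image digits:', fake_predictions.argmax(axis=1))
```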

---
> **Your turn: recognize one of your handwritten digits**
> 
> Now it's your turn! First, write a single digit on a piece of paper,
> photograph it with a digital camera or your smartphone, and then 
> upload it to the files area of your Google Colab instance. 
> 
> Once you have the image uploaded, write code to read the image in,
> crop it to have only the digit in the image, convert it to grayscale,
> resize it, and invert it, using the techniques shown above.
>
> Then, ask the `model` object to classify your handwritten digit and 
> see if it makes the correct decision. 
---

In [None]:
# TODO: read in, crop, grayscale, resize, and invert your own digit image


In [None]:
# TODO: use the model to classify your digit
