# Handwritten Digits Classification

In this project we'll classify grayscale images of handwritten digits (28 x 28 pixels) into their 10 categories (0 through 9). We will use the MNIST dataset, which contains 60,000 training images and 10,000 test images.

In [18]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings("ignore")

### Loading the MNIST dataset in Keras

In [19]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [20]:
train_images.shape

(60000, 28, 28)

So, here we have a 3D tensor of integers. More precisely, it's an array of 60,000 matrices of 28 x 28 integers. Each such matrix is a grayscale image, with coefficients between 0 and 255.

In [21]:
len(train_images)

60000

In [22]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [23]:
test_images.shape

(10000, 28, 28)

In [24]:
len(test_images)

10000

Now that we have loaded the data, the workflow will be as follows: 
1. First, we’ll feed the neural network the training data (train_images and train_labels).
2. The network will then learn to associate images and labels. 
3. Finally, we’ll ask the network to produce predictions for test_images, and we’ll verify whether these predictions match the labels from test_labels.

### The network architecture

In [25]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

The core building block of neural networks is the layer, a data-processing
module that you can think of as a filter for data. Some data goes in, and it
comes out in a more useful form.

Here, our network consists of a sequence of two Dense layers, which are
densely connected (also called fully connected) neural layers. The second
(and last) layer is a 10-way softmax layer, which means it will return an
array of 10 probability scores (summing to 1). Each score will be the
probability that the current digit image belongs to one of our 10 digit
classes.
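To make the two layers concrete, here is a minimal NumPy sketch of what a forward pass through such a network computes. It is only an illustration: the weights `W1`, `b1`, `W2`, `b2` are hypothetical random values rather than the parameters Keras would learn, and the input batch is random noise standing in for flattened images.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0), the activation used by the first Dense layer
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability, then normalise
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch = rng.random((4, 28 * 28))  # 4 fake flattened images with values in [0, 1)

# Hypothetical, randomly initialised parameters (not trained weights)
W1 = rng.normal(scale=0.01, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

hidden = relu(batch @ W1 + b1)     # first Dense layer: 784 -> 512
probs = softmax(hidden @ W2 + b2)  # second Dense layer: 512 -> 10 scores

print(probs.shape)        # one row of 10 scores per image
print(probs.sum(axis=1))  # each row sums to 1
```

Each row of `probs` plays the role of the 10 probability scores described above; the predicted digit would be the index of the largest score.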

To make the network ready for training, we need to pick three more things as part of the compilation step:
1. A loss function: how the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
2. An optimizer: the mechanism through which the network will update itself based on the data it sees and its loss function.
3. Metrics to monitor during training and testing: here, we’ll only care about accuracy (the fraction of the images that were correctly classified).
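As a rough illustration of the first item, the sketch below computes categorical cross-entropy by hand in NumPy. The helper `categorical_crossentropy` and the small clipping constant `eps` are assumptions made for this example, not the Keras implementation; the point is only that confident, correct predictions give a lower loss than uncertain ones.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss is -sum(true * log(pred)); average over the batch
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes, one-hot encoded targets
y_true = np.array([[0, 0, 1],
                   [1, 0, 0]], dtype="float32")

confident = np.array([[0.05, 0.05, 0.90],
                      [0.80, 0.10, 0.10]])
unsure = np.array([[0.40, 0.30, 0.30],
                   [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident, loss_unsure)
```

The optimizer's job is then to adjust the weights so that this number goes down on the training data.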

### Compilation Step

In [26]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Before training, we’ll preprocess the data by reshaping it into the shape the
network expects and scaling it so that all values are in the [0, 1] interval.
Previously, our training images, for instance, were stored in an array of
shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We
transform it into a float32 array of shape (60000, 28 * 28) with values
between 0 and 1.

### Preparing the image data

In [27]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
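The same transformation can be checked on a small stand-in array, without downloading MNIST. This is a sketch under stated assumptions: `fake_images` is random data with the same dtype and per-image shape as the real images, and the batch size of 3 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MNIST: 3 random uint8 "images" of 28 x 28 pixels
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image to a vector of 784 values, then scale to [0, 1]
flat = fake_images.reshape((len(fake_images), 28 * 28))
scaled = flat.astype("float32") / 255

print(scaled.shape, scaled.dtype)
print(scaled.min(), scaled.max())
```

Using `len(fake_images)` instead of a hard-coded count keeps the reshape valid for any number of images.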

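We also need to categorically encode the labels, turning each integer label into a one-hot vector of length 10, which is the format the categorical_crossentropy loss expects.

### Preparing the labels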

In [28]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
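To show what this encoding looks like, here is a minimal NumPy sketch of one-hot encoding. The helper `one_hot` is a hypothetical stand-in written for illustration, not the Keras function itself.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # One row per label, with a single 1.0 at the label's index
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

sample = one_hot([5, 0, 4])  # the first three training labels shown earlier
print(sample)
```

Taking `np.argmax(sample, axis=1)` recovers the original integer labels.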

We’re now ready to train the network, which in Keras is done via a call to
the network’s fit method: we fit the model to its training data.

### Train the network

In [31]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x13fecfb10>

We quickly reach an accuracy of 0.999 (99.9%) on the training data. Now
let’s check how the model performs on the test set.

In [32]:
test_loss, test_acc = network.evaluate(test_images, test_labels)



In [33]:
print('test_acc:', test_acc)

test_acc: 0.9822999835014343


The test set accuracy turns out to be 98.2%, which is quite a bit lower than
the training set accuracy. This gap between training accuracy and test
accuracy is an example of overfitting: machine-learning models tend to
perform worse on new data than on their training data.
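For reference, the accuracy figures above can be reproduced from raw predictions by comparing the most probable class with the true class. The sketch below uses a hypothetical helper `accuracy` and made-up probabilities for four samples and three classes; it is not output from the trained network.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    # Predicted class is the index of the highest probability in each row
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(predicted == actual))

# Made-up example: 4 samples, 3 classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],   # correct
                   [0.1, 0.8, 0.1],   # correct
                   [0.3, 0.3, 0.4],   # correct
                   [0.6, 0.3, 0.1]])  # wrong: predicts class 0

acc = accuracy(y_true, y_prob)
print(acc)  # 0.75
```

With the notebook's own variables, the same helper could be applied to `network.predict(test_images)` and `test_labels`.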

---