# THE MNIST DATABASE of handwritten digits

The MNIST database of handwritten digits, available from [here](http://yann.lecun.com/exdb/mnist/), has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

Four files are available on this site:
```
train-images-idx3-ubyte.gz:  training set images (9912422 bytes) 
train-labels-idx1-ubyte.gz:  training set labels (28881 bytes) 
t10k-images-idx3-ubyte.gz:   test set images (1648877 bytes) 
t10k-labels-idx1-ubyte.gz:   test set labels (4542 bytes)
```
These files are not in any standard image format, so you have to write your own (very simple) program to read them. However, [Keras](https://www.tensorflow.org/api_docs/python/tf/keras) has a helper function, [``keras.datasets.mnist.load_data``](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/mnist/load_data), that lets you load this dataset easily.
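
For readers who do want to read the raw files, here is a minimal sketch of such a program. It assumes the IDX layout used by these files: two zero bytes, a data-type byte (`0x08` for unsigned bytes), a byte giving the number of dimensions, then one big-endian 32-bit size per dimension, followed by the raw data. The names `parse_idx` and `read_idx_gz` are illustrative, not part of any library, and the demo uses a synthetic byte string rather than a downloaded file.

```python
import gzip
import struct


def parse_idx(raw: bytes):
    """Parse an unsigned-byte IDX byte string into (dims, payload)."""
    # Header: two zero bytes, data-type code, number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # One big-endian 32-bit integer per dimension.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    payload = raw[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload


def read_idx_gz(path):
    """Read a gzip-compressed IDX file such as train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic stand-in for an image file: two 2x2 "images".
sample = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
dims, payload = parse_idx(sample)
print(dims, list(payload))
```

For the real image files, `dims` would be expected to come out as `(60000, 28, 28)` or `(10000, 28, 28)`, matching the shapes that `load_data` returns.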


In [1]:
import tensorflow as tf
from tensorflow import keras

Let's load and prepare the MNIST dataset:


In [2]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

``x_train`` and ``x_test`` are ``uint8`` arrays of grayscale image data with shapes ``(num_samples, 28, 28)``. ``y_train`` and ``y_test`` are ``uint8`` arrays of digit labels (integers in range 0-9) with shapes ``(num_samples,)``.

Let's normalize the image samples from integers (in range 0-255) to floating-point numbers (in range 0.0-1.0):


In [3]:
x_train, x_test = x_train / 255.0, x_test / 255.0
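
To see what this division does to the data type and value range, here is a small sketch on a hand-made `uint8` array. The array `pixels` is made up for illustration; NumPy is assumed to be available, as it is wherever TensorFlow is installed.

```python
import numpy as np

# Three made-up pixel intensities covering the full uint8 range.
pixels = np.array([[0, 51, 255]], dtype=np.uint8)

# Dividing by a Python float promotes the array to float64.
scaled = pixels / 255.0

print(pixels.dtype, "->", scaled.dtype)
print(scaled.min(), scaled.max())
```

The same promotion applies to ``x_train`` and ``x_test``, which is why they are later cast to ``float32`` before being passed to the convolutional model.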

Now, let's build the ``Sequential`` model by stacking layers:


In [4]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
])
print(model)

<tensorflow.python.keras.engine.sequential.Sequential object at 0x145723490>


For each example, the model returns a vector of [logit](https://developers.google.com/machine-learning/glossary#logits) scores, one for each class.
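
The layer sizes above also fix the number of trainable parameters. As a rough worked example, the sketch below counts them by hand, assuming each ``Dense`` layer holds one weight per input-output pair plus one bias per output, and that ``Flatten`` and ``Dropout`` add no parameters. The helper `dense_params` is hypothetical, written only for this calculation.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


flat = 28 * 28                      # Flatten turns each 28x28 image into 784 values
hidden = dense_params(flat, 128)    # Dense(128)
output = dense_params(128, 10)      # Dense(10), one logit per digit class
total = hidden + output

print(flat, hidden, output, total)
```

Calling ``model.summary()`` should report matching per-layer counts.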

Now, we choose an optimizer and loss function for training:


In [5]:
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

The [``losses.SparseCategoricalCrossentropy``](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy) loss takes a vector of logits and the index of the true class, and returns a scalar loss for each example. This loss is equal to the negative log probability of the true class: it approaches zero when the model is sure of the correct class.
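
The definition above can be sketched in a few lines of plain Python. This is an illustration of the formula, not the Keras implementation; the function name `sparse_crossentropy_from_logits` is made up for this example.

```python
import math


def sparse_crossentropy_from_logits(logits, true_index):
    """Negative log probability of the true class, computed from raw logits."""
    # log(sum(exp(z))) with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[true_index]) = log_sum_exp - logits[true_index]
    return log_sum_exp - logits[true_index]


# All ten classes equally likely: the loss equals ln(10), about 2.30.
uniform_loss = sparse_crossentropy_from_logits([0.0] * 10, 3)

# One logit far above the rest and it is the true class: loss close to zero.
confident_loss = sparse_crossentropy_from_logits([20.0] + [0.0] * 9, 0)

print(uniform_loss, confident_loss)
```

An untrained model that spreads probability roughly evenly over the ten digits should therefore start with a loss near 2.3.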

Now, let's train our model. The ``fit`` method adjusts the model parameters to minimize the loss:


In [6]:
model.fit(x_train, y_train, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1458686d0>
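
To give a feel for what "adjusting the parameters to minimize the loss" means, here is a toy sketch of a single update step. It deliberately swaps in plain gradient descent on a one-layer softmax classifier instead of the Adam optimizer and the network used above, and the input `x`, the label, and the learning rate of 0.5 are made up for illustration.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def loss_and_grad(W, x, label):
    """Cross-entropy loss of softmax(x @ W) and its gradient with respect to W."""
    p = softmax(x @ W)
    loss = -np.log(p[label])
    one_hot = np.zeros_like(p)
    one_hot[label] = 1.0
    # For softmax plus cross-entropy, d(loss)/d(logits) = p - one_hot.
    grad = np.outer(x, p - one_hot)
    return loss, grad


x = np.array([1.0, 2.0])   # one made-up input with two features
label = 0                  # its true class, out of three
W = np.zeros((2, 3))       # weights start at zero, so all classes are equally likely

loss_before, grad = loss_and_grad(W, x, label)
W -= 0.5 * grad            # one gradient-descent step
loss_after, _ = loss_and_grad(W, x, label)

print(loss_before, loss_after)
```

The loss drops after the step; ``fit`` repeats updates of this kind over mini-batches of the training set for the requested number of epochs.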

Now that our model is trained, it's time to evaluate its performance on an unseen dataset. The ``evaluate`` method checks the model's performance on the test set.


In [7]:
model.evaluate(x_test, y_test, verbose=2)

313/313 - 0s - loss: 0.0661 - accuracy: 0.9811


[0.06610903143882751, 0.9811000227928162]
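
The numbers in this output can be unpacked with a little arithmetic. The sketch below assumes the Keras default batch size of 32, since no ``batch_size`` was passed to ``evaluate``, and rounds the reported accuracy to four decimals.

```python
import math

num_test = 10_000
batch_size = 32                      # Keras default when batch_size is not given

# The last batch is only half full, so the count rounds up.
steps = math.ceil(num_test / batch_size)

accuracy = 0.9811                    # reported accuracy, rounded
correct = round(accuracy * num_test)
errors = num_test - correct

print(steps, correct, errors)
```

So the `313/313` in the progress line is the number of batches, and the reported accuracy corresponds to 189 misclassified test images.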

You can see that the trained image classifier reaches ~98% accuracy on the test set.

If you want your model to return probabilities instead of logits, you can wrap the trained model and attach a softmax layer to it:


In [8]:
probability_model = keras.Sequential([
  model,
  keras.layers.Softmax()
])

In [9]:
probability_model(x_test[:1]).numpy().sum()

1.0
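
The sum is 1.0 because softmax exponentiates each logit and divides by the total. A minimal plain-Python sketch of that computation follows; the logits are made up, and this illustrates the formula rather than the ``keras.layers.Softmax`` implementation.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]                     # made-up scores for three classes
probs = softmax(logits)

print(probs, sum(probs))
```

Softmax preserves the ordering of the logits, so the predicted class (the position of the largest value) is the same with or without the extra layer.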

## Convolutional Neural Networks

Now, let's change our model to take advantage of convolutional neural networks. First, however, we need to add a channels dimension to the data before feeding it to [``keras.layers.Conv2D``](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D).


In [10]:
x_train = x_train[..., tf.newaxis].astype("float32")
x_test = x_test[..., tf.newaxis].astype("float32")
print(x_train.shape, x_test.shape)

(60000, 28, 28, 1) (10000, 28, 28, 1)
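
The same indexing trick can be tried on a small NumPy array without loading the dataset. In this sketch, `np.newaxis` plays the role of ``tf.newaxis``, and the batch of five blank images is made up for illustration.

```python
import numpy as np

# Five blank 28x28 "images", stored as float64 like the normalized data.
batch = np.zeros((5, 28, 28), dtype=np.float64)

# The ellipsis keeps all existing axes; newaxis appends a channel axis of size 1.
with_channels = batch[..., np.newaxis].astype("float32")

# np.expand_dims is an equivalent, more explicit spelling.
also_with_channels = np.expand_dims(batch, axis=-1)

print(with_channels.shape, with_channels.dtype, also_with_channels.shape)
```

The pixel values are unchanged; only the shape gains a trailing dimension, which tells ``Conv2D`` that each image has a single grayscale channel.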


Now, let's build the convolutional ``Sequential`` model by stacking layers:


In [11]:
conv_model = keras.models.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
print(conv_model)

<tensorflow.python.keras.engine.sequential.Sequential object at 0x145fa0970>
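
As a rough check on the size of this model, the sketch below works out the layer shapes and parameter counts by hand. It assumes the ``Conv2D`` defaults (stride 1 and no padding) and the 28x28x1 inputs prepared above; the helper functions are hypothetical and written only for this calculation.

```python
def conv2d_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def conv2d_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


side = conv2d_output_size(28, 3)        # 28x28 input, 3x3 kernel
flat = side * side * 32                 # Flatten over 32 feature maps

conv_total = (
    conv2d_params(3, 1, 32)             # Conv2D(32, 3) on one input channel
    + dense_params(flat, 128)           # Dense(128)
    + dense_params(128, 10)             # Dense(10)
)

print(side, flat, conv_total)
```

Almost all of the roughly 2.77 million parameters sit in the first ``Dense`` layer, because the unpooled convolution output is flattened into 21,632 values.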


Now, we choose an optimizer and loss function for training for the new model:


In [12]:
conv_model.compile(optimizer='adam', 
                   loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
                   metrics=['accuracy'])

Now, let's train our new model.


In [13]:
conv_model.fit(x_train, y_train, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x145fc9eb0>

Now that our new model is trained, let's evaluate its performance on the test dataset.


In [14]:
conv_model.evaluate(x_test, y_test, verbose=2)

313/313 - 1s - loss: 0.0842 - accuracy: 0.9825


[0.08415018022060394, 0.9825000166893005]

You can see that the convolutional classifier also reaches ~98% accuracy on the test set (0.9825, compared with 0.9811 for the first model).
