<a href="https://colab.research.google.com/github/lovellbrian/ELEC4630/blob/master/MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Chollet p27

You don’t need to try to reproduce this example on your machine just now. If you wish
to, you’ll first need to set up Keras, which is covered in section 3.3.

The MNIST dataset comes preloaded in Keras, in the form of a set of four NumPy
arrays.

In [28]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images and train_labels form the training set, the data that the model will
learn from. The model will then be tested on the test set, test_images and test_labels.

The images are encoded as NumPy arrays, and the labels are an array of digits, ranging
from 0 to 9. The images and labels have a one-to-one correspondence.

Let’s look at the training data:

In [29]:
train_images.shape

(60000, 28, 28)

And here’s the test data:

In [30]:
test_images.shape

(10000, 28, 28)

In [31]:
len(test_labels)

10000

In [32]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)
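The one-to-one correspondence between images and labels can be checked programmatically. The following is a minimal sketch, not part of the original example: it uses small random stand-in arrays (so it runs without downloading MNIST), and `check_dataset` is a hypothetical helper name introduced here for illustration.

```python
import numpy as np

# Hypothetical stand-in for the arrays returned by mnist.load_data():
# 100 random "images" instead of 60,000 real ones, with the same dtype and layout.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=100, dtype=np.uint8)

def check_dataset(images, labels):
    """Return True if images and labels look like a consistent MNIST-style split."""
    same_length = images.shape[0] == labels.shape[0]      # one label per image
    image_shape_ok = images.shape[1:] == (28, 28)          # 28x28 grayscale images
    labels_in_range = labels.min() >= 0 and labels.max() <= 9  # digits 0 to 9
    return bool(same_length and image_shape_ok and labels_in_range)

print(check_dataset(images, labels))
```

The same function could be called on `train_images, train_labels` or `test_images, test_labels` once the real data is loaded.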

The workflow will be as follows: First, we’ll feed the neural network the training data,
train_images and train_labels. The network will then learn to associate images and
labels. Finally, we’ll ask the network to produce predictions for test_images, and we’ll
verify whether these predictions match the labels from test_labels.
Let’s build the network—again, remember that you aren’t expected to understand
everything about this example yet.

In [33]:
from keras import models
from keras import layers
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
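To make the two `Dense` layers less opaque, here is a rough NumPy sketch of the computation they perform on a batch of flattened images. It is an illustration under stated assumptions, not how Keras is implemented: the weights are random placeholders (Keras uses its own initializers), and the names `W1`, `b1`, `W2`, `b2`, and `forward` are introduced here for the example.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise maximum for numerical stability before exponentiating.
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Placeholder weights with the same shapes as the two Dense layers above.
W1 = rng.normal(scale=0.01, size=(784, 512)).astype("float32")
b1 = np.zeros(512, dtype="float32")
W2 = rng.normal(scale=0.01, size=(512, 10)).astype("float32")
b2 = np.zeros(10, dtype="float32")

def forward(batch):
    """Map a (n, 784) batch to (n, 10) class probabilities."""
    hidden = relu(batch @ W1 + b1)       # Dense(512, activation='relu')
    return softmax(hidden @ W2 + b2)     # Dense(10, activation='softmax')

batch = rng.random((4, 784), dtype=np.float32)   # four fake flattened images
probs = forward(batch)
n_params = W1.size + b1.size + W2.size + b2.size

print(probs.shape)   # one row of 10 probabilities per image
print(n_params)      # 784*512 + 512 + 512*10 + 10
```

Each output row sums to 1, which is what lets the ten values be read as the probability that the image belongs to each digit class.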

To make the network ready for training, we need to pick three more things, as part
of the compilation step:
* A loss function: how the network measures its performance on the training
  data, and thus how it steers itself in the right direction.
* An optimizer: the mechanism through which the network updates itself
  based on the data it sees and its loss function.
* Metrics to monitor during training and testing: here, we only care about
  accuracy (the fraction of the images that were correctly classified).

The exact purpose of the loss function and the optimizer will be made clear throughout
the next two chapters.
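As a concrete illustration of the loss and the metric listed above, the sketch below computes categorical cross-entropy and accuracy by hand on three made-up predictions. It is a simplified NumPy version for intuition only; the clipping constant `eps` and the toy arrays are assumptions chosen for this example, not values taken from the text.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Toy example: 3 samples, 3 classes, one-hot targets.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1],    # correct, fairly confident
                   [0.1, 0.8, 0.1],    # correct, confident
                   [0.6, 0.3, 0.1]],   # wrong: true class gets only 0.1
                  dtype="float32")

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f"loss={loss:.4f}, accuracy={acc:.4f}")
```

The third sample dominates the loss because the true class received a low probability, while accuracy only records that the prediction was wrong. This difference is why the loss, rather than accuracy, is the quantity the optimizer works to reduce.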

In [34]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Before training, we’ll preprocess the data by reshaping it into the shape the network
expects and scaling it so that all values are in the [0, 1] interval. Previously, our training
images, for instance, were stored in an array of shape (60000, 28, 28) of type
uint8 with values in the [0, 255] interval. We transform it into a float32 array of
shape (60000, 28 * 28) with values between 0 and 1.

In [35]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
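The same preprocessing can be written without hard-coding the number of samples. The sketch below is one possible variant, shown on random stand-in data so it runs on its own; `preprocess` is a hypothetical helper name, and passing `-1` to `reshape` asks NumPy to infer the flattened size.

```python
import numpy as np

def preprocess(images):
    """Flatten (n, 28, 28) uint8 images to (n, 784) float32 values in [0, 1]."""
    flat = images.reshape((images.shape[0], -1))   # -1 lets NumPy infer 28 * 28
    return flat.astype("float32") / 255

# Random stand-in for a small batch of MNIST-style images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)

x = preprocess(fake_images)
print(x.shape, x.dtype, x.min(), x.max())
```

Applied to the real data, `preprocess(train_images)` and `preprocess(test_images)` should produce the same arrays as the four lines above.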

We also need to categorically encode the labels, a step that’s explained in chapter 3.

In [36]:
from tensorflow.keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
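Categorical (one-hot) encoding turns each integer label into a vector with a 1 at the label's index and 0 elsewhere. The following NumPy sketch shows the idea on three labels; it is an illustrative reimplementation, not the Keras source, and `one_hot` is a name introduced here.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer labels as rows with a single 1.0 at the label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([7, 2, 1], dtype=np.uint8)   # first three test labels shown earlier
encoded = one_hot(labels)
print(encoded)
```

The first row has its single 1 in column 7, matching the first test label. Taking `np.argmax(encoded, axis=1)` recovers the original integer labels.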

We’re now ready to train the network, which in Keras is done via a call to the network’s
fit method—we fit the model to its training data:

In [37]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f2a4ebf3ad0>
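The `epochs=5` and `batch_size=128` arguments determine how many weight updates the `fit` call performs. The arithmetic below is a small sketch that assumes the final, smaller batch of each epoch is kept rather than dropped, which is my understanding of the default behaviour when NumPy arrays are passed to `fit`.

```python
import math

num_samples = 60000   # training images
batch_size = 128
epochs = 5

# Number of mini-batches needed to see every sample once.
steps_per_epoch = math.ceil(num_samples / batch_size)

# The last batch holds whatever is left over after the full batches.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

# One weight update per mini-batch, repeated for every epoch.
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```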

Two quantities are displayed during training: the loss of the network over the training
data, and the accuracy of the network over the training data.
We quickly reach an accuracy of 0.989 (98.9%) on the training data. Now let’s
check that the model performs well on the test set, too:

In [39]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9783999919891357


The test-set accuracy turns out to be 97.8%, about 1.1 percentage points lower than the
training-set accuracy of 98.9%. This gap between training accuracy and test accuracy is
an example of overfitting: the tendency of machine-learning models to perform worse on
new data than on their training data. Overfitting is a central topic in chapter 3.
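To put the gap in concrete terms, the short calculation below converts the two accuracies quoted in the text into a difference in percentage points and a count of misclassified test images. The variable names are illustrative, and the test accuracy is rounded to four decimal places.

```python
train_acc = 0.989     # training accuracy quoted in the text
test_acc = 0.9784     # test accuracy printed by evaluate(), rounded
num_test = 10000      # number of test images

gap_points = (train_acc - test_acc) * 100            # difference in percentage points
misclassified = round((1 - test_acc) * num_test)     # test images the model got wrong

print(f"gap: {gap_points:.2f} percentage points")
print(f"misclassified test images: {misclassified}")
```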
This concludes our first example—you just saw how you can build and train a neural
network to classify handwritten digits in less than 20 lines of Python code. In the
next chapter, I’ll go into detail about every moving piece we just previewed and clarify
what’s going on behind the scenes. You’ll learn about tensors, the data-storing objects
going into the network; tensor operations, which layers are made of; and gradient
descent, which allows your network to learn from its training examples.