# Deep Learning with Python (Francois Chollet)

## 2.1. A first look at a neural network

In [1]:
# 2.1 Loading the MNIST dataset in Keras
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Using TensorFlow backend.


*train_images* and *train_labels* form the *training set*, the data that the model will learn from.
The model will then be tested on the test set, *test_images* and *test_labels*.
The images are encoded as NumPy arrays, and the labels are an array of digits ranging from 0 to 9. The images and labels have a one-to-one correspondence.

In [2]:
train_images.shape

(60000, 28, 28)

In [3]:
len(train_labels)

60000

In [4]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

And here's the test data:

In [5]:
test_images.shape

(10000, 28, 28)

In [6]:
len(test_labels)

10000

In [7]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

The workflow will be as follows: <br>
First, we'll feed the neural network the training data, *train_images* and *train_labels*.
The <span class="mark">network will then learn to associate images and labels.</span> Finally, we'll ask the <span class="mark">network to produce predictions for test_images</span>, and we'll verify whether these predictions match the labels from *test_labels*.

Let's build the network. Again, remember that you aren't expected to understand everything about this example yet.

In [8]:
# 2.2 The network architecture
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
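As a rough check on the size of this model: a `Dense` layer with `n` inputs and `m` units holds an `n * m` weight matrix plus `m` biases. The helper `dense_param_count` below is hypothetical (not part of Keras); for this architecture, `network.summary()` should report the same totals.

```python
# Hypothetical helper: trainable parameters of one Dense layer.
def dense_param_count(n_inputs: int, n_units: int) -> int:
    # One weight per (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# Layer sizes taken from the architecture above: 784 inputs -> 512 -> 10.
layer_sizes = [28 * 28, 512, 10]

counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # → [401920, 5130] 407050
```

Almost all of the parameters live in the first layer, because it connects every one of the 784 pixels to every one of the 512 units.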

The core building block of neural networks is the *<span class="mark">layer</span>*, a data-processing module that you can think of as a filter for data. <span class="mark">Some data goes in, and it comes out in a more useful form.</span> Specifically, layers extract *representations* out of the data fed into them, hopefully representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that will implement a form of progressive *data distillation*. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.
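To make the "filter" picture concrete, here is a minimal NumPy sketch of what a densely connected layer computes on a batch, `activation(dot(x, W) + b)`, and of two such layers chained together. The weights are random rather than trained, so the outputs mean nothing; the point is only how each layer transforms the representation (784 values per image, then 512, then 10). The function names `dense` and `relu` are illustrative, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Element-wise max(z, 0).
    return np.maximum(z, 0.0)

def dense(x, w, b, activation=None):
    # Densely connected layer: every input feeds every output unit.
    z = x @ w + b
    return activation(z) if activation is not None else z

# Random (untrained) parameters with the same shapes as the network above.
w1, b1 = rng.normal(scale=0.05, size=(784, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.05, size=(512, 10)), np.zeros(10)

batch = rng.random((4, 784))          # 4 fake flattened images in [0, 1)
hidden = dense(batch, w1, b1, relu)   # first representation
scores = dense(hidden, w2, b2)        # second representation (raw scores)
print(batch.shape, hidden.shape, scores.shape)
# → (4, 784) (4, 512) (4, 10)
```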

Here, our network consists of a sequence of two *Dense* layers, which are densely connected (also called *fully connected*) neural layers. The second (and last) layer is a 10-way *softmax* layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
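The softmax is what turns 10 raw scores into 10 probabilities that sum to 1. A minimal NumPy version follows; subtracting the row maximum first is a common numerical-stability trick that does not change the result. This `softmax` function is a sketch, not Keras's own implementation, and the scores are made up.

```python
import numpy as np

def softmax(z):
    # Subtract the per-row max for numerical stability; the output is unchanged.
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# One made-up row of 10 scores, one per digit class 0-9.
scores = np.array([[1.0, 2.0, 0.5, 0.0, -1.0, 3.0, 0.2, 0.1, -0.5, 1.5]])
probs = softmax(scores)
print(probs.sum(axis=-1))     # → [1.]
print(probs.argmax(axis=-1))  # → [5]  (the class with the highest score)
```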

To make the network ready for training, we need to pick three more things, as part of the *compilation* step:

- **A loss function** - How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

- **An optimizer** - The mechanism through which the network will update itself based on the data it sees and its loss function.

- **Metrics to monitor during training and testing** - Here, we'll only care about accuracy (the fraction of the images that were correctly classified).
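For intuition about the loss and the metric listed above: with one-hot labels, categorical crossentropy is the average of `-log(p)`, where `p` is the probability the network assigned to the correct class, and accuracy is the fraction of samples whose most probable class matches the label. The sketch below uses NumPy and made-up predictions; the clipping constant `eps` is an assumption added to avoid `log(0)`, and Keras's implementation differs in details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot labels, y_pred: predicted probabilities, both (n, classes).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def accuracy(y_true, y_pred):
    # Fraction of rows where the most probable class is the labeled one.
    return float(np.mean(y_pred.argmax(axis=-1) == y_true.argmax(axis=-1)))

# Three samples, three classes (made-up numbers for illustration).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2]])  # last sample is misclassified

acc = accuracy(y_true, y_pred)                  # 2 of 3 correct
loss = categorical_crossentropy(y_true, y_pred)
print(acc, loss)
```

The misclassified sample contributes most of the loss (`-log(0.2)`), which is why minimizing the loss also tends to push accuracy up.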

The exact purpose of the loss function and the optimizer will be made clear throughout the next two chapters.

In [9]:
# 2.3 The compilation step
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
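`'rmsprop'` selects the RMSprop optimizer. Roughly, on each step it keeps a running average of squared gradients and divides the gradient by the square root of that average, so each parameter gets its own effective step size. The sketch below applies that rule to a toy one-parameter problem. `rho=0.9` and `eps=1e-7` are assumed to mirror common Keras defaults; `learning_rate=0.01` is larger than Keras's default and was picked only so the toy converges quickly. The function `rmsprop_minimize` is hypothetical, and real Keras optimizers handle many more details.

```python
import numpy as np

def rmsprop_minimize(grad_fn, w0, learning_rate=0.01, rho=0.9, eps=1e-7, steps=2000):
    w = np.asarray(w0, dtype=float)
    # Running average of squared gradients (one entry per parameter).
    avg_sq_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g**2
        # Scale each gradient component by the root of its recent magnitude.
        w = w - learning_rate * g / (np.sqrt(avg_sq_grad) + eps)
    return w

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=[0.0])
print(w_final)  # close to [3.]
```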

Before training, we'll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images, for instance, were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

In [10]:
# 2.4 Preparing the image data
train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28*28))
test_images = test_images.astype('float32') / 255
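The same two steps can be checked on a small fake batch. In the sketch below, the random `fake_images` array is a stand-in for the MNIST tensors and `preprocess` is a hypothetical helper; it confirms the shape, dtype, and value range after preprocessing. Using `reshape((len(images), -1))` instead of hard-coding 60000 or 10000 is a small design choice that lets one function handle either split.

```python
import numpy as np

def preprocess(images):
    # Flatten each 28x28 image to a 784-vector, then scale uint8 [0, 255] to float32 [0, 1].
    flat = images.reshape((len(images), -1))
    return flat.astype('float32') / 255

rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
x = preprocess(fake_images)
print(x.shape, x.dtype, x.min() >= 0.0, x.max() <= 1.0)
# → (3, 784) float32 True True
```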

We also need to categorically encode the labels:

In [11]:
# 2.5 Preparing the labels
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
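`to_categorical` turns each integer label into a one-hot vector: a vector of zeros with a single 1 at the label's index. A roughly equivalent NumPy sketch follows; the helper name `one_hot` is hypothetical, and unlike `to_categorical` it assumes 10 classes unless told otherwise rather than inferring the count from the labels.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype='float32')[labels]

labels = np.array([5, 0, 4])  # the first three training labels shown above
encoded = one_hot(labels)
print(encoded)
```

Each row now matches the shape of the softmax output (10 values), which is what the categorical crossentropy loss compares against.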

We're now ready to train the network, which in Keras is done via a call to the network's **fit** method; we fit the model to its training data:

In [12]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x146f0c18208>
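With `epochs=5` and `batch_size=128`, the network iterates over the 60,000 training images 5 times in mini-batches of 128 samples, and each mini-batch yields one weight update. The arithmetic below counts those updates, assuming that, as `fit` does by default, the last smaller batch of each epoch is still used.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# Full batches plus one partial batch at the end of each epoch.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # → 469 96 2345
```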

Two quantities are displayed during training: **the loss of the network** over the training data and **the accuracy of the network** over the training data.
We quickly reach an accuracy of 0.989 (98.9%) on the training data. Now let's check that the model performs well on the test set, too:

In [13]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc : ', test_acc)

test_acc :  0.9772


The test-set accuracy turns out to be 97.7%, which is quite a bit lower than the training-set accuracy. This gap between training accuracy and test accuracy is an example of **overfitting**: the fact that machine-learning models tend to perform worse on new data than on their training data.
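To put numbers on that gap, here is the arithmetic on the two accuracies reported above. Looking at error rates instead of accuracies makes the difference more visible: the model makes roughly twice as many mistakes on unseen digits as on the digits it was trained on.

```python
train_acc = 0.989   # training accuracy reported after the last epoch
test_acc = 0.9772   # test accuracy reported by evaluate()

gap = train_acc - test_acc
error_ratio = (1 - test_acc) / (1 - train_acc)

print(f"gap: {gap:.4f}")                # → gap: 0.0118
print(f"error ratio: {error_ratio:.2f}")  # → error ratio: 2.07
```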

This concludes our first example: you just saw how you can build and train a neural network to classify handwritten digits in fewer than 20 lines of Python code. In the next chapter, I'll go into detail about every moving piece we just previewed and clarify what's going on behind the scenes. You'll learn about tensors, the data-storing objects going into the network; tensor operations, which layers are made of; and gradient descent, which allows your network to learn from its training examples.