# Loading the MNIST dataset in Keras #

In [1]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Using TensorFlow backend.


Next, flatten each 28 x 28 image into a vector of 784 values and scale the pixel values to the [0, 1] interval:

In [2]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

In [3]:
len(train_labels)

60000

In [4]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [5]:
len(test_labels)

10000

In [6]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

## Workflow will be as follows: ##

* 1) We'll feed the neural network the training data, **train_images** and **train_labels**.
* 2) The network will then learn to associate images and labels.
* 3) Finally, we'll ask the network to produce predictions for **test_images**, and we'll verify whether these predictions match the labels from **test_labels**.

Let's build the network again.

## The network architecture ##

In [7]:
from keras.models import Sequential
from keras import layers

network = Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

Here our network consists of a sequence of 2 **Dense** layers, which are densely connected (also called fully connected) layers.

The 2nd and last layer is a 10-way **softmax** layer:
   * It returns an array of 10 probability scores (summing to 1).
   * Each score is the probability that the current digit image belongs to one of our 10 digit classes.
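
To make the softmax output concrete, here is a minimal sketch in plain Python. The raw scores in `logits` are made-up values for illustration, not outputs of the network above, and `softmax` is a hand-written helper rather than a Keras function.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 digit classes (0 through 9).
logits = [1.0, 0.5, 0.1, 2.0, 0.3, 0.0, 4.0, 0.2, 0.7, 1.5]
probs = softmax(logits)

# The predicted digit is the index of the largest probability.
predicted_digit = probs.index(max(probs))
print(len(probs), round(sum(probs), 6), predicted_digit)
```

The largest raw score always maps to the largest probability, so taking the index of the maximum gives the predicted class.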

**The layer** is the core building block of neural networks: a data-processing module that you can think of as a filter for data (it transforms the data into a more useful form).
   * Layers extract representations out of the data fed into them.

**Most of deep learning consists of** chaining together simple layers that implement a form of progressive **data distillation**.
   * A deep learning model is like a sieve for data processing,
   * made of a succession of increasingly refined data filters: the layers.
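
The idea of chaining layers can be sketched in NumPy. This is only an illustration of the data flow, assuming the same sizes as the network above (784 inputs, 512 hidden units, 10 outputs). The helpers `relu`, `softmax`, and `dense` are hypothetical stand-ins, not Keras API, and the weights are random and untrained, so the probabilities carry no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # Row-wise softmax; subtracting the row max is for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dense(x, w, b, activation):
    """One fully connected layer: activation(x @ w + b)."""
    return activation(x @ w + b)

# Randomly initialized (untrained) parameters.
w1, b1 = rng.normal(scale=0.01, size=(784, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.01, size=(512, 10)), np.zeros(10)

batch = rng.random((4, 784))              # 4 fake flattened images in [0, 1)
hidden = dense(batch, w1, b1, relu)       # first representation, shape (4, 512)
output = dense(hidden, w2, b2, softmax)   # class probabilities, shape (4, 10)

print(hidden.shape, output.shape)
```

Each layer consumes the previous layer's output, which is what "a succession of filters" means in practice.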

## The compilation step ##

We need three more things as part of the compilation step:

* A loss function - How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

* An optimizer - The mechanism through which the network will update itself based on the data it sees and its loss function.

* Metrics to monitor during training and testing - Here, we'll only care about accuracy (the fraction of the images that were correctly classified).
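
As an illustration of what a loss function measures, the sketch below computes categorical cross-entropy for a single sample in plain Python. The function is a hand-written approximation of the idea, not the Keras implementation, and the probabilities and the `eps` guard value are made up for the example.

```python
import math

def categorical_crossentropy(true_one_hot, predicted_probs):
    """Cross-entropy for one sample: -sum(t * log(p))."""
    eps = 1e-7  # guard against log(0); value chosen for illustration
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(true_one_hot, predicted_probs))

# The true class is index 2 (only 3 classes, to keep the example short).
target = [0.0, 0.0, 1.0]

confident_right = [0.05, 0.05, 0.90]
confident_wrong = [0.90, 0.05, 0.05]

loss_right = categorical_crossentropy(target, confident_right)
loss_wrong = categorical_crossentropy(target, confident_wrong)
print(round(loss_right, 4), round(loss_wrong, 4))
```

A confident correct prediction yields a small loss and a confident wrong one yields a large loss, which is the signal the network uses to steer itself.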

In [8]:
network.compile(optimizer='rmsprop',
               loss='categorical_crossentropy',
               metrics=['accuracy'])
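
The string `'rmsprop'` selects the RMSprop optimizer. As a rough picture of what such an optimizer does, the sketch below applies the commonly described RMSprop update rule to a single parameter of a toy loss, f(w) = (w - 3)^2. The function name and the learning rate, decay, and epsilon values are assumptions for this example, and the Keras implementation may differ in detail.

```python
import math

def rmsprop_minimize(grad_fn, w, steps=500, lr=0.01, rho=0.9, eps=1e-7):
    """Minimize a 1-D function with a simplified RMSprop update."""
    avg_sq_grad = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        # Running average of squared gradients.
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # Scale the step by the root of that running average.
        w -= lr * g / (math.sqrt(avg_sq_grad) + eps)
    return w

# Toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = rmsprop_minimize(lambda w: 2 * (w - 3), w=0.0)
print(round(w_final, 2))
```

Starting from 0, the parameter moves toward the minimum at 3 by following the gradient of the loss.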
                

**Before training:**
* Preprocess the data by **reshaping** it into the shape the network expects and **scaling** it so that all values are in the [0, 1] interval. Originally, the training images were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval.

* We transform it into a **float32** array of **shape** (60000, 28 * 28) with values between 0 and 1.

## Preparing the image data ##

In [9]:
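# Caution: cell In [2] already applied this reshape and scaling. Run these steps only
# once per array: dividing by 255 a second time shrinks the values to [0, 1/255].
train_images = train_images.reshape((60000, 28 * 28))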
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
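
The effect of the reshape and the scaling can be checked on a small stand-in array. The sketch below uses 3 made-up random "images" instead of the real MNIST data, so it runs without downloading anything.

```python
import numpy as np

# A made-up stand-in for the real data: 3 "images" of 28 x 28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image into one row of 784 values, then scale to [0, 1].
flat = fake_images.reshape((3, 28 * 28))
scaled = flat.astype('float32') / 255

print(fake_images.shape, fake_images.dtype)   # original layout
print(scaled.shape, scaled.dtype)             # flattened and rescaled
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Casting to float32 before dividing keeps the result in float32. Dividing the uint8 array directly would produce a float64 array instead.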

We also need to categorically encode the labels:

## Preparing the labels ##

In [10]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
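
Categorical encoding turns each integer label into a vector with a single 1 at the label's index. The NumPy sketch below shows the idea for the first three training labels printed earlier (5, 0, 4). `one_hot` is a hypothetical helper written for this example, intended as a rough equivalent of what `to_categorical` produces for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a float32 matrix with a 1 in each row at that row's label index."""
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# The first three training labels shown earlier: 5, 0, 4.
sample_labels = np.array([5, 0, 4], dtype=np.uint8)
encoded = one_hot(sample_labels, num_classes=10)
print(encoded)
```

Each row now has the same length as the network's 10-way softmax output, which is what the `categorical_crossentropy` loss compares against.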

We're now ready to train the network, which in Keras is done via a call to the network's **fit** method. We **fit** the model to its training data:

In [11]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0xb29109f60>
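
The `epochs` and `batch_size` arguments determine how many weight updates happen during training. The arithmetic below is a small sketch, assuming that the final, smaller batch of each epoch is also processed rather than dropped.

```python
import math

num_samples = 60000   # training images
batch_size = 128      # as passed to fit above
epochs = 5

# Number of batches (and so weight updates) in one pass over the data.
steps_per_epoch = math.ceil(num_samples / batch_size)
# Size of the last batch, which holds the leftover samples.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```

Under that assumption, each epoch consists of 469 batches, the last of which holds 96 images, for 2345 updates over 5 epochs.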

Two quantities are displayed during training:

* the loss of the network over the training data
* the accuracy of the network over the training data

We quickly reach an accuracy of 0.908 (90.8%) on the training data.

Now let's check that the model performs well on the test set, too:

In [12]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9128
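
The accuracy metric is the fraction of samples whose most probable predicted class matches the true class. The NumPy sketch below computes it by hand on made-up data. The `accuracy` helper is hypothetical and only illustrates the definition; it is not how Keras computes the number internally.

```python
import numpy as np

def accuracy(one_hot_labels, predicted_probs):
    """Fraction of rows where the most probable class matches the true class."""
    true_classes = np.argmax(one_hot_labels, axis=1)
    predicted_classes = np.argmax(predicted_probs, axis=1)
    return float(np.mean(true_classes == predicted_classes))

# Made-up example with 3 classes and 4 samples; the last prediction is wrong.
labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1]])
print(accuracy(labels, probs))
```

Three of the four predictions match, so the result is 0.75.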


Test-set accuracy turns out to be about 91.3% (0.9128), which in this run is slightly above the 90.8% training accuracy.

If it were quite a bit lower than the training-set accuracy, that gap between training accuracy and test accuracy would be an example of **overfitting**: the fact that machine learning models tend to perform worse on new data than on their training data.