# Working through MNIST end to end

MNIST is a great dataset for experimenting with and revising CNNs because

- it is very fast to train on, since the images are only 28x28 greyscale, and
- there are extensive benchmarks showing which approaches work best on MNIST.

In [1]:
from theano.sandbox import cuda
cuda.use('gpu2')

ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.


In [2]:
%matplotlib inline
import utils; reload(utils)
from utils import *
from __future__ import division, print_function

Using Theano backend.


## Setup

In [3]:
#batch_size=64
batch_size=4

### Load MNIST data

In [4]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))

### Pre-processing: 2 Steps

(1) Add an extra (singleton) channel dimension, because Keras expects image input to include a channel axis. MNIST is greyscale, so there is 1 channel.

In [5]:
X_test = np.expand_dims(X_test,1)
X_train = np.expand_dims(X_train,1)

In [6]:
X_train.shape

(60000, 1, 28, 28)

In [7]:
y_train[:5]

array([5, 0, 4, 1, 9], dtype=uint8)

(2) One-hot encode the labels

In [8]:
y_train = onehot(y_train)
y_test = onehot(y_test)

In [9]:
y_train[:5]

array([[ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])
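
`onehot` comes from the course's `utils` module, which is not shown here. As a rough sketch of the idea (the name `onehot_sketch` and its `num_classes` parameter are illustrative, not the actual `utils` implementation), the same encoding can be built with plain NumPy:

```python
import numpy as np

def onehot_sketch(labels, num_classes=10):
    """Return a (len(labels), num_classes) float array with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing by the labels picks one row per sample.
    return np.eye(num_classes)[labels]

encoded = onehot_sketch([5, 0, 4, 1, 9])
print(encoded.argmax(axis=1))  # argmax recovers the original labels
```

Taking `argmax` along the class axis is the usual way to turn one-hot rows (or predicted probabilities) back into digit labels.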

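Finally, compute the mean and standard deviation of the training pixels. The models below use these to normalize their input.

In [10]: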
mean_px = X_train.mean().astype(np.float32)
std_px = X_train.std().astype(np.float32)

In [11]:
def norm_input(x): return (x-mean_px) / std_px
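
`norm_input` standardizes pixels with the training set's global mean and standard deviation, so the model sees inputs centered near 0 with roughly unit spread. A small sketch of the effect on synthetic data (the random `fake_images` array stands in for `X_train` and is purely an assumption for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
# Stand-in for X_train: 100 fake greyscale images with pixel values 0..255.
fake_images = rng.randint(0, 256, size=(100, 1, 28, 28)).astype(np.float32)

mean_px = fake_images.mean()
std_px = fake_images.std()

def norm_input(x):
    # Subtract the global mean and divide by the global standard deviation.
    return (x - mean_px) / std_px

normed = norm_input(fake_images)
# After normalization the mean is close to 0 and the standard deviation close to 1.
print("mean=%.3f std=%.3f" % (normed.mean(), normed.std()))
```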

## Linear model

The linear model
- normalizes the input and flattens it (treats each image as a single vector),
- has one Dense layer with 10 outputs.

In [12]:
def get_lin_model():
    model = Sequential([
        Lambda(norm_input, input_shape=(1,28,28)),
        Flatten(),
        Dense(10, activation='softmax')
        ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model
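
Conceptually, this model is multinomial logistic regression: each flattened 784-pixel image is multiplied by a 784x10 weight matrix, a bias is added, and softmax turns the 10 scores into probabilities. A NumPy sketch of that forward pass, leaving out the normalization step and using randomly initialized `W` and `b` as stand-ins for trained weights (all names here are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
x = rng.rand(4, 1, 28, 28)          # a batch of 4 fake images
flat = x.reshape(len(x), -1)        # what Flatten does: (4, 784)
W = rng.randn(784, 10) * 0.01       # Dense weights
b = np.zeros(10)                    # Dense biases
probs = softmax(flat.dot(W) + b)    # (4, 10), each row sums to 1

n_params = W.size + b.size          # 784 * 10 weights + 10 biases
print("shape=%s params=%d" % (probs.shape, n_params))
```

With only 7,850 parameters, the linear model is a useful baseline for the larger networks that follow.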

In [13]:
lm = get_lin_model()

In [14]:
gen = image.ImageDataGenerator()
batches = gen.flow(X_train, y_train, batch_size=batch_size)
test_batches = gen.flow(X_test, y_test, batch_size=batch_size)

**How to train a model**

The best way to train a model is to start with one epoch at a fairly low learning rate (LR).
- The default learning rate for Adam, the optimizer used here, is 0.001.

Nearly all of the time, a single epoch at the default LR is enough to get training started.

In [15]:
lm.fit_generator(batches, batches.N, nb_epoch=1,
                validation_data=test_batches, nb_val_samples=test_batches.N)

Epoch 1/1


<keras.callbacks.History at 0x7fb58e3d8ad0>

Next, set the LR much higher.
- LR = 0.1 is about the highest you want to go.
- Train for another epoch.

In [17]:
# Update the existing backend variable in place (Theano backend). Assigning a plain
# float here would not affect the training function compiled by the first fit.
lm.optimizer.lr.set_value(0.1)

In [18]:
lm.fit_generator(batches, batches.N, nb_epoch=1,
                validation_data=test_batches, nb_val_samples=test_batches.N)

Epoch 1/1


<keras.callbacks.History at 0x7fb58b003e90>

Then gradually reduce the LR by an order of magnitude at a time.
- Set the LR to 0.01 and train for a few epochs.
- Repeat until the model starts to overfit.

In [19]:
lm.optimizer.lr.set_value(0.01)

In [20]:
lm.fit_generator(batches, batches.N, nb_epoch=4,
                validation_data=test_batches, nb_val_samples=test_batches.N)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7fb5a6ade390>
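
The schedule used above (one epoch at the default rate, one epoch at 0.1, then several at 0.01) can be written as a list of phases. The sketch below applies that schedule to a toy softmax regression trained with plain mini-batch SGD on synthetic data. It is not the Keras/Adam setup from this notebook, and `make_data`, `train_phase`, and the data sizes are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)

def make_data(n=600, d=20, k=3):
    # Synthetic classes: each class is a noisy cloud around its own random center.
    centers = rng.randn(k, d) * 2.0
    y = rng.randint(0, k, size=n)
    X = centers[y] + rng.randn(n, d)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
    return X, np.eye(k)[y]                      # features, one-hot labels

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(W, X, Y):
    # Mean categorical cross-entropy.
    return -np.mean(np.sum(Y * np.log(softmax(X.dot(W)) + 1e-12), axis=1))

def train_phase(W, X, Y, lr, epochs, batch_size=4):
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            # Gradient of the cross-entropy loss for a softmax-linear model.
            grad = X[idx].T.dot(softmax(X[idx].dot(W)) - Y[idx]) / len(idx)
            W = W - lr * grad
    return W

X, Y = make_data()
W = np.zeros((X.shape[1], Y.shape[1]))
initial_loss = loss(W, X, Y)

# (learning rate, epochs): low-rate start, one high-rate epoch, then a step down.
schedule = [(0.001, 1), (0.1, 1), (0.01, 4)]
for lr, epochs in schedule:
    W = train_phase(W, X, Y, lr, epochs)
    print("lr=%g epochs=%d loss=%.4f" % (lr, epochs, loss(W, X, Y)))

final_loss = loss(W, X, Y)
```

Printing the loss after each phase makes it easy to see when a lower learning rate stops helping, which is the point at which to watch validation metrics for overfitting.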

## Single dense layer