### Keras MNIST

This notebook was inspired by http://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/. The goal here is not to invent anything but to get used to Keras.
In this notebook, we do handwritten digit recognition on the MNIST database.
We will train a convolutional neural network and try out Dropout layers.

For the theory, here is a really good explanation of how CNNs work: http://neuralnetworksanddeeplearning.com/chap6.html

### CPU or GPU

There are several ways to make the model run on the GPU: https://keras.io/getting-started/faq/#how-can-i-run-keras-on-gpu <br />
We chose the manual one (setting the flag in code), as it is rather easy and prevents us from forgetting our settings.



In [61]:
import os
# Has no effect on a machine without a GPU.
# The flag must be set before theano is imported.
os.environ['THEANO_FLAGS'] = "device=gpu"
import theano


In [62]:
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 1.602674 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu


### Get the data 

In [80]:
# fix random seed for reproducibility, REALLY IMPORTANT
seed = 7
numpy.random.seed(seed)


Keras comes with some datasets that can be loaded directly into the notebook. They are listed here: https://keras.io/datasets/

In [63]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [64]:
X_train.shape

(60000, 28, 28)

In Keras, different layers expect different input formats.
So before using a layer, we have to check the documentation to know which shape the input data must be reshaped to.<br /> Here we will use the Convolution2D layer which, with the 'th' dimension ordering set below, takes its input as channels × height × width. There is a single channel here because the images are grayscale.

In [65]:
from keras import backend as K
K.set_image_dim_ordering('th')

In [66]:
X_train_3D = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test_3D = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
X_train_3D.shape

(60000, 1, 28, 28)
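
To make the two possible layouts concrete, here is a small NumPy-only sketch. It uses a fake batch of zeros instead of MNIST, and the variable names are only illustrative. The 'th' ordering used in this notebook puts the channel axis right after the sample axis, while the 'tf' ordering puts it last.

```python
import numpy as np

# Hypothetical stand-in for a batch of 5 grayscale 28x28 images.
images = np.zeros((5, 28, 28), dtype='uint8')

# Channels-first ('th' ordering): (samples, channels, height, width).
channels_first = images.reshape(images.shape[0], 1, 28, 28).astype('float32')

# Channels-last ('tf' ordering): (samples, height, width, channels).
channels_last = images.reshape(images.shape[0], 28, 28, 1).astype('float32')

print(channels_first.shape)
print(channels_last.shape)
```

Whichever layout is chosen, the `input_shape` given to the first layer has to match it, without the sample axis.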

In [67]:
# normalize inputs from 0-255 to 0-1
X_train_3D = X_train_3D / 255
X_test_3D = X_test_3D / 255

In [68]:
# This one can be useful if we want to use a Dense layer as the first hidden layer 
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train_flattened = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test_flattened = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
X_train_flattened.shape

(60000, 784)

In [69]:
# normalize inputs from 0-255 to 0-1
X_train_flattened = X_train_flattened / 255
X_test_flattened = X_test_flattened / 255

In [70]:
X_train_3D[1]

array([[[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
          0.  

Now we also need to transform the labels into one-hot encoded vectors.

In [71]:
y_train[1]

0

In [72]:
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)


In [73]:
y_test[:3]

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [74]:
y_train.shape

(60000, 10)
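
For intuition, the encoding can be reproduced with plain NumPy. This is only a sketch of the idea, not the Keras implementation of `to_categorical`, and `one_hot` is a hypothetical helper name. The labels 7, 2 and 1 are read off the three one-hot rows printed above.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype='int64')
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Put a 1 in the column given by each label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# The first three test labels, as implied by the rows of y_test[:3].
encoded = one_hot([7, 2, 1], num_classes=10)
print(encoded)
```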

### Create the models 

In [75]:
numClasses= y_train.shape[1]

In [76]:
from keras.models import Sequential 
from keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten, Dropout

#### First model 

This model is a simple network with one convolutional layer and an output layer.

In [77]:
def first_model():
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, activation="relu", border_mode="valid",
                            input_shape=(1, 28, 28), dim_ordering="th"))
    model.add(Flatten())
    model.add(Dense(10, init='uniform', activation='sigmoid'))
    # For debugging, it is really useful to print the output size of each layer:
    model.summary()
    return model

In [78]:
firstModel= first_model()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
convolution2d_9 (Convolution2D)  (None, 32, 26, 26)    320         convolution2d_input_6[0][0]      
____________________________________________________________________________________________________
flatten_5 (Flatten)              (None, 21632)         0           convolution2d_9[0][0]            
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 10)            216330      flatten_5[0][0]                  
Total params: 216650
____________________________________________________________________________________________________
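
The numbers in the summary above can be checked by hand. The sketch below assumes a stride of 1 and one bias per filter or unit; the helper names are ours, not part of Keras.

```python
def conv_output_size(input_size, kernel_size):
    """Spatial size after a 'valid' convolution with stride 1."""
    return input_size - kernel_size + 1

def conv_params(filters, kernel_size, in_channels):
    """Kernel weights plus one bias per filter."""
    return filters * (kernel_size * kernel_size * in_channels + 1)

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

side = conv_output_size(28, 3)   # 26, as in (None, 32, 26, 26)
conv = conv_params(32, 3, 1)     # 320
flat = 32 * side * side          # 21632
dense = dense_params(flat, 10)   # 216330
total = conv + dense             # 216650
print(side, conv, flat, dense, total)
```

Almost all of the parameters sit in the Dense layer, because it is connected to every value of the flattened 32 × 26 × 26 feature maps.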


In [79]:
firstModel.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#### Second model (with pooling and dropout)

This second network adds pooling and dropout layers, with the goal of improving on the first model.
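
For intuition about what these two layer types do, here is a NumPy-only sketch. It is not the Keras code: `max_pool_2x2` and `dropout` are hypothetical helpers, the pooling handles a single 2D feature map with even sides, and the dropout follows the usual "inverted dropout" formulation (zero a fraction of the units during training and rescale the rest).

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a (height, width) array with even sides."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def dropout(activations, rate, rng):
    """Training-time inverted dropout: zero a fraction `rate`, rescale the rest."""
    keep = rng.rand(*activations.shape) >= rate
    return activations * keep / (1.0 - rate)

feature_map = np.arange(16, dtype='float32').reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)  # each entry is the maximum of one 2x2 block

rng = np.random.RandomState(7)
dropped = dropout(np.ones((1000,), dtype='float32'), rate=0.2, rng=rng)
print((dropped == 0).mean())  # fraction of zeroed units, close to 0.2
```

Pooling halves each spatial dimension, which reduces the number of values passed to the later layers; dropout randomly silences units during training so that the network cannot rely on any single one.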

In [81]:
def second_model():
    model= Sequential()
    model.add(Convolution2D(30, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(15, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(numClasses, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model 

In [82]:
secondModel= second_model()
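
No `model.summary()` is printed for this model, so here is a hand calculation of the shapes and parameter counts. These figures are derived from the layer definitions above, assuming stride-1 'valid' convolutions and non-overlapping 2×2 pooling; they are not taken from a Keras output.

```python
def conv_valid(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool(size, factor=2):
    """Spatial size after non-overlapping pooling."""
    return size // factor

side = 28
side = conv_valid(side, 5)     # 24
side = pool(side)              # 12
side = conv_valid(side, 3)     # 10
side = pool(side)              # 5
flattened = 15 * side * side   # 375 values enter the first Dense layer

params = [
    30 * (5 * 5 * 1 + 1),      # first convolution
    15 * (3 * 3 * 30 + 1),     # second convolution
    flattened * 128 + 128,     # Dense(128)
    128 * 50 + 50,             # Dense(50)
    50 * 10 + 10,              # Dense(numClasses), with 10 classes
]
print(side, flattened, sum(params))
```

Under these assumptions the second model has far fewer parameters than the first one, since pooling shrinks the feature maps before the Dense layers.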

### Fit the model 

#### First model

In [87]:
firstModel.fit(X_train_3D, y_train, validation_data=(X_test_3D, y_test), nb_epoch=10, batch_size=200, verbose=2)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
27s - loss: 0.3278 - acc: 0.9117 - val_loss: 0.1238 - val_acc: 0.9643
Epoch 2/10
27s - loss: 0.1036 - acc: 0.9711 - val_loss: 0.0835 - val_acc: 0.9744
Epoch 3/10
28s - loss: 0.0694 - acc: 0.9804 - val_loss: 0.0667 - val_acc: 0.9790
Epoch 4/10
28s - loss: 0.0564 - acc: 0.9841 - val_loss: 0.0643 - val_acc: 0.9802
Epoch 5/10
30s - loss: 0.0466 - acc: 0.9865 - val_loss: 0.0592 - val_acc: 0.9818
Epoch 6/10
32s - loss: 0.0386 - acc: 0.9889 - val_loss: 0.0579 - val_acc: 0.9814
Epoch 7/10
30s - loss: 0.0336 - acc: 0.9902 - val_loss: 0.0585 - val_acc: 0.9815
Epoch 8/10
29s - loss: 0.0293 - acc: 0.9914 - val_loss: 0.0599 - val_acc: 0.9825
Epoch 9/10
32s - loss: 0.0258 - acc: 0.9928 - val_loss: 0.0619 - val_acc: 0.9819
Epoch 10/10
31s - loss: 0.0225 - acc: 0.9938 - val_loss: 0.0646 - val_acc: 0.9816


<keras.callbacks.History at 0x10ce53f50>
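
In the log above, the training loss keeps falling while the validation loss stops improving partway through, which is often read as a sign of overfitting. A small sketch to locate the best epoch, with the values copied from the log:

```python
# Validation losses of the first model, copied from the training log above.
val_loss = [0.1238, 0.0835, 0.0667, 0.0643, 0.0592,
            0.0579, 0.0585, 0.0599, 0.0619, 0.0646]

# Epochs are numbered from 1 in the log, hence the + 1.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1
print("Lowest validation loss %.4f at epoch %d" % (val_loss[best_epoch - 1], best_epoch))
```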

#### Second model 

In [83]:
secondModel.fit(X_train_3D, y_train, validation_data=(X_test_3D, y_test), nb_epoch=10, batch_size=200, verbose=2)


Train on 60000 samples, validate on 10000 samples
Epoch 1/10
45s - loss: 0.3785 - acc: 0.8793 - val_loss: 0.0811 - val_acc: 0.9742
Epoch 2/10
51s - loss: 0.0929 - acc: 0.9707 - val_loss: 0.0464 - val_acc: 0.9855
Epoch 3/10
47s - loss: 0.0684 - acc: 0.9788 - val_loss: 0.0369 - val_acc: 0.9876
Epoch 4/10
46s - loss: 0.0544 - acc: 0.9828 - val_loss: 0.0339 - val_acc: 0.9888
Epoch 5/10
48s - loss: 0.0466 - acc: 0.9857 - val_loss: 0.0286 - val_acc: 0.9898
Epoch 6/10
46s - loss: 0.0400 - acc: 0.9875 - val_loss: 0.0297 - val_acc: 0.9899
Epoch 7/10
45s - loss: 0.0371 - acc: 0.9878 - val_loss: 0.0234 - val_acc: 0.9918
Epoch 8/10
45s - loss: 0.0326 - acc: 0.9896 - val_loss: 0.0280 - val_acc: 0.9910
Epoch 9/10
46s - loss: 0.0311 - acc: 0.9903 - val_loss: 0.0224 - val_acc: 0.9923
Epoch 10/10
48s - loss: 0.0268 - acc: 0.9916 - val_loss: 0.0231 - val_acc: 0.9920


<keras.callbacks.History at 0x10d80a510>

### Test the model 

In [88]:
loss_and_metrics1 = firstModel.evaluate(X_test_3D, y_test, batch_size=32)



In [89]:
print(loss_and_metrics1)
print("First model Error: %.2f%%" % (100-loss_and_metrics1[1]*100))

[0.064574387505860073, 0.98160000000000003]
First model Error: 1.84%


In [86]:
loss_and_metrics2 = secondModel.evaluate(X_test_3D, y_test, verbose=0)
print(loss_and_metrics2)
print("Second model Error: %.2f%%" % (100-loss_and_metrics2[1]*100))

[0.023132567941093293, 0.99199999999999999]
Second model Error: 0.80%


<b> Results </b> <br />
First model, 10 epochs with validation, [loss, accuracy]: [0.064574387505860073, 0.98160000000000003] <br />
Second model, 10 epochs with validation, [loss, accuracy]: [0.023132567941093293, 0.99199999999999999]<br />
These results are probably overestimated, as the test set is also used as the validation set.
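
Both `fit` calls above pass the test set as `validation_data`. One way to keep the test set untouched would be to hold out part of the training data for validation instead. The sketch below is one possible approach, not what this notebook did: `holdout_split` is a hypothetical helper, and the arrays are zero-filled stand-ins with the same layout as `X_train_3D` and `y_train`.

```python
import numpy as np

def holdout_split(features, labels, validation_fraction, seed):
    """Shuffle once, then split into (train, validation) parts."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(features.shape[0])
    n_val = int(features.shape[0] * validation_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (features[train_idx], labels[train_idx]), (features[val_idx], labels[val_idx])

# Hypothetical stand-in data: 100 images, 10 classes.
features = np.zeros((100, 1, 28, 28), dtype='float32')
labels = np.zeros((100, 10), dtype='float32')

(train_x, train_y), (val_x, val_y) = holdout_split(features, labels, 0.1, seed=7)
print(train_x.shape, val_x.shape)
```

The held-out part would then be passed as `validation_data`, and the test set used only once, for the final evaluation.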

We could also make some predictions:

In [None]:
classes = firstModel.predict_classes(X_test_3D, batch_size=32)
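
As far as we can tell, for a multi-class output the predicted class is simply the index of the largest output value. The NumPy sketch below shows that step on made-up numbers (three images, four classes) and compares the result with one-hot labels laid out like `y_test`.

```python
import numpy as np

# Hypothetical network outputs for three images and four classes.
probabilities = np.array([[0.1, 0.7, 0.1, 0.1],
                          [0.6, 0.2, 0.1, 0.1],
                          [0.2, 0.2, 0.1, 0.5]])
# One-hot ground truth in the same layout as y_test.
one_hot_labels = np.array([[0, 1, 0, 0],
                           [1, 0, 0, 0],
                           [0, 0, 1, 0]])

predicted = probabilities.argmax(axis=1)   # class index per image
actual = one_hot_labels.argmax(axis=1)     # undo the one-hot encoding
accuracy = (predicted == actual).mean()
print(predicted, actual, accuracy)
```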