### Keras MNIST

This notebook was inspired by the work on http://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/. The goal here is not to invent anything but to get used to Keras' tool.
With this notebook, we're going to do character recognition on the MNIST database. 
We will train a convolutionnal neural network, and try the Dropout layers.

At the theory level, here is a really good explanation on how CNNs work: http://neuralnetworksanddeeplearning.com/chap6.html

### CPU or GPU

To make the model run on the gpu, diverse techniques exist : https://keras.io/getting-started/faq/#how-can-i-run-keras-on-gpu <br />
We chose the manual one as it is rather easy and prevent us from forgetting our settings.



In [1]:
import os   
#useless with NO GPU
os.environ['THEANO_FLAGS'] = "device=gpu"  
import theano 


ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.


In [2]:
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 1.584268 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the cpu


### Get the data 

Keras comes with some datasets that can be loaded directly into the notebook, they can be found here: https://keras.io/datasets/

In [3]:
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Using Theano backend.


In [4]:
X_train.shape

(60000, 28, 28)

In [5]:
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

In keras, the layers can expect various input format. 
So before using an layer we have to check the documentation to know in which format we need to reshape the input data.<br /> Here we will use the Convolution2D layer that take in input channels × width × height, channel is 1 here because the images are in grey scale.

In [6]:
X_train_3D = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test_3D = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
X_train_3D.shape

(60000, 1, 28, 28)

In [7]:
# This one can be useful if we want to use a Dense layer as the first hidden layer 
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train_flattened = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test_flattened = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
X_train_flattened.shape

(60000, 784)

Now, we also need to transform the outputs as 1-hot encoded vectors

In [8]:
y_train[1]

0

In [9]:
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)


In [10]:
y_train.shape

(60000, 10)

### Create the models 

In [11]:
numClasses= y_train.shape[1]

In [12]:
from keras.models import Sequential 
from keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten

#### First model 

This model is a simple network with one convolutionnal layer and an output layer.

In [13]:
def first_model():
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, activation="relu", border_mode="valid",input_shape=(1, 28, 28),dim_ordering="th"))
    model.add(Flatten())
    model.add(Dense(10, init='uniform', activation='sigmoid'))
    #To debug it is really useful to print the output size of each layer as such:
    model.summary()
    return model

In [14]:
firstModel= first_model()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
convolution2d_1 (Convolution2D)  (None, 32, 26, 26)    320         convolution2d_input_1[0][0]      
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 21632)         0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            216330      flatten_1[0][0]                  
Total params: 216650
____________________________________________________________________________________________________


In [15]:
firstModel.compile(loss='MSE', optimizer='adam', metrics=['accuracy'])

#### Second model (with dropout)

The goal of this new network is to add pooling and dropout layers to improve the model as much as possible

In [16]:
def second_model():
    model.add(Convolution2D(30, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Convolution2D(15, 3, 3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(numClasses, activation='softmax'))

In [17]:
secondModel= second_model()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
convolution2d_2 (Convolution2D)  (None, 32, 26, 26)    320         convolution2d_input_2[0][0]      
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 21632)         0           convolution2d_2[0][0]            
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 10)            216330      flatten_2[0][0]                  
Total params: 216650
____________________________________________________________________________________________________


In [18]:
secondModel.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

### Fit the model 

In [19]:
firstModel.fit(X_train_3D, y_train, nb_epoch=150, batch_size=10, verbose=1)

Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Epoch 57/150
Epoch 58/150
Epoch 59/150
Epoch 60/150
Epoch 61/150
Epoch 62/150
Epoch 63/150
Epoch 64/150
Epoch 65/150
Epoch 66/150
Epoch 67/150
Epoch 68/150
Epoch 69/150
Epoch 70/150
Epoch 71/150
Epoch 72/150
Epoch 73/150
Epoch 74/150
Epoch 75/150
Epoch 76/150
Epoch 77/150
Epoch 78

<keras.callbacks.History at 0x10938ec90>

### Test the model 

In [22]:
loss_and_metrics1 = firstModel.evaluate(X_test_3D, y_test, batch_size=32)



In [23]:
loss_and_metrics1

[0.071653639280796044, 0.4209]

For the first model we have those results :[0.071653639280796044, 0.4209]

We could also make some predictions

In [None]:
classes = firstModel.predict_classes(Xtest, batch_size=32)