# CNNs and MNIST dataset with Keras

In [1]:
import numpy as np
np.random.seed(100)
import keras
import tensorflow
import sys
(keras.__version__, tensorflow.__version__, sys.version)

Using TensorFlow backend.


('2.0.2',
 '1.0.1',
 '3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')

## Prepare the dataset

In [2]:
from keras.datasets import mnist
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [3]:
# Normalize data to values in range 0-1
x_train = x_train/255.0
x_test = x_test/255.0
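
As a small illustration of the scaling step above, the sketch below applies the same division to a hypothetical three-pixel array. It assumes the raw pixels are 8-bit integers in the range 0-255, which is how `mnist.load_data()` returns them; the array `pixels` is made up for the example.

```python
import numpy as np

# Hypothetical 8-bit pixel values covering the full intensity range.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a float promotes the array to floating point and maps 0..255 to 0..1.
scaled = pixels / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```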

In [4]:
print("shape of X_train =", x_train.shape)
print("shape of y_train =", y_train.shape)

shape of X_train = (60000, 28, 28)
shape of y_train = (60000,)


We must add a color channel axis at the end. With TensorFlow channel ordering (channels last), Keras expects input of shape (number of samples, height, width, number of channels).

In [5]:
x_test = np.expand_dims(x_test,3)
x_train = np.expand_dims(x_train,3)
print("shape of x_train =", x_train.shape)
print("shape of x_test =", x_test.shape)

shape of x_train = (60000, 28, 28, 1)
shape of x_test = (10000, 28, 28, 1)
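
The reshaping step can be tried on a tiny stand-in array. The sketch below uses a hypothetical batch of five blank 28x28 images (not the MNIST data) and shows that `np.expand_dims` and indexing with `np.newaxis` both append the trailing channel axis.

```python
import numpy as np

# Hypothetical stand-in for a batch of grayscale images: 5 samples of 28x28.
images = np.zeros((5, 28, 28))

# Append a trailing axis of length 1 to act as the single color channel.
with_channel = np.expand_dims(images, axis=-1)

# Indexing with np.newaxis gives the same result.
also_with_channel = images[..., np.newaxis]

print(with_channel.shape)       # (5, 28, 28, 1)
print(also_with_channel.shape)  # (5, 28, 28, 1)
```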


Convert the labels to one-hot encodings

In [6]:
from keras.utils.np_utils import to_categorical
def onehot(x):
    return to_categorical(x)

y_train = onehot(y_train)
y_test = onehot(y_test)

In [7]:
print("shape(y_train) =", y_train.shape)
print("shape(y_test)  =", y_test.shape)

shape(y_train) = (60000, 10)
shape(y_test)  = (10000, 10)
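
To make the encoding concrete, here is a minimal NumPy sketch of what a one-hot conversion produces. The helper `onehot_sketch` is a hypothetical name used only for illustration; it is not the Keras implementation of `to_categorical`, just an equivalent for integer labels in the range 0-9.

```python
import numpy as np

def onehot_sketch(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[labels]

encoded = onehot_sketch([5, 0, 4])
print(encoded.shape)           # (3, 10)
print(encoded[0])              # 1.0 at index 5, zeros elsewhere
# argmax along the class axis recovers the original integer labels.
print(encoded.argmax(axis=1))  # [5 0 4]
```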


Create smaller samples of the training and test sets for quick experiments

In [8]:
x_train_sample = x_train[:8000]
y_train_sample = y_train[:8000]

x_test_sample = x_test[:1000]
y_test_sample = y_test[:1000]

## Create a simple VGG-like model

In [18]:
from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                          Dropout, Flatten, Dense)
from keras.models import Sequential
from keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

adam = Adam(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_13 (Conv2D)           (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 24, 24, 32)        9248      
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
batch_normalization_10 (Batc (None, 12, 12, 32)        128       
_________________________________________________________________
dropout_10 (Dropout)         (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 10, 10, 64)        18496     
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 8, 8, 64)          36928     
__________
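
The numbers in the summary can be checked by hand. The sketch below reproduces the parameter counts and output sizes of the visible rows, assuming the Keras defaults used above (no padding, stride 1). The helper names `conv2d_params` and `conv2d_valid_size` are hypothetical and exist only for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels weight block plus one bias per filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def conv2d_valid_size(size, kernel):
    # Without padding and with stride 1, each side shrinks by kernel - 1.
    return size - kernel + 1

print(conv2d_params(3, 3, 1, 32))    # 320
print(conv2d_params(3, 3, 32, 32))   # 9248
print(conv2d_params(3, 3, 32, 64))   # 18496
print(conv2d_params(3, 3, 64, 64))   # 36928

# Batch normalization keeps 4 values per channel
# (gamma, beta, moving mean, moving variance).
print(4 * 32)                        # 128

size = 28
size = conv2d_valid_size(size, 3)    # 26
size = conv2d_valid_size(size, 3)    # 24
size //= 2                           # 12 after 2x2 max pooling
size = conv2d_valid_size(size, 3)    # 10
size = conv2d_valid_size(size, 3)    # 8
print(size)                          # 8
```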

Run a single epoch on a small dataset

In [19]:
model.fit(x=x_train_sample, y=y_train_sample, 
          epochs=1, 
          batch_size=32, 
          validation_data=(x_test_sample, y_test_sample), 
          verbose=2, shuffle=True)

Train on 8000 samples, validate on 1000 samples
Epoch 1/1
6s - loss: 0.4014 - acc: 0.8777 - val_loss: 0.1520 - val_acc: 0.9490


<keras.callbacks.History at 0x7f67bc7b9dd8>

Run more epochs

In [20]:
model.fit(x=x_train_sample, y=y_train_sample, 
          epochs=15, 
          batch_size=32, 
          validation_data=(x_test_sample, y_test_sample), 
          verbose=2, shuffle=True)

Train on 8000 samples, validate on 1000 samples
Epoch 1/15
4s - loss: 0.2012 - acc: 0.9411 - val_loss: 0.2357 - val_acc: 0.9330
Epoch 2/15
4s - loss: 0.1732 - acc: 0.9465 - val_loss: 0.1066 - val_acc: 0.9620
Epoch 3/15
4s - loss: 0.1706 - acc: 0.9509 - val_loss: 0.3152 - val_acc: 0.9270
Epoch 4/15
4s - loss: 0.1571 - acc: 0.9531 - val_loss: 0.1194 - val_acc: 0.9670
Epoch 5/15
4s - loss: 0.1419 - acc: 0.9579 - val_loss: 0.2027 - val_acc: 0.9410
Epoch 6/15
4s - loss: 0.1316 - acc: 0.9596 - val_loss: 0.1458 - val_acc: 0.9600
Epoch 7/15
4s - loss: 0.1432 - acc: 0.9566 - val_loss: 0.0723 - val_acc: 0.9780
Epoch 8/15
4s - loss: 0.1178 - acc: 0.9658 - val_loss: 0.5884 - val_acc: 0.8180
Epoch 9/15
4s - loss: 0.1025 - acc: 0.9690 - val_loss: 0.0929 - val_acc: 0.9690
Epoch 10/15
4s - loss: 0.0895 - acc: 0.9746 - val_loss: 0.0710 - val_acc: 0.9770
Epoch 11/15
4s - loss: 0.0809 - acc: 0.9740 - val_loss: 0.0525 - val_acc: 0.9850
Epoch 12/15
4s - loss: 0.0886 - acc: 0.9744 - val_loss: 0.1045 - val_a

<keras.callbacks.History at 0x7f67c46e1e48>

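Let's train the model on the whole dataset. The model is recompiled with a lower learning rate (0.001 instead of 0.01); recompiling resets the optimizer but keeps the weights learned so far.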

In [22]:
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])
model.fit(x=x_train, y=y_train, 
          epochs=10, 
          batch_size=32, 
          validation_data=(x_test, y_test), 
          verbose=2, shuffle=True)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
38s - loss: 0.0623 - acc: 0.9820 - val_loss: 0.0256 - val_acc: 0.9920
Epoch 2/10
35s - loss: 0.0522 - acc: 0.9839 - val_loss: 0.0221 - val_acc: 0.9926
Epoch 3/10
34s - loss: 0.0457 - acc: 0.9863 - val_loss: 0.0200 - val_acc: 0.9935
Epoch 4/10
35s - loss: 0.0410 - acc: 0.9876 - val_loss: 0.0192 - val_acc: 0.9933
Epoch 5/10
35s - loss: 0.0383 - acc: 0.9885 - val_loss: 0.0182 - val_acc: 0.9941
Epoch 6/10
35s - loss: 0.0353 - acc: 0.9893 - val_loss: 0.0180 - val_acc: 0.9942
Epoch 7/10
35s - loss: 0.0318 - acc: 0.9900 - val_loss: 0.0166 - val_acc: 0.9943
Epoch 8/10
34s - loss: 0.0319 - acc: 0.9900 - val_loss: 0.0156 - val_acc: 0.9948
Epoch 9/10
35s - loss: 0.0300 - acc: 0.9908 - val_loss: 0.0161 - val_acc: 0.9956
Epoch 10/10
35s - loss: 0.0284 - acc: 0.9916 - val_loss: 0.0154 - val_acc: 0.9953


<keras.callbacks.History at 0x7f67b6f36c88>

We obtained 99.2% accuracy on the validation dataset after the very first epoch on the full dataset, and 99.53% after ten epochs. You may try training for more epochs to get an even better result.
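
For readers who want to see what the reported accuracy measures, the following sketch computes it from one-hot labels and softmax outputs by comparing the argmax of each row. The function `accuracy_sketch` and the small arrays are hypothetical; they illustrate the idea and are not taken from the Keras source. The last two lines translate the logged validation accuracies into approximate counts of misclassified images out of the 10000 test samples.

```python
import numpy as np

def accuracy_sketch(y_true_onehot, y_prob):
    """Fraction of rows where the most probable class matches the label."""
    true_labels = np.argmax(y_true_onehot, axis=1)
    predicted_labels = np.argmax(y_prob, axis=1)
    return float(np.mean(true_labels == predicted_labels))

# Hypothetical 3-class example: 4 samples, the last one is misclassified.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.6, 0.1]])
print(accuracy_sketch(y_true, y_prob))  # 0.75

# Misclassified test images implied by the logged accuracies.
print(round((1 - 0.9920) * 10000))      # 80
print(round((1 - 0.9953) * 10000))      # 47
```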