Managing imports

In [0]:
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.optimizers import Adam
from keras.utils import np_utils

Loading the MNIST Dataset

In [0]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Reshaping the data to the form (batch, height, width, channels).
Here, channels = 1 as the images are greyscale. Had they been colour, we would have set it to 3.

In [0]:
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1).astype('float32')
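As a minimal sketch of what the reshape does, the snippet below uses a small zero-filled array as a stand-in for the MNIST images (an assumption made so that it runs without downloading the dataset). It also shows `np.expand_dims` as an equivalent way to add the channel axis.

```python
import numpy as np

# Stand-in for a small batch of greyscale images: 3 images of 28x28 pixels.
# The real MNIST arrays have the same layout, only with more images.
batch = np.zeros((3, 28, 28), dtype=np.uint8)

# Append a trailing channel axis of size 1, as the reshape call above does.
with_channel = batch.reshape(batch.shape[0], batch.shape[1], batch.shape[2], 1).astype('float32')

# np.expand_dims adds the same axis without spelling out every dimension.
alternative = np.expand_dims(batch, axis=-1).astype('float32')

print(with_channel.shape)  # (3, 28, 28, 1)
print(alternative.shape)   # (3, 28, 28, 1)
```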

Normalizing the pixel values from range 0-255 to range 0-1

In [0]:
X_train = X_train/255
X_test = X_test/255
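A quick way to see the effect of the division is to apply it to a few hand-picked pixel values. The values below are illustrative only and are not taken from the dataset.

```python
import numpy as np

# Three illustrative pixel intensities: black, mid-grey and white.
pixels = np.array([0, 128, 255], dtype=np.uint8).astype('float32')

# Dividing by the maximum intensity maps the range 0-255 onto 0-1.
scaled = pixels / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```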

Since this is a multi-class classification problem with 10 classes, we will be using one-hot encoding for each class.
For example, the output for class 0 will be [1, 0, 0, 0, 0, 0, 0, 0, 0, 0].

In [0]:
number = 10
y_train = np_utils.to_categorical(y_train, number)
y_test = np_utils.to_categorical(y_test, number)
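To make the encoding concrete, here is a small NumPy sketch of one-hot encoding. The helper `one_hot` is a hypothetical name introduced for illustration; it is not the Keras implementation, only a stand-in that should produce the same kind of output for integer labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array of shape (len(labels), num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    # Set the column given by each label to 1 in the matching row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Three illustrative labels (not taken from the dataset).
example = one_hot([0, 3, 9], 10)
print(example[0])  # the row for class 0 has its 1 in the first position
```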

Now, we will be defining our model.

*(As our dataset contains images, we will be using Conv2D and MaxPooling2D layers)*

1. Convolution layer with 1024 filters, each of size 5x5, and activation function 'relu'. The expected input shape also needs to be passed as an argument since this is the first layer of the model.
2. Max Pooling layer. It down-samples its input by keeping only the largest value in each 2x2 window, which makes the features less sensitive to small shifts and helps to reduce overfitting. It also shrinks the feature maps, which reduces the computation and the number of parameters in later layers, and so the training time.
3. One more convolution layer with 512 filters, each of size 4x4, and activation function 'relu'.
4. One more Max Pooling layer.
5. A final convolution layer with 256 filters, each of size 3x3, and activation function 'relu'.
6. Final Max Pooling layer.
7. The next layer is a regularization layer called Dropout. It is configured to randomly exclude 30% of its inputs during training in order to reduce overfitting.
8. The next layer, called Flatten, converts the multi-dimensional feature maps to a vector. It allows the output to be processed by standard fully connected layers.
9. The next layer is a fully connected layer with 128 neurons and activation function 'relu'.
10. The next layer is another fully connected layer with 64 neurons and activation function 'relu'.
11. The last layer is the output layer with 10 neurons (the number of output classes), and it uses the softmax activation function. Each neuron gives the probability of one class. Softmax is used because this is a multi-class classification problem. Had it been a binary classification problem, we would have used the sigmoid activation function.
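The softmax activation mentioned in the last step can be sketched in plain Python. The input scores below are made up for illustration, and the helper `softmax` is a hand-written stand-in rather than the Keras implementation.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores for a three-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest score receives the largest probability, and the outputs always add up to 1, which is what lets each output neuron be read as a class probability.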


In [0]:
model = Sequential()
model.add(Conv2D(1024, (5, 5), input_shape = (X_train.shape[1], X_train.shape[2], 1), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(512, (4, 4), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dense(64, activation = 'relu'))
model.add(Dense(number, activation = 'softmax'))
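The spatial size of the feature maps in the model above can be checked by hand. This sketch assumes the Keras defaults that the code relies on: convolutions with stride 1 and no padding, and max pooling with a stride equal to the pool size. The helper names `conv_out` and `pool_out` are introduced here for illustration.

```python
def conv_out(size, kernel):
    """Output size of a convolution with stride 1 and no padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool

size = 28          # MNIST images are 28x28 pixels
trace = [size]
for kernel in (5, 4, 3):   # kernel sizes of the three convolution layers
    size = conv_out(size, kernel)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flattened = size * size * 256   # 256 filters in the last convolution layer
print(trace)       # [28, 24, 12, 9, 4, 2, 1]
print(flattened)   # 256
```

Under these assumptions the Flatten layer receives a 1x1x256 tensor, so the first fully connected layer sees a vector of 256 values.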

Now, we will be looking at the summary of our model.

In [25]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_16 (Conv2D)           (None, 24, 24, 1024)      26624     
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 12, 12, 1024)      0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 9, 9, 512)         8389120   
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 4, 4, 512)         0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 2, 2, 256)         1179904   
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 1, 1, 256)         0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 1, 1, 256)         0         
__________
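The Param # values of the three convolution layers in the summary above can be reproduced with a short calculation: each filter has one weight per kernel position and input channel, plus one bias. The helper `conv_params` is a name introduced for this sketch.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

counts = [
    conv_params(5, 1, 1024),     # first convolution: 1 input channel
    conv_params(4, 1024, 512),   # second convolution: 1024 input channels
    conv_params(3, 512, 256),    # third convolution: 512 input channels
]
print(counts)  # [26624, 8389120, 1179904]
```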

Next, we will be compiling our model using categorical cross-entropy as the loss function, since this is a multi-class classification problem.

The Adam optimizer is used to update the weights. I have tried other optimizers as well, but Adam gives the best results.

Accuracy will be the metric reported to judge the performance of our neural network. The weights themselves are updated to minimize the loss, not to maximize the accuracy directly.

In [0]:
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
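To give a feel for the loss being minimized, here is a plain-Python sketch of categorical cross-entropy for a single sample. The predicted probabilities are invented for illustration, and the small `eps` clamp is an assumption added to avoid taking the logarithm of zero.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1]  # the correct class is the third one

# A confident, correct prediction gives a small loss.
confident = categorical_crossentropy(y_true, [0.05, 0.05, 0.9])

# A prediction that puts little probability on the correct class gives a larger loss.
unsure = categorical_crossentropy(y_true, [0.4, 0.4, 0.2])

print(confident, unsure)
```

Because the target is one-hot, only the probability assigned to the correct class contributes to the loss.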

Now, we will be training our model.

The model will be trained for 10 epochs, updating its weights after every batch of 50 images.

The test data is used as the validation dataset.

In [27]:
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=50)

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f2276daa240>
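The number of weight updates implied by these settings is simple arithmetic, sketched below with the sample count, batch size and epoch count from the training run above.

```python
import math

train_samples = 60000   # from "Train on 60000 samples"
batch_size = 50
epochs = 10

# One weight update per batch; the last batch may be smaller, hence ceil.
updates_per_epoch = math.ceil(train_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)  # 1200 12000
```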

Now, we will be evaluating our trained model to check the training loss and accuracy as well as the test loss and accuracy.

In [28]:
metrics_train = model.evaluate(X_train, y_train, verbose=0)
print("Metrics(Train loss & Train Accuracy): ")
print(metrics_train)

metrics_test = model.evaluate(X_test, y_test, verbose=0)
print("Metrics(Test loss & Test Accuracy): ")
print(metrics_test)

Metrics(Train loss & Train Accuracy): 
[0.005230440739199806, 0.9985833333333334]
Metrics(Test loss & Test Accuracy): 
[0.022045926705585042, 0.9946]
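Each list printed above holds the loss first and the accuracy second. As a rough sketch, the accuracies can be converted into a count of misclassified images using the dataset sizes of 60000 training and 10000 test samples.

```python
# Values copied from the output above: [loss, accuracy].
train_loss, train_acc = [0.005230440739199806, 0.9985833333333334]
test_loss, test_acc = [0.022045926705585042, 0.9946]

# Number of images the model gets wrong in each set.
train_errors = round((1 - train_acc) * 60000)
test_errors = round((1 - test_acc) * 10000)

print(train_errors, test_errors)  # 85 54
```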
