# Building a Basic CNN: The MNIST Dataset

In this notebook, we will build simple fully connected neural networks on the MNIST dataset. The first classifies the 10 digits (0-9). The second adds a random number (0-9) to the recognised digit and predicts the sum. The objective of this notebook is to become familiar with the process of building neural networks in Keras.

We will go through the following steps:
1. Importing libraries and the dataset
2. Data preparation: adding the random-number input, building the targets, specifying the shape of the input data etc.
3. Building and understanding the network architecture
4. Fitting and evaluating the model

Let's dive in.

## 1. Importing Libraries and the Dataset

Let's load the required libraries. From Keras (`tensorflow.keras`), we need to import two main components:
1. `Sequential` from `tensorflow.keras`: `Sequential` is the Keras abstraction for creating models with a stack of layers (MLPs have multiple hidden layers, CNNs have convolutional layers, etc.). The models below are actually built with the functional API (`tf.keras.Input` and `tf.keras.Model`), which chains the layers one after another in the same way.
2. Various types of layers from `tensorflow.keras.layers`: these layers are added one after the other to the model.

Since we use the Keras bundled with TensorFlow (`tensorflow.keras`), TensorFlow is always the backend. Standalone Keras used to let you choose between TensorFlow and Theano, with the backend <a href="https://keras.io/backend/">specified in a JSON file</a>.


In [92]:
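# tensorflow-gpu is deprecated: the standard tensorflow package includes GPU support on supported platforms
!pip install tensorflow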



In [93]:
import numpy as np
import random
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

Let's load the MNIST dataset from `tensorflow.keras.datasets`. The download may take a few minutes.

In [94]:
# load the dataset into train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [95]:
x_train
y_train

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [96]:
x_train.shape

(60000, 28, 28)

In [97]:
x_train=x_train.reshape(x_train.shape[0],(x_train.shape[1]*x_train.shape[2]))
x_train.shape

(60000, 784)

In [98]:
x_test=x_test.reshape(x_test.shape[0],(x_test.shape[1]*x_test.shape[2]))
x_test.shape

(10000, 784)
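The two reshapes above flatten each 28 x 28 image into one row of 784 pixels. A minimal sketch on a small synthetic array (random stand-in data, not MNIST) shows that `reshape(n, -1)` gives the same layout and that the flattening is lossless, so the rows could be turned back into the `(n, 28, 28, 1)` layout a CNN would expect:

```python
import numpy as np

# synthetic stand-in for MNIST: 5 "images" of 28 x 28 uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# flatten each image into one row; -1 lets NumPy infer 28 * 28 = 784
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (5, 784)

# reshape back to the CNN-style layout (n, rows, cols, channels)
cnn_input = flat.reshape(-1, 28, 28, 1)
print(cnn_input.shape)  # (5, 28, 28, 1)
```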

In [99]:
x_rand_train = np.zeros((x_train.shape[0],1))
x_rand_train.shape

(60000, 1)

In [100]:
%%time
x_train_new = np.append(x_train,x_rand_train, axis=1)
for i in range(x_train_new.shape[0]):
  x_train_new[i][784] = np.random.randint(0,10)


CPU times: user 303 ms, sys: 116 ms, total: 419 ms
Wall time: 423 ms


In [101]:
x_train_new.shape

(60000, 785)

In [102]:
#%%time
#r1 = np.random.randint(0,10,(60000))
#x_train_new[:,-1]=r1
#x_train_new[:,-1]

In [103]:
x_train_new.shape

(60000, 785)

In [104]:
x_rand_test = np.zeros((x_test.shape[0],1))
x_rand_test.shape

(10000, 1)

In [105]:
x_test.shape

(10000, 784)

In [106]:
#Adding a new column with the random number (0-9) that should be added to the digit extracted from the image, for the test dataset
x_test_new = np.append(x_test,x_rand_test, axis=1)
for i in range(x_test_new.shape[0]):
  x_test_new[i][784] = np.random.randint(0,10)

In [107]:
#Building the targets for each training image: [digit, digit + random number]
y_train_new=[]
for i in range(y_train.shape[0]):
  summation = x_train_new[i][784] + y_train[i]
  y_train_new.append([y_train[i],summation])
y_train_new = np.array(y_train_new)
y_train_new.shape

(60000, 2)

In [108]:
#to verify the logic for one datapoint
y_train_new[0]

array([5., 5.])

In [109]:
#Building the targets for each test image: [digit, digit + random number]
y_test_new=[]
for i in range(y_test.shape[0]):
  summation = x_test_new[i][784] + y_test[i]
  y_test_new.append([y_test[i],summation])
y_test_new = np.array(y_test_new)
y_test_new.shape

(10000, 2)
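The Python loops above fill the random column and the targets one row at a time. A vectorized sketch of the same construction, using NumPy's `Generator.integers` instead of `np.random.randint` and small synthetic arrays in place of MNIST (so the values are illustrative only), could look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# stand-ins for the flattened images and their digit labels (synthetic, not MNIST)
x = rng.integers(0, 256, size=(6, 784), dtype=np.uint8)
y = rng.integers(0, 10, size=6)

# one random number (0-9) per image, drawn in a single call instead of a loop
rand_col = rng.integers(0, 10, size=(x.shape[0], 1))

# append it as the 785th column; cast to float64 as np.append with a float column does
x_new = np.hstack([x, rand_col]).astype(np.float64)

# targets: [digit, digit + random number], built column-wise
y_new = np.column_stack([y, y + rand_col[:, 0]]).astype(np.float64)

print(x_new.shape, y_new.shape)  # (6, 785) (6, 2)
```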

In [110]:
print("train data")
print(x_train_new.shape)
print(y_train_new.shape)
print("\n test data")
print(x_test_new.shape)
print(y_test_new.shape)

train data
(60000, 785)
(60000, 2)

 test data
(10000, 785)
(10000, 2)


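So we have 60,000 training and 10,000 test images. Each was originally a 28 x 28 grayscale image stored as a 2D array; each row now holds the 784 flattened pixels plus one extra column with the random number (0-9) to add to the digit, and each target row holds `[digit, digit + random number]`.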

Also, let's sample only 20k images for training (just to speed up the training a bit).

In [111]:
# sample only 20k images for training (without replacement, so no image is repeated)
idx = np.random.choice(x_train_new.shape[0], size=20000, replace=False) # 20k distinct indices from 0 to 59,999
x_train_new = x_train_new[idx, :]
y_train_new = y_train_new[idx,:]
print(x_train_new.shape)
print(y_train_new.shape)

(20000, 785)
(20000, 2)
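Drawing indices with `np.random.randint` samples with replacement, so some images can end up more than once in the 20k subset. A quick check on the index arrays alone (no image data needed) illustrates the difference between the two ways of sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
n_total, n_sample = 60000, 20000

# with replacement (like np.random.randint): some indices repeat
with_repl = rng.integers(0, n_total, size=n_sample)

# without replacement: every sampled index is distinct
without_repl = rng.choice(n_total, size=n_sample, replace=False)

# number of distinct images actually selected by each method
print(len(np.unique(with_repl)), len(np.unique(without_repl)))
```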


## 2. Data Preparation

Let's prepare the dataset for feeding to the network. We will do the following three main steps:

#### 2.1 Reshape the Data
First, let's understand the shape in which the network expects the training data.
The networks below use fully connected (`Dense`) layers, so each image is fed as a flat vector of 28 x 28 = 784 pixels: the pixel part of the training data is `x_train_new[:, :784]`, of shape `(20000, 784)`, and the 785th column holds the random number used in the second stage. A CNN would instead need the images in the shape `(20000, 28, 28, 1)`, or `(20000, 28, 28, 3)` if they were coloured.

Further, each of the 20,000 images has two labels: the digit (0-9), represented as a 10-d **one-hot encoded vector** of shape `(20000, 10)`, and the sum of the digit and the random number (0-18), represented as a 19-d one-hot vector of shape `(20000, 19)`.

The test arrays have the same shapes, except that they have 10,000 rows.
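To make the label encoding concrete, here is a NumPy sketch of one-hot encoding, a stand-in for `tf.keras.utils.to_categorical` (which the notebook uses), applied to both targets. The helper `one_hot` and the label values are hypothetical:

```python
import numpy as np

def one_hot(labels, num_classes):
    """NumPy stand-in for tf.keras.utils.to_categorical on integer-valued labels."""
    labels = np.asarray(labels, dtype=np.int64)
    # row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype=np.float32)[labels]

digits = np.array([0, 7, 9])
randoms = np.array([3, 9, 9])
sums = digits + randoms  # largest possible sum is 9 + 9 = 18, hence 19 classes (0-18)

print(one_hot(digits, 10).shape)      # (3, 10)
print(one_hot(sums, 19).shape)        # (3, 19)
print(one_hot(sums, 19)[2].argmax())  # 18
```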

#### 2.2 Rescaling (Normalisation)
The value of each pixel is between 0-255, so we will **rescale each pixel** by dividing by 255 so that the range becomes 0-1. Recollect <a href="https://stats.stackexchange.com/questions/185853/why-do-we-need-to-normalize-the-images-before-we-put-them-into-cnn">why normalisation is important for training NNs</a>.

#### 2.3 Converting Input Data Type: Int to Float
The pixels are originally stored as integers (`uint8`), and it is advisable, though not compulsory, to feed the data as `float`. You can read <a href="https://datascience.stackexchange.com/questions/13636/neural-network-data-type-conversion-float-from-int">why conversion from int to float is helpful here</a>.
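The rescaling and the type conversion interact: NumPy cannot divide an integer array by 255 in place, because the float result cannot be stored back into `uint8`. A small sketch with three made-up pixel values shows why the notebook converts first and rescales second:

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# in-place division on uint8 fails: the float result cannot be cast back to uint8
failed = False
try:
    tmp = pixels.copy()
    tmp /= 255
except TypeError as err:  # NumPy signals the failed cast with a TypeError subclass
    failed = True
    print("uint8 in-place division failed:", type(err).__name__)

# convert to float32 first, then rescale to the 0-1 range
scaled = pixels.astype('float32')
scaled /= 255
print(scaled)
```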


In [112]:
# specify input dimensions of each image
img_rows, img_cols = 28, 28
#input_shape = (img_rows, img_cols, 1)

# batch size, number of classes, epochs
batch_size = 128
num_classes_1 = 10
num_classes_2 = 19
epochs = 12

The images were already flattened at the start, so no reshape is needed here: each row of `x_train_new` holds the 784 pixel values followed by the random number, giving the shape `(20000, 785)`.

In [113]:
x_train_new.shape

(20000, 785)

In [114]:
y_train_new[0]

array([0., 3.])

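Next, let's split `y_train_new` into its two targets, the digit and the sum, and one-hot encode them with `tf.keras.utils.to_categorical`: the digit into 10 classes, giving `(20000, 10)`, and the sum into 19 classes (sums range from 0 to 18), giving `(20000, 19)`.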

In [115]:
# Getting two train outputs
y_train_new_1=[]
y_train_new_2=[]
for i in range(y_train_new.shape[0]):
  y_train_new_1.append(y_train_new[i][0])
  y_train_new_2.append(y_train_new[i][1])

print(y_train_new_1[2])
print(y_train_new_2[2])

0.0
0.0


In [116]:
# Getting two test outputs
y_test_new_1=[]
y_test_new_2=[]
for i in range(y_test_new.shape[0]):
  y_test_new_1.append(y_test_new[i][0])
  y_test_new_2.append(y_test_new[i][1])

print(y_test_new_1[1])
print(y_test_new_2[1])

2.0
4.0
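The two loops above copy each column of the `(n, 2)` target array into a separate list. Column slicing does the same in one step and returns NumPy arrays directly; a sketch on a small made-up array:

```python
import numpy as np

# rows of [digit, digit + random number], in the same layout as y_train_new (made-up values)
y_pairs = np.array([[5., 5.], [0., 3.], [2., 4.]])

# slicing replaces the two append loops
digit_labels = y_pairs[:, 0]
sum_labels = y_pairs[:, 1]
print(digit_labels, sum_labels)
```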


In [117]:
# convert class labels (from digits) to one-hot encoded vectors

y_train_new_1 = tf.keras.utils.to_categorical(y_train_new_1, num_classes_1)
y_train_new_2 = tf.keras.utils.to_categorical(y_train_new_2, num_classes_2)

#Printing examples
print(y_train_new_1[0])
print(y_train_new_2[0])

[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [118]:
y_test_new_1 = tf.keras.utils.to_categorical(y_test_new_1, num_classes_1)
y_test_new_2 = tf.keras.utils.to_categorical(y_test_new_2, num_classes_2)

Finally, let's convert `x_train_new` and `x_test_new` to `float32` and normalise them. Note that dividing the whole array by 255 also scales the random-number column, so its values 0-9 become 0 to 9/255.

In [119]:
# the raw MNIST pixels are uint8, but np.append with the float column above already made this float64
x_train_new.dtype

dtype('float64')

In [120]:
x_train_new= x_train_new.astype('float32')
x_train_new /= 255

In [121]:
x_test_new= x_test_new.astype('float32')
x_test_new /= 255

## 3. Building the Model

In [122]:
# Input Parameters
n_input = 784 # number of features
n_hidden_1 = 300
n_hidden_2 = 100
n_hidden_3 = 100
n_hidden_4 = 200
num_digits_1 = 10
num_digits_2 = 19

In [123]:
import tensorflow as tf
Inp = tf.keras.Input(shape=(784,), name="Inp")
x = Dense(n_hidden_1, activation='relu', name = "Hidden_Layer_1")(Inp)
x = Dense(n_hidden_2, activation='relu', name = "Hidden_Layer_2")(x)
x = Dense(n_hidden_3, activation='relu', name = "Hidden_Layer_3")(x)
x = Dense(n_hidden_4, activation='relu', name = "Hidden_Layer_4")(x)
output = Dense(num_digits_1, activation='softmax', name = "Output_Layer")(x)
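Each `Dense` layer computes `activation(x @ W + b)`. A NumPy sketch of the same forward pass through the 784-300-100-100-200-10 stack, with random untrained weights standing in for the learned ones (so the outputs are meaningless; only the shapes and operations match the Keras model):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # subtract the row max for numerical stability; each row then sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 300, 100, 100, 200, 10]  # input, 4 hidden layers, output

# random, untrained weights: this only illustrates the shapes and operations
weights = [rng.normal(0, 0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                        # Dense(..., activation='relu')
    return softmax(h @ weights[-1] + biases[-1])   # Dense(10, activation='softmax')

batch = rng.random((4, 784))  # 4 fake flattened images in [0, 1)
probs = forward(batch)
print(probs.shape)  # (4, 10)
```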

In [124]:
# Our model has 6 layers: 1 input layer, 4 hidden layers and 1 output layer
model = tf.keras.Model(Inp, output)
model.summary() 

Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Inp (InputLayer)             [(None, 784)]             0         
_________________________________________________________________
Hidden_Layer_1 (Dense)       (None, 300)               235500    
_________________________________________________________________
Hidden_Layer_2 (Dense)       (None, 100)               30100     
_________________________________________________________________
Hidden_Layer_3 (Dense)       (None, 100)               10100     
_________________________________________________________________
Hidden_Layer_4 (Dense)       (None, 200)               20200     
_________________________________________________________________
Output_Layer (Dense)         (None, 10)                2010      
Total params: 297,910
Trainable params: 297,910
Non-trainable params: 0
_____________________________________________________
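The `Param #` column follows from the layer sizes: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A short check reproduces the summary above, and the same rule gives the total for the smaller second model used later (11-300-100-19):

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

sizes = [784, 300, 100, 100, 200, 10]
per_layer = [dense_params(m, n) for m, n in zip(sizes[:-1], sizes[1:])]
print(per_layer)       # [235500, 30100, 10100, 20200, 2010]
print(sum(per_layer))  # 297910

sum_model_sizes = [11, 300, 100, 19]
print(sum(dense_params(m, n) for m, n in zip(sum_model_sizes[:-1], sum_model_sizes[1:])))  # 35619
```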

In [125]:
# Insert Hyperparameters
learning_rate = 0.01
training_epochs = 100
batch_size = 20
sgd = tf.keras.optimizers.SGD(learning_rate=learning_rate)


In [126]:
# We rely on the plain vanilla Stochastic Gradient Descent as our optimizing methodology
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,  # pass the configured optimizer; the string 'sgd' would ignore learning_rate
              metrics=['accuracy'])
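With `loss='categorical_crossentropy'`, each sample contributes `-sum(y_true * log(y_pred))`, which for a one-hot label is minus the log of the probability assigned to the true class. A simplified NumPy sketch (not Keras's own implementation) shows that a uniform guess over 10 classes costs ln(10) while a confident correct prediction costs almost nothing:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # mean over samples of -sum(y_true * log(y_pred)); clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.eye(10)[[3, 7]]                     # one-hot labels for digits 3 and 7
uniform = np.full((2, 10), 0.1)                 # uniform prediction over 10 classes
confident = np.eye(10)[[3, 7]] * 0.99 + 0.001   # 0.991 on the true class, 0.001 elsewhere

print(categorical_crossentropy(y_true, uniform))    # ln(10), about 2.303
print(categorical_crossentropy(y_true, confident))  # close to 0
```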

In [127]:
#history1 = model.fit(x_train_new, y_train_new_2,
#                     batch_size = batch_size,
#                     epochs = training_epochs,
#                     verbose = 2,
#                     validation_data=(x_test_new, y_test_new_2))

In [128]:
x_train_new.shape
x_train_new[:,:784].shape

(20000, 784)

In [129]:
x_test_new.shape
x_test_new[:,:784].shape
y_test_new_1.shape

(10000, 10)

In [130]:
history1 = model.fit(x_train_new[:,:784], y_train_new_1,
                     batch_size = batch_size,
                     epochs = training_epochs,
                     verbose = 2,
                     validation_data=(x_test_new[:,:784], y_test_new_1))

Epoch 1/100
1000/1000 - 4s - loss: 0.9960 - accuracy: 0.7217 - val_loss: 0.3634 - val_accuracy: 0.8980
Epoch 2/100
1000/1000 - 3s - loss: 0.3199 - accuracy: 0.9078 - val_loss: 0.2722 - val_accuracy: 0.9208
Epoch 3/100
1000/1000 - 3s - loss: 0.2366 - accuracy: 0.9311 - val_loss: 0.2436 - val_accuracy: 0.9313
Epoch 4/100
1000/1000 - 3s - loss: 0.1891 - accuracy: 0.9447 - val_loss: 0.2325 - val_accuracy: 0.9288
Epoch 5/100
1000/1000 - 3s - loss: 0.1539 - accuracy: 0.9541 - val_loss: 0.1943 - val_accuracy: 0.9450
Epoch 6/100
1000/1000 - 4s - loss: 0.1273 - accuracy: 0.9635 - val_loss: 0.1813 - val_accuracy: 0.9458
Epoch 7/100
1000/1000 - 3s - loss: 0.1072 - accuracy: 0.9683 - val_loss: 0.1632 - val_accuracy: 0.9524
Epoch 8/100
1000/1000 - 3s - loss: 0.0904 - accuracy: 0.9748 - val_loss: 0.1613 - val_accuracy: 0.9542
Epoch 9/100
1000/1000 - 3s - loss: 0.0771 - accuracy: 0.9790 - val_loss: 0.1541 - val_accuracy: 0.9562
Epoch 10/100
1000/1000 - 3s - loss: 0.0648 - accuracy: 0.9839 - val_loss:

In [131]:
y_train_predict = model.predict(x_train_new[:,:784])
y_train_predict.shape

(20000, 10)

In [132]:
y_train_predict[0]

array([9.99805510e-01, 1.09994566e-11, 1.16917169e-08, 7.05988490e-13,
       1.46751208e-10, 1.92718639e-04, 1.76096421e-06, 9.93966454e-12,
       2.22715429e-10, 2.57098336e-11], dtype=float32)

In [133]:
# scale the predicted probabilities by 1/255, the same factor applied to the random-number column
y_train_predict /= 255

In [134]:
y_train_predict[0]

array([3.92080611e-03, 4.31351223e-14, 4.58498690e-11, 2.76858225e-15,
       5.75494947e-13, 7.55759345e-07, 6.90574220e-09, 3.89790773e-14,
       8.73393859e-13, 1.00822874e-13], dtype=float32)

In [135]:
x_train_sum = np.zeros((y_train_predict.shape[0],y_train_predict.shape[1]+1))
x_train_sum.shape

(20000, 11)

In [136]:
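# note: the test inputs for the second model are built from the true one-hot labels (y_test_new_1),
# not from the first model's predictions, so the test accuracy below does not include digit-recognition errors
y_test_sum=y_test_new_1/255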
y_test_sum[0]

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.00392157, 0.        , 0.        ],
      dtype=float32)

In [137]:
x_test_sum = np.zeros((y_test_sum.shape[0],y_test_sum.shape[1]+1))
x_test_sum.shape

(10000, 11)

In [138]:
y_train_predict.shape

(20000, 10)

In [139]:
x_train_new[:,784:].shape

(20000, 1)

In [140]:
x_train_sum = np.append(y_train_predict,x_train_new[:,784:], axis=1)
x_train_sum.shape

(20000, 11)

In [141]:
y_test_new_1[0]

array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.], dtype=float32)

In [142]:
x_test_sum = np.append(y_test_sum,x_test_new[:,784:], axis=1)
x_test_sum.shape

(10000, 11)

In [143]:
x_test_sum[0]

array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.00392157, 0.        , 0.        ,
       0.        ], dtype=float32)

In [144]:
#verifying one of the outputs
x_train_sum[0]

array([3.92080611e-03, 4.31351223e-14, 4.58498690e-11, 2.76858225e-15,
       5.75494947e-13, 7.55759345e-07, 6.90574220e-09, 3.89790773e-14,
       8.73393859e-13, 1.00822874e-13, 1.17647061e-02], dtype=float32)
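In the cells above, the training inputs for the second model come from the digit model's predicted probabilities, while the test inputs come from the true one-hot labels (`y_test_new_1`). One way to build both sets identically is a shared helper applied to `model.predict(...)` outputs on each set. The helper `build_sum_input` and the random stand-in data below are hypothetical:

```python
import numpy as np

def build_sum_input(digit_probs, x_with_rand):
    """Hypothetical helper: stage-2 features = digit probabilities / 255 + scaled random number.

    digit_probs: (n, 10) softmax outputs of the digit model
    x_with_rand: (n, 785) normalised inputs whose last column is the random number / 255
    """
    return np.append(digit_probs / 255, x_with_rand[:, 784:], axis=1)

rng = np.random.default_rng(1)
n = 4
# stand-in for model.predict(x[:, :784]) on either the train or the test set
fake_probs = rng.dirichlet(np.ones(10), size=n)
fake_x = np.hstack([rng.random((n, 784)), rng.integers(0, 10, size=(n, 1)) / 255])

features = build_sum_input(fake_probs, fake_x)
print(features.shape)  # (4, 11)
```

In the notebook this would be called as, for example, `build_sum_input(model.predict(x_test_new[:, :784]), x_test_new)`, before `model` is reassigned to the second network.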

In [145]:
Inp = tf.keras.Input(shape=(11,), name="Inp")
x = Dense(n_hidden_1, activation='relu', name = "Hidden_Layer_1")(Inp)
x = Dense(n_hidden_2, activation='relu', name = "Hidden_Layer_2")(x)
#x = Dense(n_hidden_3, activation='relu', name = "Hidden_Layer_3")(x)
#x = Dense(n_hidden_4, activation='relu', name = "Hidden_Layer_4")(x)
output_sum = Dense(num_digits_2, activation='softmax', name = "output_sum")(x)

In [146]:
model = tf.keras.Model(Inp, output_sum)
model.summary() 

Model: "model_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Inp (InputLayer)             [(None, 11)]              0         
_________________________________________________________________
Hidden_Layer_1 (Dense)       (None, 300)               3600      
_________________________________________________________________
Hidden_Layer_2 (Dense)       (None, 100)               30100     
_________________________________________________________________
output_sum (Dense)           (None, 19)                1919      
Total params: 35,619
Trainable params: 35,619
Non-trainable params: 0
_________________________________________________________________


In [147]:
# Insert Hyperparameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
adam = tf.keras.optimizers.Adam(learning_rate=learning_rate)


In [148]:
model.compile(loss='categorical_crossentropy',
              optimizer=adam,  # use the Adam instance configured with learning_rate above
              metrics=['accuracy'])

In [149]:
history1 = model.fit(x_train_sum, y_train_new_2,
                     batch_size = batch_size,
                     epochs = training_epochs,
                     verbose = 2,
                     validation_data=(x_test_sum, y_test_new_2))

Epoch 1/15
200/200 - 1s - loss: 2.8168 - accuracy: 0.0999 - val_loss: 2.7846 - val_accuracy: 0.1003
Epoch 2/15
200/200 - 1s - loss: 2.7123 - accuracy: 0.0993 - val_loss: 2.5613 - val_accuracy: 0.1093
Epoch 3/15
200/200 - 1s - loss: 2.3774 - accuracy: 0.1535 - val_loss: 2.2288 - val_accuracy: 0.1732
Epoch 4/15
200/200 - 1s - loss: 2.1073 - accuracy: 0.2389 - val_loss: 1.9924 - val_accuracy: 0.2461
Epoch 5/15
200/200 - 1s - loss: 1.8917 - accuracy: 0.3338 - val_loss: 1.7911 - val_accuracy: 0.3473
Epoch 6/15
200/200 - 1s - loss: 1.7053 - accuracy: 0.4375 - val_loss: 1.6168 - val_accuracy: 0.5576
Epoch 7/15
200/200 - 1s - loss: 1.5436 - accuracy: 0.5602 - val_loss: 1.4673 - val_accuracy: 0.5749
Epoch 8/15
200/200 - 1s - loss: 1.4061 - accuracy: 0.6185 - val_loss: 1.3434 - val_accuracy: 0.6919
Epoch 9/15
200/200 - 1s - loss: 1.2842 - accuracy: 0.7334 - val_loss: 1.2284 - val_accuracy: 0.8296
Epoch 10/15
200/200 - 1s - loss: 1.1764 - accuracy: 0.7933 - val_loss: 1.1255 - val_accuracy: 0.8495

In [150]:
y_sum_predict = model.predict(x_test_sum)

In [151]:
y_sum_predict

array([[1.6711248e-19, 2.5723186e-15, 6.0204014e-10, ..., 3.8134782e-30,
        2.4433338e-36, 0.0000000e+00],
       [5.4183317e-08, 1.9327555e-05, 8.6119464e-03, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [1.2020197e-11, 1.6657062e-08, 7.1039300e-05, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       ...,
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 9.8409673e-06,
        1.8590388e-08, 7.4781382e-11],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.3156481e-03,
        1.9168036e-05, 1.9376029e-07],
       [9.7480702e-16, 5.4837558e-12, 2.2373673e-07, ..., 3.3311490e-36,
        0.0000000e+00, 0.0000000e+00]], dtype=float32)

In [153]:
y_test_new_2[0]

array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0.], dtype=float32)

In [154]:
import pandas as pd
df_y = pd.DataFrame({'actual': np.argmax(y_test_new_2, axis=-1),
                     'predicted': np.argmax(y_sum_predict, axis=-1)})
df_y

      actual  predicted
0          7          7
1          4          4
2          5          5
3          2          2
4         13         13
...      ...        ...
9995       5          5
9996       7          7
9997      13         13
9998      14         14
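The table shows only a few rows. To summarise the whole test set, the fraction of rows where `actual` equals `predicted` is the accuracy. A sketch on a small made-up frame with the same columns (in the notebook it would be applied to `df_y`):

```python
import pandas as pd

# made-up actual/predicted sums in the same layout as df_y
df = pd.DataFrame({'actual': [7, 4, 5, 2, 13, 9],
                   'predicted': [7, 4, 5, 3, 13, 9]})

# share of rows where the predicted sum matches the actual sum (5 of 6 here)
accuracy = (df['actual'] == df['predicted']).mean()
print(accuracy)

# rows where the prediction is wrong
print(df[df['actual'] != df['predicted']])
```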
