Here, we are going to use the MNIST database.

The <strong>MNIST database</strong>, short for Modified National Institute of Standards and Technology database, is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning.

The MNIST database contains 60,000 training images and 10,000 testing images of digits written by high school students and employees of the United States Census Bureau.

In [1]:
#import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras

In [2]:
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

When working with *convolutional neural networks* in particular, we will need additional packages.

In [3]:
# to add convolutional layers
from keras.layers import Conv2D

# to add pooling layers
from keras.layers import MaxPooling2D

# to flatten data for fully connected layers
from keras.layers import Flatten

***1. Convolution kernel*** is a small matrix of weights (a filter) that slides over an image to extract features from it.\
***2. keras.layers.Conv2D*** creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.\
***3. Tensor*** is an n-dimensional array, conceptually similar to a NumPy array. Deep learning frameworks such as TensorFlow and PyTorch add features like GPU execution and gradient tracking on top of it.\
***4. Downsampling*** means reducing the spatial size (height and width) of the feature maps, which lowers the number of parameters and computations in later layers and speeds up training. It also makes the output tolerant to small translational changes in the input.\
***5. keras.layers.MaxPooling2D*** downsamples the input along its spatial dimensions (height and width) by taking the maximum value over an input window.\
***6. Flattening*** converts the data into a 1-dimensional array so that it can be fed to the next (fully connected) layer.\
***7. keras.layers.Flatten*** flattens the input without affecting the batch size.
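To make the terms above concrete, here is a minimal NumPy sketch of a convolution, max pooling, and flattening on a toy 6x6 array. The helper names `conv2d_valid` and `max_pool2d` are illustrative and not part of Keras. They assume a single channel, no batch dimension, and no padding, so treat this as a teaching aid rather than how Keras implements these layers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like Conv2D, this computes a cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the window and the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each `size` x `size` window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = feature_map[r:r + size, c:c + size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2x2 kernel

features = conv2d_valid(image, kernel)   # 6x6 input, 2x2 kernel -> 5x5
pooled = max_pool2d(features)            # 5x5 input, 2x2 window, stride 2 -> 2x2
flat = pooled.flatten()                  # 2x2 -> 4 values in one dimension

print(features.shape, pooled.shape, flat.shape)
```

Each step shrinks the data: the convolution trims the border, pooling roughly halves each spatial dimension, and flattening removes the 2-D layout entirely.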

The Keras library conveniently includes the MNIST dataset as part of its API.\
So, let's load the MNIST dataset from the Keras library. The dataset is already divided into a training set and a test set.

In [4]:
# import the data
from keras.datasets import mnist

# load the data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [5]:
# reshape to be [samples][height][width][channels]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')
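
As a quick check of what this reshape does, the sketch below applies the same call to a small stand-in array instead of the real data. The array `batch` is a made-up placeholder with the shape and dtype that MNIST images have when loaded; the trailing axis of size 1 is the single grayscale channel that `Conv2D` expects.

```python
import numpy as np

# stand-in for 5 MNIST images: shape (samples, height, width), dtype uint8
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# same reshape as above: add a channel axis and convert to float32
batch_4d = batch.reshape(batch.shape[0], 28, 28, 1).astype('float32')

print(batch.shape, '->', batch_4d.shape, batch_4d.dtype)
```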

Now, let's one-hot encode the target variable, so that each digit label becomes a binary vector with one entry per category.

In [6]:
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

num_classes = y_test.shape[1] # number of categories


***1. to_categorical*** converts a NumPy array (or vector) of integer class labels into a binary matrix with one row per sample and one column per category. Each row contains a single 1, in the column of its class (one-hot encoding).
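
The encoding can be sketched in plain NumPy as follows. The function `one_hot` is a hypothetical helper written for illustration; it mirrors the idea behind `to_categorical` but is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (samples, num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    # put a 1 in the column given by each row's label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([5, 0, 4], num_classes=10)
print(example)
```

Taking `argmax` along each row recovers the original integer labels, which is how predictions from the softmax layer are usually turned back into digits.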

In [7]:
num_classes

10

Next, let's define a function that creates our model.


### Convolutional NN with one set of convolutional and pooling layers.

In [8]:
def convolutional_model():
    
    # create model
    model = Sequential()
    model.add(Conv2D(16, (5, 5), strides=(1, 1), activation='relu', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    
    # compile model
    model.compile(optimizer='adam', loss='categorical_crossentropy',  metrics=['accuracy'])
    return model

***1. Sequential()*** groups a linear stack of layers into a ***tf.keras.Model***.
It provides training and inference features on this model.\
***2. .add()*** method is used to add layers to our NN model.\
***3. Conv2D(16, (5, 5), strides=(1, 1), activation='relu', input_shape=(28, 28, 1)) :*** Inside Conv2D(),



$\;\;\;\;\;\;$**a.** *parameter* ***16*** is the number of *filters*, that is, the number of output channels (feature maps) produced by the convolution.\
$\;\;\;\;\;\;$**b.** *parameter* ***(5, 5)*** is the kernel size, an integer or tuple/list of 2 integers specifying the height and width of the 2D convolution window. It can be a single integer to specify the same value for all spatial dimensions.\
$\;\;\;\;\;\;$**c.** *parameter* ***strides = (1, 1)*** specifies the strides of the convolution along the height and width. The *stride* is the number of pixels by which the kernel shifts at each step; larger strides produce smaller outputs.\
$\;\;\;\;\;\;$**d.** *parameter* ***activation = 'relu'*** means that the *Rectified Linear Unit* will be used as the activation function.\
$\;\;\;\;\;\;$**e.** *parameter* ***input_shape = (28, 28, 1)*** is the shape of each input sample (height, width, channels), excluding the batch size. It is only needed when creating the first layer.

***4. MaxPooling2D(pool_size=(2, 2), strides=(2, 2)) :***  Inside MaxPooling2D(),

$\;\;\;\;\;\;$**a.** *parameter* ***pool_size = (2, 2)*** specifies the size of the pooling window.\
$\;\;\;\;\;\;$**b.** *parameter* ***strides = (2, 2)*** specifies how far the pooling window shifts in each dimension at every step.


***5. Dense()*** adds a regular densely (fully) connected neural network layer.\
***6. Dense(100, activation='relu') :*** Inside Dense(), \
$\;\;\;\;\;\;$**a.** *parameter* ***100*** denotes that this hidden layer will have 100 units.\
$\;\;\;\;\;\;$**b.** *parameter* ***activation = 'relu'*** means that the *Rectified Linear Unit* will be used as the activation function.
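
The parameters described above determine the shape of the data as it moves through the network. The following back-of-the-envelope sketch traces those shapes and counts the trainable parameters. It assumes that neither `Conv2D` nor `MaxPooling2D` pads its input (`padding='valid'`, which is the Keras default), and `window_output_size` is a hypothetical helper, not a Keras function.

```python
def window_output_size(size, window, stride=1):
    """Output length along one axis for an unpadded sliding window."""
    return (size - window) // stride + 1

num_classes = 10  # digits 0-9

# Conv2D(16, (5, 5), strides=(1, 1)) on a 28x28x1 input
conv_side = window_output_size(28, window=5, stride=1)
conv_shape = (conv_side, conv_side, 16)

# MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
pool_side = window_output_size(conv_side, window=2, stride=2)
pool_shape = (pool_side, pool_side, 16)

# Flatten
flat_units = pool_side * pool_side * 16

# trainable parameters: weights plus one bias per filter or unit
conv_params = 16 * (5 * 5 * 1 + 1)
dense_hidden_params = flat_units * 100 + 100
dense_output_params = 100 * num_classes + num_classes
total_params = conv_params + dense_hidden_params + dense_output_params

print("after Conv2D:      ", conv_shape)
print("after MaxPooling2D:", pool_shape)
print("after Flatten:     ", flat_units)
print("total parameters:  ", total_params)
```

If these assumptions hold, the numbers should agree with what `model.summary()` reports for this architecture. Note that almost all of the parameters sit in the first `Dense` layer rather than in the convolution.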

Finally, let's call the function to create the model, and then let's train it and evaluate it.


In [9]:
# build the model
model = convolutional_model()

# fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# evaluate the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: {} \n Error: {}".format(scores[1], 100-scores[1]*100))

Epoch 1/10
300/300 - 9s - loss: 0.6881 - accuracy: 0.9133 - val_loss: 0.1368 - val_accuracy: 0.9615 - 9s/epoch - 31ms/step
Epoch 2/10
300/300 - 8s - loss: 0.0911 - accuracy: 0.9742 - val_loss: 0.1112 - val_accuracy: 0.9684 - 8s/epoch - 28ms/step
Epoch 3/10
300/300 - 9s - loss: 0.0500 - accuracy: 0.9848 - val_loss: 0.0982 - val_accuracy: 0.9740 - 9s/epoch - 29ms/step
Epoch 4/10
300/300 - 10s - loss: 0.0322 - accuracy: 0.9899 - val_loss: 0.0983 - val_accuracy: 0.9763 - 10s/epoch - 32ms/step
Epoch 5/10
300/300 - 9s - loss: 0.0218 - accuracy: 0.9930 - val_loss: 0.0950 - val_accuracy: 0.9778 - 9s/epoch - 31ms/step
Epoch 6/10
300/300 - 9s - loss: 0.0149 - accuracy: 0.9948 - val_loss: 0.0938 - val_accuracy: 0.9801 - 9s/epoch - 31ms/step
Epoch 7/10
300/300 - 9s - loss: 0.0118 - accuracy: 0.9961 - val_loss: 0.0880 - val_accuracy: 0.9809 - 9s/epoch - 30ms/step
Epoch 8/10
300/300 - 9s - loss: 0.0159 - accuracy: 0.9945 - val_loss: 0.1180 - val_accuracy: 0.9777 - 9s/epoch - 31ms/step
Epoch 9/10
300
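
The `300/300` at the start of each log line is the number of batches processed per epoch. A short sketch of where that number comes from, using the training set size stated at the beginning and the `batch_size` passed to `model.fit` (the variable names here are illustrative):

```python
import math

num_train_samples = 60_000   # MNIST training images
batch_size = 200             # value passed to model.fit

# one step per batch; a final partial batch would still count as a step
steps_per_epoch = math.ceil(num_train_samples / batch_size)
print(steps_per_epoch)
```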

###############################################################################################################################

### Convolutional NN with two sets of convolution and pooling layers.

Let's create another convolutional model that has two sets of convolutional and pooling layers instead of just one.


In [10]:
def convolutional_model_two():
    
    # create model
    model = Sequential()
    model.add(Conv2D(16, (5, 5), activation='relu', input_shape=(28, 28, 1)))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    
    model.add(Conv2D(8, (2, 2), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    
    # Compile model
    model.compile(optimizer='adam', loss='categorical_crossentropy',  metrics=['accuracy'])
    return model
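
To see how the second convolution and pooling stage changes the data, the sketch below traces the spatial size through this model. As before, it assumes the Keras default of no padding for every layer, and `window_output_size` is a hypothetical helper rather than a Keras function.

```python
def window_output_size(size, window, stride=1):
    """Output length along one axis for an unpadded sliding window."""
    return (size - window) // stride + 1

side = 28
trace = [side]
# (window, stride) for Conv2D 5x5, MaxPooling 2x2, Conv2D 2x2, MaxPooling 2x2
for window, stride in [(5, 1), (2, 2), (2, 1), (2, 2)]:
    side = window_output_size(side, window, stride)
    trace.append(side)

flat_units = side * side * 8   # 8 filters in the second Conv2D

# trainable parameters: weights plus one bias per filter or unit
conv1_params = 16 * (5 * 5 * 1 + 1)
conv2_params = 8 * (2 * 2 * 16 + 1)
dense_hidden_params = flat_units * 100 + 100
dense_output_params = 100 * 10 + 10
total_params = (conv1_params + conv2_params
                + dense_hidden_params + dense_output_params)

print("spatial size per stage:", trace)
print("units after Flatten:   ", flat_units)
print("total parameters:      ", total_params)
```

Under these assumptions the flattened vector is much shorter than in the one-stage model, so the first `Dense` layer needs far fewer weights and the model as a whole is considerably smaller.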

Now, let's call the function to create our new convolutional neural network, and then let's train it and evaluate it.


In [11]:
# build the model
model_two = convolutional_model_two()

# fit the model
model_two.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

# evaluate the model
scores = model_two.evaluate(X_test, y_test, verbose=0)
print("Accuracy: {} \n Error: {}".format(scores[1], 100-scores[1]*100))

Epoch 1/10
300/300 - 11s - loss: 4.0288 - accuracy: 0.6773 - val_loss: 0.3713 - val_accuracy: 0.8973 - 11s/epoch - 38ms/step
Epoch 2/10
300/300 - 11s - loss: 0.2913 - accuracy: 0.9216 - val_loss: 0.2289 - val_accuracy: 0.9379 - 11s/epoch - 36ms/step
Epoch 3/10
300/300 - 11s - loss: 0.1893 - accuracy: 0.9473 - val_loss: 0.1668 - val_accuracy: 0.9545 - 11s/epoch - 36ms/step
Epoch 4/10
300/300 - 11s - loss: 0.1414 - accuracy: 0.9599 - val_loss: 0.1361 - val_accuracy: 0.9622 - 11s/epoch - 36ms/step
Epoch 5/10
300/300 - 11s - loss: 0.1150 - accuracy: 0.9671 - val_loss: 0.1145 - val_accuracy: 0.9655 - 11s/epoch - 36ms/step
Epoch 6/10
300/300 - 11s - loss: 0.0956 - accuracy: 0.9722 - val_loss: 0.1072 - val_accuracy: 0.9690 - 11s/epoch - 36ms/step
Epoch 7/10
300/300 - 11s - loss: 0.0830 - accuracy: 0.9759 - val_loss: 0.0963 - val_accuracy: 0.9737 - 11s/epoch - 37ms/step
Epoch 8/10
300/300 - 11s - loss: 0.0720 - accuracy: 0.9782 - val_loss: 0.0939 - val_accuracy: 0.9716 - 11s/epoch - 36ms/step
