#### MNIST Convolutional Neural Network Capstone Project
            Saurav Bakshi (Team 4)
            


**Task 1**

Implement a simple CNN with the following parameters fixed. 
The convolutional layer will have 32 filters (feature maps) and a 5x5 feature detector (kernel).

**Key hyperparameters**
1. Learning Rate = 0.01
2. Activation = **Sigmoid** for Neural Network layers and **Softmax** for output layer
3. Optimizer = Stochastic Gradient Descent (SGD)
4. Epochs = 5
5. Batch Size = 128
6. Metrics = Accuracy
7. Loss Function = Categorical Cross Entropy

Assumption - The dataset will be split into Train, Validation and Test sets. However, for Task 1 the validation data will not be used.

**Hypothesis 1 - A CNN model should still reach a high accuracy even with basic hyperparameters**

In [1]:
import numpy as np
import random as python_random
import os
import tensorflow as tf
print(tf.__version__)
os.environ["PYTHONHASHSEED"] = "0"
np.random.seed(123)
python_random.seed(123)
tf.random.set_seed(123)


import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from mlxtend.evaluate import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import auc, roc_auc_score, roc_curve, accuracy_score


from tensorflow.keras.datasets import mnist
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.models import Sequential




2.3.0


#### Data Capture and preprocessing

In [2]:
print("[INFO] Reading MNIST...")
((train_images, train_labels), (testX, testY)) = mnist.load_data()


[INFO] Reading MNIST...


In [3]:
class_types = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

In [4]:
# scale data to the range of [0, 1]
train_images = train_images.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

In [5]:
# hold out 20% of train_images as validation data (stratified by label)
(trainX, valX, trainY, valY) = train_test_split(train_images, train_labels,
                                                test_size=0.2, stratify=train_labels, random_state=42)

In [6]:
# reshape data for input to the convolutional models
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
valX = valX.reshape((valX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28 , 1))

In [7]:
# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
valY = lb.transform(valY)
testY = lb.transform(testY)
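
The label vectors produced above are one-hot rows, and the evaluation cells later recover the integer class with `argmax`. A minimal pure-Python sketch of that round trip follows; the helpers `one_hot` and `argmax` are hypothetical stand-ins for illustration, not the scikit-learn or NumPy implementations.

```python
def one_hot(labels, n_classes):
    """Turn integer labels into rows with a single 1 at the label's index."""
    return [[1 if c == y else 0 for c in range(n_classes)] for y in labels]


def argmax(row):
    """Index of the largest entry in a row (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


labels = [3, 0, 9]
encoded = one_hot(labels, n_classes=10)
decoded = [argmax(row) for row in encoded]

print(encoded[0])  # row for the digit 3
print(decoded)     # the original integer labels
```

The same `argmax` step applied to a softmax output row picks the class with the highest predicted probability, which is how predictions and labels are compared in the evaluation cells.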

In [8]:
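def build_task1_model(activation, n_classes):
    # Task 1: one convolutional layer followed by a softmax classifier
    model = Sequential()
    # 32 feature maps; note the kernel used here is 3x3, not the 5x5 named in the task text
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    # flatten the feature maps into a vector for the output layer
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model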

In [9]:
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_epochs = 5
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"

In [10]:
print(trainX.shape, trainY.shape)

(48000, 28, 28, 1) (48000, 10)


In [11]:
%%timeit
t1_model = build_task1_model(activation, n_classes)
t1_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)
H = t1_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=128)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
5.58 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [12]:
t1_model = build_task1_model(activation, n_classes)
t1_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)
H = t1_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=128)
t1_model.summary()

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
activation_16 (Activation)   (None, 26, 26, 32)        0         
_________________________________________________________________
flatten_8 (Flatten)          (None, 21632)             0         
_________________________________________________________________
dense_8 (Dense)              (None, 10)                216330    
_________________________________________________________________
activation_17 (Activation)   (None, 10)                0         
Total params: 216,650
Trainable params: 216,650
Non-trainable params: 0
_________________________________________________________________
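
The shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes a stride-1 convolution without padding (the Keras `Conv2D` defaults); the helper names are hypothetical.

```python
def conv2d_output_size(input_size, kernel_size):
    """Spatial size after a stride-1 convolution without padding."""
    return input_size - kernel_size + 1


def conv2d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units


# Values for the 3x3 kernel used in build_task1_model
side = conv2d_output_size(28, 3)   # 26
conv = conv2d_params(3, 1, 32)     # 320
flat = side * side * 32            # 21632
dense = dense_params(flat, 10)     # 216330
total = conv + dense               # 216650
print(side, conv, flat, dense, total)

# For comparison, the 5x5 kernel named in the task description
side5 = conv2d_output_size(28, 5)  # 24
conv5 = conv2d_params(5, 1, 32)    # 832
print(side5, conv5)
```

The 26x26 output and the 320 convolution parameters match a 3x3 kernel; a 5x5 kernel would give a 24x24 output and 832 parameters.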


In [13]:
# evaluate the network
predictions = t1_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.8912


#### Task 1 Findings

1. Model Accuracy is 89.12%
2. The average time taken to build and train the model on a non-GPU PC is 5.58 s on MNIST data (mean of 7 runs).

### Task 2
- Increase the complexity of the CNN by adding multiple convolution and dense layers. 
- Add one more convolutional layer with 32 neurons (feature maps) and a 5x5 feature detector. 
- Add a dense layer with 128 nodes.

**Key hyperparameters**
1. Learning Rate = 0.01
2. Activation = **Sigmoid** for Neural Network layers and **Softmax** for output layer
3. Optimizer = Stochastic Gradient Descent (SGD)
4. Epochs = 5
5. Batch Size = 128
6. Metrics = Accuracy
7. Loss Function = Categorical Cross Entropy

Assumption - The dataset will be split into Train, Validation and Test sets. As in Task 1, the validation data will not be used for Task 2.

**Hypothesis 2 - When we increase the complexity of the Convolutional Neural Network, the time to train increases but the accuracy improves.**

In [15]:
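def build_task2_model(activation, n_classes):
    # Task 2: two convolutional layers and a 128-unit dense layer
    model = Sequential()
    # both kernels are 3x3 here, not the 5x5 named in the task text
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation(activation))
    # Dense comes before Flatten, so it acts on the 32 channels at each
    # spatial position and outputs (24, 24, 128), as the summary below shows
    model.add(Dense(128))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model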

In [16]:
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_epochs = 5
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"

In [17]:
t2_model = build_task2_model(activation, n_classes)

In [18]:
t2_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)

In [20]:
%%timeit
H = t2_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
16.4 s ± 69.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
H = t2_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=128)
t2_model.summary()

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_9 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
activation_18 (Activation)   (None, 26, 26, 32)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 24, 24, 32)        9248      
_________________________________________________________________
activation_19 (Activation)   (None, 24, 24, 32)        0         
_________________________________________________________________
dense_9 (Dense)              (None, 24, 24, 128)       4224      
_________________________________________________________________
activation_20 (Activation)   (None, 24, 24, 128)       0         
_________________________________________________________________
flat
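
The `dense_9` row in the summary above has only 4,224 parameters because the layer sits before `Flatten` and is therefore applied to the 32 channels at each of the 24x24 positions. The arithmetic below is a sketch comparing that layout with the alternative of flattening first; `dense_params` is a hypothetical helper, and the flatten-first figure is a calculation, not a result from this notebook.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units


height = width = 24
channels = 32

# Layout used in build_task2_model: Dense(128) applied per spatial position
per_position = dense_params(channels, 128)       # 4224, as in the summary
flat_after_dense = height * width * 128          # 73728 values after Flatten
output_layer = dense_params(flat_after_dense, 10)

# Alternative layout: Flatten first, then Dense(128) on the whole feature vector
flatten_first = dense_params(height * width * channels, 128)
output_after = dense_params(128, 10)

print(per_position, output_layer)
print(flatten_first, output_after)
```

In the layout used here most of the weights end up in the 10-unit output layer, whereas flattening first would put them in the 128-unit dense layer.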

In [22]:
# evaluate the network
predictions = t2_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.9226


#### Task 2 Findings

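1. Model Accuracy is 92.26%, up from 89.12% in Task 1, which is consistent with the hypothesis. Note that `t2_model` was built once and then fitted repeatedly by the `%%timeit` cell (8 passes of 5 epochs) before the 5 epochs in cell 21, so this score reflects far more than 5 epochs of training and is not a like-for-like comparison with Task 1.
2. The average time taken to train the model is 16.4 s on MNIST data, roughly 3 times the Task 1 time of 5.58 s.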

#### Task 3 

#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H1 - Can increasing the number of epochs increase accuracy?**

In [23]:
# Hyperparameters for Hypothesis T3_H1
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
# n_epochs = 5 initial number of epochs
# all other hyperparameters remaining the same , number of epochs is increased to 10
n_epochs = 10

In [24]:
# train the task 1 model; fit() continues from the weights learned in the earlier cells
H = t1_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [25]:
# evaluate the network
predictions = t1_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.913


In [26]:
# train the task 2 model; fit() continues from the weights learned in the earlier cells
H = t2_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [27]:
# evaluate the network
predictions = t2_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.9253


Number of additional epochs = 10 (training continued from the existing weights)

Task 1 model Accuracy : 91.30%

Task 2 model Accuracy : 92.53%

#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H2 - Can reducing the batch size increase accuracy?**

In [28]:
# Hyperparameters for Hypothesis T3_H2 - switching the number of epochs back to 5
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
n_epochs = 5

Reducing the batch size to 64

In [29]:
# train the task 1 model

H = t1_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [30]:
# evaluate the network
predictions = t1_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.9111


In [31]:
# train the task 2 model

H = t2_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [32]:
# evaluate the network
predictions = t2_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.9212


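**Reducing the batch size shows some improvement in accuracy for the Task 1 model, but not for the Task 2 model.**


Task 1 model (Baseline) Accuracy - 89.12%
Task 1 model Accuracy after reducing the batch size to 64 - 91.11%

Task 2 model (Baseline) Accuracy - 92.26%
Task 2 model Accuracy after reducing the batch size to 64 - 92.12%

The baselines are the test accuracies recorded in Task 1 and Task 2 above. Both models continued training from their earlier weights, so the changes cannot be attributed to batch size alone.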
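
Part of the effect of a smaller batch size is simply that SGD performs more weight updates per epoch. A small sketch of that count, using the 48,000 training images from the split above (`steps_per_epoch` is a hypothetical helper):

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of SGD weight updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)


n_train = 48000
n_epochs = 5

for batch_size in (128, 64):
    steps = steps_per_epoch(n_train, batch_size)
    print(f"batch_size={batch_size}: {steps} updates/epoch, "
          f"{steps * n_epochs} updates in {n_epochs} epochs")
```

Halving the batch size doubles the number of updates for the same number of epochs, so a comparison at equal epochs also changes how many gradient steps the model receives.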

Q2 - Using only convolutional layers, will hyperparameter optimization help increase the accuracy?
- Number of layers
- Number of nodes
- learning rate


#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H3 - Can increasing the number of layers increase accuracy?**

In [33]:
def build_task3h3_model(activation, n_classes):
    # Task 3, hypothesis T3_H3: Task 2 architecture plus a third Conv2D layer
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation(activation))
    #adding one more conv2d layer
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation(activation))
    model.add(Dense(128))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model

In [35]:
# Hyperparameters for Hypothesis T3_H3
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
n_epochs = 5

In [37]:
t3h3_model = build_task3h3_model(activation, n_classes)
t3h3_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)

In [39]:
# train the T3_H3 model

H = t3h3_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=batch_size)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [40]:
# evaluate the network
predictions = t3h3_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.0958


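Hypothesis T3_H3 is not supported.
Adding one more Conv2D layer dropped the test accuracy to 9.58%, which is chance level for 10 classes. This suggests the network failed to train within 5 epochs rather than merely performing worse. It remains to be tested whether adding pooling layers would improve the accuracy.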
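
An accuracy of 0.0958 is what a classifier scores when it outputs the same digit for every test image. The sketch below checks this under the assumption that the standard MNIST test-set class counts apply (they are hard-coded here rather than computed from `testY`); `matching_constant_class` is a hypothetical helper.

```python
# Assumed number of test images per digit 0-9 in the standard MNIST test set
MNIST_TEST_COUNTS = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]


def constant_prediction_accuracy(counts):
    """Accuracy obtained by always predicting each digit in turn."""
    total = sum(counts)
    return [count / total for count in counts]


def matching_constant_class(accuracy, counts=MNIST_TEST_COUNTS):
    """Digits whose always-predict accuracy equals `accuracy` to 4 decimals."""
    shares = constant_prediction_accuracy(counts)
    return [d for d, s in enumerate(shares) if round(s, 4) == round(accuracy, 4)]


print(sum(MNIST_TEST_COUNTS))           # size of the test set
print(matching_constant_class(0.0958))  # digit(s) that would give this score
```

In the notebook itself, `np.bincount(predictions.argmax(axis=1), minlength=10)` would show directly how the predictions are distributed across the ten classes.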

#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H4 - Increasing the number of nodes in Conv2D can increase accuracy?**

In [41]:
def build_task3h4_model(activation, n_classes):
    # Task 3, hypothesis T3_H4: Task 2 architecture with 64 filters in the second Conv2D
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    #increasing the number of nodes to 64
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation(activation))
    model.add(Dense(128))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model

In [42]:
# Hyperparameters for Hypothesis T3_H4
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
n_epochs = 5

In [43]:
t3h4_model = build_task3h4_model(activation, n_classes)
t3h4_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)

In [44]:
# train the task 3 model

H = t3h4_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=batch_size)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [51]:
# evaluate the network
predictions = t3h4_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.0958


Hypothesis T3_H4 is not supported.
Increasing the number of filters in the second Conv2D layer to 64 gave a test accuracy of 9.58%, again chance level for 10 classes, so the network did not learn within 5 epochs. It remains to be tested whether adding pooling layers or training for more epochs would improve the accuracy.

#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H5 - Increasing the number of nodes in Dense can increase accuracy?**

In [46]:
def build_task3h5_model(activation, n_classes):
    # Task 3, hypothesis T3_H5: Task 2 architecture with 256 units in the Dense layer
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation(activation))
    #increasing the number of nodes to 256
    model.add(Dense(256))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model

In [48]:
# Hyperparameters for Hypothesis T3_H5
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
n_epochs = 5

In [49]:
t3h5_model = build_task3h5_model(activation, n_classes)
t3h5_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)

In [50]:
H = t3h5_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=batch_size)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [52]:
# evaluate the network
predictions = t3h5_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.0958


Hypothesis T3_H5 is not supported.
Increasing the Dense layer from 128 to 256 nodes gave a test accuracy of 9.58%, which is chance level for 10 classes. It remains to be tested whether adding pooling layers would improve the accuracy, or whether the model simply needs more epochs.

#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H6 - Reducing the learning rate can improve accuracy?**

In [53]:
def build_task3h6_model(activation, n_classes):
    # Task 3, hypothesis T3_H6: same architecture as Task 2 (only the learning rate changes)
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation(activation))
    model.add(Dense(128))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model

In [54]:
# Hyperparameters for Hypothesis T3_H6
activation = "sigmoid"
# reducing the learning rate to 0.001
learning_rate = 0.001
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
n_epochs = 5

In [55]:
t3h6_model = build_task3h6_model(activation, n_classes)
t3h6_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)

In [56]:
H = t3h6_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=batch_size)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [57]:
# evaluate the network
predictions = t3h6_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.0958


Hypothesis T3_H6 is not supported.
Reducing the learning rate to 0.001 gave a test accuracy of 9.58%, which is chance level for 10 classes. It remains to be tested whether adding pooling layers would improve the accuracy, or whether the model simply needs more epochs at this learning rate.
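
One possible contributor to the stalled training (an assumption, not something measured in this notebook) is the sigmoid activation itself: its derivative never exceeds 0.25, so the gradient signal can shrink at each of the three stacked sigmoid layers, and a smaller learning rate shrinks each update further. The sketch below shows only this activation-derivative bound and ignores the weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid, s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


max_grad = sigmoid_grad(0.0)   # the derivative peaks at x = 0
n_sigmoid_layers = 3           # sigmoid activations in the Task 2 style models
bound = max_grad ** n_sigmoid_layers

print(f"max sigmoid derivative: {max_grad}")
print(f"upper bound on the activation factor through {n_sigmoid_layers} layers: {bound}")
for learning_rate in (0.1, 0.01, 0.001):
    print(f"lr={learning_rate}: step scale at most {learning_rate * bound:.6f}")
```

If this effect matters here, swapping in a non-saturating activation or training for more epochs would be natural follow-up experiments.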

#### Improving the models built in Task 1 and Task 2
**Hypothesis T3_H7 - Increasing the learning rate can improve accuracy?**

In [58]:
def build_task3h7_model(activation, n_classes):
    # Task 3, hypothesis T3_H7: same architecture as Task 2 (only the learning rate changes)
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation(activation))
    model.add(Dense(128))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model


In [59]:
# Hyperparameters for Hypothesis T3_H7
activation = "sigmoid"
# Increasing the learning rate to 0.1
learning_rate = 0.1
opt = SGD(learning_rate)
batch_size = 128
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"
n_epochs = 5

In [60]:
t3h7_model = build_task3h7_model(activation, n_classes)
t3h7_model.compile(loss=loss, optimizer=opt,
            metrics=metrics)

In [61]:
H = t3h7_model.fit(trainX, trainY,
              epochs=n_epochs, batch_size=batch_size)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [62]:
# evaluate the network
predictions = t3h7_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.1135


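Hypothesis T3_H7 is not supported.
Increasing the learning rate to 0.1 gave a test accuracy of 11.35%, which is still chance level: it equals the share of the digit 1 in the MNIST test set (1,135 of 10,000 images). It remains to be tested whether adding pooling layers would improve the accuracy, or whether the model simply needs more epochs.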