#### MNIST Convolutional Neural Network Capstone Project
Saurav Bakshi (Team 4)


**Task 1**

Implement a simple CNN whose convolutional layer has 32 filters (feature maps) and a 5x5 feature detector (kernel), trained with the fixed hyperparameters below.
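The kernel size determines both the size of the feature maps and the number of weights. With stride 1 and no padding ("valid"), which are the Keras `Conv2D` defaults, a kxk detector shrinks a 28x28 image to (28 - k + 1)x(28 - k + 1). Each filter has k*k*c_in weights plus one bias. The helper below, `conv2d_stats`, is a hypothetical name and not part of Keras. It is a minimal sketch of those two formulas, assuming square kernels and stride 1.

```python
def conv2d_stats(in_size, in_channels, kernel, filters):
    """Output side length and trainable parameter count of a stride-1,
    unpadded ('valid') square convolution, the Keras Conv2D defaults."""
    out_size = in_size - kernel + 1                          # no padding, stride 1
    params = (kernel * kernel * in_channels + 1) * filters   # weights + one bias per filter
    return out_size, params


for k in (3, 5):
    side, n_params = conv2d_stats(28, 1, k, 32)
    print(f"{k}x{k}: output {side}x{side}x32, {n_params} parameters")
# 3x3: output 26x26x32, 320 parameters
# 5x5: output 24x24x32, 832 parameters
```

The 3x3 row matches the `conv2d` line of the Task 1 model summary: (None, 26, 26, 32) with 320 parameters.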

**Key hyperparameters**
1. Learning Rate = 0.01
2. Activation = **Sigmoid** for the hidden layers and **Softmax** for the output layer
3. Optimizer = Stochastic Gradient Descent (SGD)
4. Epochs = 5
5. Batch Size = 128
6. Metrics = Accuracy
7. Loss Function = Categorical Cross Entropy
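The output activation and the loss in this list work as a pair. Softmax turns the 10 output scores into probabilities that sum to 1. Categorical cross-entropy is the negative log of the probability assigned to the true class, averaged over the batch. The minimal NumPy sketch below shows both. It uses made-up logits for two samples and three classes for brevity; the values are illustrative and do not come from the model.

```python
import numpy as np


def softmax(logits):
    # subtracting the row maximum avoids overflow and does not change the result
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # y_true is one-hot; clipping keeps log() away from 0
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])   # 2 samples, 3 classes (illustrative)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
probs = softmax(logits)
print(probs.sum(axis=1))                        # each row sums to 1
print(categorical_crossentropy(y_true, probs))  # mean of -log(p_true_class)
```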

Assumption - The dataset is split into Train, Validation and Test sets; however, the validation set is not used in Task 1.

**Hypothesis 1 - A CNN should still achieve high accuracy even with basic hyperparameters**

In [None]:
import numpy as np
import random as python_random
import os
import tensorflow as tf
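print(tf.__version__)

# fix the random seeds for reproducibility; PYTHONHASHSEED set here only affects
# Python processes started afterwards, not the hash seed of this running kernel
os.environ["PYTHONHASHSEED"] = "0"
np.random.seed(123)
python_random.seed(123)
tf.random.set_seed(123)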


import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from mlxtend.evaluate import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import auc, roc_auc_score, roc_curve, accuracy_score


from tensorflow.keras.datasets import mnist
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.models import Sequential




#### Data Capture and preprocessing

In [8]:
print("[INFO] Reading MNIST...")
((train_images, train_labels), (testX, testY)) = mnist.load_data()


[INFO] Reading MNIST...


In [9]:
class_types = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

In [10]:
# scale data to the range of [0, 1]
train_images = train_images.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

In [11]:
# hold out 20% of the training images as a validation set, keeping digit proportions (stratify)
(trainX, valX, trainY, valY) = train_test_split(train_images, train_labels,
                                                test_size=0.2, stratify=train_labels, random_state=42)
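With `test_size=0.2`, the 60,000 MNIST training images become 48,000 training and 12,000 validation images, and the 10,000 test images are untouched. `stratify=train_labels` makes `train_test_split` keep each digit's share of the data the same in both parts. The small sketch below uses synthetic, deliberately imbalanced labels rather than MNIST to show the class proportions surviving the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic labels: 700 of class 0 and 300 of class 1 (illustrative only)
labels = np.array([0] * 700 + [1] * 300)
features = np.arange(len(labels)).reshape(-1, 1)

X_tr, X_val, y_tr, y_val = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(y_tr), len(y_val))                    # 800 200
# class counts keep the 70/30 ratio in both parts: 560/240 and 140/60
print(np.bincount(y_tr), np.bincount(y_val))
```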

In [21]:
# reshape data for input to the convolutional models
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
valX = valX.reshape((valX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))

In [13]:
# convert the labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
valY = lb.transform(valY)
testY = lb.transform(testY)
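`LabelBinarizer` is fitted on the training labels only and then reused for the validation and test labels. As a result, all three share the same column order, with column i standing for digit i. `argmax` along axis 1 recovers the integer label, which is how the evaluation cells compare predictions with `testY`. The sketch below uses a handful of hand-picked labels. The `demo_` prefix is only there so the example does not overwrite the notebook's own `lb` or label variables.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

demo_lb = LabelBinarizer()
demo_labels = np.array([3, 0, 9, 1, 2, 4, 5, 6, 7, 8])   # all ten digits present
demo_one_hot = demo_lb.fit_transform(demo_labels)

print(demo_lb.classes_)   # column order learned from the fitted labels: digits 0..9
print(demo_one_hot[0])    # digit 3: a 1 in column 3, zeros elsewhere

# transform() reuses the fitted classes, so later batches get the same columns
demo_test = demo_lb.transform(np.array([7, 7, 2]))
print(demo_test.argmax(axis=1))   # argmax recovers the integer labels
```

With only two classes, `LabelBinarizer` returns a single 0/1 column instead of two. This does not affect the 10-class MNIST case.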

In [14]:
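def build_task1_model(activation, n_classes):
    # Task 1: one convolutional layer, flattened into a softmax output layer.
    # NOTE: the task specifies a 5x5 feature detector, but this cell uses 3x3;
    # the summary and results recorded below (320 Conv2D parameters = 3*3*1*32 + 32)
    # come from this 3x3 version
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Flatten())
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model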

In [15]:
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate=learning_rate)
batch_size = 128
n_epochs = 5
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"

In [16]:
print(trainX.shape, trainY.shape)

(48000, 28, 28, 1) (48000, 10)


In [25]:
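%%timeit
# %%timeit repeats this cell body (7 runs x 2 loops in the output below) and does not
# keep the variables it assigns; a fresh model and optimizer are built on every loop
# so each loop times 5 epochs of training from scratch
t1_model = build_task1_model(activation, n_classes)
t1_model.compile(loss=loss, optimizer=SGD(learning_rate=learning_rate),
                 metrics=metrics)
H = t1_model.fit(trainX, trainY,
                 epochs=n_epochs, batch_size=batch_size)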


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
30 s ± 152 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)


In [18]:
# variables assigned inside the %%timeit cell are not kept, so rebuild t1_model here
t1_model = build_task1_model(activation, n_classes)
t1_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
activation (Activation)      (None, 26, 26, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 21632)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                216330    
_________________________________________________________________
activation_1 (Activation)    (None, 10)                0         
Total params: 216,650
Trainable params: 216,650
Non-trainable params: 0
_________________________________________________________________


In [19]:
# train the rebuilt model once with the Task 1 settings, then evaluate it on the test set
t1_model.compile(loss=loss, optimizer=SGD(learning_rate=learning_rate),
                 metrics=metrics)
t1_model.fit(trainX, trainY, epochs=n_epochs, batch_size=batch_size, verbose=0)
predictions = t1_model.predict(testX, batch_size=batch_size)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

Model Accuracy 0.8913


#### Task 1 Findings

1. Test accuracy is 89.13%.
2. Training for 5 epochs on the 48,000-image training split takes about 30 s on a CPU-only (no GPU) PC (30 s ± 152 ms per run, over 7 runs of 2 loops each).
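The 30 s figure is the mean time per loop that `%%timeit` reported over 7 runs of 2 loops each. The standard-library `timeit.repeat` can produce a comparable summary outside IPython. The sketch below assumes a hypothetical `train_once` function that stands in for one 5-epoch `fit` call; here it only sleeps briefly. The sketch divides each run's total by the loop count and then takes the mean and the population standard deviation. Treat this as an approximation of IPython's summary, not its exact implementation.

```python
import statistics
import time
import timeit


def train_once():
    # hypothetical stand-in for one t1_model.fit(...) call; replace with real training
    time.sleep(0.01)


runs, loops = 7, 2
# each entry is the total time of `loops` consecutive calls
totals = timeit.repeat(train_once, repeat=runs, number=loops)
per_loop = [t / loops for t in totals]

mean = statistics.mean(per_loop)
std = statistics.pstdev(per_loop)
print(f"{mean:.3f} s ± {std * 1000:.0f} ms per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```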

### Task 2
- Increase the complexity of the CNN by adding more convolutional and dense layers.
- Add one more convolutional layer with 32 filters (feature maps) and a 5x5 feature detector.
- Add a dense layer with 128 nodes.
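Each addition in Task 2 enlarges the model, and a larger model is the mechanism behind the training-time half of Hypothesis 2. The tally below assumes the following stack:

- Both convolutional layers use the 5x5, 32-filter setting from the task description.
- The convolutions use stride 1 and no padding (the Keras defaults).
- The 128-node dense layer comes after flattening.

Under those assumptions, the trainable parameters can be counted by hand. The layer list is an illustration of that assumed stack, not output from Keras.

```python
def conv_params(kernel, c_in, filters):
    # kernel x kernel x c_in weights plus one bias, for each filter
    return (kernel * kernel * c_in + 1) * filters


def dense_params(n_in, n_out):
    # one weight per input-output pair plus one bias per output
    return (n_in + 1) * n_out


side = 28
layers = []
side -= 5 - 1                                   # first 5x5 valid conv: 28 -> 24
layers.append(("conv 5x5, 32", conv_params(5, 1, 32)))
side -= 5 - 1                                   # second 5x5 valid conv: 24 -> 20
layers.append(("conv 5x5, 32", conv_params(5, 32, 32)))
flat = side * side * 32                         # Flatten: 20 * 20 * 32 features
layers.append(("dense 128", dense_params(flat, 128)))
layers.append(("dense 10 (softmax)", dense_params(128, 10)))

for name, n in layers:
    print(f"{name:<20}{n:>10,}")
print(f"{'total':<20}{sum(n for _, n in layers):>10,}")
```

Under these assumptions, the total is 1,666,282 parameters, compared with 216,650 in the Task 1 summary. Almost all of the increase comes from the dense layer that follows the flattened 12,800 features.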

**Key hyperparameters**
1. Learning Rate = 0.01
2. Activation = **Sigmoid** for the hidden layers and **Softmax** for the output layer
3. Optimizer = Stochastic Gradient Descent (SGD)
4. Epochs = 5
5. Batch Size = 128
6. Metrics = Accuracy
7. Loss Function = Categorical Cross Entropy

Assumption - The dataset is split into Train, Validation and Test sets; however, the validation set is not used in Task 2.

**Hypothesis 2 - When we increase the complexity of the Convolutional Neural Network, the time to train increases but the accuracy improves.**

In [26]:
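def build_task2_model(activation, n_classes):
    # Task 2: two 5x5 convolutional layers with 32 filters each, a 128-node dense
    # layer and a softmax output layer
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1)))
    model.add(Activation(activation))
    model.add(Conv2D(32, (5, 5)))
    model.add(Activation(activation))
    # flatten before the dense layer; applied to the 4-D feature maps, Dense(128)
    # would act on each pixel's 32 channels separately instead of on all features
    model.add(Flatten())
    model.add(Dense(128))
    model.add(Activation(activation))
    model.add(Dense(n_classes))
    model.add(Activation("softmax"))
    return model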

In [27]:
activation = "sigmoid"
learning_rate = 0.01
opt = SGD(learning_rate=learning_rate)
batch_size = 128
n_epochs = 5
n_classes = len(class_types)
metrics = ["accuracy"]
loss = "categorical_crossentropy"

In [28]:
t2_model = build_task2_model(activation, n_classes)

In [29]:
t2_model.compile(loss=loss, optimizer=opt,
                 metrics=metrics)

In [30]:
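%%time
# %%time runs the cell once and keeps H; %%timeit would call fit() repeatedly on the
# same t2_model, so it would end up trained for many times n_epochs before evaluation
H = t2_model.fit(trainX, trainY,
                 epochs=n_epochs, batch_size=batch_size)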

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
 41/375 [==>...........................] - ETA: 39s - loss: 0.2839 - accuracy: 0.9150

KeyboardInterrupt: 

In [None]:
t2_model.summary()

In [None]:
# evaluate the network
predictions = t2_model.predict(testX, batch_size=128)
acc_score = accuracy_score(testY.argmax(axis=1),
                           predictions.argmax(axis=1))
print(f"Model Accuracy {acc_score}")

#### Task 2 Findings

1. Model accuracy: not yet recorded; the training cell above was interrupted before it finished.
2. Training time: not yet recorded, for the same reason.