# SANDEEP BHAGAT - MNIST DATA

**STEP 1 : Data Loading & Preparation**

In [3]:
from keras.datasets import mnist
from matplotlib import pyplot
from tensorflow.keras.utils import to_categorical

# Load mnist dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

n_train_img = x_train.shape[0]
n_test_images = x_test.shape[0]
x_dim, y_dim = x_test.shape[1:]

# Add a channel axis and scale pixel values from 0..255 to 0..1
x_train = x_train.reshape((n_train_img, x_dim, y_dim, 1))
x_train = x_train.astype('float32') / 255

x_test = x_test.reshape((n_test_images, x_dim, y_dim, 1))
x_test = x_test.astype('float32') / 255

# One-hot encode the digit labels (10 classes)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

print("Training data shape: ", x_train.shape)  
print("Test data shape", x_test.shape) 

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
Training data shape:  (60000, 28, 28, 1)
Test data shape (10000, 28, 28, 1)
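
As a small illustration of the two preprocessing steps above (pixel scaling and one-hot encoding), here is a plain-Python sketch. The helpers `one_hot` and `scale_pixels` are hypothetical names used only for this example; the notebook itself relies on `to_categorical` and NumPy arithmetic.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def scale_pixels(row):
    """Map 8-bit pixel intensities (0..255) to floats in [0, 1]."""
    return [p / 255 for p in row]


encoded = one_hot(3)                  # digit 3 becomes a 10-element vector
scaled = scale_pixels([0, 51, 255])   # three example pixel values

print(encoded)
print(scaled)
```

Each label becomes a vector with as many entries as there are classes, which is why the final Dense layer later uses `y_test.shape[1]` units.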


# TASK 1 - Build a Simple Convolutional Neural Network (CNN) - 32 neurons (feature maps) and 5x5 feature detectors





**Hypothesis**: Expecting high accuracy (more than 75%)

In [4]:
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten

model = Sequential()
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
model.add(Flatten())
model.add(Dense(units=y_test.shape[1], activation='softmax'))
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
flatten_1 (Flatten)          (None, 18432)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                184330    
Total params: 185,162
Trainable params: 185,162
Non-trainable params: 0
_________________________________________________________________
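
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes Keras defaults for `Conv2D` ('valid' padding, stride 1, one bias per filter); the helper names are made up for this example.

```python
def conv2d_output_size(size, kernel):
    """Output width/height of a 'valid', stride-1 convolution."""
    return size - kernel + 1


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One weight per kernel cell per input channel, plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units


side = conv2d_output_size(28, 5)        # 28x28 input, 5x5 kernel
flat = side * side * 32                 # Flatten over 32 feature maps
conv_p = conv2d_params(5, 5, 1, 32)     # single-channel input
dense_p = dense_params(flat, 10)        # 10 output classes
total = conv_p + dense_p

print(side, flat, conv_p, dense_p, total)
```

Almost all of the parameters sit in the Dense layer, because the flattened feature maps are large.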


In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

In [7]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [8]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.3053633717060089
Test accuracy: 0.9117000102996826


***Hypothesis for Task 1 is confirmed - test accuracy is about 91%, well above 75%***

# TASK 2 - Increase the complexity of the CNN by adding more convolutional and dense layers - one additional convolutional layer with 32 neurons & 5x5 feature detectors and a dense layer with 128 nodes.

In [9]:
model = Sequential()
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
model.add(Flatten())
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(y_test.shape[1], activation='softmax'))
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 20, 20, 32)        25632     
_________________________________________________________________
flatten_2 (Flatten)          (None, 12800)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               1638528   
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
Total params: 1,666,282
Trainable params: 1,666,282
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

In [11]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [12]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 1.2022727409362792
Test accuracy: 0.7851999998092651




*   Overall test accuracy has decreased from 91% to 79%
*   Significant increase in training time - taking 3 ms each




In the next iteration, I will try MaxPooling to increase training speed - MaxPooling2D(pool_size=2)
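
To make concrete what a 2x2 max pooling step does to a feature map, here is a plain-Python sketch on a small 4x4 grid. The function `max_pool_2x2` and the sample values are hypothetical and only illustrate the operation; they are not how Keras implements it.

```python
def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            window = [grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 8]]
```

Each 2x2 window is reduced to its largest value, so width and height are halved and later layers see a quarter of the activations.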

In [20]:
from keras.layers import MaxPooling2D
model = Sequential()
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
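# NOTE: the bare MaxPooling2D(...) calls in this cell only construct a layer object;
# they are not passed to model.add(), so no pooling layer appears in the summary.
MaxPooling2D(pool_size=2)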
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
MaxPooling2D(pool_size=2)
model.add(Flatten())
model.add(Dense(128, activation='sigmoid'))
MaxPooling2D(pool_size=2)
model.add(Dense(y_test.shape[1], activation='softmax'))
MaxPooling2D(pool_size=2)
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 20, 20, 32)        25632     
_________________________________________________________________
flatten_5 (Flatten)          (None, 12800)             0         
_________________________________________________________________
dense_8 (Dense)              (None, 128)               1638528   
_________________________________________________________________
dense_9 (Dense)              (None, 10)                1290      
Total params: 1,666,282
Trainable params: 1,666,282
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

In [22]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [23]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.8912634126663208
Test accuracy: 0.800599992275238


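Not much improvement was seen from MaxPooling2D; accuracy is about the same (80% vs 79%), with some improvement in loss (0.89 vs 1.20). Note that the summary above lists no pooling layers and the same 1,666,282 parameters as before: the MaxPooling2D(pool_size=2) calls were never passed to model.add(), so the architecture is unchanged and pooling cannot explain the difference.
Next, trying more epochs to observe how this affects the loss.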
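
The summary above contains only Conv2D, Flatten and Dense layers, since a `MaxPooling2D` layer takes effect only when it is passed to `model.add(...)`. The sketch below estimates the layer sizes if a pooling layer had been added after each convolution. It assumes Keras defaults ('valid' padding, pooling stride equal to the pool size), and the helper names are hypothetical. It leaves out the calls placed after `Flatten` and `Dense`, because `MaxPooling2D` expects image-shaped (4D) input.

```python
def conv_valid(size, kernel=5):
    """Output size of a 'valid', stride-1 convolution."""
    return size - kernel + 1


def pool(size, pool_size=2):
    """Output size of max pooling with stride equal to pool_size."""
    return size // pool_size


filters = 32

# As recorded in the notebook: two 5x5 convolutions, no pooling
flat_without_pooling = conv_valid(conv_valid(28)) ** 2 * filters

# Hypothetical variant: a 2x2 pooling layer after each convolution
s = 28
s = conv_valid(s)   # 24
s = pool(s)         # 12
s = conv_valid(s)   # 8
s = pool(s)         # 4
flat_with_pooling = s * s * filters
dense_128_params = (flat_with_pooling + 1) * 128

print(flat_without_pooling, flat_with_pooling, dense_128_params)
```

Under these assumptions the flattened input to the 128-node layer would shrink from 12,800 to 512 values, which is where most of the training-time saving would be expected to come from.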

In [27]:
history = model.fit(x_train, y_train, batch_size=128, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [28]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.30378393540382387
Test accuracy: 0.911899983882904


Not much improvement was seen in execution time from MaxPooling2D; however, with 10 epochs there is a significant improvement in accuracy (80% vs 91%) & loss (0.89 vs 0.30)

In [36]:
from keras.layers import MaxPooling2D
model = Sequential()
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
# NOTE: not passed to model.add(), so these MaxPooling2D calls add no layer
MaxPooling2D(pool_size=2)
model.add(Flatten())
model.add(Dense(128, activation='sigmoid'))
MaxPooling2D(pool_size=2)
model.add(Dense(y_test.shape[1], activation='softmax'))
MaxPooling2D(pool_size=2)
model.summary()

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_13 (Conv2D)           (None, 24, 24, 32)        832       
_________________________________________________________________
flatten_9 (Flatten)          (None, 18432)             0         
_________________________________________________________________
dense_16 (Dense)             (None, 128)               2359424   
_________________________________________________________________
dense_17 (Dense)             (None, 10)                1290      
Total params: 2,361,546
Trainable params: 2,361,546
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

In [38]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [39]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.5401181463241577
Test accuracy: 0.8673999905586243


Removing one convolutional layer resulted in faster training, with a moderate effect on loss & accuracy (loss 0.54 and accuracy 87% after 5 epochs)

In [49]:
from keras.layers import MaxPooling2D
model = Sequential()
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
# NOTE: not passed to model.add(), so these MaxPooling2D calls add no layer
MaxPooling2D(pool_size=2)
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
MaxPooling2D(pool_size=2)
model.add(Flatten())
model.add(Dense(128, activation='sigmoid'))
MaxPooling2D(pool_size=2)
model.add(Dense(y_test.shape[1], activation='softmax'))
MaxPooling2D(pool_size=2)
model.summary()

Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_23 (Conv2D)           (None, 24, 24, 32)        832       
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 20, 20, 32)        25632     
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 16, 16, 32)        25632     
_________________________________________________________________
flatten_13 (Flatten)         (None, 8192)              0         
_________________________________________________________________
dense_24 (Dense)             (None, 128)               1048704   
_________________________________________________________________
dense_25 (Dense)             (None, 10)                1290      
Total params: 1,102,090
Trainable params: 1,102,090
Non-trainable params: 0
_________________________________________________________________

In [0]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

In [52]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [53]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 2.3032748836517336
Test accuracy: 0.11349999904632568


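Adding one more convolutional layer had an adverse effect on all metrics - test time (3M -> 4M), accuracy (80% -> 11%), loss (0.89 -> 2.3). An accuracy of about 11% on 10 classes is roughly chance level, so this model did not learn within 5 epochs.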
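
A test loss of about 2.30 is what a 10-class classifier scores under categorical cross-entropy when it assigns equal probability to every digit. A minimal check of that arithmetic in plain Python (not part of the original notebook):

```python
import math

num_classes = 10
# A model that has learned nothing predicts every class with equal probability
uniform_prediction = [1.0 / num_classes] * num_classes

# Categorical cross-entropy for one sample: -log(probability of the true class)
true_class = 7  # any class gives the same value here
chance_loss = -math.log(uniform_prediction[true_class])

print(round(chance_loss, 4))  # 2.3026
```

The recorded loss of 2.3033 is within rounding distance of this value, consistent with the network outputting near-uniform probabilities.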

# Task 3 - Improve models in Task 1 & Task 2
# Using only convolutional layers, will hyper-parameter optimization (number of layers, number of nodes, learning rate, etc.) help in increasing the accuracy? If yes, implement the changes and report your results.

In [0]:
from tensorflow.keras import optimizers

In [0]:
lr = optimizers.schedules.ExponentialDecay(initial_learning_rate=0.5, decay_steps=10000, decay_rate=0.9)
opt = optimizers.SGD(learning_rate=lr)
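
To see what this schedule does over the 5 training epochs, the sketch below evaluates the non-staircase exponential decay formula, `initial_lr * decay_rate ** (step / decay_steps)`, in plain Python. The function `exponential_decay` is a hypothetical stand-in for the Keras schedule, and the step count assumes 60,000 training images with `batch_size=128`.

```python
import math


def exponential_decay(step, initial_lr=0.5, decay_steps=10000, decay_rate=0.9):
    """Learning rate after `step` optimizer updates (continuous decay)."""
    return initial_lr * decay_rate ** (step / decay_steps)


steps_per_epoch = math.ceil(60000 / 128)   # one update per batch
total_steps = steps_per_epoch * 5          # 5 epochs

lr_start = exponential_decay(0)
lr_end = exponential_decay(total_steps)

print(steps_per_epoch, total_steps, lr_start, round(lr_end, 4))
```

Over roughly 2,345 updates the rate only falls from 0.5 to about 0.49, so the main change compared with `optimizer="sgd"` (Keras default learning rate 0.01) is the much larger step size rather than the decay itself.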

In [12]:
model = Sequential()
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
model.add(Conv2D(32, (5,5), activation='sigmoid', input_shape=(x_dim, y_dim,1)))
model.add(Flatten())
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(y_test.shape[1], activation='softmax'))
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 20, 20, 32)        25632     
_________________________________________________________________
flatten_3 (Flatten)          (None, 12800)             0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               1638528   
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1290      
Total params: 1,666,282
Trainable params: 1,666,282
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])

In [14]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [15]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.09087087419256568
Test accuracy: 0.9728000164031982


# **Final Results**

Simple Neural Network Result -> accuracy 74%, loss 1.48

Basic CNN (Task 1) Result -> accuracy 91%, loss 0.31

Task 2 CNN Result -> accuracy 80%, loss 0.89

Task 3 CNN Result -> accuracy 97%, loss 0.09


# Task 4 - Change any parameter/architecture to improve the quality metrics

In [0]:
from tensorflow.keras import optimizers
from keras.layers import MaxPooling2D

In [0]:
lr = optimizers.schedules.ExponentialDecay(initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9)
opt = optimizers.SGD(learning_rate=lr)

In [26]:
model = Sequential()
model.add(Conv2D(32, (5,5), activation='relu', input_shape=(x_dim, y_dim,1)))
# NOTE: not passed to model.add(), so these MaxPooling2D calls add no layer
MaxPooling2D(pool_size=2)
model.add(Conv2D(32, (5,5), activation='relu', input_shape=(x_dim, y_dim,1)))
MaxPooling2D(pool_size=2)
model.add(Flatten())
model.add(Dense(y_test.shape[1], activation='softmax'))
model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_10 (Conv2D)           (None, 24, 24, 32)        832       
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 20, 20, 32)        25632     
_________________________________________________________________
flatten_6 (Flatten)          (None, 12800)             0         
_________________________________________________________________
dense_8 (Dense)              (None, 10)                128010    
Total params: 154,474
Trainable params: 154,474
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.compile(optimizer=opt,loss='categorical_crossentropy',metrics=['accuracy'])

In [28]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [29]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.047628800696111286
Test accuracy: 0.9843000173568726


In [40]:
from tensorflow import keras
from tensorflow.keras import layers
# opt = keras.optimizers.Adam(learning_rate=0.01)
model = Sequential()
model.add(Conv2D(32, (5,5), activation='relu', input_shape=(x_dim, y_dim,1)))
# NOTE: not passed to model.add(), so these MaxPooling2D calls add no layer
MaxPooling2D(pool_size=2)
model.add(Conv2D(32, (5,5), activation='relu', input_shape=(x_dim, y_dim,1)))
MaxPooling2D(pool_size=2)
model.add(Flatten())
model.add(Dense(y_test.shape[1], activation='softmax'))
model.summary()

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_20 (Conv2D)           (None, 24, 24, 32)        832       
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 20, 20, 32)        25632     
_________________________________________________________________
flatten_11 (Flatten)         (None, 12800)             0         
_________________________________________________________________
dense_13 (Dense)             (None, 10)                128010    
Total params: 154,474
Trainable params: 154,474
Non-trainable params: 0
_________________________________________________________________


In [0]:
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [42]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [35]:
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.07575280610376649
Test accuracy: 0.9783999919891357


# Findings & Summary


*   Training by gradient descent became problematic (most likely vanishing gradients with sigmoid activations) in both the fully-connected NN and the CNN as soon as they had more than 1 layer.
*   Optimizing the hyperparameters, especially the learning rate, improved the training significantly.
*   A key difference was the training time: the CNN (without any max pooling layers) took much longer to train (~3min per epoch) than the FC models (~3s per layer).
*   Max pooling might improve the training speed of the CNN; in theory, the CNN should perform much better than the FC model on an image recognition task like this.
*   Reducing the layer size of the CNN might also make it quicker while remaining as performant as the FC model.
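
Assuming the first bullet refers to vanishing gradients, the sketch below shows why stacked sigmoid layers can train slowly: the sigmoid derivative is at most 0.25, so the factor contributed by the activations shrinks with every extra layer, while ReLU passes a gradient of 1 for positive inputs. This is a simplified illustration in plain Python that ignores the weights; it is not a measurement from the notebook.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; largest at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


max_sigmoid_grad = sigmoid_grad(0.0)

# Best-case activation factor after `depth` stacked layers
sigmoid_factors = [max_sigmoid_grad ** depth for depth in (1, 2, 3, 4)]
relu_factors = [relu_grad(1.0) ** depth for depth in (1, 2, 3, 4)]

print(sigmoid_factors)
print(relu_factors)
```

This is in line with the Task 4 runs, where switching the activation from 'sigmoid' to 'relu' reached about 98% test accuracy.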










