<a href="https://colab.research.google.com/github/mehrotrasan16/Keras-Deep-Learning/blob/01-MNIST-CNN-99.29/01_keras_mnist_baseline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
import numpy as np
from tensorflow import keras

#tf version should be 2.2 or higher
tf.__version__

'2.3.0'

In [None]:
keras.__version__

'2.4.0'

In [None]:
tf.test.is_gpu_available()

Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.


True

In [None]:
#get data
(train_images, train_labels), (test_images, test_labels) = \
      keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [None]:
#scale pixel values from [0, 255] to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

In [None]:
#setup model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(10, activation = tf.nn.softmax)
])

In [None]:
#compile model
model.compile(optimizer='sgd',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])
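As a rough illustration of what `sparse_categorical_crossentropy` measures: for each image, the loss is the negative log of the probability the model assigns to the true class, with labels given as integers rather than one-hot vectors. The sketch below is plain Python and is not Keras's implementation; the helper name `sparse_cce` and the probability rows are made up for illustration.

```python
import math

def sparse_cce(probs, label):
    """Negative log-probability of the true class (hypothetical helper)."""
    return -math.log(probs[label])

# Illustrative softmax outputs over three classes (made-up values).
confident = [0.05, 0.90, 0.05]
unsure = [0.40, 0.35, 0.25]

print(sparse_cce(confident, 1))  # small loss: true class has high probability
print(sparse_cce(unsure, 1))     # larger loss: true class has lower probability
```

Because the label is used directly as an index, the integer `train_labels` from MNIST can be passed to `fit` without one-hot encoding.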

In [None]:
#train model
model.fit(train_images, train_labels, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fc37d93dac8>

In [None]:
#evaluate
test_loss, test_acc = model.evaluate(test_images,  test_labels)
print('\nTest accuracy:', test_acc)


Test accuracy: 0.9078999757766724


In [None]:
# Predict on the first 5 test images.
predictions = model.predict(test_images[:5])

In [None]:
#Raw predictions
predictions.shape,predictions


((5, 10),
 array([[2.21342721e-04, 1.09743360e-06, 1.71652689e-04, 2.24272953e-03,
         4.42058081e-05, 1.10166693e-04, 2.70977330e-06, 9.93551016e-01,
         2.19001129e-04, 3.43613280e-03],
        [1.58161353e-02, 3.36178142e-04, 8.30395758e-01, 3.23872976e-02,
         1.15933904e-06, 2.55346242e-02, 8.19568485e-02, 4.08923142e-07,
         1.35638425e-02, 7.79689981e-06],
        [2.75393977e-04, 9.52982366e-01, 1.50614483e-02, 6.84671942e-03,
         8.62881890e-04, 2.08819425e-03, 4.37100045e-03, 5.17000630e-03,
         1.08089820e-02, 1.53307989e-03],
        [9.96784687e-01, 1.88348590e-08, 2.67934112e-04, 6.91862369e-05,
         7.07319032e-07, 1.30594219e-03, 1.04554673e-03, 2.86947645e-04,
         1.58192124e-04, 8.07901597e-05],
        [2.44874647e-03, 4.88645674e-05, 1.10137910e-02, 9.68728971e-04,
         8.74821544e-01, 1.58484804e-03, 7.43442960e-03, 1.95053741e-02,
         1.61125790e-02, 6.60610721e-02]], dtype=float32))

In [None]:
# Print our model's predictions
print(np.argmax(predictions, axis=1))

[7 2 1 0 4]


In [None]:
# Check our predictions against the ground truths
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]
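The step from raw softmax rows to digit labels can be sketched without NumPy. The rows below are the first two rows of the raw predictions printed earlier; the `argmax` helper is a hypothetical stand-in for `np.argmax(..., axis=1)`.

```python
# First two rows of the raw predictions printed above (baseline model).
rows = [
    [2.21342721e-04, 1.09743360e-06, 1.71652689e-04, 2.24272953e-03,
     4.42058081e-05, 1.10166693e-04, 2.70977330e-06, 9.93551016e-01,
     2.19001129e-04, 3.43613280e-03],
    [1.58161353e-02, 3.36178142e-04, 8.30395758e-01, 3.23872976e-02,
     1.15933904e-06, 2.55346242e-02, 8.19568485e-02, 4.08923142e-07,
     1.35638425e-02, 7.79689981e-06],
]
true_labels = [7, 2]  # first two entries of test_labels[:5]

def argmax(values):
    """Index of the largest value (hypothetical helper)."""
    return max(range(len(values)), key=values.__getitem__)

predicted = [argmax(row) for row in rows]
matches = sum(p == t for p, t in zip(predicted, true_labels))
print(predicted, matches / len(true_labels))
```

Each row sums to roughly 1, so the largest entry can be read as the model's confidence in its chosen digit.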


## Fully Connected Layers : Comparison Table

| Approach | Accuracy (%) |
|---|---|
| Base model | 90.78 |
| Epochs increased to 15 | 91.72 |
| Adam optimizer | 92.65 |
| One 128-unit Dense layer | 97.79 |
| Two 128-unit Dense layers | 97.87 |
| Best learning rate from sweep: 0.003 | 97.44 |
| With learning rate decay | 97.86 |
| With Dropout layers | 97.86 |
**Conclusion**:
With fully connected DNNs, accuracy appears to plateau at roughly 97.9% no matter what we change. To reach 99.3%, we need to involve convolutional layers (Conv2D), combined with Dropout.
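One way to read the table is in terms of capacity. The sketch below tallies trainable parameters by hand, assuming the rows that add one or two 128-unit Dense layers place them between `Flatten` and the 10-way softmax. That layout is an assumption, since those models are not shown in this notebook, and `dense_params` is a hypothetical helper.

```python
def dense_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

inputs = 28 * 28  # Flatten turns each 28x28 image into 784 values

print(dense_params([inputs, 10]))            # baseline: Flatten + Dense(10)
print(dense_params([inputs, 128, 10]))       # assumed: one hidden layer of 128
print(dense_params([inputs, 128, 128, 10]))  # assumed: two hidden layers of 128
```

Under these assumptions the second hidden layer adds comparatively few parameters, which fits the small gain between those two rows, though parameter count alone does not determine accuracy.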

# Convolutional Neural Networks 

In [None]:
train_images = train_images.reshape(train_images.shape[0],28,28,1)
test_images = test_images.reshape(test_images.shape[0],28,28,1)

In [None]:
import math
def lr_decay(epoch):
    return 0.01 * math.pow(0.6,epoch)

lr_decay_callback = keras.callbacks.LearningRateScheduler(lr_decay,verbose=True)
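To see how quickly this schedule shrinks the learning rate, the same formula can be evaluated on its own. This is a standalone sketch that repeats the `lr_decay` function so it runs without Keras; the scheduler passes a zero-based epoch index, so the first epoch uses `lr_decay(0)`.

```python
import math

def lr_decay(epoch):
    # Same formula as above: start at 0.01 and multiply by 0.6 each epoch.
    return 0.01 * math.pow(0.6, epoch)

# Print the rate for the first five epochs (1-based, as in the Keras log).
for epoch in range(5):
    print(epoch + 1, lr_decay(epoch))
```

By the eleventh epoch the rate has fallen to about 6e-05, more than a hundred times smaller than the starting value.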

# 3x Conv2D with Max Pooling and Learning Rate Decay: 98.48

In [None]:
#setup model
cnnmodel1 = keras.Sequential([
    keras.layers.Conv2D(kernel_size=3,filters=12,activation='relu',padding='same',input_shape=(28,28,1)),
    keras.layers.MaxPooling2D(pool_size=(2,2)),
    keras.layers.Conv2D(kernel_size=6,filters=24,activation='relu',padding='same',strides=3),
    keras.layers.MaxPooling2D(pool_size=(2,2)),
    keras.layers.Conv2D(kernel_size=6,filters=32,activation='relu',padding='same',strides=3),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation = tf.nn.softmax)    
])
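It can help to trace how the spatial size shrinks through this stack. The sketch below assumes the usual Keras output-size rules: `ceil(size / stride)` for `'same'` padding, and `(size - pool) // pool + 1` for a pooling layer with default strides and `'valid'` padding. The helper names are hypothetical.

```python
import math

def same_conv(size, stride=1):
    """Spatial size after a 'same'-padded convolution."""
    return math.ceil(size / stride)

def valid_pool(size, pool=2):
    """Spatial size after 'valid' pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1

size = 28
size = same_conv(size)            # Conv2D 3x3, stride 1  -> 28
size = valid_pool(size)           # MaxPooling2D 2x2      -> 14
size = same_conv(size, stride=3)  # Conv2D 6x6, stride 3  -> 5
size = valid_pool(size)           # MaxPooling2D 2x2      -> 2
size = same_conv(size, stride=3)  # Conv2D 6x6, stride 3  -> 1
print(size, size * size * 32)     # spatial size and flattened feature count
```

Under these rules only 32 features reach the dense layers, which may be part of why this variant trails the models without pooling.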

In [None]:
#compile model
cnnmodel1.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [None]:
#train model
cnnmodel1.fit(train_images, train_labels, epochs=15,callbacks=[lr_decay_callback])


Epoch 00001: LearningRateScheduler reducing learning rate to 0.01.
Epoch 1/15

Epoch 00002: LearningRateScheduler reducing learning rate to 0.006.
Epoch 2/15

Epoch 00003: LearningRateScheduler reducing learning rate to 0.0036.
Epoch 3/15

Epoch 00004: LearningRateScheduler reducing learning rate to 0.0021599999999999996.
Epoch 4/15

Epoch 00005: LearningRateScheduler reducing learning rate to 0.001296.
Epoch 5/15

Epoch 00006: LearningRateScheduler reducing learning rate to 0.0007775999999999998.
Epoch 6/15

Epoch 00007: LearningRateScheduler reducing learning rate to 0.0004665599999999999.
Epoch 7/15

Epoch 00008: LearningRateScheduler reducing learning rate to 0.00027993599999999994.
Epoch 8/15

Epoch 00009: LearningRateScheduler reducing learning rate to 0.00016796159999999993.
Epoch 9/15

Epoch 00010: LearningRateScheduler reducing learning rate to 0.00010077695999999997.
Epoch 10/15

Epoch 00011: LearningRateScheduler reducing learning rate to 6.0466175999999974e-05.
Epoch 11/15

<tensorflow.python.keras.callbacks.History at 0x7fc37a21ea20>

In [None]:
#evaluate
test_loss, test_acc = cnnmodel1.evaluate(test_images,  test_labels)
print('\nTest accuracy:', test_acc)


Test accuracy: 0.9848999977111816


In [None]:
# Predict on the first 5 test images.
predictions = cnnmodel1.predict(test_images[:5])

In [None]:
#Raw predictions
predictions

array([[2.58821339e-13, 4.42130360e-10, 2.81399093e-06, 8.22663992e-10,
        4.22725073e-08, 1.46156628e-10, 3.79519306e-19, 9.99997020e-01,
        1.57904023e-10, 1.16600425e-07],
       [1.49077127e-12, 1.85843983e-12, 1.00000000e+00, 4.35442698e-12,
        2.57657827e-12, 6.03696712e-14, 1.64096066e-14, 5.49620523e-11,
        2.45187880e-11, 8.77612372e-16],
       [1.45968844e-16, 1.00000000e+00, 4.10229779e-14, 9.53375804e-19,
        3.30425687e-11, 5.42400597e-14, 4.69045679e-12, 2.23021471e-14,
        2.85816596e-12, 3.74989861e-13],
       [9.99956965e-01, 1.10112047e-14, 2.14770146e-10, 2.89034602e-10,
        1.29831657e-09, 1.60293254e-08, 3.27569251e-05, 1.62169421e-12,
        5.78023451e-09, 1.02208933e-05],
       [3.10790182e-18, 1.15170070e-14, 1.86831209e-13, 1.04194933e-20,
        1.00000000e+00, 8.31934267e-16, 6.68590888e-14, 4.18914306e-13,
        2.64455992e-13, 8.70124042e-11]], dtype=float32)

In [None]:
# Print our model's predictions
print(np.argmax(predictions, axis=1))

[7 2 1 0 4]


In [None]:
# Check our predictions against the ground truths
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]


# Base 3x Conv2D: 98.79

In [None]:
#setup model
cnnmodel2 = keras.Sequential([
    keras.layers.Conv2D(kernel_size=3,filters=12,activation='relu',padding='same',input_shape=(28,28,1)),
    keras.layers.Conv2D(kernel_size=6,filters=24,activation='relu',padding='same',strides=3),
    keras.layers.Conv2D(kernel_size=6,filters=32,activation='relu',padding='same',strides=3),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    #keras.layers.Dropout(0.25),
    keras.layers.Dense(10, activation = tf.nn.softmax)    
])

In [None]:
#compile model
cnnmodel2.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [None]:
#train model
cnnmodel2.fit(train_images, train_labels, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fc37644f208>

In [None]:
#evaluate
test_loss, test_acc = cnnmodel2.evaluate(test_images,  test_labels)
print('\nTest accuracy:', test_acc)


Test accuracy: 0.9879000186920166


In [None]:
# Predict on the first 5 test images.
predictions = cnnmodel2.predict(test_images[:5])

In [None]:
#Raw predictions
predictions

array([[9.1748900e-12, 9.1796556e-11, 2.7299302e-08, 3.6320422e-11,
        9.8894764e-12, 4.1397853e-13, 2.6194843e-17, 1.0000000e+00,
        6.2747745e-13, 2.2187985e-09],
       [7.1729658e-20, 1.1877107e-15, 1.0000000e+00, 1.1910056e-21,
        4.0960795e-23, 6.2814627e-26, 1.1696030e-18, 4.7109088e-23,
        8.9971780e-15, 2.7189868e-23],
       [4.1285803e-06, 9.9987817e-01, 3.0863303e-06, 2.2479698e-09,
        1.4219809e-06, 2.9214144e-07, 1.5567189e-06, 1.2510365e-06,
        1.1002855e-04, 1.6997017e-08],
       [9.9999952e-01, 1.8046469e-17, 2.3325680e-10, 7.8363465e-15,
        1.0223754e-11, 4.1493212e-11, 2.2690299e-08, 4.9674110e-12,
        2.4817688e-11, 4.6448841e-07],
       [2.7687352e-11, 4.4522434e-11, 2.8588606e-10, 8.1518204e-12,
        9.9994826e-01, 3.4906242e-10, 8.7942237e-11, 1.3688676e-11,
        1.2679978e-09, 5.1759605e-05]], dtype=float32)

In [None]:
# Print our model's predictions
print(np.argmax(predictions, axis=1))

[7 2 1 0 4]


In [None]:
# Check our predictions against the ground truths
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]


# Extra Dense Layer with Dropout: 98.94

In [None]:
#setup model
cnnmodel3 = keras.Sequential([
    keras.layers.Conv2D(kernel_size=3,filters=12,activation='relu',padding='same',input_shape=(28,28,1)),
    keras.layers.Conv2D(kernel_size=6,filters=24,activation='relu',padding='same',strides=3),
    keras.layers.Conv2D(kernel_size=6,filters=32,activation='relu',padding='same',strides=3),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(200, activation='relu'),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(10, activation = tf.nn.softmax)    
])
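For intuition about what `Dropout(0.4)` does during training, here is a minimal plain-Python sketch of inverted dropout: each activation is zeroed with probability 0.4 and the survivors are scaled by `1 / (1 - 0.4)` so the expected sum is unchanged. The `dropout` helper and the activation values are made up for illustration and are not the Keras implementation.

```python
import random

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)    # fixed seed so the sketch is repeatable
activations = [1.0] * 10  # made-up activations from a dense layer
dropped = dropout(activations, rate=0.4, rng=rng)
print(dropped)
```

At inference time the layer passes its input through unchanged, so `evaluate` and `predict` see the full network.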

In [None]:
#compile model
cnnmodel3.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [None]:
#train model
cnnmodel3.fit(train_images, train_labels, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fc374316c88>

In [None]:
#evaluate
test_loss, test_acc = cnnmodel3.evaluate(test_images,  test_labels)
print('\nTest accuracy:', test_acc)


Test accuracy: 0.9894999861717224


In [None]:
# Predict on the first 5 test images.
predictions = cnnmodel3.predict(test_images[:5])

In [None]:
#Raw predictions
predictions

array([[9.1748900e-12, 9.1796556e-11, 2.7299302e-08, 3.6320422e-11,
        9.8894764e-12, 4.1397853e-13, 2.6194843e-17, 1.0000000e+00,
        6.2747745e-13, 2.2187985e-09],
       [7.1729658e-20, 1.1877107e-15, 1.0000000e+00, 1.1910056e-21,
        4.0960795e-23, 6.2814627e-26, 1.1696030e-18, 4.7109088e-23,
        8.9971780e-15, 2.7189868e-23],
       [4.1285803e-06, 9.9987817e-01, 3.0863303e-06, 2.2479698e-09,
        1.4219809e-06, 2.9214144e-07, 1.5567189e-06, 1.2510365e-06,
        1.1002855e-04, 1.6997017e-08],
       [9.9999952e-01, 1.8046469e-17, 2.3325680e-10, 7.8363465e-15,
        1.0223754e-11, 4.1493212e-11, 2.2690299e-08, 4.9674110e-12,
        2.4817688e-11, 4.6448841e-07],
       [2.7687352e-11, 4.4522434e-11, 2.8588606e-10, 8.1518204e-12,
        9.9994826e-01, 3.4906242e-10, 8.7942237e-11, 1.3688676e-11,
        1.2679978e-09, 5.1759605e-05]], dtype=float32)

In [None]:
# Print our model's predictions
print(np.argmax(predictions, axis=1))

[7 2 1 0 4]


In [None]:
# Check our predictions against the ground truths
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]


# 3x Conv2D, 2x Dense, with Dropout and BatchNorm: 99.29

In [None]:
#setup model
cnnmodel4 = keras.Sequential([
    keras.layers.Conv2D(kernel_size=3,filters=12,use_bias=False,padding='same'),
    keras.layers.BatchNormalization(center=True,scale=False),
    keras.layers.Activation('relu'),

    keras.layers.Conv2D(kernel_size=6,filters=24,use_bias=False,padding='same',strides=2),
    keras.layers.BatchNormalization(center=True,scale=False),
    keras.layers.Activation('relu'),
    
    keras.layers.Conv2D(kernel_size=6,filters=32,use_bias=False,padding='same',strides=2),
    keras.layers.BatchNormalization(center=True,scale=False),
    keras.layers.Activation('relu'),
    
    keras.layers.Flatten(),
    
    keras.layers.Dense(128),
    keras.layers.BatchNormalization(center=True,scale=False),
    keras.layers.Activation('relu'),
    keras.layers.Dropout(0.4),

    keras.layers.Dense(200),
    keras.layers.BatchNormalization(center=True,scale=False),
    keras.layers.Activation('relu'),
    keras.layers.Dropout(0.4),
    
    keras.layers.Dense(10, activation = tf.nn.softmax)    
])
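The `BatchNormalization(center=True, scale=False)` layers above can be pictured with a small plain-Python sketch of the training-time computation for a single feature: subtract the batch mean, divide by the batch standard deviation, then add a learned offset. The `batch_norm` helper and the input values are hypothetical; `epsilon=1e-3` matches the Keras default.

```python
import math

def batch_norm(batch, beta=0.0, epsilon=1e-3):
    """Normalize one feature across a batch, then add a learned offset `beta`.

    Mirrors center=True, scale=False: an offset but no learned multiplier.
    """
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(variance + epsilon) + beta for x in batch]

# Made-up pre-activation values for one feature over a batch of four.
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print(normalized)
```

Since the offset `beta` plays the role of a bias, the preceding `Conv2D` layers can drop their own with `use_bias=False`. At inference time Keras uses moving averages of the mean and variance rather than per-batch statistics.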

In [None]:
#compile model
cnnmodel4.compile(optimizer='adam',
          loss='sparse_categorical_crossentropy',
          metrics=['accuracy'])

In [None]:
#train model
cnnmodel4.fit(train_images, train_labels, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f461c2e4b38>

In [None]:
#evaluate
test_loss, test_acc = cnnmodel4.evaluate(test_images,  test_labels)
print('\nTest accuracy:', test_acc)


Test accuracy: 0.9929999709129333


In [None]:
# Predict on the first 5 test images.
predictions = cnnmodel4.predict(test_images[:5])

In [None]:
#Raw predictions
predictions

array([[8.15029988e-10, 6.32423962e-08, 1.49575797e-07, 4.90447150e-09,
        4.88744085e-08, 4.83876716e-10, 5.69438852e-09, 9.99999166e-01,
        8.34515027e-11, 6.13133807e-07],
       [1.54924024e-07, 3.56493821e-08, 9.99994516e-01, 1.02682804e-07,
        3.28987397e-08, 4.01041570e-12, 4.93736297e-06, 1.99132330e-07,
        1.24171269e-08, 5.16413845e-09],
       [3.17750465e-10, 1.00000000e+00, 1.05235687e-09, 1.21815127e-11,
        2.25314503e-10, 7.56320145e-11, 7.45349737e-09, 1.44219303e-08,
        1.30716185e-08, 2.39522256e-11],
       [9.99992847e-01, 5.15069551e-08, 9.99725547e-09, 2.39075315e-09,
        4.26711289e-08, 3.13056091e-07, 4.29830470e-06, 1.55122905e-07,
        1.62151912e-06, 7.66250992e-07],
       [1.31871081e-09, 2.82378380e-07, 5.03988240e-10, 1.55313193e-11,
        9.99982953e-01, 7.10562553e-10, 8.89375773e-09, 4.94240426e-09,
        1.31498590e-09, 1.68008792e-05]], dtype=float32)

In [None]:
# Print our model's predictions
print(np.argmax(predictions, axis=1))

[7 2 1 0 4]


In [None]:
# Check our predictions against the ground truths
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]


# Comparison Table


| Approach | Accuracy (%) |
|---|---|
| Base 3x Conv2D | 98.79 |
| 3x Conv2D with max pooling and learning rate decay | 98.48 |
| 3x Conv2D with extra Dense layer and Dropout | 98.94 |
| 3x Conv2D, 2x Dense, with Dropout and BatchNorm | 99.29 |

**Conclusion**:
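Batch normalization made an enormous difference, lifting test accuracy from 98.94% to 99.29% and allowing us to reach the required accuracy. Note that the final model also uses strides of 2 rather than 3 in its strided convolutions, so part of the gain may come from that change.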