# MNIST DATASET PREDICTION BY CNN
Accuracy: 99.66%

![link text](https://cdn-images-1.medium.com/max/600/1*2lSjt9YKJn9sxK7DSeGDyw.jpeg)



## Import libraries

In [0]:
import sys
import sklearn
import numpy as np
import os

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
    IS_COLAB = True
except Exception:
    IS_COLAB = False

# TensorFlow 
import tensorflow as tf
from tensorflow import keras #this is quite useful to match our code with Keras documentation!

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")


# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

## PREPARATION

### Load data

In [0]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train_full = X_train_full / 255.
X_test = X_test / 255.
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

X_train = X_train[..., np.newaxis]
X_valid = X_valid[..., np.newaxis]
X_test = X_test[..., np.newaxis]

### Image Augmentation Preprocessing

Normally, with other datasets, we use a set of different kinds of augmentation, such as:
* Shifting
* Flipping
* Rotating
* Shearing

But in this case, some kinds of augmentation do not work, because they would change what the digit is: for example, a 6 rotated by 180 degrees reads as a 9. So below I only use a small rotation range and a small shear.
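
To make the problem with flipping and large rotations concrete, here is a toy sketch. The 5x3 glyphs `SIX` and `NINE` and both helper functions are made up for illustration; they are not MNIST images. Rotating the toy 6 by 180 degrees gives exactly the toy 9, while a horizontal flip gives a shape that is neither digit.

```python
# Toy 5x3 bitmaps, hypothetical stand-ins for real 28x28 MNIST digits.
SIX = [
    "###",
    "#..",
    "###",
    "#.#",
    "###",
]
NINE = [
    "###",
    "#.#",
    "###",
    "..#",
    "###",
]


def rotate_180(glyph):
    """Rotate a glyph by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in reversed(glyph)]


def flip_horizontal(glyph):
    """Mirror a glyph left to right by reversing each row."""
    return [row[::-1] for row in glyph]


rotated = rotate_180(SIX)
mirrored = flip_horizontal(SIX)

print(rotated == NINE)           # True: the label "6" would now be wrong
print(mirrored in (SIX, NINE))   # False: a mirrored 6 is not a valid digit
```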

In [0]:
#Image Augmentation
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20,  
    shear_range=0.15,
    fill_mode="nearest"
    )

image_data = image_generator.flow(X_train, y_train)
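
As a quick sanity check on the generator: `flow` uses a batch size of 32 when none is given, and the split above keeps 60,000 - 5,000 = 55,000 training images. The arithmetic below (the variable names are mine) gives the number of batches per epoch, which is the step count that Keras prints in the training logs.

```python
import math

TRAIN_FULL = 60_000   # size of the MNIST training set
VALID = 5_000         # images held out for validation above
BATCH_SIZE = 32       # default batch_size of ImageDataGenerator.flow

train_size = TRAIN_FULL - VALID
steps_per_epoch = math.ceil(train_size / BATCH_SIZE)

print(train_size, steps_per_epoch)   # 55000 1719
```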

### Callback

Here I use patience = 10 for early stopping. If the validation accuracy fluctuates a lot and the patience is too small, training can stop too early.
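
The effect of patience can be seen in a small simulation. This is a simplified sketch of a patience rule, not the Keras implementation; `early_stop_epoch` and the accuracy values are hypothetical. With a small patience the run stops before the later, better epochs are reached.

```python
def early_stop_epoch(val_accuracies, patience):
    """Return (last_epoch_run, best_accuracy) under a simple patience rule.

    Simplified sketch: stop once `patience` epochs in a row fail to beat
    the best value seen so far. Epochs are numbered from 1.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best
    return len(val_accuracies), best


# Hypothetical, noisy validation accuracies (not taken from a real run).
history = [0.980, 0.985, 0.984, 0.983, 0.986, 0.990, 0.989]

print(early_stop_epoch(history, patience=2))   # stops at epoch 4, best 0.985
print(early_stop_epoch(history, patience=3))   # runs all 7 epochs, best 0.99
```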

In [0]:
es_callback = keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=10,
    verbose=1)

In [0]:
cp_callback = keras.callbacks.ModelCheckpoint(
        filepath='mybestmodel.h5',
        save_best_only=True,
        monitor='val_accuracy',
        verbose=1)

In [0]:
# Create the ReduceLROnPlateau callback
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_accuracy',
    factor=0.5,
    patience=3,
    min_lr=0.00001)
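
To show what `factor=0.5`, `patience=3` and `min_lr=0.00001` do together, here is a simplified simulation. It is a sketch of the idea only, not the Keras code; the function name, the plateau values and the starting rate of 0.001 are my assumptions.

```python
def simulate_reduce_on_plateau(val_accuracies, start_lr,
                               factor=0.5, patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch.

    Simplified sketch: after `patience` epochs in a row without a new best
    value, multiply the rate by `factor`, never going below `min_lr`.
    """
    best = float("-inf")
    wait = 0
    lr = start_lr
    rates = []
    for acc in val_accuracies:
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        rates.append(lr)
    return rates


# Hypothetical plateau: one good epoch, then no improvement at all.
plateau = [0.990] + [0.989] * 30
rates = simulate_reduce_on_plateau(plateau, start_lr=0.001)

# Rate after epochs 4 and 7, and at the end of the run.
print(rates[3], rates[6], rates[-1])
```

The rate is halved after every three stalled epochs and stays at `min_lr` once it gets there.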

### Optimizer

I prepare some other optimizers that could replace Nadam.
But normally, they will not work well unless I fine-tune their hyperparameters.

In [0]:
initial_learning_rate = 0.0001
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)
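
With `staircase=True`, the schedule computes `initial_learning_rate * decay_rate ** (step // decay_steps)`. The sketch below re-implements that formula in plain Python (the function name is mine) and plugs in the 1719 steps per epoch reported in the training logs. A full 50-epoch run is fewer than 100,000 steps, so with these settings the rate would never leave its initial value.

```python
def staircase_lr(step, initial_lr=0.0001, decay_steps=100_000, decay_rate=0.96):
    """Learning rate at a given optimizer step with staircase=True."""
    return initial_lr * decay_rate ** (step // decay_steps)


STEPS_PER_EPOCH = 1719          # from the training logs in this notebook
total_steps = 50 * STEPS_PER_EPOCH

print(total_steps)                  # steps in a full 50-epoch run
print(staircase_lr(total_steps))    # still the initial rate
print(staircase_lr(100_000))        # the first drop happens only here
```

A smaller `decay_steps` (for example a few epochs' worth of steps) would be needed for the decay to have any effect in a run of this length.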

In [0]:
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)

# TRAINING

## Data Augmentation

In [0]:
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam",
              metrics=["accuracy"])

model.fit(image_data, epochs=50, callbacks=[es_callback, cp_callback],
          validation_data=(X_valid, y_valid))

Train for 1719 steps, validate on 5000 samples
Epoch 1/50
Epoch 00001: val_accuracy improved from -inf to 0.98820, saving model to mybestmodel.h5
Epoch 2/50
Epoch 00002: val_accuracy improved from 0.98820 to 0.98920, saving model to mybestmodel.h5
Epoch 3/50
Epoch 00003: val_accuracy improved from 0.98920 to 0.99120, saving model to mybestmodel.h5
Epoch 4/50
Epoch 00004: val_accuracy did not improve from 0.99120
Epoch 5/50
Epoch 00005: val_accuracy did not improve from 0.99120
Epoch 6/50
Epoch 00006: val_accuracy did not improve from 0.99120
Epoch 7/50
Epoch 00007: val_accuracy improved from 0.99120 to 0.99340, saving model to mybestmodel.h5
Epoch 8/50
Epoch 00008: val_accuracy did not improve from 0.99340
Epoch 9/50
Epoch 00009: val_accuracy did not improve from 0.99340
Epoch 10/50
Epoch 00010: val_accuracy did not improve from 0.99340
Epoch 11/50
Epoch 00011: val_accuracy did not improve from 0.99340
Epoch 12/50
Epoch 00012: val_accuracy did not improve from 0.99340
Epoch 00012: earl

<tensorflow.python.keras.callbacks.History at 0x7f97b5875110>

In [0]:
new_model = tf.keras.models.load_model('mybestmodel.h5')
new_model.evaluate(X_test, y_test)



[0.021746549489399695, 0.9929]

## DA + Reduce learning rate on plateau

In [0]:
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer='nadam',
              metrics=["accuracy"])

model.fit(image_data, epochs=50,
          callbacks=[es_callback, cp_callback, reduce_lr],
          validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)

Train for 1719 steps, validate on 5000 samples
Epoch 1/50
Epoch 00001: val_accuracy did not improve from 0.99480
Epoch 2/50
Epoch 00002: val_accuracy did not improve from 0.99480
Epoch 3/50
Epoch 00003: val_accuracy did not improve from 0.99480
Epoch 4/50
Epoch 00004: val_accuracy did not improve from 0.99480
Epoch 5/50
Epoch 00005: val_accuracy did not improve from 0.99480
Epoch 6/50
Epoch 00006: val_accuracy did not improve from 0.99480
Epoch 7/50
Epoch 00007: val_accuracy did not improve from 0.99480
Epoch 8/50
Epoch 00008: val_accuracy did not improve from 0.99480
Epoch 9/50
Epoch 00009: val_accuracy did not improve from 0.99480
Epoch 10/50
Epoch 00010: val_accuracy did not improve from 0.99480
Epoch 11/50
Epoch 00011: val_accuracy did not improve from 0.99480
Epoch 12/50
Epoch 00012: val_accuracy did not improve from 0.99480
Epoch 13/50
Epoch 00013: val_accuracy did not improve from 0.99480
Epoch 14/50
Epoch 00014: val_accuracy did not improve from 0.99480
Epoch 15/50
Epoch 00015:

[0.01938119462709324, 0.9952]

In [0]:
new_model = tf.keras.models.load_model('mybestmodel.h5')
new_model.evaluate(X_test, y_test)



[0.019714194189903583, 0.9954]
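
A note on reading the log above: it already says "did not improve from 0.99480" in epoch 1. That is what one would see if the `cp_callback` object keeps its best score from an earlier `fit` call. I am assuming that is the cause here; I have not verified it. The toy class below (`ToyCheckpoint` is hypothetical, not a Keras class, and the scores are made up) shows the difference between reusing one stateful object across runs and creating a fresh one per run.

```python
class ToyCheckpoint:
    """Hypothetical stand-in for a 'save best only' checkpoint callback."""

    def __init__(self):
        self.best = float("-inf")   # set once, when the object is created
        self.saved = []             # scores for which a "save" happened

    def on_epoch_end(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.saved.append(val_accuracy)


def run(checkpoint, val_accuracies):
    """Feed one training run's validation scores to the checkpoint."""
    for acc in val_accuracies:
        checkpoint.on_epoch_end(acc)
    return checkpoint.saved


first_run = [0.9882, 0.9934, 0.9948]     # made-up scores
second_run = [0.9890, 0.9920, 0.9940]    # made-up scores, all below 0.9948

shared = ToyCheckpoint()
run(shared, first_run)
saves_before = len(shared.saved)
run(shared, second_run)
saves_in_second_run = len(shared.saved) - saves_before

fresh = ToyCheckpoint()
saves_with_fresh = len(run(fresh, second_run))

print(saves_in_second_run, saves_with_fresh)   # 0 3
```

If the assumption holds, creating the `ModelCheckpoint` again before each `model.fit`, ideally with a different `filepath` per experiment, would make sure each saved file belongs to the model that was just trained.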

## DA + ReduceLROnPlateau + 2 more Conv2D blocks

Can I make it better by adding two more blocks of two Conv2D layers each?

In [0]:
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer='nadam',
              metrics=["accuracy"])

model.fit(image_data, epochs=50,
          callbacks=[es_callback, cp_callback, reduce_lr],
          validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)

Train for 1719 steps, validate on 5000 samples
Epoch 1/50
Epoch 00001: val_accuracy did not improve from 0.99640
Epoch 2/50
Epoch 00002: val_accuracy did not improve from 0.99640
Epoch 3/50
Epoch 00003: val_accuracy did not improve from 0.99640
Epoch 4/50
Epoch 00004: val_accuracy did not improve from 0.99640
Epoch 5/50
Epoch 00005: val_accuracy did not improve from 0.99640
Epoch 6/50
Epoch 00006: val_accuracy did not improve from 0.99640
Epoch 7/50
Epoch 00007: val_accuracy did not improve from 0.99640
Epoch 8/50
Epoch 00008: val_accuracy did not improve from 0.99640
Epoch 9/50
Epoch 00009: val_accuracy did not improve from 0.99640
Epoch 10/50
Epoch 00010: val_accuracy did not improve from 0.99640
Epoch 11/50
Epoch 00011: val_accuracy did not improve from 0.99640
Epoch 12/50
Epoch 00012: val_accuracy did not improve from 0.99640
Epoch 13/50
Epoch 00013: val_accuracy did not improve from 0.99640
Epoch 14/50
Epoch 00014: val_accuracy did not improve from 0.99640
Epoch 15/50
Epoch 00015:

[0.01589879503288201, 0.9954]
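
It helps to track how the extra blocks change the feature map. The `padding="same"` convolutions keep the spatial size, and each `MaxPool2D()` with its default 2x2 window halves it (rounding down). The arithmetic below is a sketch with names of my own choosing; it compares the input to `Flatten` in this deeper model with the single-block model used earlier.

```python
def pooled_size(size, pool=2):
    """Spatial size after MaxPool2D with its default 2x2 window and stride."""
    return size // pool


size = 28                      # MNIST images are 28x28
sizes = [size]
for _ in range(3):             # three Conv-Conv-MaxPool blocks
    size = pooled_size(size)   # the padding="same" convs keep the size
    sizes.append(size)

deep_features = size * size * 64                     # last Conv2D has 64 filters
single_block_features = pooled_size(28) ** 2 * 64    # model with one block

print(sizes)                                   # [28, 14, 7, 3]
print(deep_features, single_block_features)    # 576 12544
```

So the deeper model feeds far fewer values into the `Dense(128)` layer than the first one did.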

## Add Dropout + L2 regularizer

Now the model is overfitting, so I add Dropout after each block and an L2 regularizer on the dense layer.

In [0]:
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.5),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.5),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.5),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.0001)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer='nadam',
              metrics=["accuracy"])

model.fit(image_data, epochs=50,
          callbacks=[es_callback, cp_callback, reduce_lr],
          validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)
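
For reference, `kernel_regularizer=tf.keras.regularizers.l2(0.0001)` adds `0.0001 * sum(w ** 2)` over the layer's kernel weights to the loss, so large weights are penalized much more than small ones. A minimal sketch of that term in plain Python, with made-up weights and a helper name of my own:

```python
def l2_penalty(weights, l2=0.0001):
    """L2 regularization term: l2 * sum of squared weights."""
    return l2 * sum(w * w for w in weights)


# Hypothetical kernel weights, flattened into one list.
small = [0.1, -0.2, 0.05]
large = [1.0, -2.0, 0.5]   # the same weights, ten times bigger

print(l2_penalty(small), l2_penalty(large))
```

Scaling the weights by 10 scales the penalty by 100, which is why this term pushes the dense layer toward small weights.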

## Reduce the dropout

Now it is underfitting, so I need to reduce the amount of dropout.

In [0]:
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=tf.keras.regularizers.l2(0.0001)),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer='nadam',
              metrics=["accuracy"])

model.fit(image_data, epochs=50,
          callbacks=[es_callback, cp_callback, reduce_lr],
          validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)

Train for 1719 steps, validate on 5000 samples
Epoch 1/50
Epoch 00001: val_accuracy improved from -inf to 0.98900, saving model to mybestmodel.h5
Epoch 2/50
Epoch 00002: val_accuracy improved from 0.98900 to 0.99080, saving model to mybestmodel.h5
Epoch 3/50
Epoch 00003: val_accuracy did not improve from 0.99080
Epoch 4/50
Epoch 00004: val_accuracy improved from 0.99080 to 0.99220, saving model to mybestmodel.h5
Epoch 5/50
Epoch 00005: val_accuracy did not improve from 0.99220
Epoch 6/50
Epoch 00006: val_accuracy did not improve from 0.99220
Epoch 7/50
Epoch 00007: val_accuracy improved from 0.99220 to 0.99320, saving model to mybestmodel.h5
Epoch 8/50
Epoch 00008: val_accuracy did not improve from 0.99320
Epoch 9/50
Epoch 00009: val_accuracy improved from 0.99320 to 0.99340, saving model to mybestmodel.h5
Epoch 10/50
Epoch 00010: val_accuracy improved from 0.99340 to 0.99480, saving model to mybestmodel.h5
Epoch 11/50
Epoch 00011: val_accuracy did not improve from 0.99480
Epoch 12/50


[0.018727440229803324, 0.9966]

In [0]:
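# Assumption: 'mynewbestmodel.h5' is a renamed copy of the checkpoint;
# cp_callback above writes 'mybestmodel.h5'.
new_model = tf.keras.models.load_model('mynewbestmodel.h5')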
new_model.evaluate(X_test, y_test)



[0.018727440229803324, 0.9966]

## Another approach with Batch Normalization and Global Average Pooling

* Because the model is now quite deep, we can add BatchNormalization layers to guard against vanishing and exploding gradients.
* At the end of the Conv blocks we can use Global Average Pooling instead of Flatten, which is a common practice.
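
Global Average Pooling replaces each channel's whole feature map with its mean, so a 3x3x64 feature map becomes 64 values where Flatten would give 576. A minimal sketch on a tiny, made-up feature map (the function and data are hypothetical; in the model this is done by `keras.layers.GlobalAveragePooling2D`):

```python
def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    `feature_map` is a nested list indexed as [row][col][channel].
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        sum(feature_map[r][c][ch] for r in range(height) for c in range(width))
        / (height * width)
        for ch in range(channels)
    ]


# Hypothetical 2x2 feature map with 3 channels.
fmap = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 0.0, 2.0], [7.0, 4.0, 2.0]],
]

print(global_average_pool(fmap))   # [4.0, 1.0, 2.0]
```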

In [0]:
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=tf.keras.regularizers.l2(0.0001)),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy", optimizer='nadam',
              metrics=["accuracy"])

model.fit(image_data, epochs=50,
          callbacks=[es_callback, cp_callback, reduce_lr],
          validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)