<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-Libraries" data-toc-modified-id="Import-Libraries-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import Libraries</a></span></li><li><span><a href="#Load-Data" data-toc-modified-id="Load-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load Data</a></span></li><li><span><a href="#EDA-and-Processing" data-toc-modified-id="EDA-and-Processing-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>EDA and Processing</a></span></li><li><span><a href="#Double-CNN" data-toc-modified-id="Double-CNN-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Double CNN</a></span></li><li><span><a href="#Triple-CNN" data-toc-modified-id="Triple-CNN-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Triple CNN</a></span></li><li><span><a href="#Additional-mnist-data" data-toc-modified-id="Additional-mnist-data-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Additional mnist data</a></span></li><li><span><a href="#Submission" data-toc-modified-id="Submission-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Submission</a></span></li></ul></div>

* Hello everyone! In this notebook I use three approaches to digit recognition on the MNIST dataset.

* The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST's training dataset, while the other half of the training set and the other half of the test set were taken from NIST's testing dataset.

* The three approaches are: a double-layer CNN, a triple-layer CNN, and the same triple-layer CNN trained with additional MNIST data.

* My submissions log:<br>
1) Basic ML algo - 99.021% acc <br>
2) CNN with Data Augmentation - 99.41% acc<br>
3) Double Layer CNN - 97.842% acc<br>
4) Triple Layer CNN with BatchNormalization - 99.621% acc<br>
5) Triple Layer CNN with BatchNormalization plus additional data - 99.835% acc<br>

* I hope you learn something from this notebook. 

# Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools

from keras.models import Sequential, load_model
from keras.layers import Flatten, Dense, MaxPool2D, Conv2D, Dropout, BatchNormalization
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
from tensorflow.keras.datasets import mnist
from keras.utils import to_categorical  # np_utils is not available in newer Keras releases

sns.set(style='white', context='notebook', palette='deep')

%matplotlib inline

# Load Data

In [None]:
df = pd.read_csv("../input/digit-recognizer/train.csv")
df.head()

In [None]:
df.shape

In [None]:
df2 = pd.read_csv("../input/digit-recognizer/test.csv")
df2.head()

# EDA and Processing

In [None]:
df2.shape

In [None]:
y_train = df['label']
X_train = df.drop(labels = ['label'], axis=1)

In [None]:
hist = y_train.value_counts()

In [None]:
hist.plot.bar()

In [None]:
df.isnull().sum()

In [None]:
#normalization
X_train = X_train / 255.0
df2 = df2 / 255.0

In [None]:
#reshape
X_train = X_train.values.reshape(-1, 28,28,1)
df2 = df2.values.reshape(-1, 28,28,1)
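
The normalization and reshape steps above can be tried on a tiny stand-in array. This is a minimal sketch, assuming flattened 8-bit pixel rows like the ones in `train.csv`; the `fake_pixels` array is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the pixel columns: 3 images, 784 values each (0-255)
fake_pixels = np.random.default_rng(0).integers(0, 256, size=(3, 784))

# Scale to [0, 1] so every input shares the same small range
scaled = fake_pixels / 255.0

# Reshape each row into height x width x channels (1 channel = grayscale)
images = scaled.reshape(-1, 28, 28, 1)

print(images.shape, images.min() >= 0.0, images.max() <= 1.0)
```

The `-1` lets NumPy infer the number of images, so the same line works for both the training and the test frame.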

In [None]:
#label encoding
y_train = to_categorical(y_train, num_classes=10)
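
To see what the label encoding produces, here is a small NumPy sketch of one-hot encoding. The helper `one_hot` is hypothetical and only mirrors what I expect `to_categorical` to return for integer labels; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9])
print(example[0])                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.argmax(example, axis=1))  # [3 0 9]
```

`np.argmax` along axis 1 reverses the encoding, which is how the class index is recovered from a one-hot row later on.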

In [None]:
#split data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size = 0.1, random_state = 2)
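
The idea behind the hold-out split can be sketched with plain NumPy. The helper `holdout_split` below is hypothetical: it shuffles indices with a fixed seed and cuts off 10%, but it will not reproduce the exact rows that `train_test_split` selects.

```python
import numpy as np

def holdout_split(X, y, test_size=0.1, seed=2):
    """Shuffle row indices once, then slice them into test and train parts."""
    n = len(X)
    n_test = int(np.ceil(n * test_size))
    order = np.random.default_rng(seed).permutation(n)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data where feature i always belongs to label i, so alignment is easy to check
X_demo = np.arange(20).reshape(20, 1)
y_demo = np.arange(20)
X_tr, X_te, y_tr, y_te = holdout_split(X_demo, y_demo)
print(len(X_tr), len(X_te))  # 18 2
```

Indexing `X` and `y` with the same index arrays is what keeps every image paired with its label after shuffling.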

In [None]:
print("X_train shape: ", X_train.shape)
print("X_test shape: ", X_test.shape)
print("y_train shape: ", y_train.shape)
print("y_test shape: ", y_test.shape)

In [None]:
nrows = 2
ncols = 2

pic_indexes = np.random.randint(X_train.shape[0], size=(nrows*ncols))

f, axs = plt.subplots(nrows, ncols, figsize=(12, 8))
axs = axs.flatten()

for picture_index, ax in zip(pic_indexes, axs):
    ax.imshow(X_train[picture_index].reshape(28, 28), cmap='gray')
    # y_train is one-hot encoded at this point, so argmax recovers the digit
    ax.set_title(f'Label: {np.argmax(y_train[picture_index])}')

plt.show()

In [None]:
g = plt.imshow(X_train[0][:,:,0])

In [None]:
y_train[0]

Run the cells below to train each model on the data.

# Double CNN

In [None]:
model = Sequential()

model.add(Conv2D(filters=32, kernel_size = (5,5), padding = 'Same', activation= 'relu', input_shape = (28,28,1)))
model.add(Conv2D(filters=32, kernel_size = (5,5), padding = 'Same', activation= 'relu'))
model.add(MaxPool2D(pool_size = (2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters=64, kernel_size = (3,3), padding = 'Same', activation= 'relu'))
model.add(Conv2D(filters=64, kernel_size = (3,3), padding = 'Same', activation= 'relu'))
model.add(MaxPool2D(pool_size = (2,2), strides = (2,2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation = 'softmax'))
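
As a sanity check on the architecture above, the layer sizes and parameter count can be worked out by hand. The helpers `conv2d_params` and `dense_params` are hypothetical names for the usual weights-plus-bias formulas; `model.summary()` should report the same figures.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights (kernel * kernel * in_ch per filter) plus one bias per filter."""
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    """One weight per input for each unit, plus one bias per unit."""
    return (in_units + 1) * out_units

size = 28                      # input is 28x28x1; 'same' padding keeps the size
params = conv2d_params(5, 1, 32) + conv2d_params(5, 32, 32)
size //= 2                     # first 2x2 max pool: 28 -> 14
params += conv2d_params(3, 32, 64) + conv2d_params(3, 64, 64)
size //= 2                     # second 2x2 max pool: 14 -> 7
flat = size * size * 64        # Flatten: 7 * 7 * 64
params += dense_params(flat, 256) + dense_params(256, 10)

print(flat, params)  # 3136 887530
```

Most of the parameters sit in the first dense layer (3136 inputs times 256 units), not in the convolutions.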

In [None]:
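# `lr` was renamed to `learning_rate`; `decay=0.0` was a no-op and is not accepted by newer Keras optimizers
optimizer = RMSprop(learning_rate = 0.001, rho = 0.9, epsilon=1e-08)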

In [None]:
model.summary()

In [None]:
model.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [None]:
learning_rate_reduction = ReduceLROnPlateau(monitor='val_loss', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)
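
To get a feel for how this callback behaves, here is a simplified pure-Python simulation. It is a sketch under assumptions, not the Keras code: `simulate_reduce_lr` is a hypothetical helper, cooldown is ignored, and the `min_delta` of `1e-4` is what I believe the callback's default to be.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.5, patience=3,
                       min_lr=0.00001, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:      # clear improvement: reset the counter
            best, wait = loss, 0
        else:                            # plateau: count the stalled epoch
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)   # halve, but never go below min_lr
                wait = 0
        history.append(lr)
    return history

# Loss improves once, then stalls for six epochs
losses = [1.0, 0.9] + [0.9] * 6
print(simulate_reduce_lr(losses))
```

With these settings the rate is halved after every three stalled epochs and stops shrinking once it reaches `min_lr`.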

In [None]:
epochs = 50 #model1 => 30
batch_size = 86

In [None]:
#Data Augmentation
datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # do not flip images horizontally
        vertical_flip=False)  # do not flip images vertically


datagen.fit(X_train)
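
To illustrate what a random shift does to a single image, here is a small NumPy sketch. It is a simplification: the hypothetical helper `shift_image` moves whole pixels and fills the gap with zeros, whereas `ImageDataGenerator` uses `fill_mode='nearest'` by default and also rotates and zooms.

```python
import numpy as np

def shift_image(img, dx, dy):
    """Shift a 2-D image dx pixels right and dy pixels down, filling gaps with zeros."""
    h, w = img.shape
    out = np.zeros_like(img)
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[10:18, 12:16] = 1.0                  # made-up blob standing in for a digit

max_shift = int(0.1 * 28)                  # 10% of the width, rounded down to 2 pixels
dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
shifted = shift_image(digit, int(dx), int(dy))
print(shifted.sum() == digit.sum())        # the blob stays fully inside the frame
```

Each epoch the network then sees slightly different versions of the same digits, which is the point of the augmentation.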

In [None]:
# Fit the model
history = model.fit(datagen.flow(X_train,y_train, batch_size=batch_size),
                              epochs = epochs, validation_data = (X_test,y_test),
                              verbose = 2, steps_per_epoch=X_train.shape[0] // batch_size
                              , callbacks=[learning_rate_reduction])
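
The `steps_per_epoch` value can be checked with a little arithmetic. The row count of 42,000 is an assumption about `train.csv`; compare it with the output of `df.shape` above.

```python
total_rows = 42000            # assumed size of train.csv; check df.shape
test_fraction = 0.1           # same value as test_size in the split
batch_size = 86

train_rows = total_rows - round(total_rows * test_fraction)
steps_per_epoch = train_rows // batch_size
leftover = train_rows - steps_per_epoch * batch_size

print(train_rows, steps_per_epoch, leftover)  # 37800 439 46
```

Integer division drops the partial batch, so under this assumption each epoch covers 439 full batches and 46 images fall outside them.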

In [None]:
# Confusion matrix 

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # Normalize first so the heatmap and the cell labels show the same values
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Predict class probabilities for the validation dataset
Y_pred = model.predict(X_test)
# Convert predicted probabilities to class indices
Y_pred_classes = np.argmax(Y_pred,axis = 1) 
# Convert one-hot validation labels to class indices
Y_true = np.argmax(y_test,axis = 1) 
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes) 
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(10)) 
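
For reference, the counting behind a confusion matrix is simple enough to sketch in NumPy. The helper `confusion_counts` and the toy labels are made up for illustration; the notebook itself relies on scikit-learn's `confusion_matrix`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes=10):
    """Count (true, predicted) pairs: rows are true classes, columns are predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true_demo = np.array([0, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 2, 2, 2, 1])
cm_demo = confusion_counts(y_true_demo, y_pred_demo, num_classes=3)

# Share of each true class that was predicted correctly
per_class_recall = cm_demo.diagonal() / cm_demo.sum(axis=1)
print(cm_demo)
print(per_class_recall)
```

Off-diagonal cells show which digits get mistaken for which, and that is usually the most useful part of the plot.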

In [None]:
score = model.evaluate(X_test, y_test, verbose = 0)
print("Test Accuracy: ",score[1])

# Triple CNN

In [None]:
def get_newtriplecnn():
    return Sequential([
        Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same', input_shape = (28,28,1)),
        Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        
        Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
        Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same' ),
        BatchNormalization(),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        
        Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same' ),
        Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same' ),
        BatchNormalization(),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        
        
        Flatten(),
          
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(0.4),
        
        Dense(64, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),
        
        Dense(10, activation = "softmax")
        
    ])

In [None]:
model = get_newtriplecnn()
model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
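
The feature-map sizes in this model can be traced by hand. This is a sketch that assumes the default `'valid'` padding of `MaxPool2D`, which floors odd sizes.

```python
size, channels = 28, 1
for filters in (32, 64, 128):
    channels = filters      # two 3x3 'same' convolutions keep height and width
    size = size // 2        # 2x2 max pooling halves the size, rounding down
    print(f"after block with {filters} filters: {size}x{size}x{channels}")

flatten_units = size * size * channels
print(flatten_units)  # 1152
```

The third pooling step takes 7x7 down to 3x3, so the first dense layer receives 1152 inputs.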

In [None]:
#adding callbacks
callbacks1 = [ 
    EarlyStopping(monitor = 'loss', patience = 6), 
    ReduceLROnPlateau(monitor = 'loss', patience = 3), 
    ModelCheckpoint('model.best.hdf5', save_best_only=True) # saving the best model
]
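
The early-stopping rule used here can be sketched in a few lines of plain Python. `early_stop_epoch` is a hypothetical helper that mimics the patience logic as I understand it (zero `min_delta`, no baseline); it is not the Keras implementation.

```python
def early_stop_epoch(losses, patience=6):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:          # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:                    # no improvement: one more epoch of waiting
            wait += 1
            if wait >= patience:
                return epoch
    return None

demo_losses = [0.5, 0.4, 0.3] + [0.3] * 10
print(early_stop_epoch(demo_losses))  # 8
```

In the demo the loss last improves at epoch 2, so six stalled epochs later (epoch 8) training ends.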

In [None]:
history = model.fit(datagen.flow(X_train,y_train, batch_size=batch_size), epochs = 50, 
                               steps_per_epoch = X_train.shape[0] // batch_size,
                               validation_data = (X_test, y_test),
                               callbacks = callbacks1,
                             )

# Additional mnist data

Here the full MNIST dataset (its training and test splits) is used as additional training data to increase accuracy.

In [None]:
def load_data(path):
    with np.load(path) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
        return (x_train, y_train), (x_test, y_test)

(x_train1, y_train1), (x_test1, y_test1) = load_data(r'Dataset/mnist.npz')

In [None]:
x_train1 = x_train1 / 255.0
x_test1 = x_test1 / 255.0

x_train1 = x_train1.reshape(-1, 28, 28, 1)
x_test1 = x_test1.reshape(-1, 28, 28, 1)

y_train1 = y_train1.reshape(y_train1.shape[0], 1)
y_test1 = y_test1.reshape(y_test1.shape[0], 1)

In [None]:
print(len(y_train1))
print(len(y_test1))

In [None]:
train = pd.read_csv(r"Dataset/train.csv")

X_train = train.drop(labels = ["label"], axis = 1)
y_train = train['label']
#y_train = to_categorical(y_train)

X_train /= 255.0
X_train = X_train.values.reshape(-1, 28, 28, 1)

#y_train = y_train.reshape(y_train.shape[0], 1)

In [None]:
y_train = y_train.to_numpy()

In [None]:
Add_X = np.vstack((x_train1, x_test1))

Add_y = np.vstack((y_train1, y_test1))

In [None]:
len(Add_y)

In [None]:
len(Add_X)

In [None]:
y_train = y_train.reshape(y_train.shape[0], 1)

In [None]:
len(y_train)

Append the original training data to the additional data.

In [None]:
Add_X = np.vstack((Add_X, X_train))

Add_y = np.vstack((Add_y, y_train))

In [None]:
Add_y = to_categorical(Add_y)

In [None]:
len(Add_y)

In [None]:
len(Add_X)
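
The stacking above can be checked on tiny stand-in arrays. This is a sketch: the arrays are made up, and the expected total assumes 60,000 + 10,000 MNIST images plus 42,000 rows in `train.csv`.

```python
import numpy as np

# Tiny stand-ins with the same rank as the real arrays (the real ones are far larger)
mnist_train_x = np.zeros((6, 28, 28, 1))
mnist_test_x = np.zeros((1, 28, 28, 1))
kaggle_x = np.zeros((4, 28, 28, 1))

# vstack concatenates along the first axis, i.e. it appends images
combined_x = np.vstack((mnist_train_x, mnist_test_x, kaggle_x))
print(combined_x.shape)  # (11, 28, 28, 1)

# Row count to expect from the two len() cells above, under the stated assumption
expected_rows = 60000 + 10000 + 42000
print(expected_rows)  # 112000
```

If the `len` outputs above differ from the expected total, one of the files was probably not loaded in full.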

In [None]:
#split data into training and testing
Add_X_train, Add_X_test, Add_y_train, Add_y_test = train_test_split(Add_X, Add_y, test_size = 0.1, random_state = 2)

In [None]:
add_train_aug = ImageDataGenerator(
        featurewise_center = False,
        samplewise_center = False,
        featurewise_std_normalization = False, 
        samplewise_std_normalization = False,
        zca_whitening = False,
        horizontal_flip = False,
        vertical_flip = False,
        fill_mode = 'nearest',
        rotation_range = 10,  
        zoom_range = 0.1, 
        width_shift_range = 0.1, 
        height_shift_range = 0.1)
        

add_train_aug.fit(Add_X_train)
add_train_gen = add_train_aug.flow(Add_X_train, Add_y_train, batch_size=batch_size)

In [None]:
add_callbacks = [ 
    EarlyStopping(monitor = 'loss', patience = 6), 
    ReduceLROnPlateau(monitor = 'loss', patience = 3), 
    ModelCheckpoint('additional_model.best.hdf5', save_best_only=True) # saving the best model
]

In [None]:
def get_addcnn():
    return Sequential([
        Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same', input_shape = (28,28,1)),
        Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'),
        BatchNormalization(),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        
        Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
        Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same' ),
        BatchNormalization(),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        
        Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same' ),
        Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same' ),
        BatchNormalization(),
        MaxPool2D(pool_size=(2, 2)),
        Dropout(0.25),
        
        
        Flatten(),
          
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(0.4),
        
        Dense(64, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),
        
        Dense(10, activation = "softmax")
        
    ])

In [None]:
model = get_addcnn()
model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.summary()

In [None]:
history = model.fit((add_train_gen), epochs = 50, 
                               steps_per_epoch = Add_X_train.shape[0] // batch_size,
                               validation_data = (Add_X_test, Add_y_test),
                               callbacks = add_callbacks,
                             )

In [None]:
model = load_model('additional_model.best.hdf5')

In [None]:
score = model.evaluate(Add_X_test, Add_y_test, verbose = 0)
print("Test Accuracy: ",score[1])

# Submission

In [None]:
result = model.predict(df2)

result = np.argmax(result, axis=1)

result = pd.Series(result, name='Label')

In [None]:
submission = pd.concat([pd.Series(range(1,28001),name = "ImageId"),result],axis = 1)

submission.to_csv("submission_cnn.csv",index=False)
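
The submission step boils down to taking the most likely class per image and numbering the rows from 1. Here is a minimal sketch with made-up probabilities and an in-memory buffer instead of a file; it uses the `csv` module rather than pandas, so it shows the same layout but not the notebook's exact code path.

```python
import csv
import io
import numpy as np

# Made-up softmax outputs for two images and three classes
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
labels = probs.argmax(axis=1)          # index of the highest probability per row

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ImageId", "Label"])
for image_id, label in enumerate(labels, start=1):   # ImageId starts at 1
    writer.writerow([image_id, int(label)])

print(buffer.getvalue())
```

The header names and the 1-based `ImageId` match what the `pd.concat` call above produces.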