<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">MNIST DIGIT RECOGNITION </h1><a id = "1" ></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='color:white; background:#0096FF ; border:0' role="tab" aria-controls="home"><center>Quick Navigation</center></h3>

* [Importing Libraries](#1.1)
* [Data Preprocessing](#2)
    - [Load Training Dataset](#2.1)
    - [Split Data](#2.2)
    - [Image Preprocessing](#2.3)
* [Data Visualization](#3)
* [Activation function](#4)
* [Model Summary](#5)
* [Data Augmentation](#6)
* [Save and Load Model](#6.2)
* [Plot Accuracy and Loss curve](#7)
* [Evaluation](#8)

**Objective**
* To recognize digits from the MNIST dataset.

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">1. IMPORTING LIBRARIES</h1><a id = "1.1" ></a>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn import decomposition

import tensorflow 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout,MaxPool2D, LSTM, BatchNormalization
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.layers import ELU
from tensorflow.keras.losses import sparse_categorical_crossentropy, categorical_crossentropy

import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<h1 style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">2. DATA PREPROCESSING</h1><a id = "2" ></a>

<h2  style='color:#0096FF; background:white ; border:0' class="list-group-item list-group-item-action active">2.1 LOAD TRAINING DATASET</h2><a id = "2.1" ></a>

In [None]:
loc = '../input/digit-recognizer'
train_data = pd.read_csv(loc+'/train.csv')

In [None]:
test_file = loc+"/test.csv"
test_data = pd.read_csv(test_file)

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Data Dimension</h3>

In [None]:
print(f"train.csv size is {train_data.shape}")
print(f"test.csv size is {test_data.shape}")

In [None]:
train_data.head()

In [None]:
train_data.describe()

**Check how many pixel columns contain only zeros, as the `describe()` output suggests.**

In [None]:
df = train_data.drop(columns=['label'])
# a pixel column is all-zero if every one of the 42000 training rows is 0
all_zero = (df == 0).all(axis=0)
all_zero.sum()

**Observation**:
* Of the 784 pixel columns, 76 are 0 for all 42,000 data points.
* For the sake of dimensionality reduction, these features could be removed.
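A minimal NumPy sketch of that reduction, assuming the pixels are held in a 2-D array of shape (n_samples, 784); `drop_all_zero_columns` is a hypothetical helper, not part of the notebook. Note that this only suits models that take flat vectors (such as the PCA below); the CNN needs all 784 pixels to rebuild the 28x28 grid.

```python
import numpy as np

def drop_all_zero_columns(X):
    """Return X without the columns that are zero in every row, plus the kept-column mask."""
    keep = ~(X == 0).all(axis=0)  # True for columns with at least one non-zero value
    return X[:, keep], keep

# Toy example: 3 samples x 4 "pixels"; columns 0 and 2 are always zero
X = np.array([[0, 5, 0, 1],
              [0, 0, 0, 2],
              [0, 7, 0, 0]])
X_reduced, keep = drop_all_zero_columns(X)
print(X_reduced.shape)  # (3, 2)
```

On the real data, the same `keep` mask computed on `train.csv` would have to be applied to `test.csv` so that both sets keep identical columns.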

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Missing values</h3>

In [None]:
train_data.isna().sum()

* No missing values

In [None]:
train_labels=train_data['label']

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">2.2 SPLIT DATA</h2><a id = "2.2" ></a>

In [None]:
img_rows, img_cols = 28, 28 # each image is 28 x 28 pixels
num_classes = 10 # 10 classes (digits 0-9)

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">2.3 PREPROCESSING</h2><a id = "2.3" ></a>

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">One hot Encoding</h3>

One-hot encoding converts categorical labels into a form a classifier can be trained on. Each category gets its own column; a sample has a 1 in the column of its category and 0 in all others, so each integer label is represented as a binary vector. For example, with 10 classes, label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. This format matches the 10-unit softmax output layer and the `categorical_crossentropy` loss used later.
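To make the mapping concrete, here is a NumPy-only sketch of what `tensorflow.keras.utils.to_categorical` produces for integer labels; `one_hot` is a hypothetical helper written for illustration, and `argmax` recovers the original label from each row.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer labels to rows of a (len(labels), num_classes) binary matrix."""
    labels = np.asarray(labels)
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0  # set exactly one column per row
    return out

encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded[0])              # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(encoded.argmax(axis=1))  # [3 0 9]
```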

In [None]:
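def data_prep(raw):
    # one-hot encode the labels: digit d -> length-10 vector with a 1 at index d
    out_y = tensorflow.keras.utils.to_categorical(raw.label, num_classes)

    num_images = raw.shape[0]
    # drop the label column (column 0) and keep the 784 pixel values
    x_as_array = raw.values[:,1:]
    # reshape each flat row into a 28 x 28 x 1 (single-channel) image
    x_shaped_array = x_as_array.reshape(num_images, img_rows, img_cols, 1)
    # normalization: scale pixel intensities from [0, 255] to [0, 1]
    out_x = x_shaped_array / 255
    return out_x, out_y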

x, y = data_prep(train_data)

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Data Split</h3>

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)#ratio 90:10
x_train, x_val, y_train, y_val= train_test_split(x_train, y_train, test_size = 1/9, random_state=42)

* First, the data is split into 90% training and 10% test data.
* Then 1/9 of that training portion is held out for validation, so the final split is 80% training, 10% validation and 10% test data.
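The two `train_test_split` calls chain their fractions. Exact arithmetic with Python's `fractions` module shows why `test_size=1/9` in the second call yields validation and test sets of equal size. The row counts assume the 42,000 rows of `train.csv`; since these fractions divide 42,000 evenly, scikit-learn's rounding should not change them.

```python
from fractions import Fraction

n_total = 42000                     # rows in train.csv
test_frac = Fraction(1, 10)         # first split: test_size=0.1
val_frac_of_rest = Fraction(1, 9)   # second split: test_size=1/9 of what remains

rest = 1 - test_frac
val_frac = rest * val_frac_of_rest  # 9/10 * 1/9 = 1/10
train_frac = rest - val_frac        # 9/10 - 1/10 = 8/10

for name, frac in [("train", train_frac), ("validation", val_frac), ("test", test_frac)]:
    print(f"{name:<10} {frac} of the data -> {int(frac * n_total)} rows")
```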

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Reshaping the Test Data to 4 Dimensions</h3>

In [None]:
# Normalization
test_data = test_data / 255  
# reshaping
test_data = test_data.values.reshape(-1,28,28,1)
test_data.shape
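`reshape` fills the 28 x 28 grid in row-major (C) order, so flat pixel column `k` lands at row `k // 28`, column `k % 28`. A small NumPy check using a synthetic row (each value equals its own column index) instead of real data:

```python
import numpy as np

flat = np.arange(784)              # stand-in for one row: value == column index
img = flat.reshape(-1, 28, 28, 1)  # same call as above -> shape (1, 28, 28, 1)

k = 100                            # arbitrary flat pixel index
row, col = divmod(k, 28)           # row-major position of that pixel
print(img.shape, row, col, img[0, row, col, 0])  # (1, 28, 28, 1) 3 16 100
```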

<h3  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">Data Dimension</h3>

In [None]:
print(f"Training data size is {x_train.shape}")
print(f"Training label size is {y_train.shape}")
print(f"Validation data size is {x_val.shape}")
print(f"Validation label size is {y_val.shape}")
print(f"Testing data size is {x_test.shape}")
print(f"Testing label size is {y_test.shape}")

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">3. VISUALIZE DATA</h1><a id = "3" ></a>

In [None]:
# labels of training images 1-9, decoded from their one-hot vectors, used as plot titles
title = np.argmax(y_train[1:10], axis=1)
title

In [None]:
sns.countplot(x=train_labels)
plt.title('Count plot of the MNIST Labels')
print(list(train_data.label.value_counts().sort_index()))

* The count plot shows that label 1 is the most frequent class (4,684 data points) and label 5 the least frequent (3,795 data points).

In [None]:
plt.figure(figsize=(7,9))
for i in range(1, 10):
    plt.subplot(330 + i)
    # squeeze the trailing channel axis: (28, 28, 1) -> (28, 28) for imshow
    plt.imshow(x_train[i].squeeze(), cmap='gray')
    plt.title(title[i-1])
    
plt.tight_layout()
plt.show()

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">4. ACTIVATION FUNCTION</h1><a id = "4" ></a>


- **LeakyReLU**

In [None]:
alpha = 0.01
def LeakyReLu(z, alpha=alpha):
    # identity for z >= 0, small slope alpha for z < 0 (works on scalars and NumPy arrays)
    return np.where(z >= 0, z, alpha * z)

In [None]:
leakyrelu = tensorflow.keras.layers.LeakyReLU(alpha=0.01)
# use a separate float array so the training images stored in `x` are not overwritten
z = np.arange(-3, 4, dtype='float32')
plt.plot(z, leakyrelu(z).numpy())
plt.title('LeakyReLU function')
plt.ylabel('LeakyReLU')
plt.xlabel('x')
plt.grid(True)
plt.show()

**Note**
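* LeakyReLU is the identity for x ≥ 0. For x < 0 it is a line with a small slope alpha (0.01 here), so negative inputs give small negative outputs instead of being clamped to 0 as in ReLU. This keeps a non-zero gradient for negative inputs.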
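The practical difference from ReLU is in the gradient: ReLU's derivative is 0 for every negative input, while LeakyReLU's is `alpha`. A NumPy sketch of the function and both derivatives; the helper names are illustrative, and the value of the derivative at exactly 0 is a convention.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z >= 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    # derivative: 1 for z >= 0 (taking 1 at z == 0 by convention), alpha for z < 0
    return np.where(z >= 0, 1.0, alpha)

z = np.array([-3.0, -1.0, 0.0, 2.0])
relu_grad = np.where(z > 0, 1.0, 0.0)  # ReLU for comparison: no gradient for z <= 0

print(leaky_relu(z))
print(leaky_relu_grad(z))
print(relu_grad)
```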

* **Swish** <br>
Mathematical formula, where $\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid: <br>
$$y = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

In [None]:
z = np.arange(-8, 5, dtype='float32')
swish = z / (1 + np.exp(-z))  # x * sigmoid(x)
print(swish)
plt.plot(z, swish)
plt.title('Swish function')
plt.ylabel('Swish')
plt.xlabel('x')
plt.grid(True)
plt.show()

**Note**
* For large positive x, Swish approaches ReLU (y ≈ x).
* For negative x it is not clamped to 0: it dips to a small negative minimum (about -0.28 near x ≈ -1.28) and then tends to 0 as x → -∞. Unlike ReLU, it is smooth and non-monotonic.

For more details, see ["Why Swish could perform better than ReLU"](https://github.com/christianversloot/machine-learning-articles/blob/main/why-swish-could-perform-better-than-relu.md)
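For negative inputs Swish dips below zero before returning toward 0. A dense grid locates that minimum numerically; this is a quick NumPy check, not an analytic derivation.

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))    # x * sigmoid(x)

xs = np.linspace(-6.0, 6.0, 120001)  # fine grid, step 1e-4
ys = swish(xs)
i = ys.argmin()
print(f"minimum {ys[i]:.4f} at x = {xs[i]:.4f}")
print(f"swish(6) = {swish(6.0):.4f}, relu(6) = 6")  # already close to ReLU
```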

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">5. MODEL SUMMARY</h1><a id = "5" ></a>

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">5.1 PCA</h2><a id = "5.1" ></a>

<h2  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">5.1.1 Standardize the data</h2><a id = "5.1" ></a>


In [None]:
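from sklearn.preprocessing import StandardScaler

# standardize the pixel columns only; the label must not leak into the PCA features
standardized_scalar = StandardScaler()
standardized_data = standardized_scalar.fit_transform(train_data.drop(columns=['label']))
standardized_data.shape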
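Standardization rescales each pixel column to zero mean and unit variance, z = (x - mean) / std. A NumPy sketch of that computation, assuming (as we understand scikit-learn's `StandardScaler` to do) that constant columns such as the all-zero pixels get a scale of 1, so they map to 0 instead of causing a division by zero; `standardize` is a hypothetical helper.

```python
import numpy as np

def standardize(X):
    """Column-wise z-scores: (x - mean) / std, leaving zero-variance columns at 0."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population std (ddof=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

X = np.array([[0.0, 1.0, 10.0],
              [0.0, 3.0, 20.0],
              [0.0, 5.0, 30.0]])  # column 0 mimics an all-zero pixel
Z = standardize(X)
print(Z.mean(axis=0), Z.std(axis=0))
```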

<h2  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">5.1.2 Apply PCA</h2><a id = "5.1.2" ></a>

In [None]:
pca = decomposition.PCA(n_components=2)  # keep the first two principal components
pca_data = pca.fit_transform(standardized_data)
print(pca_data.shape)
pca_data
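Under the hood, PCA centres the data and projects it onto the directions of largest variance. A NumPy sketch using the singular value decomposition, run on synthetic data rather than MNIST. scikit-learn's `PCA` is built on the same SVD idea but adds options (solvers, whitening, sign conventions) that this sketch omits; `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(Z, n_components=2):
    """Project rows of Z onto its top principal components via SVD."""
    Zc = Z - Z.mean(axis=0)                  # PCA works on centred data
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    components = Vt[:n_components]           # principal axes, one per row
    explained = (S ** 2) / (S ** 2).sum()    # share of variance per component
    return Zc @ components.T, explained[:n_components]

rng = np.random.default_rng(0)
# synthetic data: strong variance along one direction, weak noise elsewhere
base = rng.normal(size=(500, 1))
Z = np.hstack([base * 5,
               base * 5 + rng.normal(scale=0.1, size=(500, 1)),
               rng.normal(scale=0.5, size=(500, 3))])
proj, ratio = pca_2d(Z)
print(proj.shape, ratio.round(3))
```

The first component captures almost all of the variance here because two columns move together; on MNIST the first two components capture far less, which is why the 2-D plot shows overlapping classes.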

In [None]:
pca_data = np.vstack((pca_data.T, train_labels)).T
pca_data = pd.DataFrame(pca_data, columns=["f1", "f2", "labels"])
pca_data

In [None]:
sns.FacetGrid(pca_data, hue="labels", height=6, palette='Set2').map(plt.scatter, "f1", "f2").add_legend()
plt.title("Plotting PCA of Reduced 2D dataset ")
plt.show()

**Note**
* In the PCA plot, points with labels 1, 9 and 7 lie close to each other.
* Similarly, points with labels 3 and 8 are close to each other.
* Points with labels 5 and 6 overlap.

<h2  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">5.1.3 Model</h2><a id = "5.3" ></a>

* **Model 1**
  - activation = LeakyReLU (alpha = 0.01)

In [None]:
model = Sequential()

model.add(Conv2D(64, kernel_size=3, padding='same',  activation = leakyrelu,input_shape=(img_rows, img_cols, 1)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=3, padding = 'same', activation = leakyrelu))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(128, kernel_size=3, padding = 'same',activation = leakyrelu))
model.add(BatchNormalization())
model.add(Conv2D(128, kernel_size=3, padding = 'same',activation = leakyrelu))

model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(64, kernel_size=3, padding = 'same',activation = leakyrelu))
model.add(BatchNormalization())
model.add(Dropout(0.4))

model.add(Flatten())

model.add(Dense(256,activation='swish'))
model.add(BatchNormalization())
model.add(Dropout(0.4))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

# Metrics of above model
# Model Accuracy = 0.99564 for activation = leakyrelu
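The parameter counts reported by `model.summary()` follow from simple formulas. A plain-Python sketch for some of Model 1's layers, assuming 3x3 kernels with biases and `padding='same'` as in the code above; the helper names are illustrative.

```python
def conv2d_params(kernel, in_channels, filters):
    # each filter has kernel*kernel*in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    return inputs * units + units  # weights + biases

def batchnorm_params(channels):
    return 4 * channels  # gamma, beta (trainable) + moving mean, moving variance

# 'same' padding keeps 28x28; each MaxPool2D((2,2)) halves it: 28 -> 14 -> 7
flat_features = 7 * 7 * 64  # output of the last Conv2D(64) before Flatten

print(conv2d_params(3, 1, 64))           # first Conv2D: 640
print(conv2d_params(3, 64, 64))          # second Conv2D: 36928
print(batchnorm_params(64))              # BatchNormalization(64): 256
print(dense_params(flat_features, 256))  # Dense(256): 803072
print(dense_params(256, 10))             # output Dense(10): 2570
```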

* **Model 2**
  - activation = 'relu'

In [None]:
model2 = Sequential()

model2.add(Conv2D(64, kernel_size=(5, 5), padding='same',  activation = 'relu',input_shape=(img_rows, img_cols, 1)))
model2.add(BatchNormalization())
model2.add(Conv2D(64, kernel_size=(5, 5), padding = 'same', activation = 'relu'))
model2.add(BatchNormalization())
model2.add(MaxPool2D(pool_size=(2,2)))
model2.add(Dropout(0.25))

model2.add(Conv2D(128, kernel_size=(3,3), padding = 'same',activation = 'relu'))
model2.add(BatchNormalization())
model2.add(Conv2D(128, kernel_size=(3, 3), padding = 'same',activation = 'relu'))

model2.add(BatchNormalization())
model2.add(MaxPool2D(pool_size=(2,2)))
model2.add(Dropout(0.25))

model2.add(Conv2D(64, kernel_size=(3, 3), padding = 'same',activation = 'relu'))
model2.add(BatchNormalization())
model2.add(Dropout(0.25))

model2.add(Flatten())

model2.add(Dense(256, activation = 'swish'))
model2.add(BatchNormalization())
model2.add(Dropout(0.25))
model2.add(Dense(num_classes, activation='softmax'))

model2.summary()
# Model 2
# Model Accuracy = 0.99485 for activation = relu

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model.png', show_shapes=True)
from IPython.display import Image
Image("model.png")

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">6. DATA AUGMENTATION</h1><a id = "6" ></a>

One way to reduce overfitting and improve accuracy is to increase the variability of the existing samples, which also helps compensate for a lack of data. Data augmentation generates new samples from existing ones by applying random transformations (here: small rotations, zooms and shifts) to the original images. The goal is to expose the model to more distinct inputs, which can improve its accuracy on the validation dataset.
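To show what one of these transformations does, here is a NumPy sketch of a random translation comparable in spirit to `width_shift_range=0.1` / `height_shift_range=0.1` (up to 10% of 28 px, i.e. 2 px). Both helpers are hypothetical: they use integer offsets and fill exposed pixels with 0, whereas `ImageDataGenerator` may use non-integer offsets with interpolation and defaults to `fill_mode='nearest'`.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels, filling uncovered pixels with 0."""
    out = np.zeros_like(img)
    h, w = img.shape
    # source and destination slices for the region that stays inside the frame
    src_y, dst_y = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    src_x, dst_x = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def random_shift(img, max_frac=0.1, rng=None):
    """Shift by a random integer offset of up to max_frac of the image size."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    max_dy, max_dx = int(h * max_frac), int(w * max_frac)  # 2 px for 28x28
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(img, dy, dx)

digit = np.zeros((28, 28))
digit[10:18, 12:16] = 1.0  # toy "stroke"
augmented = random_shift(digit, rng=np.random.default_rng(42))
print(augmented.sum() == digit.sum())  # True: a 2-px shift keeps this stroke in the frame
```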

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
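# use data augmentation to improve accuracy and prevent overfitting
augs_gen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False, 
        featurewise_std_normalization=False,  
        samplewise_std_normalization=False,  
        zca_whitening=False,  
        rotation_range=10,         # random rotations of up to 10 degrees
        zoom_range = 0.1,          # random zoom of up to 10%
        width_shift_range=0.1,     # random horizontal shifts of up to 10% of the width
        height_shift_range=0.1,    # random vertical shifts of up to 10% of the height
        horizontal_flip=False,     # no flips: mirrored digits are not realistic handwriting
        vertical_flip=False) 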

train_generator = augs_gen.flow(x_train, y_train, batch_size=300)


<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active"> Callbacks</h2><a id = "6.1" ></a>

<h2  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">6.1 EarlyStopping</h2><a id = "6.1" ></a>

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
early_stop = EarlyStopping(monitor='val_loss', min_delta=0.00001, patience=6, mode='auto', restore_best_weights=True)
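With `patience=6`, training stops once validation loss has failed to improve by more than `min_delta` for six consecutive epochs. A plain-Python sketch of that rule on a made-up loss sequence; it is a simplification for illustration, not the Keras source, and ignores options such as `baseline`. With `restore_best_weights=True`, Keras would then restore the weights from the best epoch.

```python
def early_stopping_epoch(val_losses, patience=6, min_delta=1e-5):
    """Simplified EarlyStopping(monitor='val_loss', mode='min').

    Returns (stop_epoch, best_epoch), both 0-based; stop_epoch is None if
    training would run through all epochs.
    """
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # counts as an improvement
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:     # `patience` epochs without improvement
                return epoch, best_epoch
    return None, best_epoch

losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.15, 0.18, 0.19, 0.20, 0.21]
print(early_stopping_epoch(losses))  # (8, 2)
```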

<h2  style='color:#0096FF; background:#FFF ' class="list-group-item list-group-item-action active">6.2 Learning Rate Reduction</h2><a id = "6.2" ></a>

In [None]:
lr_reduction = ReduceLROnPlateau(monitor='val_loss',patience=4, verbose=1,  factor=0.6, min_lr=0.0001)
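Each time validation loss plateaus for `patience=4` epochs, the learning rate is multiplied by `factor=0.6`, floored at `min_lr`. Assuming Adam's default starting rate of 0.001 (the notebook passes `optimizer='adam'` without setting one), the floor is reached after five reductions. A plain-Python sketch of that sequence, ignoring the callback's `cooldown` and `min_delta` details; `lr_schedule` is a hypothetical helper.

```python
def lr_schedule(initial_lr=0.001, factor=0.6, min_lr=0.0001, n_reductions=6):
    """Learning rates after each successive ReduceLROnPlateau trigger."""
    lrs = [initial_lr]
    for _ in range(n_reductions):
        lrs.append(max(lrs[-1] * factor, min_lr))  # never go below min_lr
    return lrs

print([round(lr, 7) for lr in lr_schedule()])
```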

In [None]:
epochs = 40
# model = model2  # uncomment to train Model 2 (ReLU) instead of Model 1

In [None]:
model.compile(optimizer = 'adam',loss = categorical_crossentropy,metrics=['accuracy'])
# batch_size is set in augs_gen.flow(); it should not be passed again when fitting on a generator
model_fit = model.fit(train_generator, epochs=epochs, validation_data=(x_val, y_val), verbose=1, callbacks=[early_stop, lr_reduction])

<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">SAVE AND LOAD MODEL</h2><a id = "6.2" ></a>

In [None]:
!mkdir -p saved_model
model.save('saved_model/my_model')

In [None]:
loaded_model = tensorflow.keras.models.load_model('saved_model/my_model')
loaded_model.summary()

<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">7. PLOT ACCURACY AND LOSS CURVE</h1><a id = "7" ></a>

In [None]:
# one figure with two side-by-side subplots, kept in a single cell so both curves render together
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(20, 7))

# Accuracy subplot
ax_acc.plot(model_fit.epoch, model_fit.history['accuracy'], label="training accuracy")
ax_acc.plot(model_fit.epoch, model_fit.history['val_accuracy'], label="validation accuracy")
ax_acc.set_title("Accuracy Curve", fontsize=18)
ax_acc.set_xlabel("Epochs", fontsize=15)
ax_acc.set_ylabel("Accuracy", fontsize=15)
ax_acc.grid(alpha=0.3)
ax_acc.legend()

# Loss subplot
ax_loss.plot(model_fit.epoch, model_fit.history['loss'], label="training loss")
ax_loss.plot(model_fit.epoch, model_fit.history['val_loss'], label="validation loss")
ax_loss.set_title("Loss Curve", fontsize=18)
ax_loss.set_xlabel("Epochs", fontsize=15)
ax_loss.set_ylabel("Loss", fontsize=15)
ax_loss.grid(alpha=0.3)
ax_loss.legend()

plt.show()


<h1  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">8. EVALUATION</h1><a id = "8" ></a>

In [None]:
evaluate_test = model.evaluate(x_test, y_test, verbose=1)

print("\nAccuracy =", "{:.7f}%".format(evaluate_test[1]*100))
print("Loss     =" ,"{:.9f}".format(evaluate_test[0]))

In [None]:
y_predict = model.predict(x_test)

In [None]:
y_predict_max = np.argmax(y_predict,axis=1) 
y_predict_max


<h2  style='color:white; background:#0096FF ; border:0;text-align: center' class="list-group-item list-group-item-action active">8.1 SUBMISSION</h2><a id = "8.1" ></a>

In [None]:
submission_label = np.argmax(model.predict(test_data), axis=1)
submission_label = pd.Series(submission_label, name="Label")

# ImageId starts at 1
image_id = pd.Series(range(1, len(test_data) + 1), name="ImageId")

In [None]:
submission = pd.concat([image_id,submission_label],axis = 1)
submission.to_csv("submission.csv", index=False)
pd.read_csv("submission.csv").head()

If you like my work, please do upvote. Thanks - `@tejasurya`