<br>
<h2 style = "font-size:40px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> Digit Recognizer</h2> 
<br>

<a id = '0'></a>
<h2 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spacing: 3px; background-color: #007580; color : #fed049; border-radius: 5px 5px; text-align:center; font-weight: bold" >Table of Contents</h2> 

1. [Overview](#1.0)
2. [Import the necessary libraries](#2.0)
3. [Data Collection](#3.0)
4. [Model Building and Validation](#4.0)
5. [Build Submission File](#5.0)
6. [Summary](#6.0)

<a id = '1.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 1. Overview </h2> 

<p style = "font-size:20px; color: #007580 "><strong> MNIST Dataset </strong></p> 

The MNIST database contains 60,000 training images and 10,000 test images of handwritten digits, collected from American Census Bureau employees and American high school students. It is one of the most widely used datasets for image classification and is available from many sources. TensorFlow and Keras, for example, can download it directly through their APIs.

<a id = '2.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 2. Import the necessary libraries </h2> 

In [None]:
import tensorflow
tensorflow.__version__

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, Dropout, MaxPooling2D, BatchNormalization

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras import regularizers, optimizers

In [None]:
# Seed the random number generators for more repeatable runs
import random
random.seed(0)
np.random.seed(0)
tensorflow.random.set_seed(0)

# Suppress display of warnings
import warnings
warnings.filterwarnings('ignore')

# Display all dataframe columns
pd.options.display.max_columns = None

# Show floats with 7 decimal places
pd.options.display.float_format = '{:.7f}'.format

# Display all dataframe rows
pd.options.display.max_rows = None

<a id = '3.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 3. Data Collection </h2> 

<p style = "font-size:20px; color: #007580 "><strong> Let's load MNIST dataset </strong></p> 

In [None]:
train = pd.read_csv('../input/digit-recognizer/train.csv')

In [None]:
# Get top 5 rows
train.head()

In [None]:
test = pd.read_csv('../input/digit-recognizer/test.csv')

In [None]:
# Get top 5 rows
test.head()

In [None]:
# Extract features
features = train.drop('label', axis=1)

# Extract label
y_train = train['label']

In [None]:
# Train images
X_ = np.array(features)
X_train = X_.reshape(X_.shape[0], 28, 28)

# Test images
X_test = np.array(test)

<p style = "font-size:20px; color: #007580 "><strong> Shape of the data </strong></p> 

In [None]:
print("Shape of train images = {} and shape of test images = {}".format(X_train.shape, X_test.shape))

<p style = "font-size:20px; color: #007580 "><strong> Let's visualize some numbers using matplotlib </strong></p> 

In [None]:
fig = plt.figure(figsize=(10,5))

for i in range(16):
    fig.add_subplot(4, 4, i+1)
    plt.imshow(X_train[i], cmap='gray')
    plt.xticks([])
    plt.yticks([])
    plt.title('Digit: ' + str(y_train[i]))

# Adjust the spacing once, after all subplots are drawn
plt.tight_layout()

In [None]:
# Check the class distribution of the labels (as proportions)
y_train.value_counts(normalize=True)

In [None]:
len(y_train.value_counts())

<p style = "font-size:20px; color: #007580 "><strong> Reshape train and test sets into compatible shapes </strong></p> 

- Conv2D layers in tensorflow.keras expect input in the format (n_e, n_h, n_w, n_c)
- n_e = number of examples, n_h = height, n_w = width, n_c = number of channels
- do not reshape the labels
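
To make the (n_e, n_h, n_w, n_c) layout concrete, here is a small sketch on made-up data. The array `flat` and its size are hypothetical stand-ins for the flattened CSV pixels, not the competition data.

```python
import numpy as np

# Hypothetical stand-in for the CSV pixels: 5 images, 784 values each
flat = (np.arange(5 * 784) % 256).reshape(5, 784)

# Add the height, width and channel axes expected by the convolutional layers
images = flat.reshape(flat.shape[0], 28, 28, 1)

# Pixel k of row i ends up at images[i, k // 28, k % 28, 0]
print(images.shape)
```

The reshape only changes how the values are indexed; no pixel value is modified.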

In [None]:
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

<p style = "font-size:20px; color: #007580 "><strong> Normalize data </strong></p> 

- scaling the inputs to a small range usually helps neural networks train faster and more stably
- we can achieve this by dividing the grayscale pixel values by 255 (the maximum pixel value minus the minimum, 255 - 0)
- normalize X_train and X_test
- convert the values to float first so that the division keeps the decimals
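
A quick sketch of why the float conversion matters, using a few made-up pixel values (the array `pixels` is hypothetical):

```python
import numpy as np

pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)  # hypothetical pixel values

# Integer division throws away everything below the maximum value
floored = pixels // 255

# Converting to float first keeps the intermediate values
scaled = pixels.astype("float32") / 255

print(floored)
print(scaled.min(), scaled.max())
```

After scaling, every value lies between 0 and 1, which is what the max/min check later in this section confirms for the real data.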

In [None]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train /= 255
X_test /= 255

<p style = "font-size:20px; color: #007580 "><strong> Print shape of data and number of images </strong></p> 

- print shape of X_train
- print number of images in X_train
- print number of images in X_test

In [None]:
print("X_train shape:", X_train.shape)
print("Images in X_train:", X_train.shape[0])
print("Images in X_test:", X_test.shape[0])
print("Max value in X_train:", X_train.max())
print("Min value in X_train:", X_train.min())

<p style = "font-size:20px; color: #007580 "><strong> One-hot encode the class vector </strong></p> 

- convert class vectors (integers) to a binary class matrix
- convert y_train only; the test set used here has no labels
- number of classes: 10
- we are doing this to use categorical_crossentropy as the loss
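
The encoding itself is simple enough to sketch in plain NumPy. This is only an illustration of the kind of output `to_categorical` produces, using made-up labels; it is not the Keras implementation.

```python
import numpy as np

labels = np.array([1, 0, 4, 9])  # hypothetical digit labels

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(10, dtype="float32")[labels]

print(one_hot.shape)
print(one_hot[2])
```

Each row has a single 1 at the index of its class, so `np.argmax` along axis 1 recovers the original labels.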

In [None]:
y_train = to_categorical(y_train, num_classes=10)

print("Shape of y_train:", y_train.shape)
print("One value of y_train:", y_train[0])

<a id = '4.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 4. Model Building and Validation </h2> 

<p style = "font-size:20px; color: #007580 "><strong> Initialize a sequential model </strong></p> 

- define a sequential model
- add 2 convolutional layers
    - number of filters: 32
    - kernel size: 3x3
    - activation: "relu"
    - input shape: (28, 28, 1) for first layer
- flatten the data
    - add a Flatten layer
    - the Flatten layer turns the multi-dimensional feature maps into a 1D vector before the fully connected layers
- add 2 dense layers
    - number of neurons in first layer: 128
    - number of neurons in last layer: number of classes
    - activation function in first layer: relu
    - activation function in last layer: softmax
    - we may experiment with any number of neurons for the first Dense layer; however, the final Dense layer must have neurons equal to the number of output classes
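
To get a feel for the size of this network, the sketch below works out the feature-map size and parameter counts by hand. It assumes the Keras Conv2D defaults (stride 1, no padding); the helper `conv_out` is a hypothetical name introduced for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output size of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

# Two 3x3 convolutions shrink each side: 28 -> 26 -> 24
side = conv_out(conv_out(28, 3), 3)

# Flatten: height * width * number of filters
flat_units = side * side * 32

# Weights plus biases per layer
params = {
    "conv1": 32 * (3 * 3 * 1 + 1),
    "conv2": 32 * (3 * 3 * 32 + 1),
    "dense1": flat_units * 128 + 128,
    "dense2": 128 * 10 + 10,
}
total = sum(params.values())

print(side, flat_units, total)
```

Almost all of the parameters sit in the first dense layer, because the flattened feature maps are large when no pooling is applied.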

In [None]:
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)))
model.add(Conv2D(filters=32, kernel_size=3, activation="relu"))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))

<p style = "font-size:20px; color: #007580 "><strong> Compile and fit the model </strong></p> 

- let's compile our model
    - loss: "categorical_crossentropy"
    - metrics: "accuracy"
    - optimizer: "adam"
- the next step is to fit the model
    - give train data - training features and labels
    - batch size: 32
    - epochs: 10
    - hold out 30% of the training data for validation (validation_split = 0.3)
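
According to the Keras documentation, `validation_split` takes the validation samples from the end of the arrays, before any shuffling. The sketch below imitates that on a toy array; the exact index arithmetic is an assumption made for illustration and may differ in detail from what Keras does internally.

```python
import numpy as np

n_samples = 10
X_demo = np.arange(n_samples)  # hypothetical stand-in for the training rows
validation_split = 0.3

# Everything before split_at is used for fitting, the tail for validation
split_at = int(n_samples * (1.0 - validation_split))
X_fit, X_val = X_demo[:split_at], X_demo[split_at:]

print(len(X_fit), len(X_val))
```

Because the held-out part is always the tail, it is worth making sure the rows are not sorted by label before relying on this option.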

In [None]:
# Compile the model
model.compile(loss="categorical_crossentropy", metrics=["accuracy"], optimizer="adam")

# Fit the model
model.fit( x=X_train, y=y_train, batch_size=32, epochs=10, validation_split = 0.3)

<p style = "font-size:20px; color: #007580 "><strong> Vanilla CNN + Pooling + Dropout </strong></p> 

- define a sequential model
- add 2 convolutional layers
    - number of filters: 32
    - kernel size: 3x3
    - activation: "relu"
    - input shape: (28, 28, 1) for first layer
- add a batch normalization layer after each convolutional layer, after the pooling layer and after the first dense layer
- add a max pooling layer of size 2x2
- add a dropout layer
    - dropout layers fight overfitting by randomly ignoring some of the neurons during training
    - use dropout rate 0.2
- flatten the data
    - add a Flatten layer
    - the Flatten layer turns the multi-dimensional feature maps into a 1D vector before the fully connected layers
- add 2 dense layers
    - number of neurons in first layer: 128
    - number of neurons in last layer: number of classes
    - activation function in first layer: relu
    - activation function in last layer: softmax
    - we may experiment with any number of neurons for the first Dense layer; however, the final Dense layer must have neurons equal to the number of output classes
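
As a rough illustration of what the pooling and dropout layers do, here is a NumPy sketch. The functions `max_pool_2x2` and `dropout` are hypothetical helpers written for this example, not the Keras layers, and they only handle simple 2D and 1D inputs.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def dropout(activations, rate, rng):
    """Inverted dropout: drop each unit with probability `rate`, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

# Each 2x2 block of the 4x4 map is reduced to its largest value
fmap = np.arange(16, dtype="float32").reshape(4, 4)
pooled = max_pool_2x2(fmap)

# With rate 0.2, roughly one unit in five is zeroed during training
rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000, dtype="float32"), rate=0.2, rng=rng)

print(pooled)
print((dropped == 0).mean())
```

Pooling halves each spatial side, which also shrinks the first dense layer considerably. The rescaling in the dropout sketch keeps the expected activation unchanged, so nothing needs to be adjusted at prediction time.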

In [None]:
# Initialize the model
model = Sequential()

# Add a Convolutional Layer with 32 filters of size 3X3 and activation function as 'relu' 
model.add(Conv2D(filters=32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)))

model.add(BatchNormalization())

# Add a Convolutional Layer with 32 filters of size 3X3 and activation function as 'relu' 
model.add(Conv2D(filters=32, kernel_size=3, activation="relu"))

model.add(BatchNormalization())

# Add a MaxPooling Layer of size 2X2 
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(BatchNormalization())

# Apply Dropout with 0.2 probability 
model.add(Dropout(rate=0.2))

# Flatten the layer
model.add(Flatten())

# Add Fully Connected Layer with 128 units and activation function as 'relu'
model.add(Dense(128, activation="relu"))

model.add(BatchNormalization())

#Add Fully Connected Layer with 10 units and activation function as 'softmax'
model.add(Dense(10, activation="softmax"))

<p style = "font-size:20px; color: #007580 "><strong> Compile and fit the model </strong></p> 

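- let's compile our model
    - loss: "categorical_crossentropy"
    - metrics: "accuracy"
    - optimizer: SGD with learning rate 0.02 and momentum 0.9
- use the EarlyStopping and ReduceLROnPlateau callbacks, both monitoring val_loss
- the next step is to fit the model
    - give train data - training features and labels
    - batch size: 16
    - epochs: up to 100 (EarlyStopping may end training sooner)
    - hold out 30% of the training data for validation (validation_split = 0.3)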
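
The idea behind early stopping with a patience value can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras callback itself; the function `early_stopping_epoch` and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            # No improvement: stop once patience is used up
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [0.30, 0.20, 0.21, 0.22, 0.19, 0.25, 0.26, 0.27]  # hypothetical val_loss values
print(early_stopping_epoch(losses, patience=2))
print(early_stopping_epoch(losses, patience=3))
```

A small patience stops on the first short plateau, while a larger one lets training continue and find the later, better epoch. With restore_best_weights=True, the model goes back to the weights of the best epoch rather than keeping those of the last one.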

In [None]:
# Optimizer (`decay` is accepted by older TF 2.x optimizers; newer Keras
# releases expect a learning-rate schedule instead)
sgd = optimizers.SGD(learning_rate=2e-2, decay=1e-6, momentum=0.9)

# Compile the model
model.compile(loss="categorical_crossentropy", metrics=["accuracy"], optimizer=sgd)

# Adding callbacks
es = EarlyStopping(monitor='val_loss', mode = 'min', patience=10, min_delta=1E-4, restore_best_weights=True)
rlrp = ReduceLROnPlateau(monitor='val_loss', factor=0.0001, patience=10, min_delta=1E-4)

callbacks = [es, rlrp]

# Fit the model (pass the callback list directly, not nested in another list)
training_history = model.fit(x=X_train, y=y_train, batch_size=16, epochs=100, validation_split = 0.3, callbacks=callbacks)

In [None]:
# Predict on Test set
preds = np.argmax(model.predict(X_test), axis=1)

<a id = '5.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 5. Build Submission File </h2> 

In [None]:
submission = pd.read_csv('../input/digit-recognizer/sample_submission.csv')

In [None]:
# Get the dimensions
submission.shape

In [None]:
submission['Label'] = preds
submission.to_csv('submission.csv',index=False)

submission.head()
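
If the sample submission file is not at hand, the same table can be built directly from the predictions. This is a sketch under assumptions: the `Label` column matches the one used above, while the `ImageId` column name and its 1-based numbering are assumed from the competition's sample file, and `preds_demo` is a made-up stand-in for `preds`.

```python
import numpy as np
import pandas as pd

preds_demo = np.array([2, 0, 9, 4])  # hypothetical predicted digits

# One row per test image, numbered from 1 (assumed ImageId convention)
submission_demo = pd.DataFrame({
    "ImageId": np.arange(1, len(preds_demo) + 1),
    "Label": preds_demo,
})

print(submission_demo)
```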

<a id = '6.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 6. Summary </h2> 
<br>
<br>
<strong>What happened so far?</strong>

<ol>
<li>Loaded the MNIST dataset into a dataframe.</li>
<li>Visualized some digits using matplotlib.</li>
<li>Reshaped the train and test sets into compatible shapes.</li>
<li>One-hot encoded the class vector.</li>
<li>Built and validated a sequential model with a few Conv2D and dense layers.</li>
<li>Built and validated a sequential model with a vanilla CNN plus pooling and dropout layers.</li>
<li>Built the submission file.</li>
</ol>
   
<br>
   
<p style = "font-size:30px; color: #007580 "><strong> Thanks for reading. We can try adding more layers and tuning a few hyperparameters to improve the score. Will update soon... </strong></p>