## Introduction

In this notebook I use a sequential convolutional neural network (CNN), built with the Keras Sequential API, for digit recognition. It is a fairly simple and versatile architecture: before MNIST digits, I used it for other image-recognition tasks, although it was less accurate there because those images were more varied and complex. Here we have only 10 classes, and they are more or less easy to tell apart.

You can use this notebook for MNIST digit recognition as well as for other image-recognition tasks. If you have any ideas, please share them in the comments; I would be interested to see your approach.

In [None]:
# import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# use tf.keras consistently instead of mixing the `keras` and `tensorflow.keras` namespaces
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, LayerNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

In [None]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Load and prepare the data

In [None]:
# Load the data
train = pd.read_csv('../input/digit-recognizer/train.csv')
test = pd.read_csv('../input/digit-recognizer/test.csv')

In [None]:
#take a look at the data
#np.unique(train)
#train.head()

In [None]:
# define X and Y
X = train.drop(['label'], axis = 1)
Y = train['label']

We do not make any significant transformations of the data; we only rescale the pixel values from the range [0, 255] to the range [0, 1]. Neural networks generally train more stably when their inputs are small and on a similar scale, because large raw values lead to large activations and gradients early in training. Note that dividing every image by the same constant does not remove brightness differences between images (that would require per-image normalization), but for these clean, centered digits this is not a concern.

Scaling data to the range [0, 1] is usually called **normalization** (more precisely, min-max scaling) and, in our case, is achieved by dividing the value of each pixel by 255 (normalization coefficient 1/255 ≈ 0.0039).
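
As a quick illustration (a sketch with made-up pixel values, not part of the pipeline), general min-max scaling `(x - min) / (max - min)` reduces to a plain division by 255 when the minimum is 0 and the maximum is 255. The helper `min_max_scale` is a hypothetical name used only here.

```python
import numpy as np

def min_max_scale(x, x_min=0.0, x_max=255.0):
    """Hypothetical helper: general min-max scaling to the range [0, 1]."""
    return (x - x_min) / (x_max - x_min)

# three example pixel intensities: black, mid-grey, white
pixels = np.array([0, 128, 255], dtype=np.uint8)

scaled = min_max_scale(pixels.astype(np.float64))
print(scaled)  # approximately [0.0, 0.502, 1.0]

# with min = 0 and max = 255 this is exactly the X / 255.0 used above
print(np.allclose(scaled, pixels / 255.0))  # True
```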

In [None]:
# normalize the data (features)
X = X / 255.0
test = test / 255.0

In [None]:
# convert data to np.array
X = X.values
test = test.values

Each image is now stored as a row of 784 values (28 pixels x 28 pixels), so `X` is a 2D NumPy array of shape (n_samples, 784). To feed the data into the Keras convolutional layers, we reshape each row to 28 x 28 pixels and add an extra dimension for the number of channels (1 for greyscale).

In [None]:
# reshape each image into the following dimensions: height x width x channels
# 28 pixels x 28 pixels x 1 channel (greyscale)

X = X.reshape(-1,28,28,1)
test = test.reshape(-1,28,28,1)
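
To see what the reshape does, here is a small check on fake data (the array `flat` is an invented stand-in for `X`): `-1` lets NumPy infer the number of samples, and because NumPy uses row-major order, pixel `k` of a flattened row lands at row `k // 28`, column `k % 28`.

```python
import numpy as np

# 3 fake flattened images with 784 = 28 * 28 values each (stand-in for X)
flat = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

images = flat.reshape(-1, 28, 28, 1)   # -1: NumPy infers the sample count (3)
print(images.shape)                    # (3, 28, 28, 1)

# row-major order: value 28 of a flat row is the first pixel of the second image row
print(images[0, 1, 0, 0] == flat[0, 28])  # True
```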

In [None]:
# one-hot encode the labels (e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
Y = to_categorical(Y, num_classes = 10)
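
For integer labels, one-hot encoding amounts to picking rows of an identity matrix. The sketch below is a NumPy equivalent written for illustration, not Keras' own implementation of `to_categorical`; it also shows that `argmax` undoes the encoding, which is what the evaluation step relies on later.

```python
import numpy as np

labels = np.array([3, 0, 9])

# row i of the 10 x 10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype=np.float32)[labels]

print(one_hot[0])                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.argmax(one_hot, axis=1))  # [3 0 9]
```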

In [None]:
# check the shape of the data
print(X.shape, Y.shape)

In [None]:
# split the data into training and validation sets (90% / 10%)
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size = 0.1, random_state = 42)

print(X_train.shape, X_val.shape, Y_train.shape, Y_val.shape)

## Model

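Here I use the Keras Sequential API with the following architecture:

 - Input (28 x 28 x 1, declared via `input_shape` on the first layer)
 - Conv2D (5 x 5) -> Conv2D (5 x 5) -> MaxPool2D -> LayerNormalization -> Dropout
 - Conv2D (3 x 3) -> Conv2D (3 x 3) -> MaxPool2D -> LayerNormalization -> Dropout
 - Conv2D (3 x 3) -> Conv2D (3 x 3) -> MaxPool2D -> LayerNormalization -> Dropout
 - Flatten
 - Dense (256 units, ReLU)
 - Dropout
 - Output (Dense, 10 units, softmax)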


Some comments on the layers:

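**Conv2D** layers perform an operation called convolution. A small matrix of weights, called a kernel or filter, slides across the image, and at each position the layer computes the weighted sum of the pixels under the kernel (plus a bias), followed here by a ReLU activation. The result is a feature map showing where in the image the pattern encoded by that filter appears. In our model, the first block uses 5 x 5 kernels and the second and third blocks use 3 x 3 kernels, with 32, 64, and 128 filters per layer respectively. The kernel weights are learned during training; in other words, this is where the network learns the details of the image, such as edges and strokes.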

We can experiment with the number of filters and their size.
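
A minimal NumPy sketch of a single-channel convolution with one filter, stride 1 and no padding ('valid'), assuming a hand-made vertical-edge kernel (a trained layer learns its own weights, and `conv2d_valid` is a hypothetical helper). Like Keras' Conv2D, it computes a cross-correlation, i.e. the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy 5 x 5 "image": dark on the left, bright from column 2 onwards
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# hypothetical vertical-edge detector
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map[0])     # [3. 3. 0.]  strong response only where the edge is
```

With `padding='same'`, as in the model, Keras pads the border with zeros so the output keeps the input's height and width.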

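**MaxPool2D** layers downsample the feature maps: each 2 x 2 window is replaced by its maximum value, which halves the height and width (28 -> 14 -> 7 -> 3 across the three blocks). These layers have no trainable weights. Shrinking the maps lets the following convolutions cover a larger part of the original image, so deeper layers can capture the overall shape of the digit, and it makes the features somewhat tolerant to small shifts.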
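
A sketch of 2 x 2 max pooling with stride 2 in NumPy (`max_pool_2x2` is a hypothetical helper). With the default 'valid' padding, an odd trailing row or column is dropped, which is why a 7 x 7 map becomes 3 x 3.

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling, stride 2, 'valid' padding (odd edges are dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    # split into 2 x 2 blocks, then take the maximum inside each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]])
print(max_pool_2x2(x))
# [[4 2]
#  [2 8]]

print(max_pool_2x2(np.zeros((7, 7))).shape)  # (3, 3)
```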

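**Dropout** layers help to avoid overfitting: during training they randomly set a fraction of their inputs to zero (25% after each convolutional block and 40% after the dense layer), so the network cannot rely on any single activation. At inference time dropout is switched off.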
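
Keras' Dropout uses "inverted" dropout: surviving values are scaled by 1 / (1 - rate), so the expected activation stays the same and nothing needs to change at inference. A minimal NumPy sketch of the idea (the helper `inverted_dropout` is hypothetical, not the Keras source):

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of the inputs and rescale the survivors."""
    if not training or rate == 0.0:
        return x  # dropout is a no-op at inference time
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(1000)
y = inverted_dropout(x, rate=0.25, rng=rng)

print(np.unique(y))  # values are 0.0 or 1 / 0.75
print(y.mean())      # close to 1.0, the mean of the input
```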

**Flatten** converts each sample's 3 x 3 x 128 feature map into a 1D vector of 1,152 values so that it can be fed into the Dense layer.
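
The shapes quoted above can be traced by hand. This sketch assumes what `create_model()` specifies: Conv2D with `padding='same'` and stride 1 keeps the spatial size, MaxPool2D uses a 2 x 2 pool with stride 2 and 'valid' padding, and LayerNormalization and Dropout do not change the shape.

```python
def pool_out(size, pool=2, stride=2):
    """Output size of MaxPool2D with 'valid' padding (the Keras default)."""
    return (size - pool) // stride + 1

size = 28                       # input height/width
shapes = []
for filters in (32, 64, 128):   # the three Conv2D -> Conv2D -> MaxPool2D blocks
    # 'same' convolutions keep the spatial size, so only pooling shrinks it
    size = pool_out(size)
    shapes.append((size, size, filters))

flat_len = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
dense_params = flat_len * 256 + 256   # weights + biases of Dense(256)

print(shapes)        # [(14, 14, 32), (7, 7, 64), (3, 3, 128)]
print(flat_len)      # 1152
print(dense_params)  # 295168
```

The Dense(256) layer therefore holds most of the model's parameters, which is one reason for the stronger 40% dropout after it.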

In [None]:
# define the model function

def create_model():
    
    model = Sequential()

    model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', activation ='relu', input_shape = (28,28,1), name='conv_11'))
    model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'Same', activation ='relu', name='conv_12'))
    model.add(MaxPool2D(pool_size=(2,2), name='pool_1'))
    model.add(LayerNormalization(axis=3 , center=True , scale=True, name='norm_1'))
    model.add(Dropout(0.25))

    model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', activation ='relu', name='conv_21'))
    model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'Same', activation ='relu', name='conv_22'))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2), name='pool_2'))
    model.add(LayerNormalization(axis=3 , center=True , scale=True, name='norm_2'))
    model.add(Dropout(0.25))
    
    model.add(Conv2D(filters = 128, kernel_size = (3,3),padding = 'Same', activation ='relu', name='conv_31'))
    model.add(Conv2D(filters = 128, kernel_size = (3,3),padding = 'Same', activation ='relu', name='conv_32'))
    model.add(MaxPool2D(pool_size=(2,2), strides=(2,2), name='pool_3'))
    model.add(LayerNormalization(axis=3 , center=True , scale=True, name='norm_3'))
    model.add(Dropout(0.25))

    model.add(Flatten())
    
    model.add(Dense(256, activation = "relu"))
    
    model.add(Dropout(0.4))
    
    model.add(Dense(10, activation = "softmax"))
    
    return model

In [None]:
# create the model
model_CNN = create_model()
model_CNN.summary()  # summary() prints the table itself and returns None

**Data augmentation** 

One way to reduce overfitting and improve accuracy is to increase the variability of the existing samples, which also helps to compensate for a lack of data.

Data augmentation generates new samples from existing ones by applying random transformations (here small rotations, zooms, and shifts) to the original images. `ImageDataGenerator` applies these transformations on the fly, so in every epoch the model sees slightly different versions of the training images, which should help it generalize better to the validation set. Flips are disabled because a mirrored digit is usually no longer a valid digit.

In [None]:
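# use data augmentation to improve accuracy and prevent overfitting
augs_gen = ImageDataGenerator(
        featurewise_center=False,             # no dataset-wide mean centering
        samplewise_center=False,              # no per-image mean centering
        featurewise_std_normalization=False,  # no dataset-wide std scaling
        samplewise_std_normalization=False,   # no per-image std scaling
        zca_whitening=False,                  # no ZCA whitening
        rotation_range=10,                    # random rotation of up to ±10 degrees
        zoom_range=0.1,                       # random zoom in the range [0.9, 1.1]
        width_shift_range=0.1,                # random horizontal shift of up to 10% of the width
        height_shift_range=0.1,               # random vertical shift of up to 10% of the height
        horizontal_flip=False,                # mirrored digits are not valid digits
        vertical_flip=False)

# yields batches of 64 randomly transformed training images, generated on the fly
generator_train = augs_gen.flow(X_train, Y_train, batch_size=64)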
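
To make the shift augmentation concrete, here is a simplified NumPy sketch of a random translation. It is an illustration, not what `ImageDataGenerator` does internally: it shifts by whole pixels and fills the uncovered border with 0, whereas `ImageDataGenerator` also allows fractional shifts and, by default, fills with the nearest edge value (`fill_mode='nearest'`). Both helpers are hypothetical.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift a 2D image down by dy and right by dx pixels (negative = up/left)."""
    h, w = img.shape
    assert abs(dy) < h and abs(dx) < w, "shift must be smaller than the image"
    out = np.zeros_like(img)  # uncovered pixels stay 0
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def random_shift(img, max_frac, rng):
    """Random integer shift of up to max_frac of the height/width."""
    h, w = img.shape
    max_dy, max_dx = int(max_frac * h), int(max_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(img, dy, dx)

# a 28 x 28 image with a single bright pixel
img = np.zeros((28, 28))
img[5, 5] = 1.0

moved = shift_image(img, 2, -1)
print(np.argwhere(moved == 1.0))  # [[7 4]]

rng = np.random.default_rng(42)
augmented = random_shift(img, max_frac=0.1, rng=rng)  # at most 2 pixels per axis
print(augmented.shape)  # (28, 28)
```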

In [None]:
# define number of steps (length of train set divided by batch size)
steps = int(X_train.shape[0] / 64)
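
A worked example of this arithmetic, assuming `train.csv` has 42,000 rows as in the Kaggle Digit Recognizer data (check `train.shape[0]` to confirm). `train_test_split` rounds the test share up, and `int(... / 64)` rounds the number of steps down, so the final partial batch of 40 images does not count as a step.

```python
import math

n_rows = 42000                     # assumed size of train.csv
n_val = math.ceil(n_rows * 0.1)    # train_test_split rounds the test share up
n_train = n_rows - n_val
batch_size = 64

steps_floor = int(n_train / batch_size)       # what the cell above computes
steps_ceil = math.ceil(n_train / batch_size)  # would also count the partial batch

print(n_train)                                # 37800
print(steps_floor, steps_floor * batch_size)  # 590 37760
print(steps_ceil)                             # 591
```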

In [None]:
# compile the model
model_CNN.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])

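Callbacks are very convenient here: `EarlyStopping` stops training once validation accuracy has not improved for 7 epochs and restores the best weights, `ReduceLROnPlateau` halves the learning rate after 5 epochs without improvement, and `ModelCheckpoint` saves the best model to disk. This way we can set a large number of epochs without worrying about training long after the metrics have stopped improving.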
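
The patience logic behind early stopping can be sketched in plain Python. This is a simplified re-implementation for illustration, not Keras' own code: it assumes higher is better and `min_delta=0`, and both the function `early_stopping_epoch` and the accuracy values are made up.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_epoch), 0-based, for a metric where higher is better."""
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores):
        if score > best:           # strict improvement resets the counter
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:   # `patience` epochs in a row without improvement
                return epoch, best_epoch
    return len(val_scores) - 1, best_epoch  # ran out of epochs first

# made-up validation accuracies: the best value (0.98) appears at epoch 2;
# matching it later does not count as an improvement
scores = [0.95, 0.97, 0.98, 0.979, 0.98, 0.978, 0.979, 0.98, 0.975, 0.977]
print(early_stopping_epoch(scores, patience=7))  # (9, 2)
```

With `restore_best_weights=True`, the model would end up with the weights from the best epoch (epoch 2 here) rather than the last one.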

In [None]:
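# use callbacks
# save the best model (by validation accuracy) to an HDF5 file
checkpoint = ModelCheckpoint("best_model.h5", monitor='val_accuracy', verbose=1, save_best_only=True)
# halve the learning rate after 5 epochs without improvement, but not below 5e-5
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=5, min_lr=0.00005, verbose=1)
# stop after 7 epochs without improvement and keep the best weights
early_stop = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=7, mode='auto', restore_best_weights=True)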
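
With `factor=0.5` and `min_lr=0.00005`, each plateau halves the learning rate until it reaches the floor. Assuming training starts from Adam's default learning rate of 0.001 (the optimizer is passed as the string `'adam'`), the sequence of rates can be sketched as follows; `plateau_lrs` is a hypothetical helper and ignores when the plateaus actually occur.

```python
def plateau_lrs(initial_lr, factor, min_lr, n_reductions):
    """Learning rates after each successive ReduceLROnPlateau reduction."""
    lrs, lr = [], initial_lr
    for _ in range(n_reductions):
        lr = max(lr * factor, min_lr)  # never go below min_lr
        lrs.append(lr)
    return lrs

lrs = plateau_lrs(0.001, factor=0.5, min_lr=0.00005, n_reductions=6)
print(lrs)  # ≈ [0.0005, 0.00025, 0.000125, 6.25e-05, 5e-05, 5e-05]
```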

...and here we go!

In [None]:
# fit the model (the batch size of 64 is set by the generator, so it is not passed to fit)
history = model_CNN.fit(generator_train, steps_per_epoch=steps, epochs = 50, validation_data = (X_val, Y_val), verbose = 1, callbacks = [checkpoint, reduce_lr, early_stop])

In [None]:
# evaluate the model

# predict on validation set
Y_pred_val = model_CNN.predict(X_val)

# take the class with the highest predicted probability
Y_pred_mc_class = np.argmax(Y_pred_val, axis=1)

# recover the ground-truth class from the one-hot labels
Y_test_mc_class = np.argmax(Y_val, axis=1)

# compare them
accuracy_on_val = np.mean(Y_pred_mc_class == Y_test_mc_class)

# print the accuracy
print("Validation accuracy (after the training): ", accuracy_on_val, "\n")


# plot the validation and training accuracy
fig, axis = plt.subplots(1, 2, figsize=(16,6))
axis[0].plot(history.history['val_accuracy'], label='val_acc')
axis[0].set_title("Validation Accuracy")
axis[0].set_xlabel("Epochs")
axis[1].plot(history.history['accuracy'], label='acc')
axis[1].set_title("Training Accuracy")
axis[1].set_xlabel("Epochs")
plt.show()

# plot the Confusion Matrix
fig, ax = plt.subplots(figsize=(12, 12))
cm = confusion_matrix(Y_test_mc_class,Y_pred_mc_class, normalize='true')
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels = [0,1,2,3,4,5,6,7,8,9])
disp = disp.plot(ax=ax,cmap=plt.cm.Blues)
ax.set_title("Confusion Matrix")
plt.show()
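
The evaluation above boils down to two NumPy operations. This sketch uses made-up softmax outputs (`probs` and `y_true` are invented stand-ins for `Y_pred_val` and the true classes) to show how `argmax` turns probabilities into labels and how `normalize='true'` divides each row of the confusion matrix by the number of samples in that true class, so the diagonal holds per-class recall.

```python
import numpy as np

# toy softmax outputs for 4 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
y_true = np.array([0, 1, 1, 2])

y_pred = np.argmax(probs, axis=1)     # [0 1 2 0]
accuracy = np.mean(y_pred == y_true)  # 2 correct out of 4
print(accuracy)                       # 0.5

# row-normalized confusion matrix (rows = true class, columns = predicted class)
n = probs.shape[1]
cm = np.zeros((n, n))
np.add.at(cm, (y_true, y_pred), 1)        # count (true, predicted) pairs
cm = cm / cm.sum(axis=1, keepdims=True)   # each row sums to 1
print(np.diag(cm))                        # per-class recall: [1.  0.5 0. ]
```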

In [None]:
# make predictions on the competition test set
y_pred_test = model_CNN.predict(test)

In [None]:
# choose the class with the highest predicted probability
prediction = np.argmax(y_pred_test, axis = 1)

In [None]:
# create submission DataFrame (ImageId starts at 1)
submission = pd.DataFrame({'ImageId' : range(1, len(prediction) + 1), 'Label' : list(prediction)})
submission.head()

In [None]:
# create CSV-file
submission.to_csv("submission.csv",index=False)

In [None]:
#!kaggle competitions submit -c digit-recognizer -f submission.csv -m "Message"