***
***
<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;text-align:center;">
    <b><h1>&nbsp DIGIT RECOGNIZER - CNN </h1></b>
</div>

***
***

This is my second Digit Recognizer notebook. It scores 0.9947.

The first one uses SVM + PCA + hyperparameter tuning: https://www.kaggle.com/code/darkwyvern/digit-recognizer-pca-svm

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp Index</h2></b>
</div>

1. [Import Libraries & Load Data](#1)
1. [Some Configuration](#2)
1. [Exploration](#3)
1. [Process The Data](#4)
1. [Model](#5)
1. [Data Augmentation](#6)
1. [Fit The Model](#7)
1. [Error Visualisation](#8)
1. [Prediction](#9)
1. [Submission](#10)

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "1">1 - Import Libraries & Load Data</a></h2></b>
</div>

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.models import Sequential
from keras.optimizers import adam_v2
from keras.preprocessing.image import ImageDataGenerator
from keras.utils.np_utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
data_train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
data_test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "2">2 - Some Configuration</a></h2></b>
</div>

In [None]:
# Let pandas show all columns
pd.options.display.width = 0

# Seaborn theme settings
sns.set_theme(palette="deep")

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "3">3 - Exploration</a></h2></b>
</div>

Since the [Digit Recognizer](https://www.kaggle.com/competitions/digit-recognizer/data?select=train.csv) data page explains the dataset clearly, we will not explore its structure in detail.

Every row holds a 28x28 image flattened into a single row of 784 columns. The label column gives the digit drawn by the user. The other columns hold values between 0 and 255 that indicate the lightness or darkness of the corresponding pixel.
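As a small illustration of that layout, the sketch below assumes the row-major ordering described on the competition data page (flat pixel index = row * 28 + column). It uses a synthetic row instead of real data, and `flat_row` and `pixel_position` are names made up for this example.

```python
import numpy as np

SIDE = 28  # images are 28x28, so each row holds 28 * 28 = 784 pixel values


def pixel_position(index, side=SIDE):
    """Map a flat pixel index (0..783) to its (row, column) in the image."""
    return divmod(index, side)


# Synthetic stand-in for one CSV row without the label column:
# the value of every pixel is simply its own flat index.
flat_row = np.arange(SIDE * SIDE)
image = flat_row.reshape(SIDE, SIDE)

row, col = pixel_position(59)
print(row, col)         # 2 3
print(image[row, col])  # 59
```

The same `reshape(28, 28)` call is what turns a CSV row back into a picture that `imshow` can draw.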

By the nature of the data we know there are no missing values (I also checked the competition's Data tab). Otherwise imputation would have been hell.

Let's see some example images.

In [None]:
# Prepare x_train and y_train
x_train = data_train.drop(labels=["label"], axis=1)
y_train = data_train["label"]

def get_image_matrix(row, dataset):
    row_values = dataset.iloc[row].values
    image_matrix = row_values.reshape(28, 28)
    return image_matrix

fig, ax = plt.subplots(1, 5, figsize=(12, 9), dpi=120)
plt.setp(ax, xticks=[], yticks=[])

# Draw one example image per subplot
for axis, i in zip(ax, [1, 225, 648, 75, 138]):
    axis.imshow(get_image_matrix(i, x_train))

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "4">4 - Process The Data </a></h2></b>
</div>

In [None]:
# Do MinMax Scaling on both train and test data.
scaler = MinMaxScaler()
x_train[:] = scaler.fit_transform(x_train)
data_test[:] = scaler.transform(data_test)
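One detail worth keeping in mind: `MinMaxScaler` scales every column (here, every pixel position) independently, which is not the same as dividing all pixels by 255. The sketch below re-implements the min-max formula in plain NumPy on made-up values to show the difference. It is an illustration, not scikit-learn's code, and `min_max_per_column` is a hypothetical helper.

```python
import numpy as np


def min_max_per_column(x):
    """Scale each column to [0, 1]; constant columns map to 0."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1  # avoid dividing by zero for constant columns
    return (x - col_min) / col_range


# Three tiny "images" with three "pixels" each (hypothetical values).
pixels = np.array([[0.0, 0.0, 255.0],
                   [0.0, 100.0, 255.0],
                   [0.0, 200.0, 0.0]])

per_column = min_max_per_column(pixels)
global_scale = pixels / 255.0

# Middle pixel: its maximum over the three images is 200, so per-column
# scaling maps 200 to 1.0, while dividing by 255 maps it to about 0.78.
print(per_column[:, 1])
print(global_scale[:, 1])
```

Both variants keep the inputs in [0, 1]; they only differ for pixel positions that never reach 255 in the training data.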

In [None]:
# Reshape the data for CNN.
x_train = x_train.values.reshape(-1, 28, 28, 1)
data_test = data_test.values.reshape(-1, 28, 28, 1)

# We need 10 classes from 0 to 9 (inclusive)
y_train = to_categorical(y_train, num_classes=10)

# Train and Test Split
x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.33, random_state=42)
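To make the `to_categorical` step above concrete, here is a NumPy-only sketch of one-hot encoding. `one_hot` is a hypothetical helper written for this example, not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]


encoded = one_hot([3, 0, 9])
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # [3 0 9]
```

Taking `argmax` along axis 1 recovers the original digits, which is how the one-hot labels are turned back into digits later in the notebook.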

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "5">5 - Model </a></h2></b>
</div>

A CNN first looks for edges, lines, curves and so on, so I estimated how many such features the first convolutional layers need to recognize and started with 16 filters. I think the first convolutional layers should cover a larger area of the image, so I chose a kernel size of (5, 5). The following layers then need more filters.

I experimented with the layers, and the architecture below is slightly better than my first model.

Although a Dropout rate of 0.50 is often recommended, a basic search suggests that lower rates work better here, so I will use 0.25.

In the convolutional blocks I prefer BatchNormalization to Dropout.

I will use the recommended Max Pooling parameters.

Max Pooling shrinks the feature maps, so I double the filters parameter after each pooling step to balance it, as is common practice.
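The trade-off between pooling and filter count can be checked with a bit of arithmetic. The sketch below traces the feature-map shapes for the layer stack used in this notebook, assuming "same" padding with stride 1 for the convolutions and 2x2 pooling with stride 2 and no padding. The helper names are made up for this example.

```python
def conv_same(size):
    """'same' padding with stride 1 keeps the spatial size unchanged."""
    return size


def max_pool(size, pool=2, stride=2):
    """Pooling without padding: floor((size - pool) / stride) + 1."""
    return (size - pool) // stride + 1


size = 28
shapes = []
# (filters, number of Conv2D layers) for each block, each followed by a pooling layer
for filters, n_convs in [(16, 2), (32, 1), (64, 1)]:
    for _ in range(n_convs):
        size = conv_same(size)
    size = max_pool(size)
    shapes.append((size, size, filters))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)     # [(14, 14, 16), (7, 7, 32), (3, 3, 64)]
print(flattened)  # 576
```

Each pooling step cuts the number of spatial positions to roughly a quarter while the filter count only doubles, so the Dense layer ends up receiving 576 values per image.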

In [None]:
model = Sequential()

# Two stacked Conv2D layers improved performance here.
# Only the first layer needs input_shape; "same" is the canonical spelling of the padding mode.
model.add(Conv2D(filters=16, kernel_size=(5, 5), padding="same", activation="relu", input_shape=(28, 28, 1)))
model.add(Conv2D(filters=16, kernel_size=(5, 5), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2)))

model.add(Conv2D(filters=32, kernel_size=(4, 4), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(1024, activation="relu"))
model.add(Dropout(0.25))
model.add(Dense(10, activation="softmax"))

I will use Adam as the optimizer. It seems to work well for digit recognition. I will stick to the recommended parameter values.

I set epochs to 1000 with a batch_size of 350. The epoch count is only an upper bound, since EarlyStopping mostly ends the training before 250 epochs.

I first tried Adadelta for its adaptive learning rates, then switched to Adam after seeing no performance gain with my model. I am aware that the minimum learning rate (min_lr) in the ReduceLROnPlateau callback is a bit extreme. Since EarlyStopping is on duty, training will eventually stop if val_loss no longer decreases.

I will monitor both val_loss and val_accuracy via ModelCheckpoint. The best val_loss is not far from the val_loss at the best val_accuracy. Hence, I will use the weights with the best val_accuracy, at the cost of slightly less confident predictions.
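To see why the min_lr value is extreme, the sketch below computes the learning rate after a given number of plateau-triggered reductions, using the values from this notebook (initial rate 0.001, factor 0.1, min_lr 1e-9). It only models the multiply-and-clip rule; `lr_after_reductions` is a hypothetical helper, not part of Keras.

```python
def lr_after_reductions(initial_lr, factor, min_lr, n_reductions):
    """Learning rate after n plateau-triggered reductions, clipped at min_lr."""
    lr = initial_lr
    for _ in range(n_reductions):
        lr = max(lr * factor, min_lr)
    return lr


# Each reduction divides the rate by ten: 1e-3, 1e-4, ... down to the 1e-9 floor.
for n in range(8):
    print(n, lr_after_reductions(1e-3, 0.1, 1e-9, n))
```

It takes six reductions to reach the floor. With a ReduceLROnPlateau patience of 25 epochs and an EarlyStopping patience of 100 epochs, a single uninterrupted stall in val_loss allows only a few reductions before training stops, so the floor is unlikely to matter in practice.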

In [None]:
optimizer = adam_v2.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])

epochs = 1000
batch_size = 350

callbacks = [
    EarlyStopping(monitor="val_loss", patience=100, verbose=1),
    ModelCheckpoint(filepath="best_model_loss.h5", monitor="val_loss", save_best_only=True, verbose=0),
    ModelCheckpoint(filepath="best_model_accuracy.h5", monitor="val_accuracy", save_best_only=True, verbose=0),
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=25, verbose=1, mode="min", min_lr=1e-9, cooldown=1),
]

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "6">6 - Data Augmentation </a></h2></b>
</div>

Data augmentation creates new versions of the training images, which effectively expands the training set and should give better results.

In [None]:
# data augmentation
datagen = ImageDataGenerator(
    rotation_range=15,  # Rotate images in range 15 degrees at random
    zoom_range=0.1,  # Zoom images 10% at random
    width_shift_range=0.1,  # Shift images horizontally 10% at random
    height_shift_range=0.1)  # Shift images vertically 10% at random

datagen.fit(x_train)
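To get a feel for the magnitudes above: a 10% shift on a 28-pixel-wide image moves the digit by at most about 2.8 pixels. The sketch below shifts an image by whole pixels with plain NumPy and fills the uncovered area with zeros. This is only an illustration; it is not how `ImageDataGenerator` works internally (its default `fill_mode` is "nearest" and it can shift by fractions of a pixel), and `shift_image` is a hypothetical helper.

```python
import numpy as np


def shift_image(image, dx, dy):
    """Shift a 2-D image by dx columns and dy rows, filling uncovered pixels with 0."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Source and destination windows that stay inside the image bounds.
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted


max_shift_px = 0.1 * 28  # about 2.8 pixels for width_shift_range=0.1

# A blank image with a single bright pixel at row 5, column 5.
image = np.zeros((28, 28))
image[5, 5] = 1.0

# Move it 2 pixels to the right and 1 pixel up.
shifted = shift_image(image, dx=2, dy=-1)
print(np.argwhere(shifted == 1.0))  # the pixel is now at row 4, column 7
```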

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "7">7 - Fit The Model </a></h2></b>
</div>

In [None]:
history = model.fit(
    datagen.flow(x_train, y_train, batch_size=batch_size),
    epochs=epochs,
    validation_data=(x_test, y_test),
    steps_per_epoch=x_train.shape[0] // batch_size,
    verbose=0,
    callbacks=callbacks,
)

# plot training history for loss
plt.plot(history.history['loss'], label='loss train')
plt.plot(history.history['val_loss'], label='loss test')
plt.legend()
plt.show()

# plot training history for accuracy
plt.plot(history.history['accuracy'], label='acc train')
plt.plot(history.history['val_accuracy'], label='acc test')
plt.legend()
plt.show()


# Evaluate the checkpoint with the best val_loss.
# model.evaluate returns [loss, accuracy] for the compiled loss and metric.
model.load_weights('best_model_loss.h5')
train_scores = model.evaluate(x_train, y_train, verbose=0)
test_scores = model.evaluate(x_test, y_test, verbose=0)

print()
print("*" * 75)
print("train [loss, accuracy] (best val_loss): ", train_scores)
print("test [loss, accuracy] (best val_loss): ", test_scores)
print("*" * 75)


# Evaluate the checkpoint with the best val_accuracy.
# These weights stay loaded and are used for the predictions below.
model.load_weights('best_model_accuracy.h5')
train_scores = model.evaluate(x_train, y_train, verbose=0)
test_scores = model.evaluate(x_test, y_test, verbose=0)

print()
print("*" * 75)
print("train [loss, accuracy] (best val_accuracy): ", train_scores)
print("test [loss, accuracy] (best val_accuracy): ", test_scores)
print("*" * 75)

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "8">8 - Error Visualisation </a></h2></b>
</div>

First we predict on x_test so that we can view the results in a confusion matrix. The confusion matrix below shows that most false predictions come from digits that are hard to recognize even for a human (like 3 vs 8, 7 vs 1).

In [None]:
results = model.predict(x_test)
# Get the max prediction value (the best prediction out of the 10 predictions in the np array)
results = np.argmax(results, axis=1)
y_test = np.argmax(y_test, axis=1)

cm = confusion_matrix(y_test, results)
cmd = ConfusionMatrixDisplay(cm, display_labels=np.arange(10))
fig, ax = plt.subplots(figsize=(10, 10))
ax.grid(False)
cmd.plot(ax=ax)
plt.show()
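For readers unfamiliar with the matrix above: rows are the true digits and columns are the predicted digits, so every off-diagonal cell counts one kind of mistake. The NumPy sketch below builds the same kind of count matrix for a handful of made-up labels. `confusion_counts` is a hypothetical helper, not scikit-learn's implementation.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes=10):
    """Count matrix with true labels on rows and predicted labels on columns."""
    counts = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same cell appears twice.
    np.add.at(counts, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return counts


# Made-up labels: one 3 is mistaken for an 8 and one 7 for a 1.
y_true_demo = [3, 3, 8, 7, 1]
y_pred_demo = [3, 8, 8, 1, 1]
cm_demo = confusion_counts(y_true_demo, y_pred_demo)

print(cm_demo[3, 8])                    # 1
print(cm_demo.trace(), cm_demo.sum())   # 3 5
```

The diagonal sum divided by the total gives the accuracy, here 3 out of 5.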

Now, we will visualize incorrect predictions:

In [None]:
correct_pred = np.nonzero(results == y_test)[0]
incorrect_pred = np.nonzero(results != y_test)[0]

fig, axes = plt.subplots(10, 10, sharex=True, sharey=True, figsize=(20, 20))
plt.setp(axes, xticks=[], yticks=[])
axes = axes.flatten()

# The grid has 100 cells; zip stops at the shorter sequence, so extra errors are not drawn.
shown = min(len(incorrect_pred), len(axes))
for ax, incorrect in zip(axes, incorrect_pred):
    # Drop the channel axis so imshow receives a 2-D (28, 28) array.
    ax.imshow(x_test[incorrect].reshape(28, 28), cmap='gray', interpolation='none')
    ax.set_title("P:{} - T:{}".format(results[incorrect], y_test[incorrect]))

# Remove the unused cells of the grid.
for ax in axes[shown:]:
    ax.remove()

plt.show()

This clearly shows that these digits are hard to predict even for humans. In light of this, we can say the model works fine.

This is the Convolutional Neural Network's success, not mine. All the code above shows how powerful it is. Even with a simple setup and a little bit of tuning you can reach high prediction accuracy.

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "9">9 - Prediction </a></h2></b>
</div>

Now we make the real prediction on the competition test data.

In [None]:
results = model.predict(data_test)
# Get the max prediction value (the best prediction out of the 10 predictions in the np array)
results = np.argmax(results, axis=1)
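The `argmax` step above can be illustrated on a tiny made-up probability array. The sketch uses three classes instead of ten for brevity; `probs` is not real model output.

```python
import numpy as np

# Each row is the softmax output for one image (hypothetical values).
probs = np.array([[0.05, 0.90, 0.05],
                  [0.40, 0.25, 0.35]])

predicted = probs.argmax(axis=1)  # index of the most probable class per row
confidence = probs.max(axis=1)    # probability assigned to that class

print(predicted)  # [1 0]
print(confidence)
```

The second row shows why the best val_loss and the best val_accuracy can differ: the prediction counts as class 0 either way, but with a probability of only 0.40 the model is far less confident about it.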

<div style="background-color:#B0C4DE;color:Navy;width:100%;display:inline-block;">
    <b><h2>&nbsp <a id = "10">10 - Submission </a></h2></b>
</div>

In [None]:
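results = pd.Series(results, name="Label")
# ImageId runs from 1 to the number of test images (28000 for this competition)
image_ids = pd.Series(range(1, len(results) + 1), name="ImageId")
submission = pd.concat([image_ids, results], axis=1)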

submission.to_csv("submission.csv",index=False)