# DIGITS RECOGNIZER WITH 99% ACCURACY

An easy approach to a basic computer vision task: handwritten digit recognition.

**Computer Vision** takes advantage of Deep Learning by using **Convolutional Neural Networks** to extract feature maps, which let the computer capture the characteristics of an image. Here, I'll show how to implement a basic CNN that reaches around 99% accuracy on this task.

* [Setup](#section-zero)
* [Data loading](#section-one)
* [Data preprocessing](#section-two)
* [Data visualization](#section-three)
* [MNIST dataset](#section-four)
* [Model implementation](#section-five)
* [Model training](#section-six)
* [Predicting on the test set](#section-seven)
* [Final evaluation of our model](#section-eight)
* [Saving predicted data](#section-nine)

<a id="section-zero"></a>
# Setup
First, import the required libraries.

In [None]:
import pandas as pd
import numpy as np
from math import sqrt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense, Input
import matplotlib.pyplot as plt

<a id="section-one"></a>
# Data loading

Here I'll show how to read the files from the Kaggle dataset. Later on, however, I'll use the MNIST dataset from Keras to train the model.

The dataset provided on Kaggle is just a slice of the whole MNIST dataset.

In [None]:
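# Both frames come from test.csv (784 pixel columns, no 'label' column);
# the labeled training data is loaded from Keras further down.
train = pd.read_csv('../input/digit-recognizer/test.csv')
test = pd.read_csv('../input/digit-recognizer/test.csv')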

<a id="section-two"></a>
# Data preprocessing

Calling **head()** first is a quick way to get an overall understanding of our data.

In [None]:
train.head()

Below we can see that the data is not yet in the right shape for a CNN: the pixels of each image are stored as a flat 1D array instead of a 2D matrix matching the rows and columns of the image.

In [None]:
train.shape

To use a CNN, we reshape each row into a square image as follows.

In [None]:
width = height = int(sqrt(train.shape[1]))
total = train.shape[0]

In [None]:
X_train = train.to_numpy()
# reshape (unlike np.resize) raises an error if the element counts do not match
X_train = X_train.reshape(total, width, height)

Now we can see that we have achieved the format required for the input of our neural network.

In [None]:
X_train.shape
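
To make the reshape step concrete, here is a minimal sketch on a tiny synthetic array (not the Kaggle data). It shows how each flat row turns into a square image in row-major order. The helper name `rows_to_images` is hypothetical and only serves this illustration.

```python
import numpy as np
from math import isqrt

def rows_to_images(flat):
    """Reshape (n_samples, side*side) pixel rows into (n_samples, side, side)."""
    flat = np.asarray(flat)
    side = isqrt(flat.shape[1])
    if side * side != flat.shape[1]:
        raise ValueError("each row must hold a square number of pixels")
    return flat.reshape(-1, side, side)

# Two synthetic "images" of 3 x 3 pixels, stored as flat rows of length 9.
flat = np.arange(18).reshape(2, 9)
images = rows_to_images(flat)

print(images.shape)
print(images[1])  # second image: pixels 9..17 laid out row by row
```

With the real data, the same idea applies with 784 values per row and a side of 28.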

<a id="section-three"></a>
# Data visualization
In computer vision, visualizing our data is a key task.

In [None]:
rows, cols = 6, 20
fig, axs = plt.subplots(rows, cols, figsize = (20, 6))
for i in range(rows):
    for j in range(cols):
        axs[i][j].imshow(X_train[cols*i + j], cmap='gray')
fig.tight_layout(pad=0.5)
plt.show()

<a id="section-four"></a>
# MNIST Dataset
We will use the complete MNIST dataset to improve our model's performance.

In [None]:
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = np.array(x_train, dtype = np.float16)
x_train = x_train.reshape(*x_train.shape[:3], 1)  # append a channel axis of size 1

Another reshape is required because a CNN expects each input image to have shape **H x W x C**, where **C** is the **number of channels**. Since our images are grayscale, the number of channels is 1.

In [None]:
x_train.shape
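
As a side note, the channel axis can also be appended without spelling out the full target shape. The sketch below uses a synthetic batch of zeros (an assumption standing in for the MNIST images) and shows two equivalent NumPy spellings.

```python
import numpy as np

# Synthetic stand-in for a batch of 5 grayscale 28 x 28 images.
batch = np.zeros((5, 28, 28), dtype=np.float32)

with_channel = batch[..., np.newaxis]        # append a trailing axis of length 1
same_thing = np.expand_dims(batch, axis=-1)  # equivalent spelling

print(with_channel.shape)
print(same_thing.shape)
```

Both results have one extra trailing axis, which is the channel dimension a `Conv2D` layer expects for grayscale input.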

<a id="section-five"></a>
# Model implementation

In [None]:
model = keras.Sequential([
    Conv2D(filters = 32, kernel_size = 3, padding = 'same', activation = 'relu', input_shape = (28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(filters = 64, kernel_size = 3, padding = 'same', activation = 'relu'),
    MaxPooling2D((2, 2)),
    Conv2D(filters = 128, kernel_size = 3, padding = 'same', activation = 'relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dropout(0.2),
    Dense(128, activation = 'relu'),
    Dropout(0.2),
    Dense(10, activation = 'softmax')
])

In [None]:
model.compile(
    optimizer = 'adam',
    loss = 'sparse_categorical_crossentropy',
    metrics = ['accuracy']
)

##### Tip! **Model Summary**

**model.summary()** is a useful tool for understanding a neural network's architecture from the inside. At first, the output shapes of the layers in a CNN can be confusing, and the summary lists them layer by layer.

In [None]:
model.summary()  # prints the table itself; wrapping it in print() would also print "None"
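
The output shapes reported by the summary can also be worked out by hand. The sketch below is a plain-Python approximation, not Keras code: it assumes stride-1 convolutions with `padding = 'same'` (spatial size unchanged) and 2 x 2 max pooling with default strides (spatial size halved, rounded down), as in the model above. The helper names `conv_same` and `max_pool` are hypothetical.

```python
def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1: spatial size is unchanged."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool=2):
    """MaxPooling2D with strides equal to the pool size and no padding."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

shape = (28, 28, 1)
trace = [shape]
for filters in (32, 64, 128):
    shape = conv_same(shape, filters)
    trace.append(shape)
    shape = max_pool(shape)
    trace.append(shape)

# Flatten turns the last feature map into one long vector.
flat_units = shape[0] * shape[1] * shape[2]

for step in trace:
    print(step)
print("Flatten ->", flat_units)
```

The trace ends at (3, 3, 128), so the Flatten layer hands 1152 values to the first Dense layer.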

The **EarlyStopping callback** stops training once the monitored metric (validation loss by default) stops improving, so we don't wait longer than necessary.

In [None]:
from tensorflow.keras.callbacks import EarlyStopping  # same Keras as the rest of the notebook
early_stopping = EarlyStopping(patience = 10, min_delta = 0.0001, restore_best_weights = True)
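
To illustrate what `patience` and `min_delta` mean, here is a simplified pure-Python sketch of the stopping rule. It is not the Keras implementation, only an approximation under the assumption that a lower validation loss is better. The function name `early_stopping_epoch` and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=1e-4):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    stop_epoch is the 0-based epoch at which training would stop,
    or None if the patience never runs out.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement by more than min_delta: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [0.30, 0.20, 0.15, 0.16, 0.155, 0.17, 0.18]
print(early_stopping_epoch(losses, patience=3))
```

With `restore_best_weights = True`, Keras would then roll the model back to the weights of the best epoch rather than keeping those of the last one.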

<a id="section-six"></a>
# Model training

In [None]:
history = model.fit(x_train, y_train, validation_split = 0.2, epochs = 100, callbacks = [early_stopping])

It is always important to take a look at how **accuracy** and **loss** evolve during training.

In [None]:
metrics = pd.DataFrame(history.history)
metrics[['loss', 'accuracy']].plot()

<a id="section-seven"></a>
# Predicting on the test set
Most of the time, the test set will also require some preprocessing.

In [None]:
x_test = x_test.reshape(*x_test.shape[:3], 1)  # append a channel axis of size 1

In [None]:
pred = model.predict(x_test)

In [None]:
rows, cols = 6, 6
fig, axs = plt.subplots(rows, cols, figsize = (15, 15))
for i in range(rows):
    for j in range(cols):
        idx = cols*i + j  # position of this image in the grid, row by row
        axs[i][j].imshow(x_test[idx].squeeze(), cmap='gray')
        axs[i][j].set_title("Predicted: " + str(np.argmax(pred[idx])))
fig.tight_layout(pad=3.0)
plt.show()

In [None]:
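def histogram(pred):
    # Repeat each digit once per percentage point of predicted probability,
    # so that plt.hist draws the class probabilities as bars from 0 to 100.
    xhist = []
    for item in range(10):
        for j in range(int(pred[item] * 100)):
            xhist.append(item)
    return xhist

def c(item):
    # One-hot encode a list of class probabilities: 1 at the first maximum, 0 elsewhere.
    best = item.index(max(item))
    return [1 if k == best else 0 for k in range(len(item))]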

In [None]:
rows, cols = 6, 6
fig, axs = plt.subplots(rows, cols, figsize = (15, 15))
for i in range(rows):
    for j in range(cols):
        axs[i][j].hist(histogram(pred[rows*i + j]), bins = np.linspace(-0.25, 9.25, 20))
        axs[i][j].set_xticks(range(10))
        axs[i][j].set_ylim((0, 100))
        axs[i][j].set_title("Real: "+str(y_test[rows*i + j]))
fig.tight_layout(pad=3.0)
plt.show()

In [None]:
def setup(pred):
    return np.argmax(pred, axis = 1).tolist()  # most probable digit for each image

In [None]:
final_pred = setup(pred)

<a id="section-eight"></a>
# Final evaluation of our model

Accuracy and loss are not the only metrics to take into account. **Recall**, **precision** and, above all, **F1-score** give us another perspective on our model's performance. All three are calculated from the **confusion matrix**. For definitions of these metrics and of the confusion matrix, see [Wikipedia](https://en.wikipedia.org/wiki/Confusion_matrix).

**Scikit-learn** has built-in functions that provide this information easily.

In [None]:
from sklearn.metrics import confusion_matrix
# scikit-learn expects (y_true, y_pred): rows are true digits, columns are predicted digits
df_conf = pd.DataFrame(confusion_matrix(y_test, final_pred), columns = range(10))
df_conf

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, final_pred))  # argument order is (y_true, y_pred)
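
To show how precision, recall and F1-score follow from a confusion matrix, here is a minimal NumPy sketch. It assumes rows are true labels and columns are predicted labels (the convention of scikit-learn's `confusion_matrix(y_true, y_pred)`). The 3-class matrix is made up for illustration and the helper name `per_class_scores` is hypothetical.

```python
import numpy as np

def per_class_scores(conf):
    """Return per-class (precision, recall, f1) from a square confusion matrix."""
    conf = np.asarray(conf, dtype=float)
    true_pos = np.diag(conf)
    predicted = conf.sum(axis=0)  # column sums: how often each class was predicted
    actual = conf.sum(axis=1)     # row sums: how often each class truly occurred
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(true_pos), where=denom > 0)
    return precision, recall, f1

# Made-up confusion matrix for 3 classes (rows: true, columns: predicted).
conf = [[5, 1, 0],
        [0, 4, 2],
        [1, 0, 7]]

precision, recall, f1 = per_class_scores(conf)
print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
print("f1:       ", f1.round(3))
```

Swapping the two arguments of `confusion_matrix` transposes the matrix, which in turn swaps precision and recall, so the argument order matters when reading these reports.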

<a id="section-nine"></a>
# Saving predicted data
To submit our predictions, we now predict on the Kaggle dataset.

In [None]:
test = test.to_numpy()
# -1 lets NumPy infer the number of images; the trailing 1 is the channel axis
test = test.reshape(-1, width, height, 1)

In [None]:
pred = model.predict(test)
final_pred = setup(pred)

In [None]:
output = pd.DataFrame(final_pred, columns = ['Label'])
output.index += 1  # ImageId starts at 1
output.head()

In [None]:
final_pred[:5]

In [None]:
output.to_csv('output.csv', index_label = 'ImageId')
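
As a quick check of the file layout written above (an `ImageId` column starting at 1, followed by a `Label` column), the sketch below builds the same structure in memory using only the standard library. The helper name `submission_csv` and the three labels are made up for the example.

```python
import csv
import io

def submission_csv(labels):
    """Return CSV text with an ImageId column (starting at 1) and a Label column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, int(label)])
    return buffer.getvalue()

text = submission_csv([2, 0, 9])
print(text)
```

Comparing the first lines of `output.csv` against this layout is an easy way to catch an off-by-one index before submitting.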