# Digit Recognition with Keras

> ## 0. Setup

>> ### 0.1 Libraries

* pandas is used for loading the data, and NumPy for normalizing and reshaping it
* matplotlib is used for visualizing the normalized data
* sklearn.model_selection is used for the train/validation split
* Keras is used for the neural network
* ipywidgets is used for the interactive sliders that browse individual examples

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import keras
from ipywidgets import interact
import ipywidgets as widgets

>> ### 0.2 Loading the Data

* After loading the training and test sets into memory, deep copies are made with `copy(deep=True)` so that later changes are not reflected in the original data frames. The dimensions and columns of both sets are then displayed
* The training set has one extra column called `label`, which holds the digit shown in each image. This column has to be separated from `df_train`
* The other columns are the pixels of the 28x28 images

In [None]:
df_train_orig = pd.read_csv('../input/train.csv')
df_test_orig = pd.read_csv('../input/test.csv')

df_train = df_train_orig.copy(deep=True)
df_test = df_test_orig.copy(deep=True)

print('Number of Training Examples = {}'.format(df_train.shape[0]))
print('Number of Test Examples = {}'.format(df_test.shape[0]))
print('Training Input Shape = {}'.format(df_train.shape))
print('Training Output Shape = {}'.format(df_train.shape[0]))
print('Test Input Shape = {}'.format(df_test.shape))
print('Test Output Shape = {}'.format(df_test.shape[0]))
print(df_train.columns)
print(df_test.columns)

> ## 1. Preprocessing


>> ###  1.1 Normalization
* First, the label column is dropped from the inputs because it is the target to predict, not a pixel feature
* Each pixel value is divided by 255. Since the maximum value of a grayscale pixel is 255, this scales the pixel values to the range 0 to 1
* The label column is stored in `Y_train`
* Finally, an assertion checks that `X_train` and `X_test` have the same number of columns

In [None]:
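# Drop the target column; `columns=` already selects the axis, so `axis` is not needed
X_train = df_train.drop(columns=['label']).astype('float32').values / 255.0
X_test = df_test.astype('float32').values / 255.0
# Labels are class indices, so keep them as integers
Y_train = df_train_orig['label'].astype('int32').values

assert X_train.shape[1] == X_test.shape[1]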
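
As a small illustration of the scaling step, here is a minimal sketch on a made-up array (the `pixels` values are hypothetical and are not taken from the dataset). It only shows that dividing by 255 maps the grayscale range onto 0 to 1.

```python
import numpy as np

# Hypothetical stand-in for a few pixel values in the 0..255 range
pixels = np.array([[0, 128, 255],
                   [64, 32, 16]], dtype=np.uint8)

# Cast to float32 first so the division produces floats in [0, 1]
scaled = pixels.astype('float32') / 255.0

print(scaled.dtype)  # float32
print(scaled.min(), scaled.max())
```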

>> ###  1.2 Reshape
* Each row of `X_train` and `X_test` is a flattened vector of 784 pixels. In order to use a CNN, we need to reshape each row back to image form (28, 28, 1)
* The depth is specified as 1 because the images are grayscale

In [None]:
X_train = X_train.reshape(-1,28,28,1)
X_test = X_test.reshape(-1,28,28,1)

print('Training Input Shape = {}'.format(X_train.shape))
print('Test Input Shape = {}'.format(X_test.shape))
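
The reshape can be checked on a small synthetic batch. This is a sketch under the assumption that the flat pixels are stored row by row, so pixel `k` lands at row `k // 28` and column `k % 28`; the `flat` array below is made up for the demonstration.

```python
import numpy as np

# Hypothetical batch of 3 flattened images, 784 pixels each
flat = np.arange(3 * 784, dtype='float32').reshape(3, 784)

# -1 lets NumPy infer the number of examples from the other dimensions
images = flat.reshape(-1, 28, 28, 1)

# Row-major order: flat pixel k maps to row k // 28, column k % 28
k = 100
assert images[0, k // 28, k % 28, 0] == flat[0, k]

print(images.shape)  # (3, 28, 28, 1)
```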

>> ###  1.3 Sanity Check
* The `visualize_digit` function plots any example from the training set together with its label. It is useful when we need to look at a specific record
* The distribution of the labels is also checked to see whether the classes are balanced

In [None]:
def visualize_digit(index):
    # Plots the training example at the given index
    plt.imshow(X_train[index][:, :, 0])
    print("Training example #{} has the label {}".format(index + 1, Y_train[index]))
    
interact(visualize_digit, index=(0, X_train.shape[0] - 1, 1));

In [None]:
# Checking the data distribution
df_train_orig['label'].value_counts().plot.bar()

>> ###  1.4 Train / Validation Split
* The labels are one-hot encoded with `to_categorical` before the split
* The training set is split into training and validation sets with a fixed seed (`random_state=0`)
* 90% of the examples are kept for training and 10% are held out for validation

In [None]:
Y_train = keras.utils.to_categorical(Y_train, num_classes=10)
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=0.1, random_state=0)

print('Training Input Shape = {}'.format(X_train.shape))
print('Validation Input Shape = {}'.format(X_val.shape))
print('Training Output Shape = {}'.format(Y_train.shape))
print('Validation Output Shape = {}'.format(Y_val.shape))
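
To show what the one-hot encoding produces, here is a NumPy-only sketch. It uses `np.eye` indexing as a stand-in for `keras.utils.to_categorical`, and the `labels` array is hypothetical.

```python
import numpy as np

# Hypothetical digit labels
labels = np.array([3, 0, 9, 3])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype='float32')[labels]

# argmax along the class axis recovers the original labels
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)  # (4, 10)
```

Each row contains a single 1 at the position of its class, which is the target format that categorical cross-entropy expects.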

> ## 2. Machine Learning

>> ### 2.1 Layers
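* The network has two blocks, each made of 2 convolution layers followed by a max pooling layer and dropout
* The activation function of the convolution layers is ReLU
* The feature maps are then flattened and passed through a 256-unit dense layer with dropout
* The final layer uses the softmax activation function because this is a multi-class classification problem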

In [None]:
model = keras.models.Sequential([
    keras.layers.Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu', input_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=32, kernel_size=(5, 5), padding='same', activation='relu'),
    keras.layers.MaxPool2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'),
    keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='same', activation='relu'),
    keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])
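
The output shapes and parameter counts of this architecture can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming that "same" padding keeps the spatial size and that each 2x2 max pooling layer halves it. The helper names `conv_params` and `dense_params` are made up for this illustration.

```python
def conv_params(kernel, in_ch, out_ch):
    # One kernel x kernel x in_ch weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # One weight per input/output pair, plus one bias per output
    return n_in * n_out + n_out

size, channels, total = 28, 1, 0
# (kernel size, filters) for each convolution block
for kernel, filters in [(5, 32), (3, 64)]:
    total += conv_params(kernel, channels, filters)  # first conv of the block
    total += conv_params(kernel, filters, filters)   # second conv of the block
    channels = filters
    size //= 2                                       # 2x2 max pooling

flat = size * size * channels                        # input width of the first Dense layer
total += dense_params(flat, 256) + dense_params(256, 10)

print(size, channels, flat)  # 7 64 3136
print(total)                 # 887530
```

The total can be compared with the "Total params" line printed by `model.summary()`. Dropout, pooling, and flatten layers add no trainable parameters.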

>> ###  2.2 Optimizer, Loss Function, Metrics and Callbacks
* The optimizer is RMSprop with a learning rate of 0.001
* The loss function is categorical cross-entropy, which is also called softmax loss
* Accuracy is used as the metric
* A callback halves the learning rate if the validation accuracy doesn't improve for 3 epochs, down to a minimum of 0.00001

In [None]:
# `learning_rate` replaces the deprecated `lr` argument; `decay=0.0` was a no-op and is omitted
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, epsilon=1e-08)
loss = 'categorical_crossentropy'
metrics = ['accuracy']

# TensorFlow 2.x logs this metric as 'val_accuracy' (older Keras versions used 'val_acc')
learning_rate_reduction = keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', patience=3, verbose=1, factor=0.5, min_lr=0.00001)

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
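
For reference, categorical cross-entropy with a one-hot target reduces to the negative log of the probability assigned to the true class. Below is a minimal pure-Python sketch of softmax followed by that loss. The logits are hypothetical and the functions are written for illustration only; they are not the Keras implementation.

```python
import math

def softmax(logits):
    # Subtracting the max improves numerical stability without changing the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def categorical_crossentropy(y_true, y_pred):
    # With a one-hot target, only the true class contributes to the sum
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

probs = softmax([2.0, 1.0, 0.1])                   # hypothetical 3-class logits
loss = categorical_crossentropy([1, 0, 0], probs)  # true class is index 0

print(probs)
print(loss)
```

The loss approaches 0 as the probability of the true class approaches 1, and grows without bound as that probability approaches 0.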

In [None]:
epochs = 30
batch_size = 86

model.fit(X_train, Y_train, 
          epochs=epochs, 
          batch_size=batch_size, 
          callbacks=[learning_rate_reduction], 
          validation_data=(X_val, Y_val))
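
The effect of the learning rate callback can be sketched with a small simulation. This is a simplified model of the plateau rule using the same `factor`, `patience`, and `min_lr` values as above; it ignores the callback's `min_delta` and `cooldown` options, and both the function name and the accuracy history are hypothetical.

```python
def simulate_reduce_lr(val_acc_history, lr=0.001, factor=0.5, patience=3, min_lr=0.00001):
    # Returns the learning rate in effect after each epoch
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_acc_history:
        if acc > best:
            best = acc   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs

# Hypothetical validation accuracies: the score stalls for epochs 3 to 5
history = [0.90, 0.95, 0.95, 0.94, 0.95, 0.96]
schedule = simulate_reduce_lr(history)
print(schedule)
```

In this run the learning rate is halved after the fifth epoch, once three consecutive epochs have failed to beat the best score of 0.95.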

In [None]:
model.summary()

> ## 3. Result

>> ###  3.1 Predicting with the Trained Model
* The labels of `X_test` are predicted with the model trained earlier. Each row of `Y_hat` holds the 10 class probabilities for one test image
* The `visualize_prediction` function plots any example from the test set together with its predicted label

In [None]:
Y_hat = model.predict(X_test, batch_size=None, verbose=0, steps=None)
Y_hat
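
Each row of the prediction matrix is turned into a label by taking the index of its largest probability. Here is a minimal sketch on a made-up probability matrix (the `probs` values are hypothetical and do not come from the model).

```python
import numpy as np

# Hypothetical softmax output for 2 test images over 10 classes
probs = np.full((2, 10), 0.02)
probs[0, 7] = 0.82
probs[1, 3] = 0.82

predicted = np.argmax(probs, axis=1)  # index of the largest probability in each row
confidence = probs.max(axis=1)        # the probability assigned to that class

print(predicted)  # [7 3]
```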

In [None]:
def visualize_prediction(index):
    # Plots the predicted example from X_test at given index
    plt.imshow(X_test[index].reshape(28, 28))
    print("Test example #{} has the predicted label {}".format(index + 1, np.argmax(Y_hat[index])))
    
interact(visualize_prediction, index=(0, X_test.shape[0] - 1, 1));

> ## 4. Submission

In [None]:
# ImageId starts at 1; the label is the class with the highest predicted probability
submission_df = pd.DataFrame({
    'ImageId': np.arange(1, len(Y_hat) + 1),
    'Label': np.argmax(Y_hat, axis=1)
})

In [None]:
submission_df.head(20)

In [None]:
submission_df.to_csv('submission.csv', header=True, index=False)