# Problem Statement

Identify digits from a dataset of tens of thousands of handwritten images. MNIST handwritten digit classification is a standard benchmark problem in computer vision and deep learning.

In this notebook, we develop a convolutional neural network (CNN) for handwritten digit classification from scratch.

What we will implement in this notebook:

* How to build a test harness that gives a robust evaluation of a model and establishes a baseline of performance for a classification task.
* How to explore extensions to a baseline model to improve learning and model capacity.
* How to develop a finalized model, evaluate the performance of the final model, and use it to make predictions on new images.


## **This task can be divided into the following subtasks.**

1. **Data Preparation**
2. **Building a CNN Model**
3. **Evaluation of the model**
4. **Prediction of validation data**


# Data Loading

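MNIST is a dataset of 70,000 small, square 28×28 pixel grayscale images of handwritten single digits between 0 and 9 (60,000 for training and 10,000 for testing). The Kaggle Digit Recognizer version used here provides 42,000 labeled training images in train.csv and 28,000 unlabeled test images in test.csv.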

The task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9, inclusively.

We will load and visualize the data.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPool2D, BatchNormalization

In [None]:
train = pd.read_csv('../input/digit-recognizer/train.csv')
test = pd.read_csv('../input/digit-recognizer/test.csv')

In [None]:
train.shape, test.shape

In [None]:
train.head()

In [None]:
test.head()

# Splitting the data into X_train and Y_train

In [None]:
Y_train=train['label']

# Drop 'label' column
X_train = train.drop(labels = ["label"],axis = 1)

Y_train.value_counts()

# Data visualization

In [None]:
# Pass the labels as x= so recent seaborn versions count each digit class
g = sns.countplot(x=Y_train)
plt.title('The distribution of the digits in the dataset', weight='bold', fontsize=18)

The plot shows that the ten classes are roughly equally represented, so there is no significant class imbalance and we can move on to preprocessing.

In [None]:
# Check for missing values in the training features
X_train.isnull().any().describe()

In [None]:
# Check for missing values in the test set
test.isnull().any().describe()

# Normalization



Pixel values range from 0 to 255; rescaling them to the range 0-1 is preferred for neural network models.

Scaling data to the range 0-1 is traditionally referred to as normalization.

Here we normalize the pixel values of the grayscale images, i.e. rescale them to the range [0, 1]. This involves first converting the data type from unsigned integers to floats, then dividing the pixel values by the maximum value (255).

In this case, the ratio is 1/255 or about 0.0039.
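As a minimal sketch of the two steps described above (cast to float, then divide by the maximum value), here is the same operation on a small hypothetical NumPy array of uint8 pixels rather than on the notebook's DataFrame:

```python
import numpy as np

# Hypothetical 2x3 "image" of raw uint8 pixel intensities (0-255)
pixels = np.array([[0, 51, 102], [153, 204, 255]], dtype=np.uint8)

# Cast to float first: an in-place `pixels /= 255` on a uint8 array raises a
# casting error, and float32 keeps memory lower than NumPy's default float64
normalized = pixels.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```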


In [None]:
# Normalize the data
X_train = X_train / 255
test = test / 255

# Reshape

![Reshape](https://backtobazics.com/wp-content/uploads/2018/08/numpy-reshape-vector-to-matrix.jpg)

In [None]:
# Reshape image in 3 dimensions (height = 28px, width = 28px , channel = 1)
X_train = X_train.values.reshape(-1,28,28,1)
test = test.values.reshape(-1,28,28,1)
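To show what the reshape does, here is a small sketch on two hypothetical flattened images (784 values each, like a row of train.csv without the label column). The -1 lets NumPy infer the number of images, and the trailing 1 is the single grayscale channel that Conv2D expects:

```python
import numpy as np

# Two hypothetical flattened 28x28 images, one per row
flat = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# -1: infer the batch dimension; (28, 28, 1): height, width, channels
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)  # (2, 28, 28, 1)

# Row-major order: flat pixel k ends up at row k // 28, column k % 28
k = 100
assert images[0, k // 28, k % 28, 0] == flat[0, k]
```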

Here is a glimpse of what we will be dealing with:

* Images of handwritten digits from 0 to 9
* We will feed these images to the CNN so that it learns to recognize digits and can predict the labels of the test images.
* Below are a few example digit images from this dataset.

In [None]:
X_train[0].shape

In [None]:
plt.figure(figsize=(15,8))
for i in range(50):
    plt.subplot(5,10,i+1)
    plt.imshow(X_train[i].reshape((28,28)),cmap='binary')
    plt.axis("off")
plt.show()

# Encoding 

![Encoding](https://i.imgur.com/wKtY1Og.png)

In [None]:
print("The shape of the labels before One Hot Encoding",Y_train.shape)
Y_train = to_categorical(Y_train, num_classes = 10)
print("The shape of the labels after One Hot Encoding",Y_train.shape)

In [None]:
Y_train[0]
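For 1-D integer labels, to_categorical should give the same result as selecting rows of an identity matrix. The sketch below uses hypothetical labels and plain NumPy to show the encoding and how np.argmax reverses it:

```python
import numpy as np

labels = np.array([3, 0, 9])  # hypothetical integer digit labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot.shape)  # (3, 10)

# np.argmax undoes the encoding, as the notebook does later for Y_val and the predictions
decoded = one_hot.argmax(axis=1)
print(decoded)  # [3 0 9]
```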

In [None]:
# Split the train and the validation set for the fitting
random_seed = 2
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size = 0.3, random_state=random_seed)

In [None]:
# Show one example from the training split (pyplot is already imported above)
g = plt.imshow(X_train[0][:,:,0])

# Data augmentation

![Data Augmentation](https://nanonets.com/blog/content/images/2018/11/1_dJNlEc7yf93K4pjRJL55PA--1-.png)

* ImageDataGenerator accepts the original data, randomly transforms it, and returns only the new, transformed data.
* The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

In [None]:
datagen = ImageDataGenerator(zoom_range = 0.1, width_shift_range = 0.1, height_shift_range = 0.1, rotation_range = 10) 
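In the cell above, zoom_range=0.1 zooms in or out by up to 10%, width_shift_range=0.1 and height_shift_range=0.1 shift the image by up to 10% of its width or height, and rotation_range=10 rotates it by up to 10 degrees. To illustrate only the shift, here is a simplified NumPy sketch with a hypothetical random_shift helper (not Keras code): it moves a 2-D image by a whole number of pixels and fills the uncovered border with zeros, whereas ImageDataGenerator applies an interpolating affine transform and by default fills with the nearest pixel value (fill_mode='nearest').

```python
import numpy as np

def random_shift(image, max_fraction=0.1, rng=None):
    """Hypothetical helper: shift a 2-D image by a random whole number of pixels.

    Pixels shifted in from outside the frame are set to 0 (a simplification of
    Keras's interpolating transform and its default fill_mode='nearest').
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    max_dy = int(max_fraction * h)  # 0.1 * 28 -> at most 2 whole pixels
    max_dx = int(max_fraction * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))  # positive dy moves content down
    dx = int(rng.integers(-max_dx, max_dx + 1))  # positive dx moves content right
    shifted = np.zeros_like(image)
    # Copy the part of the image that stays inside the frame to its new position
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

# A hypothetical 28x28 "digit": a vertical bar in the middle of the frame
digit = np.zeros((28, 28), dtype=np.float32)
digit[8:20, 13:15] = 1.0
augmented = random_shift(digit, rng=np.random.default_rng(0))
print(augmented.shape)  # (28, 28)
```

Each call draws a new offset, which is why augmentation yields a slightly different version of every training image in each epoch.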

# Building a CNN Model
![Cnn Model Architecture](https://miro.medium.com/max/700/1*uAeANQIOQPqWZnnuH-VEyw.jpeg)

## Constructing a sequential CNN model

* The model type that we will be using is Sequential. Sequential is the easiest way to build a model in Keras. It allows you to build a model layer by layer.

![Convolution neural network](https://miro.medium.com/max/1000/1*vkQ0hXDaQv57sALXAJquxA.jpeg)

* We use the ‘add()’ function to add layers to our model.
* Each of the two convolutional blocks contains three Conv2D layers. These are convolution layers that process our input images, which are seen as 2-dimensional matrices.

![kernel](https://i.imgur.com/NcyYyaJ.gif)
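The animation above shows a kernel sliding over the input. Below is a minimal NumPy sketch of that operation for one channel and one filter, assuming stride 1 and Keras's default padding='valid' (no padding); it also shows why a 3×3 kernel turns a 28×28 image into a 26×26 feature map. Like Keras Conv2D, it does not flip the kernel (strictly, this is cross-correlation), and it omits the bias and activation. The edge kernel is a hypothetical example; in the model, kernel values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((28, 28))
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)  # hypothetical vertical-edge filter
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map.shape)  # (26, 26)
```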

* Activation is the activation function for the layer. We use ReLU (Rectified Linear Unit), f(x) = max(0, x), for all convolutional layers and for the hidden dense layers. It is a common default because it is cheap to compute and does not saturate for positive inputs, which helps gradients flow during training.

![Activation Function](https://miro.medium.com/max/1000/1*4ZEDRpFuCIpUjNgjDdT2Lg.png)
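ReLU is simple enough to write in one line of NumPy; this sketch applies it element-wise to a few sample values:

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negative inputs become 0, positive inputs pass through
    return np.maximum(0.0, x)

activations = relu(np.array([-2.0, -0.5, 0.0, 1.5]))
print(activations)  # [0.  0.  0.  1.5]
```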

* BatchNormalization layer - Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. [More info](https://medium.com/analytics-vidhya/everything-you-need-to-know-about-regularizer-eb477b0c82ba)

![Normalization](https://miro.medium.com/max/1200/1*DmnOhSTIzn04sC0w1d3FPg.png)
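As a sketch of the training-time computation, here is a hypothetical helper for a 2-D batch of shape (batch, features): each feature is standardized with the mini-batch mean and variance, then scaled by a learned gamma and shifted by a learned beta. For convolutional feature maps, Keras normalizes each channel over the batch, height, and width axes, and at inference time it uses moving averages collected during training instead of batch statistics; this sketch covers neither case. The eps=1e-3 default mirrors the Keras epsilon default.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Hypothetical helper: training-mode batch norm for x of shape (batch, features)."""
    mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize; eps avoids division by zero
    return gamma * x_hat + beta              # learned scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # made-up activations
out = batch_norm_train(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6))  # approximately 0 for every feature
print(out.std(axis=0).round(3))   # approximately 1 for every feature
```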

* Pooling layers downsample feature maps by summarizing the presence of features in patches of the feature map. Two common pooling methods are average pooling and max pooling, which summarize the average presence of a feature and the most activated presence of a feature, respectively. This model uses max pooling with a 2×2 window and a stride of 2, which halves the height and width of each feature map.

![Pooling](https://qph.fs.quoracdn.net/main-qimg-cf2833a40f946faf04163bc28517959c)
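A minimal NumPy sketch of 2×2 max pooling with stride 2 (matching MaxPool2D's default pool size and 'valid' padding): the feature map is grouped into non-overlapping 2×2 blocks and the maximum of each block is kept. The helper name and the 4×4 input are hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D array ('valid' padding)."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, as 'valid' padding does, then form 2x2 blocks
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 1]])
print(max_pool_2x2(fm))
# [[4 5]
#  [6 7]]
```

Replacing .max with .mean in the last line of the helper would give average pooling instead.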

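* Dropout layer - Randomly dropping out nodes during training lets a single model simulate a large number of different network architectures, which reduces overfitting. This model drops 25% of the units after each convolutional block and 50% after each hidden dense layer.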
![Drop out](https://miro.medium.com/max/700/0*bTMVb8uekPpHxDcm)
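Keras implements "inverted" dropout: during training, each unit is zeroed with probability rate and the surviving units are scaled by 1/(1 - rate), so at inference time the layer simply passes its inputs through. A minimal NumPy sketch with a hypothetical dropout helper:

```python
import numpy as np

def dropout(x, rate, training, rng=None):
    """Hypothetical helper: inverted dropout as applied by a Dropout layer."""
    if not training or rate == 0.0:
        return x  # inference: the layer is the identity
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate  # each unit survives with probability 1 - rate
    # Scale survivors by 1 / (1 - rate) so the expected activation is unchanged
    return np.where(keep, x / (1.0 - rate), 0.0)

x = np.ones(10_000)
y = dropout(x, rate=0.5, training=True, rng=np.random.default_rng(0))
print((y == 0).mean())  # close to 0.5
print(y.mean())         # close to 1.0
```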

* In between the Conv2D layers and the dense layer, there is a ‘Flatten’ layer. Flatten serves as a connection between the convolution and dense layers.
* ‘Dense’ is a standard fully connected layer type used in many neural networks. We use it for the two hidden layers (512 and 1024 units) and for the output layer.
* We will have 10 nodes in our output layer, one for each possible outcome (0–9).
* The activation is ‘softmax’. Softmax makes the output sum up to 1 so the output can be interpreted as probabilities. The model will then make its prediction based on which option has the highest probability.
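A minimal NumPy sketch of softmax applied to ten hypothetical output scores, one per digit. Subtracting the maximum score before exponentiating is a standard trick to avoid overflow and does not change the result:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the probabilities are unchanged
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for digits 0-9
logits = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 0.2, 1.5, -0.3, 0.8])
probs = softmax(logits)
print(probs.sum())             # 1, up to floating-point rounding
print(int(np.argmax(probs)))   # 4, the digit with the largest score
```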

In [None]:
model = Sequential()
model.add(Conv2D(filters = 32, kernel_size = (3, 3), activation = 'relu', input_shape = (28, 28, 1)))
model.add(BatchNormalization())
model.add(Conv2D(filters = 32, kernel_size = (3, 3), activation = 'relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters = 32, kernel_size = (5, 5), activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(strides = (2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters = 64, kernel_size = (3, 3), activation = 'relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters = 64, kernel_size = (3, 3), activation = 'relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters = 64, kernel_size = (5, 5), activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(strides = (2,2)))
model.add(Dropout(0.25))


model.add(Flatten())
model.add(Dense(512, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1024, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation = 'softmax'))
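As a sanity check on the architecture, this pure-Python sketch traces the spatial size of the feature maps through the layers above, assuming the Keras defaults of padding='valid' and stride 1 for Conv2D and a 2×2 pool for MaxPool2D. Calling model.summary() should report the same output shapes.

```python
def conv_out(size, kernel):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, pool=2, stride=2):
    # 'valid' pooling: floor((size - pool) / stride) + 1
    return (size - pool) // stride + 1

size = 28
for k in (3, 3, 5):
    size = conv_out(size, k)   # 28 -> 26 -> 24 -> 20
size = pool_out(size)          # 20 -> 10
for k in (3, 3, 5):
    size = conv_out(size, k)   # 10 -> 8 -> 6 -> 2
size = pool_out(size)          # 2 -> 1
flattened = size * size * 64   # 64 filters in the last convolutional block
print(size, flattened)  # 1 64
```

With this configuration, Flatten passes 64 values per image to the first Dense layer.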

## Compiling the model

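Compiling the model requires three parameters: the loss function, the optimizer, and the metrics.

* categorical_crossentropy is the standard loss for multi-class classification with one-hot encoded labels
* The Adam optimizer adapts the step size for each parameter during training (passing the string 'adam' uses its default learning rate of 0.001)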
* The metric 'accuracy' is used to measure the performance of the model
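For one-hot targets, categorical cross-entropy is the mean over samples of -Σ y_true · log(y_pred), so only the predicted probability of the true class matters. A minimal NumPy sketch with made-up predictions for three classes; Keras additionally rescales the predictions to sum to 1 before clipping, which this sketch skips:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)) for one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0, 1, 0],
                   [1, 0, 0]], dtype=float)   # hypothetical one-hot labels
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1]])          # hypothetical softmax outputs
loss = categorical_crossentropy(y_true, y_pred)
# (-ln 0.8 - ln 0.6) / 2 ≈ 0.367
```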

In [None]:
model.compile(optimizer='adam',metrics=['accuracy'],loss='categorical_crossentropy')

In [None]:
reduction_lr = ReduceLROnPlateau(monitor='val_accuracy',patience=2, verbose=1, factor=0.2, min_lr=0.00001)

Reduce learning rate when a metric has stopped improving.
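With the settings above, the callback multiplies the learning rate by factor=0.2 whenever val_accuracy has not improved for patience=2 consecutive epochs, and never goes below min_lr=0.00001. Because the monitored metric contains "acc", Keras treats higher values as better. The pure-Python simulation below is a simplified sketch of that rule, not the Keras source: it assumes the default min_delta=1e-4, ignores the cooldown option (default 0), and uses a made-up sequence of validation accuracies.

```python
def simulate_reduce_lr(val_scores, lr, factor=0.2, patience=2, min_lr=1e-5, min_delta=1e-4):
    """Simplified ReduceLROnPlateau in 'max' mode (no cooldown): LR after each epoch."""
    best = float("-inf")
    wait = 0
    lrs = []
    for score in val_scores:
        if score > best + min_delta:      # counts as an improvement
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # reduce, but not below min_lr
                wait = 0
        lrs.append(lr)
    return lrs

# Made-up validation accuracies for 7 epochs, starting from Adam's default LR of 0.001
lrs = simulate_reduce_lr([0.90, 0.95, 0.95, 0.95, 0.96, 0.96, 0.96], lr=1e-3)
print(lrs)  # LR drops to 2e-4 after epoch 4 and to 4e-5 after epoch 7 (1-based)
```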

In [None]:
# model.fit accepts the augmented generator directly (fit_generator is deprecated)
hist = model.fit(datagen.flow(X_train, Y_train, batch_size=32), epochs=20, validation_data=(X_val, Y_val), callbacks=[reduction_lr])

In [None]:
# Per-epoch training and validation metrics recorded by fit
history_df = pd.DataFrame(hist.history)
history_df[['loss', 'val_loss']].plot()
history_df[['accuracy', 'val_accuracy']].plot()

In [None]:
final_loss, final_acc = model.evaluate(X_val, Y_val, verbose=0)
print("Final loss: {0:.4f}, final accuracy: {1:.4f}".format(final_loss, final_acc))

In [None]:
y_pred = model.predict(X_val, batch_size = 64)

y_pred = np.argmax(y_pred,axis = 1)
y_pred = pd.Series(y_pred,name="Label")
y_pred

In [None]:
# The 'seaborn' Matplotlib style name was removed in recent Matplotlib releases; seaborn's own style is enough
sns.set_style('whitegrid')
fig = plt.figure(figsize=(15, 6))
epochs = list(range(1, len(hist.history['loss']) + 1))

# Left panel: training vs. validation loss
ax1 = plt.subplot2grid((1, 2), (0, 0))
plt.plot(epochs, hist.history['val_loss'], color='cyan', label='Validation loss')
plt.plot(epochs, hist.history['loss'], label='Training loss')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Loss vs. Epoch', weight='bold', fontsize=18)

# Right panel: training vs. validation accuracy
ax2 = plt.subplot2grid((1, 2), (0, 1))
plt.plot(epochs, hist.history['val_accuracy'], color='cyan', label='Validation accuracy')
plt.plot(epochs, hist.history['accuracy'], label='Training accuracy')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Accuracy vs. Epoch', weight='bold', fontsize=18)

In [None]:
Y_val.shape, y_pred.shape

In [None]:
Y_val = np.argmax(Y_val,axis = 1)
Y_val = pd.Series(Y_val,name="Label")

In [None]:
from sklearn.metrics import confusion_matrix
cmatrix = confusion_matrix(Y_val, y_pred)

plt.figure(figsize=(15,8))
plt.title('Confusion matrix of the validation set (rows: true digit, columns: predicted digit)', weight='bold', fontsize=18)
sns.heatmap(cmatrix,annot=True,cmap="Reds",fmt="d",cbar=False)
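To make the layout of the heatmap explicit, here is a small NumPy sketch (a hypothetical helper with made-up labels) that builds a confusion matrix the same way as sklearn.metrics.confusion_matrix, with true labels as rows and predicted labels as columns, and then reads per-class recall off the diagonal:

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predicted labels (same layout as sklearn)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at is unbuffered, so repeated (true, pred) pairs are all counted
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true = np.array([0, 1, 2, 2, 1, 0])  # hypothetical true digits
y_pred = np.array([0, 2, 2, 2, 1, 0])  # hypothetical predictions
cm = confusion_matrix_np(y_true, y_pred, 3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)  # correct / total per true class
print(cm)
print(per_class_recall)
```

Off-diagonal cells show which digits get confused with which, e.g. cm[1, 2] counts 1s predicted as 2.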

In [None]:
# y_pred and Y_val are already 1-D label Series here, so compare them directly
errors = (y_pred.values != Y_val.values)

# Localize the misclassified images and their labels
predicted_er = y_pred.values[errors]
y_true_er = Y_val.values[errors]
x_val_er = X_val[errors]

# Plot up to 30 misclassified digits: predicted label, with the true label in parentheses
plt.figure(figsize=(15, 9))
for i in range(min(30, len(x_val_er))):
    plt.subplot(5, 6, i + 1)
    plt.imshow(x_val_er[i].reshape((28, 28)), cmap='binary')
    plt.title(f"{predicted_er[i]} (true {y_true_er[i]})", size=13, weight='bold', color='red')
    plt.axis("off")
plt.show()



In [None]:
# `test` was already normalized and reshaped to (-1, 28, 28, 1) above
y_pred1 = model.predict(test, batch_size = 64)

y_pred1 = np.argmax(y_pred1,axis = 1)
y_pred1 = pd.Series(y_pred1,name="Label")
y_pred1

In [None]:
submission = pd.concat([pd.Series(range(1, len(y_pred1) + 1), name="ImageId"), y_pred1], axis=1)
submission.to_csv("submission.csv",index=False)

More references:

* https://www.pyimagesearch.com/2019/07/08/keras-imagedatagenerator-and-data-augmentation/
* https://machinelearningmastery.com/how-to-normalize-center-and-standardize-images-with-the-imagedatagenerator-in-keras/
* https://towardsdatascience.com/complete-guide-of-activation-functions-34076e95d044
* https://towardsdatascience.com/building-a-convolutional-neural-network-cnn-in-keras-329fbbadc5f5
* https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
* https://medium.com/analytics-vidhya/everything-you-need-to-know-about-regularizer-eb477b0c82ba
* https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau
* https://www.pluralsight.com/guides/getting-started-tensorflow