# **Digit Recognizer with Convolutional Neural Networks (CNN)**

### **Content**
* [Introduction](#1)
* [Data Preparation](#2)
    * [Normalization](#3)
    * [Reshape](#4)    
    * [Label Encoding](#5)    
    * [Train - Test Split](#6)    
* [CNN](#7)    
* [Conclusion](#8)

<a id="1"></a> 
## **Introduction**

![CNN](http://i64.tinypic.com/15mldu9.jpg)

* A convolutional neural network (CNN) is a type of artificial neural network designed to process pixel data. It is widely used in image recognition and object detection.
* It uses a feature detector, also called a filter or kernel, to scan images and pick up patterns such as edges or convex shapes.
* During detection, the feature detector matrix slides across the picture.
* The result of the detection operation is a filtered image, called the convolved feature (feature map).
* After the filtering layer, an activation function such as ReLU is applied to break linearity.
* Repeated filtering shrinks the image. "Same padding" is used to prevent this decrease in image size.
* **Max Pooling**: Progressively reduces the spatial size of the representation, which cuts the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently. It also helps control overfitting.
* **Flattening**: Converts the matrices into one-dimensional vectors.
* **Full Connection**: Every neuron in one layer is connected to every neuron in the next layer.
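
The steps listed above can be tried out on a tiny image with plain NumPy. This is only an illustrative sketch, not how Keras implements these layers: the helper names `conv2d_same`, `relu` and `max_pool_2x2`, the 6x6 test image and the edge kernel are all assumptions made for the example.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide `kernel` over `image`, zero-padding so the output keeps the input size."""
    kh, kw = kernel.shape
    # Zero padding ("same" padding) for an odd-sized kernel
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, set negative ones to zero."""
    return np.maximum(x, 0)

def max_pool_2x2(x):
    """Keep the largest value of every non-overlapping 2x2 window."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:h * 2, :w * 2].reshape(h, 2, w, 2).max(axis=(1, 3))

# Hypothetical 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = relu(conv2d_same(image, edge_kernel))  # convolved feature, 6x6
pooled = max_pool_2x2(feature_map)                   # 3x3 after pooling
flat = pooled.flatten()                              # vector of 9 values

print(feature_map.shape, pooled.shape, flat.shape)
print(pooled)
```

The pooled map is non-zero only in its middle column, which is where the vertical edge sits in the input.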



In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

<a id="2"></a> 
### **Data Preparation**
* We'll read the training data from CSV for model training, and the test data for the submission.

In [None]:
train = pd.read_csv("../input/train.csv")
print(train.shape)
train.head()

In [None]:
# Read test data
test = pd.read_csv("../input/test.csv")
print(test.shape)
test.head()

In [None]:
# Get digit labels into yTrain
yTrain = train["label"]

# Drop 'label' column
xTrain = train.drop(labels = ["label"], axis = 1) 

In [None]:
plt.figure(figsize = (15,7))
g = sns.countplot(x = yTrain, palette = "icefire")
plt.title("Number of digits")
yTrain.value_counts()

In [None]:
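def drawImage(imgArray, label = None):
    # Reshape a flat 784-pixel row (or a 28x28 array) into a 28x28 image
    imgArray = imgArray.reshape((28, 28))
    plt.imshow(imgArray, cmap = 'gray')
    # Show a title only when the caller passes the sample's label
    if label is not None:
        plt.title(label)
    plt.axis("off")
    plt.show()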

In [None]:
# plot some samples
drawImage(xTrain.iloc[9].values)
drawImage(xTrain.iloc[5].values)

<a id="3"></a> 
## **Normalization**

* We'll rescale the pixel values from [0, 255] to the range [0, 1].
* Training a CNN typically converges faster on inputs in [0, 1] than in [0, 255].

In [None]:
xTrain = xTrain / 255.0
test = test / 255.0
print("xTrain shape: ",xTrain.shape)
print("test shape: ",test.shape)

<a id="4"></a> 
## **Reshape**

* Keras needs a dimension for channels. Our data is grayscale, so it needs 1 channel. We'll reshape each flat row of 784 pixels into a 28x28x1 3D matrix.

In [None]:
xTrain = xTrain.values.reshape(-1,28,28,1)
test = test.values.reshape(-1,28,28,1)
print("xTrain shape: ",xTrain.shape)
print("test shape: ",test.shape)

<a id="5"></a> 
## **Label Encoding**

* We'll encode the labels as one-hot vectors.
* Labels are the digits 0-9. After conversion, each label becomes a vector of length 10 with a single 1, e.g. 1 -> [0,1,0,0,0,0,0,0,0,0].
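
To see what the encoding produces without running Keras, here is a minimal NumPy sketch. The `one_hot` helper is a hypothetical stand-in written for this example. It is meant to mirror the output of `to_categorical` for integer labels, not to replace it.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an array of shape (n_samples, num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([1, 0, 9])
print(encoded[0])  # vector for the digit 1

# argmax recovers the original digit from a one-hot vector
decoded = encoded.argmax(axis=1)
print(decoded)
```

The same `argmax` trick is used later in the notebook to turn predicted probability vectors back into digits.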

In [None]:
# one-hot-encoding
from keras.utils import to_categorical
yTrain = to_categorical(yTrain, num_classes = 10)

<a id="6"></a> 
## **Train - Test Split**

* We'll split the training data into a training set and a validation set (10% held out for validation).

In [None]:
from sklearn.model_selection import train_test_split
xTrain, xVal, yTrain, yVal = train_test_split(xTrain, yTrain, test_size = 0.1, random_state = 2)
print("xTrain shape",xTrain.shape)
print("xVal shape",xVal.shape)
print("yTrain shape",yTrain.shape)
print("yVal shape",yVal.shape)

In [None]:
# Draw an example
drawImage(xTrain[2][:,:,0])

<a id="7"></a> 
# **CNN**

* We'll use the Keras library to build the convolutional neural network.
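
Before reading the model definition, it can help to trace how the image size changes through it. The sketch below is plain arithmetic under two assumptions: "same" padding keeps height and width, and each 2x2 max pooling halves them, rounding down. The helper names are made up for this example.

```python
def same_conv(size):
    # 'same' padding keeps the spatial size unchanged
    return size

def max_pool(size, pool=2):
    # 2x2 pooling with stride 2 and no padding drops any leftover row/column
    return size // pool

size = 28
size = max_pool(same_conv(same_conv(size)))  # block 1: 28 -> 14
size = max_pool(same_conv(same_conv(size)))  # block 2: 14 -> 7
size = max_pool(same_conv(size))             # block 3: 7 -> 3

# The last convolutional layer in the model has 16 filters
flatten_units = size * size * 16
print(size, flatten_units)
```

Under these assumptions the `Flatten` layer hands 3 x 3 x 16 = 144 values to the first dense layer.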

In [None]:
from sklearn.metrics import confusion_matrix
import itertools

from keras.utils import to_categorical # convert to one-hot-encoding
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.optimizers import RMSprop,Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau
from keras.callbacks import LearningRateScheduler

model = Sequential()

model.add(Conv2D(filters = 64, kernel_size = (4,4),padding = 'Same', activation = 'relu', input_shape = (28,28,1)))
model.add(BatchNormalization())
model.add(Conv2D(filters = 32, kernel_size = (4,4),padding = 'Same', activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size = (2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters = 16, kernel_size = (3,3),padding = 'Same', activation = 'relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters = 16, kernel_size = (3,3),padding = 'Same', activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2,2), strides = (2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters = 16, kernel_size = (3,3),padding = 'Same', activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size = (2,2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

### **Optimizer**

In [None]:
optimizer = Adam(learning_rate = 0.0001, beta_1 = 0.9, beta_2 = 0.999)

### **Model Compilation**
* Categorical cross-entropy will be used as the loss function, since this is a multi-class problem with one-hot encoded labels.
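
As a rough illustration of what this loss measures, the sketch below computes categorical cross-entropy by hand with NumPy. The function and the three-class example values are assumptions made for the example. Keras computes the loss internally, and its implementation details may differ.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the log of the probability given to the true class."""
    # Clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Hypothetical one-hot labels and predicted probabilities for 3 classes
y_true = np.array([[0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses, losses.mean())
```

The confident, correct first prediction gets a small loss, while the second sample, where the true class only received probability 0.3, is penalised more heavily.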

In [None]:
model.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics = ["accuracy"])

### **Epochs and Batch Size Definition**

In [None]:
epochs = 30 # 1 epoch means one full pass over the training data.
batch_size = 385 # Number of training samples for one forward/backward pass.

### **Data Augmentation**

![](http://i68.tinypic.com/346kytk.jpg)

* To avoid overfitting, we will artificially expand our digit dataset.
* Small random transformations (rotation, zoom, shifts) diversify the data.
* After this operation, we will have many variations of our digit images.
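
To make the idea concrete, here is a small NumPy sketch of one such transformation, a random shift. It is a simplification written for illustration: `shift_image` is a hypothetical helper that fills the exposed border with zeros, whereas `ImageDataGenerator` also handles rotation and zoom and has its own fill settings.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift a 2D image by (dy, dx) pixels, filling the exposed border with zeros."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Part of the source image that stays inside the frame after the shift
    src = image[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

rng = np.random.default_rng(0)

# Crude vertical stroke standing in for a digit image
digit = np.zeros((28, 28))
digit[10:18, 12:16] = 1.0

# Up to 10% of the 28-pixel side, rounded down to 2 pixels
max_shift = int(0.1 * 28)
dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))

augmented = shift_image(digit, dy, dx)
print(dy, dx, augmented.shape)
```

Each call with new random offsets yields a slightly different training image carrying the same label.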

In [None]:
datagen = ImageDataGenerator(
        featurewise_center = False,  # set input mean to 0 over the dataset
        samplewise_center = False,  # set each sample mean to 0
        featurewise_std_normalization = False,  # divide inputs by std of the dataset
        samplewise_std_normalization = False,  # divide each input by its std
        zca_whitening = False,  # apply ZCA whitening
        rotation_range = 10,  # randomly rotate images by up to 10 degrees
        zoom_range = 0.1, # randomly zoom images by up to 10%
        width_shift_range = 0.1,  # randomly shift images horizontally by up to 10% of the width
        height_shift_range = 0.1,  # randomly shift images vertically by up to 10% of the height
        horizontal_flip = False,  # do not flip images horizontally
        vertical_flip = False)  # do not flip images vertically

datagen.fit(xTrain)

### **Setting Annealer**

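* `LearningRateScheduler` is a Keras callback that sets the learning rate at the start of each epoch. Here the rate starts at 1e-3 and is multiplied by 0.9 every epoch, which replaces the learning rate passed to the optimizer above.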
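
The schedule used in the next cell is `1e-3 * 0.9 ** x`, where `x` is the epoch index. The sketch below simply evaluates that formula for a few epochs. The function name `schedule` and its keyword arguments are chosen for this example only.

```python
def schedule(epoch, initial_lr=1e-3, decay=0.9):
    """Learning rate for a given epoch index (starting at 0)."""
    return initial_lr * decay ** epoch

# Learning rate at the first, second, eleventh and last of 30 epochs
for epoch in (0, 1, 10, 29):
    print(epoch, schedule(epoch))
```

The rate shrinks by 10% each epoch, so later epochs take smaller and smaller steps.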

In [None]:
annealer = LearningRateScheduler(lambda x: 1e-3 * 0.9 ** x)

In [None]:
# fit the model on augmented batches
history = model.fit(datagen.flow(xTrain,
                                 yTrain, 
                                 batch_size = batch_size), 
                    epochs = epochs, 
                    validation_data = (xVal, yVal), 
                    steps_per_epoch = xTrain.shape[0] // batch_size,
                    callbacks = [annealer])

The plot below shows the validation loss against the number of epochs.

In [None]:
# Plot the validation loss curve
plt.plot(history.history['val_loss'], color = 'b', label = "validation loss")
plt.title("Validation Loss")
plt.xlabel("Number of Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

In [None]:
# Final Accuracy
final_loss, final_acc = model.evaluate(xVal, yVal, verbose = 0)
print("Final loss: {0:.4f}, final accuracy: {1:.4f}".format(final_loss, final_acc))

### **Evaluation of Model**

In [None]:
# Predict class probabilities for the validation dataset
Y_pred = model.predict(xVal)
# Convert predicted probability vectors to class indices
Y_pred_classes = np.argmax(Y_pred,axis = 1) 
# Convert one-hot validation labels to class indices
Y_true = np.argmax(yVal,axis = 1) 
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes) 
# plot the confusion matrix
f,ax = plt.subplots(figsize=(10, 10))
sns.heatmap(confusion_mtx, annot=True, linewidths=0.01,cmap="Greens",linecolor="gray", fmt= 'd',ax=ax)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()

Results may be improved by adding filters or training for more epochs.

In [None]:
# predict results
results = model.predict(test)
results = np.argmax(results,axis = 1)
results = pd.Series(results, name = "Label")

In [None]:
submission = pd.concat([pd.Series(range(1, len(results) + 1), name = "ImageId"), results], axis = 1)
submission.to_csv("cnn_mnist_datagen.csv", index = False)

<a id="8"></a> 
## **Conclusion**
* We used Keras to build a convolutional neural network model and reached approximately 99% accuracy.
* *If you have a suggestion, I'd be happy to read it.*