# **Deep Keras CNN as digit recognizer: complete tutorial**

### Welcome to this tutorial! In this notebook, we'll see how to build a deep convolutional neural network. We'll then train this CNN on the MNIST dataset to turn it into a highly effective digit recognizer with close to 100% accuracy!

![](http://petr-marek.com/wp-content/uploads/2017/07/mnist.png)

### We are going to predict which digit is drawn on each image. Along the way, we will go through several fundamental topics and techniques of deep learning. Here is a list of them, with some additional resources that you can consult to find out more:

[*Overfitting*](https://elitedatascience.com/overfitting-in-machine-learning)   
[*Artificial neural networks*](https://www.superdatascience.com/blogs/the-ultimate-guide-to-artificial-neural-networks-ann)  
[*Convolutional neural networks*](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)  
[*Activation functions in neural networks*](https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6)  
[*Deep Learning with Python*](https://www.manning.com/books/deep-learning-with-python)

## **Table of Contents**

1. [**Data exploration & preparation**](#data_exploration)
2. [**CNN Theory**](#theory)
3. [**Building model with Keras**](#model)
4. [**Make prediction**](#prediction)

## **Imports and useful functions**

In [None]:
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

import os
import pandas as pd
import numpy as np
import seaborn as sns

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization, Conv2D, MaxPool2D
from keras.utils import np_utils

In [None]:
#path of datasets
path_train = '../input/train.csv'
path_test = '../input/test.csv'

## **1. Data exploration** <a id="data_exploration"></a>

### Let's begin by printing a preview of the dataset and look at its size and what it contains:

In [None]:
#create dataframe for training dataset and print 5 first rows as preview
train_df_raw = pd.read_csv(path_train)
train_df_raw.head()

In [None]:
# print infos about the dataset
train_df_raw.info()

In [None]:
train_df_raw.describe()

In [None]:
# Check whether any value is missing (isna() is an alias of isnull(), so one call is enough)
train_df_raw.isnull().values.any()

### From this first preview and according to the information from Kaggle, each image contains 784 pixels (28x28 pixels), and each pixel holds a value between 0 and 255 representing its gray level. There are no missing values. Let's try to display the first images from the values of the dataset:

In [None]:
def display_images_from_pixels(nb):
    # Rebuild the first `nb` images as 28x28 arrays from their flat pixel rows
    pixels = train_df_raw.drop(columns=['label']).values
    images = [pixels[i].reshape(28, 28) for i in range(nb)]
    # Lay the images out on a grid with 15 columns, rounding the row count up
    rows = int(np.ceil(nb / 15))
    plt.figure(figsize=(16, rows))
    for n in range(1, nb + 1):
        plt.subplot(rows, 15, n)
        plt.imshow(images[n-1])
        plt.axis('off')

display_images_from_pixels(120)

### Finally, let's check the occurrence of each class, to be sure that there is no imbalance in our data that could skew the algorithm:

In [None]:
sns.set()
plt.figure(figsize=(20, 8))
# Labels are discrete classes, so count occurrences per digit rather than estimating a density
sns.countplot(x=train_df_raw.label)
plt.show()

### Nice, the distribution of the labels seems uniform enough to properly solve the multiclass classification problem. 
### Now, we have to prepare our data in order to pass it to the CNN. This preparation involves reshaping and normalizing our observations, since the CNN needs a 4D tensor as input. Normalizing the data is necessary when working with neural networks because feeding a NN with large or heterogeneous values can trigger large gradient updates that prevent the network from converging. Finally, we need to one-hot encode the target so that it matches the shape of the output of our CNN, which is a vector of length 10 for a 10-class classification problem.

In [None]:
# Prepare data in order to pass it to a CNN
train_df = train_df_raw.copy()

# Separate target from images
X_train = train_df.drop(columns=['label'])
Y_train = train_df['label']

# Add 2 dimensions to pass a 4D tensor to the CNN and normalize values
X_train = X_train.values.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255 
# One-hot encoding of target
Y_train = np_utils.to_categorical(Y_train, 10)
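
### To see what this preparation does, here is a minimal sketch on made-up toy data. The `toy_*` names and values are assumptions for illustration and are not part of the dataset. It uses plain NumPy, with `np.eye` standing in for `to_categorical`, so it may differ in detail from what Keras does internally:

```python
import numpy as np

# Toy stand-in for the real data: 3 "images" of 784 pixels and their labels.
# (The values are made up for illustration only.)
toy_pixels = np.array([[0] * 784, [128] * 784, [255] * 784])
toy_labels = np.array([3, 0, 9])

# Reshape to (samples, height, width, channels) and scale pixels to [0, 1]
toy_X = toy_pixels.reshape(toy_pixels.shape[0], 28, 28, 1).astype('float32') / 255

# One-hot encoding: row i of the identity matrix is the encoding of class i
toy_Y = np.eye(10, dtype='float32')[toy_labels]

print(toy_X.shape, toy_X.min(), toy_X.max())
print(toy_Y[0])  # a single 1 at index 3, zeros elsewhere
```

### Each label becomes a vector of length 10 with a single 1 at the position of its class, which is the shape the 10-node output layer expects.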

## **2. CNN Theory** <a id="theory"></a>

### We are going to build a convolutional neural network. A CNN is a neural network composed of two parts: feature learning and classification. The first part applies several operations to the images before passing the result to a standard ANN, which performs the second part. The feature learning operations are convolution, application of an activation function, pooling (which we will not use in this architecture), and flattening. Convolution + activation function and pooling may be repeated several times to build deeper networks and improve performance.
### Convolution is used to detect features in the images: it creates feature maps that highlight where each feature appears. The activation function is then applied to the feature maps to introduce non-linearity, which lets the network learn more complex features than a purely linear model could. Pooling is used to downsize the result and make the CNN faster. Flattening transforms the feature maps produced by the convolutions into a 1D vector that can be passed to a standard ANN.
### Here is a standard CNN architecture. We can see that it is just a succession of convolution + activation function operations (here the function used is ReLU, i.e. the rectifier function) before an artificial neural network:

![](https://cdn-images-1.medium.com/max/2400/1*vkQ0hXDaQv57sALXAJquxA.jpeg)

### Here is a detailed view of a convolutional neural network applied to the digit recognition task. Our model will be deeper (with more convolutional layers) and will not contain pooling steps, but it is built on basically the same architecture:

![](https://cdn-images-1.medium.com/max/1600/1*uAeANQIOQPqWZnnuH-VEyw.jpeg)
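
### The feature learning steps described above can be sketched in a few lines of NumPy. This is only an illustration on a hypothetical 6x6 image with a hand-written kernel: in a real CNN the kernel values are learned during training, and Keras uses far more efficient routines than these Python loops. The helper names `conv2d_valid` and `relu` are assumptions made for this sketch:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding. (What deep learning
    libraries call a convolution is technically a cross-correlation.)"""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Rectifier: keep positive responses, set negative ones to zero
    return np.maximum(x, 0)

# Hypothetical 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written 3x3 kernel that responds to vertical dark-to-bright edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d_valid(image, kernel))  # shape (4, 4)
flat = feature_map.flatten()                     # shape (16,)

print(feature_map)
print(flat.shape)
```

### The feature map is non-zero only in the columns where the window straddles the edge, which is what "detecting a feature" means here. Flattening then turns the 4x4 map into a vector of 16 values.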

## **3. Building model with Keras** <a id="model"></a>

### Now, let's implement our model! We'll use Keras, a framework serving as a high-level API for TensorFlow, which is a tensor-manipulation framework made by Google. Keras allows you to build neural networks by assembling blocks (which are the layers of your neural network). For more details, [here](https://elitedatascience.com/keras-tutorial-deep-learning-in-python) is a great Keras tutorial.

![](https://cdn.actuia.com/wp-content/uploads/2018/05/keras.png)

In [None]:
def build_cnn():
    
    model = Sequential()
    
    # Multiple convolution operations to detect features in the images
    model.add(Conv2D(32,kernel_size=3,activation='relu',input_shape=(28,28,1)))
    model.add(BatchNormalization())
    model.add(Conv2D(32,kernel_size=3,activation='relu')) # no need to specify shape as there is a layer before
    model.add(BatchNormalization())
    model.add(Conv2D(32,kernel_size=5,strides=2,padding='same',activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.4)) # reduce overfitting

    model.add(Conv2D(64,kernel_size=3,activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(64,kernel_size=3,activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(64,kernel_size=5,strides=2,padding='same',activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.4)) # reduce overfitting
    
    # Flattening and classification by standard ANN
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.4))
    model.add(Dense(10, activation='softmax'))
    
    model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
    
    return model

### Now, let's explain the Keras model above and try to link it to the representation of a convolutional neural network studied before. In the architecture implemented above, we can observe several types of blocks:

- Convolutional layers:
        
      model.add(Conv2D(32,kernel_size=3,activation='relu'))
        
### These layers perform **only** the first two steps seen above (convolution and application of the activation function). Each one first applies convolution to its input to detect features and create feature maps, then applies a ReLU activation function to the result to introduce non-linearity and improve feature detection. After this step (repeated several times to improve feature detection), we are technically ready to flatten the result and pass it to the simple artificial neural network!  
        
- Batch normalization:

      model.add(BatchNormalization())
        
### Batch normalization normalizes the data coming out of the convolutional layer. This helps gradient propagation and makes it possible to build deeper models by stacking convolutional layers.
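
### As a rough illustration of what batch normalization computes during training, here is a minimal NumPy sketch. It is a simplification and not the Keras implementation: it handles a 2D `(batch, features)` input only, whereas after a convolutional layer the statistics are computed per channel, and at prediction time moving averages are used instead of batch statistics. The helper name `batch_norm_train` and the random activations are assumptions for this sketch:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift.
    `gamma` and `beta` are the parameters the layer learns during training."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta

# Made-up activations: a batch of 64 samples with 4 features each,
# deliberately far from zero mean and unit variance
rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

normalized = batch_norm_train(activations, gamma=np.ones(4), beta=np.zeros(4))

print(normalized.mean(axis=0).round(3))  # close to 0 for every feature
print(normalized.std(axis=0).round(3))   # close to 1 for every feature
```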
        
- Dropout

      model.add(Dropout(0.4))
        
### During training, dropout randomly disables a fraction (here 40%) of the outputs of the previous densely connected or convolutional layer to reduce overfitting.
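
### Here is a minimal NumPy sketch of the idea, assuming the common "inverted dropout" formulation in which the surviving activations are rescaled so that their expected value stays the same. The helper name `dropout_train` and the all-ones input are assumptions for illustration. At prediction time dropout does nothing:

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Zero each activation with probability `rate`, and rescale the
    survivors by 1 / (1 - rate) so the expected value is unchanged."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 100))  # made-up activations, all equal to 1
dropped = dropout_train(activations, rate=0.4, rng=rng)

print((dropped == 0).mean())  # fraction of zeroed activations, close to 0.4
print(dropped.mean())         # average activation, still close to 1.0
```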
        
- Flattening
 
      model.add(Flatten())
         
### This simply performs the flattening operation we've seen before, in order to pass the feature maps produced by the convolutions to the standard artificial neural network.

![](https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/73_blog_image_1.png)
         
- Fully connected layer

      model.add(Dense(128, activation='relu'))

### This layer (called a fully connected layer) is the main component of the ANN, which assigns each image to its class (i.e. the recognized digit). In the output layer, `Dense(10, activation='softmax')`, the softmax activation function converts the values of the 10 output nodes into probabilities, one for each digit class.
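
### To illustrate how softmax turns the 10 raw output values into probabilities, here is a small NumPy sketch. The `logits` values are made up for the example, and `softmax` is written by hand rather than taken from Keras:

```python
import numpy as np

def softmax(logits):
    # Subtracting the maximum does not change the result
    # but avoids overflow in the exponential
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw outputs of the 10 output nodes (one per digit, 0 to 9)
logits = np.array([1.0, 0.5, -2.0, 0.0, 3.0, -1.0, 0.2, 6.0, 0.1, -0.5])
probabilities = softmax(logits)

print(probabilities.round(3))
print(probabilities.sum())            # the probabilities sum to 1 (up to rounding)
print(int(np.argmax(probabilities)))  # index of the most likely digit
```

### The predicted digit is simply the index of the largest probability, here the node with the largest raw output.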

### Do not hesitate to print a summary of the model to check the total number of parameters, which gives you an overview of the depth and complexity of your model (see below). We can clearly see that the first convolutional layer, for example, creates 32 feature maps (each one corresponding to one particular feature of the image) of size 26x26 from a single 28x28 image. We can also observe that flattening transforms a 3D tensor into a 1D tensor (or 4D into 2D when the batch dimension is counted):

In [None]:
build_cnn().summary()
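
### The shapes in the summary can also be worked out by hand. The sketch below assumes the usual conventions for `'valid'` and `'same'` padding and copies the convolutional layer settings from `build_cnn()`. The helper names `conv_output_size` and `conv_params` are made up for this example, and batch normalization parameters are left out, so the totals are lower than those reported by `summary()`:

```python
import math

def conv_output_size(size, kernel, stride=1, padding='valid'):
    """Spatial output size of a convolution under the usual
    'valid' / 'same' padding conventions."""
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

# (filters, kernel, stride, padding) for the six Conv2D layers in build_cnn()
conv_layers = [
    (32, 3, 1, 'valid'), (32, 3, 1, 'valid'), (32, 5, 2, 'same'),
    (64, 3, 1, 'valid'), (64, 3, 1, 'valid'), (64, 5, 2, 'same'),
]

size, channels = 28, 1
sizes = []
for filters, kernel, stride, padding in conv_layers:
    params = conv_params(kernel, channels, filters)
    size = conv_output_size(size, kernel, stride, padding)
    channels = filters
    sizes.append(size)
    print(f"Conv2D -> {size}x{size}x{channels}, {params} parameters")

flat_length = size * size * channels
print("Flatten ->", flat_length)
print("Dense(128) parameters:", flat_length * 128 + 128)
print("Dense(10) parameters:", 128 * 10 + 10)
```

### The first line confirms the 26x26 feature maps mentioned above: a 3x3 kernel without padding removes one pixel on each side of a 28x28 image.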

### These explanations are far from complete, and there is so much to say about neural networks that it is impossible to cover it all in a notebook. To better understand these architectures and dive deep (pun intended!) into this exciting field, I highly recommend reading the great book [**Deep Learning with Python**](https://www.manning.com/books/deep-learning-with-python) written by François Chollet, the creator of the Keras framework. Another great resource is linked at the beginning of the notebook (convolutional neural networks) and may help you understand in detail how the feature learning part works.

### Finally, I encourage you to check [**this link**](http://scs.ryerson.ca/~aharley/vis/conv/flat.html) to visualize all these operations in practice, applied to the MNIST dataset! Just draw a digit in the square on the left of your screen and look at the result and the prediction made by the CNN. You can observe what the feature maps look like after the convolution operations, and what they look like after pooling and flattening.

## **4. Make prediction** <a id="prediction"></a>

In [None]:
X_test = pd.read_csv(path_test)
X_test = X_test.values.reshape(X_test.shape[0], 28, 28, 1).astype('float32') / 255

In [None]:
model = build_cnn()
model.fit(X_train, Y_train, batch_size=64, epochs=16)

In [None]:
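# predict() returns one probability per class; argmax picks the most likely digit
# (predict_classes() was removed from recent Keras versions)
prediction = np.argmax(model.predict(X_test, verbose=0), axis=1)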
submission = pd.DataFrame({"ImageId": list(range(1,len(prediction)+1)),
                         "Label": prediction})
submission.to_csv("submission.csv", index=False, header=True)