![](https://www.kdnuggets.com/wp-content/uploads/photo.jpg)

When you take a minute to stop and look around, today's technological advancements can seem like something out of a futuristic novel. Cars are learning to drive, hands-free devices can turn on your lights or toast your bread, and drones are circling the skies. This is 2018. While the full potential of Artificial Intelligence (AI) and Machine Learning (ML) hasn't been realized yet, impressive progress has certainly been made.

![](https://i.stack.imgur.com/mFBCV.png)
![](http://news.mit.edu/sites/mit.edu.newsoffice/files/styles/news_article_image_top_slideshow/public/images/2018/MIT-Invisible-Vision_0.jpg?itok=DWJiIHwB)

Reveals “invisible” objects in the dark

![](https://img-s3.onedio.com/id-5867000b4813bb8710ce7f1a/rev-0/w-635/f-jpg-webp/s-21846e142a4303dd1b398188d8c63b16b94de38f.webp)

Google Lip Reading outperforms humans

![](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/03/191003285_edd8d0cf58-300x225.jpg)

A man wearing a black shirt and a little girl wearing an orange dress share a treat.

![](https://www.meme-arsenal.com/memes/9d68ce11eedb5c3d61049b393eee7dac.jpg)

## What do all these pictures have in common?


## Video Time

# CNN

### What is convolution

Convolutional neural networks (CNNs, or ConvNets) allow computers to see. In other words, ConvNets recognize images by transforming the original image, layer by layer, into class scores. CNNs were inspired by the visual cortex: every time we see something, a series of layers of neurons is activated, and each layer detects a set of features such as lines and edges. Higher layers detect more complex features in order to recognize what we saw.

![](https://cdn-images-1.medium.com/max/800/1*XbuW8WuRrAY5pC4t-9DZAQ.jpeg)

### Why did CNNs become popular?

CNNs can be thought of as automatic feature extractors for images. By contrast, an algorithm that takes a flattened pixel vector as input loses much of the spatial interaction between pixels.

A ConvNet has two parts: feature learning (Conv, ReLU, and Pool) and classification (FC and softmax).

![](https://cdn-images-1.medium.com/max/800/1*2SWb6CmxzbPZijmevFbe-g.jpeg)

## Input (the training data):


The input layer, or input volume, is an image with the dimensions [width x height x depth]. It is a matrix of pixel values.

**Example: Input: [32x32x3] => (width=32, height=32, depth=3). The depth here represents the R, G, B channels.**
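As a small illustration, an input volume like this can be held in a NumPy array. This is only a sketch: the pixel values are random placeholders, and note that NumPy and Keras usually order the axes as [height, width, depth].

```python
import numpy as np

# Hypothetical 32x32 RGB image filled with random pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

height, width, depth = image.shape
print(height, width, depth)

# Each channel on its own is a 32x32 matrix of pixel values.
red_channel = image[:, :, 0]
print(red_channel.shape)
```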

## CONV layer:
    
The objective of a Conv layer is to extract features of the input volume.

![](https://cdn-images-1.medium.com/max/800/1*_34EtrgYk6cQxlJ2br51HQ.gif)

A Conv layer takes a filter and computes the dot product between the filter and a receptive field, a patch of the input image the same size as the filter. The outcome of this operation is a single number in the output volume (feature map). We then slide the filter to the next receptive field of the same input image, moving by the stride, and again compute the dot product between the new receptive field and the same filter. We repeat this process until we have covered the entire input image. The output becomes the input for the next layer.
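The sliding dot product can be sketched in plain NumPy. This is a minimal, unoptimized illustration for a single-channel image with no padding; `convolve2d` is a name introduced here, and, as in most deep learning libraries, the filter is not flipped (strictly speaking this is cross-correlation).

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    height, width = image.shape
    f = kernel.shape[0]  # assumes a square f x f kernel
    out_h = (height - f) // stride + 1
    out_w = (width - f) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            receptive_field = image[r:r + f, c:c + f]
            # Dot product between the receptive field and the filter.
            feature_map[i, j] = np.sum(receptive_field * kernel)
    return feature_map

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = convolve2d(image, kernel)
print(feature_map)
```

With a 5x5 image, a 3x3 filter, and stride 1, the feature map is 3x3; with stride 2 it shrinks to 2x2.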

### Terminology

* **Filter, Kernel, or Feature Detector** is a small matrix used for feature detection.

* **Convolved Feature, Activation Map or Feature Map** is the output volume formed by sliding the filter over the image and computing the dot product.

* **Depth** (of the output volume) is the number of filters, since each filter produces one feature map.

* **Stride** is the number of pixels the filter shifts at each step. A larger stride produces spatially smaller output volumes. For example, with stride=2 the filter shifts by 2 pixels at a time as it convolves around the input volume.


### How to compute the output volume [W2xH2xD2]?

Answer: for an input volume of size W1×H1×D1, K filters of size F×F, stride S, and zero padding P:

* W2=(W1−F+2P)/S+1

* H2=(H1−F+2P)/S+1

* D2=K
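The formulas above translate directly into a few lines of Python. This is a sketch; `conv_output_size` is a helper name introduced here, and it rejects settings where the filter does not tile the input evenly.

```python
def conv_output_size(w1, h1, f, p, s, k):
    """Return (W2, H2, D2) for a conv layer with K filters of size FxF."""
    if (w1 - f + 2 * p) % s != 0 or (h1 - f + 2 * p) % s != 0:
        raise ValueError("filter size, padding and stride do not fit the input")
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k
    return w2, h2, d2

# 32x32 input, ten 5x5 filters, padding 2, stride 1: spatial size is preserved.
print(conv_output_size(32, 32, f=5, p=2, s=1, k=10))

# 28x28 input, thirty-two 5x5 filters, no padding, stride 1.
print(conv_output_size(28, 28, f=5, p=0, s=1, k=32))
```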

## ReLU layer:
The ReLU layer applies the elementwise activation function max(0,x), which turns negative values into zeros (thresholding at zero). This layer does not change the size of the volume and has no hyperparameters.
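A minimal NumPy sketch of this thresholding, using a small made-up feature map:

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x)."""
    return np.maximum(0, x)

# Hypothetical 2x2 feature map with some negative activations.
feature_map = np.array([[-3.0, 2.0],
                        [0.5, -1.0]])
activated = relu(feature_map)
print(activated)
```

Negative entries become zero, positive entries pass through unchanged, and the shape stays the same.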

## POOL layer:

The Pool layer reduces the spatial dimensions of the input and, with them, the computational complexity of the model. It also helps control overfitting. It operates independently on every depth slice of the input. Several pooling functions exist, such as max pooling and average pooling.

Example of max pooling with a 2x2 filter and stride = 2: for each window, max pooling takes the maximum of the 4 pixels.

![](https://cdn-images-1.medium.com/max/800/1*S86gKd43MIYquHIeR9m8JQ.png)

It has two hyperparameters: filter size (F) and stride (S). More generally, given an input of size W1×H1×D1, the pooling layer produces a volume of size W2×H2×D2, where:

* W2=(W1−F)/S+1
* H2=(H1−F)/S+1
* D2=D1
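Max pooling on a single depth slice can be sketched as follows. This is an illustrative loop-based version, not how a library implements it; `max_pool2d` and the input values are made up for the example.

```python
import numpy as np

def max_pool2d(x, f=2, s=2):
    """Max pooling over one depth slice with an f x f window and stride s."""
    height, width = x.shape
    out_h = (height - f) // s + 1
    out_w = (width - f) // s + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = window.max()
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = max_pool2d(x)
print(pooled)
```

A 4x4 slice pooled with F=2 and S=2 gives a 2x2 output, in line with W2=(4−2)/2+1=2.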

## Flatten

A W2×H2×D2 tensor becomes a vector of length W2×H2×D2.
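In NumPy terms, flattening is just a reshape. The 6×6×64 volume below is a hypothetical example size:

```python
import numpy as np

# Hypothetical pooled volume of size 6 x 6 x 64.
volume = np.zeros((6, 6, 64))

# Flatten into a single vector of length 6 * 6 * 64.
vector = volume.reshape(-1)
print(vector.shape)
```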

## Fully Connected Layer (FC):

Fully connected layers connect every neuron in one layer to every neuron in another layer. The last fully-connected layer uses a softmax activation function for classifying the generated features of the input image into various classes based on the training dataset.
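A fully connected layer followed by softmax can be sketched in NumPy like this. The weights, feature vector, and sizes (8 features, 3 classes) are random placeholders chosen for illustration, not values from a trained model.

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    z = z - np.max(z)  # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=8)      # hypothetical flattened feature vector
weights = rng.normal(size=(3, 8))  # one row of weights per class
bias = np.zeros(3)

# Every output neuron is connected to every input feature.
scores = weights @ features + bias
probs = softmax(scores)
print(probs, probs.sum())
```

The predicted class is the index of the largest probability, which is also the index of the largest raw score.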


## Example of a ConvNet architecture:

CIFAR-10 classification [INPUT — CONV — RELU — POOL — FC]

## Hands on CNN using Digit Recognizer

This is a 5-layer Sequential convolutional neural network for digit recognition, trained on the MNIST dataset.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
%matplotlib inline

np.random.seed(2)

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools

from keras.utils import to_categorical # convert to one-hot encoding
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

In [None]:
# Load the data
train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")

In [None]:
Y_train = train["label"]

# Drop 'label' column
X_train = train.drop(labels = ["label"],axis = 1) 

# free some space
del train 

In [None]:
g = sns.countplot(x=Y_train)

In [None]:
# Normalize the data
X_train = X_train / 255.0
test = test / 255.0
# Reshape images to 3 dimensions (height = 28px, width = 28px, channels = 1)
X_train = X_train.values.reshape(-1,28,28,1)
test = test.values.reshape(-1,28,28,1)

In [None]:
# Encode labels to one hot vectors (ex : 2 -> [0,0,1,0,0,0,0,0,0,0])
Y_train = to_categorical(Y_train, num_classes = 10)
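For intuition, the same one-hot encoding can be sketched in plain NumPy. The labels below are made-up examples; this mirrors the idea, not the Keras implementation.

```python
import numpy as np

labels = np.array([2, 0, 9])  # hypothetical digit labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes)[labels]
print(one_hot[0])
```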

In [None]:
# Set the random seed
random_seed = 2

In [None]:
# Split the train and the validation set for the fitting
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size = 0.1, random_state=random_seed)

In [None]:
g = plt.imshow(X_train[0][:,:,0])

In [None]:
model = Sequential()

model.add(Conv2D(filters = 32, kernel_size = (5,5),padding = 'valid', 
                 activation ='relu', input_shape = (28,28,1)))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))


model.add(Conv2D(filters = 64, kernel_size = (3,3),padding = 'same', 
                 activation ='relu'))

model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Dropout(0.25))


model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation = "softmax"))

In [None]:
model.summary()
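As a rough cross-check of the summary, the layer shapes and parameter counts can be worked out by hand with the output-size formulas from earlier. This is a sketch under two assumptions: same padding for a 3x3 filter corresponds to P=1, and fractional sizes are floored. `conv_out` is a helper name introduced here.

```python
def conv_out(size, f, p, s=1):
    """Spatial output size for filter f, padding p, stride s."""
    return (size - f + 2 * p) // s + 1

size = 28                                # input: 28 x 28 x 1
size = conv_out(size, f=5, p=0)          # Conv2D, 32 filters of 5x5, valid padding
conv1_params = (5 * 5 * 1 + 1) * 32      # weights + one bias per filter
size = conv_out(size, f=2, p=0, s=2)     # MaxPool 2x2
size = conv_out(size, f=3, p=1)          # Conv2D, 64 filters of 3x3, same padding
conv2_params = (3 * 3 * 32 + 1) * 64
size = conv_out(size, f=2, p=0, s=2)     # MaxPool 2x2, stride 2

flat = size * size * 64                  # Flatten
dense1_params = (flat + 1) * 256         # Dense(256)
dense2_params = (256 + 1) * 10           # Dense(10)

total_params = conv1_params + conv2_params + dense1_params + dense2_params
print(size, flat, total_params)
```

Dropout, pooling, and flatten layers add no trainable parameters, so the total should match the figure reported by `model.summary()`.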

In [None]:
# Define the optimizer
# Note: the `lr` argument was renamed to `learning_rate` in newer Keras versions
optimizer = RMSprop(learning_rate=0.001)

In [None]:
# Compile the model
model.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy"])

In [None]:
epochs = 2 
batch_size = 64

In [None]:
model.fit(X_train,Y_train,epochs=epochs,batch_size=batch_size,validation_data=(X_val,Y_val))

In [None]:
# Predict the values from the validation dataset
Y_pred = model.predict(X_val)
# Convert predicted class probabilities to class labels
Y_pred_classes = np.argmax(Y_pred,axis = 1) 
# Convert one-hot validation labels back to class labels
Y_true = np.argmax(Y_val,axis = 1) 
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes) 
confusion_mtx
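A confusion matrix is easier to read once it is reduced to a few numbers. The sketch below uses a small made-up 3-class matrix (rows are true classes, columns are predicted classes) rather than the real output; the same two lines would apply to `confusion_mtx`.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[50, 2, 1],
               [3, 45, 2],
               [0, 4, 43]])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Fraction of each true class that was predicted correctly.
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print(accuracy, per_class_recall)
```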

In [None]:
# predict results
results = model.predict(test)

# select the index with the maximum probability
results = np.argmax(results,axis = 1)

results = pd.Series(results,name="Label")

In [None]:
submission = pd.concat([pd.Series(range(1,len(results)+1),name = "ImageId"),results],axis = 1)

submission.to_csv("submission.csv",index=False)

## Key takeaways:

* Unlike a plain fully connected NN, a CNN preserves the spatial information in the image.

* A CNN has far fewer trainable parameters than a comparable fully connected NN, because the same filter weights are reused across the whole image.

* Pooling helps reduce overfitting by downsampling the feature maps.
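To put a number on the parameter takeaway, here is a back-of-the-envelope comparison. It assumes a first conv layer like the one in the model above (32 filters of 5x5 on a 28x28x1 input) and a hypothetical fully connected layer producing the same 24x24x32 output.

```python
# Conv layer: each of the 32 filters has 5*5*1 weights plus one bias.
conv_params = (5 * 5 * 1 + 1) * 32

# Hypothetical dense layer with the same input and output sizes.
n_in = 28 * 28 * 1
n_out = 24 * 24 * 32
dense_params = (n_in + 1) * n_out  # weights plus one bias per output unit

print(conv_params, dense_params, dense_params // conv_params)
```

The dense version needs millions of weights where the conv layer needs a few hundred, which is the effect of weight sharing.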


**Please upvote if you find this kernel helpful.**

## Spread Happiness and Kindness...