# An Introduction to Convolutional Neural Networks

This is an implementation of the research paper [An Introduction to Convolutional Neural Networks](https://arxiv.org/pdf/1511.08458.pdf) by Keiron O’Shea and Ryan Nash.


## Artificial Neural Networks (ANNs)
ANNs are computational processing systems inspired by the biological nervous system of the human brain. An ANN consists of a large number of interconnected computational units (neurons) that work collectively to learn from the input data in order to optimise the output.
![](https://cdn-images-1.medium.com/max/824/1*eBMwpBBboAXgqsawwOKkPw.png)

We load the input, usually a multidimensional vector, into the input layer of the neural network, which distributes it to the hidden layers. Each hidden layer makes decisions based on the output of the previous layer and weighs how a change within itself worsens or improves the final output; this process is called learning. In other words, the network adapts itself to the inputs so that it can generalise when deciding the output.
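
As a rough illustration of the flow described above, the sketch below pushes one input vector through a single hidden layer and an output layer. The layer sizes, the random weights, the sigmoid activation, and the helper name `dense_forward` are assumptions made for this example; they are not taken from the paper, and no learning step (weight update) is shown.

```python
import numpy as np

def dense_forward(x, W, b):
    """Weighted sum of the inputs plus a bias, followed by a sigmoid activation."""
    z = W @ x + b
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                            # input vector with 4 features (assumed size)
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)     # hidden layer with 3 units
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)     # output layer with 2 units

hidden = dense_forward(x, W1, b1)        # input layer -> hidden layer
output = dense_forward(hidden, W2, b2)   # hidden layer -> output layer
print(hidden.shape, output.shape)        # (3,) (2,)
```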



## Convolutional Neural Networks (CNNs)
CNNs are mostly used for pattern recognition within images. Their architecture lets us encode image-specific features into the network, making it better suited to image-focused tasks.<br>

Traditional ANNs can still be used for image-based tasks, but one of the main problems is the computational complexity they require for image data.<br>
The MNIST dataset has 28x28 grayscale images, so a single neuron in the first hidden layer already needs 28x28x1 = 784 weights.
Now consider a colour image of size 3661 x 2266: that would be 3661x2266x3 = 24,887,478 weights for each neuron in the first hidden layer.
And for the network to learn the image patterns for a specific output, it will need more layers than just a single input layer.
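
The arithmetic above can be checked with a few lines of Python. The helper name `weights_per_neuron` is made up for this sketch, and it assumes every pixel of every channel feeds one fully connected neuron.

```python
def weights_per_neuron(height, width, channels):
    """Number of weights one fully connected neuron needs for an image input."""
    return height * width * channels

print(weights_per_neuron(28, 28, 1))        # 784 (MNIST, grayscale)
print(weights_per_neuron(3661, 2266, 3))    # 24887478 (large colour image)
```
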
<br><br>
Even in the ideal case where you have unlimited computational power to use an ANN of any size, there is still the problem of [overfitting](https://en.wikipedia.org/wiki/Overfitting). The more layers and nodes the network has, the better it tends to fit the training set, but the worse it may perform on the test set.
<br><br>
So, we need a model that is computationally inexpensive and gives good accuracy on both the training and test sets.


## CNN Architecture
![](https://s3.amazonaws.com/cdn.ayasdi.com/wp-content/uploads/2018/06/21100605/Fig2GCNN1.png)

A CNN comprises three types of layers:
1. Convolution Layer
2. Pooling Layer
3. Fully-Connected Layer

The convolution layer determines the output of neurons that are connected to local regions of the input by calculating the scalar product between their weights and the region of the input volume they are connected to. The rectified linear unit (commonly shortened to ReLU) then applies an elementwise activation function, max(0, x), to the output produced by the previous layer.
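
A minimal sketch of the ReLU activation, max(0, x) applied to every element, is given below. The function name `relu` and the sample values are chosen only for illustration.

```python
import numpy as np

def relu(Z):
    """Elementwise rectified linear unit: max(0, z) for every entry of Z."""
    return np.maximum(0, Z)

Z = np.array([[-2.0, 0.5],
              [3.0, -0.1]])
print(relu(Z))   # negative entries become 0, positive entries are unchanged
```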

The pooling layer then performs downsampling along the spatial dimensions of the given input, further reducing the number of parameters within that activation.

The fully-connected layers then perform the same duties found in standard ANNs and attempt to produce class scores from the activations, to be used for classification. It is also suggested that ReLU may be used between these layers to improve performance.

![](https://www.researchgate.net/profile/Holger_Roth/publication/264160750/figure/fig3/AS:296012620025856@1447586316051/The-proposed-convolution-neural-network-consists-of-two-convolutional-layers-max-pooling.png)

## Implementation of Convolutional Neural Network

I will implement the convolutional neural network from scratch using only basic libraries such as numpy (for computation) and matplotlib (for visualization).<br>
CNNs can of course be implemented far more easily and efficiently using packages like Keras, TensorFlow, or PyTorch.

### Importing Libraries & setting up library parameters 

In [0]:
import numpy as np
import h5py
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

np.random.seed(1)

### Convolution layer

In mathematics (and, in particular, functional analysis), convolution is a mathematical operation on two functions (f and g) that produces a third function expressing how the shape of one is modified by the other. (Source: Wikipedia)
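
To make the definition concrete, the sketch below computes a discrete 1-D convolution by hand and compares it with `np.convolve`. The two short sequences are arbitrary example values. Note that the convolution layers implemented later in this document multiply the window and the filter elementwise without flipping the filter, which is strictly a cross-correlation; this is the usual convention in CNN implementations.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

# Full discrete convolution: (f * g)[n] = sum over k of f[k] * g[n - k]
manual = np.zeros(len(f) + len(g) - 1)
for n in range(len(manual)):
    for k in range(len(f)):
        if 0 <= n - k < len(g):
            manual[n] += f[k] * g[n - k]

print(manual)              # values: 0, 1, 2.5, 4, 1.5
print(np.convolve(f, g))   # NumPy's built-in gives the same values
```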


In [0]:
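# simple convolution function: applies one filter to one window of the input
def single_convolution(a_prev, W, b):
    s = a_prev * W                  # elementwise product of window and filter
    Z = np.sum(s)                   # sum over all entries
    Z = Z + float(np.squeeze(b))    # add the scalar bias (b holds a single value)
    return Z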

In [7]:
# let's try the function
a = np.random.randn(5,5,1)
W = np.random.randn(5,5,1)
b = np.random.randn(1,1,1)

print('Z = sum(a * W) + b =',single_convolution(a, W, b))

Z = sum(a * W) + b = -5.058783760414956


In [0]:
# we need a function to pad the matrix
def pad_matrix(X, pad):
    X_pad = np.pad(X,((0,0),(pad,pad),(pad,pad),(0,0)),'constant',constant_values = 0)
    return X_pad
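
As a quick check of what zero padding does to the shape, the sketch below applies the same `np.pad` call to a tiny batch. The input of ones and the padding width of 1 are assumptions chosen so that the zero border is easy to see.

```python
import numpy as np

X = np.ones((1, 2, 2, 1))   # one 2x2 single-channel image filled with ones
pad = 1

# pad only the height and width axes; batch and channel axes are left alone
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
               mode='constant', constant_values=0)

print(X.shape, '->', X_pad.shape)   # (1, 2, 2, 1) -> (1, 4, 4, 1)
print(X_pad[0, :, :, 0])            # 2x2 block of ones surrounded by zeros
```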

In [0]:
# Single Convolution Layer

def single_conv_layer(A_prev, W, b, stride_pad):
  (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape #getting dimension of previous layer
  (w, w, n_C_prev, n_C) = W.shape
  
  # getting stride and pad value from the dictionary
  stride = stride_pad['stride']
  pad = stride_pad['pad']
  
  # setting up dimension of the output
  n_H = np.floor((n_H_prev - w + 2*pad)/stride)+1
  n_W = np.floor((n_W_prev - w + 2*pad)/stride)+1
  
  #initializing output vector with 0s
  Z = np.zeros((m, int(n_H), int(n_W), int(n_C)))
  
  # padding the previous layer
  A_pad = pad_matrix(A_prev, pad)
  
  # iterate and apply the convolution operation
  for i in range(m):
    a = A_pad[i] #getting each example
    for h in range(int(n_H)):
      for wi in range(int(n_W)):
        for c in range(int(n_C)):
          y_ini = h * stride
          y_end = y_ini + w

          x_ini = wi * stride
          x_end = x_ini + w

          a_conv_part = a[y_ini:y_end, x_ini:x_end,:]
          Z[i, h, wi, c] = single_convolution(a_conv_part, W[:,:,:,c], b[:,:,:,c])
  params = (A_prev, W, b, stride_pad) 
  return Z, params
            
            

In [33]:
np.random.seed(1)
A_prev = np.random.randn(10,4,4,3)
W = np.random.randn(2,2,3,8)
b = np.random.randn(1,1,1,8)
pad_stride = {"pad" : 2, "stride": 2}

Z, cache_conv = single_conv_layer(A_prev, W, b, pad_stride)
print("Z = ", np.mean(Z))
print("Z[3,2,1] =", Z[3,2,1])

Z =  0.048995203528855794
Z[3,2,1] = [-0.61490741 -6.7439236  -2.55153897  1.75698377  3.56208902  0.53036437
  5.18531798  8.75898442]


![](https://hub.coursera-notebooks.org/user/cfpitxpxxwzbxkrumhshmg/notebooks/week1/images/Convolution_schematic.gif)

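The animation above shows what `single_conv_layer` does: the filter slides over the padded input, and each output value is the sum of the elementwise product between the filter and the window beneath it.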
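
The output size used inside the convolution layer follows floor((n_prev - f + 2 * pad) / stride) + 1. The small helper below (the name `conv_output_size` is made up for this sketch) evaluates it for the test case above and for two assumed MNIST-sized settings.

```python
def conv_output_size(n_prev, f, pad, stride):
    """Spatial output size: floor((n_prev - f + 2*pad) / stride) + 1."""
    return (n_prev - f + 2 * pad) // stride + 1

print(conv_output_size(4, 2, pad=2, stride=2))    # 4  (the 4x4 input and 2x2 filter used above)
print(conv_output_size(28, 5, pad=0, stride=1))   # 24 (no padding shrinks the image)
print(conv_output_size(28, 5, pad=2, stride=1))   # 28 (padding of 2 preserves the size)
```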

### Pooling Layer

In [0]:
# pooling function
def pooling_layer(A_prev, stride_f, mode = "max"):
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    f = stride_f["f"]
    stride = stride_f["stride"]
    
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    A = np.zeros((m, n_H, n_W, n_C))              
    
    for i in range(m):         
        for h in range(n_H):  
            for w in range(n_W):          
                for c in range (n_C):       
                    
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # pool over the window of channel c only, so each channel is downsampled independently
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    # keep the input and the hyperparameters alongside the output
    cache = (A_prev, stride_f)

    return A, cache

In [37]:
A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride" : 2, "f": 3}

A, cache = pooling_layer(A_prev, hparameters)
print("mode of layer is max")
print("A.shape =", A.shape)


mode of layer is max
A.shape = (2, 1, 1, 3)



![](https://cdn-images-1.medium.com/max/1200/1*q0lk6B6gzvsSQSDn-20zJA.png)
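
The following sketch works through max pooling on a single 4x4 feature map with a 2x2 window and a stride of 2. The helper name `max_pool_2d` and the input values are assumptions chosen so the result can be checked by eye.

```python
import numpy as np

def max_pool_2d(x, f, stride):
    """Max pooling of a single 2-D feature map with an f x f window."""
    n_H = (x.shape[0] - f) // stride + 1
    n_W = (x.shape[1] - f) // stride + 1
    out = np.zeros((n_H, n_W))
    for h in range(n_H):
        for w in range(n_W):
            window = x[h * stride:h * stride + f, w * stride:w * stride + f]
            out[h, w] = window.max()   # keep only the largest value in the window
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]], dtype=float)

print(max_pool_2d(x, f=2, stride=2))   # rows: [6, 4] and [8, 9]
```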

### Fully connected layer

The fully connected layer works the same way as in a standard ANN. We take the pooled volume of values and flatten it into a single layer of the network.

![](http://www.jpathinformatics.org/articles/2017/8/1/images/JPatholInform_2017_8_1_1_201108_f3.jpg)

Then continue adding further fully connected layers if needed; otherwise apply the activation to predict the output.
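
A minimal sketch of this step is shown below: the pooled volume is flattened to one row per example, passed through one fully connected layer, and turned into class probabilities with a softmax. The pooled shape, the number of classes, the random weights, and the choice of softmax as the final activation are all assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
pooled = rng.standard_normal((2, 3, 3, 8))     # (m, n_H, n_W, n_C), assumed shape
flat = pooled.reshape(pooled.shape[0], -1)     # (2, 72): one row per example

n_classes = 10                                 # assumed number of classes
W_fc = rng.standard_normal((flat.shape[1], n_classes)) * 0.01
b_fc = np.zeros(n_classes)

scores = flat @ W_fc + b_fc                    # class scores
probs = softmax(scores)                        # each row sums to 1
print(flat.shape, probs.shape)                 # (2, 72) (2, 10)
```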

### The final Model

The model will look like:
<pre>CONV --> POOL --> CONV --> POOL --> ... --> CONV --> POOL --> FNN --> ACTIVATION --> PREDICTION</pre>
- CONV: Convolution layer
- POOL: Pooling layer
- FNN: Fully connected layer
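
The pipeline can be sketched end to end as a forward pass. The block below uses compact, self-contained re-implementations rather than the functions defined earlier, and everything in it is an assumption for illustration: the helper names, the input size, the filter counts, the random untrained weights, and softmax as the final activation. It shows only how the shapes flow through CONV, POOL, and FNN; no training is performed.

```python
import numpy as np

def conv_forward(A_prev, W, b, stride=1, pad=0):
    """Convolution layer: A_prev (m, H, W, C_in), W (f, f, C_in, C_out), b (1, 1, 1, C_out)."""
    m, H, Wd, _ = A_prev.shape
    f, _, _, C_out = W.shape
    n_H = (H - f + 2 * pad) // stride + 1
    n_W = (Wd - f + 2 * pad) // stride + 1
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant')
    Z = np.zeros((m, n_H, n_W, C_out))
    for h in range(n_H):
        for w in range(n_W):
            window = A_pad[:, h * stride:h * stride + f, w * stride:w * stride + f, :]
            # sum of elementwise products over the f, f, C_in axes, for every example and filter
            Z[:, h, w, :] = np.tensordot(window, W, axes=([1, 2, 3], [0, 1, 2]))
    return Z + b

def relu(Z):
    return np.maximum(0, Z)

def max_pool(A_prev, f=2, stride=2):
    """Max pooling applied to each example and channel independently."""
    m, H, Wd, C = A_prev.shape
    n_H = (H - f) // stride + 1
    n_W = (Wd - f) // stride + 1
    A = np.zeros((m, n_H, n_W, C))
    for h in range(n_H):
        for w in range(n_W):
            window = A_prev[:, h * stride:h * stride + f, w * stride:w * stride + f, :]
            A[:, h, w, :] = window.max(axis=(1, 2))
    return A

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 8, 8, 1))                  # two 8x8 grayscale images (assumed)
W1, b1 = rng.standard_normal((3, 3, 1, 4)) * 0.1, np.zeros((1, 1, 1, 4))
W2, b2 = rng.standard_normal((3, 3, 4, 8)) * 0.1, np.zeros((1, 1, 1, 8))

A1 = max_pool(relu(conv_forward(X, W1, b1, pad=1)))    # CONV --> POOL: (2, 4, 4, 4)
A2 = max_pool(relu(conv_forward(A1, W2, b2, pad=1)))   # CONV --> POOL: (2, 2, 2, 8)
flat = A2.reshape(A2.shape[0], -1)                     # flatten: (2, 32)
W_fc, b_fc = rng.standard_normal((flat.shape[1], 3)) * 0.1, np.zeros(3)
probs = softmax(flat @ W_fc + b_fc)                    # FNN --> ACTIVATION: (2, 3)
prediction = probs.argmax(axis=1)                      # PREDICTION: one class index per image
print(A1.shape, A2.shape, probs.shape)
```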
