# What are Convolutional Neural Networks?

- Convolutional Neural Networks (ConvNets or CNNs) are a class of neural networks that have proven very effective in areas such as image recognition and classification.
- Apart from powering vision in robots and self-driving cars, ConvNets have been successful at identifying faces, objects and traffic signs.
- A CNN consists of one or more convolutional layers (often followed by a subsampling step) and then one or more fully connected layers, as in a standard multilayer neural network.
- The architecture of a CNN is designed to take advantage of the 2D structure of an input image (or of other 2D inputs, such as a speech signal represented as a spectrogram).
- This is achieved with **local connections** and tied (shared) weights, followed by some form of pooling, which results in translation-invariant features.
- Another benefit of CNNs is that they are easier to train and have far fewer parameters than fully connected networks with the same number of hidden units.
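To make the parameter savings concrete, here is a back-of-the-envelope count in Python. It assumes a hypothetical 32 * 32 RGB input and a conv layer with 10 filters of size 5 * 5 (stride 1, no padding); the helpers `conv_params` and `dense_params` are made up for this sketch.

```python
# Hypothetical layer sizes chosen for illustration only.
def conv_params(num_filters, k, in_channels):
    """Weights + biases of a conv layer: each filter is k x k x in_channels, plus one bias."""
    return num_filters * (k * k * in_channels) + num_filters

def dense_params(num_inputs, num_hidden):
    """Weights + biases of a fully connected layer."""
    return num_inputs * num_hidden + num_hidden

H = W = 32      # assumed input height and width
C = 3           # RGB channels
F, K = 10, 5    # assumed: 10 filters of size 5 x 5

# Hidden units produced by the conv layer (stride 1, no padding): 10 * 28 * 28
hidden_units = F * (H - K + 1) * (W - K + 1)

conv = conv_params(F, K, C)                     # parameters shared across all positions
dense = dense_params(H * W * C, hidden_units)   # one weight per input/hidden-unit pair

print(f"hidden units:       {hidden_units}")
print(f"conv layer params:  {conv}")
print(f"dense layer params: {dense}")
```

Both layers produce 7,840 hidden units, but the conv layer reuses (ties) its 760 weights and biases at every position, while the fully connected layer needs over 24 million parameters.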

In this article we will discuss:
* the architecture of a CNN, and
* the backpropagation algorithm, which computes the gradient with respect to the model's parameters so that gradient-based optimization can be used.

## Visualizing the Convolution Process

## Simple Convolution

![](https://miro.medium.com/max/1400/1*Fw-ehcNBR9byHtho-Rxbtw.gif)

### Kernel/Filter/Feature Extractor
  * Pixels in the center get convolved the maximum number of times
  * Pixels at the edges are convolved fewer times
  * The convolution output is smaller than the input (5 * 5 to 3 * 3), so some information is lost
  * The same convolution is done on all 3 (or more) channels
  * **Feature Map**: the container or bucket that holds the features extracted from all the channels as the output of the convolution operation
  * The most widely used filters are 3 * 3
  * Other dimensions
    * 5 * 5
    * 1 * 1 (pointwise convolution)
    * Kernels with **even** dimensions (4 * 4 or 2 * 2) result in **symmetry imbalance**: they have no central pixel, so the output is offset by half a pixel from the input and symmetric padding cannot preserve the image size
      * **Symmetry imbalance** has not been observed with odd kernels in practice
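A small NumPy sketch can show why center pixels are convolved more often than edge pixels. It assumes stride 1 and no padding, and `coverage_counts` is a hypothetical helper that counts how many kernel positions cover each pixel.

```python
import numpy as np

def coverage_counts(n, k):
    """Count how many k x k kernel positions (stride 1, no padding) cover each pixel of an n x n image."""
    counts = np.zeros((n, n), dtype=int)
    out = n - k + 1                          # number of valid kernel positions per dimension
    for i in range(out):
        for j in range(out):
            counts[i:i + k, j:j + k] += 1    # every pixel under this window is used once more
    return counts

print(coverage_counts(5, 3))
```

For a 5 * 5 image and a 3 * 3 kernel, the center pixel is covered 9 times while each corner pixel is covered only once.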

* A 7 * 7 image convolved with a 3 * 3 kernel results in a 5 * 5 output
* We lose 1 pixel at the **top** and 1 pixel at the **bottom** (7 - 1 - 1 = 5), and likewise 1 on the left and 1 on the right, because the 3 * 3 kernel can never be centered on a border pixel; border pixels are covered by fewer kernel positions than interior pixels (a corner pixel is convolved only **once**)
* Filters with dimensions 1 * 3 or 3 * 1 are also possible, but research from 2014 until 2017 found **3 * 3** filters to work best in practice
* When we want to quickly **shrink** an image, we can use filters with bigger dimensions, e.g. 11 * 11 or 7 * 7: without padding, a k * k filter removes k - 1 pixels from each spatial dimension
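The size arithmetic above can be checked with a one-line formula (stride 1, no padding; `valid_output_size` is a hypothetical helper). The comparison of two stacked 3 * 3 layers with one 5 * 5 layer is our addition, a commonly given argument for small filters rather than a result from the sources cited here.

```python
def valid_output_size(n, k):
    """Spatial size after a k x k convolution with stride 1 and no padding."""
    return n - k + 1

# A 7 x 7 image with a 3 x 3 kernel: 7 - 1 - 1 = 5
print(valid_output_size(7, 3))                          # 5

# Two stacked 3 x 3 layers shrink the image as much as one 5 x 5 layer...
print(valid_output_size(valid_output_size(7, 3), 3))    # 3
print(valid_output_size(7, 5))                          # 3

# ...with fewer weights per input/output channel pair (biases ignored).
print(2 * 3 * 3, 5 * 5)                                 # 18 25

# A large 11 x 11 filter removes 10 pixels per dimension in one step.
print(valid_output_size(224, 11))                       # 214
```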

## 1. Matrix Calculation

![](https://miro.medium.com/max/535/1*Zx-ZMLKab7VOCQTxdZ1OAw.gif)
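The matrix calculation in the animation (multiply the kernel element-wise with the patch under it, then sum) can be sketched in NumPy as below. The input values are sample numbers chosen for illustration, and `conv2d_valid` is a hypothetical helper. Like most deep-learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding). Each output value is the
    sum of the element-wise product of the kernel and the patch under it."""
    n_h, n_w = image.shape
    k_h, k_w = kernel.shape
    out = np.zeros((n_h - k_h + 1, n_w - k_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k_h, j:j + k_w]     # window currently under the kernel
            out[i, j] = np.sum(patch * kernel)      # element-wise multiply, then sum
    return out

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]], dtype=float)
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]], dtype=float)

print(conv2d_valid(image, kernel))   # 3 x 3 output from a 5 x 5 input
```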

## 2. Padding Concept
![](https://miro.medium.com/max/395/1*1okwhewf5KCtIPaFib4XaA.gif)

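* Padding (usually with zeros) adds extra pixels around the border, so edge pixels are covered by more kernel positions and less information is lost after the convolution operation
* With the right amount of padding ((k - 1) / 2 pixels on each side for an odd k * k kernel at stride 1, often called "same" padding), the **output** size of the image equals the **input** size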
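A minimal sketch of zero padding, assuming stride 1, an odd kernel size, and a hypothetical 3 * 3 averaging kernel; `conv2d` is a made-up helper built on NumPy's `np.pad` and `sliding_window_view` (available in NumPy 1.20 and later).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel, padding=0):
    """Stride-1 convolution (no kernel flip) after zero-padding `padding` pixels on every side."""
    padded = np.pad(image, padding, mode="constant", constant_values=0)
    windows = sliding_window_view(padded, kernel.shape)   # shape: (out_h, out_w, k_h, k_w)
    return np.einsum("ijkl,kl->ij", windows, kernel)      # multiply each window by the kernel and sum

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0     # hypothetical 3 x 3 averaging kernel

k = kernel.shape[0]
same_pad = (k - 1) // 2            # 1 pixel on each side for a 3 x 3 kernel

print(conv2d(image, kernel).shape)                     # (3, 3): no padding, output shrinks
print(conv2d(image, kernel, padding=same_pad).shape)   # (5, 5): output size equals input size
```

With padding, the corner pixel now contributes to four output values instead of one, which is how padding retains more edge information.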

## 3. Stride Concept
![](https://miro.medium.com/max/294/1*BMngs93_rm2_BpJFH2mS0Q.gif)
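Stride is the number of pixels the kernel moves between consecutive positions. With input size n, kernel size k, padding p and stride s, the output size along each dimension is floor((n + 2p - k) / s) + 1, the formula given in the cs231n notes listed under the sources. The sketch below assumes no padding in the convolution itself; `conv_output_size` and `conv2d_strided` are hypothetical helpers.

```python
import numpy as np

def conv_output_size(n, k, padding=0, stride=1):
    """Output size along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

def conv2d_strided(image, kernel, stride=1):
    """Convolution without padding that moves the kernel `stride` pixels per step."""
    k_h, k_w = kernel.shape
    out_h = conv_output_size(image.shape[0], k_h, stride=stride)
    out_w = conv_output_size(image.shape[1], k_w, stride=stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride   # top-left corner of the current window
            out[i, j] = np.sum(image[r:r + k_h, c:c + k_w] * kernel)
    return out

image = np.ones((7, 7))
kernel = np.ones((3, 3))
print(conv2d_strided(image, kernel, stride=1).shape)   # (5, 5)
print(conv2d_strided(image, kernel, stride=2).shape)   # (3, 3)
print(conv_output_size(224, 11, padding=0, stride=4))  # 54
```

A larger stride shrinks the output roughly by a factor of s, which is another way to reduce the image size quickly.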

## 4. Feature Accumulation
![](https://miro.medium.com/max/2000/1*8dx6nxpUh2JqvYWPadTwMQ.gif)

## 5. Feature Aggregation
![](https://miro.medium.com/max/2000/1*CYB2dyR3EhFs1xNLK8ewiA.gif)

Convolution done on RGB channels with a 3 * 3 kernel:
  * The filter has one 3 * 3 slice per channel (so it is really 3 * 3 * 3), and each slice is convolved with its own channel
  * Finally, after aggregation (summing the per-channel results, usually plus a bias), we have 1 feature
  * Similarly, by using more filters, we would **extract** more features from the image
  * 1 Filter = 1 Feature
    * So, if we use 10 filters on our image, we will have 10 features
    * If we use 15 filters on our image, we will have 15 features
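The accumulation and aggregation steps can be sketched in NumPy as follows, assuming stride 1, no padding, and a (channels, height, width) array layout; `conv_layer` is a hypothetical helper, not a library API. Each filter's per-channel products are summed, together with a bias, into one feature map, so F filters give F feature maps.

```python
import numpy as np

def conv_layer(x, weights, biases):
    """Multi-channel, multi-filter convolution (stride 1, no padding).

    x:       input of shape (C, H, W), e.g. C = 3 for RGB
    weights: filters of shape (F, C, k, k); each filter has one k x k slice per channel
    biases:  shape (F,), one bias per filter
    returns: F feature maps, shape (F, H - k + 1, W - k + 1)
    """
    num_filters, _, k, _ = weights.shape
    _, h, w = x.shape
    out = np.zeros((num_filters, h - k + 1, w - k + 1))
    for f in range(num_filters):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patch = x[:, i:i + k, j:j + k]   # (C, k, k): same window in every channel
                # Per-channel products are accumulated, then aggregated into one number
                out[f, i, j] = np.sum(patch * weights[f]) + biases[f]
    return out

rng = np.random.default_rng(0)
image = rng.random((3, 7, 7))                   # hypothetical 7 x 7 RGB image
filters = rng.standard_normal((10, 3, 3, 3))    # 10 filters, each 3 x 3 x 3
biases = np.zeros(10)

feature_maps = conv_layer(image, filters, biases)
print(feature_maps.shape)   # (10, 5, 5): 10 filters -> 10 feature maps
```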

**Feature Map**: Container or Bucket that holds **all the extracted features** from all the channels as an output of convolution operation



## Convolution Operation

![](https://cdn-media-1.freecodecamp.org/images/gb08-2i83P5wPzs3SL-vosNb6Iur5kb5ZH43)


[Source](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)

[Source](https://cs231n.github.io/convolutional-networks/)