# Various applications of CNN in ML



## Timeline
- What is intelligence and the general Buzzwords
- What is the ultimate aim
- What is a neural network
- What are different types of neural networks and when to use them
- Convolutional neural networks
- Different applications of CNNs
- Is CNN the ultimate solution?
- Some modern methods


## What is intelligence and the general Buzzwords
![](images/1.png)


## What is the ultimate aim?
![](images/2.png)<br>
<center><font size="5">**Artificial General Intelligence or Strong AI or Human Level AI**</font></center>
<br><br><br>


<font size="5">**Neural Networks, the holy grail**</font>
![](images/4.png)<br>
<center><font size="4">**Function Approximation**</font></center>
<br><br>
![](images/3.png)


### Which one to choose?
Feedforward Neural Networks &nbsp; &rarr; &nbsp; Deterministic approach<br><br>
Recurrent Neural Networks &nbsp; &rarr; &nbsp; Time-dependent data<br><br>
Convolutional Neural Networks &nbsp; &rarr; &nbsp; Vision tasks<br><br>


## Convolutional Neural Networks
![](images/5.png)
CNNs are biologically inspired models that build on research by D. H. Hubel and T. N. Wiesel. They proposed an explanation for the way in which mammals visually perceive the world around them using a layered architecture of neurons in the brain, and this in turn inspired engineers to develop similar pattern recognition mechanisms in computer vision.<br>
The architecture of deep convolutional neural networks draws on three of these ideas:

- local connections
- layering
- spatial invariance: shifting the input signal results in an equally shifted output signal. Similarly, most of us are able to recognize specific faces under a variety of conditions because we learn abstractions, and these abstractions are invariant to size, contrast, rotation and orientation.
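
The shifting property can be checked on a toy example. This is a minimal sketch, assuming a hypothetical 1-D signal and a hypothetical 3-tap filter (neither comes from the slides); it slides the filter with NumPy's `np.correlate` and compares the responses before and after shifting the input by one position.

```python
import numpy as np

# Hypothetical filter and signal, chosen only for illustration.
kernel = np.array([1.0, 0.0, -1.0])
signal = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0])

# Shift the signal one step to the right. The last entry is 0,
# so nothing meaningful wraps around to the front.
shifted_signal = np.roll(signal, 1)

# Slide the filter over both signals (no padding).
response = np.correlate(signal, kernel, mode="valid")
shifted_response = np.correlate(shifted_signal, kernel, mode="valid")

# The response to the shifted signal is the original response, shifted by one.
print(np.array_equal(shifted_response[1:], response[:-1]))  # True
```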

### Step 1 - Prepare a dataset of images
![](images/6.png)
- The image is a matrix of RGB colours. Each colour channel has values in the range [0, 255], and the values of all three channels are considered in every operation. The image in its entirety thus constitutes a 3-dimensional structure called the Input Volume (for example, 255x255x3 for a 255x255 pixel image).
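
As a rough illustration of the Input Volume, the sketch below builds a hypothetical 255x255 RGB image from random pixel values with NumPy. The random data and the rescaling to [0, 1] are assumptions added for the example, not something the slides prescribe.

```python
import numpy as np

# Hypothetical image: random pixel values in [0, 255] for three colour channels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(255, 255, 3), dtype=np.uint8)

print(image.shape)  # (height, width, channels)

# Each colour channel is an ordinary 2-D matrix.
red_channel = image[:, :, 0]

# Assumption: networks are often fed floats scaled to [0, 1] rather than raw bytes.
input_volume = image.astype(np.float32) / 255.0
```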

### Step 2 - Convolution 

![alt text](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/more_images/Convolution_schematic.gif "Logo Title Text 1")

![alt text](http://xrds.acm.org/blog/wp-content/uploads/2016/06/Figure_2.png "Logo Title Text 1")

- A convolution is an orderly procedure where two sources of information are intertwined.

- A kernel (also called a filter) is a smaller-sized matrix in comparison to the input dimensions of the image, that consists of real valued entries.

- Kernels are then convolved with the input volume to obtain so-called ‘activation maps’ (also called feature maps).  
- Activation maps indicate ‘activated’ regions, i.e. regions where features specific to the kernel have been detected in the input. 

- The real values of the kernel matrix change with each learning iteration over the training set, indicating that the network is learning to identify which regions are of significance for extracting features from the data.

- We compute the dot product between the kernel and a same-sized patch of the input matrix.

- The convolved value obtained by summing the resultant terms from the dot product forms a single entry in the activation matrix.

- The patch selection is then slid (towards the right, or downwards when the boundary of the matrix is reached) by a certain amount called the ‘stride’ value, and the process is repeated until the entire input image has been processed.

- The process is carried out for all colour channels.

- Instead of connecting each neuron to all possible pixels, we specify a 2-dimensional region called the ‘receptive field’ [14] (say of size 5×5 units) that extends through the entire depth of the input (5x5x3 for a 3 colour channel input). Only the pixels inside this region are fully connected to the neural network’s input layer. The network layer cross-sections, each consisting of several neurons called ‘depth columns’, operate over these small regions and produce the activation map. This reduces computational complexity.
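
The procedure described in the bullets above can be sketched in a few lines of NumPy. This is an unoptimized illustration, not a library implementation: `conv2d` is a hypothetical helper name, it uses no padding, and, like most deep learning code, it slides the kernel without flipping it (strictly a cross-correlation).

```python
import numpy as np

def conv2d(volume, kernel, stride=1):
    """Slide `kernel` over `volume` (H x W x C) and return one activation map."""
    h, w, _ = volume.shape
    kh, kw, _ = kernel.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    activation = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Receptive field: a kh x kw patch spanning the full depth.
            patch = volume[top:top + kh, left:left + kw, :]
            # Multiply element-wise and sum: one entry of the activation map.
            activation[i, j] = np.sum(patch * kernel)
    return activation

# Toy 5x5x3 input volume and a 3x3x3 kernel of ones (both hypothetical).
volume = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)
kernel = np.ones((3, 3, 3))
activation_map = conv2d(volume, kernel, stride=1)
print(activation_map.shape)  # (3, 3)
```

With `stride=2` the same call produces a 2x2 activation map, since the patch moves two pixels at a time.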

![alt text](http://i.imgur.com/g4hRI6Z.png "Logo Title Text 1")
![alt text](http://i.imgur.com/tpQvMps.jpg "Logo Title Text 1")
![alt text](http://i.imgur.com/oyXkhHi.jpg "Logo Title Text 1")
![alt text](http://xrds.acm.org/blog/wp-content/uploads/2016/06/Figure_5.png "Logo Title Text 1")

A great resource on convolution (discrete vs. continuous) and the Fourier transform:

http://timdettmers.com/2015/03/26/convolution-deep-learning/


###  Step 3 - Pooling
![alt text](http://xrds.acm.org/blog/wp-content/uploads/2016/06/Figure_6.png "Logo Title Text 1")

- Pooling reduces the spatial dimensions (Width x Height) of the Input Volume for the next Convolutional Layer. It does not affect the depth dimension of the Volume.  
- The transformation is performed either by taking the maximum value from the values observable in the window (called ‘max pooling’) or by taking the average of the values. Max pooling is generally favoured because it tends to perform better in practice.
- Pooling is also called downsampling.
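
A minimal max pooling sketch, assuming a 2x2 window with stride 2 (a common choice, but an assumption here) and a hypothetical helper name `max_pool`. It shrinks width and height while leaving the depth untouched.

```python
import numpy as np

def max_pool(volume, window=2, stride=2):
    """Max-pool each channel of `volume` (H x W x C) independently."""
    h, w, c = volume.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    pooled = np.zeros((out_h, out_w, c), dtype=volume.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = volume[top:top + window, left:left + window, :]
            # Keep the largest value in the window, separately per channel.
            pooled[i, j, :] = patch.max(axis=(0, 1))
    return pooled

# Toy stack of two 4x4 feature maps (hypothetical values).
feature_maps = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
pooled = max_pool(feature_maps)
print(pooled.shape)  # (2, 2, 2)
```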

###  Step 4 - Normalization (ReLU in our case)

![alt text](http://xrds.acm.org/blog/wp-content/uploads/2016/06/CodeCogsEqn-3.png "Logo Title Text 1")

Normalization here means applying ReLU (strictly speaking an activation function), which keeps the math from breaking by turning all negative numbers into 0. A stack of images becomes a stack of images with no negative values.
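
In code, ReLU is a one-liner. The sketch below assumes NumPy and a small hypothetical feature map.

```python
import numpy as np

def relu(x):
    """Replace every negative entry with 0 and leave the rest unchanged."""
    return np.maximum(x, 0)

# Hypothetical feature map with some negative activations.
feature_map = np.array([[-2.0, 1.5],
                        [0.0, -0.5]])
rectified = relu(feature_map)
print(rectified)
```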

Repeat Steps 2-4 several times. Each repetition produces more, smaller images (the feature maps created at every layer).

### Step 5 - Regularization 

- Dropout forces an artificial neural network to learn multiple independent representations of the same data by randomly disabling neurons during the learning phase.
- Dropout is a widely used regularization technique in neural network implementations.
- To perform dropout on a layer, you randomly set some of the layer's values to 0 during forward propagation.
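
A minimal sketch of dropout during forward propagation, assuming NumPy. The helper name `dropout` and its parameters are hypothetical. The rescaling by the keep probability ("inverted dropout") is an addition not mentioned above; it is a common way to keep the expected activation the same between training and inference.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero out entries of `activations` while training."""
    if not training or drop_prob == 0.0:
        return activations  # no neurons are disabled at inference time
    keep_prob = 1.0 - drop_prob
    # Each entry survives with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Assumption: rescale survivors so the expected value is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
layer_output = np.ones((4, 5))  # hypothetical layer values
dropped = dropout(layer_output, drop_prob=0.5, rng=rng)
print(dropped)
```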

See [this post on dropout](http://iamtrask.github.io/2015/07/28/dropout/)

![alt text](https://i.stack.imgur.com/CewjH.png "Logo Title Text 1")

###  Step 6 - Probability Conversion

At the very end of our network (the tail), we'll apply a softmax function to convert the outputs to probability values for each class. 
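
A minimal softmax sketch, assuming NumPy and three hypothetical class scores. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs of the last layer for three classes.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())
```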

![alt text](https://1.bp.blogspot.com/-FHDU505euic/Vs1iJjXHG0I/AAAAAAABVKg/x4g0FHuz7_A/s1600/softmax.JPG "Logo Title Text 1")


###  Step 7 - Choose most likely label (max probability value) 

`argmax(softmax_outputs)`
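
With NumPy this step might look as follows. The class names and probability values are hypothetical and only serve the example.

```python
import numpy as np

class_names = ["cat", "dog", "bird"]          # hypothetical labels
softmax_outputs = np.array([0.1, 0.7, 0.2])   # hypothetical probabilities

# Index of the largest probability, mapped back to its label.
predicted_index = int(np.argmax(softmax_outputs))
predicted_label = class_names[predicted_index]
print(predicted_label)  # dog
```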

These 7 steps are one forward pass through the network.

## So how do we learn the magic numbers? 

- We can learn features and weight values through backpropagation
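
To make the idea concrete, here is a heavily simplified sketch, assuming a single 1-D convolution layer with no activation, a mean squared error loss, and a hypothetical target kernel. With only one linear layer, the backpropagated gradient reduces to a single closed-form expression; a real CNN chains this rule through many layers.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=100)                  # hypothetical input
true_kernel = np.array([1.0, 0.0, -1.0])       # hypothetical "magic numbers"
target = np.correlate(signal, true_kernel, mode="valid")

kernel = np.zeros(3)       # start with no knowledge of the kernel
learning_rate = 0.1        # assumed value, chosen for this toy problem
for step in range(500):
    # Forward pass: slide the current kernel over the signal.
    output = np.correlate(signal, kernel, mode="valid")
    error = output - target
    # Backward pass: gradient of the mean squared error w.r.t. each kernel entry.
    grad = 2.0 * np.correlate(signal, error, mode="valid") / error.size
    # Gradient descent update.
    kernel -= learning_rate * grad

print(kernel)  # should end up close to true_kernel
```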

![alt text](http://www.robots.ox.ac.uk/~vgg/practicals/cnn/images/cover.png "Logo Title Text 1")

![alt text](https://image.slidesharecdn.com/cnn-toupload-final-151117124948-lva1-app6892/95/convolutional-neural-networks-cnn-52-638.jpg?cb=1455889178 "Logo Title Text 1")

The other hyperparameters are set by humans, and finding the optimal ones is an active field of research.

Examples: number of neurons, number of features, size of features, pooling window size, window stride
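
These hyperparameters directly determine how large each layer's output is. The sketch below uses the standard output-size formula; the helper name `conv_output_size` is hypothetical, and the `padding` parameter is an extra assumption that the text above does not discuss.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Width (or height) of the output of a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 255-pixel-wide input convolved with a 5x5 kernel at stride 1.
after_conv = conv_output_size(255, 5, stride=1)
print(after_conv)  # 251

# Followed by a 2x2 pooling window at stride 2.
after_pool = conv_output_size(after_conv, 2, stride=2)
print(after_pool)  # 125
```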

## Good examples

<ol>
<li> **Image Classifier**
![](images/8.jpeg)
    <br><br>
<li> **Object Detection**
![](images/7.jpeg)
<br><br>

<li> **Image Captioning**
    ![](images/10.jpg)
<br><br>

<li> **CNN with Generative Adversarial Networks**
    ![](images/11.png)
<br><br>

<li> **Deep Fakes**
  ![](images/12.JPG)
    
</ol>
<br><br>

## Is CNN the ultimate Solution?
![](images/9.png)
<br><br>
<div style="text-align:center"><font size="5">**Capsule Network**</font></div>

<br><br>
## Some modern methods in Machine Intelligence
### [Numenta](https://numenta.org/)
<br>
![](images/14.jpeg)
<br>
<div style="text-align:center"><font size="5">**Using neuroscience to form the third generation of Artificial Intelligence**</font></div>
<br>
![](images/13.png)