## Lighthouse Labs - Synaptive Medical

### W7D7 Deep Learning and Convolutional Neural Networks (CNNs)

Instructor: Socorro Dominguez  
January 15, 2021

**Agenda:**
- What kinds of layers does a CNN have?
    - Convolution
    - Pooling
    - Flattening
    - Full connection
     
- Case Studies of different CNN algorithms. 

- CNN tutorial

## What is a CNN?


![img](img/CNNnews.png)

.... not that CNN....

This CNN:

![img](img/CNN.png)

The most common application of CNNs is computer vision.

* Sports:
    * Player Tracking
    * Ball Tracking

* Health and Medicine:
    * Cancer / Tumor Detection
    * Cell Classification
    * Movement Analysis for neurological and musculoskeletal diseases

* Agriculture and farming:
    * Plant Recognition.
    * Farm Automation
    * Animal Monitoring

* Transportation, oil and mining, and many others!

Convolutional Neural Networks are a type of Deep Learning Algorithm.

1. CNNs take an image as an input.
2. CNNs learn the features of the image through filters. 
3. They identify important objects present in the image, allowing them to learn to discern one image from the other.

In our walkthrough, the CNN will learn specific features of cats that differentiate them from the dogs. 
Then, when it is provided input of cats and dogs, it can differentiate between the two. 

**Note:** In classic computer vision, filters had to be hand-engineered. In a CNN, the filters start out uninformative and, as training progresses, adapt to the features in the data, so the network develops filters of its own. CNNs are continuously evolving.

![img](img/robot.png)

* A CNN's output is usually a set of probabilities, one for each class the image could belong to.

* A CNN is a special kind of neural network that tries to reduce the number of parameters in a deep neural network with many units without losing too much in the quality of the model.

* In images, pixels that are close to one another usually have the same type of information: sky, water, leaves, etc. 

* The exception to the rule is **the edges**: the parts of an image where two different objects “touch” one another.

* The neural network is trained to recognize regions of the same information as well as the edges. This allows it to predict the object represented in the image.

* **Example:** If the neural network detected multiple skin regions and edges that look like parts of an oval with skin-like tone on the inside and bluish tone on the outside, then it is likely that it’s a face on a sky background. 
    * If the goal is to detect people in pictures, the neural network will most likely succeed in predicting a person in this picture.

**The most important information in the image is local**

How does a CNN work?
- We split the image into square patches using a moving window approach. 
- We can train multiple smaller regression models at once. Each regression model receives a square patch as input.
    - We train the 'filters'.
- Each regression model's work is to learn to detect a specific kind of pattern in the input patch. 

For example, one small regression model will learn to detect the sky; another one will detect the grass, the third one will detect edges of a building.
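The moving window idea above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `extract_patches` and the toy 4x4 image are assumptions made for the example, not part of the walkthrough.

```python
import numpy as np

def extract_patches(image, p, stride=1):
    """Return all p x p patches of a 2-D image, scanning left to right, top to bottom."""
    height, width = image.shape
    patches = []
    for top in range(0, height - p + 1, stride):
        for left in range(0, width - p + 1, stride):
            patches.append(image[top:top + p, left:left + p])
    return np.array(patches)

# A toy 4x4 "image" whose pixel values are simply 0..15 (hypothetical data).
image = np.arange(16).reshape(4, 4)
patches = extract_patches(image, p=3)

print(patches.shape)  # (4, 3, 3): four patches of 3x3 pixels
print(patches[0])     # the patch in the top-left corner
```

Each of these patches is what one small regression model (one filter) would receive as input.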

* CNNs work much like ordinary fully connected neural networks.
    * They have weights and biases that are learned from the input data.
    * Every neuron in the network receives an input and performs a dot product on it.
    * The final layer produces class scores computed from the outputs of the preceding layers.
    * They have a loss function at the end to evaluate performance. 

![img](img/anatomyofcnn.png)

What seems different?

![](https://upload.wikimedia.org/wikipedia/commons/4/46/Colored_neural_network.svg)


The first architecture (the CNN) assumes its input is an image, which makes it more practical for this kind of data.

Its neurons are not arranged in a single line. A CNN's neurons are organized in three dimensions: width, height, and depth.

For instance, the dog and cat images have dimensions 32x32x3, and the final output is a single vector of dimensions 1x1x2 (one score per class).

![](img/3channels.png)

The goal:  Reduce the images into an easier form to process, without losing features which are critical for getting a good prediction.

**ARCHITECTURE**

* INPUT - A typical image dataset holds images of dimensions l x w x d, where the depth d denotes the number of channels (3 for RGB) in the image.


* CONV layer - computes the dot product between the weights of each neuron and the region of the input image to which it is connected. For example, with 12 filters the output volume could be 32x32x12.


* The third layer is a RELU layer, which applies an activation function to the resulting dot product.


* The fourth layer is a POOLing layer; it downsamples the spatial dimensions of the image (width and height).
  
  
* The fully connected layer will compute the class score, leading to a final volume of 1 x 1 x n; where n is the number of categories to classify.
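To make the volumes in this list concrete, here is a small sketch that traces the shape of the data through the layers. It assumes a 32x32x3 input, 12 filters, and two classes as in the text; the 2x2 pooling window and the padding that keeps the CONV output at 32x32 are assumptions made for the example, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(height=32, width=32, channels=3, n_filters=12, pool=2, n_classes=2):
    """Return (layer name, output volume) pairs for INPUT -> CONV -> RELU -> POOL -> FC."""
    shapes = [("INPUT", (height, width, channels))]
    # CONV: assuming zero padding that preserves height and width,
    # the depth becomes the number of filters.
    shapes.append(("CONV", (height, width, n_filters)))
    # RELU is applied element by element, so the volume is unchanged.
    shapes.append(("RELU", (height, width, n_filters)))
    # POOL: a pool x pool window moved in steps of `pool` shrinks height and width.
    shapes.append(("POOL", (height // pool, width // pool, n_filters)))
    # FC: one score per class.
    shapes.append(("FC", (1, 1, n_classes)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:5s} -> {shape}")
```

Only the CONV and FC layers have weights to learn; RELU and POOL just transform the volume they receive.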

The convolutional component comprises the learnable filter.  

* To detect some pattern, a small regression model has to learn the parameters of a matrix F (for “filter”) of size p × p, where p is the size of a patch.


* If the input were a black and white image, 1 would represent the black pixels and 0 would represent the white pixels.
* Assume patches of 3x3 pixels (p = 3). Some patch could then look like the following matrix P (for “patch”):

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

The previous patch represents a pattern that looks like a cross. 

The small regression model that will detect such patterns (and only them) would need to learn a 3 by 3 parameter matrix F where parameters at positions corresponding to the 1s in the input patch would be positive numbers, while the parameters in positions corresponding to 0s would be close to zero. 

If we calculate the convolution of matrices P and F, the value we obtain is higher the more similar F is to P. To illustrate the convolution of two matrices, assume that F looks like this:

$$F = \begin{bmatrix} 0 & 2 & 3 \\ 2 & 4 & 1 \\ 0 & 3 & 0 \end{bmatrix}$$

The convolution operator, in this form, is only defined for matrices that have the same number of rows and columns. For our matrices P and F, it’s calculated as illustrated below:

![convolution](img/02_Convolution.png)

If our patch had a different pattern, then the convolution with F would give a different result. 

*The more the patch “looks” like the filter, the higher the value of the convolution operation is*
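The calculation can be checked numerically. The sketch below assumes the operation is the one usually meant in CNNs: multiply the matching entries of P and F and add everything up. `P` and `F` are the matrices from the text; `P_other` is a hypothetical L-shaped patch added only for comparison.

```python
import numpy as np

# Cross-shaped patch and filter, copied from the text.
P = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]])
F = np.array([[0, 2, 3],
              [2, 4, 1],
              [0, 3, 0]])

def patch_convolution(patch, filt):
    """Multiply matching entries and sum them into a single number."""
    return int(np.sum(patch * filt))

# Hypothetical second patch shaped like an "L", for comparison only.
P_other = np.array([[1, 0, 0],
                    [1, 0, 0],
                    [1, 1, 1]])

print(patch_convolution(P, F))        # 12
print(patch_convolution(P_other, F))  # 5
```

The cross-shaped patch lines up with the large entries of F and scores 12, while the L-shaped patch overlaps fewer of them and scores 5.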

For convenience, there’s also a bias parameter b associated with each filter F which is added to the result of a
convolution before applying the nonlinearity (activation function).

One layer of a CNN consists of multiple convolution filters (each with its own bias parameter).

Each filter of the first layer slides — or convolves — across the input image, left to right, top to bottom, and convolution is computed at each iteration.

Like this:

![](https://miro.medium.com/max/1400/1*ciDgQEjViWLnCbmX-EeSrA.gif)
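A minimal sketch of this sliding computation, assuming a single-channel image, no padding, and a square filter. The function name `convolve2d`, the toy image, and the bias value are assumptions made for the example.

```python
import numpy as np

def convolve2d(image, filt, bias=0.0, stride=1):
    """Slide a square filter over a 2-D image and return the resulting feature map."""
    p = filt.shape[0]
    out_h = (image.shape[0] - p) // stride + 1
    out_w = (image.shape[1] - p) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):          # top to bottom
        for j in range(out_w):      # left to right
            top, left = i * stride, j * stride
            patch = image[top:top + p, left:left + p]
            # Convolution of the patch with the filter, plus the filter's bias.
            out[i, j] = np.sum(patch * filt) + bias
    return out

# Toy 4x4 image with a cross in its top-left corner (hypothetical data).
image = np.array([[0, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
filt = np.array([[0, 2, 3],
                 [2, 4, 1],
                 [0, 3, 0]])

feature_map = convolve2d(image, filt, bias=1.0)
print(feature_map)  # the largest value is at the top-left, where the window sits on the cross
```

In a real CNN the activation function would then be applied to every entry of `feature_map`.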

If the CNN has one convolution layer following another convolution layer, then the subsequent layer *l + 1* treats the output of the preceding layer *l* as a collection of $size_l$ image matrices, where $size_l$ is the number of filters in layer *l*.

**Pooling**

This is a technique very often used in CNNs. Pooling works in a way very similar to convolution, as a filter applied using a moving window approach.

Instead of applying a trainable filter to an input matrix, a pooling layer applies a fixed operator, usually either max or average. 

Pooling's hyperparameters are also the size of the filter and the stride. 

Usually, a pooling layer follows a convolution layer, and it gets the output of convolution as input. 

Pooling does not have parameters to learn. It shrinks the feature maps, which reduces the number of parameters in the layers that follow, speeds up training, and often helps the accuracy of the model.

![pooling](https://miro.medium.com/max/792/1*uoWYsCV5vBU8SHFPAPao-w.gif)
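Pooling with a fixed operator can be sketched as follows. The helper `pool2d`, the 2x2 window with stride 2, and the toy feature map are assumptions chosen for illustration.

```python
import numpy as np

def pool2d(matrix, size=2, stride=2, op=np.max):
    """Apply a fixed operator (max by default) over a moving window."""
    out_h = (matrix.shape[0] - size) // stride + 1
    out_w = (matrix.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = matrix[top:top + size, left:left + size]
            out[i, j] = op(window)  # nothing is learned here
    return out

# Hypothetical 4x4 output of a convolution layer.
feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])

print(pool2d(feature_map))              # max pooling
print(pool2d(feature_map, op=np.mean))  # average pooling
```

Both calls turn the 4x4 input into a 2x2 output, which is the downsampling described above.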

### Why ReLU as the Activation Function

After getting the new convolved matrix, anything negative is turned to zero.

This removes unnecessary noise. 
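As a small sketch, ReLU amounts to taking the element-wise maximum with zero. The matrix `convolved` below is made-up data standing in for the output of a convolution.

```python
import numpy as np

def relu(x):
    """Set every negative entry to zero and leave the rest unchanged."""
    return np.maximum(0, x)

# Hypothetical convolved matrix with some negative responses.
convolved = np.array([[ 3.0, -1.5],
                      [-0.2,  4.0]])

print(relu(convolved))
```

The positive entries (3.0 and 4.0) pass through unchanged, and the two negative entries become zero.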

Hyperparameters:
* **Stride**: how big a step the moving window takes in the pooling and convolution layers.
* **Padding**: how many rows and columns of zeros to add around the image.
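These two hyperparameters, together with the filter size, determine the size of a layer's output. A sketch using the standard formula `(n + 2 * padding - f) // stride + 1` for an input of side `n` and a filter of side `f`; the helper name `output_size` is hypothetical.

```python
def output_size(n, f, stride=1, padding=0):
    """Side length of the output for an n x n input and an f x f window."""
    return (n + 2 * padding - f) // stride + 1

print(output_size(32, 3))             # 30: a 3x3 filter without padding shrinks the image
print(output_size(32, 3, padding=1))  # 32: one ring of zeros keeps the size
print(output_size(32, 2, stride=2))   # 16: 2x2 pooling with stride 2 halves the size
```

This is why padding is often used in convolution layers (to keep the spatial size) while a stride greater than 1 is used for downsampling.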

## What does a CNN look like after all?

![img](img/05_FullCNN.png)

![img](img/mnistcnn.png)

**Implementing Example in Keras**   

We'll move on to the famous MNIST digits, a classic dataset for deep learning. The MNIST dataset is bigger than the digits dataset built into sklearn: the images are larger ($28\times28$ instead of $8\times8$) and there are more of them ($70000$ instead of $1797$). In total, we're dealing with $70000\times28\times28\approx 55$ million training pixels instead of $1797\times8\times8\approx115000$ training pixels (about $500$ times more data).

The following code loads the MNIST dataset. The first time you run it, the data will be downloaded; on later runs, it will use the local copy.
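A minimal sketch of such a loading step, assuming TensorFlow's bundled Keras is installed. The helper name `load_mnist`, the added channel axis, and the rescaling to [0, 1] are illustrative choices, not necessarily what the walkthrough uses.

```python
def load_mnist():
    """Load MNIST through Keras and scale pixel values to the range [0, 1]."""
    # Imported here so that defining the function does not itself require TensorFlow.
    from tensorflow.keras.datasets import mnist

    # Downloads the data on the first call and reuses the cached copy afterwards.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Add a channel axis, (28, 28) -> (28, 28, 1), and rescale 0..255 to 0..1.
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0
    return (x_train, y_train), (x_test, y_test)
```

Calling `(x_train, y_train), (x_test, y_test) = load_mnist()` should give 60000 training images and 10000 test images, which together make up the 70000 mentioned above.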

*Close presentation view*


Also check the Fashion-MNIST (FMNIST) example.

Check out this video to understand more: https://www.youtube.com/watch?v=FmpDIaiMIeA&feature=emb_title