# Introduction to CNNs

## 1. Introduction to convolutional neural networks

### 1.1 Why?

#### 1.1.1 CNNs handle large images better (so far, the images we used were fairly small)

- Imagine a color image of 500 x 500 pixels. Flattened, it gives 500 x 500 x 3 = 750,000 input features, $(x_1,...,x_{750,000})$.
- Next, imagine having 2000 hidden units in the first hidden layer. Then the weight matrix $w^{[1]}$ has dimensions (2000 x 750,000), which is 1.5 billion parameters. So it becomes a very high-dimensional problem!
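The arithmetic above can be checked in a few lines of Python. This is just a sketch of the count; like the text, it ignores the biases.

```python
# Size of a flattened 500 x 500 RGB image
height, width, channels = 500, 500, 3
n_inputs = height * width * channels          # 750,000 input features

# A dense first hidden layer needs one weight per (hidden unit, input) pair
n_hidden = 2000
n_weights = n_hidden * n_inputs               # w[1] has shape (2000, 750000)

print(f"{n_inputs:,} inputs, {n_weights:,} weights")
# 750,000 inputs, 1,500,000,000 weights
```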

#### 1.1.2 CNNs identify patterns in images thanks to the "convolution operation"

- Dense layers learn global patterns in their input feature space.

- Convolution layers learn local patterns, which leads to the following interesting properties:
    - Unlike a densely connected network, once a convolutional neural network has learned to recognize a pattern in, let's say, the upper-right corner of a picture, it can recognize it anywhere else in the picture.
    - Deeper convolutional neural networks can learn spatial hierarchies: the first layer learns small local patterns, the second layer learns larger patterns built from the first layer's features, and so on.

### 1.2 What are they used for?
- Image classification
- Object detection in images
- Neural style transfer

## 2. The convolution operation 

### 2.1 The basic convolution operation 

The idea: detect edges in your image. Typically, we'll detect vertical or horizontal edges. Let's look at what horizontal edge detection would look like!

![title](conv.png)

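This is a simplified 5 x 5 pixel image (greyscale!). You use a so-called "filter" (shown on the right) to perform a convolution operation: slide the 3 x 3 filter over every 3 x 3 window of the image, multiply each pixel by the filter value on top of it, and sum the 9 products into one output number. This particular filter detects horizontal edges. The matrix on the left should contain numbers (pixel intensities from 0 to 255, or, let's assume, rescaled to 1-10). A 3 x 3 filter fits into a 5 x 5 image at 3 x 3 positions, so the output is a 3 x 3 matrix. (*This example is for computational clarity; it has no clear edges.*)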

What dimension would the output matrix have had if we had started from a 7 x 7 matrix? And a 64 x 64 matrix?

(*Then, create a new example with a clear edge and look at the output*)
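A minimal pure-Python sketch of the convolution step, applied to a 6 x 6 example with a clear horizontal edge. The filter values are an assumption (a commonly used horizontal-edge filter), since the filter in the figure is not reproduced here. Like most deep learning libraries, the sketch does not flip the kernel, and it assumes square images and filters. `conv2d` is a hypothetical helper name, not a Keras function.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1) on square lists of lists.

    The kernel is not flipped, as in deep learning libraries
    (strictly speaking, this is a cross-correlation).
    """
    n, f = len(image), len(kernel)
    out_size = n - f + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Element-wise product of the f x f window with the kernel, then sum
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(f)
                for b in range(f)
            )
            row.append(total)
        output.append(row)
    return output

# 6 x 6 image: bright top half (10), dark bottom half (0) -> one horizontal edge
image = [[10] * 6 for _ in range(3)] + [[0] * 6 for _ in range(3)]

# Assumed horizontal-edge filter: positive row on top, negative row at the bottom
horizontal_edge = [
    [1, 1, 1],
    [0, 0, 0],
    [-1, -1, -1],
]

for row in conv2d(image, horizontal_edge):
    print(row)
# [0, 0, 0, 0]
# [30, 30, 30, 30]
# [30, 30, 30, 30]
# [0, 0, 0, 0]
```

The rows of 30s mark where the bright region meets the dark one, while uniform regions give 0. With only 6 pixels the detected edge looks thick; on a realistic image it would be thin relative to the image size.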

In Keras, the layer that performs the convolution step is `Conv2D`.

### 2.2 Padding

Downsides of using filters on images:
- The image shrinks with each convolution layer, so you're throwing away information in each layer!
    - Starting from a 5 x 5 matrix and using a 3 x 3 filter, you end up with a 3 x 3 image.
    - Starting from a 10 x 10 matrix and using a 3 x 3 filter, you end up with an 8 x 8 image.
    - etc.
- Pixels around the edges are used much less in the outputs, because fewer filter positions overlap them (a corner pixel is covered by only one position).

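Solution for both of these problems: pad your image (typically with zeros) before applying the convolution! With a 3 x 3 filter, just one layer of pixels around the edges preserves the image size. In general, for an $f \times f$ filter with stride 1, a padding of $p = (f-1)/2$ keeps the output the same size as the input. We can also use bigger filters, but their dimensions are generally odd, so that the padding is symmetric and the filter has a central pixel.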

Terminology:

- "Valid": no padding
- "Same": padding such that the output size equals the input size (for stride 1)
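A small sketch contrasting "valid" and "same" convolution, assuming zero padding and stride 1. `zero_pad` and `conv2d` are hypothetical helper names written for this illustration.

```python
def zero_pad(image, p):
    """Surround a 2D image (list of lists) with p rows/columns of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * width for _ in range(p)]
    return padded

def conv2d(image, kernel):
    """'Valid' convolution (no kernel flip, stride 1) of a square image and kernel."""
    n, f = len(image), len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(f) for b in range(f))
            for j in range(n - f + 1)
        ]
        for i in range(n - f + 1)
    ]

image = [[1] * 5 for _ in range(5)]      # 5 x 5 image of ones
kernel = [[1] * 3 for _ in range(3)]     # 3 x 3 filter of ones
f = len(kernel)

valid = conv2d(image, kernel)                          # no padding
same = conv2d(zero_pad(image, (f - 1) // 2), kernel)   # p = (f - 1) / 2 = 1

print(len(valid), "x", len(valid[0]))   # 3 x 3
print(len(same), "x", len(same[0]))     # 5 x 5
print(same[0][0], same[2][2])           # 4 9
```

The corner output (4) is smaller than the center output (9) because the corner window overlaps only 4 real pixels; the rest is padding. Note that $p = (f-1)/2$ is only a whole number when $f$ is odd.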

### 2.3 Strided convolutions

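Instead of moving the filter one pixel at a time, move it two pixels at a time, both horizontally and vertically ("stride" = 2). This roughly halves the output height and width.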
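To see what a stride of 2 does, here is a minimal sketch (no padding, no kernel flip, square inputs assumed). The image encodes each pixel's position as `10 * row + column`, and the hypothetical filter `pick_top_left` just copies the top-left pixel of each window, so the output shows where each window starts.

```python
def conv2d_strided(image, kernel, s):
    """'Valid' convolution (no padding, no kernel flip) with stride s.

    The filter jumps s pixels at a time, both horizontally and vertically.
    """
    n, f = len(image), len(kernel)
    out_size = (n - f) // s + 1
    return [
        [
            sum(image[i * s + a][j * s + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

# 7 x 7 image whose pixel values encode their position: 10 * row + column
image = [[10 * r + c for c in range(7)] for r in range(7)]
# 3 x 3 filter that simply picks the top-left pixel of each window
pick_top_left = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

for row in conv2d_strided(image, pick_top_left, s=2):
    print(row)
# [0, 2, 4]
# [20, 22, 24]
# [40, 42, 44]
```

With stride 2, windows start at rows and columns 0, 2 and 4 only, so the 7 x 7 input gives a 3 x 3 output instead of the 5 x 5 output of stride 1.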

### 2.4 Convolutions on RGB images

Instead of a 5 x 5 grayscale image, look at a 7 x 7 RGB image, which boils down to a 7 x 7 x 3 tensor. Then you need a filter whose third dimension is also 3, let's say 3 x 3 x 3 (a 3D "cube"). The output is a 5 x 5 image. So, how does this work? At each position, multiply the 27 numbers in the cube element-wise with the 27 input values underneath it, and sum them into a single output number.

This allows us to detect, e.g., horizontal edges in the blue channel only (by setting the filter's red and green slices to all zeros).

In each layer, you can convolve the input with several 3D filters.
Then you stack all the outputs together, so you end up with a 5 x 5 x (number of filters) volume.
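A sketch of one convolutional layer on an RGB input, assuming images stored as nested lists indexed `[row][column][channel]`, "valid" padding and stride 1. The two filters are illustrative: one sums all three channels, the other has zero red and green slices, so it only sees the blue channel. All names are hypothetical.

```python
def conv3d_single(volume, kernel):
    """Convolve an n x n x c volume with one f x f x c filter ('valid', stride 1).

    At each position all f * f * c products are summed into ONE number,
    so the result is a single (n - f + 1) x (n - f + 1) channel.
    """
    n, f, c = len(volume), len(kernel), len(kernel[0][0])
    return [
        [
            sum(volume[i + a][j + b][k] * kernel[a][b][k]
                for a in range(f) for b in range(f) for k in range(c))
            for j in range(n - f + 1)
        ]
        for i in range(n - f + 1)
    ]

def conv_layer(volume, kernels):
    """Apply several filters and stack their outputs along the channel axis."""
    maps = [conv3d_single(volume, kern) for kern in kernels]
    size = len(maps[0])
    # out[i][j] holds one value per filter -> shape size x size x len(kernels)
    return [[[m[i][j] for m in maps] for j in range(size)] for i in range(size)]

# 7 x 7 x 3 RGB image: red = 1, green = 2, blue = 3 everywhere
image = [[[1, 2, 3] for _ in range(7)] for _ in range(7)]

# Filter 1: all ones over all three channels (27 weights)
all_channels = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]
# Filter 2: red and green weights are 0, so it only looks at the blue channel
blue_only = [[[0, 0, 1] for _ in range(3)] for _ in range(3)]

out = conv_layer(image, [all_channels, blue_only])
print(len(out), len(out[0]), len(out[0][0]))  # 5 5 2
print(out[0][0])                              # [54, 27]
```

Each position of the first output channel sums 9 pixels x (1 + 2 + 3) = 54, while the blue-only filter sums 9 x 3 = 27.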


If you think about it, the filter plays the same role as the weights $w^{[1]}$ in our densely connected networks: its values are what the network learns.

The advantage: your image can be huge, yet the number of parameters depends only on the number and size of the filters, not on the height and width of the image!


Imagine 20 filters of size 3 x 3 x 3: 20 x 27 weights + one bias per filter (1 x 20) = 560 parameters.
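The same count as a tiny formula, written with the notation below ($f$, $n_c^{[l-1]}$, $n_c^{[l]}$). `conv_layer_params` is a hypothetical helper; note that the image height and width do not appear as arguments.

```python
def conv_layer_params(f, n_c_prev, n_c):
    """Learnable parameters of a conv layer: f*f*n_c_prev weights + 1 bias, per filter."""
    return (f * f * n_c_prev + 1) * n_c

print(conv_layer_params(f=3, n_c_prev=3, n_c=20))   # 560
```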

Notation:

- $f^{[l]}$ = filter size (height and width of each filter)
- $p^{[l]}$ = padding
- $s^{[l]}$ = stride
- $n_c^{[l]}$ = number of filters
- each filter has shape $f^{[l]} \times f^{[l]} \times n_c^{[l-1]}$


- Input: $n_h^{[l-1]} \times n_w^{[l-1]} \times n_c^{[l-1]}$
- Output: $n_h^{[l]} \times n_w^{[l]} \times n_c^{[l]}$

Height and width are given by:

$n_h^{[l]} = \left\lfloor \dfrac{n_h^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor$

$n_w^{[l]} = \left\lfloor \dfrac{n_w^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor$
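The height/width formula translates directly into code. This is a sketch with a hypothetical helper name; Python's `//` floors the division for the non-negative values used here, and since 1 is an integer, adding it outside the floor gives the same result as adding it inside.

```python
def conv_output_size(n_prev, f, p=0, s=1):
    """Height (or width) of a conv output: floor((n_prev + 2p - f) / s) + 1."""
    return (n_prev + 2 * p - f) // s + 1

print(conv_output_size(5, f=3))              # 3  (valid, stride 1)
print(conv_output_size(5, f=3, p=1))         # 5  ("same" padding, p = (f - 1) / 2)
print(conv_output_size(7, f=3, p=0, s=2))    # 3  (stride 2)
print(conv_output_size(8, f=3, p=0, s=2))    # 3  ((8 - 3) / 2 = 2.5 is floored to 2)
```

The same formula, with $p = 0$, also gives the output size of a pooling layer.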


Activations: $a^{[l]}$ has dimension $n_h^{[l]} \times n_w^{[l]} \times n_c^{[l]}$

**Example: walk through a simple network: https://www.coursera.org/learn/convolutional-neural-networks/lecture/A9lXL/simple-convolutional-network-example, or (after pooling) https://www.coursera.org/learn/convolutional-neural-networks/lecture/uRYL1/cnn-example**



## 3. Pooling layer

Max pooling: slide an f x f window over the input with stride s and keep only the maximum value in each window.

Hyperparameters:
- f (filter size)
- s (stride)

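Common choices are f = 2, s = 2 (which halves the height and width) and f = 3, s = 2; both shrink the size of the representation.
If a feature is detected anywhere in a window (e.g., one 2 x 2 quadrant), a high number appears there, and max pooling keeps it. So max pooling preserves the feature, while discarding exactly where in the window it was found.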

Max pooling has hyperparameters but no parameters: there is nothing to learn!

Max pooling on a 3D input: do the same for each channel independently, so the number of channels stays the same.

Another way of pooling is *average pooling*, which takes the average of each window instead of the maximum.
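A minimal sketch of max and average pooling on a single 4 x 4 channel with the common choice f = 2, s = 2 (no padding). `pool2d` and its `mode` argument are hypothetical names for this illustration; note that the function has no weights, only the hyperparameters `f` and `s`.

```python
def pool2d(channel, f=2, s=2, mode="max"):
    """Max or average pooling of one 2D channel (no padding, no learnable parameters)."""
    n = len(channel)
    out_size = (n - f) // s + 1
    combine = max if mode == "max" else (lambda vals: sum(vals) / len(vals))
    return [
        [
            combine([channel[i * s + a][j * s + b] for a in range(f) for b in range(f)])
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

x = [
    [1, 3, 2, 1],
    [2, 9, 1, 1],
    [1, 3, 2, 3],
    [5, 6, 1, 2],
]

print(pool2d(x, mode="max"))      # [[9, 2], [6, 3]]
print(pool2d(x, mode="average"))  # [[3.75, 1.25], [3.75, 2.0]]
```

For an input with several channels, apply `pool2d` to each channel separately; the number of channels does not change.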

We'll count a convolutional layer plus its pooling layer as one layer.

## 4. Fully connected layers in your CNN

Towards the end of the network, flatten the last output volume into a vector and add one or more fully connected layers.
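A sketch of the flattening step and the parameter count of the first fully connected layer. The shapes (a 5 x 5 x 16 volume and 120 units) are hypothetical values chosen for illustration, not taken from a specific network.

```python
# Hypothetical output volume of the last conv/pool layer: 5 x 5 x 16
volume = [[[1.0] * 16 for _ in range(5)] for _ in range(5)]

# Flatten: unroll height x width x channels into one long vector
flat = [value for row in volume for pixel in row for value in pixel]
print(len(flat))  # 400

# A fully connected layer with 120 units (hypothetical size) on top of it
n_units = 120
n_params = len(flat) * n_units + n_units   # weights + one bias per unit
print(n_params)  # 48120
```

Compared with the 560 parameters of the conv layer example above, a single dense layer on a small flattened volume already holds far more parameters.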


# Extra reading

https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html

https://datascience.stackexchange.com/questions/16463/what-is-are-the-default-filters-used-by-keras-convolution2d