# Course 4: Week 1

## Convolution

[What is an intuitive explanation for convolution?](https://www.quora.com/What-is-an-intuitive-explanation-for-convolution)

[Intuitively Understanding Convolutions for Deep Learning](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)

## Edge detection

We slide a small filter over the image matrix and compute one output value at each position.

The following filter (kernel) is used for vertical edge detection:

![](media/ed.png)

(At each position, the output value is the sum of the element-wise products between the image patch under the filter and the filter itself: $\sum_{i,j} patch_{ij} \cdot filter_{ij}$)

Why it works:

![](media/ed2.png)

[...] "a vertical edge is where there are bright pixels on the left, you do not care that much what is in the middle and dark pixels on the right."

![](media/ed3.png)
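A minimal NumPy sketch of this idea, assuming a 6x6 grayscale image that is bright (10) on the left half and dark (0) on the right half. The helper name `conv2d_valid` is hypothetical and only meant to illustrate the sliding-window computation, not a library function:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and
    sum the element-wise products at each position."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + f_h, j:j + f_w]
            out[i, j] = np.sum(patch * kernel)
    return out

# 6x6 image: bright (10) on the left half, dark (0) on the right half
image = np.hstack([np.full((6, 3), 10), np.zeros((6, 3))])

# vertical edge filter: positive column, ignored column, negative column
vertical_filter = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]])

edges = conv2d_valid(image, vertical_filter)
print(edges)  # every row is [0, 30, 30, 0]
```

The two middle columns of the 4x4 output are large because those are the only filter positions that straddle the bright-to-dark transition.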

___
**Example:**

![](media/ed4.png)

___
**Example:**

![](media/ed5.png)

## Padding

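By default, convolving shrinks the output and under-uses the pixels on the image's border: a corner pixel is covered by a single filter position, while a central pixel is covered by many.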

![](media/padd.png)

So we can add padding, which increases the influence of the border pixels and can keep the output dimensions equal to the input's.

![](media/padd2.png)

___
Forms of implementing padding:

![](media/padd3.png)

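For a "same" convolution (output size equal to the input size, with stride 1 and an odd $f$), the padding is:

$padding = \frac{f - 1}{2}$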
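As a quick check of the padding formula, here is a small sketch using `np.pad`. The 6x6 input and the 3x3 filter size are assumptions chosen for illustration:

```python
import numpy as np

f = 3                      # filter size (odd, so that p is an integer)
p = (f - 1) // 2           # padding that preserves the input size
image = np.arange(36).reshape(6, 6)

# add p rows/columns of zeros on every side
padded = np.pad(image, pad_width=p, mode='constant', constant_values=0)

n = image.shape[0]
valid_size = n - f + 1          # no padding: the output shrinks
same_size = n + 2 * p - f + 1   # with padding p: the size is preserved

print(padded.shape, valid_size, same_size)  # (8, 8) 4 6
```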

## Strided convolutions

*(Note that the **default** stride value is `1`)*

The stride works as a `step`: the number of positions the filter moves each time, both horizontally and vertically.

With a padding of $p$ and a stride of $s$:


![](media/stri.png)

i.e.:

______

![](media/stri3.png)

___

*(For non-integer values, round down (floor) the value.)*

The filter is applied only at positions where it fits entirely inside the (padded) image:

![](media/stri2.png)
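The output size rule and the "only where the filter fits" behaviour can be sketched as follows. Both helper names (`conv_output_size`, `conv2d_strided`) are hypothetical, and the sketch assumes a square image and a square filter:

```python
import numpy as np

def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1"""
    return (n + 2 * p - f) // s + 1

def conv2d_strided(image, kernel, p=0, s=1):
    """Slide a square `kernel` over a square `image` with padding p and stride s."""
    n, f = image.shape[0], kernel.shape[0]
    padded = np.pad(image, p, mode='constant')
    n_out = conv_output_size(n, f, p, s)
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            # positions where the window would overhang the image are never visited
            patch = padded[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(49).reshape(7, 7)
kernel = np.ones((3, 3))

out = conv2d_strided(image, kernel, p=0, s=2)
print(out.shape)                          # (3, 3)
print(conv_output_size(6, 3, p=0, s=2))   # 2, because (6 - 3) / 2 = 1.5 is floored to 1
```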

___ 
**Math:**

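The lecture then points out a technicality: in the mathematical definition of convolution, the filter is first flipped both horizontally and vertically. Deep learning usually skips this flip, so the operation is technically a cross-correlation: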

![](media/stri4.png)

<div class="alert alert-danger">
    <b>Disclaimer:</b>
    <img src='media/stri5.png'></img>
</div>

<div class="alert alert-info">
Achieving this result is just a matter of performing the following:
</div>

```python
import numpy as np

v = np.array([[3, 4, 5], [1, 0, 2], [-1, 9, 7]])
print('input:\n{}'.format(v))

# anti-identity (flipped identity) matrix: ones on the anti-diagonal
anti_id = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])

# flip horizontally (reverse the columns): matrix product v . anti_id
first_step = np.dot(v, anti_id)

# flip vertically (reverse the rows): matrix product anti_id . first_step
second_step = np.dot(anti_id, first_step)

print('\noutput:\n{}'.format(second_step))
```

```
input:
[[ 3  4  5]
 [ 1  0  2]
 [-1  9  7]]

output:
[[ 7  9 -1]
 [ 2  0  1]
 [ 5  4  3]]
```


___
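The same double flip can also be written without matrix products. The sketch below assumes the same 3x3 matrix `v` and checks that `np.flip` and reversed slicing agree with the anti-identity approach:

```python
import numpy as np

v = np.array([[3, 4, 5], [1, 0, 2], [-1, 9, 7]])
anti_id = np.fliplr(np.eye(3, dtype=int))   # ones on the anti-diagonal

via_matmul = anti_id @ v @ anti_id   # reverse rows and columns with matrix products
via_flip = np.flip(v)                # np.flip with no axis reverses every axis
via_slicing = v[::-1, ::-1]          # reversed slicing on both axes

print(np.array_equal(via_matmul, via_flip))    # True
print(np.array_equal(via_flip, via_slicing))   # True
```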

### Convolutions over volume (RGB)


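The same convolution applies to RGB images: the filter gets one layer per channel, and it is convolved with all three channels at once to produce a single output value per position.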

![](media/rgb.png)

A filter can also target a single channel by setting its weights for the other channels to zero.

Convolving over volumes lets us operate on RGB pictures, and applying several filters at once lets us detect several kinds of features, such as vertical, horizontal, or angled edges.

![](media/rgb2.png)

![](media/rgb3.png)

![](media/rgb4.png)

![](media/rgb5.png)

![](media/rgb6.png)

![](media/rgb7.png)
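A rough sketch of a convolution over a volume with two filters stacked together. The function name `conv_volume`, the random 6x6x3 input, and the choice of filters are all assumptions made for illustration (no padding, stride 1):

```python
import numpy as np

def conv_volume(image, filters):
    """image: (n_h, n_w, n_c); filters: (f, f, n_c, n_filters).
    Returns an array of shape (n_h - f + 1, n_w - f + 1, n_filters)."""
    n_h, n_w, n_c = image.shape
    f, _, n_c_f, n_filters = filters.shape
    assert n_c == n_c_f, "filter depth must match the number of image channels"
    out = np.zeros((n_h - f + 1, n_w - f + 1, n_filters))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + f, j:j + f, :]   # (f, f, n_c) slice of the volume
            for k in range(n_filters):
                out[i, j, k] = np.sum(patch * filters[:, :, :, k])
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 6, 3))   # stand-in for an RGB image

vertical = np.array([[1, 0, -1]] * 3)
horizontal = vertical.T

filters = np.zeros((3, 3, 3, 2))
filters[:, :, 0, 0] = vertical        # filter 0: vertical edges, first channel only
for c in range(3):
    filters[:, :, c, 1] = horizontal  # filter 1: horizontal edges on all channels

out = conv_volume(image, filters)
print(out.shape)  # (4, 4, 2): one output channel per filter
```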

#### l-convolutional layer of a convolution network

#### If layer '$l$' is a convolution layer:


$f^{[l]}$ = filter size

$p^{[l]}$ = padding

$s^{[l]}$ = stride

$n_c^{[l]}$ = number of filters

___

Note that the size of each filter will be,

$f^{[l]}$ x $f^{[l]}$ x numberOfChannels

i.e.:

$f^{[l]}$ x $f^{[l]}$ x $n_{channels}^{[l-1]}$
___

The weights (all filters put together) have dimension:

$f^{[l]}$ x $f^{[l]}$ x $n_{channels}^{[l-1]}$ x $n_{channels}^{[l]}$
___

There is one bias per filter, so the bias holds $n_c^{[l]}$ values. It is typically stored with shape

$(1, 1, 1, n_c^{[l]})$

so that it broadcasts over the output.

___

Input dimension:

$n_{height}^{[l-1]}$ x $n_{width}^{[l-1]}$ x $n_{channels}^{[l-1]}$
___

Output dimension:

$n_{height}^{[l]}$ x $n_{width}^{[l]}$ x $n_{channels}^{[l]}$

**where**,

$n^{[l]} = \left\lfloor \frac{n^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor$

if $height \neq width$, then

$n_{height}^{[l]} = \left\lfloor \frac{n_{height}^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor$

**and**,

$n_{width}^{[l]} = \left\lfloor \frac{n_{width}^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor$

___

The activation of layer $l$ for a single example has shape:

$a^{[l]}$: $n_{height}^{[l]}$ x $n_{width}^{[l]}$ x $n_{channels}^{[l]}$
___

Whereas the activations of layer $l$ for all $m$ training examples have shape (implementations often put $m$ first instead):

$A^{[l]}$: $n_{height}^{[l]}$ x $n_{width}^{[l]}$ x $n_{channels}^{[l]}$ x $m$
___

**Example:**

![](media/ex.png)
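The shape rules of this section can be collected in a small helper. This is only a sketch: the function name `conv_layer_shapes` is hypothetical, and the 39x39x3 input with ten 3x3 filters is an assumed example, not necessarily the one pictured above:

```python
def conv_layer_shapes(n_h_prev, n_w_prev, n_c_prev, f, p, s, n_c):
    """Shapes for one convolution layer, following the formulas above."""
    n_h = (n_h_prev + 2 * p - f) // s + 1
    n_w = (n_w_prev + 2 * p - f) // s + 1
    return {
        "weights": (f, f, n_c_prev, n_c),          # all filters put together
        "bias": (1, 1, 1, n_c),                    # one bias per filter
        "activation": (n_h, n_w, n_c),             # output for a single example
        "n_params": f * f * n_c_prev * n_c + n_c,  # weights plus biases
    }

shapes = conv_layer_shapes(n_h_prev=39, n_w_prev=39, n_c_prev=3,
                           f=3, p=0, s=1, n_c=10)
print(shapes["activation"])  # (37, 37, 10)
print(shapes["n_params"])    # 280
```

Note that the parameter count does not depend on the input's height or width, only on the filter size and the channel counts.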

## Max pooling

Slide a window over the input (with a stride) and keep only the highest value in each region:

![](media/mp.png)

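If a feature is detected anywhere in the region covered by the pooling window, a high number is kept to represent it; otherwise the maximum stays small, suggesting the feature isn't present in that region.

*(Note that this layer has no parameters to learn: $f$ and $s$ are fixed hyperparameters, so there is nothing for backpropagation to update.)*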

![](media/mp2.png)


We could also use `average pooling` as an alternative to max pooling.
___

**Example:**

![](media/mp3.png)

![](media/mp4.png)
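A minimal sketch of both pooling variants on a single 2D channel. The helper `pool2d` and the 4x4 input values are assumptions for illustration:

```python
import numpy as np

def pool2d(x, f=2, s=2, mode="max"):
    """Pool a 2D array with an f x f window and stride s."""
    n_h, n_w = x.shape
    out_h = (n_h - f) // s + 1
    out_w = (n_w - f) // s + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = reduce(x[i * s:i * s + f, j * s:j * s + f])
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])

max_pooled = pool2d(x, mode="max")      # [[9, 2], [6, 3]]
avg_pooled = pool2d(x, mode="average")  # [[3.75, 1.25], [3.75, 2.0]]
print(max_pooled)
print(avg_pooled)
```

There are no weights anywhere in `pool2d`; only the hyperparameters `f` and `s` control its behaviour.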

___

A common architecture that we may find is:

`conv -> pool -> conv -> pool -> fc -> fc -> softmax`
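To see how the dimensions evolve through such a stack, here is a small shape trace. All sizes (a 32x32x3 input, 5x5 filters, 2x2 pooling, 6 and 16 filters) are assumptions loosely modeled on LeNet-5-style networks, not values taken from these notes:

```python
def out_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1"""
    return (n + 2 * p - f) // s + 1

# (layer type, filter size f, stride s, number of filters); all values are assumed
layers = [
    ("conv", 5, 1, 6),
    ("pool", 2, 2, None),
    ("conv", 5, 1, 16),
    ("pool", 2, 2, None),
]

n, c = 32, 3   # assumed 32x32 RGB input
trace = []
for kind, f, s, n_filters in layers:
    n = out_size(n, f, p=0, s=s)
    if kind == "conv":
        c = n_filters   # a conv layer outputs one channel per filter
    trace.append((kind, n, n, c))
    print(f"{kind}: {n} x {n} x {c}")

flattened = n * n * c   # input size of the first fully connected layer
print("flattened:", flattened)  # 400
```

Height and width shrink as we go deeper, while the number of channels grows; pooling changes the spatial size but leaves the channel count untouched.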

## Why convolutions?

[...] "I think there are two main advantages of convolutional layers over just using fully connected layers. And the advantages are parameter sharing and sparsity of connections"

- For starters, we save on the size of the $W^{[l]}$ matrix, i.e. on the number of parameters:

![](media/conv.png)

- Parameter sharing:

![](media/conv2.png)

[...] "And maybe you do have a dataset where you have the upper left-hand corner and lower right-hand corner have different distributions, so, they maybe look a little bit different but they might be similar enough, they're sharing feature detectors all across the image, works just fine."

- Sparsity of connections:

![](media/conv3.png)

![](media/conv4.png)
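The savings can be made concrete with a back-of-the-envelope count. The sizes below (a 32x32x3 input and six 5x5 filters) are assumptions for illustration, and the convolution count includes all three input channels:

```python
# assumed sizes: 32x32x3 input, six 5x5 filters, no padding, stride 1
n, n_c_prev = 32, 3
f, n_filters = 5, 6

n_out = n - f + 1                          # 28
inputs = n * n * n_c_prev                  # 3072 input values
outputs = n_out * n_out * n_filters        # 4704 output values

# fully connected: every output is connected to every input
fc_weights = inputs * outputs              # 14,450,688 weights

# convolution: each filter's weights are shared across all positions
conv_params = (f * f * n_c_prev + 1) * n_filters   # 456 (weights plus one bias per filter)

# sparsity: each output value only depends on one f x f x n_c_prev patch
inputs_per_output = f * f * n_c_prev       # 75 instead of 3072

print(fc_weights, conv_params, inputs_per_output)
```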