# Convolutional Networks

## A basic problem

Suppose you have a 1,000x1,000 RGB image: that is 3 million input values. With, say, 1,000 hidden units, the first layer of a dense neural network would need a weight matrix with 3 billion elements. This is too big.
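The 3 billion figure can be reproduced with a little arithmetic. The sketch below assumes an RGB image (3 channels) and a hypothetical first layer of 1,000 hidden units; neither number is stated in these notes, so treat both as illustrative.

```python
# Assumptions (not stated in the notes): 3 color channels, 1,000 hidden units.
height, width, channels = 1_000, 1_000, 3
hidden_units = 1_000

n_inputs = height * width * channels    # one input per pixel per channel
n_weights = n_inputs * hidden_units     # one weight per (input, hidden unit) pair
bytes_float32 = n_weights * 4           # 4 bytes per 32-bit float

print(f"inputs:  {n_inputs:,}")
print(f"weights: {n_weights:,}")
print(f"memory at float32: {bytes_float32 / 1e9:.0f} GB")
```

Under these assumptions the weight matrix alone would take roughly 12 GB before any other layer is added.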

## Let's detect edges instead

Take a 6x6 image and perform a convolution on it.
```
[[8, 4, 9, 3, 3, 1],
[6, 8, 0, 0, 2, 4],
[1, 3, 3, 2, 4, 4],
[2, 1, 2, 8, 5, 8],
[9, 5, 5, 8, 1, 8],
[4, 7, 3, 8, 4, 2]]
```
Use an edge detection filter of the form
```
[[ 1.,  0., -1.],
[ 1.,  0., -1.],
[ 1.,  0., -1.]]
```
Overlay this filter on each 3x3 section of the grid and take the sum of the elementwise products. There are 4x4 such sections, so the convolution output is 4x4, with one sum per section. Hence, the first cell of our convolution output will be 1*(8+6+1) + 0*(4+8+3) + -1*(9+0+3) = 3.
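The procedure can be sketched in a few lines of NumPy. The helper name `conv2d_valid` is made up for this example, and it assumes a square filter, no padding and a stride of 1. As is the convention in deep learning, the filter is not flipped before it is overlaid.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    f = kernel.shape[0]                      # assumes a square f x f kernel
    out_h = image.shape[0] - f + 1
    out_w = image.shape[1] - f + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The f x f section whose top-left corner is at (i, j).
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.array([[8, 4, 9, 3, 3, 1],
                  [6, 8, 0, 0, 2, 4],
                  [1, 3, 3, 2, 4, 4],
                  [2, 1, 2, 8, 5, 8],
                  [9, 5, 5, 8, 1, 8],
                  [4, 7, 3, 8, 4, 2]], dtype=float)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)

out = conv2d_valid(image, kernel)
print(out.shape)   # (4, 4)
print(out[0, 0])   # 3.0, matching the hand calculation
```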


## How does this detect edges?

If you look at an image with pixels like this:
```
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
```
The middle 2 columns of the convolution output will be 30 and the outer 2 columns will be 0. (You can verify this yourself.) The filter essentially says "there's an edge where we have light pixels on one side and dark ones on the other." Note that the numbers would be negative if the right side were 10s and the left side were 0s.
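One way to verify this is with a short NumPy check. This sketch uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) to collect every 3x3 section at once; the variable names are just for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

kernel = np.array([[1.0, 0.0, -1.0]] * 3)

# Left half light (10), right half dark (0).
light_dark = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])
# The mirrored case: dark on the left, light on the right.
dark_light = np.hstack([np.zeros((6, 3)), np.full((6, 3), 10.0)])

def convolve(image):
    # windows[i, j] is the 3x3 section whose top-left corner is at (i, j).
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

out = convolve(light_dark)
flipped = convolve(dark_light)
print(out)       # every row is [0, 30, 30, 0]
print(flipped)   # every row is [0, -30, -30, 0]
```

The sign of the response tells you the direction of the transition; taking the absolute value would detect the edge regardless of direction.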

### 1,0,-1 is arbitrary

We could use a different matrix for our edge detector; it will have slightly different properties, which might be better or worse for our application. For example, a Sobel filter would be:
```
[[ 1  0 -1]
 [ 2  0 -2]
 [ 1  0 -1]]
```
which puts more weight on the middle row of the filter. Another possibility is Scharr:
```
[[  3   0  -3]
 [ 10   0 -10]
 [  3   0  -3]]
```
which places even more weight on the middle row.
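To get a feel for how these filters differ, the sketch below applies all three to the same 3x3 patch straddling a light-to-dark edge. Each filter is built as an outer product of a vertical weighting and the horizontal difference `[1, 0, -1]`; the dictionary keys are labels chosen for this example.

```python
import numpy as np

difference = np.array([1.0, 0.0, -1.0])   # left column minus right column
filters = {
    "1,0,-1": np.outer([1, 1, 1], difference),
    "Sobel":  np.outer([1, 2, 1], difference),
    "Scharr": np.outer([3, 10, 3], difference),
}

# A 3x3 patch straddling a vertical edge: light (10) on the left, dark (0) on the right.
patch = np.array([[10.0, 10.0, 0.0]] * 3)

responses = {name: float(np.sum(patch * k)) for name, k in filters.items()}
print(responses)   # {'1,0,-1': 30.0, 'Sobel': 40.0, 'Scharr': 160.0}
```

The raw responses are on different scales because the weights are not normalized. Dividing each by the sum of its vertical weights (3, 4 and 16) gives 10 in all three cases, so on a perfectly straight edge the filters agree and only differ on noisier or slanted edges.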

## The **best** part

The values of the filter/kernel don't have to be hand-picked: they are learnable parameters that can be trained by gradient descent.
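As a toy illustration of what "learnable" means here, the sketch below recovers the 1,0,-1 filter from data using plain gradient descent on a mean squared error. Everything about the setup is an assumption made for the example: a single random 32x32 training image, a target produced by the known filter, a learning rate of 0.1 and 200 steps. A real network would learn its filters from a task loss, usually through an autodiff framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))            # synthetic training image
true_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # the filter we hope to recover

windows = sliding_window_view(image, (3, 3))     # shape (30, 30, 3, 3)
target = np.einsum("ijkl,kl->ij", windows, true_kernel)

kernel = np.zeros((3, 3))                        # start from an uninformative filter
learning_rate = 0.1
for step in range(200):
    prediction = np.einsum("ijkl,kl->ij", windows, kernel)
    residual = prediction - target
    # Gradient of the mean squared error with respect to each kernel entry.
    grad = 2.0 * np.einsum("ij,ijkl->kl", residual, windows) / residual.size
    kernel -= learning_rate * grad

print(np.round(kernel, 3))
```

After training, `kernel` should be very close to `true_kernel`, even though the loop was never told what an edge is.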

## Padding

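We need to pad images because border pixels are under-used: with a 3x3 filter and a stride of 1, a corner pixel contributes to just one output value and an edge pixel to at most 3, while an interior pixel contributes to up to 9. Also, a convolution shrinks the image. In a very deep network the image shrinks at every layer, so we keep losing pixels.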

*Valid*: no padding.

*Same*: the output has the same dimensions as the input; with a stride of 1 this implies that padding = (f-1)/2, where f is the dimension of the filter (which is why f is usually odd).
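The two modes can be compared with a small sketch. It assumes zero padding via `np.pad`, a stride of 1 and an odd filter size; the helper `conv2d` is a name made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel, padding=0):
    padded = np.pad(image, padding)                     # zeros added on every side
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
f = kernel.shape[0]

valid = conv2d(image, kernel, padding=0)
same = conv2d(image, kernel, padding=(f - 1) // 2)

print(valid.shape)   # (4, 4): the image shrinks
print(same.shape)    # (6, 6): same size as the input
```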

## Striding

Stride is the number of pixels the filter moves between successive positions where it is overlaid on the image.

Once we add padding (p) and stride (s) to a filter of size f, the dimension of the output is
$$dim = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$$
in each dimension, where n is the input dimension.
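The formula is easy to check against an actual sliding window. In this sketch `conv_output_size` implements the formula with integer (floor) division, and `strided_positions` counts filter positions along one axis by brute force; both names are hypothetical helpers for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_output_size(n, f, p=0, s=1):
    """Output size along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def strided_positions(n, f, p=0, s=1):
    """Count filter positions along one axis by actually sliding a window."""
    padded = np.zeros(n + 2 * p)
    return sliding_window_view(padded, f)[::s].shape[0]

cases = [(6, 3, 0, 1), (6, 3, 1, 1), (7, 3, 0, 2), (7, 3, 1, 2), (6, 3, 0, 2)]
for n, f, p, s in cases:
    assert conv_output_size(n, f, p, s) == strided_positions(n, f, p, s)
    print(f"n={n} f={f} p={p} s={s} -> {conv_output_size(n, f, p, s)}")
```

The last case (n=6, f=3, s=2) is where the floor matters: (6 - 3) / 2 is 1.5, and the filter simply does not fit a third time, so the output size is 2.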

### 3 dimensions

Images have dimensions HxWxC, where C is the number of color channels (3 for RGB, usually with values from 0 to 255).

For such an image the convolution filter is also a 3D volume, with one 2D slice per channel. To detect a vertical edge (light to dark) in every channel:
```
[[[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]]
```
To detect edges in the red channel only (the first 3x3 block is the red slice):
```
[[[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 0.  0. -0.]
  [ 0.  0. -0.]
  [ 0.  0. -0.]]

 [[ 0.  0. -0.]
  [ 0.  0. -0.]
  [ 0.  0. -0.]]]
```
The convolution operation now overlays a _cubic array_ on the image, taking elementwise products and summing them to get the output. N.B. _The output is 2D._
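A minimal sketch of the volume convolution follows. It assumes the channel-last HxWxC layout described above, so the filters are built with shape (f, f, C); the arrays printed earlier put the channel on the first axis instead. The test image is invented for the example: red has a light-to-dark edge, green has the opposite edge and blue is flat.

```python
import numpy as np

def conv_volume(image, kernel):
    """Convolve an H x W x C image with an f x f x C kernel (no padding, stride 1)."""
    f = kernel.shape[0]
    out = np.zeros((image.shape[0] - f + 1, image.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product over the whole f x f x C block, summed to one number.
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

step = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])
image = np.stack([step, step[:, ::-1], np.full((6, 6), 5.0)], axis=-1)   # shape (6, 6, 3)

edge = np.array([[1.0, 0.0, -1.0]] * 3)
zeros = np.zeros((3, 3))
all_channels = np.stack([edge, edge, edge], axis=-1)    # shape (3, 3, 3)
red_only = np.stack([edge, zeros, zeros], axis=-1)      # shape (3, 3, 3)

red_out = conv_volume(image, red_only)
all_out = conv_volume(image, all_channels)
print(red_out)   # every row is [0, 30, 30, 0]
print(all_out)   # all zeros: the red and green edges cancel
```

Both outputs are 2D (4x4) even though the image and the filters are 3D, because the sum runs over the channel axis as well.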

### Multiple Filters

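When we apply multiple filters, we stack the 2D outputs of the filters along a new last axis. The shape will be

$$dim = \left(\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1\right) \times n_c$$

where $n_c$ is the number of filters, which becomes the number of channels of the output.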
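The stacking can be sketched as follows, assuming a channel-last 6x6x3 image with random values, no padding, a stride of 1, and two illustrative filters (a vertical and a horizontal edge detector applied to every channel). `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 6, 3)).astype(float)   # H x W x C

vertical = np.array([[1.0, 0.0, -1.0]] * 3)
horizontal = vertical.T
# Two filters, each f x f x C, stacked into shape (n_filters, f, f, C).
filters = np.stack([
    np.repeat(vertical[:, :, None], 3, axis=2),
    np.repeat(horizontal[:, :, None], 3, axis=2),
])

# The raw view has shape (4, 4, 1, 3, 3, 3); index away the singleton axis.
windows = sliding_window_view(image, (3, 3, 3))[:, :, 0]
# One 2D output per filter, stacked along the last axis.
out = np.einsum("ijklc,nklc->ijn", windows, filters)
print(out.shape)   # (4, 4, 2)
```

Here n = 6, f = 3, p = 0 and s = 1, so each spatial dimension is 4, and the last dimension is 2 because two filters were applied.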

 

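Building the red-only edge filter with NumPy:
```
import numpy as np

# Start from a 3x3 block of ones.
x = np.ones((3, 3))
print(x)
# Broadcast against [1, 0, -1] to get three stacked vertical-edge filters, shape (3, 3, 3).
x = x[:, :, None] * [1, 0, -1]
# Zero out every block after the first, leaving an edge detector for the first (red) block only.
x[1:] = x[1:] * 0
print(x)
```
Output:
```
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 0.  0. -0.]
  [ 0.  0. -0.]
  [ 0.  0. -0.]]

 [[ 0.  0. -0.]
  [ 0.  0. -0.]
  [ 0.  0. -0.]]]
```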


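Building the Scharr filter as an outer product:
```
# Horizontal difference as a row vector, shape (1, 3).
x = np.array([[1, 0, -1]])
print(x.shape)
# The weights 3, 10, 3 are the Scharr weights (Sobel would use 1, 2, 1).
scharr = np.array([3, 10, 3])
# np.outer flattens its inputs, so the result is the 3x3 filter.
scharr = np.outer(scharr, x)
print(scharr.shape)
# Broadcasting gives the same (3, 3) shape: np.ones(3) * np.ones([3, 1])
```
Output:
```
(1, 3)
(3, 3)
```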


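Stacking 2D arrays along a new last axis:
```
x = np.ones((3, 3))
y = np.zeros((3, 3))
# z has shape (3, 3, 2): one slice per stacked array.
z = np.stack((x, y), axis=-1)

# The first slice along the last axis is x again.
x == z[:, :, 0]
```
Output:
```
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
```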