# Convolutional Networks

## A basic problem

Suppose you have a 1,000x1,000 RGB image. Flattened, that is 3 million input values, so a dense first layer with just 1,000 hidden units would need a weight matrix of 3 billion elements. This is too big.
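
To see where a number of this size comes from, the arithmetic can be sketched as follows. The 3 color channels and the 1,000-unit hidden layer are assumptions made for illustration; they are one plausible way to arrive at 3 billion.

```python
# Assumed sizes: an RGB image and a hypothetical 1,000-unit dense layer.
height, width, channels = 1_000, 1_000, 3
hidden_units = 1_000

n_inputs = height * width * channels   # values in the flattened image
n_weights = n_inputs * hidden_units    # entries in the dense weight matrix

print(f"{n_inputs:,} inputs, {n_weights:,} weights")
```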

## Let's detect edges instead

Take a 6x6 image and perform a convolution on it.
```
[[8, 4, 9, 3, 3, 1],
[6, 8, 0, 0, 2, 4],
[1, 3, 3, 2, 4, 4],
[2, 1, 2, 8, 5, 8],
[9, 5, 5, 8, 1, 8],
[4, 7, 3, 8, 4, 2]]
```
Use an edge detection filter of the form
```
[[ 1.,  0., -1.],
[ 1.,  0., -1.],
[ 1.,  0., -1.]]
```
Overlay this filter on each 3x3 section of the grid and take the sum of the elementwise products. Each sum becomes one cell of the output, so the convolution output is 4x4. Hence, the first cell of our convolution output will be 1*(8+6+1) + 0*(4+8+3) + (-1)*(9+0+3) = 15 - 12 = 3.
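
The sliding-window sum described here can be sketched in plain Python. This is a minimal illustration with no padding and a stride of 1; the function name `conv2d_valid` is made up for this example, and a real project would use a library routine instead.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), no padding, stride 1."""
    f = len(kernel)
    rows = len(image) - f + 1
    cols = len(image[0]) - f + 1
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Sum of the elementwise product of the kernel and one f x f patch.
            total = sum(kernel[a][b] * image[i + a][j + b]
                        for a in range(f) for b in range(f))
            row.append(total)
        out.append(row)
    return out

image = [[8, 4, 9, 3, 3, 1],
         [6, 8, 0, 0, 2, 4],
         [1, 3, 3, 2, 4, 4],
         [2, 1, 2, 8, 5, 8],
         [9, 5, 5, 8, 1, 8],
         [4, 7, 3, 8, 4, 2]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

output = conv2d_valid(image, kernel)
print(len(output), "x", len(output[0]))  # output height and width
print(output[0][0])                      # first cell, worked by hand above
```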


## How does this detect edges?

If you look at an image with pixels like this:
```
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
[10., 10., 10.,  0.,  0.,  0.],
```
The middle two columns of the convolution output will be 30, and the two outer columns will be 0. (You can verify this yourself.) The filter essentially says "there's an edge where we have light pixels on one side and dark ones on the other." Note that the numbers would be negative if the right side were 10s and the left side were 0s.
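
One way to do that verification is the short check below. It is a sketch using a hypothetical helper, `conv2d_valid`, written with plain lists; the mirrored image (0s on the left, 10s on the right) is added to show the sign flip.

```python
def conv2d_valid(image, kernel):
    """Valid convolution (no padding, stride 1) on lists of lists."""
    f = len(kernel)
    rows = len(image) - f + 1
    cols = len(image[0]) - f + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(f) for b in range(f))
             for j in range(cols)]
            for i in range(rows)]

step = [[10, 10, 10, 0, 0, 0] for _ in range(6)]   # light left, dark right
mirrored = [row[::-1] for row in step]             # dark left, light right
kernel = [[1, 0, -1] for _ in range(3)]

light_to_dark = conv2d_valid(step, kernel)
dark_to_light = conv2d_valid(mirrored, kernel)

print(light_to_dark[0])
print(dark_to_light[0])
```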

### 1,0,-1 is arbitrary

We could use a different matrix for our edge detector; it will have slightly different properties, which might be better or worse for our application. For example, a Sobel filter would be:
```
[[ 1  0 -1]
 [ 2  0 -2]
 [ 1  0 -1]]
```
This adds emphasis to the middle row of the filter. Another possibility is the Scharr filter:
```
[[  3   0  -3]
 [ 10   0 -10]
 [  3   0  -3]]
```
This places even more emphasis on the middle row.
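
To compare the three filters, the sketch below applies each of them to the light-to-dark step image from the previous section and records the strongest response. The helper `conv2d_valid` and the dictionary keys are names invented for this example.

```python
def conv2d_valid(image, kernel):
    """Valid convolution (no padding, stride 1) on lists of lists."""
    f = len(kernel)
    rows = len(image) - f + 1
    cols = len(image[0]) - f + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(f) for b in range(f))
             for j in range(cols)]
            for i in range(rows)]

filters = {
    "basic":  [[1, 0, -1], [1, 0, -1], [1, 0, -1]],
    "sobel":  [[1, 0, -1], [2, 0, -2], [1, 0, -1]],
    "scharr": [[3, 0, -3], [10, 0, -10], [3, 0, -3]],
}
step = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Largest value in each filter's output on the same image.
peak = {name: max(max(row) for row in conv2d_valid(step, kernel))
        for name, kernel in filters.items()}
print(peak)
```

On a perfectly vertical edge like this one, every row of the image is identical, so the filters differ only in the scale of their response. The row weighting should matter more when the edge is noisy or slanted.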

## The **best** part

The values in the filter (kernel) do not have to be chosen by hand: they are learnable parameters that can be trained by gradient descent.
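
As a toy illustration of that idea, the sketch below starts from an all-zero 3x3 filter and uses plain gradient descent on a mean squared error to make its outputs match those of the hand-made 1, 0, -1 filter. The scaling of pixels to [0, 1], the learning rate, and the number of steps are all assumptions chosen for this example, and this is not how a framework such as Keras implements training.

```python
def patches(image, f=3):
    """Return every f x f patch of a square image, each flattened to a list."""
    n = len(image)
    return [[image[i + a][j + b] for a in range(f) for b in range(f)]
            for i in range(n - f + 1) for j in range(n - f + 1)]

def predict(weights, patch):
    """Convolution output for one patch: sum of elementwise products."""
    return sum(w * x for w, x in zip(weights, patch))

def mse(weights, xs, ys):
    return sum((predict(weights, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

raw = [[8, 4, 9, 3, 3, 1],
       [6, 8, 0, 0, 2, 4],
       [1, 3, 3, 2, 4, 4],
       [2, 1, 2, 8, 5, 8],
       [9, 5, 5, 8, 1, 8],
       [4, 7, 3, 8, 4, 2]]
image = [[v / 9 for v in row] for row in raw]   # scale pixels to [0, 1]

target = [1, 0, -1, 1, 0, -1, 1, 0, -1]         # hand-made filter, flattened
xs = patches(image)
ys = [predict(target, x) for x in xs]           # outputs the learned filter should match

weights = [0.0] * 9                             # start from an all-zero filter
learning_rate = 0.05                            # assumed step size
initial_loss = mse(weights, xs, ys)
for _ in range(2000):
    errors = [predict(weights, x) - y for x, y in zip(xs, ys)]
    # Gradient of the mean squared error with respect to each filter weight.
    grad = [2 * sum(e * x[k] for e, x in zip(errors, xs)) / len(xs)
            for k in range(9)]
    weights = [w - learning_rate * g for w, g in zip(weights, grad)]
final_loss = mse(weights, xs, ys)

print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```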

## Padding

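We pad images for two reasons. First, pixels near the border contribute to fewer outputs than interior pixels: with a 3x3 filter and a stride of 1, a corner pixel is covered by just one window and an edge pixel by at most three, while an interior pixel is covered by up to nine. Second, a convolution reduces the height and width of our image, so in a very deep network we lose pixels at every layer.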

*Valid*: no padding.

*Same*: the output has the same height and width as the input; with a stride of 1 this implies that padding = (f-1)/2, where f is the dimension of the filter (usually odd).
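
A minimal sketch of zero padding, assuming a stride of 1 and an odd filter size. The function name `zero_pad` and the all-ones placeholder image are made up for this example.

```python
def zero_pad(image, p):
    """Return a copy of a 2D list surrounded by p rows and columns of zeros."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom

f = 3
p = (f - 1) // 2                      # "same" padding for an odd f, stride 1
image = [[1] * 6 for _ in range(6)]   # placeholder 6x6 image
padded = zero_pad(image, p)

n_padded = len(padded)                # 6 + 2p
n_out = n_padded - f + 1              # size after a valid 3x3 convolution
print(p, n_padded, n_out)
```

With p = 1 the padded image is 8x8, and a 3x3 filter brings it back to 6x6, the size of the original input.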

## Striding

Stride is the number of pixels the filter moves between successive positions at which it is overlaid on the image.

Once we add padding (p) and stride (s) to a filter of size f on an input of size n, the dimension of the output is
$$dim = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$$
in each dimension.
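
The formula translates directly into a small helper. The name `output_dim` and its argument order are choices made for this sketch.

```python
def output_dim(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for one spatial dimension."""
    return (n + 2 * p - f) // s + 1

print(output_dim(6, 3))          # valid convolution on the 6x6 example
print(output_dim(6, 3, p=1))     # "same" padding keeps the size
print(output_dim(7, 3, s=2))     # stride 2
print(output_dim(6, 3, s=2))     # (6 - 3) / 2 is not an integer: the floor applies
```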

### 3 dimensions

Images have dimensions HxWxC, where C is the number of channels: 3 for RGB (usually with values from 0 to 255).

The convolution filter for a multi-channel image is itself a 3D volume, with one 3x3 slice per channel. To detect a vertical edge (light to dark) in every channel:
```
[[[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]]
```
To detect edges in the red channel only:
```
[[[ 1.  0. -1.]
  [ 1.  0. -1.]
  [ 1.  0. -1.]]

 [[ 0.  0. -0.]
  [ 0.  0. -0.]
  [ 0.  0. -0.]]

 [[ 0.  0. -0.]
  [ 0.  0. -0.]
  [ 0.  0. -0.]]]
```
The convolution operation now overlays a _3D array_ (3x3x3 here) on the image, taking elementwise products and summing over all three dimensions to get each output value. N.B. _The output is 2D._
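
The sketch below shows the same operation on a small RGB image. It assumes a channels-last layout, `image[row][col][channel]` and `kernel[row][col][channel]`, to match the HxWxC convention, whereas the arrays printed above list one 3x3 slice per channel. The name `conv3d_valid` is made up for this example.

```python
def conv3d_valid(image, kernel):
    """image[h][w][c], kernel[a][b][c]; returns a 2D list (stride 1, no padding)."""
    f = len(kernel)
    channels = len(kernel[0][0])
    rows = len(image) - f + 1
    cols = len(image[0]) - f + 1
    return [[sum(kernel[a][b][c] * image[i + a][j + b][c]
                 for a in range(f) for b in range(f) for c in range(channels))
             for j in range(cols)]
            for i in range(rows)]

# 6x6 RGB image: left half (10, 10, 10) in every pixel, right half black.
image = [[[10, 10, 10] if col < 3 else [0, 0, 0] for col in range(6)]
         for _ in range(6)]

# The 1, 0, -1 column pattern in every channel, or in the red channel only.
all_channels = [[[w, w, w] for w in (1, 0, -1)] for _ in range(3)]
red_only = [[[w, 0, 0] for w in (1, 0, -1)] for _ in range(3)]

out_all = conv3d_valid(image, all_channels)
out_red = conv3d_valid(image, red_only)
print(out_all[0])
print(out_red[0])
```

Both outputs are 4x4 with no channel dimension; the all-channel filter simply sums the response of the three slices.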

### Multiple Filters

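When we apply multiple filters, we stack the 2D outputs of the filters along a new last dimension. The shape will be

$$\left(\left\lfloor\frac{n + 2p - f}{s}\right\rfloor+1\right) \times \left(\left\lfloor\frac{n + 2p - f}{s}\right\rfloor+1\right) \times n_c$$

where $n_c$ is the number of filters, which becomes the number of channels of the output.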

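When we train our model, each filter gets its own bias, and the activation function is applied to each filter's output separately.

If we have 10 filters of size 3x3 applied to a 3-channel input, then we have 28 x 10 = 280 parameters to learn. Each filter has 3x3x3 (= 27) weights plus a bias to be learned.

Notice that this is true regardless of the height and width of the input; the count depends only on the filter size, the number of input channels, and the number of filters.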
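
The count can be written as a one-line helper. The sketch assumes a 3-channel input for the 10-filter case, which is what 27 weights per filter implies; `conv_params` is a name invented here. The last two calls use the kernel sizes and filter counts of the CONV1 and CONV2 layers in the Keras model at the end of these notes.

```python
def conv_params(f, in_channels, n_filters):
    """Weights (f * f * in_channels) plus one bias, for each filter."""
    return (f * f * in_channels + 1) * n_filters

print(conv_params(3, 3, 10))   # ten 3x3 filters on a 3-channel input
print(conv_params(5, 1, 8))    # CONV1: eight 5x5 filters on a 1-channel input
print(conv_params(5, 8, 16))   # CONV2: sixteen 5x5 filters on 8 channels
```

The last two values agree with the `Param #` column that `model.summary()` reports for CONV1 and CONV2.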

### Notation Summary
For a convolution layer l:


$f^{[l]}$ = filter size

$p^{[l]}$ = padding

$s^{[l]}$ = stride

$n_c^{[l]}$ = # of filters

Input: $n_H^{[l-1]}, n_W^{[l-1]}, n_c^{[l-1]}$

Output: $n_H^{[l]}, n_W^{[l]}, n_c^{[l]}$

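$$n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1$$

and likewise for $n_W^{[l]}$.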

## An example

We can build a deep ConvNet in which there are several layers of convolution. Such a network might work like this:
- Input Layer: a 39x39x3 image.
- Convolution 1: 3x3, stride 1, padding 0, 10 filters
- Layer 1: 37x37x10 (the output of applying Convolution 1 to the input)
- Convolution 2: 5x5, stride 2, padding 0, 20 filters
- Layer 2: 17x17x20
- Convolution 3: 5x5, stride 2, padding 0, 40 filters
- Layer 3: 7x7x40
- Output: flatten the 7x7x40 = 1,960 values and feed them to a softmax/sigmoid layer.

The trends here are common in computer vision: the height and width tend to shrink from layer to layer while the number of filters, and hence the number of channels, grows.
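
The layer sizes listed above can be reproduced by pushing the input size through the output-dimension formula one convolution at a time. This is a sketch; `output_dim` and the tuple layout of `conv_layers` are choices made for the example.

```python
def output_dim(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for one spatial dimension."""
    return (n + 2 * p - f) // s + 1

# (filter size, stride, padding, number of filters) for each convolution.
conv_layers = [(3, 1, 0, 10), (5, 2, 0, 20), (5, 2, 0, 40)]

size, channels = 39, 3
shapes = [(size, size, channels)]
for f, s, p, n_filters in conv_layers:
    size = output_dim(size, f, p, s)
    channels = n_filters            # one output channel per filter
    shapes.append((size, size, channels))

flattened = size * size * channels  # values passed to the output layer
print(shapes)
print(flattened)
```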

### Layers in a ConvNet

Usually, we find three types of layers in a Convolutional Network:
- Convolution (CONV), which can be the only type
- Pooling (POOL)
- Fully Connected (FC)


## Pooling

A pooling layer reduces the height and width by sliding a window over the input, as with a convolution filter, and outputting either the maximum (max pooling) or the average (average pooling) of the values in each window. Each channel is pooled independently, so the number of channels does not change. So, for example, a 4x4x2 layer is reduced to a 2x2x2 layer when we have a filter size of 2 and a stride of 2; with max pooling, each entry of the output is the maximum of the 2x2 region it derives from.

There are no parameters to learn with pooling; the filter size and stride are fixed in advance.
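
A minimal sketch of max pooling on a single channel, using a filter size of 2 and a stride of 2. The function name `max_pool2d` and the 4x4 values are made up for this example; a multi-channel input would apply the same function to each channel in turn.

```python
def max_pool2d(channel, f=2, s=2):
    """Max pooling over one 2D channel with window size f and stride s."""
    rows = (len(channel) - f) // s + 1
    cols = (len(channel[0]) - f) // s + 1
    return [[max(channel[i * s + a][j * s + b]
                 for a in range(f) for b in range(f))
             for j in range(cols)]
            for i in range(rows)]

channel = [[1, 3, 2, 1],
           [2, 9, 1, 1],
           [1, 3, 2, 3],
           [5, 6, 1, 2]]

pooled = max_pool2d(channel)
print(pooled)
```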

## LeNet-5-ish

This is an example of a "typical" convolutional neural network:
- Inputs: 32x32x3
- Layer 1: CONV1 filters to 28x28x6, POOL1 pools to 14x14x6
- Layer 2: CONV2 filters to 10x10x16, POOL2 pools to 5x5x16
- Layer 3: Flatten to 400
- Layer 4: Dense, 120 units
- Layer 5: Dense, 84 units
- Output: Softmax over 10 classes

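The Keras model below follows the same pattern, but takes 28x28x1 inputs and uses 8 filters in CONV1.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D

# Similar to the LeNet-5 example from Coursera.
inputs = keras.Input(shape=(28, 28, 1))
# Light data augmentation: random rotation by up to 0.06 of a full turn.
# (Newer TensorFlow releases expose this layer as keras.layers.RandomRotation.)
x = keras.layers.experimental.preprocessing.RandomRotation(.06)(inputs)
# Two CONV + POOL stages; MaxPooling2D defaults to a 2x2 window with stride 2.
x = Conv2D(filters=8, kernel_size=5, activation='relu', name='CONV1')(x)
x = MaxPooling2D(name='POOL1')(x)
x = Conv2D(filters=16, kernel_size=5, activation='relu', name='CONV2')(x)
x = MaxPooling2D(name='POOL2')(x)
# Flatten, then two fully connected layers and a 10-way softmax.
x = keras.layers.Flatten()(x)
x = Dense(120, activation='relu', name='FC3')(x)
x = Dense(84, activation='relu', name='FC4')(x)
outputs = Dense(10, activation='softmax', name='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```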

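```python
# Load the training data: a `label` column plus one column per pixel.
df = pd.read_csv('../tf_keras/data/train.csv')

# Reshape the flat pixel columns into (samples, height, width, channels).
X = df.drop('label', axis=1).to_numpy().reshape((-1, 28, 28, 1))
y = df['label'].to_numpy()
```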

```python
# The labels are integers, so use the sparse variant of categorical cross-entropy.
model.compile(optimizer='Adam',
              loss='sparse_categorical_crossentropy')
```

```python
model.summary()
```

```
Model: "functional_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
input_2 (InputLayer)         [(None, 28, 28, 1)]       0
_________________________________________________________________
random_rotation_1 (RandomRot (None, 28, 28, 1)         0
_________________________________________________________________
CONV1 (Conv2D)               (None, 24, 24, 8)         208
_________________________________________________________________
POOL1 (MaxPooling2D)         (None, 12, 12, 8)         0
_________________________________________________________________
CONV2 (Conv2D)               (None, 8, 8, 16)          3216
_________________________________________________________________
POOL2 (MaxPooling2D)         (None, 4, 4, 16)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)
```

```python
# Train for one epoch, holding out 10% of the data for validation.
model.fit(X, y,
          validation_split=.1,
          batch_size=128,
          epochs=1)
```

```python
# The same data can also be fed through a tf.data pipeline, batched explicitly.
train_ds = tf.data.Dataset.from_tensor_slices((X, y))
model.fit(train_ds.batch(512))
```