# Convolutional Layer

A convolutional layer is a feedforward layer that satisfies the following additional properties:
* sparse connectivity - each neuron is connected to every channel of a small spatial region of the input
* shared parameters - all neurons within the same output channel (activation map) use the same parameters
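
To make the effect of these two properties concrete, the sketch below compares the number of weights in a fully connected layer with the number in a convolutional layer that produces an output of the same size. The function names and the example sizes (a 32×32×3 input, 3×3 filters, 16 output channels, output spatial size equal to the input) are illustrative assumptions, and biases are ignored.

```python
def dense_weight_count(in_h, in_w, in_c, out_h, out_w, out_c):
    # Fully connected: every output neuron is connected to every input value.
    return (in_h * in_w * in_c) * (out_h * out_w * out_c)


def conv_weight_count(filter_size, in_c, out_c):
    # Convolutional: each output channel has one filter of shape
    # filter_size x filter_size x in_c, reused at every spatial location.
    return filter_size * filter_size * in_c * out_c


dense = dense_weight_count(32, 32, 3, 32, 32, 16)
conv = conv_weight_count(3, 3, 16)
print(dense, conv)
```

The convolutional count does not depend on the spatial size of the input at all, which is a direct consequence of parameter sharing.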

This notebook will not cover the details of convolution arithmetic.
Instead, it focuses on TensorFlow's implementation and practical examples.
For the intuition behind the convolution operation, refer to the lecture slides.
For the mathematical details of the convolution operation, an excellent source is:
https://arxiv.org/abs/1603.07285

In convolutional layers, it's very important to get a good grasp on the dimensions of inputs, filters, and outputs. To discuss dimensionality, we will use the following conventions in this notebook:
* $ n^{[l]}_h $ - activation height of layer $l$, or of input image if $l = 0$
* $ n^{[l]}_w $ - activation width of layer $l$, or of input image if $l = 0$
* $ n^{[l]}_c $ - activation depth of layer $l$, or of input image if $l = 0$
* $ f^{[l]} $ - height and width of a filter in layer $l$ (typically, the same across both dimensions)

# Convolution (1 input channel, 1 output channel)

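The key to understanding the relationship between the dimensions of the input and of the output in a convolutional layer is to consider the basic case with 1 input channel and 1 output channel. Given an input of shape $ n^{[l-1]}_h \times n^{[l-1]}_w $ and a filter of shape $ f^{[l]} \times f^{[l]} $, the shape of the output of a convolution with padding $p$ and stride $s$ is given below. Here $p^{[l]}$ is read as the total padding added along a dimension (both sides combined); if $p$ instead counted the padding on each side, the numerator would contain $2p^{[l]}$.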

$ \text{Convolution}\big( n^{[l-1]}_h \times n^{[l-1]}_w, f^{[l]} \times f^{[l]} \big) \rightarrow n^{[l]}_h \times n^{[l]}_w $

$ n^{[l]}_h = \big\lfloor \frac{n^{[l-1]}_h + p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \big\rfloor $

$ n^{[l]}_w = \big\lfloor \frac{n^{[l-1]}_w + p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \big\rfloor $
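
The formula is easy to check numerically. The following is a minimal sketch, assuming that `p` counts the total padding along one dimension, as the formula above is written; `conv_output_size` is a hypothetical helper name, not a TensorFlow function.

```python
def conv_output_size(n_prev, f, p=0, s=1):
    """Output length along one spatial dimension.

    n_prev: input size, f: filter size, s: stride,
    p: total padding along the dimension (assumed: both sides combined).
    """
    # Integer floor division implements the floor in the formula.
    return (n_prev + p - f) // s + 1


print(conv_output_size(3, 2, p=0, s=1))  # 2
print(conv_output_size(3, 2, p=1, s=1))  # 3
print(conv_output_size(7, 3, p=0, s=2))  # 3
```

The same function applies to height and width independently, since the two formulas differ only in which input dimension they use.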

In practice, padding is often specified as either valid or same. Valid padding means no padding at all ($p = 0$), while same padding means as much padding as needed to keep the spatial dimensions of the output equal to those of the input when the stride is 1 (for larger strides, TensorFlow's same padding yields an output size of $\lceil n^{[l-1]} / s^{[l]} \rceil$). The exact amount of same padding depends on the stride and on the spatial dimensions of the input and of the filter. Typically, the larger the filter, the more padding is needed; with a stride of 1, the total padding along a dimension is $f^{[l]} - 1$. Let's see an example of this calculation with TensorFlow.

In [1]:
# Run this cell to make TensorFlow allocate GPU memory on demand
# instead of reserving all of it up front
import tensorflow as tf
for gpu in tf.config.list_physical_devices('GPU'):
    print(gpu)
    tf.config.experimental.set_memory_growth(gpu, True)

PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')


In [2]:
import tensorflow as tf

?tf.nn.conv2d

Signature:
tf.nn.conv2d(
    input,
    filters,
    strides,
    padding,
    data_format='NHWC',
    dilations=None,
    name=None,
)
Docstring:
Computes a 2-D convolution given 4-D `input` and `filters` tensors.

Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
and a filter / kernel tensor of shape
`[filter_height, filter_width, in_channels, out_channels]`, this op
performs the following:

1. Flattens the filter to a 2-D matrix with shape
   `[filter_height * filter_width * i

In [3]:
import tensorflow as tf
import numpy as np

# A simple wrapper around TensorFlow's convolution operation
# that takes a single input matrix and a single filter matrix,
# and uses same padding and a stride of 1.
def conv_simple(input_mat, filter_mat):
    # Desired input shape = batch, height, width, channel
    # so we need to add 0th, and 3rd axes
    input_mat = np.expand_dims(input_mat, 0)
    input_mat = np.expand_dims(input_mat, 3)
    input_tn = tf.constant(input_mat, dtype=tf.float32)
    
    # Desired filter shape = height, width, in, out
    # so we need to add 2nd, and 3rd axes
    filter_mat = np.expand_dims(filter_mat, 2)
    filter_mat = np.expand_dims(filter_mat, 3)
    filter_tn = tf.constant(filter_mat, dtype=tf.float32)

    output_tn = tf.nn.conv2d(
        input_tn, 
        strides=[1, 1, 1, 1], 
        padding='SAME', 
        filters=filter_tn
    )
    
    return output_tn.numpy()

In [4]:
import pandas as pd 
    
input_np = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

filter_np = [
    [10, 20],
    [30, 40]
]

output = conv_simple(input_np, filter_np)
pd.DataFrame(output[0, :, :, 0]).astype(int)

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 370 | 470 | 210 |
| 1 | 670 | 770 | 330 |
| 2 | 230 | 260 | 90 |
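
The result can be reproduced by hand, which also shows where the same padding goes. The sketch below is a plain NumPy version, assuming a stride of 1 and assuming that, when the total padding is odd, the extra row and column of zeros are placed at the bottom and right (which is consistent with the output above). `conv_same_numpy` is a hypothetical helper written for illustration, not part of TensorFlow. Note that the filter is not flipped: each output value is the sum of element-wise products of the filter and the input window.

```python
import numpy as np


def conv_same_numpy(input_mat, filter_mat):
    x = np.asarray(input_mat, dtype=np.float32)
    w = np.asarray(filter_mat, dtype=np.float32)
    f_h, f_w = w.shape

    # With stride 1, a total padding of f - 1 keeps the output size
    # equal to the input size.
    pad_h, pad_w = f_h - 1, f_w - 1
    top, left = pad_h // 2, pad_w // 2
    x_pad = np.pad(
        x,
        ((top, pad_h - top), (left, pad_w - left)),
        mode="constant",
    )

    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Element-wise product of the filter and the window, then sum.
            out[i, j] = np.sum(x_pad[i:i + f_h, j:j + f_w] * w)
    return out


result = conv_same_numpy(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 20], [30, 40]],
)
print(result.astype(int))
```

For example, the top-left value is 1·10 + 2·20 + 4·30 + 5·40 = 370, while the bottom-right value only sees the 9 and three padded zeros, giving 9·10 = 90.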


# Convolution (many input channels, 1 output channel)

With many input channels and a single filter, the spatial dimensions of the output are not affected and the output still has a single channel, because the number of output channels is determined by the number of filters, not by the number of input channels. Given an input of shape $n^{[l-1]}_h \times n^{[l-1]}_w \times n^{[l-1]}_c $ and a filter of shape $f^{[l]} \times f^{[l]} \times n^{[l-1]}_c$, the shape of the output of a convolution with padding $p$ and stride $s$ is:

$
\text{Convolution} \big( 
    n^{[l-1]}_h \times n^{[l-1]}_w \times n^{[l-1]}_c, 
    f^{[l]} \times f^{[l]} \times n^{[l-1]}_c
\big) 
\rightarrow n^{[l]}_h \times n^{[l]}_w 
$

$ n^{[l]}_h = \big\lfloor \frac{n^{[l-1]}_h + p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \big\rfloor $

$ n^{[l]}_w = \big\lfloor \frac{n^{[l-1]}_w + p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \big\rfloor $

Note that the third dimension of the filter has to match the number of channels in the input. When the filter is applied to the input, the input's channels are matched with the filter's channels. Products at corresponding locations are summed across both space and channels, which is why the output is still a two-dimensional map with a single channel.
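
The summation across channels can be sketched in NumPy as follows. This is an illustrative implementation that assumes valid padding and a stride of 1; `conv_valid_single_filter` and the example tensors are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np


def conv_valid_single_filter(x, w):
    # x: (height, width, channels), w: (f, f, channels)
    n_h, n_w, n_c = x.shape
    f_h, f_w, f_c = w.shape
    assert n_c == f_c, "filter depth must match the number of input channels"

    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over the spatial window AND over all channels at once.
            out[i, j] = np.sum(x[i:i + f_h, j:j + f_w, :] * w)
    return out


x = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
w = np.ones((2, 2, 3), dtype=np.float32)
out = conv_valid_single_filter(x, w)
print(out.shape)  # (3, 3)
```

The output has no channel axis: the three input channels are collapsed by the sum inside the loop.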

# Convolution (many input channels, many output channels)

When both the input and the output have many channels, the calculation is similar, but we compute multiple independent activation maps, which gives depth to the activation tensor. Given an input of shape $n^{[l-1]}_h \times n^{[l-1]}_w \times n^{[l-1]}_c $ and a filter of shape $f^{[l]} \times f^{[l]} \times n^{[l-1]}_c \times n^{[l]}_c$, the shape of the output of a convolution with padding $p$ and stride $s$ is:

$
\text{Convolution} \big( 
    n^{[l-1]}_h \times n^{[l-1]}_w \times n^{[l-1]}_c, 
    f^{[l]} \times f^{[l]} \times n^{[l-1]}_c \times n^{[l]}_c
\big) 
\rightarrow n^{[l]}_h \times n^{[l]}_w \times n^{[l]}_c
$

$ n^{[l]}_h = \big\lfloor \frac{n^{[l-1]}_h + p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \big\rfloor $

$ n^{[l]}_w = \big\lfloor \frac{n^{[l-1]}_w + p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \big\rfloor $

Note that the third dimension of the filter has to match the number of channels in the input,
while the fourth dimension of the filter matches the number of channels in the output. 
Spatial dimensions (i.e., the first and the second dimensions) of the output are not affected by the number of channels.
When applying the filter to the input, each activation map is calculated independently, and the maps are then stacked to produce the activation volume. Simply stated, the fourth dimension of the filter indexes a collection of independent filters.
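
The independence of the activation maps can be checked with a short NumPy sketch. It assumes valid padding and a stride of 1; `conv_valid` is a hypothetical helper, and the random input and filter tensors are illustrative only.

```python
import numpy as np


def conv_valid(x, w):
    # x: (height, width, in_channels)
    # w: (f, f, in_channels, out_channels)
    n_h, n_w, n_c = x.shape
    f_h, f_w, f_c, n_out = w.shape
    assert n_c == f_c, "filter depth must match the number of input channels"

    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1, n_out), dtype=x.dtype)
    for k in range(n_out):
        # Each slice w[:, :, :, k] is one independent filter.
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, k] = np.sum(x[i:i + f_h, j:j + f_w, :] * w[:, :, :, k])
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 3))
w = rng.standard_normal((3, 3, 3, 4))

out = conv_valid(x, w)
print(out.shape)  # (3, 3, 4)

# Applying only filter 2 reproduces channel 2 of the full output.
single = conv_valid(x, w[:, :, :, 2:3])
print(np.allclose(single[:, :, 0], out[:, :, 2]))  # True
```

Because no output channel depends on any filter other than its own, removing or adding filters changes only the depth of the output, never its spatial dimensions.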