In [1]:
%run Latex_macros.ipynb

<IPython.core.display.Latex object>

In [2]:
%run beautify_plots.py

In [3]:
# My standard magic !  You will see this in almost all my notebooks.

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Reload all modules imported with %aimport
%load_ext autoreload
%autoreload 1

%matplotlib inline

In [4]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import os 


# Notation for the CNN layer



Layer $\ll$ in a Sequential NN transforms input $\y_{(\ll-1)}$ to output $\y_\llp$
- $\y_\llp$ is called a *feature map*, for all layers $\ll$
    - for each location in $\y_{(\ll-1)}$
    - it measures the intensity of the pattern match when the pattern is centered at that location 
    
So we write the input as $\y_{(\ll-1)}$ rather than the $\x$ we had used previously.

The size of all quantities in the convolution can vary by layer
- so we add a parenthesized subscript to indicate the layer

We write
- the kernel size as $f_\llp$ (can vary by layer) rather than the $f$ used previously
- the collection of kernels for layer $\ll$ as $\W_\llp$

In general a layer $\ll$ output $\y_\llp$ will have
- $N_\llp \gt 0$ non-feature dimensions
    - non-feature dimension $i$ has length (number of indices) $d_{\llp,i}$
        - for dimensions $0 \le i \lt N_\llp$
    - the set of indices in dimension $i$ is written as $D_i$
        - usually equal to $0, \ldots, d_{\llp,i} - 1$
- one feature dimension

A CNN Layer $\ll$
- preserves the non-feature dimensions (when padding is used)
$$
\begin{array}{lcll}
N_{(\ll-1)} & = & N_\llp \\
d_{(\ll-1),i} & = & d_{\llp,i} & 0 \le i \lt N_{(\ll-1)} \\
\end{array}
$$
- changes the length of the feature dimension
    - from $n_{(\ll-1)}$ to $n_\llp$

Thus the shapes of the input $\y_{(\ll-1)}$ and the output $\y_\llp$ may differ only in the length of the feature dimension
- provided padding is used
    - in the absence of padding: $\lfloor \frac{f_\llp}{2} \rfloor$ locations are lost at each boundary
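
The effect of padding on the length of a non-feature dimension can be checked with a small helper. This is only a sketch: `conv_output_length` is a hypothetical name, and it assumes a stride of 1 and an odd kernel size $f_\llp$.

```python
def conv_output_length(d, f, padded):
    """Length of one non-feature dimension after a stride-1 convolution.

    d: input length, f: (odd) kernel size, padded: whether the input is padded.
    """
    if padded:
        return d           # padding preserves the dimension
    lost = f // 2          # locations lost at each boundary
    return d - 2 * lost

print(conv_output_length(28, 3, padded=True))   # 28
print(conv_output_length(28, 3, padded=False))  # 26
print(conv_output_length(28, 5, padded=False))  # 24
```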

Thus the input and output of CNN layer $\ll$ have shapes

$$
\begin{array}{lcll}
|| \y_{(\ll-1)} || & = & (d_{(\ll-1),0} \times d_{(\ll-1),1} \times \ldots \times d_{(\ll-1), N_{(\ll-1)}-1}, & \mathbf{n_{(\ll-1)}} ) \\
|| \y_\llp || &  = & (d_{(\ll-1),0} \times d_{(\ll-1),1} \times \ldots \times d_{(\ll-1),N_{(\ll-1)}-1},  &\mathbf{n_\llp} )
\end{array}
$$


We write 
$$\y_{\llp, \idxb, j}$$ 
to denote feature $j$ of layer $\ll$ at non-feature dimension location $\idxb$
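
As an illustration of this notation, the NumPy sketch below builds an input and an output with $N = 2$ non-feature dimensions. All sizes are hypothetical values chosen for the example, and padding is assumed so that the non-feature dimensions are preserved.

```python
import numpy as np

# Hypothetical sizes: two non-feature dimensions (e.g., height and width)
d0, d1 = 4, 5
n_prev, n_cur = 3, 8   # feature-dimension lengths of layers (l-1) and l

y_prev = np.zeros((d0, d1, n_prev))  # input of layer l
y_cur = np.zeros((d0, d1, n_cur))    # output of layer l (padding assumed)

# Only the final (feature) dimension differs
print(y_prev.shape[:-1] == y_cur.shape[:-1])  # True
print(y_prev.shape[-1], y_cur.shape[-1])      # 3 8

# Feature j of layer l at non-feature location idx
idx, j = (2, 3), 5
y_cur[idx + (j,)] = 1.0
print(y_cur[2, 3, 5])  # 1.0
```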

## Channel Last/First

We have adopted the convention of using the final dimension as the feature dimension.
- This is called *channel last* notation.

Alternatively: one could adopt a convention of the first dimension being the feature dimension.
- This is called *channel first* notation.

When using a programming API: make sure you know which notation is the default
- Channel last is the default for TensorFlow, but other toolkits may use channel first.
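
Converting between the two conventions is just a reordering of axes. The sketch below uses NumPy's `np.moveaxis` on a hypothetical array with two non-feature dimensions; the sizes are made up for the example.

```python
import numpy as np

# Hypothetical array in channel-last layout: (height, width, features)
y_last = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Move the feature dimension to the front to get the channel-first layout
y_first = np.moveaxis(y_last, -1, 0)

print(y_last.shape)   # (4, 5, 3)
print(y_first.shape)  # (3, 4, 5)

# The same element, addressed under each convention
print(y_last[2, 3, 1] == y_first[1, 2, 3])  # True
```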


## Kernel, Filter

There is one pattern per output feature.

A pattern is also called a *kernel*.

The kernels of layer $\ll$ are just the weights of the layer.

The vector $\W_{\llp,1}$ above

So kernel $j$ ($\kernel_j$) is just an element $\W_{\llp,j}$ of the weights of layer $\ll$.
entered at  $\y_{(\ll-1),j,1}$

There is one kernel per output feature, so $n_\llp$ kernels
- $\kernel_{\llp,1}, \ldots, \kernel_{\llp, n_\llp}$

The length of the feature dimension of a kernel matches its input, i.e., $n_{(\ll-1)}$

The weight vector $\W_\llp$ therefore has multiple dimensions.  Our convention for each dimension is
- $\mathbf{W}_{\llp, j', \ldots,j}$
    - layer $\ll$
    - output feature $j$
    -  location: $\ldots \in \{1,2,3\}$
    - input feature $j'$
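
To make the roles of the weight dimensions concrete, here is a minimal NumPy sketch of a one-dimensional convolution with several input and output features. The sizes are hypothetical, and the axis order used for `W` (output feature, location, input feature) is a choice made for this illustration; a particular library may store its kernels in a different order. The input is zero-padded so that every location can serve as a center.

```python
import numpy as np

rng = np.random.default_rng(0)

d, f = 6, 3              # number of locations, kernel size
n_prev, n_cur = 2, 4     # input and output feature-dimension lengths

y_prev = rng.normal(size=(d, n_prev))      # input, channel last
W = rng.normal(size=(n_cur, f, n_prev))    # one kernel per output feature

# Zero-pad the non-feature dimension so every location can be a center
half = f // 2
padded = np.pad(y_prev, ((half, half), (0, 0)))

y_cur = np.empty((d, n_cur))
for i in range(d):                # each location
    window = padded[i:i + f]      # shape (f, n_prev), centered at location i
    for j in range(n_cur):        # each output feature
        y_cur[i, j] = np.sum(W[j] * window)  # dot product of kernel j and window

print(W[0].shape)   # (3, 2): kernel size x input features
print(y_cur.shape)  # (6, 4)
```

Each kernel `W[j]` spans all $n_{(\ll-1)}$ input features, which is why the feature dimension of a kernel must match that of its input.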

## Padding

Convolution centers the pattern at each location of the non-feature dimensions of the input.

But what happens when we try to center a pattern over the first/last location?
- the pattern may extend beyond the boundaries of the input

In such a case, we can choose to *pad* the input
- create a special padding input at the locations of the input beyond the original boundary
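
A small numeric sketch may help. It assumes a single feature, a hypothetical kernel of all ones with $f = 3$, and zero as the padding value (a common choice, but not the only one).

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # one feature, 5 locations
kernel = np.array([1.0, 1.0, 1.0])        # hypothetical kernel
f = len(kernel)
half = f // 2

# Without padding: the kernel can only be centered at interior locations
no_pad = [float(kernel @ y[i:i + f]) for i in range(len(y) - 2 * half)]

# With zero padding: add `half` zeros beyond each boundary
y_pad = np.pad(y, half)
with_pad = [float(kernel @ y_pad[i:i + f]) for i in range(len(y))]

print(no_pad)    # [6.0, 9.0, 12.0]
print(with_pad)  # [3.0, 6.0, 9.0, 12.0, 9.0]
```

With padding the output has one value per input location; without it, one location is lost at each boundary.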

We will see this in pictures below.

## Activation of a CNN layer

Just like the Fully Connected layer, a CNN layer is usually paired with an activation.

The default activation $a_\llp$ in Keras is "linear"
- That is: it returns the dot product unchanged
- Always know the default activation for a layer; better yet: always specify it!
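
The difference between a linear activation and a non-linear one can be sketched in NumPy. The functions `linear` and `relu` below are hand-written stand-ins for illustration, not the Keras implementations, and the dot-product values are made up.

```python
import numpy as np

def linear(z):
    return z                    # identity: the dot product passes through unchanged

def relu(z):
    return np.maximum(z, 0.0)   # negative values are replaced by zero

dot_products = np.array([-2.0, -0.5, 0.0, 1.5])

print(linear(dot_products).tolist())  # [-2.0, -0.5, 0.0, 1.5]
print(relu(dot_products).tolist())    # [0.0, 0.0, 0.0, 1.5]
```

In Keras the activation is selected with the layer's `activation` argument (for example `activation="relu"`), so stating it explicitly avoids relying on the default.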

In [5]:
print("Done")

Done
