![Deep Learning for Scientists in a hurry](./fig/Title.png)

In [None]:
%load_ext watermark

In [None]:
%watermark

In [None]:
import time
start = time.time()
chapter_number = 5
import matplotlib
%matplotlib inline
%load_ext autoreload
%autoreload 2
import matplotlib.pyplot as plt

In [None]:
import numpy as np

In [None]:
%watermark -iv

# Convolutional Neural Networks (Concepts)

The neural networks that we have seen so far were built from layers densely connected in sequence.

In a previous chapter, we used a dense NN to classify hand-written digits. We did that by flattening the bitmap into a vector and feeding it through several dense layers. There are two limitations to this approach.

1. An image is a 2D array of pixels, and the natural connection between neighboring pixels is lost when we flatten the array. We would like to preserve the fact that an image is a 2D array in which each pixel has neighbors.

2. Images also have translational invariance: you can move a digit a few pixels to the right and the image should be equally recognizable. However, from the point of view of a dense neural network, the shifted image corresponds to a very different input.


Convolutional Neural networks were found to be a very good solution to those limitations.
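To make the second limitation concrete, here is a minimal sketch with a made-up 8x8 "image" (a vertical bar standing in for a pen stroke; the array and its size are assumptions for illustration, not data from a previous chapter). After shifting the stroke one pixel to the right, the two flattened vectors that a dense layer would receive share no active pixel.

```python
import numpy as np

# Hypothetical 8x8 "image": a vertical bar standing in for a hand-written stroke
image = np.zeros((8, 8))
image[2:6, 3] = 1.0

# The same stroke moved one pixel to the right
shifted = np.roll(image, shift=1, axis=1)

# A dense layer only sees the flattened vectors
v_original = image.flatten()
v_shifted = shifted.flatten()

# Count the active pixels that the two flattened inputs have in common
active = int(v_original.sum())
overlap = int(np.dot(v_original, v_shifted))
print("active pixels:", active)
print("shared active pixels after the shift:", overlap)
```

To a human both images show the same stroke, but as flattened vectors they are orthogonal, so a dense network has to learn each position separately.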

In this notebook, we will describe CNNs without using any particular DL engine.

## Convolutional Neural Networks (CNNs)

A convolutional neural network is a specialized kind of NN for data that has some sort of grid-like topology.
By grid-like topology, we mean some notion of contiguity between the values of the input, so that each value has well-defined neighbors.

For example, we can think about time-series data as a one-dimensional grid with values sampled at regular intervals. An image is logically a 2D grid and a video could be considered a 3D grid of data. 

The name "convolutional network" implies that the network uses something inspired by a mathematical operation called **convolution**.
This is a special kind of linear operation that will replace the matrix multiplication that we used for dense neural networks.

There is another operation used in convolutional networks, called **pooling**. Pooling is nothing more than a reduction of dimensionality obtained by applying a certain mathematical operation (for example, the maximum or the average) to patches of the grid.
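As a small illustration of pooling, the sketch below applies max pooling over non-overlapping 2x2 patches of a 4x4 grid. The helper name `max_pool2d` and the grid values are hypothetical, chosen only for this example; DL engines offer their own pooling layers with more options.

```python
import numpy as np

def max_pool2d(grid, size=2):
    """Max pooling over non-overlapping size x size patches (hypothetical helper)."""
    rows, cols = grid.shape
    # Drop trailing rows/columns that do not fill a complete patch
    rows, cols = rows - rows % size, cols - cols % size
    # Axes after the reshape: (patch row, row inside patch, patch column, column inside patch)
    patches = grid[:rows, :cols].reshape(rows // size, size, cols // size, size)
    return patches.max(axis=(1, 3))

grid = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
pooled = max_pool2d(grid)
print(pooled)
```

Each 2x2 patch is replaced by its largest value, so the 4x4 grid is reduced to a 2x2 grid.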

*Convolutional neural networks are simply neural networks that use **convolution** in at least one of their layers.*

### The mathematical convolution

In mathematics, a convolution is an operation on two functions. In one dimension you can express a convolution with this equation:

$$s(t) = (f * g)(t) := \int f(a) g(t-a) da$$

A **convolution** is defined as the integral of the product of the two functions after one is reversed and shifted. The integral is evaluated for all values of shift, producing what is also called the **convolution function**.

Let's explore this definition with a simple numerical example:
We start by defining two functions and evaluating them on NumPy arrays.

In [None]:
def f(x):
    if x < -1 or x > 1:
        return 0.0
    else:
        return 1.0

In [None]:
def g(x):
    if x < -1 or x > 1:
        return 0.0
    else:
        return 1.0 - (x+1)/2.0

In [None]:
x = np.arange(-2, 2, 0.01)
f_vec = np.vectorize(f)
g_vec = np.vectorize(g)
f_arr = f_vec(x)
g_arr = g_vec(x)

In [None]:
plt.plot(x, f_arr, label=r"$f(x)$")
plt.plot(x, g_arr, label=r"$g(x)$")
plt.legend();

These two functions are convenient for illustrating a convolution because both vanish outside the interval $[-1, 1]$, so their product is nonzero only on a restricted range.

Computing the convolution requires an integral, i.e., the area under the product of the two functions for each value of the shift $t$. Let's approximate it numerically with a simple Riemann sum:

In [None]:
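def conv(t):
    """Approximate (f * g)(t) with a Riemann sum over a in [-10, 10)."""
    ret = 0.0
    delta = 0.01  # integration step
    for a in np.arange(-10, 10, delta):
        # f(a) * g(t - a): g is reversed and shifted by t
        ret += f(a)*g(t-a)*delta
    return ret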

In [None]:
conv_vec = np.vectorize(conv)

In [None]:
x_conv = np.arange(-3,3,0.01)
conv_arr = conv_vec(x_conv)

In [None]:
plt.plot(x_conv, conv_arr, label=r"$(f * g)(t)$")
plt.legend();
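As a cross-check of the explicit loop, the same integral can be approximated with NumPy's built-in `np.convolve`, which computes the discrete sum; multiplying by the grid spacing turns that sum into an approximation of the integral. This is only a sketch: the names `xs`, `f_vals`, `g_vals` and `t_axis` are introduced here, and the two functions are rewritten with `np.where` so that the block is self-contained.

```python
import numpy as np

dx = 0.01
xs = np.arange(-2, 2, dx)

# Same f and g as above, written with vectorized NumPy expressions
f_vals = np.where(np.abs(xs) <= 1, 1.0, 0.0)
g_vals = np.where(np.abs(xs) <= 1, 1.0 - (xs + 1) / 2.0, 0.0)

# The discrete convolution scaled by dx approximates the integral
conv_np = np.convolve(f_vals, g_vals, mode="full") * dx

# In "full" mode, output index k corresponds to the shift t = 2*xs[0] + k*dx
t_axis = 2 * xs[0] + dx * np.arange(conv_np.size)

peak = conv_np.max()
t_peak = t_axis[np.argmax(conv_np)]
area = conv_np.sum() * dx
print("peak value:", peak, "at t =", t_peak)
print("area under the convolution:", area)
```

The peak should be close to 1 (the area under $g$, reached when $f$ covers all of $g$), and the total area should be close to 2, the product of the areas under $f$ and $g$. Plotting `conv_np` against `t_axis` should reproduce the curve obtained with the loop.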

Convolution has applications in many areas of science including probability, statistics, acoustics, spectroscopy, signal processing, image processing, geophysics, engineering, physics, computer vision, and differential equations.

The key idea to keep in mind is that a convolution displaces one function over another, multiplies them where they overlap, and sums the result. This is, in essence, what CNNs use as a convolution, now with grids of values instead of real-valued functions.

In machine learning the function $f(x)$ is replaced by a grid of values, the entire image for example, and the function $g(x)$ will be a grid of weights, the values that will be optimized by the neural network.

In the case of images, for example, we use a two-dimensional image $I$ as the input, and the weights form another grid $K$ called the **kernel**. In this case, the convolution is defined as:

$$S(i,j) = (I*K)(i,j) = \sum_m \sum_n I(m, n) K(i-m,j-n)$$

Convolution is commutative, $(f*g)(t) = (g*f)(t)$, as a change of variable in the sum (or integral) shows. We can therefore swap the two grids and the result will be the same, i.e.:

$$S(i,j) = (K*I)(i,j) = \sum_m \sum_n I(i-m, j-n) K(m,n)$$
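The double sum can be written directly in NumPy. The sketch below is a plain, unoptimized implementation of the "full" discrete convolution, treating both grids as zero outside their bounds; the function name `conv2d_full` and the small grids `I` and `K` are made up for illustration. It also checks numerically that swapping the two grids gives the same result.

```python
import numpy as np

def conv2d_full(a, b):
    """Full 2-D discrete convolution: S(i, j) = sum_m sum_n a(m, n) b(i-m, j-n)."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            # a(m, n) contributes a copy of b displaced by (m, n)
            out[m:m + b.shape[0], n:n + b.shape[1]] += a[m, n] * b
    return out

I = np.arange(12, dtype=float).reshape(3, 4)   # hypothetical 3x4 "image"
K = np.array([[1.0, 0.0],
              [-1.0, 2.0]])                    # hypothetical 2x2 kernel

S_IK = conv2d_full(I, K)
S_KI = conv2d_full(K, I)
print("output shape:", S_IK.shape)
print("same result after swapping the grids:", np.allclose(S_IK, S_KI))
```

Real DL engines do not loop over pixels like this; the loop is used here only because it mirrors the formula term by term.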

In practice, what many neural network engines implement is a related function called the **cross-correlation**, which is the same as a convolution but without flipping the kernel. Because the kernel values are learned during training, the distinction has little practical consequence: the network simply learns a kernel that is flipped with respect to the one a true convolution would use.
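The relation between the two operations can be checked in one dimension with NumPy's `np.convolve` and `np.correlate`. The signal and kernel values below are arbitrary and chosen only for illustration; the kernel is deliberately asymmetric so that flipping it matters.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 2.0, 3.0])   # asymmetric on purpose

# Cross-correlation: slide the kernel as it is
corr = np.correlate(signal, kernel, mode="valid")

# Convolution: the kernel is flipped before sliding
conv = np.convolve(signal, kernel, mode="valid")

# Convolving with the flipped kernel recovers the cross-correlation
conv_flipped = np.convolve(signal, kernel[::-1], mode="valid")

print("cross-correlation:       ", corr)
print("convolution:             ", conv)
print("conv with flipped kernel:", conv_flipped)
```

The first and last lines of the output coincide, while the plain convolution differs from both because the kernel is not symmetric.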

### Example of 2-D convolution

To clarify the role of convolutions in neural networks, let's present two views: one symbolic and one based on a numerical example.

![Convolution](./fig/convolution.svg)

The first element:

![Convolution](./fig/convolution11.svg)

The last element:

![Convolution](./fig/convolution23.svg)
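For the numerical view, here is a minimal sketch of the sliding-window operation as DL engines usually apply it, that is, a cross-correlation restricted to positions where the kernel fits entirely inside the input. The 3x4 input, the 2x2 kernel and the function name `cross_correlate2d_valid` are assumptions made for this example and are not taken from the figures.

```python
import numpy as np

def cross_correlate2d_valid(image, kernel):
    """Slide the kernel over the image, keeping only fully overlapping positions."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch of the image currently under the kernel
            patch = image[i:i + kh, j:j + kw]
            # Element-wise product followed by a sum
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0],
                  [9.0, 10.0, 11.0, 12.0]])
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

result = cross_correlate2d_valid(image, kernel)
print(result)
```

With these values the output is a 2x3 grid. The first element is $1\cdot1 + 2\cdot2 + 5\cdot3 + 6\cdot4 = 44$ and the last element is $7\cdot1 + 8\cdot2 + 11\cdot3 + 12\cdot4 = 104$.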

---


# Back of the Book

In [None]:
n = chapter_number
t = np.linspace(0, (2*(n-1)+1)*np.pi/2, 1000)
x = t*np.cos(t)**3
y = 9*t*np.sqrt(np.abs(np.cos(t))) + t*np.sin(0.3*t)*np.cos(2*t)
plt.plot(x, y, c="green")
plt.axis('off');

In [None]:
end = time.time()
print(f'Chapter {chapter_number} took {int(end - start):d} seconds')