<a href="https://colab.research.google.com/github/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/02_convolutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Convolutions

A convolution is "a mathematical operation on two functions (`f` and `g`) that produces a third function (`f * g`) expressing how the shape of one is modified by the other."

In image processing, a convolution matrix is also called a kernel or filter. 

Typical image processing operations, such as blurring, sharpening, and edge detection, are
accomplished by performing a convolution between a kernel and an image.

## Setup

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))

In [2]:
try:
    import google.colab
    import requests
    url = 'https://raw.githubusercontent.com/dvgodoy/PyTorchStepByStep/master/config.py'
    r = requests.get(url, allow_redirects=True)
    with open('config.py', 'wb') as f:
        f.write(r.content)
except ModuleNotFoundError:
    pass

from config import *
config_chapter5()
# This is needed to render the plots in this chapter
from plots.chapter5 import *

Downloading files from GitHub repo to Colab...
Finished!


In [3]:
import random
import numpy as np
from PIL import Image

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose, Normalize

from data_generation.image_classification import generate_dataset
from helpers import index_splitter, make_balanced_sampler
from stepbystep.v1 import StepByStep

## Filter / Kernel

Usually, the filters are
small square matrices. The convolution itself is performed by applying the filter on
the image repeatedly. 

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv1.png?raw=1)

The gray region in the figure is the part of the image to which the filter is being
applied. It is called the receptive field, by analogy with the way human vision works.

Let’s try a concrete example to make this clearer.

In [4]:
single = np.array([
  [ # batch dim
      [ # channel dim
          [5, 0, 8, 7, 8, 1], # height and width dim
          [1, 9, 5, 0, 7, 7],
          [6, 0, 2, 4, 6, 6],
          [9, 7, 6, 6, 8, 4],
          [8, 3, 8, 5, 1, 3],
          [7, 2, 7, 0, 1, 0]
      ]
  ]
])
single.shape

(1, 1, 6, 6)

In [5]:
identity = np.array([
    [
        [
            [0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]
        ]
    ]
])
identity.shape

(1, 1, 3, 3)

## Convolving

A convolution performs an element-wise multiplication between the
two, region and filter, and adds everything up.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv2.png?raw=1)

In [6]:
region = single[:, :, 0:3, 0:3]
filtered_region = region * identity
total = filtered_region.sum()
total

9

Doing a convolution produces an image with a
reduced size.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv3.png?raw=1)

## Moving Around

Next, we move the region one step to the right; that is, we change the receptive
field and apply the filter again.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/stride1.png?raw=1)

In code, it means we’re changing the slice of the input image:

In [7]:
new_region = single[:, :, 0:3, (0+1):(3+1)]

But the operation remains the same: first an element-wise multiplication, and
then a sum over the elements of the resulting matrix.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv5.png?raw=1)

In [8]:
new_filtered_region = new_region * identity
new_total = new_filtered_region.sum()
new_total

5

Great! We have a second pixel value to add to our resulting image.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv6.png?raw=1)

We can keep moving the gray region to the right until we can’t move it anymore.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv7.png?raw=1)

The fourth step to the right will actually place the region partially outside the
input image.

In [9]:
last_horizontal_region = single[:, :, 0:3, (0+4):(3+4)]

The selected region does not match the shape of the filter anymore. 

So, if we try to
perform the element-wise multiplication, it fails:

In [10]:
try:
  last_horizontal_region * identity
except Exception as exp:
  print(exp)

operands could not be broadcast together with shapes (1,1,3,2) (1,1,3,3) 
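
The slice itself did not raise an error because NumPy silently clips a slice to the array bounds; the problem only shows up at the multiplication. A minimal sketch of that behavior, using a stand-in array (`image_6x6` is a name introduced here for illustration, not part of the notebook):

```python
import numpy as np

# Stand-in input with the same (batch, channel, height, width) layout as `single`
image_6x6 = np.arange(36).reshape(1, 1, 6, 6)

# Columns 4:7 are requested, but only columns 4 and 5 exist,
# so the slice is clipped instead of raising an IndexError
clipped = image_6x6[:, :, 0:3, 4:7]
print(clipped.shape)  # (1, 1, 3, 2)
```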


## Shape

Next, we go back to the left side and move down one step. If we repeat the
operation, covering all valid regions, we’ll end up with a resulting image that is
smaller (on the right).

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv8.png?raw=1)
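
Covering all valid regions can be sketched as two nested loops over the top-left corner of the receptive field. This is only an illustration, assuming 2D arrays without the batch and channel dimensions; the function name `convolve2d_valid` is made up for this example. Like the cells above, it multiplies the region by the filter as-is, without flipping it.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Apply `kernel` to every region of `image` that fits entirely inside it."""
    h_i, w_i = image.shape
    h_f, w_f = kernel.shape
    out = np.zeros((h_i - h_f + 1, w_i - w_f + 1))
    for row in range(out.shape[0]):
        for col in range(out.shape[1]):
            # Receptive field whose top-left corner is at (row, col)
            region = image[row:row + h_f, col:col + w_f]
            out[row, col] = (region * kernel).sum()
    return out

# Same pixel values as `single`, without the batch and channel dimensions
image_2d = np.array([
    [5, 0, 8, 7, 8, 1],
    [1, 9, 5, 0, 7, 7],
    [6, 0, 2, 4, 6, 6],
    [9, 7, 6, 6, 8, 4],
    [8, 3, 8, 5, 1, 3],
    [7, 2, 7, 0, 1, 0],
])
identity_2d = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
])

result = convolve2d_valid(image_2d, identity_2d)
print(result.shape)  # (4, 4)
```

With the identity filter, each output value is simply the center pixel of its region, so `result` matches `image_2d[1:5, 1:5]`.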

How much smaller is it going to be?

It depends on the size of the filter.

>The larger the filter, the smaller the resulting image.

Since applying a filter always produces a single value, the reduction is equal to the
filter size minus one.

$$
\Large
(h_i, w_i) * (h_f, w_f) = (h_i - (h_f - 1), w_i - (w_f - 1))
$$

If we assume the filter is a square matrix of size f, we can simplify the expression
above to:

$$
\Large
(h_i, w_i) * f = (h_i - f + 1, w_i - f + 1)
$$
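
The formula is easy to check in code. A small sketch, where `output_shape` is a helper name introduced here and not part of the notebook:

```python
def output_shape(image_shape, filter_shape):
    """Shape of the result of convolving without padding, moving one pixel at a time."""
    h_i, w_i = image_shape
    h_f, w_f = filter_shape
    return (h_i - (h_f - 1), w_i - (w_f - 1))

print(output_shape((6, 6), (3, 3)))  # (4, 4)
print(output_shape((6, 6), (5, 5)))  # (2, 2)
```

The second call illustrates the rule above: a larger filter on the same image gives a smaller result.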

But what if I’d like to keep the image size? Is that possible?

Sure it is! Padding comes to our rescue in this case.

## Convolving in PyTorch

Now that we know how a convolution works, let’s try it out using PyTorch.

In [11]:
# convert our image and filter to tensors
image = torch.as_tensor(single).float()
kernel = torch.as_tensor(identity).float()

Just like the activation functions, convolutions come in two
flavors: functional and module. 

There is a fundamental difference between the
two, though: The functional convolution takes the kernel / filter as an argument
while the module has (learnable) weights to represent the kernel / filter.

Let’s use the functional convolution, `F.conv2d()`, to apply the identity filter to our
input image.

In [12]:
convolved = F.conv2d(image, kernel, stride=1)
convolved

tensor([[[[9., 5., 0., 7.],
          [0., 2., 4., 6.],
          [7., 6., 6., 8.],
          [3., 8., 5., 1.]]]])
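
The identity filter leaves the pixel values unchanged, but any other hand-crafted kernel can be passed to `F.conv2d()` in the same way. As a sketch, assume a 3x3 averaging kernel (a simple box blur, chosen here for illustration and not part of the original notebook):

```python
import torch
import torch.nn.functional as F

# Same pixel values as the input image used above
image = torch.tensor([[[[5, 0, 8, 7, 8, 1],
                        [1, 9, 5, 0, 7, 7],
                        [6, 0, 2, 4, 6, 6],
                        [9, 7, 6, 6, 8, 4],
                        [8, 3, 8, 5, 1, 3],
                        [7, 2, 7, 0, 1, 0]]]], dtype=torch.float32)

# Assumed example kernel: every weight is 1/9, so each output is the mean of its region
box_blur = torch.full((1, 1, 3, 3), 1.0 / 9.0)

blurred = F.conv2d(image, box_blur, stride=1)
print(blurred.shape)  # torch.Size([1, 1, 4, 4])
```

The top-left output value is the mean of the nine pixels in the first region (they add up to 36, so the mean is 4).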

Now, let’s turn our attention to PyTorch’s convolution module, `nn.Conv2d`.

In [20]:
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1)
conv(image)

tensor([[[[ 0.3116, -2.5939, -2.5096,  1.8754],
          [-1.1964, -0.3318,  0.2698,  1.8475],
          [-1.4853,  1.1750,  0.9138, -1.0752],
          [-2.8796,  0.1066, -4.0413, -2.7793]]]],
       grad_fn=<ConvolutionBackward0>)

These results are gibberish now because the convolutional module randomly initializes the weights representing
the kernel / filter.

That’s the whole point of the convolutional module: It will learn
the kernel / filter on its own.

In traditional computer vision, people would develop different
filters for different purposes: blurring, sharpening, edge
detection, and so on.

Can we tell the convolutional module to learn multiple filters at once? Yes: the `out_channels` argument sets the number of filters.

In [16]:
conv_multiple = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, stride=1)
conv_multiple(image)

tensor([[[[ 1.9461,  5.1062,  2.4449,  4.0832],
          [ 5.1448,  1.9132,  3.0770,  5.8165],
          [ 5.4219,  5.9928,  4.9376,  4.3885],
          [ 4.2059,  2.9469,  2.8337,  1.5178]],

         [[-1.7295, -5.2360, -3.8463, -1.0855],
          [-5.8062, -0.2293, -2.4201, -5.3449],
          [-2.0328, -3.6810, -4.5338, -3.6006],
          [-3.9992, -2.4474, -4.3811, -2.5371]]]],
       grad_fn=<ConvolutionBackward0>)

In [17]:
conv_multiple.weight

Parameter containing:
tensor([[[[-0.2465,  0.1548, -0.0782],
          [ 0.2471,  0.2226,  0.1297],
          [ 0.0705,  0.1128,  0.1854]]],


        [[[ 0.2079, -0.2087, -0.3006],
          [-0.2097,  0.0569, -0.0502],
          [-0.1175, -0.0695, -0.0050]]]], requires_grad=True)
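
The shapes tell the story: there is one 3x3 filter per output channel, and each filter produces its own 4x4 image. A short check, using a random stand-in input (`dummy_image` is a name introduced here):

```python
import torch
import torch.nn as nn

conv_multiple = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, stride=1)
dummy_image = torch.rand(1, 1, 6, 6)  # random stand-in for the 6x6 input image

# Weights are laid out as (out_channels, in_channels, kernel_height, kernel_width)
print(conv_multiple.weight.shape)  # torch.Size([2, 1, 3, 3])
# There is one bias per filter
print(conv_multiple.bias.shape)  # torch.Size([2])
# Output is laid out as (batch, out_channels, height, width)
print(conv_multiple(dummy_image).shape)  # torch.Size([1, 2, 4, 4])
```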

We can also force a convolutional module to use a particular filter by setting its weights.

In [21]:
with torch.no_grad():
  conv.weight[0] = kernel
  conv.bias[0] = 0

conv(image)

tensor([[[[9., 5., 0., 7.],
          [0., 2., 4., 6.],
          [7., 6., 6., 8.],
          [3., 8., 5., 1.]]]], grad_fn=<ConvolutionBackward0>)

In [22]:
conv.weight

Parameter containing:
tensor([[[[0., 0., 0.],
          [0., 1., 0.],
          [0., 0., 0.]]]], requires_grad=True)

Setting the weights to get specific filters is at the heart of
transfer learning. 

Someone else trained a model, and that model
learned lots of useful filters, so we don’t have to learn them
again. We can set the corresponding weights and go from there.
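
A toy version of that idea, under loud assumptions: `pretrained_conv` below is just a randomly initialized layer standing in for one that someone else trained, and copying a single layer's state is only a sketch of what happens with a full pretrained model.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Hypothetical "pretrained" layer, standing in for a model someone else trained
pretrained_conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)

# New layer with the same architecture but its own random weights
new_conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)

# Copy the weights and biases over, then freeze them so training won't change them
new_conv.load_state_dict(pretrained_conv.state_dict())
new_conv.requires_grad_(False)

x = torch.rand(1, 1, 6, 6)
same = torch.equal(new_conv(x), pretrained_conv(x))
print(same)  # True
```

Freezing is optional: leaving `requires_grad` on would let training fine-tune the copied filters instead of keeping them fixed.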

## Striding