<a href="https://colab.research.google.com/github/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/02_convolutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Convolutions

A convolution is "a mathematical operation on two functions (`f` and `g`) that produces a third function (`f * g`) expressing how the shape of one is modified by the other."

In image processing, a convolution matrix is also called a kernel or filter. 

Typical image processing operations, such as blurring, sharpening, and edge detection, are
accomplished by performing a convolution between a kernel and an image.
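
To make the idea of a kernel more tangible, here is a minimal sketch of three commonly used 3x3 kernels. They are not taken from the original notebook, and the names `box_blur`, `sharpen`, and `edge_detect` are only illustrative. The blur and sharpen kernels sum to one, so they roughly preserve overall brightness, while the edge-detection kernel sums to zero, so flat regions map to zero.

```python
import numpy as np

# Illustrative kernels (assumed for this sketch, not from the notebook)
box_blur = np.ones((3, 3)) / 9.0          # averages each pixel with its neighbors
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])        # boosts the center against its neighbors
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])    # responds only where pixel values change

for name, kernel in [("box_blur", box_blur),
                     ("sharpen", sharpen),
                     ("edge_detect", edge_detect)]:
    print(name, kernel.shape, round(float(kernel.sum()), 6))
```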

## Setup

In [1]:
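# display and HTML are exposed by IPython.display; importing display from IPython.core.display is deprecated
from IPython.display import display, HTML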
display(HTML("<style>.container { width:80% !important; }</style>"))

In [2]:
try:
    import google.colab
    import requests
    url = 'https://raw.githubusercontent.com/dvgodoy/PyTorchStepByStep/master/config.py'
    r = requests.get(url, allow_redirects=True)
    # Use a context manager so the file is closed after writing
    with open('config.py', 'wb') as f:
        f.write(r.content)
except ModuleNotFoundError:
    pass

from config import *
config_chapter5()
# This is needed to render the plots in this chapter
from plots.chapter5 import *

Downloading files from GitHub repo to Colab...
Finished!


In [3]:
import random
import numpy as np
from PIL import Image

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose, Normalize

from data_generation.image_classification import generate_dataset
from helpers import index_splitter, make_balanced_sampler
from stepbystep.v1 import StepByStep

## Filter / Kernel

Filters are usually small square matrices. The convolution itself is performed by
applying the filter to the image repeatedly, one region at a time.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv1.png?raw=1)

The region of the image to which the filter is being applied is called the
receptive field, by analogy with the way human vision works.

Let’s try a concrete example to make this clearer.

In [4]:
single = np.array([
  [ # batch dim
      [ # channel dim
          [5, 0, 8, 7, 8, 1], # height and width dim
          [1, 9, 5, 0, 7, 7],
          [6, 0, 2, 4, 6, 6],
          [9, 7, 6, 6, 8, 4],
          [8, 3, 8, 5, 1, 3],
          [7, 2, 7, 0, 1, 0]
      ]
  ]
])
single.shape

(1, 1, 6, 6)

In [5]:
identity = np.array([
    [
        [
            [0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]
        ]
    ]
])
identity.shape

(1, 1, 3, 3)

## Convolving

Convolution performs an element-wise multiplication between the
two, region and filter, and adds everything up.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv2.png?raw=1)

In [6]:
region = single[:, :, 0:3, 0:3]
filtered_region = region * identity
total = filtered_region.sum()
total

9
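
As a quick sanity check, the multiply-and-sum step can be wrapped in a small helper. This is only a sketch: `apply_filter` is a hypothetical function that is not part of the original notebook, and the arrays are repeated here so the block runs on its own. Since the identity filter has a single 1 at its center, the result should equal the pixel at the center of the region.

```python
import numpy as np

# Same image and filter as above, repeated so this block is self-contained
single = np.array([[[[5, 0, 8, 7, 8, 1],
                     [1, 9, 5, 0, 7, 7],
                     [6, 0, 2, 4, 6, 6],
                     [9, 7, 6, 6, 8, 4],
                     [8, 3, 8, 5, 1, 3],
                     [7, 2, 7, 0, 1, 0]]]])
identity = np.array([[[[0, 0, 0],
                       [0, 1, 0],
                       [0, 0, 0]]]])

def apply_filter(image, kernel, top, left):
    """Hypothetical helper: multiply one region by the kernel and sum the result."""
    k_h, k_w = kernel.shape[-2:]
    region = image[:, :, top:top + k_h, left:left + k_w]
    return (region * kernel).sum()

total = apply_filter(single, identity, top=0, left=0)
center_pixel = single[0, 0, 1, 1]  # center of the top-left 3x3 region
print(total, center_pixel)  # 9 9
```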

Each application of the filter turns a whole region into a single value, so the
image produced by a convolution is smaller than the input.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv3.png?raw=1)

#### Moving Around

Next, we move the region one step to the right; that is, we change the receptive
field and apply the filter again.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/stride1.png?raw=1)

In code, it means we’re changing the slice of the input image:

In [7]:
new_region = single[:, :, 0:3, (0+1):(3+1)]

But the operation remains the same: first an element-wise multiplication, and
then a sum over the elements of the resulting matrix.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv5.png?raw=1)

In [8]:
new_filtered_region = new_region * identity
new_total = new_filtered_region.sum()
new_total

5

Great! We have a second pixel value to add to our resulting image.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv6.png?raw=1)

We can keep moving the gray region to the right until we can’t move it anymore.

![](https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/deep-learning-with-pytorch-step-by-step/Part-II-Computer-Vision/images/conv7.png?raw=1)
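
Moving the region to the right step by step can be sketched as a loop over column offsets. The block below is an illustration rather than code from the original notebook; it repeats the arrays so it runs on its own, and it assumes the loop stops at the last offset where the 3x3 region still fits inside the image. The variable `first_row` is a hypothetical name for the top row of the resulting image.

```python
import numpy as np

single = np.array([[[[5, 0, 8, 7, 8, 1],
                     [1, 9, 5, 0, 7, 7],
                     [6, 0, 2, 4, 6, 6],
                     [9, 7, 6, 6, 8, 4],
                     [8, 3, 8, 5, 1, 3],
                     [7, 2, 7, 0, 1, 0]]]])
identity = np.array([[[[0, 0, 0],
                       [0, 1, 0],
                       [0, 0, 0]]]])

kernel_width = identity.shape[-1]
image_width = single.shape[-1]

first_row = []
# Offsets 0..3: the last one where a 3-wide region still fits in a 6-wide image
for offset in range(image_width - kernel_width + 1):
    region = single[:, :, 0:3, offset:offset + kernel_width]
    first_row.append(int((region * identity).sum()))

print(first_row)  # [9, 5, 0, 7]
```

The first two values match the `total` and `new_total` computed above.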

The fourth step to the right will actually place the region partially outside the
input image.

In [9]:
last_horizontal_region = single[:, :, 0:3, (0+4):(3+4)]

The selected region does not match the shape of the filter anymore. So, if we try to
perform the element-wise multiplication, it fails:

In [11]:
try:
  last_horizontal_region * identity
except Exception as exp:
  print(exp)

operands could not be broadcast together with shapes (1,1,3,2) (1,1,3,3) 
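
Note that the slice itself did not raise an error: NumPy slicing silently truncates a range that runs past the end of an array, so the problem only surfaces at the multiplication. A minimal sketch, assuming a 3x3 filter and using a hypothetical `valid_offsets` list that is not part of the original notebook, checks the shape of each region to find the horizontal offsets where the filter still fits:

```python
import numpy as np

single = np.array([[[[5, 0, 8, 7, 8, 1],
                     [1, 9, 5, 0, 7, 7],
                     [6, 0, 2, 4, 6, 6],
                     [9, 7, 6, 6, 8, 4],
                     [8, 3, 8, 5, 1, 3],
                     [7, 2, 7, 0, 1, 0]]]])
kernel_size = 3  # assumed to match the 3x3 filter used above

valid_offsets = []
for offset in range(single.shape[-1]):
    region = single[:, :, 0:kernel_size, offset:offset + kernel_size]
    # Out-of-range slices are truncated, so compare the width explicitly
    if region.shape[-1] == kernel_size:
        valid_offsets.append(offset)

print(valid_offsets)  # [0, 1, 2, 3]
```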


## Shape