In [1]:
# Import the necessary libraries

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage, misc

<a id="ref0"></a>
<h4 align=center>What is Convolution?</h4>


Convolution is a linear operation, similar to a linear equation, dot product, or matrix multiplication. It has several advantages for analyzing images: it preserves the spatial relationship between neighboring elements, and it requires fewer parameters than other methods.

We can see the relationship between the different methods:

$$\text{linear equation: } y=wx+b$$
$$\text{linear equation with multiple variables, where } \mathbf{x} \text{ is a vector: } \mathbf{y}=\mathbf{wx}+b$$
$$\text{matrix multiplication, where } \mathbf{X} \text{ is a matrix: } \mathbf{y}=\mathbf{wX}+\mathbf{b}$$
$$\text{convolution, where } \mathbf{X} \text{ and } \mathbf{Y} \text{ are tensors: } \mathbf{Y}=\mathbf{w}*\mathbf{X}+\mathbf{b}$$


In convolution, the parameter <b>w</b> is called a kernel. We can perform convolution on images, where the image plays the role of the variable X and the kernel w is the parameter.


<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter%206/6.1.1xw.png" width="500," align="center">
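As a rough illustration of how convolution relates to the dot product, the sketch below uses a kernel that covers the whole input, so there is only one position for it to occupy. In that special case the convolution output should equal the dot product of the flattened kernel and input plus the bias. The values of <code>x</code>, <code>w</code>, and <code>b</code> are made up for this example. Note that PyTorch applies the kernel without flipping it.

```python
import torch
import torch.nn.functional as F

# Hypothetical 3x3 input, kernel, and bias, chosen only for illustration
x = torch.arange(9, dtype=torch.float32).reshape(3, 3)
w = torch.tensor([[1.0, 0.0, -1.0],
                  [2.0, 0.0, -2.0],
                  [1.0, 0.0, -1.0]])
b = torch.tensor([0.5])

# Dot product of the flattened kernel and input, plus the bias
dot = torch.dot(w.flatten(), x.flatten()) + b

# conv2d expects input of shape (batch, channels, rows, columns)
# and a kernel of shape (out_channels, in_channels, rows, columns)
conv_out = F.conv2d(x.reshape(1, 1, 3, 3), w.reshape(1, 1, 3, 3), bias=b)

print("dot product:", dot)
print("convolution:", conv_out)
```

When the input is larger than the kernel, the same dot product is repeated at every position the kernel can occupy.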


Create a two-dimensional convolution object with the constructor <code>Conv2d</code>. In this section, the parameters <code>in_channels</code> and <code>out_channels</code> are both set to 1, and the parameter <code>kernel_size</code> is set to 3.


In [2]:
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

In [3]:
conv

Conv2d(1, 1, kernel_size=(3, 3), stride=(1, 1))

Because the parameters in nn.Conv2d are randomly initialized and learned through training, let's give them some values manually.

In [4]:
# Check the randomly initialized values
conv.state_dict()

OrderedDict([('weight',
              tensor([[[[ 0.1483, -0.0017, -0.1635],
                        [-0.0366, -0.1454,  0.1201],
                        [ 0.1000, -0.1046,  0.1044]]]])),
             ('bias', tensor([0.2016]))])

In [5]:
conv.state_dict()['weight'][0][0] = torch.tensor([[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]])
conv.state_dict()['bias'][0] = 0.0
conv.state_dict()

OrderedDict([('weight',
              tensor([[[[ 1.,  0., -1.],
                        [ 2.,  0., -2.],
                        [ 1.,  0., -1.]]]])),
             ('bias', tensor([0.]))])
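An alternative way to set the same values, shown here only as a sketch, is to write to the <code>weight</code> and <code>bias</code> attributes directly inside a <code>torch.no_grad()</code> block. The variable name <code>kernel</code> is introduced for this example.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

# Same kernel values as in the cell above
kernel = torch.tensor([[1.0, 0.0, -1.0],
                       [2.0, 0.0, -2.0],
                       [1.0, 0.0, -1.0]])

# Turn off gradient tracking while overwriting the parameters in place
with torch.no_grad():
    conv.weight[0, 0].copy_(kernel)  # weight has shape (1, 1, 3, 3)
    conv.bias.zero_()                # bias has shape (1,)

print(conv.weight)
print(conv.bias)
```

Both approaches should leave the layer with the same parameters. This one makes it explicit that the assignment is not part of the computation graph.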

Create a dummy tensor to represent an image. The shape of the image is (1,1,5,5), which corresponds to:

(number of inputs, number of channels, number of rows, number of columns). Every element is zero except for the third column, which is set to 1.

In [6]:
image = torch.zeros(1, 1, 5, 5)
image[0, 0, :, 2] = 1
image

tensor([[[[0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.],
          [0., 0., 1., 0., 0.]]]])

Call the object <code>conv</code> on the tensor <code>image</code> as an input to perform the convolution and assign the result to the tensor <code>z</code>. 


In [7]:
z=conv(image)
z

tensor([[[[-4.,  0.,  4.],
          [-4.,  0.,  4.],
          [-4.,  0.,  4.]]]], grad_fn=<ConvolutionBackward0>)

The following animation illustrates the process. The kernel is overlaid on a region of the image, and each kernel element is multiplied by the corresponding image element. The products are then added together to give one output element. The kernel is then shifted and the process is repeated.


<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter%206/6.1.1convltuon.gif" width="500," align="center">
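The multiply, sum, and shift steps can be sketched with two explicit loops. This is a minimal illustration, not how PyTorch implements convolution internally. It rebuilds the 5x5 image and 3x3 kernel from the cells above, and the names <code>kernel</code>, <code>patch</code>, and <code>manual</code> are introduced for this example.

```python
import torch
import torch.nn as nn

# Rebuild the layer and image used above so the example is self-contained
kernel = torch.tensor([[1.0, 0.0, -1.0],
                       [2.0, 0.0, -2.0],
                       [1.0, 0.0, -1.0]])
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
with torch.no_grad():
    conv.weight[0, 0].copy_(kernel)
    conv.bias.zero_()

image = torch.zeros(1, 1, 5, 5)
image[0, 0, :, 2] = 1

M, K = 5, 3
manual = torch.zeros(M - K + 1, M - K + 1)
for i in range(M - K + 1):          # shift the kernel down
    for j in range(M - K + 1):      # shift the kernel right
        patch = image[0, 0, i:i + K, j:j + K]   # region under the kernel
        manual[i, j] = (patch * kernel).sum()   # multiply element-wise, then add

print(manual)
print("matches nn.Conv2d:", torch.allclose(manual, conv(image)[0, 0].detach()))
```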



<a id="ref1"></a>
<h4 align=center>Determining  the Size of the Output</h4>


The size of the output is an important parameter. Here, we will assume square images. For rectangular images, the same formula can be applied to each dimension independently.

Let M be the size of the input and K be the size of the kernel. With a stride of 1 and no padding, the size of the output is given by the following formula:


$$M_{new}=M-K+1$$
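For a rectangular image, the formula can be applied to the rows and columns separately. The sketch below assumes a hypothetical 4x6 image and a 2x3 kernel. All of the names are introduced for this example.

```python
import torch
import torch.nn as nn

# Hypothetical rectangular sizes, chosen only for illustration
rows, cols = 4, 6
k_rows, k_cols = 2, 3

conv_rect = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(k_rows, k_cols))
image_rect = torch.ones(1, 1, rows, cols)

# Apply M - K + 1 to each dimension independently
expected = (rows - k_rows + 1, cols - k_cols + 1)
actual = tuple(conv_rect(image_rect).shape[2:])

print("expected:", expected)
print("actual:  ", actual)
```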


Create a kernel of size 2:


In [8]:
K = 2
conv1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=K)
conv1.state_dict()['weight'][0][0] = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
conv1.state_dict()['bias'][0] = 0.0
conv1.state_dict()
conv1

Conv2d(1, 1, kernel_size=(2, 2), stride=(1, 1))

Create an image of size 4:

In [9]:
M = 4
image1 = torch.ones(1, 1, M, M)

<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter%206/6.1.1kernal2.png" width="500," align="center">


The following equation provides the output:


$$M_{new}=M-K+1$$
$$M_{new}=4-2+1$$
$$M_{new}=3$$


The following animation illustrates the process. The first placement of the kernel over the image produces one output element. Because the kernel is of size K, it can then shift M-K more positions in the horizontal direction, for a total of M-K+1 outputs per row. The same logic applies to the vertical direction.


<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter%206/6.1.1outsize.gif" width="500," align="center">


Perform the convolution and verify the size is correct:


In [12]:
z1=conv1(image1)
print("z1:",z1)
print("shape:",z1.shape[2:])

z1: tensor([[[[4., 4., 4.],
          [4., 4., 4.],
          [4., 4., 4.]]]], grad_fn=<ConvolutionBackward0>)
shape: torch.Size([3, 3])


<a id='ref3'></a>
<h4 align=center>Zero Padding </h4>


As we apply successive convolutions, the image shrinks. Zero padding keeps the image at a reasonable size and also helps preserve information at the borders.
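The shrinking effect can be seen by applying the same layer several times. This sketch assumes an 8x8 input and a 3x3 kernel, neither of which is used elsewhere in this section, and compares a layer without padding to one with <code>padding=1</code>.

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 8, 8)  # hypothetical 8x8 input

no_pad = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
padded = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

a, b = x, x
for step in range(1, 4):
    a = no_pad(a)   # loses K - 1 = 2 rows and columns on each pass
    b = padded(b)   # padding=1 offsets the loss for a 3x3 kernel
    print(f"pass {step}: no padding {tuple(a.shape[2:])}, padding=1 {tuple(b.shape[2:])}")
```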


In addition, the formula for the size of the output might not yield an integer. Consider the following image:


In [13]:
image1

tensor([[[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]]])

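Try performing a convolution with <code>kernel_size=2</code> and <code>stride=3</code>. Substituting these values into the output-size formula gives a non-integer result:

$$M_{new}=\dfrac{M-K}{stride}+1$$
$$M_{new}=\dfrac{4-2}{3}+1$$
$$M_{new}\approx 1.67$$

PyTorch rounds this value down, so the actual output has size 1, as the following cell confirms: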


In [14]:
conv4 = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=3)
conv4.state_dict()['weight'][0][0]=torch.tensor([[1.0,1.0],[1.0,1.0]])
conv4.state_dict()['bias'][0]=0.0
conv4.state_dict()
z4=conv4(image1)
print("z4:",z4)
print("z4:",z4.shape[2:4])

z4: tensor([[[[4.]]]], grad_fn=<ConvolutionBackward0>)
z4: torch.Size([1, 1])
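The 1x1 result above is consistent with rounding the fractional size down, which matches the floor in the output-shape formula given in the <code>Conv2d</code> documentation. The helper <code>conv_output_size</code> below is a hypothetical function written for this sketch, not part of PyTorch. It is compared with the sizes PyTorch produces for a few kernel and stride combinations.

```python
import torch
import torch.nn as nn

def conv_output_size(M, K, stride=1):
    """Output size for a square input with no padding, rounded down."""
    return (M - K) // stride + 1

M = 4
image1 = torch.ones(1, 1, M, M)

results = []
for K, stride in [(2, 1), (2, 2), (2, 3), (3, 2)]:
    conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=K, stride=stride)
    actual = conv(image1).shape[-1]
    results.append((K, stride, conv_output_size(M, K, stride), actual))
    print(f"K={K}, stride={stride}: formula {results[-1][2]}, PyTorch {actual}")
```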


We can add rows and columns of zeros around the image. This is called padding. In the constructor <code>Conv2d</code>, the parameter <code>padding</code> specifies the number of rows and columns of zeros to add on each side.

For a square image, we add columns of zeros before the first column and after the last column, and then repeat the process for the rows. As a result, the width and height are each the original size plus 2 x the number of padding elements specified. We can then determine the size of the output after subsequent operations, as shown in the following equations for the size of an image after padding and then applying a convolution kernel of size K with a stride of 1:


$$M'=M+2 \times padding$$
$$M_{new}=M'-K+1$$

With a larger stride, the padded size M' is used in the stride formula and the result is rounded down:

$$M_{new}=\left\lfloor\dfrac{M'-K}{stride}\right\rfloor+1$$
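One way to make the padded size M' concrete is to pad the image explicitly with <code>torch.nn.functional.pad</code> and then convolve without padding. The result should match a layer built with <code>padding=1</code>. This is a sketch using a stride of 1, and the names <code>conv_pad</code>, <code>padded_image</code>, <code>out_explicit</code>, and <code>out_builtin</code> are introduced for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M, K, padding = 4, 2, 1
image1 = torch.ones(1, 1, M, M)

conv_pad = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=K, padding=padding)
with torch.no_grad():
    conv_pad.weight.fill_(1.0)
    conv_pad.bias.zero_()

# F.pad takes (left, right, top, bottom) for the last two dimensions
# and fills with zeros by default
padded_image = F.pad(image1, (padding, padding, padding, padding))
out_explicit = F.conv2d(padded_image, conv_pad.weight, conv_pad.bias)
out_builtin = conv_pad(image1)

M_prime = M + 2 * padding   # size after padding
M_new = M_prime - K + 1     # size after the convolution with stride 1

print("padded size:", tuple(padded_image.shape[2:]), "expected", M_prime)
print("output size:", tuple(out_builtin.shape[2:]), "expected", M_new)
print("same result:", torch.allclose(out_explicit, out_builtin))
```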


Consider the following example:


In [None]:
conv5 = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=3,padding=1)

conv5.state_dict()['weight'][0][0]=torch.tensor([[1.0,1.0],[1.0,1.0]])
conv5.state_dict()['bias'][0]=0.0
conv5.state_dict()
z5=conv5(image1)
print("z5:",z5)
print("z5:",z5.shape[2:4])

The process is summarized in the following  animation: 


<img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0110EN/notebook_images%20/chapter%206/6.1.1zeropad.gif" width="500," align="center">


In [15]:
#Practice

# A kernel of zeros with a kernel size=3 is applied to the following image:
Image=torch.randn((1,1,4,4))
Image

tensor([[[[-1.8096, -0.7328,  0.5076, -0.0776],
          [-1.0550, -0.9096,  0.3485,  0.2969],
          [-1.7371, -1.1403,  0.2809, -0.2672],
          [ 0.0183, -0.6505,  0.1311,  0.1464]]]])

In [16]:
conv = nn.Conv2d(in_channels=1, out_channels=1,kernel_size=3)
conv.state_dict()['weight'][0][0]=torch.zeros(3, 3)
conv.state_dict()['bias'][0]=0.0

In [17]:
op = conv(Image)

In [18]:
op

tensor([[[[0., 0.],
          [0., 0.]]]], grad_fn=<ConvolutionBackward0>)