# Electronics and Communication Sciences Unit
# Indian Statistical Institute, Kolkata



---




# **Transposed Convolution**


---






![start.png](http://www.mit.edu/~jessicav/6.S198/Blog_Post/imgs/checkerboard_explanation.gif)

https://distill.pub/2016/deconv-checkerboard/




The CNN layers we have seen so far,
such as convolutional layers and pooling layers,
typically reduce (downsample) the spatial dimensions (height and width) of the input,
or keep them unchanged.
In semantic segmentation,
which classifies at the pixel level,
it is convenient if
the spatial dimensions of the
input and output are the same.
For example,
the channel dimension at one output pixel
can hold the classification results
for the input pixel at the same spatial position.


To achieve this, especially after
the spatial dimensions are reduced by CNN layers,
we can use another type
of CNN layer
that increases (upsamples) the spatial dimensions
of intermediate feature maps.
In this section,
we introduce
*transposed convolution*, which is also called *fractionally-strided convolution* (Dumoulin and Visin, 2016),
for reversing the downsampling
performed by convolution.




## Basic Operation

Ignoring channels for now,
let's begin with
the basic transposed convolution operation
with a stride of 1 and no padding.
Suppose that
we are given an
$n_h \times n_w$ input tensor
and a $k_h \times k_w$ kernel.
Sliding the kernel window with a stride of 1
$n_w$ times in each row
and $n_h$ times in each column
yields
a total of $n_h n_w$ intermediate results.
Each intermediate result is
a $(n_h + k_h - 1) \times (n_w + k_w - 1)$
tensor that is initialized with zeros.
To compute each intermediate tensor,
one element of the input tensor
is multiplied by the kernel
so that the resulting $k_h \times k_w$ tensor
replaces a portion of
that intermediate tensor.
Note that
the position of the replaced portion in each
intermediate tensor corresponds to the position of the element
in the input tensor used for the computation.
In the end, all the intermediate results
are summed to produce the output.

As an example,
the figure below illustrates
how transposed convolution with a $2\times 2$ kernel is computed for a $2\times 2$ input tensor.


![Transposed convolution with a $2\times 2$ kernel. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the  computation.](http://d2l.ai/_images/trans_conv.svg)
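The basic operation described above can be sketched from scratch. This is a minimal illustration, assuming stride 1, no padding, and a single channel; the function name `trans_conv` and the example values of `X` and `K` are chosen here for illustration.

```python
import torch

def trans_conv(X, K):
    """Transposed convolution of a 2-D input X with a 2-D kernel K (stride 1, no padding)."""
    h, w = K.shape
    # The output has shape (n_h + k_h - 1) x (n_w + k_w - 1) and starts as zeros
    Y = torch.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # Each input element scales the kernel; the result is accumulated
            # at the position that corresponds to the input element
            Y[i:i + h, j:j + w] += X[i, j] * K
    return Y

# Assumed example values: a 2x2 input and a 2x2 kernel
X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
Y = trans_conv(X, K)
print(Y)
```

Accumulating into `Y` in place is equivalent to building all $n_h n_w$ intermediate tensors and summing them at the end; it just avoids storing them.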







## CASE 1: Basic

![image.png](https://miro.medium.com/max/1400/1*FZ6mACe7DJLjvD3LLCn9TQ.gif)


The animation above shows the calculation performed by a transposed convolutional layer with `kernel_size=3` and all other parameters left at their defaults. The input is 2x2 and the output is 4x4.

The step-by-step calculation follows. As the animation shows, four steps generate the final output.

![image.png](https://miro.medium.com/max/1400/1*Blv7vr9sMAmfYTTgKCKGVw.jpeg)



In [None]:
import torch
from torch import nn


In [None]:
# Create 2x2 input tensor
input_data = torch.tensor([[[[1.,2.],[3.,4.]]]])

# Create a TCL layer with kernel_size=3
transConv1 = nn.ConvTranspose2d(1, 1, 3, bias=False)

# Set kernel weights to be 1
transConv1.weight.data = torch.ones(1,1,3,3)

# Calculate the output
transConv1(input_data)

tensor([[[[ 1.,  3.,  3.,  2.],
          [ 4., 10., 10.,  6.],
          [ 4., 10., 10.,  6.],
          [ 3.,  7.,  7.,  4.]]]], grad_fn=<ConvolutionBackward0>)
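The four steps can also be reproduced by hand. The sketch below builds one 4x4 intermediate tensor per input element and sums them, then compares the sum with the layer output; the variable names are illustrative and not part of the original notebook.

```python
import torch
from torch import nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
kernel = torch.ones(3, 3)

# One 4x4 intermediate tensor per input element (four steps in total)
intermediates = []
for i in range(2):
    for j in range(2):
        step = torch.zeros(4, 4)
        step[i:i + 3, j:j + 3] = x[i, j] * kernel
        intermediates.append(step)

# Summing the intermediate tensors gives the final output
manual = torch.stack(intermediates).sum(dim=0)

# Reference result from the PyTorch layer with all-ones weights
layer = nn.ConvTranspose2d(1, 1, 3, bias=False)
with torch.no_grad():
    layer.weight.fill_(1.0)
    reference = layer(x.reshape(1, 1, 2, 2))[0, 0]

print(manual)
```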

## CASE 2: Stride
Here we change the `stride` parameter and leave everything else the same as in the 1st case.

The default value of `stride` is 1; here we set it to 2.

![image2.png](https://miro.medium.com/max/1400/1*na8U3QpHwuAB3R9QohQC1w.gif)

As you can see, after each multiplication step, the kernel matrix moves 2 cells horizontally until it reaches the end of the row, and then moves 2 cells vertically and starts again from the beginning of the row.

Let’s look at the calculation steps:

![img.png](https://miro.medium.com/max/1400/1*zsqJh7WU7ARUGZ-jgvF1dw.jpeg)



In [None]:


# Create a TCL layer with stride=2
transConv2 = nn.ConvTranspose2d(1, 1, 3, stride=2, bias=False)
# Set kernel weights to be 1
transConv2.weight.data = torch.ones(1,1,3,3)
# Disable gradient tracking
for w in transConv2.parameters():
    w.requires_grad = False
# Calculate
transConv2(input_data)


tensor([[[[ 1.,  1.,  3.,  2.,  2.],
          [ 1.,  1.,  3.,  2.,  2.],
          [ 4.,  4., 10.,  6.,  6.],
          [ 3.,  3.,  7.,  4.,  4.],
          [ 3.,  3.,  7.,  4.,  4.]]]])
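The effect of the stride can be sketched from scratch as well: each input element at position (i, j) places its scaled kernel at offset (i * stride, j * stride) in the output. The helper name `trans_conv_strided` is hypothetical, and the sketch assumes a single channel and no padding.

```python
import torch
import torch.nn.functional as F

def trans_conv_strided(X, K, stride):
    """Single-channel transposed convolution with a stride and no padding."""
    h, w = K.shape
    out_h = (X.shape[0] - 1) * stride + h
    out_w = (X.shape[1] - 1) * stride + w
    Y = torch.zeros(out_h, out_w)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            # The kernel window moves `stride` cells per input element
            r, c = i * stride, j * stride
            Y[r:r + h, c:c + w] += X[i, j] * K
    return Y

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
K = torch.ones(3, 3)
manual = trans_conv_strided(X, K, stride=2)

# Reference result from PyTorch's functional API
reference = F.conv_transpose2d(X.reshape(1, 1, 2, 2), K.reshape(1, 1, 3, 3), stride=2)[0, 0]
print(manual)
```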

## CASE 3: Padding

We keep building on the stride case; this time we set the `padding` parameter to 1.

The default value of `padding` is 0.

![img4.png](https://miro.medium.com/max/1400/1*4KKbju-YNsbvDSfYlIHFVg.gif)

The final output, in this case, is the center 3x3 matrix. You can interpret padding as dropping the border cells of the matrix after the calculation. Likewise, if we set `padding` to 2, the result would be the single center cell (1x1).

![img5.png](https://miro.medium.com/max/1400/1*HokEeAWSt_rwzREUdyFWCQ.jpeg)

As you can see, the calculation is almost identical to the one above. The only difference is that we ‘removed’ the outer cells.

In [None]:


# Create a TCL layer with stride=2, padding=1
transConv3 = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, bias=False)
# Set kernel weights to be 1
transConv3.weight.data = torch.ones(1,1,3,3)
# Disable gradient tracking
for w in transConv3.parameters():
    w.requires_grad = False
# Calculate
transConv3(input_data)



tensor([[[[ 1.,  3.,  2.],
          [ 4., 10.,  6.],
          [ 3.,  7.,  4.]]]])
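The "drop the border cells" interpretation can be checked directly. The sketch below compares the padded outputs with center crops of the unpadded 5x5 result, using the functional API `F.conv_transpose2d`; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]]]])
w = torch.ones(1, 1, 3, 3)

full = F.conv_transpose2d(x, w, stride=2)             # 5x5, no padding
pad1 = F.conv_transpose2d(x, w, stride=2, padding=1)  # 3x3
pad2 = F.conv_transpose2d(x, w, stride=2, padding=2)  # 1x1

# padding=p removes p rows and p columns from every side of the unpadded result
crop1 = full[..., 1:-1, 1:-1]
crop2 = full[..., 2:-2, 2:-2]
print(pad1)
print(pad2)
```

With `padding=1` the center 3x3 block remains, and with `padding=2` only the center cell is left.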

## CASE 4: Output Padding

There is a second kind of padding, `output_padding`. The difference between the two is simple:
`output_padding` adds cells to one side of the output, while `padding` removes cells from both sides of the output.

![img6.png](https://miro.medium.com/max/1400/1*8KQDwfmCJpBZAS4BWQAHFg.gif)

In this case, we set `output_padding` to 1 (the default is 0) and `stride` to 2. As shown in the animation above, cells with value 0 are added to one side of the output matrix.

Calculation steps:

![img8.png](https://miro.medium.com/max/1400/1*l-TG8mr4E2QdPBRy0pKmZA.jpeg)





In [None]:


# Create a TCL layer with stride=2, output_padding=1
transConv4 = nn.ConvTranspose2d(1, 1, 3, stride=2, output_padding=1, bias=False)
# Set kernel weights to be 1
transConv4.weight.data = torch.ones(1,1,3,3)
# Disable gradient tracking
for w in transConv4.parameters():
    w.requires_grad = False
# Calculate
transConv4(input_data)



tensor([[[[ 1.,  1.,  3.,  2.,  2.,  0.],
          [ 1.,  1.,  3.,  2.,  2.,  0.],
          [ 4.,  4., 10.,  6.,  6.,  0.],
          [ 3.,  3.,  7.,  4.,  4.,  0.],
          [ 3.,  3.,  7.,  4.,  4.,  0.],
          [ 0.,  0.,  0.,  0.,  0.,  0.]]]])
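A small check of what `output_padding` does, under the assumptions `padding=0` and no bias: in that setting the extra cells receive no kernel contribution, so the result should equal the unpadded output with zeros appended at the bottom and right. The variable names are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]]]])
w = torch.ones(1, 1, 3, 3)

base = F.conv_transpose2d(x, w, stride=2)                      # 5x5
padded = F.conv_transpose2d(x, w, stride=2, output_padding=1)  # 6x6

# F.pad takes (left, right, top, bottom) for the last two dimensions:
# add one zero column on the right and one zero row at the bottom
expected = F.pad(base, (0, 1, 0, 1))
print(padded)
```

Note that PyTorch requires `output_padding` to be smaller than either the stride or the dilation, which is why `stride=2` is used here.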

## CASE 5: Dilation

Dilation influences the structure of the kernel matrix.

As the PyTorch documentation puts it,

    dilation controls the spacing between the kernel points;

The figure below should make this clearer. To make things easier, let’s use a 2x2 kernel in this example.

![img9.png](https://miro.medium.com/max/1400/1*dCr3pn2WQ_yv_Lt6Litnvw.jpeg)

Above is what the kernel matrix looks like with different dilation values. If the dilation value is n, then n-1 zero-filled cells are inserted between adjacent kernel elements. The same transformation applies to bigger kernel matrices. The rest of the calculation remains the same as before, as shown in the animation below.

![img10.png](https://miro.medium.com/max/1400/1*XJpA0zdmc80seyqZODSF7Q.gif)

Below are the calculation steps:

![img11.png](https://miro.medium.com/max/1400/1*0FL5feyXLuewR3duDj6vww.jpeg)



In [None]:


# Create a TCL layer with kernel_size=2, stride=2, dilation=2
transConv5 = nn.ConvTranspose2d(1, 1, 2, stride=2, dilation=2, bias=False)
# Set kernel weights to be 1
transConv5.weight.data = torch.ones(1,1,2,2)
# Disable gradient tracking
for w in transConv5.parameters():
    w.requires_grad = False
# Calculate
transConv5(input_data)



tensor([[[[ 1.,  0.,  3.,  0.,  2.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 4.,  0., 10.,  0.,  6.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 3.,  0.,  7.,  0.,  4.]]]])
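The "kernel with inserted zeros" view can be verified with a short sketch: a 2x2 kernel with `dilation=2` should behave like an explicitly zero-filled 3x3 kernel with the default dilation. The helper `dilate_kernel` is a hypothetical name introduced only for this illustration.

```python
import torch
import torch.nn.functional as F

def dilate_kernel(K, dilation):
    """Insert (dilation - 1) zero cells between adjacent elements of a 2-D kernel."""
    h, w = K.shape
    out = torch.zeros((h - 1) * dilation + 1, (w - 1) * dilation + 1)
    out[::dilation, ::dilation] = K
    return out

x = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]]]])
K = torch.ones(2, 2)
K_dilated = dilate_kernel(K, 2)  # 3x3 kernel with ones only at the corners

# 2x2 kernel with dilation=2 versus the explicit 3x3 zero-filled kernel
with_dilation = F.conv_transpose2d(x, K.reshape(1, 1, 2, 2), stride=2, dilation=2)
with_zeros = F.conv_transpose2d(x, K_dilated.reshape(1, 1, 3, 3), stride=2)
print(K_dilated)
print(with_dilation)
```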

# FORMULA: Math behind the output shape

The formula for the output size is:

![img12.png](https://miro.medium.com/max/1400/1*RRJw1colfm5tqfJfFB_1KA.png)

Here n is the output size (n x n matrix) and m is the input size (m x m matrix). The formula has 5 parameters: K is the kernel size, S is the stride, P is the padding, D is the dilation, and P_out is the output_padding.
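As a quick sanity check, the formula can be written as a small function. This sketch assumes the figure shows the same expression as the PyTorch `ConvTranspose2d` documentation, $n = (m-1)S - 2P + D(K-1) + P_{out} + 1$; the function name `conv_transpose_out_size` is hypothetical.

```python
def conv_transpose_out_size(m, K, S=1, P=0, D=1, P_out=0):
    """Output size n of a transposed convolution for an m x m input.

    Assumes n = (m - 1) * S - 2 * P + D * (K - 1) + P_out + 1.
    """
    return (m - 1) * S - 2 * P + D * (K - 1) + P_out + 1

# The five cases discussed above, all with a 2x2 input (m = 2)
sizes = {
    "basic (K=3)": conv_transpose_out_size(2, 3),
    "stride (K=3, S=2)": conv_transpose_out_size(2, 3, S=2),
    "padding (K=3, S=2, P=1)": conv_transpose_out_size(2, 3, S=2, P=1),
    "output padding (K=3, S=2, P_out=1)": conv_transpose_out_size(2, 3, S=2, P_out=1),
    "dilation (K=2, S=2, D=2)": conv_transpose_out_size(2, 2, S=2, D=2),
}
for name, n in sizes.items():
    print(f"{name}: {n} x {n}")
```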

## Step by Step:

**Only consider S (stride) and K (kernel size)**

Because the input size is m, there are m*m calculation steps. However, we only need to consider the first m steps, since they fix the width of the output matrix.

We can imagine the output growing progressively as the calculation proceeds:

1. In the 1st step, the output size is K.
2. In the 2nd step, the intermediate matrix shifts by S, so the output size is K + S.
3. In the 3rd step, the intermediate matrix shifts by S, so the output size is K + 2S.
4. In the m-th step, the intermediate matrix shifts by S, so the output size is K + (m-1)S.

Therefore, if we only consider S and K, the formula would be:

![img13.png](https://miro.medium.com/max/1400/1*sZUe9JH7M7INAklIYSdbwQ.png)
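The remaining parameters can be folded in one at a time, following the behaviour seen in the cases above. This is a sketch of the reasoning rather than a formal derivation, and it assumes the same symbols as in the text: dilation stretches the kernel to span $D(K-1)+1$ cells, padding trims $P$ cells from each side, and output padding adds $P_{out}$ cells on one side.

$$
\begin{aligned}
n &= (m-1)S + K && \text{stride and kernel only} \\
n &= (m-1)S + D(K-1) + 1 && \text{dilated kernel spans } D(K-1)+1 \text{ cells} \\
n &= (m-1)S - 2P + D(K-1) + 1 && \text{padding removes } P \text{ cells per side} \\
n &= (m-1)S - 2P + D(K-1) + P_{out} + 1 && \text{output padding adds } P_{out} \text{ cells}
\end{aligned}
$$

With $D = 1$, $P = 0$, and $P_{out} = 0$, the last line reduces to the first, $n = (m-1)S + K$.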

______________________________________________________________________________
## Connection to matrix transposition

Transposed convolution is closely connected to matrix transposition. You can check the details here:

https://colab.research.google.com/github/d2l-ai/d2l-pytorch-colab/blob/master/chapter_computer-vision/transposed-conv.ipynb#scrollTo=n4Us83L5U3Tc
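The connection can be sketched as follows: a convolution can be written as a product with a sparse weight matrix W, and the transposed convolution with the same kernel is then a product with the transpose of W. The helper `kernel2matrix` and the example values are assumptions made for this illustration (single channel, stride 1, no padding).

```python
import torch
import torch.nn.functional as F

def kernel2matrix(K, in_h, in_w):
    """Unroll a 2-D kernel into a matrix W so that conv(X) equals W @ X.flatten()."""
    k_h, k_w = K.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    W = torch.zeros(out_h * out_w, in_h * in_w)
    for i in range(out_h):
        for j in range(out_w):
            # Place the kernel at window position (i, j) on an empty input-sized canvas
            canvas = torch.zeros(in_h, in_w)
            canvas[i:i + k_h, j:j + k_w] = K
            W[i * out_w + j] = canvas.flatten()
    return W

X = torch.arange(9.0).reshape(3, 3)
K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
W = kernel2matrix(K, 3, 3)  # shape (4, 9)

# Convolution (cross-correlation in PyTorch) as a matrix product with W
Y = F.conv2d(X.reshape(1, 1, 3, 3), K.reshape(1, 1, 2, 2))[0, 0]
Y_mat = (W @ X.flatten()).reshape(2, 2)

# Transposed convolution as a matrix product with the transpose of W
Z = F.conv_transpose2d(Y.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2))[0, 0]
Z_mat = (W.T @ Y.flatten()).reshape(3, 3)
print(Y)
print(Z)
```

Note that `Z` has the same shape as `X` but not the same values: the transposed convolution restores the spatial shape, not the original input.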



---

## Problem with Deconvolution/Transposed Convolution: Checkerboard Artifacts
https://distill.pub/2016/deconv-checkerboard/
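The linked article attributes these artifacts to uneven overlap: when the kernel size is not divisible by the stride, some output positions are covered by more kernel placements than others. The sketch below counts that overlap in one dimension; the function name `overlap_counts` and the chosen sizes are illustrative assumptions.

```python
def overlap_counts(m, K, S):
    """Count how many kernel placements cover each output position (1-D, no padding)."""
    n = (m - 1) * S + K
    counts = [0] * n
    for i in range(m):       # one kernel placement per input element
        for k in range(K):   # each placement covers K consecutive output cells
            counts[i * S + k] += 1
    return counts

uneven = overlap_counts(m=4, K=3, S=2)  # kernel size not divisible by stride
even = overlap_counts(m=4, K=4, S=2)    # kernel size divisible by stride
print(uneven)
print(even)
```

With K=3 and S=2 the interior counts alternate between 1 and 2, which in two dimensions produces the checkerboard pattern. With K=4 and S=2 every interior position is covered the same number of times.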



---









# Reference

https://medium.com/analytics-vidhya/demystify-transposed-convolutional-layers-6f7b61485454

https://distill.pub/2016/deconv-checkerboard/

http://www.mit.edu/~jessicav/6.S198/Blog_Post/ProceduralGeneration.html

https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

