## "Laplacian Pyramid"

This notebook is meant to demonstrate my objections to the implementation of the Laplacian pyramid in DRLN.
Recall the design of the attention mechanism proposed in their paper: they aim to produce attention differently at each pyramid level by using different downsampling layers.

![Attention](figs/LapAtt.png)
The following cell contains the original code of their attention layer (CA). As you can see, they use three different streams (c1, c2, c3), which are meant to be their "pyramids". However, the only parameters that change between the streams are the dilation and padding of a 2D convolution, as you can see in the BasicBlock class they use.

In [1]:
import torch
import torch.nn as nn
import numpy as np

In [2]:
class CALayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(CALayer, self).__init__()

        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.c1 = ops.BasicBlock(channel , channel // reduction, 3, 1, 3, 3)
        self.c2 = ops.BasicBlock(channel , channel // reduction, 3, 1, 5, 5)
        self.c3 = ops.BasicBlock(channel , channel // reduction, 3, 1, 7, 7)
        self.c4 = ops.BasicBlockSig((channel // reduction)*3, channel , 3, 1, 1)

    def forward(self, x):
        print(x.shape, "input")
        y = self.avg_pool(x)
        print(y.shape, "after pooling")
        c1 = self.c1(y)
        c2 = self.c2(y)
        c3 = self.c3(y)
        print(c1.shape, c2.shape, c3.shape, "shapes")
        c_out = torch.cat([c1, c2, c3], dim=1)
        y = self.c4(c_out)
        return x * y

class BasicBlock(nn.Module):
    def __init__(self,
                 in_channels, out_channels,
                 ksize=3, stride=1, pad=1, dilation=1):
        super(BasicBlock, self).__init__()

        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, ksize, stride, pad, dilation),
            nn.ReLU(inplace=True)
        )

        init_weights(self.modules)
        
    def forward(self, x):
        out = self.body(x)
        return out


But with an input of size (batch_size, channels, 1, 1), padding it with exactly the number of zeros needed to fit the increased dilation renders all weights of the convolution layers useless except the center ones. It therefore seems to me that each pyramid performs the same computation, which is also the same as the simple CA used by RCAN, but with unnecessary weights introduced by changing padding and dilation (i.e. weights that only ever receive zeros from the padding as input). I tried it myself and came to the conclusion that one could produce the same result with plain 1x1 kernels, without padding or dilation, thus getting rid of quite a few parameters.

I made a simple visualization of my remark (see the picture below) and wrote the following code to support my claim.
![](figs/DRLN_cnn_dilation_and_padding.png)
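
The claim can also be checked numerically in a few lines. The following is a minimal sketch, not code from the DRLN repository: it assumes a 1x1 spatial input, as produced by `AdaptiveAvgPool2d(1)`, and compares each dilated 3x3 convolution with a 1x1 convolution that reuses only its center weights. The helper name `center_only` is my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def center_only(conv, x):
    # hypothetical helper: apply only the center tap of a 3x3 kernel as a 1x1 convolution
    center = conv.weight[:, :, 1:2, 1:2]  # shape (out_channels, in_channels, 1, 1)
    return F.conv2d(x, center, bias=conv.bias)

x = torch.randn(2, 64, 1, 1)  # same spatial size as the pooled input of the CA layer
matches = []
for d in (3, 5, 7):
    # padding == dilation, as in the three branches c1, c2, c3
    conv = nn.Conv2d(64, 4, kernel_size=3, stride=1, padding=d, dilation=d)
    with torch.no_grad():
        same = torch.allclose(conv(x), center_only(conv, x), atol=1e-5)
    matches.append(same)
    print(f"dilation={d}: identical to center-only 1x1 conv -> {same}")
```

The weights here are the random default initialization, so the agreement does not depend on any particular choice of values.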


In [3]:
# the different layers in the Attention Model
channels_in = 64
reduction = 16
channels_out = channels_in // reduction
dilation3 = nn.Conv2d(channels_in, channels_out, 3,1,3,3, bias=False)
dilation5 = nn.Conv2d(channels_in, channels_out, 3,1,5,5, bias=False)
dilation7 = nn.Conv2d(channels_in, channels_out, 3,1,7,7, bias=False)

In [4]:
# printing the default (randomly initialised) weights; the line that would set them to 1 is commented out
conv_all_one = []
for conv in [dilation3, dilation5, dilation7]:
    #conv.weight = nn.Parameter(torch.ones(conv.weight.shape))
    print(f"{conv} weights: {conv.weight} ")
    conv_all_one.append(conv)

Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(3, 3), dilation=(3, 3), bias=False) weights: Parameter containing:
tensor([[[[ 0.0194,  0.0082, -0.0070],
          [-0.0206, -0.0349, -0.0054],
          [-0.0237,  0.0378, -0.0342]],

         [[ 0.0061, -0.0162, -0.0329],
          [-0.0229, -0.0115,  0.0208],
          [ 0.0105,  0.0258, -0.0354]],

         [[ 0.0118, -0.0330,  0.0227],
          [ 0.0015, -0.0335,  0.0152],
          [ 0.0045, -0.0139, -0.0201]],

         ...,

         [[-0.0361,  0.0002, -0.0052],
          [ 0.0041, -0.0185, -0.0294],
          [ 0.0240,  0.0122,  0.0228]],

         [[ 0.0299, -0.0288,  0.0032],
          [ 0.0295,  0.0335, -0.0177],
          [-0.0142,  0.0159,  0.0390]],

         [[ 0.0102, -0.0335, -0.0273],
          [-0.0058,  0.0353,  0.0183],
          [-0.0052,  0.0245, -0.0352]]],


        [[[-0.0015, -0.0266, -0.0134],
          [ 0.0318,  0.0102, -0.0257],
          [ 0.0395,  0.0312,  0.0364]],

         [[-0.0358, -0.02

In [5]:
# defining a dummy input of shape (1 (batch_size), channels in, 1, 1)
arr = np.arange(channels_in).reshape(1,channels_in, 1, 1)
dummy = torch.tensor(arr, dtype=torch.float)
print(f"dummy input has shape {dummy.shape}")
print(dummy)

dummy input has shape torch.Size([1, 64, 1, 1])
tensor([[[[ 0.]],

         [[ 1.]],

         [[ 2.]],

         [[ 3.]],

         [[ 4.]],

         [[ 5.]],

         [[ 6.]],

         [[ 7.]],

         [[ 8.]],

         [[ 9.]],

         [[10.]],

         [[11.]],

         [[12.]],

         [[13.]],

         [[14.]],

         [[15.]],

         [[16.]],

         [[17.]],

         [[18.]],

         [[19.]],

         [[20.]],

         [[21.]],

         [[22.]],

         [[23.]],

         [[24.]],

         [[25.]],

         [[26.]],

         [[27.]],

         [[28.]],

         [[29.]],

         [[30.]],

         [[31.]],

         [[32.]],

         [[33.]],

         [[34.]],

         [[35.]],

         [[36.]],

         [[37.]],

         [[38.]],

         [[39.]],

         [[40.]],

         [[41.]],

         [[42.]],

         [[43.]],

         [[44.]],

         [[45.]],

         [[46.]],

         [[47.]],

         [[48.]],

         [[49.]],

  

In [6]:
# now we do the forward pass with our 3 cnn layers
def compute_result(dummy, dilations):
    results = []
    for dil in dilations:
        results.append(dil(dummy))
    return results

In [7]:
results_all_one = compute_result(dummy, conv_all_one)
print(results_all_one)

[tensor([[[[  4.4085]],

         [[  0.6269]],

         [[-12.6424]],

         [[  6.8819]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[ 1.6467]],

         [[-0.5371]],

         [[-2.9924]],

         [[-6.8759]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[  1.0048]],

         [[  6.6578]],

         [[  3.4085]],

         [[-12.9482]]]], grad_fn=<SlowConvDilated2DBackward>)]


If we now set all weights to 0.1, the output will just be a tensor of shape (1,4,1,1) with all values equal to 0.1 * 2016 = 201.6, where 2016 = 0 + 1 + ... + 63 is the sum of the input channels.

In [8]:
# set every weight to 0.1; no_grad() is needed because recent PyTorch versions
# refuse in-place edits of parameters that require gradients
conv_all_point_one = []
for conv in [dilation3, dilation5, dilation7]:
    with torch.no_grad():
        conv.weight.fill_(0.1)
    print(f"{conv} weights: {conv.weight} ")
    conv_all_point_one.append(conv)

Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(3, 3), dilation=(3, 3), bias=False) weights: Parameter containing:
tensor([[[[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         ...,

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]]],


        [[[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.1000],
          [0.100

In [9]:
results_all_point_one = compute_result(dummy, conv_all_point_one)
print(results_all_point_one)

[tensor([[[[201.6000]],

         [[201.6000]],

         [[201.6000]],

         [[201.6000]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[201.6000]],

         [[201.6000]],

         [[201.6000]],

         [[201.6000]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[201.6000]],

         [[201.6000]],

         [[201.6000]],

         [[201.6000]]]], grad_fn=<SlowConvDilated2DBackward>)]
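
The value 201.6 can be reproduced by hand. With a 1x1 input, only the center weight of each kernel meets real data, so every output channel is the center weight times the sum of the input channels. A small check in plain Python (the variable names are my own):

```python
channels_in = 64
center_weight = 0.1

# the dummy input holds the values 0..63, one per channel
channel_sum = sum(range(channels_in))

# every output channel = center weight * sum of all input channels
expected = center_weight * channel_sum

print(channel_sum)         # 2016
print(round(expected, 4))  # 201.6
```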


In [10]:
# now we set all weights to 0.1 except the one in the middle, which will be 1
conv_one_one = []
for conv in [dilation3, dilation5, dilation7]:
    with torch.no_grad():  # in-place edits of parameters must not be tracked by autograd
        conv.weight[:] = 0.1
        # the middle weight determines the output
        # 1 -> 2016
        conv.weight[:,:,1,1] = 1
        # everything else will not matter
    print(f"{conv} weights: {conv.weight} ")
    conv_one_one.append(conv)

Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(3, 3), dilation=(3, 3), bias=False) weights: Parameter containing:
tensor([[[[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         ...,

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]]],


        [[[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.1000, 0.1000, 0.1000]],

         [[0.1000, 0.1000, 0.1000],
          [0.1000, 1.0000, 0.1000],
          [0.100

In [11]:
results_one_one = compute_result(dummy, conv_one_one)
print(results_one_one)

[tensor([[[[2016.]],

         [[2016.]],

         [[2016.]],

         [[2016.]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[2016.]],

         [[2016.]],

         [[2016.]],

         [[2016.]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[2016.]],

         [[2016.]],

         [[2016.]],

         [[2016.]]]], grad_fn=<SlowConvDilated2DBackward>)]


In fact, we can choose whatever values we want for the non-center weights of our kernel.

In [12]:
# same as before, but now a few non-center weights are overwritten with arbitrary values
conv_one_one_rest_random = []
for conv in [dilation3, dilation5, dilation7]:
    with torch.no_grad():  # in-place edits of parameters must not be tracked by autograd
        conv.weight[:] = 0.1
        # the middle weight determines the output
        # 1 -> 2016
        conv.weight[:,:,1,1] = 1
        # everything else will not matter
        conv.weight[:,0,2,2] = 10000000000000000
        conv.weight[2,1,0,0] = 0
        conv.weight[1,1,2,2] = np.pi
    print(f"{conv} weights: {conv.weight} ")
    conv_one_one_rest_random.append(conv)

Conv2d(64, 4, kernel_size=(3, 3), stride=(1, 1), padding=(3, 3), dilation=(3, 3), bias=False) weights: Parameter containing:
tensor([[[[1.0000e-01, 1.0000e-01, 1.0000e-01],
          [1.0000e-01, 1.0000e+00, 1.0000e-01],
          [1.0000e-01, 1.0000e-01, 1.0000e+16]],

         [[1.0000e-01, 1.0000e-01, 1.0000e-01],
          [1.0000e-01, 1.0000e+00, 1.0000e-01],
          [1.0000e-01, 1.0000e-01, 1.0000e-01]],

         [[1.0000e-01, 1.0000e-01, 1.0000e-01],
          [1.0000e-01, 1.0000e+00, 1.0000e-01],
          [1.0000e-01, 1.0000e-01, 1.0000e-01]],

         ...,

         [[1.0000e-01, 1.0000e-01, 1.0000e-01],
          [1.0000e-01, 1.0000e+00, 1.0000e-01],
          [1.0000e-01, 1.0000e-01, 1.0000e-01]],

         [[1.0000e-01, 1.0000e-01, 1.0000e-01],
          [1.0000e-01, 1.0000e+00, 1.0000e-01],
          [1.0000e-01, 1.0000e-01, 1.0000e-01]],

         [[1.0000e-01, 1.0000e-01, 1.0000e-01],
          [1.0000e-01, 1.0000e+00, 1.0000e-01],
          [1.0000e-01, 1.0000e-01,

In [13]:
results_one_one_rest_random = compute_result(dummy, conv_one_one_rest_random)
print(results_one_one_rest_random)

[tensor([[[[2016.]],

         [[2016.]],

         [[2016.]],

         [[2016.]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[2016.]],

         [[2016.]],

         [[2016.]],

         [[2016.]]]], grad_fn=<SlowConvDilated2DBackward>), tensor([[[[2016.]],

         [[2016.]],

         [[2016.]],

         [[2016.]]]], grad_fn=<SlowConvDilated2DBackward>)]
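
A related consequence, added here as a side check under the same 1x1-input assumption: since the non-center weights are only ever multiplied with padded zeros, their gradient is exactly zero, so training can never update them. A minimal sketch with a freshly created layer (`probe` is a name of my own, not part of the DRLN code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# fresh layer with the same settings as the dilation-3 branch
probe = nn.Conv2d(64, 4, kernel_size=3, stride=1, padding=3, dilation=3, bias=False)
x = torch.randn(1, 64, 1, 1)

# backpropagate a simple scalar loss through the layer
probe(x).sum().backward()

grad = probe.weight.grad.clone()
center_grad = grad[:, :, 1, 1].clone()
grad[:, :, 1, 1] = 0  # blank out the center tap, leaving only the eight outer taps

outer_max = grad.abs().max().item()
center_max = center_grad.abs().max().item()
print("largest outer-tap gradient:", outer_max)
print("largest center-tap gradient:", center_max)
```

Only the center tap receives a non-zero gradient; the eight outer taps stay at whatever values they were initialized with.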


We see that each output has the same dimensions and the same values.
Moreover, changing only the dilation and the padding renders all weights of our dilated convolutions useless except the one in the middle, which is multiplied with our 1x1 input.
Setting all non-center weights to 0 would give the same output, so we could achieve the same result by using convolutions with 1x1 kernels instead.

In [14]:
channels_in = 64
reduction = 16
channels_out = channels_in // reduction
simple_layers = []
for i in range(3):
    # 1x1 convolution: no padding or dilation needed
    simple_layers.append(nn.Conv2d(channels_in, channels_out, 1, bias=False))
    with torch.no_grad():  # in-place edits of parameters must not be tracked by autograd
        simple_layers[i].weight.fill_(0.1)
    print(f"{simple_layers[i]} weights: {simple_layers[i].weight.shape} ")
    

Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1), bias=False) weights: torch.Size([4, 64, 1, 1]) 
Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1), bias=False) weights: torch.Size([4, 64, 1, 1]) 
Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1), bias=False) weights: torch.Size([4, 64, 1, 1]) 


In [15]:
results = compute_result(dummy, simple_layers)
print(results)



[tensor([[[[201.6000]],

         [[201.6000]],

         [[201.6000]],

         [[201.6000]]]], grad_fn=<MkldnnConvolutionBackward>), tensor([[[[201.6000]],

         [[201.6000]],

         [[201.6000]],

         [[201.6000]]]], grad_fn=<MkldnnConvolutionBackward>), tensor([[[[201.6000]],

         [[201.6000]],

         [[201.6000]],

         [[201.6000]]]], grad_fn=<MkldnnConvolutionBackward>)]
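
To put a number on the savings, the parameter counts of one branch can be compared. This sketch assumes the same channel settings as above (64 input channels, reduction 16) and ignores biases; `count_params` is a helper of my own.

```python
import torch.nn as nn

def count_params(module):
    # hypothetical helper: total number of parameter elements in a module
    return sum(p.numel() for p in module.parameters())

channels_in, reduction = 64, 16
channels_out = channels_in // reduction

dilated = nn.Conv2d(channels_in, channels_out, 3, 1, 3, 3, bias=False)
pointwise = nn.Conv2d(channels_in, channels_out, 1, bias=False)

n_dilated = count_params(dilated)      # 4 * 64 * 3 * 3 = 2304
n_pointwise = count_params(pointwise)  # 4 * 64 * 1 * 1 = 256
print(n_dilated, n_pointwise, n_dilated // n_pointwise)
```

Per branch, eight out of every nine weights never see a non-zero input. Across the three branches that adds up to 3 * (2304 - 256) = 6144 unused weights per attention layer for these channel settings.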
