# 1. Assume that we have two convolution kernels of size $k_1$ and $k_2$, respectively (with no nonlinearity in between).

## 1.1 Prove that the result of the operation can be expressed by a single convolution.

Let $K_1$ and $K_2$ be two convolution kernels of sizes $k_1$ and $k_2$, applied one after the other with no nonlinearity in between. We need to show that the result equals a single convolution of the input with one combined kernel.

1. **First convolution**: Apply $K_1$ to the input $I$. This produces an intermediate feature map $F_1$.

2. **Second convolution**: Apply $K_2$ to the intermediate feature map $F_1$ (not to the original input). This produces the final output feature map $O$.

Mathematically:

$$ F_1 = I \ast K_1 $$
$$ O = F_1 \ast K_2 $$

Substituting the first equation into the second:

$$ O = (I \ast K_1) \ast K_2 $$

Convolution is associative, so:

$$ O = I \ast (K_1 \ast K_2) $$

So the two sequential convolutions are equivalent to a single convolution with the kernel $K_1 \ast K_2$.

Two caveats apply. First, the identity holds exactly only when no information is lost at the borders of $F_1$, for example when neither convolution uses padding. Second, deep learning frameworks compute cross-correlation rather than convolution. The same argument goes through, but the combined kernel is then the *full* convolution of $K_1$ with $K_2$ (flip one kernel and keep every partial overlap), not their cross-correlation.
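The associativity step can also be checked index by index. The following is a sketch for the one-dimensional case, assuming that each layer computes a cross-correlation, as deep learning frameworks do. Here $K_1$ and $K_2$ denote the two kernels (of sizes $k_1$ and $k_2$), $F_1$ is the intermediate feature map, and each sum runs over the indices at which the kernel is defined.

$$
\begin{aligned}
O[n] &= \sum_{p} K_2[p]\, F_1[n + p] \\
     &= \sum_{p} K_2[p] \sum_{m} K_1[m]\, I[n + p + m] \\
     &= \sum_{q} \Big( \sum_{p} K_2[p]\, K_1[q - p] \Big)\, I[n + q], \qquad q = p + m .
\end{aligned}
$$

The bracketed term depends only on the two kernels. It can be nonzero only for $q = 0, \ldots, k_1 + k_2 - 2$, so it acts as a single kernel with $k_1 + k_2 - 1$ entries. The two-dimensional case follows by applying the same substitution along each axis.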

In [13]:
import torch
import torch.nn.functional as F

# Create a sample input tensor
input_tensor = torch.tensor([[[[1, 2, 3],
                               [4, 5, 6],
                               [7, 8, 9]]]], dtype=torch.float32)

# Define two convolution kernels
kernel1 = torch.tensor([[[[1, 0, -1],
                          [2, 0, -2],
                          [1, 0, -1]]]], dtype=torch.float32)

kernel2 = torch.tensor([[[[1, 2, 1],
                          [0, 0, 0],
                          [-1, -2, -1]]]], dtype=torch.float32)

# Apply the convolutions sequentially (F.conv2d computes cross-correlation).
# The first call pads by k1 - 1 = 2 so that no partial overlap is lost at the
# border of the intermediate map; the second call uses no padding.
intermediate_feature_map = F.conv2d(input_tensor, kernel1, stride=1, padding=2)
output_feature_map = F.conv2d(intermediate_feature_map, kernel2, stride=1, padding=0)

# Build the combined kernel as the full convolution of kernel1 with kernel2:
# flip kernel2 along both spatial axes and pad by k2 - 1 = 2.
flipped_kernel2 = torch.flip(kernel2, dims=(-2, -1))
combined_kernel = F.conv2d(kernel1, flipped_kernel2, stride=1, padding=2)
single_conv_output = F.conv2d(input_tensor, combined_kernel, stride=1, padding=2)

# Compare the results
print("Combined kernel shape:", tuple(combined_kernel.shape))
print("Outputs match:", torch.allclose(output_feature_map, single_conv_output))


Combined kernel shape: (1, 1, 5, 5)
Outputs match: True


In [12]:
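import torch

def corr2d(X, K):
    """Compute the 2D cross-correlation of X with kernel K (no padding)."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.ones(size=(8, 8))
K = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
print(corr2d(corr2d(X, K), K))
# Equivalent 3x3 kernel: the full convolution of K with itself
K_2 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
# Compare on a non-constant input: on an all-ones input, any kernel whose
# entries sum to 4 would give the same result
X_check = torch.arange(64.0).reshape(8, 8)
(corr2d(X_check, K_2) == corr2d(corr2d(X_check, K), K)).all()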

tensor([[4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4.]])


tensor(True)

In [1]:
import torch
import torch.nn as nn
def comp_conv2d(conv2d, X):
    # (1, 1) indicates that batch size and the number of channels are both 1
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    # Strip the first two dimensions: examples and channels
    return Y.reshape(Y.shape[2:])

# 1 row and column is padded on either side, so a total of 2 rows or columns
# are added
conv2d = nn.LazyConv2d(1, kernel_size=(3,3), padding=(1,1))
X = torch.rand(size=(8, 8))
comp_conv2d(conv2d, X).shape



torch.Size([8, 8])

## 1.2 What is the dimensionality of the equivalent single convolution?
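Since the combined kernel in 1.1 is the convolution of the two kernels with each other, its size along each axis should be $k_1 + k_2 - 1$, so two kernels of sizes $k_1 \times k_1$ and $k_2 \times k_2$ combine into one of size $(k_1 + k_2 - 1) \times (k_1 + k_2 - 1)$. The sketch below checks the length along one axis in plain Python. The helper `full_conv1d` and the example kernels are hypothetical, chosen only for illustration.

```python
def full_conv1d(a, b):
    """Full discrete convolution of two sequences (hypothetical helper)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

k1 = [1.0, 2.0, 1.0]              # kernel of size 3
k2 = [1.0, 0.0, -1.0, 0.5, 2.0]   # kernel of size 5
combined = full_conv1d(k1, k2)
print(len(combined))  # 3 + 5 - 1 = 7
```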

## 1.3 Is the converse true, i.e., can you always decompose a convolution into two smaller ones?
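One way to see that the converse can fail, sketched here for 1-D kernels: composing kernels by full convolution corresponds to multiplying polynomials whose coefficients are the kernel entries. A size-3 kernel $(a, b, c)$ therefore splits into two real size-2 kernels only if $a + bx + cx^2$ factors over the reals. The function name `splits_into_two_size2_kernels` is hypothetical.

```python
def splits_into_two_size2_kernels(kernel):
    """Return True if a real size-3 kernel (a, b, c) is the full convolution
    of two real size-2 kernels, i.e. if a + b*x + c*x**2 factors over the reals."""
    a, b, c = kernel
    if c == 0:
        return True  # degree <= 1: (a + b*x) * (1 + 0*x)
    return b * b - 4 * a * c >= 0  # real roots exist iff discriminant >= 0

print(splits_into_two_size2_kernels([1, 3, 2]))  # (1 + x)(1 + 2x): True
print(splits_into_two_size2_kernels([1, 0, 1]))  # 1 + x^2 has no real roots: False
```

In 2-D a parameter count points the same way: a $3 \times 3$ kernel has 9 free entries, while two $2 \times 2$ kernels together have only 8, so not every $3 \times 3$ kernel can be reached.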

# 2. Assume an input of shape $c_i \times h \times w$ and a convolution kernel of shape $c_o \times c_i \times k_h \times k_w$, padding of $(p_h, p_w)$, and stride of $(s_h, s_w)$.

What is the computational cost (multiplications and additions) for the forward propagation?
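A minimal sketch of the count, under stated assumptions: the input has $c_i$ channels of size $h \times w$, the kernel has shape $c_o \times c_i \times k_h \times k_w$, there is no bias term, and the padding values are the amount added on each side (the PyTorch convention; if padding is counted as the total number of added rows or columns, replace `2 * pad` with `pad`). The function names are hypothetical.

```python
def conv_output_size(size, kernel, pad, stride):
    """Output length along one axis; `pad` is the padding on EACH side."""
    return (size + 2 * pad - kernel) // stride + 1

def forward_cost(c_i, h, w, c_o, k_h, k_w, p_h, p_w, s_h, s_w):
    """Return (multiplications, additions) for one forward pass, no bias."""
    h_out = conv_output_size(h, k_h, p_h, s_h)
    w_out = conv_output_size(w, k_w, p_w, s_w)
    per_output = c_i * k_h * k_w                    # products per output element
    mults = c_o * h_out * w_out * per_output
    adds = c_o * h_out * w_out * (per_output - 1)   # summing those products
    return mults, adds

# Example: 3 x 32 x 32 input, 16 output channels, 3 x 3 kernel, padding 1, stride 1
print(forward_cost(3, 32, 32, 16, 3, 3, 1, 1, 1, 1))
```

Each of the $c_o \cdot h_{out} \cdot w_{out}$ output elements needs $c_i k_h k_w$ multiplications and one fewer addition, which is where the two products in the code come from.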

What is the memory footprint?
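As a rough sketch, the forward pass has to hold the input, the weights, and the output. The code below counts elements for each, assuming per-side padding (PyTorch convention), no bias, no materialized padded copy of the input, and 4 bytes per element (float32). The function name `forward_memory_elements` is hypothetical.

```python
def forward_memory_elements(c_i, h, w, c_o, k_h, k_w, p_h, p_w, s_h, s_w):
    """Element counts of the tensors involved in one forward pass."""
    h_out = (h + 2 * p_h - k_h) // s_h + 1
    w_out = (w + 2 * p_w - k_w) // s_w + 1
    return {
        "input": c_i * h * w,
        "weights": c_o * c_i * k_h * k_w,
        "output": c_o * h_out * w_out,
    }

sizes = forward_memory_elements(3, 32, 32, 16, 3, 3, 1, 1, 1, 1)
total_bytes = 4 * sum(sizes.values())  # assuming float32 storage
print(sizes, total_bytes)
```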

What is the memory footprint for the backward computation?

What is the computational cost for the backpropagation?
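A rough estimate for the backward pass, under the assumption that it computes two gradients (with respect to the input and with respect to the weights) and that each of them touches the same combinations of output position, kernel entry, and channel pair as the forward pass. Under that assumption the multiplication count is about twice the forward count, and memory must hold the saved input and the weights plus gradient buffers for the input, the weights, and the output. The function name `backward_estimate` is hypothetical, and padding is per side.

```python
def backward_estimate(c_i, h, w, c_o, k_h, k_w, p_h, p_w, s_h, s_w):
    """Rough multiplication and element counts for the backward pass."""
    h_out = (h + 2 * p_h - k_h) // s_h + 1
    w_out = (w + 2 * p_w - k_w) // s_w + 1
    forward_mults = c_o * c_i * k_h * k_w * h_out * w_out
    n_input = c_i * h * w
    n_weights = c_o * c_i * k_h * k_w
    n_output = c_o * h_out * w_out
    return {
        # one pass for the input gradient, one for the weight gradient
        "mults": 2 * forward_mults,
        # saved input + input grad, weights + weight grad, output grad
        "elements": 2 * n_input + 2 * n_weights + n_output,
    }

print(backward_estimate(3, 32, 32, 16, 3, 3, 1, 1, 1, 1))
```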



# 3. By what factor does the number of calculations increase if we double both the number of input channels $c_i$ and the number of output channels $c_o$? What happens if we double the padding?
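A small numerical check, assuming the multiplication count of a convolution is the product of output channels, input channels, kernel area, and output area, with padding counted per side. The function `forward_mults` and its default sizes are hypothetical.

```python
def forward_mults(c_i, c_o, h=32, w=32, k=3, pad=1, stride=1):
    """Multiplications in one forward pass of a k x k convolution."""
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return c_o * c_i * k * k * h_out * w_out

base = forward_mults(c_i=8, c_o=16)
channels_doubled = forward_mults(c_i=16, c_o=32) / base
padding_doubled = forward_mults(c_i=8, c_o=16, pad=2) / base
print(channels_doubled)  # 4.0
print(padding_doubled)   # (34 / 32) ** 2, since the output grows from 32 to 34
```

Doubling both channel counts multiplies the cost by exactly 4, because the count is linear in each. Doubling the padding only enlarges the output area, so the effect depends on the input size and is small when the input is much larger than the padding.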



# 4. Are the variables Y1 and Y2 in the final example of this section exactly the same? Why?
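Assuming `Y1` and `Y2` are produced by two mathematically equivalent routines that accumulate the same products in a different order, they need not be bit-for-bit identical, because floating-point addition is not associative. A minimal illustration in plain Python:

```python
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)               # False: the rounding differs
print(abs(left - right) < 1e-6)    # True: equal up to a small tolerance
```

This is why such results are usually compared with a tolerance rather than with exact equality.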



# 5. Express convolutions as a matrix multiplication, even when the convolution window is not $1 \times 1$.
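One common way to do this is the im2col trick: copy every input patch into a row of a matrix, flatten the kernel into a vector, and take a single matrix-vector product. The sketch below does this for one input channel and one output channel, with no padding and stride 1, using plain Python lists to stay dependency-free. The function names are hypothetical.

```python
def corr2d_direct(X, K):
    """Reference: sliding-window cross-correlation."""
    h, w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
             for j in range(len(X[0]) - w + 1)]
            for i in range(len(X) - h + 1)]

def corr2d_matmul(X, K):
    """Same result, expressed as one matrix-vector product."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    # im2col: one row per output position, holding the flattened input patch
    patches = [[X[i + a][j + b] for a in range(h) for b in range(w)]
               for i in range(out_h) for j in range(out_w)]
    k_vec = [K[a][b] for a in range(h) for b in range(w)]
    # (out_h * out_w, h * w) matrix times (h * w,) vector
    flat = [sum(p * k for p, k in zip(row, k_vec)) for row in patches]
    # Reshape the flat result back to (out_h, out_w)
    return [flat[i * out_w:(i + 1) * out_w] for i in range(out_h)]

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
K = [[0, 1], [2, 3]]
print(corr2d_matmul(X, K))  # [[19, 25], [37, 43]]
```

With several channels, the patch rows grow to length $c_i k_h k_w$ and the kernel vector becomes a matrix with $c_o$ columns, so the product is a genuine matrix multiplication.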



# 6. Your task is to implement fast convolutions with a $k \times k$ kernel. One of the algorithm candidates is to scan horizontally across the source, reading a $k$-wide strip and computing the $1$-wide output strip one value at a time. The alternative is to read a $k + \Delta$ wide strip and compute a $\Delta$-wide output strip. Why is the latter preferable? Is there a limit to how large you should choose $\Delta$?
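A simple counting model, offered as an assumption rather than a measurement, suggests why the wider strip helps: values loaded once can be reused for several neighbouring outputs. In the sketch below, `delta` is the extra strip width beyond the kernel size `k`, and a strip that is `k` rows tall and `k + delta` columns wide is taken to yield `delta + 1` outputs. The function name `reads_per_output` is hypothetical.

```python
def reads_per_output(k, delta):
    """Input values read per output value when one strip of k rows and
    (k + delta) columns is loaded once and reused for all outputs it covers."""
    strip_values = k * (k + delta)
    outputs = delta + 1
    return strip_values / outputs

for delta in (0, 1, 4, 16, 64):
    print(delta, reads_per_output(3, delta))
```

Under this model the reads per output fall from $k^2$ toward $k$ as `delta` grows, so the gain flattens out. In practice the strip presumably also has to fit in fast memory such as registers or cache, which limits how large `delta` is worth choosing.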



# 7. Assume that we have a $c \times c$ matrix.

How much faster is it to multiply with a block-diagonal matrix if the matrix is broken up into $b$ blocks?
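A minimal sketch of the count, assuming a matrix-vector product with a square matrix of side `c` that is split into `b` equal diagonal blocks (so `b` must divide `c`), and assuming the zero off-diagonal blocks are skipped entirely. The function name `matvec_mults` is hypothetical.

```python
def matvec_mults(c, b=1):
    """Multiplications for a (c x c) matrix-vector product when the matrix
    is block-diagonal with b equal blocks; b = 1 is the dense case."""
    assert c % b == 0, "b must divide c"
    block = c // b
    return b * block * block  # b blocks, each of size (c/b) x (c/b)

c = 512
for b in (1, 2, 4, 8):
    print(b, matvec_mults(c) / matvec_mults(c, b))
```

The dense product costs $c^2$ multiplications and the block-diagonal one costs $b \cdot (c/b)^2 = c^2 / b$, so under these assumptions the speedup is a factor of $b$.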

What is the downside of having $b$ blocks? How could you fix it, at least partly?