# 7.2 Convolutions for Images

In [None]:
import torch
from torch import nn
from d2l import torch as d2l

### 7.2.1 The Cross-Correlation Operation

In [None]:
def corr2d(X, K):  # Input tensor X, kernel tensor K
  """Compute 2D cross-correlation."""
  h, w = K.shape
  Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
  for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
      Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
  return Y
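
The two nested loops in `corr2d` visit every window position one at a time. As a minimal sketch (assuming only standard PyTorch; the name `corr2d_unfold` is our own, hypothetical helper), the same computation can be vectorized with `Tensor.unfold`, which exposes all $h \times w$ windows at once so that the elementwise multiply and the sum happen in a single expression:

```python
import torch

def corr2d_unfold(X, K):
  """2D cross-correlation without Python loops (hypothetical helper)."""
  h, w = K.shape
  # windows[i, j] is the h x w subtensor X[i:i + h, j:j + w];
  # shape: (X.shape[0] - h + 1, X.shape[1] - w + 1, h, w)
  windows = X.unfold(0, h, 1).unfold(1, w, 1)
  # Multiply every window elementwise by K, then sum over the two window axes
  return (windows * K).sum(dim=(-2, -1))

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
print(corr2d_unfold(X, K))  # expected values: [[19., 25.], [37., 43.]]
```

The loop version is easier to read; the `unfold` version trades that for speed, since the work is done inside PyTorch rather than in Python.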

In [None]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
corr2d(X, K)

tensor([[19., 25.],
        [37., 43.]])
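
PyTorch's built-in `torch.nn.functional.conv2d` also computes a cross-correlation (it does not flip the kernel). A quick sanity check, assuming the same `X` and `K` as above, reshaped to the four-dimensional (batch, channel, height, width) layout that `conv2d` expects:

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

# conv2d expects input of shape (N, C_in, H, W) and weight of shape (C_out, C_in, kH, kW)
out = F.conv2d(X.reshape(1, 1, 3, 3), K.reshape(1, 1, 2, 2))
print(out.reshape(2, 2))  # same values as corr2d(X, K): 19, 25, 37, 43
```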

### 7.2.2 Convolutional Layers

In [None]:
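class Conv2D(nn.Module):
  def __init__(self, kernel_size):
    super().__init__()
    # Learnable kernel (randomly initialized) and scalar bias
    self.weight = nn.Parameter(torch.rand(kernel_size))
    self.bias = nn.Parameter(torch.zeros(1))

  def forward(self, x):
    # Cross-correlate the input with the kernel, then add the bias
    return corr2d(x, self.weight) + self.bias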
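
A short usage sketch for the `Conv2D` class (repeated here together with `corr2d` so the snippet runs on its own). We overwrite its random parameters with a known kernel and an arbitrary bias of 0.5 (our choice) and compare against PyTorch's built-in `nn.Conv2d`, assumed here to have one input and one output channel:

```python
import torch
from torch import nn

def corr2d(X, K):
  """Compute 2D cross-correlation (same as above)."""
  h, w = K.shape
  Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
  for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
      Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
  return Y

class Conv2D(nn.Module):
  def __init__(self, kernel_size):
    super().__init__()
    self.weight = nn.Parameter(torch.rand(kernel_size))
    self.bias = nn.Parameter(torch.zeros(1))

  def forward(self, x):
    return corr2d(x, self.weight) + self.bias

K = torch.tensor([[1.0, -1.0]])
X = torch.rand(6, 8)

ours = Conv2D(kernel_size=(1, 2))
builtin = nn.Conv2d(1, 1, kernel_size=(1, 2))
with torch.no_grad():
  # Give both layers the same kernel and the same scalar bias
  ours.weight.copy_(K)
  ours.bias.fill_(0.5)
  builtin.weight.copy_(K.reshape(1, 1, 1, 2))
  builtin.bias.fill_(0.5)

y_ours = ours(X)                                   # shape (6, 7)
y_builtin = builtin(X.reshape(1, 1, 6, 8))[0, 0]  # shape (6, 7)
print(torch.allclose(y_ours, y_builtin))  # True
```

The built-in layer works on batched, multi-channel 4D tensors, which is why the input is reshaped and the output indexed with `[0, 0]`.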

### 7.2.3 Object Edge Detection in Images

In [None]:
X = torch.ones((6, 8))
X[:, 2:6] = 0
X

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

In [None]:
K = torch.tensor([[1.0, -1.0]])

In [None]:
Y = corr2d(X, K)
Y

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])

In [None]:
corr2d(X.t(), K)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
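
The kernel `K` only responds to differences between horizontally adjacent elements, so on the transposed image it sees nothing. A minimal sketch, assuming we simply transpose the kernel as well: `K.t()` (shape $2 \times 1$) compares vertically adjacent elements, and applied to `X.t()` it recovers the edges as horizontal lines, i.e., exactly the transpose of `Y`:

```python
import torch

def corr2d(X, K):
  """Compute 2D cross-correlation (same as above)."""
  h, w = K.shape
  Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
  for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
      Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
  return Y

X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])

Y = corr2d(X, K)              # vertical edges of X, shape (6, 7)
Y_h = corr2d(X.t(), K.t())    # horizontal edges of X.t(), shape (7, 6)
print(torch.equal(Y_h, Y.t()))  # True
```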

### 7.2.4 Learning a Kernel

In [None]:
# Construct a two-dimensional convolutional layer with 1 output channel and a
# kernel of shape (1, 2). For the sake of simplicity, we ignore the bias here.
conv2d = nn.LazyConv2d(1, kernel_size=(1, 2), bias=False)

# The two-dimensional convolutional layer uses four-dimensional input and
# output in the format of (example, channel, height, width), where the batch
# size (number of examples in the batch) and the number of channels are both 1
X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))
lr = 3e-2  # Learning rate

for i in range(10):
  Y_hat = conv2d(X)
  l = (Y_hat - Y) ** 2
  conv2d.zero_grad()
  l.sum().backward()
  # Update the kernel
  conv2d.weight.data[:] -= lr * conv2d.weight.grad
  if (i + 1) % 2 == 0:
    print(f'epoch {i + 1}, loss {l.sum():.3f}')

epoch 2, loss 8.644
epoch 4, loss 2.690
epoch 6, loss 0.959
epoch 8, loss 0.369
epoch 10, loss 0.147


In [None]:
conv2d.weight.data.reshape((1,2))

tensor([[ 0.9510, -1.0295]])
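
The manual update `conv2d.weight.data[:] -= lr * conv2d.weight.grad` is plain gradient descent. As a sketch of the more common pattern, assuming `torch.optim.SGD` and a non-lazy `nn.Conv2d` (our choices, not the book's), the same loop can be written with an optimizer; running it for 50 instead of 10 iterations brings the learned kernel very close to `[1, -1]`:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Input with two vertical edges, and the target produced by the kernel [1, -1]
X = torch.ones((6, 8))
X[:, 2:6] = 0
Y = X[:, :-1] - X[:, 1:]  # same as corr2d(X, torch.tensor([[1.0, -1.0]]))
X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))

conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
optimizer = torch.optim.SGD(conv2d.parameters(), lr=3e-2)

for epoch in range(50):
  loss = ((conv2d(X) - Y) ** 2).sum()
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()  # plain SGD: weight -= lr * weight.grad

print(conv2d.weight.data.reshape((1, 2)))  # close to [[1., -1.]]
```

With no momentum or weight decay, `optimizer.step()` performs exactly the update written by hand in the notebook; the optimizer only becomes more useful once those options, or more parameters, are involved.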

## Discussions

7.2.1 The Cross-Correlation Operation
- The operation performed by convolutional layers is more accurately described as a cross-correlation, so the name "convolutional" is a slight misnomer. In such a layer, an input tensor and a kernel tensor are combined through a cross-correlation operation to produce an output tensor.
- The shape of the kernel window (or convolution window) is given by the height and width of the kernel.
- In the two-dimensional cross-correlation operation, we begin with the convolution window positioned at the upper-left corner of the input tensor and slide it across the input tensor, from left to right and from top to bottom. At each position, the input subtensor contained in the window and the kernel tensor are multiplied elementwise, and the resulting tensor is summed to yield a single scalar value, which becomes the output element at that position.
- Note that along each axis, the output size is slightly smaller than the input size.
- For an input of size $n_h \times n_w$ and a convolution kernel of size $k_h \times k_w$, the output size is $(n_h - k_h + 1) \times (n_w - k_w + 1)$, since the window can only be placed at positions where it lies entirely inside the input.
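
A quick check of the output-size formula, assuming a small helper `corr2d_output_shape` (our own name) and comparing it with the shape `torch.nn.functional.conv2d` actually produces (no padding, stride 1). The three cases reproduce the $2 \times 2$, $6 \times 7$ and $8 \times 5$ outputs seen in the notebook:

```python
import torch
import torch.nn.functional as F

def corr2d_output_shape(n_h, n_w, k_h, k_w):
  """Output size of a 2D cross-correlation without padding, stride 1."""
  return (n_h - k_h + 1, n_w - k_w + 1)

for (n_h, n_w), (k_h, k_w) in [((3, 3), (2, 2)), ((6, 8), (1, 2)), ((8, 6), (1, 2))]:
  out = F.conv2d(torch.zeros(1, 1, n_h, n_w), torch.zeros(1, 1, k_h, k_w))
  assert tuple(out.shape[-2:]) == corr2d_output_shape(n_h, n_w, k_h, k_w)
  print((n_h, n_w), (k_h, k_w), '->', corr2d_output_shape(n_h, n_w, k_h, k_w))
```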

7.2.2 Convolutional Layers
- A convolutional layer cross-correlates the input and kernel and adds a scalar bias to produce an output.
- The two parameters of a convolutional layer are the kernel and the scalar bias.

7.2.3 Object Edge Detection in Images
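- When we cross-correlate the input with the kernel $K = [1, -1]$, the output is 0 wherever two horizontally adjacent elements are equal and nonzero otherwise, so the nonzero entries mark vertical edges (1 for a transition from 1 to 0, -1 for a transition from 0 to 1).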
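
Because the kernel is `[1, -1]`, each output element is simply the difference between an input element and its right-hand neighbor. A minimal sketch confirming this with slicing, using `F.conv2d` as a stand-in for the notebook's `corr2d`:

```python
import torch
import torch.nn.functional as F

X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])

Y = F.conv2d(X.reshape(1, 1, 6, 8), K.reshape(1, 1, 1, 2)).reshape(6, 7)
diff = X[:, :-1] - X[:, 1:]  # X[i, j] - X[i, j + 1]
print(torch.allclose(Y, diff))  # True
print(Y[0])  # nonzero only at the two edges: 1 at column 1, -1 at column 5
```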

7.2.5 Cross-Correlation and Convolution
- In order to obtain the output of the strict convolution operation, we only need to flip the two-dimensional kernel tensor both horizontally and vertically and then perform the cross-correlation operation with the input tensor.
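
To see that flipping is all that separates the two operations, here is a sketch with a hypothetical helper `conv2d_strict` that evaluates the convolution sum $\sum_{a,b} X[i-a, j-b]\,K[a,b]$ directly (restricted to positions where the kernel fits inside the input) and compares it with a cross-correlation against the flipped kernel `torch.flip(K, dims=(0, 1))`:

```python
import torch
import torch.nn.functional as F

def conv2d_strict(X, K):
  """'Valid' 2D convolution computed from its definition (hypothetical helper)."""
  k_h, k_w = K.shape
  Y = torch.zeros((X.shape[0] - k_h + 1, X.shape[1] - k_w + 1))
  for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
      # Y[i, j] = sum_{a, b} X[i + k_h - 1 - a, j + k_w - 1 - b] * K[a, b]
      for a in range(k_h):
        for b in range(k_w):
          Y[i, j] += X[i + k_h - 1 - a, j + k_w - 1 - b] * K[a, b]
  return Y

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

K_flipped = torch.flip(K, dims=(0, 1))  # flip vertically and horizontally
Y_corr = F.conv2d(X.reshape(1, 1, 3, 3), K_flipped.reshape(1, 1, 2, 2)).reshape(2, 2)
print(conv2d_strict(X, K))  # values 5, 11, 23, 29
print(torch.allclose(conv2d_strict(X, K), Y_corr))  # True
```

Since the kernel is learned, the distinction rarely matters in practice: a layer trained with cross-correlation simply learns the flipped version of the kernel a strict convolution would learn.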

7.2.6 Feature Map and Receptive Field
- The output of a convolutional layer is sometimes called a feature map, as it can be regarded as the learned representations (features) in the spatial dimensions (e.g., width and height) that are passed to the subsequent layer.
- In CNNs, for any element $x$ of some layer, its receptive field refers to all the elements (from all the previous layers) that may affect the calculation of $x$ during forward propagation.
  - Receptive fields derive their name from neurophysiology.
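
One way to make a receptive field concrete, sketched under our own setup (two stacked $2 \times 2$ cross-correlations with all-ones kernels and no bias): backpropagate from a single output element and check which input elements receive a nonzero gradient. Those inputs are exactly the ones that can affect that element.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 5, 5, requires_grad=True)
k = torch.ones(1, 1, 2, 2)

h = F.conv2d(x, k)  # first layer: (1, 1, 4, 4); each element sees a 2 x 2 input patch
y = F.conv2d(h, k)  # second layer: (1, 1, 3, 3)

# The gradient of one output element w.r.t. the input marks its receptive field
y[0, 0, 0, 0].backward()
print(x.grad[0, 0])
# Nonzero only in the top-left 3 x 3 patch; the entries count the paths:
# [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
```

Each additional $2 \times 2$ layer grows the receptive field by one row and one column, which is how deeper layers come to "see" larger regions of the input.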

- The core computation required for a convolutional layer is a cross-correlation operation.
- Convolutions can be used for many purposes, for example detecting edges and lines, blurring images, or sharpening them.
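
As an illustration of the last point, a sketch with two classic hand-picked kernels (standard image-processing choices, not taken from the book): a $3 \times 3$ box blur that averages each neighborhood, and a sharpening kernel that boosts the center relative to its four neighbors. Both kernels sum to 1, so flat regions pass through unchanged and only the values near an edge are altered. The helper `filter2d` is our own, hypothetical name.

```python
import torch
import torch.nn.functional as F

blur = torch.full((3, 3), 1 / 9)  # box blur: neighborhood mean
sharpen = torch.tensor([[0.0, -1.0, 0.0],
                        [-1.0, 5.0, -1.0],
                        [0.0, -1.0, 0.0]])  # 5 x center minus the 4 neighbors

X = torch.ones((6, 8))
X[:, 2:6] = 0  # same two-edge image as in the edge-detection example

def filter2d(kernel, img):
  """Cross-correlate a 2D image with a 2D kernel via F.conv2d (hypothetical helper)."""
  h, w = img.shape
  k_h, k_w = kernel.shape
  out = F.conv2d(img.reshape(1, 1, h, w), kernel.reshape(1, 1, k_h, k_w))
  return out.reshape(h - k_h + 1, w - k_w + 1)

print(filter2d(blur, X)[0])     # ~[0.667, 0.333, 0, 0, 0.333, 0.667]: the jump becomes a ramp
print(filter2d(sharpen, X)[0])  # [2, -1, 0, 0, -1, 2]: the jump overshoots past 0 and 1
```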