mainly generated by chatgpt: https://chatgpt.com/share/6711d36b-2da8-8004-ad10-37341331c93a

A Conv2d layer slides one or more learnable filters (kernels) over an input image. At each position it multiplies the filter element-wise with the patch it overlaps and sums the products. The grid of sums produced by one filter is called a feature map.

Key Parameters of torch.nn.Conv2d
- in_channels: Number of channels in the input image (e.g., 3 for RGB images).
- out_channels: Number of filters (feature detectors) the convolution will apply.
- kernel_size: Size of the filter (e.g., 3x3 or 5x5).
- stride: Step size for the filter (default: 1).
- padding: Number of zero pixels added to each side of the input before the filter is applied (default: 0).
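These parameters together determine the spatial size of the output. A minimal sketch of the usual formula, assuming dilation is 1 and the same settings apply to height and width (the helper name `conv_output_size` is made up for illustration and is not part of PyTorch):

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 5x5 input, 3x3 kernel, no padding: 3x3 output
print(conv_output_size(5, 3))             # 3
# padding=1 keeps the 5x5 size with a 3x3 kernel
print(conv_output_size(5, 3, padding=1))  # 5
# stride=2 skips every other position
print(conv_output_size(5, 3, stride=2))   # 2
```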

Example:
1. Input tensor shape:

- Let's assume an input of size (1, 1, 5, 5). This is a batch of 1 image, with 1 channel (grayscale), and a 5x5 pixel grid.

2. Filter (kernel):

- We will use a 3x3 filter (kernel) and set out_channels=1, which means the convolution will produce a single output feature map.

Here’s a code snippet to demonstrate how it works:

reference:
https://www.youtube.com/watch?v=n8Mey4o8gLc

In [8]:
import torch
import torch.nn as nn

# Input tensor of shape (batch_size, in_channels, height, width)
input_tensor = torch.tensor([[[[1.0, 2.0, 3.0, 0.0, 1.0],
                               [0.0, 1.0, 2.0, 3.0, 0.0],
                               [3.0, 0.0, 1.0, 2.0, 1.0],
                               [0.0, 2.0, 3.0, 0.0, 2.0],
                               [1.0, 1.0, 0.0, 3.0, 0.0]]]])

# Conv2d layer with 1 input channel, 1 output channel, and 3x3 kernel
conv_layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)

# Initialize weights and bias (for demonstration)
conv_layer.weight = torch.nn.Parameter(torch.tensor([[[[1.0, 0.0, -1.0],
                                                       [1.0, 0.0, -1.0],
                                                       [1.0, 0.0, -1.0]]]]))
conv_layer.bias = torch.nn.Parameter(torch.tensor([0.0]))

# Apply the convolution
output = conv_layer(input_tensor)

print("Input tensor shape:", input_tensor.shape)
print("Output tensor shape:", output.shape)
print("Output feature map:\n", output)


Input tensor shape: torch.Size([1, 1, 5, 5])
Output tensor shape: torch.Size([1, 1, 3, 3])
Output feature map:
 tensor([[[[-2., -2.,  4.],
          [-3., -2.,  3.],
          [ 0., -2.,  1.]]]], grad_fn=<ConvolutionBackward0>)


Manually verify the output of a Conv2d layer

In [12]:
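# Verification: reproduce output[0, 0] and output[0, 1] by hand.
# tensor1 is the top-left 3x3 window of the input;
# tensor2 is the window one column to the right.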
tensor1 = torch.tensor([[1.0, 2.0, 3.0],
                        [0.0, 1.0, 2.0],
                        [3.0, 0.0, 1.0]])

tensor2 = torch.tensor([[2.0, 3.0, 0.0],
                        [1.0, 2.0, 3.0],
                        [0.0, 1.0, 2.0]])

kernel = torch.tensor([[1.0, 0.0, -1.0],
                      [1.0, 0.0, -1.0],
                      [1.0, 0.0, -1.0]])

# Element-wise multiplication
elementwise_product1 = torch.mul(tensor1, kernel)
# Sum of the element-wise multiplication
sum_result1 = torch.sum(elementwise_product1)
print("Element-wise product:\n", elementwise_product1)
print("Sum of element-wise product:", sum_result1)

# Element-wise multiplication
elementwise_product2 = torch.mul(tensor2, kernel)
# Sum of the element-wise multiplication
sum_result2 = torch.sum(elementwise_product2)
print("Element-wise product:\n", elementwise_product2)
print("Sum of element-wise product:", sum_result2)


Element-wise product:
 tensor([[ 1.,  0., -3.],
        [ 0.,  0., -2.],
        [ 3.,  0., -1.]])
Sum of element-wise product: tensor(-2.)
Element-wise product:
 tensor([[ 2.,  0., -0.],
        [ 1.,  0., -3.],
        [ 0.,  0., -2.]])
Sum of element-wise product: tensor(-2.)
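The two windows above cover only the first two output positions. As a rough sketch, the same multiply-and-sum can be looped over every window in plain Python to rebuild the whole 3x3 feature map. The function name `correlate2d_valid` is an assumption for this note. Like Conv2d, it does not flip the kernel and uses stride 1 with no padding.

```python
image = [[1.0, 2.0, 3.0, 0.0, 1.0],
         [0.0, 1.0, 2.0, 3.0, 0.0],
         [3.0, 0.0, 1.0, 2.0, 1.0],
         [0.0, 2.0, 3.0, 0.0, 2.0],
         [1.0, 1.0, 0.0, 3.0, 0.0]]

kernel = [[1.0, 0.0, -1.0],
          [1.0, 0.0, -1.0],
          [1.0, 0.0, -1.0]]


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


feature_map = correlate2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The printed rows should match the Conv2d output shown earlier: -2, -2, 4 / -3, -2, 3 / 0, -2, 1.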


In [18]:
# sample with padding
# Define the input tensor (batch_size, channels, height, width)
input_tensor = torch.randn(1, 1, 5, 5)  # Example: batch_size = 1, 1 channel, 5x5 image

# Define the convolutional layer
conv_layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1)

# Perform the convolution operation
output_tensor = conv_layer(input_tensor)

print(output_tensor.shape)


torch.Size([1, 1, 5, 5])


In [13]:
# sample with 3 input channels, 2 output channels
# Input tensor of shape (batch_size, in_channels, height, width)
# Let's assume batch_size=1, in_channels=3 (RGB image), and each channel is 3x3
input_tensor = torch.tensor([[[[1.0, 2.0, 3.0],
                               [4.0, 5.0, 6.0],
                               [7.0, 8.0, 9.0]],   # Red Channel
                             
                              [[9.0, 8.0, 7.0],
                               [6.0, 5.0, 4.0],
                               [3.0, 2.0, 1.0]],   # Green Channel
                             
                              [[0.0, 1.0, 0.0],
                               [1.0, 0.0, 1.0],
                               [0.0, 1.0, 0.0]]]]) # Blue Channel

# Conv2d layer with 3 input channels and 2 output channels, kernel size 3x3
conv_layer = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=1, padding=0)

# Initialize weights and bias for the 2 output channels (filters)
conv_layer.weight = torch.nn.Parameter(torch.tensor([[[[1.0, 0.0, -1.0],
                                                       [1.0, 0.0, -1.0],
                                                       [1.0, 0.0, -1.0]],   # Filter for Red Channel

                                                      [[0.0, 1.0, 0.0],
                                                       [0.0, 1.0, 0.0],
                                                       [0.0, 1.0, 0.0]],   # Filter for Green Channel
                                                    
                                                      [[1.0, 0.0, 1.0],
                                                       [1.0, 0.0, 1.0],
                                                       [1.0, 0.0, 1.0]]],  # Filter for Blue Channel

                                                     [[[1.0, -1.0, 1.0],
                                                       [1.0, -1.0, 1.0],
                                                       [1.0, -1.0, 1.0]],  # Filter for Red Channel

                                                      [[-1.0, 1.0, -1.0],
                                                       [-1.0, 1.0, -1.0],
                                                       [-1.0, 1.0, -1.0]], # Filter for Green Channel

                                                      [[0.0, 0.0, 1.0],
                                                       [0.0, 0.0, 1.0],
                                                       [0.0, 0.0, 1.0]]]]))  # Filter for Blue Channel

conv_layer.bias = torch.nn.Parameter(torch.tensor([0.0, 0.0])) # Bias for both filters

# Apply the convolution
output = conv_layer(input_tensor)

print("Input tensor shape:", input_tensor.shape)
print("Output tensor shape:", output.shape)
print("Output feature maps:\n", output)


Input tensor shape: torch.Size([1, 3, 3, 3])
Output tensor shape: torch.Size([1, 2, 1, 1])
Output feature maps:
 tensor([[[[11.]],

         [[ 1.]]]], grad_fn=<ConvolutionBackward0>)


In [16]:
# Verification: reproduce the first output channel (11.) by hand
r = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])

r_filter1 = torch.tensor([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])

r_product1 = torch.mul(r, r_filter1)
r_sum1 = torch.sum(r_product1)
print("Element-wise product:\n", r_product1)
print("Sum of element-wise product:", r_sum1)

g = torch.tensor([[9.0, 8.0, 7.0],
                  [6.0, 5.0, 4.0],
                  [3.0, 2.0, 1.0]])

g_filter1 = torch.tensor([[0.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]])

g_product1 = torch.mul(g, g_filter1)
g_sum1 = torch.sum(g_product1)
print("Element-wise product:\n", g_product1)
print("Sum of element-wise product:", g_sum1)

b = torch.tensor([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])

b_filter1 = torch.tensor([[1.0, 0.0, 1.0],
                          [1.0, 0.0, 1.0],
                          [1.0, 0.0, 1.0]])

b_product1 = torch.mul(b, b_filter1)
b_sum1 = torch.sum(b_product1)
print("Element-wise product:\n", b_product1)
print("Sum of element-wise product:", b_sum1)

total = r_sum1 + g_sum1 + b_sum1  # named total to avoid shadowing the built-in sum()
print("Sum of all channels:", total)



Element-wise product:
 tensor([[ 1.,  0., -3.],
        [ 4.,  0., -6.],
        [ 7.,  0., -9.]])
Sum of element-wise product: tensor(-6.)
Element-wise product:
 tensor([[0., 8., 0.],
        [0., 5., 0.],
        [0., 2., 0.]])
Sum of element-wise product: tensor(15.)
Element-wise product:
 tensor([[0., 0., 0.],
        [1., 0., 1.],
        [0., 0., 0.]])
Sum of element-wise product: tensor(2.)
Sum of all channels: tensor(11.)
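The check above covers the first output channel only. A compact sketch in plain Python that repeats it for both output channels: each output value is the sum over the input channels of the per-channel multiply-and-sum, plus the bias. The names `window_dot`, `channels`, `filters`, and `biases` are chosen for this illustration.

```python
channels = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],  # Red
    [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]],  # Green
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],  # Blue
]

filters = [
    # Output channel 1: one 3x3 kernel per input channel
    [[[1.0, 0.0, -1.0]] * 3, [[0.0, 1.0, 0.0]] * 3, [[1.0, 0.0, 1.0]] * 3],
    # Output channel 2
    [[[1.0, -1.0, 1.0]] * 3, [[-1.0, 1.0, -1.0]] * 3, [[0.0, 0.0, 1.0]] * 3],
]
biases = [0.0, 0.0]


def window_dot(patch, kernel):
    """Element-wise product of two equally sized 2D lists, summed."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))


# One value per output channel: sum over input channels, then add the bias
outputs = [sum(window_dot(ch, k) for ch, k in zip(channels, filt)) + bias
           for filt, bias in zip(filters, biases)]
print(outputs)  # [11.0, 1.0]
```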


In [17]:
# Input tensor of shape (batch_size, in_channels, height, width)
# Let's assume batch_size=1, in_channels=3 (RGB image), and each channel is 3x3
input_tensor = torch.tensor([[[[1.0, 2.0, 3.0],
                               [4.0, 5.0, 6.0],
                               [7.0, 8.0, 9.0]],  # Red Channel
                             
                              [[9.0, 8.0, 7.0],
                               [6.0, 5.0, 4.0],
                               [3.0, 2.0, 1.0]],  # Green Channel
                             
                              [[0.0, 1.0, 0.0],
                               [1.0, 0.0, 1.0],
                               [0.0, 1.0, 0.0]]]])  # Blue Channel

# Conv2d layer with 3 input channels and 3 output channels, kernel size 3x3
conv_layer = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=0)

# Initialize weights for 3 output channels (each has 3 filters for 3 input channels)
conv_layer.weight = torch.nn.Parameter(torch.tensor([
    # Filters for output channel 1
    [[[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]],   # Filter for Red Channel
     [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],     # Filter for Green Channel
     [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]],    # Filter for Blue Channel
    
    # Filters for output channel 2
    [[[1.0, -1.0, 1.0], [1.0, -1.0, 1.0], [1.0, -1.0, 1.0]],  # Filter for Red Channel
     [[-1.0, 1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, 1.0, -1.0]], # Filter for Green Channel
     [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]],   # Filter for Blue Channel
    
    # Filters for output channel 3
    [[[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],     # Filter for Red Channel
     [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]],   # Filter for Green Channel
     [[-1.0, 1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, 1.0, -1.0]]]]))  # Filter for Blue Channel

conv_layer.bias = torch.nn.Parameter(torch.tensor([0.0, 0.0, 0.0]))  # Bias for all 3 output channels

# Apply the convolution
output = conv_layer(input_tensor)

print("Input tensor shape:", input_tensor.shape)
print("Output tensor shape:", output.shape)
print("Output feature maps:\n", output)


Input tensor shape: torch.Size([1, 3, 3, 3])
Output tensor shape: torch.Size([1, 3, 1, 1])
Output feature maps:
 tensor([[[[11.]],

         [[ 1.]],

         [[26.]]]], grad_fn=<ConvolutionBackward0>)


torch.nn.functional.conv2d
Key points:
- Input tensor: Needs to be of shape (batch_size, in_channels, height, width).
- Kernel: Needs to be of shape (out_channels, in_channels, kernel_height, kernel_width).
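- Output: A tensor of shape (batch_size, out_channels, out_height, out_width). The spatial size depends on the kernel size, stride, and padding: with padding=0 it is smaller than the input, while suitable padding (e.g., padding=1 for a 3x3 kernel at stride 1) keeps it the same.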

In [19]:
# nn.functional sample

import torch.nn.functional as F

# Define the input tensor (batch_size, channels, height, width)
input_tensor = torch.randn(1, 1, 5, 5)  # Example: batch_size=1, 1 channel, 5x5 image

# Define the kernel (filter)
kernel = torch.randn(1, 1, 3, 3)  # 1 output channel, 1 input channel, 3x3 kernel

# Perform the convolution (stride=1 and padding=0 are the defaults, shown here explicitly)
output_tensor = F.conv2d(input_tensor, kernel, stride=1, padding=0)

print(output_tensor.shape)


torch.Size([1, 1, 3, 3])


In NumPy, you can perform convolution using the numpy.convolve() function for 1D arrays. For higher-dimensional arrays (e.g., 2D or 3D), use scipy.signal.convolve() from SciPy, which is a separate library built on NumPy.

In [20]:
import numpy as np

# Define two 1D arrays
a = np.array([1, 2, 3])
b = np.array([0, 1, 0.5])

# Perform convolution
result = np.convolve(a, b, mode='full')

print(result)


[0.  1.  2.5 4.  1.5]


Parameters of numpy.convolve():
- a: First input array.
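- v: Second input array (the kernel); this is the array passed as b in the example above.
- mode:
  - 'full' (default): Returns the full convolution result, of length len(a) + len(v) - 1.
  - 'valid': Returns only elements where the arrays fully overlap.
  - 'same': Returns output whose length is that of the longer input, centered with respect to the 'full' result.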
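To make the modes concrete, here is a small sketch that computes the 'full' result by hand and then slices out 'same' and 'valid' for the 3-element arrays used above. `convolve_full` is a hypothetical helper written for this note, not NumPy's implementation. The slice offsets are hard-coded for two inputs of length 3 and would differ for other lengths.

```python
def convolve_full(a, v):
    """1D convolution: out[n] is the sum of a[i] * v[n - i] over all valid i."""
    out = [0.0] * (len(a) + len(v) - 1)
    for i, a_i in enumerate(a):
        for j, v_j in enumerate(v):
            out[i + j] += a_i * v_j
    return out


full = convolve_full([1, 2, 3], [0, 1, 0.5])
same = full[1:4]    # centered slice, same length as the inputs
valid = full[2:3]   # the single position where both arrays fully overlap

print(full)   # [0.0, 1.0, 2.5, 4.0, 1.5]
print(same)   # [1.0, 2.5, 4.0]
print(valid)  # [2.5]
```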

In [22]:
from scipy import signal
import numpy as np

# Define a 2D array (input image)
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Define a 2D kernel
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]])

# Perform 2D convolution
result = signal.convolve(a, kernel, mode='same')

print(result)


[[  2   1  -4]
 [ -3   0  -7]
 [-16 -11 -22]]


Step-by-Step Process:
1. Padding the Input Array:

Since we're using mode='same', the output array will have the same size as the input array (3x3). To achieve this, we need to pad the input array so that the kernel can be applied to every element in the original array without going out of bounds.
The kernel is 3x3, so we pad the input array with a 1-pixel border of zeros. The padding is necessary because the kernel will partially overlap the edges of the input.

The padded input array looks like this:

In [None]:
[[0, 0, 0, 0, 0],
 [0, 1, 2, 3, 0],
 [0, 4, 5, 6, 0],
 [0, 7, 8, 9, 0],
 [0, 0, 0, 0, 0]]

2. Sliding the Kernel:

Now, we slide the kernel over the padded input array. For each position of the kernel, we take the region of the input that overlaps with the kernel, multiply each corresponding element of the kernel and the input, and sum the results.
Strictly speaking, convolution flips the kernel along both axes before sliding it. This kernel is symmetric, so the flip changes nothing here.
The result of each position is stored in the corresponding location in the output array.
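The two steps can be sketched in plain Python as follows. This is an illustration of the procedure described above, not SciPy's actual implementation. The helper name `convolve2d_same` is made up for this note, and it assumes a kernel with odd height and width.

```python
def convolve2d_same(image, kernel):
    """Zero-pad, flip the kernel, then slide it over the input (odd-sized kernels)."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])

    # Step 1: surround the input with a border of zeros
    padded = [[0] * (w + 2 * pw) for _ in range(ph)]
    padded += [[0] * pw + list(row) + [0] * pw for row in image]
    padded += [[0] * (w + 2 * pw) for _ in range(ph)]

    # Convolution flips the kernel along both axes
    flipped = [row[::-1] for row in kernel[::-1]]

    # Step 2: multiply each window by the kernel and sum the products
    return [[sum(padded[i + di][j + dj] * flipped[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w)]
            for i in range(h)]


a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
kernel = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]

result = convolve2d_same(a, kernel)
for row in result:
    print(row)
```

The rows printed should agree with the signal.convolve(a, kernel, mode='same') output shown earlier.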