# Learning about Convolutions

Learning about convolutions step-by-step.

In [1]:
import torch
import torch.nn as nn

import pynvml  # type: ignore[import]


In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [3]:
device

device(type='cuda', index=0)

## 1D Convolutions

Let's start with a 3x4 tensor.

In [4]:
x = torch.tensor([[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]])

In [5]:
x.shape

torch.Size([3, 4])

For `Conv1d()`, `in_channels` has to equal 3 because we have 3 rows: a 2D input is read as (channels, length), so each row is one channel.

In [6]:
conv = torch.nn.Conv1d(in_channels = 3, out_channels = 3, kernel_size = 1)
conv(x)

tensor([[-1.1443,  2.6112, -3.6480,  5.1149],
        [ 0.1373, -1.4772,  1.2136, -2.5536],
        [-0.5008,  1.3594, -1.7409,  2.5995]], grad_fn=<SqueezeBackward1>)

Changing `out_channels` controls the number of rows in the output tensor.

In [7]:
conv = torch.nn.Conv1d(in_channels = 3, out_channels = 1, kernel_size = 1)
conv(x)

tensor([[ 0.5230, -2.5763,  2.5892, -4.6425]], grad_fn=<SqueezeBackward1>)

Now let's change the input tensor just a little bit at `x[1,0]`

In [8]:
x = torch.tensor([[1., -2., 3., -4.],
                  [5., -2., 3., -4.],
                  [1., -2., 3., -4.]])

Now run the same `Conv1d()` over this tensor. 

(Note that we have not re-initialized the `Conv1d` layer, so it is the same instance, with the same weights, as above.)

In [9]:
conv(x)

tensor([[ 2.2879, -2.5763,  2.5892, -4.6425]], grad_fn=<SqueezeBackward1>)

Notice that only the first item of the output tensor has changed. So with `kernel_size = 1`, `Conv1d` processes the tensor "column-by-column", and `in_channels` tells it how many rows (channels) to expect in each column.
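To make the "column-by-column" picture concrete, here is a minimal sketch (not part of the original notebook) that rebuilds the `kernel_size = 1` output as a plain matrix product. The seed and the names `W`, `b`, and `manual` are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only so reruns give the same weights

x = torch.tensor([[1., -2., 3., -4.],
                  [5., -2., 3., -4.],
                  [1., -2., 3., -4.]])

conv = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=1)

# conv.weight has shape (out_channels, in_channels, kernel_size) = (1, 3, 1).
W = conv.weight[:, :, 0]   # (1, 3): one weight per input row
b = conv.bias[:, None]     # (1, 1): broadcast across the columns

# Each output column is a weighted sum of the matching input column, plus bias.
manual = W @ x + b         # (1, 4)

print(torch.allclose(conv(x), manual, atol=1e-6))
```

Because every output column only sees its own input column, changing `x[1,0]` can only move the first output value.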


### Kernel Size Changed

But now let's run everything again with a different kernel size.

In [10]:
x = torch.tensor([[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]])

In [11]:
conv = torch.nn.Conv1d(in_channels = 3, out_channels = 1, kernel_size = 2)
conv(x)

tensor([[ 0.6715, -1.9609,  2.0787]], grad_fn=<SqueezeBackward1>)

Note the size/shape of the output has changed from (1,4) to (1,3).

 1. The first item of the output is processed from the first and second columns of x
 2. Second item processed from second and third.
 3. Third item processed from third and fourth.

This is because the default `stride` value of `Conv1d()` is 1. `kernel_size` determines the width of the window that passes over the tensor. `stride` controls how far the window moves at each step.
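The sliding window can also be written out by hand. The following is a rough sketch under the assumption that padding is 0 and dilation is 1 (the defaults); the loop and the names `n_out` and `manual` are illustrative and not from the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatable weights

x = torch.tensor([[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]])

conv = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=2)  # stride defaults to 1

k = conv.kernel_size[0]
s = conv.stride[0]
n_out = (x.shape[1] - k) // s + 1   # number of windows that fit: (4 - 2) // 1 + 1

w = conv.weight[0]                  # (3, 2): weights for the single output channel
values = []
for i in range(n_out):
    window = x[:, i * s : i * s + k]            # columns i .. i+k-1, all rows
    values.append((w * window).sum() + conv.bias[0])

manual = torch.stack(values).unsqueeze(0)       # (1, n_out)
print(n_out, torch.allclose(conv(x), manual, atol=1e-6))
```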

In [12]:
x = torch.tensor([[1., -2., 3., -4.],
                  [5., -2., 3., -4.],
                  [1., -2., 3., -4.]])

In [13]:
conv(x)

tensor([[ 1.5697, -1.9609,  2.0787]], grad_fn=<SqueezeBackward1>)

Again, only the first value has changed. But now let's change the numbers in the second column of x.

In [14]:
x = torch.tensor([[1., -2., 3., -4.],
                  [1., 2.5, 3., -4.],
                  [1., -2., 3., -4.]])

In [15]:
conv(x)

tensor([[ 2.3560, -0.9505,  2.0787]], grad_fn=<SqueezeBackward1>)

Now the first and second values are different. Since both the first and second values are dependent on values in the second column of x (i.e., the kernel for the first and second outputs both include the second column) this makes sense.

### Stride Changed

In [16]:
x = torch.tensor([[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]])

In [17]:
conv = torch.nn.Conv1d(in_channels = 3, out_channels = 1, kernel_size = 2, stride =2)
conv(x)

tensor([[-0.1272, -0.3867]], grad_fn=<SqueezeBackward1>)

Notice that changing `stride` affects the size of the output: it is now (1,2) instead of (1,3).

In [18]:
x = torch.tensor([[1., -2., 3., -4.],
                  [1., 2.5, 3., -4.],
                  [1., -2., 3., -4.]])

In [19]:
conv(x)

tensor([[ 0.5512, -0.3867]], grad_fn=<SqueezeBackward1>)

Now the first output value depends on x's first and second columns' values, and second output depends on x's third and fourth columns' values. So changing the second column's values only affects the first output value.
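Instead of editing columns one at a time, the same dependency pattern can be read off the Jacobian of the layer. This is a small sketch, not part of the original notebook; it assumes the randomly initialized weights are all nonzero, which holds in practice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatable weights

x = torch.tensor([[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]])

conv = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=2, stride=2)

# jac[o, j, r, c] = d conv(x)[o, j] / d x[r, c], shape (1, 2, 3, 4)
jac = torch.autograd.functional.jacobian(conv, x)

# depends[j, c] is True when output j reacts to any row of input column c.
depends = (jac[0] != 0).any(dim=1)
print(depends.tolist())
# [[True, True, False, False], [False, False, True, True]]
```

With `stride=2` the two windows do not overlap, so each input column feeds exactly one output value.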

## 2D Convolutions

So wait a minute... a 1D convolution can still "handle" a 2-dimensional input (or a 3-dimensional one, once a batch dimension is added). The "1D" in 1D convolutions doesn't refer to the number of dimensions of the input, but to the number of dimensions the kernel slides along.
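As a quick illustration of that point, the sketch below (an addition, not from the original notebook) feeds `Conv1d` a batched 3D tensor of shape (N, C, L). The second batch item `2 * x` is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatable weights

x = torch.tensor([[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]])

# Stack two (3, 4) inputs into one batch of shape (N=2, C=3, L=4).
batch = torch.stack([x, 2 * x])

conv = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=2)
out = conv(batch)

print(out.shape)  # torch.Size([2, 1, 3])

# Each batch item is convolved independently with the same weights.
print(torch.allclose(out[0], conv(x), atol=1e-6))
```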

Let's try everything again with `Conv2d`

In [20]:
# Notice the tensor is now 3D instead of 2D.
x = torch.tensor([[[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]]])

In [21]:
x.shape

torch.Size([1, 3, 4])

In [22]:
# NOTE: this won't work! in_channels has to equal 1, the first dimension of torch.Size. Think of this as a grayscale image. in_channels = 3 for RGB images. 
# conv = torch.nn.Conv2d(in_channels = 3, out_channels = 1, kernel_size = 2)
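For reference, here is a hedged sketch of what happens if the commented-out line is run anyway. It assumes a PyTorch version that accepts unbatched (C, H, W) input, as the rest of this notebook does; the exact error message may differ between versions.

```python
import torch
import torch.nn as nn

# Shape (1, 3, 4) is read by Conv2d as (C=1, H=3, W=4).
x = torch.tensor([[[1., -2., 3., -4.],
                   [1., -2., 3., -4.],
                   [1., -2., 3., -4.]]])

bad = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=2)
try:
    bad(x)
    raised = False
except RuntimeError as err:
    # The layer expects 3 channels but the input has only 1.
    raised = True
    print("RuntimeError:", err)

good = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2)
print(good(x).shape)  # torch.Size([1, 2, 3])
```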

In [23]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 2)

In [24]:
conv(x)

tensor([[[ 0.1128,  0.0286, -0.0631],
         [ 0.1128,  0.0286, -0.0631]]], grad_fn=<SqueezeBackward1>)

In [25]:
conv(x).shape

torch.Size([1, 2, 3])

Now let's change `out_channels`

In [26]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 2, kernel_size = 2)

In [27]:
conv(x)

tensor([[[ 0.9924, -1.3988,  2.5015],
         [ 0.9924, -1.3988,  2.5015]],

        [[-0.1061,  1.2732, -0.9195],
         [-0.1061,  1.2732, -0.9195]]], grad_fn=<SqueezeBackward1>)

### Kernel Size Change

In [28]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = (2,3))

In [29]:
conv(x)

tensor([[[ 1.2949, -2.4376],
         [ 1.2949, -2.4376]]], grad_fn=<SqueezeBackward1>)

In [30]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = (3,3))

In [31]:
conv(x)

tensor([[[ 1.0203, -1.4125]]], grad_fn=<SqueezeBackward1>)

At the risk of confusing things, you can say that a `Conv1d` with `kernel_size` = X applied to a 2D tensor of size (R,C) performs the same computation as a `Conv2d` with `kernel_size` = (R,X) applied to a 3D tensor of size (1, R, C), for any value of X, provided the two layers share the same weights.
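The equivalence can be checked numerically by copying the weights from one layer into the other. This is a minimal sketch, assuming default stride, padding, and dilation; the names `R`, `C`, `X`, `conv1d`, and `conv2d` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatable weights

R, C, X = 3, 4, 2
x2d = torch.rand(R, C)

conv1d = nn.Conv1d(in_channels=R, out_channels=1, kernel_size=X)
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(R, X))

with torch.no_grad():
    # Conv1d weight is (1, R, X); Conv2d weight is (1, 1, R, X).
    conv2d.weight.copy_(conv1d.weight.unsqueeze(1))
    conv2d.bias.copy_(conv1d.bias)

out1d = conv1d(x2d)                # (1, C - X + 1)
out2d = conv2d(x2d.unsqueeze(0))   # (1, 1, C - X + 1)

print(torch.allclose(out1d, out2d.squeeze(0), atol=1e-6))
```

The kernel of the `Conv2d` spans all R rows, so it can only slide along the columns, exactly like the `Conv1d` window.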

### Stride Changed

In [32]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = (3,3), stride=2)

In [33]:
conv(x)

tensor([[[1.6488]]], grad_fn=<SqueezeBackward1>)

Notice that when the kernel window would extend past the edge of the tensor, the convolution doesn't break; it simply produces no output for that position.

In [34]:
# Same tensor as before, but with the fourth column removed.
x = torch.tensor([[[1., -2., 3.],
                  [1., -2., 3.],
                  [1., -2., 3.]]])

In [35]:
conv(x)

tensor([[[1.6488]]], grad_fn=<SqueezeBackward1>)

Given the (3,3) kernel and `stride=2`, only one window fits, so the fourth column's values don't affect the output of the 2D convolution.
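The same observation as a self-contained check (a sketch added for illustration; `x4` and `x3` are hypothetical names for the 4-column tensor and its 3-column slice):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatable weights

x4 = torch.tensor([[[1., -2., 3., -4.],
                    [1., -2., 3., -4.],
                    [1., -2., 3., -4.]]])
x3 = x4[..., :3]   # drop the fourth column

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3, 3), stride=2)

# Along the width: (4 - 3) // 2 + 1 = 1 window, which covers columns 0..2 only.
out4 = conv(x4)
out3 = conv(x3)

print(out4.shape, torch.allclose(out4, out3, atol=1e-6))
```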

### Padding Changed

Start with original tensor and a (3,3) kernel convolution

In [36]:
# Back to the original tensor of shape (1, 3, 4).
x = torch.tensor([[[1., -2., 3., -4.],
                  [1., -2., 3., -4.],
                  [1., -2., 3., -4.]]])

In [37]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 3)

In [38]:
conv(x)

tensor([[[ 0.2441, -0.9203]]], grad_fn=<SqueezeBackward1>)

Change `padding` from 0 to 1. This adds a border of 1 extra cell around the tensor. Each added cell is filled with 0 (`padding_mode='zeros'`).

In [39]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 3, padding=1)

In [40]:
conv(x)

tensor([[[-0.7595,  1.1359, -1.5402,  0.5786],
         [-1.5924,  2.3230, -3.0816,  0.8236],
         [-1.1929,  1.6140, -2.0629, -0.2663]]], grad_fn=<SqueezeBackward1>)
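The zero border can be reproduced by hand with `torch.nn.functional.pad`. The sketch below is an addition to the notebook; it adds an explicit batch dimension before calling `F.conv2d`, and the names `x_padded` and `manual` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed for repeatable weights

x = torch.tensor([[[1., -2., 3., -4.],
                   [1., -2., 3., -4.],
                   [1., -2., 3., -4.]]])

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

# Pad order is (left, right, top, bottom): (1, 3, 4) -> (1, 5, 6), filled with zeros.
x_padded = F.pad(x, (1, 1, 1, 1))

# Apply the same weights with no built-in padding to the pre-padded tensor.
manual = F.conv2d(x_padded.unsqueeze(0), conv.weight, conv.bias).squeeze(0)

print(x_padded.shape, manual.shape)
print(torch.allclose(conv(x), manual, atol=1e-6))
```

With a 3x3 kernel, `padding=1` keeps the output the same height and width as the input.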

### Dilation

Start with original tensor and a (2,2) kernel convolution

NOTE: The default is `dilation=1`, not 0! A dilation of 1 means the kernel elements sit next to each other with no gaps.

In [41]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 2)

In [42]:
conv(x)

tensor([[[-0.5161,  1.7815, -1.4938],
         [-0.5161,  1.7815, -1.4938]]], grad_fn=<SqueezeBackward1>)

In [43]:
conv = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 2, dilation=2)

In [44]:
conv(x)

tensor([[[ 2.3377, -2.9674]]], grad_fn=<SqueezeBackward1>)

See "Dilated convolution animations" in https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md for visualization of dilations
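One way to think about dilation: a (2,2) kernel with `dilation=2` covers the same area as a (3,3) kernel whose middle row and column are zero. The following sketch checks this; the layer names `dilated` and `dense` are illustrative and not from the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatable weights

x = torch.tensor([[[1., -2., 3., -4.],
                   [1., -2., 3., -4.],
                   [1., -2., 3., -4.]]])

dilated = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=2, dilation=2)

# Effective extent of the dilated kernel: dilation * (kernel_size - 1) + 1 = 3.
dense = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
with torch.no_grad():
    dense.weight.zero_()
    # Place the 2x2 weights at the corners of the 3x3 kernel.
    dense.weight[0, 0, ::2, ::2] = dilated.weight[0, 0]
    dense.bias.copy_(dilated.bias)

print(dilated(x).shape)  # torch.Size([1, 1, 2])
print(torch.allclose(dilated(x), dense(x), atol=1e-6))
```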

## Conv3D

In [45]:
x = torch.rand((7,100,100,100))
x.to(device)  # returns a tensor on `device`; the result is not assigned, so x itself stays where it was

tensor([[[[9.0849e-02, 2.5379e-01, 7.5377e-01,  ..., 6.2287e-01,
           9.6807e-01, 1.3761e-01],
          [6.0349e-01, 4.6936e-01, 4.2243e-01,  ..., 4.3136e-01,
           6.6477e-01, 7.3146e-01],
          [1.3589e-02, 1.5968e-01, 5.6006e-01,  ..., 3.0035e-02,
           6.3368e-01, 8.4405e-01],
          ...,
          [7.3968e-01, 9.6545e-03, 3.2401e-01,  ..., 9.5531e-01,
           5.2746e-01, 8.2502e-01],
          [8.6631e-01, 5.6567e-01, 2.4629e-01,  ..., 9.9745e-01,
           8.5796e-01, 2.6964e-01],
          [4.4381e-01, 1.5597e-01, 7.7750e-01,  ..., 9.2091e-01,
           5.7407e-01, 3.3720e-01]],

         [[5.4880e-01, 4.2156e-02, 2.9177e-01,  ..., 2.2486e-01,
           1.8513e-01, 6.5476e-01],
          [2.5460e-01, 1.6827e-01, 6.1202e-01,  ..., 9.6766e-01,
           2.2583e-01, 7.5692e-01],
          [9.8859e-01, 2.0411e-01, 7.8294e-01,  ..., 2.7890e-01,
           6.7005e-01, 7.2905e-02],
          ...,
          [5.5856e-01, 8.3296e-01, 2.7099e-01,  ..., 9.9296
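Note that `Tensor.to()` is not an in-place operation, so the cell above only displays a copy. A minimal sketch of actually running a convolution on `device` might look like the following; the small tensor size and the name `x_dev` are assumptions chosen to keep the example quick.

```python
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

x = torch.rand((1, 7, 8, 8, 8))          # small stand-in for the larger tensor
conv = nn.Conv3d(in_channels=7, out_channels=1, kernel_size=3)

x_dev = x.to(device)     # tensors: .to() returns a tensor on the target device
conv = conv.to(device)   # modules: .to() moves the parameters and returns the module

out = conv(x_dev)        # input and weights must live on the same device
print(x.device, x_dev.device, out.device, out.shape)
```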

In [46]:
x = x.unsqueeze(0)
x.shape

torch.Size([1, 7, 100, 100, 100])

In [47]:
(N, C, D, H, W) = x.shape

In [48]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=13, stride=3)
conv(x).shape # 0.6s

torch.Size([1, 1, 30, 30, 30])

See Shape section in: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html

In [49]:
# D_out = floor((D + 2*padding - dilation*(kernel_size-1) - 1)/stride) + 1
# H_out = floor((H + 2*padding - dilation*(kernel_size-1) - 1)/stride) + 1
# W_out = floor((W + 2*padding - dilation*(kernel_size-1) - 1)/stride) + 1
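The shape formula can be wrapped in a small helper and compared against a real layer. This is a sketch; `conv_out_size` is a hypothetical helper name, and the test tensor is kept small so the check runs quickly.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one dimension, using floor division as in the PyTorch docs."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Values used elsewhere in this notebook.
print(conv_out_size(100, 13, stride=3))    # 30
print(conv_out_size(100, 10))              # 91
print(conv_out_size(500, 100, stride=50))  # 9

# Cross-check against an actual Conv3d on a small input.
x = torch.rand((1, 7, 20, 20, 20))
conv = nn.Conv3d(in_channels=7, out_channels=1, kernel_size=5, stride=3)
expected = conv_out_size(20, 5, stride=3)
print(conv(x).shape, expected)
```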

In [50]:
# D = 500

# padding = 0
# dilation = 1
# stride = 16

# kernel_size = 260

# D_out = ((D + 2*padding - dilation*(kernel_size-1) - 1)/stride) + 1

# D_out = 16

In [51]:
D = 30
D_out = 16
padding = 0
dilation = 1
stride = 1

kernel_size = (((D_out - 1)*stride - D + 1 - 2*padding)/(-dilation)) + 1

kernel_size

15.0

In [52]:
D = 64
D_out = 4
padding = 0
dilation = 1
stride = 1

kernel_size = (((D_out - 1)*stride - D + 1 - 2*padding)/(-dilation)) + 1

kernel_size

61.0

In [53]:
D = 4
D_out = 1
padding = 0
dilation = 1
stride = 1

kernel_size = (((D_out - 1)*stride - D + 1 - 2*padding)/(-dilation)) + 1

kernel_size

4.0
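The three cells above repeat the same rearranged formula, so it can be folded into one function. The sketch below uses integer arithmetic and checks the result by plugging it back into the forward formula; `kernel_size_for` and `conv_out_size` are hypothetical helper names. When the stride is larger than 1, several kernel sizes can give the same output size, and this version returns the largest one.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Forward shape formula for one dimension (floor division)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def kernel_size_for(d_in, d_out, stride=1, padding=0, dilation=1):
    """Largest kernel size that maps d_in to d_out, or ValueError if none exists."""
    k = (d_in + 2 * padding - 1 - (d_out - 1) * stride) // dilation + 1
    if k < 1 or conv_out_size(d_in, k, stride, padding, dilation) != d_out:
        raise ValueError("no kernel size gives the requested output size")
    return k

print(kernel_size_for(30, 16))              # 15
print(kernel_size_for(64, 4))               # 61
print(kernel_size_for(4, 1))                # 4
print(kernel_size_for(500, 16, stride=16))  # 260
```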

In [54]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=10)
conv(x).shape # 0.6s

torch.Size([1, 1, 91, 91, 91])

In [55]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=50)
conv(x).shape # 22.9s

torch.Size([1, 1, 51, 51, 51])

In [56]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=10, stride=10)
conv(x).shape # 0.0s

torch.Size([1, 1, 10, 10, 10])
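The runtimes noted in the comments above roughly track the number of multiply-accumulate operations. The sketch below counts them for a cubic input; `conv3d_macs` is a hypothetical helper, and the count is only a rough proxy, since the actual runtime depends on the backend and hardware.

```python
def conv3d_macs(size, c_in, c_out, kernel_size, stride=1):
    """Multiply-accumulates for a cubic input, assuming padding=0 and dilation=1."""
    out = (size - kernel_size) // stride + 1
    # Every output element needs c_in * kernel_size**3 multiply-accumulates.
    return out ** 3 * c_out * c_in * kernel_size ** 3

for kernel_size, stride in [(10, 1), (50, 1), (10, 10)]:
    macs = conv3d_macs(100, 7, 1, kernel_size, stride)
    print(f"kernel_size={kernel_size:>2} stride={stride:>2} -> {macs:.2e} MACs")
# kernel_size=10 stride= 1 -> 5.27e+09 MACs
# kernel_size=50 stride= 1 -> 1.16e+11 MACs
# kernel_size=10 stride=10 -> 7.00e+06 MACs
```

A larger kernel shrinks the output but makes each output element far more expensive, while a larger stride cuts the number of output elements without changing the cost of each.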

### Larger Input Tensor

Note that some cells have been commented out because of their long runtimes.

In [57]:
x = torch.rand((7,500,500,500))

In [58]:
x = x.unsqueeze(0)
x.shape

torch.Size([1, 7, 500, 500, 500])

In [59]:
(N, C, D, H, W) = x.shape

In [60]:
x.to(device)  # returns a tensor on `device`; the result is not assigned, so x itself stays where it was

tensor([[[[[8.6870e-01, 5.5088e-01, 2.8842e-01,  ..., 3.8341e-01,
            7.5726e-01, 4.9601e-02],
           [9.5913e-01, 9.6857e-01, 6.3369e-02,  ..., 4.3685e-01,
            7.5039e-01, 7.6559e-01],
           [3.4455e-01, 2.7932e-01, 1.7804e-01,  ..., 1.3035e-01,
            4.4856e-01, 6.6264e-02],
           ...,
           [2.0809e-01, 2.0383e-01, 3.8212e-01,  ..., 2.6912e-01,
            8.5388e-01, 3.7634e-01],
           [2.1563e-01, 1.6354e-01, 9.9298e-01,  ..., 3.1063e-01,
            9.0901e-01, 9.6609e-01],
           [6.4401e-02, 5.4281e-01, 8.9284e-01,  ..., 6.6488e-01,
            6.2300e-01, 8.8792e-01]],

          [[1.9460e-02, 2.3498e-01, 2.0282e-01,  ..., 8.7067e-01,
            8.7540e-01, 6.6644e-01],
           [2.5885e-01, 1.4670e-01, 5.2012e-02,  ..., 2.0492e-01,
            9.3198e-01, 4.1850e-01],
           [1.5403e-01, 6.0615e-01, 6.9388e-01,  ..., 3.0092e-01,
            2.8021e-01, 9.3303e-01],
           ...,
           [4.5967e-01, 5.8648e-01, 6.7

In [61]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=10)

In [62]:
# x - torch.Size([1, 7, 500, 500, 500])

# conv(x).shape # torch.Size([1, 1, 491, 491, 491])

# and this takes ~2min to complete

In [63]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=100)

In [64]:
# x - torch.Size([1, 7, 500, 500, 500])

# conv(x).shape 

# takes more than 8 min!!!

In [65]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=100, stride=50)

In [66]:
conv(x).shape # torch.Size([1, 1, 9, 9, 9])

# takes 2 seconds!

torch.Size([1, 1, 9, 9, 9])

In [67]:
conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=100, stride=10)

In [68]:
# conv(x).shape # torch.Size([1, 1, 41, 41, 41])

# and this takes ~2min to complete

## Playing with 3D data

In [69]:
NUM_X = 100
NUM_Y = 100
NUM_Z = 100
BLOCK_INFO = 7
NUM_ORIENTATION = 2

BLOCK_TYPES = 6

In [70]:
x = torch.rand((BLOCK_INFO,NUM_X,NUM_Y,NUM_Z))
x = x.unsqueeze(0)

(N, C, D, H, W) = x.shape

x.to(device)  # result is not assigned, so x itself stays where it was
x.shape

torch.Size([1, 7, 100, 100, 100])

In [71]:
conv1 = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=1)
conv1(x).shape # torch.Size([1, 1, 100, 100, 100])

torch.Size([1, 1, 100, 100, 100])

In [72]:
conv2 = nn.Conv3d(in_channels=1, out_channels=BLOCK_TYPES * NUM_ORIENTATION, kernel_size=1)
conv2(conv1(x)).shape # torch.Size([1, 12, 100, 100, 100])

torch.Size([1, 12, 100, 100, 100])

In [73]:
x = conv1(x)
x = conv2(x)
x = torch.reshape(x, (1, BLOCK_TYPES, NUM_ORIENTATION, NUM_X, NUM_Y, NUM_Z))
x.shape

torch.Size([1, 6, 2, 100, 100, 100])
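It may help to see how `reshape` splits the 12 channels. In the sketch below (an addition, with small spatial dimensions to keep it quick), channel `c` ends up at block type `c // NUM_ORIENTATION` and orientation `c % NUM_ORIENTATION`. The names `y` and `z` are illustrative.

```python
import torch

BLOCK_TYPES = 6
NUM_ORIENTATION = 2

# Stand-in for the conv2 output, with a 4x4x4 volume instead of 100x100x100.
y = torch.rand((1, BLOCK_TYPES * NUM_ORIENTATION, 4, 4, 4))
z = torch.reshape(y, (1, BLOCK_TYPES, NUM_ORIENTATION, 4, 4, 4))

# reshape keeps row-major order, so consecutive channels share a block type.
matches = []
for c in range(BLOCK_TYPES * NUM_ORIENTATION):
    block_type, orientation = divmod(c, NUM_ORIENTATION)
    matches.append(torch.equal(z[0, block_type, orientation], y[0, c]))

print(z.shape, all(matches))
```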

In [74]:
# conv = nn.Conv3d(in_channels=C, out_channels=1, kernel_size=13, stride=3)
# conv(x).shape # torch.Size([1, 1, 30, 30, 30])

In [75]:
# conv2 = nn.Conv3d(in_channels=1, out_channels=BLOCK_TYPES * NUM_ORIENTATION, kernel_size=15, stride=1)
# conv2(conv(x)).shape # torch.Size([1, 10, 16, 16, 16])

In [76]:
# conv_t = torch.nn.ConvTranspose3d(in_channels=5, out_channels=10, kernel_size=1)
# conv_t(conv(x)).shape

In [77]:
# x = conv(x)
# x = conv2(x)
# x = torch.reshape(x, (1, 5, 2, 16, 16, 16))
# x.shape

In [78]:
x=x.squeeze()

# Find the index of the maximum value
max_index = torch.argmax(x)

# Convert the flat index to multidimensional indices
indices = []
for dim_size in reversed(x.shape):
    indices.append((max_index % dim_size).item())
    max_index //= dim_size

# Reverse the list of indices to match the tensor's shape
indices.reverse()

print("Indices of the maximum value:", indices)

Indices of the maximum value: [2, 1, 79, 96, 48]
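The index conversion above can be packaged as a function and verified by looking the value back up. This is a sketch on a small random tensor; `unravel` is a hypothetical helper name. Newer PyTorch releases also provide `torch.unravel_index`, which is not used here so the sketch runs on older versions too.

```python
import torch

def unravel(flat_index, shape):
    """Convert a flat (row-major) index into one index per dimension."""
    indices = []
    for dim_size in reversed(shape):
        flat_index, remainder = divmod(flat_index, dim_size)
        indices.append(remainder)
    return tuple(reversed(indices))

torch.manual_seed(0)  # arbitrary seed for repeatable values
x = torch.rand((6, 2, 5, 5, 5))

flat = torch.argmax(x).item()   # index into the flattened tensor
idx = unravel(flat, x.shape)

# Indexing with the recovered tuple should return the maximum value.
print(idx, x[idx].item() == x.max().item())
```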


In [79]:
max_value = torch.max(x)
max_value.item()

1.3865199089050293

In [80]:
x[3, 1, 89, 32, 15]

tensor(0.3554, grad_fn=<SelectBackward0>)