# 1D Convolution Layer Parameters

We compare the behavior of 1D convolution layers in PyTorch on single-channel and multi-channel data. The conclusion is that each input channel receives its *own* set of kernel weights, while all input channels share a single bias per output channel.

In [1]:
import torch
from torch import nn

## Single-channel convolution

First, we create a simple yet illustrative sample.

In [2]:
X1 = torch.tensor([[[1.0, 2.0, 3.0]]])
print(X1)

tensor([[[1., 2., 3.]]])


Next, we take a look at the parameter shapes of a convolution layer with kernel size $2$.

In [3]:
single_conv = nn.Conv1d(1, 1, kernel_size=2)
print(single_conv.weight.shape, single_conv.bias.shape)

torch.Size([1, 1, 2]) torch.Size([1])


Since they are initialized randomly, we set the parameters to simple values, namely all weights to $1$ and the bias to $0$.

In [4]:
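# Recent PyTorch versions reject untracked in-place edits of parameters,
# so we set the values inside torch.no_grad().
with torch.no_grad():
    single_conv.weight.fill_(1.0)
    single_conv.bias.fill_(0.0)
print(single_conv.weight, single_conv.bias, sep="\n")

Parameter containing:
tensor([[[1., 1.]]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)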


With these parameters, each output value should be the sum of the input values inside its convolution window: $1 + 2 = 3$ and $2 + 3 = 5$.
To confirm this, we run a forward pass of the layer on the sample.

In [5]:
print(single_conv(X1))

tensor([[[3., 5.]]], grad_fn=<SqueezeBackward1>)
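
As a cross-check, the same window sums can be reproduced without the layer. The following is a minimal sketch, assuming that `Tensor.unfold` is an acceptable way to extract the sliding windows; the names `x`, `conv`, `windows`, and `manual` are illustrative and not part of the notebook above.

```python
import torch
from torch import nn

x = torch.tensor([[[1.0, 2.0, 3.0]]])    # shape (batch=1, channels=1, length=3)

conv = nn.Conv1d(1, 1, kernel_size=2)
with torch.no_grad():                     # set parameters without autograd tracking
    conv.weight.fill_(1.0)
    conv.bias.fill_(0.0)

# Extract all windows of size 2 (step 1) along the length axis and sum each one.
windows = x.unfold(2, 2, 1)               # shape (1, 1, 2, 2): [[1, 2], [2, 3]]
manual = windows.sum(dim=-1)              # shape (1, 1, 2)

layer_out = conv(x).detach()
print(manual)                             # tensor([[[3., 5.]]])
print(torch.allclose(manual, layer_out))  # True
```

With all weights equal to $1$ and a zero bias, the layer reduces to exactly this sliding-window sum.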


## Multi-channel convolution

Again, we create a sample, but this time with two channels.

In [6]:
X2 = torch.tensor([[[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]]])
print(X2)

tensor([[[1., 2., 3.],
         [4., 5., 6.]]])


Now it becomes interesting: what are the parameter shapes of a convolution layer with, again, kernel size $2$ but two input channels?

In [7]:
dual_conv = nn.Conv1d(2, 1, kernel_size=2)
print(dual_conv.weight.shape, dual_conv.bias.shape)

torch.Size([1, 2, 2]) torch.Size([1])


We see that each input channel received its own set of weights (the weight shape is output channels × input channels × kernel size), whereas there is only one bias for the single output channel. For comparability, we again set all weights to $1$ and the bias to $0$.

In [8]:
# As before, edit the parameters in place inside torch.no_grad().
with torch.no_grad():
    dual_conv.weight.fill_(1.0)
    dual_conv.bias.fill_(0.0)
print(dual_conv.weight, dual_conv.bias, sep="\n")

Parameter containing:
tensor([[[1., 1.],
         [1., 1.]]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)


A forward pass on the multi-channel input shows that the one-dimensional convolution slides along the last (time) axis while each window spans all input channels. With all weights equal to $1$, we therefore expect $1 + 2 + 4 + 5 = 12$ and $2 + 3 + 5 + 6 = 16$.

In [9]:
print(dual_conv(X2))

tensor([[[12., 16.]]], grad_fn=<SqueezeBackward1>)
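
The multi-channel result can be decomposed into per-channel convolutions. The sketch below uses the functional interface `torch.nn.functional.conv1d` with explicitly constructed tensors instead of the layer object; `weight`, `bias`, `per_channel`, and `manual` are illustrative names chosen here.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]]])   # shape (batch=1, channels=2, length=3)
weight = torch.ones(1, 2, 2)             # (out_channels, in_channels, kernel_size)
bias = torch.zeros(1)                    # one bias for the single output channel

full = F.conv1d(x, weight, bias)         # shape (1, 1, 2)

# Convolve each input channel with its own kernel slice, without any bias.
per_channel = [
    F.conv1d(x[:, c:c + 1, :], weight[:, c:c + 1, :])
    for c in range(x.shape[1])
]

# Sum the per-channel results and add the bias once, after the sum.
manual = sum(per_channel) + bias

print(per_channel[0])                    # tensor([[[3., 5.]]])
print(per_channel[1])                    # tensor([[[ 9., 11.]]])
print(torch.allclose(manual, full))      # True
```

The first channel contributes $3$ and $5$, the second $9$ and $11$, and their sums give the $12$ and $16$ seen above.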


These mechanics are a sensible implementation of a one-dimensional convolution. Since each input channel has its own weights, it can be filtered differently from all the others. The bias, in contrast, is a constant offset that is added once to the summed result, so a single value per output channel suffices and is shared among the input channels.
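
To illustrate how this extends beyond a single output channel, here is a small sketch with three output channels. The layer size, the hand-picked weights, and the bias values are assumptions made purely for illustration; they do not appear in the cells above.

```python
import torch
from torch import nn

x = torch.tensor([[[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]]])          # shape (1, 2, 3)

conv = nn.Conv1d(in_channels=2, out_channels=3, kernel_size=2)
print(conv.weight.shape, conv.bias.shape)
# torch.Size([3, 2, 2]) torch.Size([3])

with torch.no_grad():
    conv.weight.zero_()
    conv.weight[0, 0] = 1.0    # output channel 0 only looks at input channel 0
    conv.weight[1, 1] = 1.0    # output channel 1 only looks at input channel 1
    conv.weight[2] = 1.0       # output channel 2 sums over both input channels
    conv.bias.copy_(torch.tensor([0.0, 0.0, 10.0]))  # one bias per output channel

out = conv(x).detach()         # shape (1, 3, 2)
print(out)
```

Each output channel has its own kernel slice for every input channel, but only one bias: the third row of `out` is the two-channel window sum ($12$, $16$) shifted by that channel's bias of $10$.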