# Torch Layers

PyTorch is built around `torch.nn`, which provides classes such as `torch.nn.Module` and `torch.nn.Parameter`. Think of these as basic building blocks, like Lego bricks, that let you build complex structures. Layers and models inherit from `torch.nn.Module`, which already defines many useful methods, so we can build our own blocks by defining as little as an `__init__()` and a `forward()` method.

For example, we can create a new layer with something as simple as:
```python
import torch
class BasicLinear(torch.nn.Module): # Inherits from nn.Module. Almost everything in PyTorch is a nn.Module
    def __init__(self):
        super().__init__()
        self.weights = torch.nn.Parameter( # defines weights Parameter
            data=torch.randn(1), # starts with random value
            requires_grad=True # activate autograd, which allows us to track and update this parameter during backward pass
        )
        self.bias = torch.nn.Parameter( # defines bias Parameter, same as above but with different syntax
            torch.randn(1, dtype=torch.float32), # explicitly sets the dtype to float32 (the default)
            requires_grad=True
        )
    
    def forward(self, x: torch.Tensor): # defines the way data will be transformed in the layer or block
        # linear = xA^T + b
        # A^T
        weights = self.weights.t() # transposes weights tensor
        # x
        input_tensor = x
        # b
        bias = self.bias

        # xA^T
        mul = torch.matmul(input_tensor, weights) # calculates tensor dot product between transposed weights and input

        # xA^T + b
        output = mul + bias # adds offset (bias)

        return output
```
In the forward method we follow the computation performed by [`nn.functional.linear`](https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html), which is implemented in C++. [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) calls `nn.functional.linear` in its own forward method. Either way, the transformation being applied is:

$$y=x A^{T}+b.$$
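
As a quick sanity check of the formula above, the sketch below compares a hand-written $xA^{T}+b$ with `torch.nn.functional.linear`. The tensor names and sizes (`x`, `A`, `b`, 4 samples, 3 inputs, 2 outputs) are arbitrary choices made for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(4, 3)  # batch of 4 samples with 3 input features each
A = torch.randn(2, 3)  # weights stored as (out_features, in_features)
b = torch.randn(2)     # one bias value per output feature

manual = x @ A.T + b            # y = x A^T + b written by hand
functional = F.linear(x, A, b)  # the built-in functional version

print(manual.shape)
print(torch.allclose(manual, functional, atol=1e-6))
```

Note that the weights are stored as `(out_features, in_features)`, which is why the transpose appears in the formula.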

In this notebook we will see how to create our own layers and explore some of the predefined layers included with PyTorch.

In [123]:
# Ensures versions are correct
! pip install torch==2.3.0 numpy==1.25.2 pillow==9.4.0 torchvision==0.18

import torch
import numpy as np
import PIL

print(f"Torch version: {torch.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"PIL version: {PIL.__version__}")
print(f"GPU enabled: {torch.cuda.is_available()}")

Torch version: 2.3.0+cu121
Numpy version: 1.25.2
PIL version: 9.4.0
GPU enabled: False


## Basic Building Blocks



### Custom Layers

You can build your own layers using class definitions that inherit from torch.nn.Module. These can be used as building blocks for larger networks.

In [124]:
import torch

torch.manual_seed(42)


class BasicLinear(
    torch.nn.Module
):  # Inherits from nn.Module. Almost everything in PyTorch is a nn.Module
    def __init__(self, input_features, output_features):
        super().__init__()
        self.weights = torch.nn.Parameter(  # defines weights Parameter
            data=torch.randn(
                size=(output_features, input_features)
            ),  # starts with random value
            requires_grad=True,  # activate autograd, which allows us to track and update this parameter during backward pass
        )
        self.bias = torch.nn.Parameter(  # defines bias Parameter, same as above but with different syntax
            torch.randn(
                size=(output_features,), dtype=torch.float32
            ),  # explicitly sets the dtype to float32 (the default)
            requires_grad=True,
        )

    def forward(
        self, x: torch.Tensor
    ) -> torch.Tensor:  # defines the way data will be transformed in the layer or block
        # linear = xA^T + b
        # A^T
        weights = self.weights.t()  # transposes weights tensor
        # x
        input_tensor = x
        # b
        bias = self.bias

        # xA^T
        mul = torch.matmul(
            input_tensor, weights
        )  # calculates tensor dot product between transposed weights and input

        # xA^T + b
        output = mul + bias  # adds offset (bias)

        return output

Don't worry about the dimensions yet; we will look at them in more detail later on.

In [125]:
layer = BasicLinear(input_features=10, output_features=5)
layer

BasicLinear()
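
The printed representation above is empty because a custom module only lists its submodules by default. One possible way to make it more informative is to override `extra_repr()`, a hook provided by `torch.nn.Module`. The sketch below uses a hypothetical `DescribedLinear` class (not part of this notebook) that stores its sizes and reports them.

```python
import torch


class DescribedLinear(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_features, output_features):
        super().__init__()
        # keep the sizes around so extra_repr can report them
        self.input_features = input_features
        self.output_features = output_features
        self.weights = torch.nn.Parameter(torch.randn(output_features, input_features))
        self.bias = torch.nn.Parameter(torch.randn(output_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weights.t() + self.bias

    def extra_repr(self) -> str:
        # the returned string is placed inside the parentheses of the module repr
        return f"input_features={self.input_features}, output_features={self.output_features}"


described = DescribedLinear(input_features=10, output_features=5)
print(described)  # DescribedLinear(input_features=10, output_features=5)
```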

In [126]:
for param in layer.parameters():
    print(param, param.shape)

Parameter containing:
tensor([[ 1.9269,  1.4873,  0.9007, -2.1055,  0.6784, -1.2345, -0.0431, -1.6047,
         -0.7521,  1.6487],
        [-0.3925, -1.4036, -0.7279, -0.5594, -0.7688,  0.7624,  1.6423, -0.1596,
         -0.4974,  0.4396],
        [-0.7581,  1.0783,  0.8008,  1.6806,  1.2791,  1.2964,  0.6105,  1.3347,
         -0.2316,  0.0418],
        [-0.2516,  0.8599, -1.3847, -0.8712,  0.0780,  0.5258, -0.4880,  1.1914,
         -0.8140, -0.7360],
        [-0.8371, -0.9224, -0.0635,  0.6756, -0.0978,  1.8446, -1.1845,  1.3835,
         -1.2024,  0.7078]], requires_grad=True) torch.Size([5, 10])
Parameter containing:
tensor([-0.5687,  1.2580, -1.5890, -1.1208,  0.8423], requires_grad=True) torch.Size([5])


In [127]:
data = torch.randn(
    size=(5, 10),
)
data

tensor([[ 0.3383,  1.6992,  0.0109, -0.3387, -1.3407, -0.5854,  0.5362,  0.5246,
         -1.4692,  1.4332],
        [ 0.7440, -0.4816, -1.0495,  0.6039, -1.7223, -0.8278, -0.4976,  0.4747,
         -2.5095,  0.4880],
        [ 0.7846,  0.0286,  0.6408,  0.5832,  0.2191,  0.5526, -0.1853,  0.7528,
          0.4048,  0.1785],
        [ 0.2649,  1.2732, -0.8905,  0.4098,  1.9312,  1.0119, -1.4364, -1.1299,
         -0.1360,  1.6354],
        [ 0.6547,  0.5760,  1.1415,  0.0186, -1.8058,  0.9254, -0.3753,  1.0331,
         -0.6867,  0.6368]])

In [128]:
output = layer(data)
print(output.shape)
output

torch.Size([5, 5])


tensor([[ 5.7495,  1.6639, -1.6199,  0.6273,  0.6853],
        [-0.2628,  3.3308, -4.8425,  1.1273,  4.3891],
        [-1.4089, -0.1772,  1.1424, -1.8549,  2.4110],
        [ 4.9050, -2.3195,  1.0556, -0.2721,  2.9166],
        [ 0.0959,  1.2854, -0.2936, -0.5371,  4.7370]], grad_fn=<AddBackward0>)

### Building Networks and complex blocks

Now that we have our custom layer, we can use it to build more complex structures, like blocks or even entire networks.

Let's start by defining a small neural network using our previously defined custom layer.

In [129]:
import torch

torch.manual_seed(42)


class NeuralNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = BasicLinear(
            input_features=10,  # expects input with shape (n, 10)
            output_features=15,
        )
        self.layer2 = BasicLinear(
            input_features=15,
            output_features=20,  # should output with shape (n, 20)
        )

    def forward(self, x: torch.Tensor):
        output_layer1 = self.layer1(x)
        output_layer2 = self.layer2(output_layer1)
        return output_layer2

In [130]:
model = NeuralNet()
for param in model.parameters():
    print(param, param.shape)
model

Parameter containing:
tensor([[ 1.9269e+00,  1.4873e+00,  9.0072e-01, -2.1055e+00,  6.7842e-01,
         -1.2345e+00, -4.3067e-02, -1.6047e+00, -7.5214e-01,  1.6487e+00],
        [-3.9248e-01, -1.4036e+00, -7.2788e-01, -5.5943e-01, -7.6884e-01,
          7.6245e-01,  1.6423e+00, -1.5960e-01, -4.9740e-01,  4.3959e-01],
        [-7.5813e-01,  1.0783e+00,  8.0080e-01,  1.6806e+00,  1.2791e+00,
          1.2964e+00,  6.1047e-01,  1.3347e+00, -2.3162e-01,  4.1759e-02],
        [-2.5158e-01,  8.5986e-01, -1.3847e+00, -8.7124e-01, -2.2337e-01,
          1.7174e+00,  3.1888e-01, -4.2452e-01,  3.0572e-01, -7.7459e-01],
        [-1.5576e+00,  9.9564e-01, -8.7979e-01, -6.0114e-01, -1.2742e+00,
          2.1228e+00, -1.2347e+00, -4.8791e-01, -9.1382e-01, -6.5814e-01],
        [ 7.8024e-02,  5.2581e-01, -4.8799e-01,  1.1914e+00, -8.1401e-01,
         -7.3599e-01, -1.4032e+00,  3.6004e-02, -6.3477e-02,  6.7561e-01],
        [-9.7807e-02,  1.8446e+00, -1.1845e+00,  1.3835e+00,  1.4451e+00,
          

NeuralNet(
  (layer1): BasicLinear()
  (layer2): BasicLinear()
)

In [131]:
data = torch.randn(
    size=(5, 10),
)
data

tensor([[-0.1678,  1.6433,  0.5163,  1.6060, -0.9815,  0.5361,  0.9226,  0.4872,
         -0.9770, -0.0336],
        [-0.7983, -0.2648, -0.1666,  0.2518,  1.2571,  1.2173,  0.3034, -0.4501,
         -0.1739,  0.0299],
        [-0.0140, -0.0102,  0.2337,  1.4083, -0.1743,  0.6092,  0.2254, -0.2793,
          0.6702,  0.1188],
        [-0.6119,  0.6026, -0.4403,  2.1848,  0.5258,  1.6828,  0.0967,  0.2571,
          0.4728,  0.3640],
        [-0.2812, -1.0375, -0.4976, -0.1823, -0.2120,  0.8162,  0.8982, -0.1539,
         -0.5682, -0.0868]])

In [132]:
output = model(data)
output, output.shape

(tensor([[ -3.1472, -29.9071,   8.7836,   3.4756,  15.5517,  23.3254, -13.9781,
          -16.4666,  11.5610,  26.2780, -12.2689, -20.9407,  -4.5172,  15.7965,
           12.2754, -17.7336,  22.0269,  18.4722, -17.1346,  -5.1706],
         [  4.1676, -14.8314,   3.4650,   6.1778,  -1.4382,   4.5206,   0.0622,
           -9.7195,  10.3086,   6.7338,  -5.2735,  -7.4848,   3.2957,   2.4533,
            1.2410,   4.0957,  13.7237,   4.0555, -17.8447,   4.4652],
         [ -6.3661, -20.4782,  -3.6122,  -2.0613,  -1.2317,  14.0660,  -6.6215,
           -4.6526,   2.7820,  14.8900,  -4.2800, -12.9097,  -0.8511,  -0.8365,
            6.9943,  -4.1665,  15.3426,  11.1509, -13.1717,  -4.7658],
         [ -5.4236, -28.9025,  -4.7860,   6.7843,  -1.0098,  27.0510,  -2.1414,
          -10.6474,   5.8732,  19.9806,  -5.5848, -10.5738,  -9.7988,   1.8492,
            2.7274,   5.9610,  28.6326,  21.9810, -26.0162,   4.0024],
         [  5.9202, -11.4263,  14.1665,   0.8641,   3.0614,   4.1301,  -1.48

Cool! The output matches the expected shape
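
For a plain chain like this one, where each layer simply feeds the next, `torch.nn.Sequential` is an alternative to writing a class with its own `forward()`. The sketch below uses `torch.nn.Linear` as a stand-in for the custom layer so that it runs on its own; the names `sequential_model` and `batch` are made up for this example.

```python
import torch

torch.manual_seed(42)

# Two stacked linear layers: (n, 10) -> (n, 15) -> (n, 20)
sequential_model = torch.nn.Sequential(
    torch.nn.Linear(in_features=10, out_features=15),
    torch.nn.Linear(in_features=15, out_features=20),
)

batch = torch.randn(5, 10)
result = sequential_model(batch)
print(result.shape)  # torch.Size([5, 20])
```

A custom class is still the better fit once the forward pass is no longer a straight chain, as in the residual block that follows.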

Now let's try to implement a block and use it to build a network

In [133]:
import torch

torch.manual_seed(42)


class SimpleResidualBlock(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.fc1 = torch.nn.Linear(in_features=in_features, out_features=out_features)
        self.fc2 = torch.nn.Linear(in_features=out_features, out_features=out_features)
        self.relu = torch.nn.ReLU()
        self.downsample = None

        # projects the identity so that it has the same shape
        # as the output
        if in_features != out_features:
            self.downsample = torch.nn.Linear(
                in_features=in_features, out_features=out_features
            )

    def forward(self, x: torch.Tensor):
        identity = x
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)

        if self.downsample is not None:
            identity = self.downsample(identity)
        out += identity
        return out

In [134]:
block = SimpleResidualBlock(in_features=10, out_features=20)
block

SimpleResidualBlock(
  (fc1): Linear(in_features=10, out_features=20, bias=True)
  (fc2): Linear(in_features=20, out_features=20, bias=True)
  (relu): ReLU()
  (downsample): Linear(in_features=10, out_features=20, bias=True)
)

In [135]:
data = torch.randn(
    size=(5, 10),
)
data

tensor([[-1.3382,  0.4742, -2.2940,  0.7744, -0.5453, -2.1582, -1.6608, -0.6637,
         -0.2670,  0.2584],
        [ 0.7758, -0.1000, -0.5615, -0.5949,  1.2687,  1.2904,  0.6930,  1.1980,
          1.3964, -0.7150],
        [ 1.4109, -1.3144, -1.3162, -1.2524, -1.6489, -0.2800, -1.2407,  0.7410,
          0.7378, -0.8505],
        [ 0.0361,  1.3407,  0.9860,  0.1132, -0.4233, -1.9508,  1.8619, -1.0779,
          0.8849, -0.8342],
        [ 1.0301, -0.8681,  0.2418,  1.3824,  1.1285, -1.2123,  2.6024, -0.0957,
         -0.0811,  1.2587]])

In [136]:
output = block(data)
output

tensor([[-0.4739, -1.9992,  1.0667, -1.4696,  0.4428, -0.8600,  0.2264, -1.3219,
          0.2252,  0.6634,  0.0587,  0.2295,  0.3469,  0.8965,  0.0336,  1.0089,
          1.1820, -0.3060,  0.1728,  0.5566],
        [-0.7263, -0.6590,  0.3153, -1.6100,  0.0537, -0.3231, -0.4889,  0.3054,
         -0.8394,  0.5122, -1.0412, -0.0720, -1.4181,  0.3687, -1.0107, -0.8467,
          0.2860, -0.1833,  0.4961,  0.4846],
        [ 0.5692, -1.1106, -0.5407, -1.6185,  1.0318, -1.6888,  0.5215, -1.2556,
         -1.0459,  0.9259, -1.0388, -0.6898, -0.3632,  1.4558, -0.7626, -0.7203,
          0.7005, -1.4810,  0.3062,  1.5153],
        [ 0.3583, -1.3263, -0.3940, -1.0462, -1.2971, -0.6285, -0.0323,  0.1265,
          0.2430, -1.2454,  0.8915, -0.8012,  0.0533,  0.3077, -0.1811,  0.4346,
          0.8679, -0.7155,  1.0592, -0.4466],
        [ 0.2313, -0.1927, -0.9310, -1.5572, -1.4468,  0.2535,  0.6084, -0.5691,
         -0.3841,  0.3957,  1.6790,  1.0765, -0.0834, -0.4265,  0.9681,  0.5116,
      

Now that we have our block, let's use it to build a model!

In [137]:
class NeuralNet(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.block1 = SimpleResidualBlock(
            in_features=in_features, out_features=out_features
        )
        self.block2 = SimpleResidualBlock(
            in_features=out_features, out_features=out_features
        )

    def forward(self, x):
        out = self.block1(x)
        out = self.block2(out)
        return out

In [138]:
model = NeuralNet(in_features=10, out_features=50)
model

NeuralNet(
  (block1): SimpleResidualBlock(
    (fc1): Linear(in_features=10, out_features=50, bias=True)
    (fc2): Linear(in_features=50, out_features=50, bias=True)
    (relu): ReLU()
    (downsample): Linear(in_features=10, out_features=50, bias=True)
  )
  (block2): SimpleResidualBlock(
    (fc1): Linear(in_features=50, out_features=50, bias=True)
    (fc2): Linear(in_features=50, out_features=50, bias=True)
    (relu): ReLU()
  )
)

In [139]:
data = torch.randn(
    size=(5, 10),
)
data

tensor([[ 0.2182,  0.8823,  0.5390,  1.3357,  0.8349, -1.0390, -0.4415, -0.4136,
          0.6149,  0.5247],
        [ 0.1156,  0.9289, -1.1753,  1.4462,  0.2890, -0.5746,  0.4203,  0.3187,
         -0.1949, -0.7710],
        [-1.0754, -0.6555, -0.5378, -0.3390,  1.3493, -1.5745, -0.5291,  2.2761,
          0.2758,  0.4236],
        [-1.7807, -0.2473,  1.3181,  1.8177,  1.5550,  1.1142, -0.2878, -1.0536,
         -1.5974, -0.1525],
        [ 0.2308,  1.0065,  0.1740,  1.5454, -0.8084,  1.7691,  0.1786,  0.5163,
         -0.4629, -0.6336]])

In [140]:
output = model(data)
output.shape

torch.Size([5, 50])

As you can see, any class that inherits from `torch.nn.Module` can be used as a Lego block inside larger blocks or networks. This allows for a lot of flexibility when designing networks.
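
One consequence of this nesting is that parameters are registered recursively: the outer module sees every parameter of its inner blocks. The sketch below illustrates this with two deliberately tiny, hypothetical classes (`TinyBlock` and `TinyNet`), not the ones defined above.

```python
import torch


class TinyBlock(torch.nn.Module):  # hypothetical block, for illustration only
    def __init__(self, features):
        super().__init__()
        self.fc = torch.nn.Linear(features, features)

    def forward(self, x):
        return self.fc(x) + x  # residual connection


class TinyNet(torch.nn.Module):  # hypothetical network built from two blocks
    def __init__(self, features):
        super().__init__()
        self.block1 = TinyBlock(features)
        self.block2 = TinyBlock(features)

    def forward(self, x):
        return self.block2(self.block1(x))


net = TinyNet(features=4)

# Parameter names reflect the nesting: <block>.<layer>.<parameter>
names = [name for name, _ in net.named_parameters()]
print(names)

# Each Linear(4, 4) holds 4 * 4 weights + 4 biases = 20 values
total = sum(p.numel() for p in net.parameters())
print(total)  # 40
```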

## Layers

In this section we will explore some of the layers available in the PyTorch ecosystem.

### Linear Layer

This is one of the most basic layers: a fully connected layer that applies a linear transformation to the input. It is used, for example, as the final layer of Convolutional Neural Networks to output one score (logit) per class, which a softmax can then turn into probabilities.

The output is given by:
$$ y = x\cdot{A^T} + b $$

Where:
- $x$ is the input of the layer
- $A$ is the weights matrix, which in this case is transposed ($A^T$)
- $b$ is the bias term

In other words, the linear layer computes the matrix product of the input tensor and the transposed weights tensor, then adds the bias term as an offset.

**Documentation**

https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html

In [141]:
torch.manual_seed(42)
linear = torch.nn.Linear(
    in_features=10,  # in_features = size of the last dimension of the input
    out_features=5,
)  # out_features = size of the last dimension of the output
# this layer expects a (n, 10) input and will output a (n, 5). See below.

x = torch.randn(
    size=(5, 10),
)
output = linear(x)
output

tensor([[ 0.7910, -0.6975,  0.4384,  0.7299,  1.0319],
        [-0.2977,  0.5749, -0.3397,  0.9044,  1.0887],
        [ 0.3056, -0.6722, -0.0591,  0.5983, -0.2779],
        [ 0.1036, -0.0570,  0.0212,  0.9951,  0.7813],
        [ 0.5157,  0.1053, -1.0661,  1.8080,  0.9811]],
       grad_fn=<AddmmBackward0>)

In [142]:
output.shape

torch.Size([5, 5])

In [143]:
for param in linear.parameters():
    print(param, param.shape)

Parameter containing:
tensor([[ 0.2418,  0.2625, -0.0741,  0.2905, -0.0693,  0.0638, -0.1540,  0.1857,
          0.2788, -0.2320],
        [ 0.2749,  0.0592,  0.2336,  0.0428,  0.1525, -0.0446,  0.2438,  0.0467,
         -0.1476,  0.0806],
        [-0.1457, -0.0371, -0.1284,  0.2098, -0.2496, -0.1458, -0.0893, -0.1901,
          0.0298, -0.3123],
        [ 0.2856, -0.2686,  0.2441,  0.0526, -0.1027,  0.1954,  0.0493,  0.2555,
          0.0346, -0.0997],
        [ 0.0850, -0.0858,  0.1331,  0.2823,  0.1828, -0.1382,  0.1825,  0.0566,
          0.1606, -0.1927]], requires_grad=True) torch.Size([5, 10])
Parameter containing:
tensor([-0.3130, -0.1222, -0.2426,  0.2595,  0.0911], requires_grad=True) torch.Size([5])


Let's try with different dimensions this time!

In [144]:
torch.manual_seed(42)
linear = torch.nn.Linear(
    in_features=7,  # in_features = size of the last dimension of the input
    out_features=3,
)  # out_features = size of the last dimension of the output
x = torch.randn(
    size=(9, 7),
)
output = linear(x)
output

tensor([[ 0.4960, -0.2588,  0.8303],
        [ 0.1278, -0.1437,  0.2210],
        [-0.5078, -0.1890, -0.4775],
        [-1.1359, -0.5379, -0.2967],
        [-0.4981, -0.2988, -0.2436],
        [ 0.3976,  0.0798,  0.2285],
        [-0.1890, -0.3357,  0.3579],
        [-0.3978, -0.8552,  0.5626],
        [-0.0656, -0.3253,  0.0333]], grad_fn=<AddmmBackward0>)

In [145]:
output.shape

torch.Size([9, 3])

In [146]:
for param in linear.parameters():
    print(param, param.shape)

Parameter containing:
tensor([[ 0.2890,  0.3137, -0.0885,  0.3472, -0.0828,  0.0763, -0.1840],
        [ 0.2220,  0.3332, -0.2773,  0.3285,  0.0707,  0.2792,  0.0512],
        [ 0.1822, -0.0534,  0.2914,  0.0559, -0.1764,  0.0963, -0.1741]],
       requires_grad=True) torch.Size([3, 7])
Parameter containing:
tensor([-0.0443, -0.1535,  0.2507], requires_grad=True) torch.Size([3])


#### Shapes

Let's try to understand the shapes we defined.

When creating the layer, we defined 2 parameters:
- in_features
- out_features

The figure below explains how the shapes of the inputs and outputs are obtained

<img src="../assets/layers/linear layer.png" height="700">
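
The shape rule can also be checked in code. The sketch below (with the made-up name `linear_demo`) assumes the behaviour described in the `torch.nn.Linear` documentation: only the last dimension of the input has to match `in_features`, and any leading dimensions are carried over unchanged.

```python
import torch

linear_demo = torch.nn.Linear(in_features=10, out_features=5)

# Only the last dimension is transformed: (*, 10) -> (*, 5)
for shape in [(10,), (4, 10), (2, 3, 10)]:
    sample = torch.randn(shape)
    print(shape, "->", tuple(linear_demo(sample).shape))

# A last dimension different from in_features raises an error
try:
    linear_demo(torch.randn(4, 7))
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print("shape mismatch raised an error:", mismatch_raised)
```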

### Bilinear

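The bilinear layer combines two input tensors through a learned weights tensor, so every output feature depends on pairwise products of the features of both inputs.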

The output is given by:
$$ y=x_{1}^{T}A x_{2}+b $$

Where:
- $x_1$ is the input tensor 1 of the layer, in this case transposed $(x_1^T)$
- $x_2$ is the input tensor 2 of the layer
- $A$ is the weights matrix
- $b$ is the bias term

One thing to note is that *matrix multiplication is associative*, which means that

$$(x_{1}^{T}A) x_{2} = x_{1}^{T} (Ax_{2})$$
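
A small numerical check of this identity, using arbitrary sizes and names (`x1_vec`, `x2_vec`, `A_mat`) chosen only for this sketch. Double precision is used so that rounding differences between the two evaluation orders stay negligible.

```python
import torch

torch.manual_seed(0)

x1_vec = torch.randn(10, dtype=torch.float64)     # plays the role of x1
x2_vec = torch.randn(20, dtype=torch.float64)     # plays the role of x2
A_mat = torch.randn(10, 20, dtype=torch.float64)  # plays the role of A

left = (x1_vec @ A_mat) @ x2_vec   # (x1^T A) x2
right = x1_vec @ (A_mat @ x2_vec)  # x1^T (A x2)

print(left.shape)  # torch.Size([]): the result is a scalar
print(torch.allclose(left, right))
```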

**Documentation**

https://pytorch.org/docs/stable/generated/torch.nn.Bilinear.html

https://pytorch.org/docs/stable/generated/torch.nn.functional.bilinear.html

In [147]:
torch.manual_seed(42)
bilinear = torch.nn.Bilinear(
    in1_features=10,  # in1_features = size of the last dimension of the first input
    in2_features=20,  # in2_features = size of the last dimension of the second input
    out_features=7,  # out_features = size of the last dimension of the output
)
# this layer expects a (n, 10) and a (n, 20) input and will output a (n, 7). See below.

x1 = torch.randn(
    size=(5, 10),
)
x2 = torch.randn(
    size=(5, 20),
)
output = bilinear(x1, x2)
output, output.shape

(tensor([[ 0.2347, -1.5629,  1.7704, -0.0386,  0.5411,  0.5182, -0.6519],
         [-0.0134, -1.7769,  0.0528, -1.5926,  1.6024, -2.5127,  0.5474],
         [-0.3513, -3.4169, -0.6370, -6.0918,  1.4707,  1.2421, -5.9193],
         [ 0.6226, -1.2700,  1.2536, -0.3394, -1.7838,  1.3753, -5.1026],
         [-1.1170, -1.3467, -1.2738, -2.9952, -2.9509,  0.4995,  1.9577]],
        grad_fn=<AddBackward0>),
 torch.Size([5, 7]))

In [148]:
# the weights tensor is 3d: (out_features, in1_features, in2_features)
bilinear.weight.shape

torch.Size([7, 10, 20])

In [149]:
bilinear.bias.shape

torch.Size([7])

Manually computing for 2d tensors

In [150]:
# Extract learnable weights
A = bilinear.weight  # Shape: (out_features, in1_features, in2_features)
b = bilinear.bias  # Shape: (out_features), if bias=True

# Manual computation of the bilinear transformation
# Initialize output matrix
output_manual = torch.zeros((5, 7))  # batch, out_features

# Compute y = x1^T * A * x2 + b for each batch and output feature
for i in range(5):  # Iterate over each sample in the batch
    for o in range(7):  # Iterate over each output feature
        temp_result = (
            x1[i].T @ A[o] @ x2[i]
        )  # Compute dot product for specific batch element and output feature
        # x1[i].T = the i-th row of x1, which is a vector in this case
        # a vector of len = 10 (.T leaves a 1d tensor unchanged; it only mirrors the formula)

        # A[o] = 2d matrix. Each matrix is one slice of the 3d weights tensor
        # a 2d tensor of shape (in1_features, in2_features) = (10, 20)
        # x2[i] = each row in x2, which is a vector in this case
        # a vector of len = 20

        # temp result will be a scalar
        temp_result += b[o]  # Add bias for the specific output feature

        output_manual[i, o] = temp_result  # Store the result in the output tensor

# Automatic computation with PyTorch
output_auto = bilinear(x1, x2)

# Comparing manual and automatic computations
are_close = torch.allclose(output_manual, output_auto)
are_close, output_auto, output_manual


(True,
 tensor([[ 0.2347, -1.5629,  1.7704, -0.0386,  0.5411,  0.5182, -0.6519],
         [-0.0134, -1.7769,  0.0528, -1.5926,  1.6024, -2.5127,  0.5474],
         [-0.3513, -3.4169, -0.6370, -6.0918,  1.4707,  1.2421, -5.9193],
         [ 0.6226, -1.2700,  1.2536, -0.3394, -1.7838,  1.3753, -5.1026],
         [-1.1170, -1.3467, -1.2738, -2.9952, -2.9509,  0.4995,  1.9577]],
        grad_fn=<AddBackward0>),
 tensor([[ 0.2347, -1.5629,  1.7704, -0.0386,  0.5411,  0.5182, -0.6519],
         [-0.0134, -1.7769,  0.0528, -1.5926,  1.6024, -2.5127,  0.5474],
         [-0.3513, -3.4169, -0.6370, -6.0918,  1.4707,  1.2421, -5.9193],
         [ 0.6226, -1.2700,  1.2536, -0.3394, -1.7838,  1.3753, -5.1026],
         [-1.1170, -1.3467, -1.2738, -2.9952, -2.9509,  0.4995,  1.9577]],
        grad_fn=<CopySlices>))
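
The double loop is useful for understanding, but the same computation can be written in a single call. The sketch below is one possible vectorised version using `torch.einsum`; it rebuilds its own layer and inputs (`bilinear_demo`, `x1_demo`, `x2_demo`, names chosen for this example) so that it runs independently of the cells above.

```python
import torch

torch.manual_seed(42)

bilinear_demo = torch.nn.Bilinear(in1_features=10, in2_features=20, out_features=7)
x1_demo = torch.randn(5, 10)
x2_demo = torch.randn(5, 20)

# b = batch, o = output feature, i = in1 feature, j = in2 feature
# y[b, o] = sum_i sum_j x1[b, i] * A[o, i, j] * x2[b, j] + bias[o]
output_einsum = (
    torch.einsum("bi,oij,bj->bo", x1_demo, bilinear_demo.weight, x2_demo)
    + bilinear_demo.bias
)

output_layer = bilinear_demo(x1_demo, x2_demo)
print(output_einsum.shape)
print(torch.allclose(output_einsum, output_layer, atol=1e-4))
```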

In 3D Tensors...

In [156]:
torch.manual_seed(42)
bilinear = torch.nn.Bilinear(
    in1_features=10,  # in1_features = size of the last dimension of the first input
    in2_features=20,  # in2_features = size of the last dimension of the second input
    out_features=7,  # out_features = size of the last dimension of the output
)
# this layer expects a (n, m, 10) and a (n, m, 20) input and will output a (n, m, 7). See below.

x1 = torch.randn(
    size=(5, 15, 10),
)
x2 = torch.randn(
    size=(5, 15, 20),
)
output = bilinear(x1, x2)
output.shape

torch.Size([5, 15, 7])

In [157]:
bilinear.weight.shape

torch.Size([7, 10, 20])

#### Shapes

Let's try to understand the shapes we defined.

When creating the layer, we defined 3 parameters:
- in1_features
- in2_features
- out_features

The figure below explains how the shapes of the inputs and outputs are obtained

<img src="../assets/layers/Bilinear layer.png" height="700">

### Conv1d

This applies a 1d convolution. This can be used for 1 dimensional data, like sequences.

Considering an input of size $(N, C_{in}, L)$ and an output of size $(N, C_{out}, L_{out})$, the output is given by:
$$ {\mathrm{out}}(N_{i},C_{\mathrm{out}_{j}})={\mathrm{bias}}(C_{\mathrm{out}_{j}})+\sum_{k=0}^{C_{\mathrm{in}}-1}{\mathrm{weight}}(C_{\mathrm{out}_{j}},k)\star{\mathrm{input}}(N_{i},k) $$

Where:
- $N$ is the batch size
- $C$ is the number of channels
- $L$ is the length of the signal sequence
- $\star$ is the valid [cross correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator

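In other words, each output channel is obtained by sliding one kernel per input channel along the sequence, taking the dot product of the kernel and the window it covers at each position, summing the results over all input channels, and adding the bias term.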
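
To make the sliding-window computation concrete, the sketch below reproduces a `Conv1d` output with an explicit loop over output positions. It assumes the default `stride=1`, `padding=0`, `dilation=1` and `groups=1`; the names `conv_demo`, `signal` and `manual_out` are made up for this example.

```python
import torch

torch.manual_seed(0)

conv_demo = torch.nn.Conv1d(in_channels=3, out_channels=5, kernel_size=4)
signal = torch.randn(6, 3, 8)  # (N, C_in, L)

kernel_size = conv_demo.kernel_size[0]
L_out_demo = signal.shape[-1] - kernel_size + 1  # valid only for the defaults above

manual_out = torch.zeros(6, 5, L_out_demo)
with torch.no_grad():
    for t in range(L_out_demo):
        # window covered by the kernel at position t: (N, C_in, kernel_size)
        window = signal[:, :, t : t + kernel_size]
        # multiply with every kernel, then sum over input channels and kernel positions
        manual_out[:, :, t] = (
            torch.einsum("nck,ock->no", window, conv_demo.weight) + conv_demo.bias
        )
    reference = conv_demo(signal)

print(manual_out.shape)
print(torch.allclose(manual_out, reference, atol=1e-5))
```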

Inputs:
- *in_channels* is the number of channels in the input
- *out_channels* is the number of channels produced
- *kernel_size* is the size of the convolutional kernel 

To compute the expected size of $L_{out}$ you can use the following equation:
$$ 
L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor 
$$

or the following code:
```python
L_out = ((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride) + 1
```

Variables:
- *weight*: The learnable weight with shape $\left(\mathrm{out\_channels},\,\frac{\mathrm{in\_channels}}{\mathrm{groups}},\,\mathrm{kernel\_size}\right)$
- *bias*: The learnable bias with shape $(\mathrm{out\_channels})$
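
The effect of `groups` on the weight shape can be checked directly. In the sketch below, `standard` and `grouped` are hypothetical layers created only to compare shapes; note that both `in_channels` and `out_channels` must be divisible by `groups`.

```python
import torch

standard = torch.nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, groups=1)
grouped = torch.nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, groups=2)

# weight shape: (out_channels, in_channels / groups, kernel_size)
print(standard.weight.shape)  # torch.Size([8, 4, 3])
print(grouped.weight.shape)   # torch.Size([8, 2, 3])

# bias shape: (out_channels,), independent of groups
print(grouped.bias.shape)     # torch.Size([8])
```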


**Documentation**

https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html

https://pytorch.org/docs/stable/generated/torch.nn.functional.conv1d.html

In [217]:
import torch

torch.manual_seed(42)

conv1d = torch.nn.Conv1d(
    in_channels=3,
    out_channels=5,
    kernel_size=4,
    stride=1,
    padding=0,
    dilation=1,
    groups=1,
)
x = torch.randn((6, 3, 8))
output = conv1d(x)
print(output.shape)

torch.Size([6, 5, 5])


In [225]:
# with groups = 1, shape is:
# (out_channels, in_channels, kernel_size)
conv1d.weight.shape

torch.Size([5, 3, 4])

In [226]:
# shape is (out_channels)
conv1d.bias.shape

torch.Size([5])

Calculating/estimating $L_{out}$:

In [203]:
import torch


def calculate_L_out(conv1d, x):
    """
    Calculate the length of the output feature map (L_out) for a 1D convolution.

    Parameters:
        conv1d (torch.nn.Conv1d): The Conv1d layer.
        x (Tensor): Input tensor with shape (batch_size, channels, L_in).

    Returns:
        int: Length of the output feature map (L_out).
    """
    # Get the attributes from the convolution layer
    kernel_size = conv1d.kernel_size[0]
    stride = conv1d.stride[0]
    padding = conv1d.padding[0]
    dilation = conv1d.dilation[0]

    # Calculate L_in using the input tensor shape
    L_in = x.size(-1)  # The length of the input feature map (L_in)

    # Compute L_out using the formula
    L_out = ((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride) + 1

    return int(L_out)


conv1d = torch.nn.Conv1d(
    in_channels=1, out_channels=32, kernel_size=3, stride=2, padding=1, dilation=1
)

# Create an input tensor with shape (batch_size, channels, L_in)
batch_size = 4
L_in = 50
x = torch.randn(batch_size, 1, L_in)

# Calculate L_out
L_out = calculate_L_out(conv1d, x)
print(f"calculated L_out: {L_out}")
print(f"conv1d L_out: {conv1d(x).shape[-1]}")

calculated L_out: 25
conv1d L_out: 25


In [223]:
import torch

torch.manual_seed(42)

conv1d = torch.nn.Conv1d(
    in_channels=3,
    out_channels=7,
    kernel_size=2,
    stride=1,
    padding=0,
    dilation=1,
    groups=1,
)

x = torch.randn((6, 3, 8))
output = conv1d(x)
print(x[0].shape)
print(output[0].shape)

torch.Size([3, 8])
torch.Size([7, 7])


In [224]:
conv1d.weight.shape

torch.Size([7, 3, 2])

### MaxPool

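Max pooling downsamples its input by sliding a window along it and keeping only the largest value inside each window. Unlike the linear and convolutional layers above, it has no learnable weights or bias. It is used, for example, between the convolutional layers of Convolutional Neural Networks to shrink the feature maps while keeping the strongest activations.

For a 1d input of size $(N, C, L)$ (`torch.nn.MaxPool1d`), with no padding and no dilation, the output is given by:
$$ \mathrm{out}(N_i, C_j, k) = \max_{m=0,\ldots,\mathrm{kernel\_size}-1} \mathrm{input}(N_i, C_j, \mathrm{stride} \times k + m) $$

Where:
- $N$ is the batch size
- $C$ is the number of channels
- $L$ is the length of the signal sequence
- $k$ is the position in the output sequence

In other words, each output value is the maximum of the input values covered by the window at that position, computed independently for every channel.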
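
As a minimal sketch of max pooling, the example below applies `torch.nn.MaxPool1d` to a tiny hand-written sequence so the result can be verified by eye. The window size, stride and input values are arbitrary choices for this illustration.

```python
import torch

pool = torch.nn.MaxPool1d(kernel_size=2, stride=2)

# shape (N=1, C=1, L=6); the windows are [1, 3], [2, 5] and [4, 0]
seq = torch.tensor([[[1.0, 3.0, 2.0, 5.0, 4.0, 0.0]]])

pooled = pool(seq)
print(pooled)  # tensor([[[3., 5., 4.]]])

# pooling has nothing to learn, so it registers no parameters
num_params = len(list(pool.parameters()))
print(num_params)  # 0
```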