# Torch Layers

PyTorch is built around `torch.nn`, which provides classes such as `torch.nn.Module` and `torch.nn.Parameter`. Think of these as basic building blocks, like Lego bricks, that let you build complex structures. Layers and models inherit from `torch.nn.Module`, which already defines many useful methods, so we can build our own blocks by defining as little as an `__init__()` and a `forward()` method.

For example, we can create a new layer with something as simple as:
```python
import torch
class BasicLinear(torch.nn.Module): # Inherits from nn.Module. Almost everything in PyTorch is a nn.Module
    def __init__(self):
        super().__init__()
        self.weights = torch.nn.Parameter( # defines weights Parameter
            data=torch.randn(1), # starts with random value
            requires_grad=True # activate autograd, which allows us to track and update this parameter during backward pass
        )
        self.bias = torch.nn.Parameter( # defines bias Parameter, same as above but with different syntax
            torch.randn(1, dtype=torch.float32), # explicitly sets the dtype to float32 (the default)
            requires_grad=True
        )
    
    def forward(self, x: torch.Tensor): # defines the way data will be transformed in the layer or block
        # linear = xA^T + b
        # A^T
        weights = self.weights.t() # transposes weights tensor
        # x
        input_tensor = x
        # b
        bias = self.bias

        # xA^T
        mul = torch.matmul(input_tensor, weights) # calculates the matrix product of the input and the transposed weights

        # xA^T + b
        output = mul + bias # adds offset (bias)

        return output
```
In the forward method we follow the computation of [`nn.functional.linear`](https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html), which is implemented in C++. The [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer calls `nn.functional.linear` in its forward method. Either way, the transformation being applied is:

$$y=x A^{T}+b.$$
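
As a quick sanity check, the formula can be written out by hand and compared with `torch.nn.functional.linear`. This is a minimal sketch: the tensor names (`x_check`, `A_check`, `b_check`) and sizes are arbitrary choices for illustration, not part of the original notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x_check = torch.randn(4, 3)  # 4 samples with 3 features each
A_check = torch.randn(2, 3)  # weights, shape (out_features, in_features)
b_check = torch.randn(2)     # bias, shape (out_features,)

manual = x_check @ A_check.T + b_check         # y = x A^T + b written out by hand
builtin = F.linear(x_check, A_check, b_check)  # same transformation via the functional API

print(manual.shape)  # torch.Size([4, 2])
print(torch.allclose(manual, builtin, atol=1e-6))  # the two results should agree
```

Note that the weights are stored as `(out_features, in_features)`, which is why the transpose appears in the formula.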

In this notebook we will see how to create our own layers and explore some of the predefined layers included with PyTorch.

In [1]:
# Ensures versions are correct
! pip install torch==2.3.0 numpy==1.25.2 pillow==9.4.0 torchvision==0.18

import torch
import numpy as np
import PIL

print(f"Torch version: {torch.__version__}")
print(f"Numpy version: {np.__version__}")
print(f"PIL version: {PIL.__version__}")
print(f"GPU enabled: {torch.cuda.is_available()}")

Torch version: 2.3.0+cu121
Numpy version: 1.25.2
PIL version: 9.4.0
GPU enabled: True


## Basic Building Blocks



### Custom Layers

You can build your own layers using class definitions that inherit from `torch.nn.Module`. These can be used as building blocks for larger networks.

In [2]:
import torch
torch.manual_seed(42)
class BasicLinear(torch.nn.Module): # Inherits from nn.Module. Almost everything in PyTorch is a nn.Module
    def __init__(self, input_features, output_features):
        super().__init__()
        self.weights = torch.nn.Parameter( # defines weights Parameter
            data=torch.randn(size=(output_features, input_features)), # starts with random value
            requires_grad=True # activate autograd, which allows us to track and update this parameter during backward pass
        )
        self.bias = torch.nn.Parameter( # defines bias Parameter, same as above but with different syntax
            torch.randn(size=(output_features,), dtype=torch.float32), # explicitly sets the dtype to float32 (the default)
            requires_grad=True
        )
    
    def forward(self, x: torch.Tensor) -> torch.Tensor: # defines the way data will be transformed in the layer or block
        # linear = xA^T + b
        # A^T
        weights = self.weights.t() # transposes weights tensor
        # x
        input_tensor = x
        # b
        bias = self.bias

        # xA^T
        mul = torch.matmul(input_tensor, weights) # calculates the matrix product of the input and the transposed weights

        # xA^T + b
        output = mul + bias # adds offset (bias)

        return output

Don't worry about the dimensions yet; we will look at them in more detail later on.

In [3]:
layer = BasicLinear(input_features=10, output_features=5)
layer

BasicLinear()

In [4]:
for param in layer.parameters():
    print(param, param.shape)

Parameter containing:
tensor([[ 1.9269,  1.4873,  0.9007, -2.1055,  0.6784, -1.2345, -0.0431, -1.6047,
         -0.7521,  1.6487],
        [-0.3925, -1.4036, -0.7279, -0.5594, -0.7688,  0.7624,  1.6423, -0.1596,
         -0.4974,  0.4396],
        [-0.7581,  1.0783,  0.8008,  1.6806,  1.2791,  1.2964,  0.6105,  1.3347,
         -0.2316,  0.0418],
        [-0.2516,  0.8599, -1.3847, -0.8712,  0.0780,  0.5258, -0.4880,  1.1914,
         -0.8140, -0.7360],
        [-0.8371, -0.9224, -0.0635,  0.6756, -0.0978,  1.8446, -1.1845,  1.3835,
         -1.2024,  0.7078]], requires_grad=True) torch.Size([5, 10])
Parameter containing:
tensor([-0.5687,  1.2580, -1.5890, -1.1208,  0.8423], requires_grad=True) torch.Size([5])


In [5]:
data = torch.randn(
    size=(5,10),
)
data

tensor([[ 0.3383,  1.6992,  0.0109, -0.3387, -1.3407, -0.5854,  0.5362,  0.5246,
         -1.4692,  1.4332],
        [ 0.7440, -0.4816, -1.0495,  0.6039, -1.7223, -0.8278, -0.4976,  0.4747,
         -2.5095,  0.4880],
        [ 0.7846,  0.0286,  0.6408,  0.5832,  0.2191,  0.5526, -0.1853,  0.7528,
          0.4048,  0.1785],
        [ 0.2649,  1.2732, -0.8905,  0.4098,  1.9312,  1.0119, -1.4364, -1.1299,
         -0.1360,  1.6354],
        [ 0.6547,  0.5760,  1.1415,  0.0186, -1.8058,  0.9254, -0.3753,  1.0331,
         -0.6867,  0.6368]])

In [6]:
output = layer(data)
print(output.shape)
output

torch.Size([5, 5])


tensor([[ 5.7495,  1.6639, -1.6199,  0.6273,  0.6853],
        [-0.2628,  3.3308, -4.8425,  1.1273,  4.3891],
        [-1.4089, -0.1772,  1.1424, -1.8549,  2.4110],
        [ 4.9050, -2.3195,  1.0556, -0.2721,  2.9166],
        [ 0.0959,  1.2854, -0.2936, -0.5371,  4.7370]], grad_fn=<AddBackward0>)
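
The `grad_fn` attached to the output above means autograd recorded the operations, so gradients can flow back to the layer's parameters. The sketch below illustrates this with `TinyLinear`, a hypothetical compact stand-in for `BasicLinear` that is redefined here only so the example is self-contained; summing the output into a scalar is an arbitrary choice made so that `backward()` can be called.

```python
import torch

class TinyLinear(torch.nn.Module):  # hypothetical stand-in for BasicLinear
    def __init__(self, input_features, output_features):
        super().__init__()
        # nn.Parameter sets requires_grad=True by default
        self.weights = torch.nn.Parameter(torch.randn(output_features, input_features))
        self.bias = torch.nn.Parameter(torch.randn(output_features))

    def forward(self, x):
        return x @ self.weights.t() + self.bias  # xA^T + b

torch.manual_seed(42)
tiny = TinyLinear(input_features=10, output_features=5)
loss = tiny(torch.randn(5, 10)).sum()  # reduce to a scalar so backward() can be called
loss.backward()                        # fills .grad on every parameter

for name, param in tiny.named_parameters():
    print(name, param.grad.shape)
```

Each gradient has the same shape as its parameter, which is what an optimizer relies on when updating the weights and bias.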

## Layers

### Linear Layer

The output is given by:
$$ y = x\cdot{A^T} + b $$

Where:
- $x$ is the input of the layer
- $A$ is the weights matrix, which in this case is transposed ($A^T$)
- $b$ is the bias term

In other words, the linear layer computes the matrix product of the input tensor and the transposed weights tensor, then adds the bias term as an offset.
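
To make this concrete, each entry of the output is the dot product of one input row with one row of the weights matrix, plus the matching bias element. The sketch below checks a single entry; the names `demo`, `demo_x`, `demo_y` and the indices `i`, `j` are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
demo = torch.nn.Linear(in_features=10, out_features=5)
demo_x = torch.randn(5, 10)
demo_y = demo(demo_x)

i, j = 2, 3  # sample index and output-feature index, picked arbitrarily
# y[i, j] = x[i] . A[j] + b[j]
entry = torch.dot(demo_x[i], demo.weight[j]) + demo.bias[j]

print(torch.allclose(demo_y[i, j], entry, atol=1e-6))  # the two values should agree
```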

In [7]:
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=10, # in_features = size of the last dimension of the input
                         out_features=5) # out_features = size of the last dimension of the output
x = torch.randn(
    size=(5,10),
)
output = linear(x)
output

tensor([[ 0.7910, -0.6975,  0.4384,  0.7299,  1.0319],
        [-0.2977,  0.5749, -0.3397,  0.9044,  1.0887],
        [ 0.3056, -0.6722, -0.0591,  0.5983, -0.2779],
        [ 0.1036, -0.0570,  0.0212,  0.9951,  0.7813],
        [ 0.5157,  0.1053, -1.0661,  1.8080,  0.9811]],
       grad_fn=<AddmmBackward0>)

In [8]:
output.shape

torch.Size([5, 5])

In [9]:
for param in linear.parameters():
    print(param, param.shape)

Parameter containing:
tensor([[ 0.2418,  0.2625, -0.0741,  0.2905, -0.0693,  0.0638, -0.1540,  0.1857,
          0.2788, -0.2320],
        [ 0.2749,  0.0592,  0.2336,  0.0428,  0.1525, -0.0446,  0.2438,  0.0467,
         -0.1476,  0.0806],
        [-0.1457, -0.0371, -0.1284,  0.2098, -0.2496, -0.1458, -0.0893, -0.1901,
          0.0298, -0.3123],
        [ 0.2856, -0.2686,  0.2441,  0.0526, -0.1027,  0.1954,  0.0493,  0.2555,
          0.0346, -0.0997],
        [ 0.0850, -0.0858,  0.1331,  0.2823,  0.1828, -0.1382,  0.1825,  0.0566,
          0.1606, -0.1927]], requires_grad=True) torch.Size([5, 10])
Parameter containing:
tensor([-0.3130, -0.1222, -0.2426,  0.2595,  0.0911], requires_grad=True) torch.Size([5])


Let's try with different dimensions this time!

In [10]:
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=7, # in_features = size of the last dimension of the input
                         out_features=3) # out_features = size of the last dimension of the output
x = torch.randn(
    size=(9,7),
)
output = linear(x)
output

tensor([[ 0.4960, -0.2588,  0.8303],
        [ 0.1278, -0.1437,  0.2210],
        [-0.5078, -0.1890, -0.4775],
        [-1.1359, -0.5379, -0.2967],
        [-0.4981, -0.2988, -0.2436],
        [ 0.3976,  0.0798,  0.2285],
        [-0.1890, -0.3357,  0.3579],
        [-0.3978, -0.8552,  0.5626],
        [-0.0656, -0.3253,  0.0333]], grad_fn=<AddmmBackward0>)

In [11]:
output.shape

torch.Size([9, 3])

In [12]:
for param in linear.parameters():
    print(param, param.shape)

Parameter containing:
tensor([[ 0.2890,  0.3137, -0.0885,  0.3472, -0.0828,  0.0763, -0.1840],
        [ 0.2220,  0.3332, -0.2773,  0.3285,  0.0707,  0.2792,  0.0512],
        [ 0.1822, -0.0534,  0.2914,  0.0559, -0.1764,  0.0963, -0.1741]],
       requires_grad=True) torch.Size([3, 7])
Parameter containing:
tensor([-0.0443, -0.1535,  0.2507], requires_grad=True) torch.Size([3])


#### Shapes

Let's try to understand the shapes we defined.

When creating the layer, we passed two arguments:
- `in_features`
- `out_features`

The figure below explains how the shapes of the inputs and outputs are obtained

<img src="../assets/layers/linear layer.png" height="700">
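
The shape rules can also be checked directly in code. This is a small sketch with an arbitrarily named layer (`shape_demo`) and illustrative input sizes: the weights have shape `(out_features, in_features)`, only the last dimension of the input must match `in_features`, and any leading dimensions are carried through unchanged.

```python
import torch

shape_demo = torch.nn.Linear(in_features=7, out_features=3)
print(shape_demo.weight.shape)  # torch.Size([3, 7])
print(shape_demo.bias.shape)    # torch.Size([3])

# (9, 7) -> (9, 3): the last dimension goes from in_features to out_features
out_2d = shape_demo(torch.randn(9, 7))
print(out_2d.shape)  # torch.Size([9, 3])

# Extra leading dimensions are preserved: (2, 9, 7) -> (2, 9, 3)
out_3d = shape_demo(torch.randn(2, 9, 7))
print(out_3d.shape)  # torch.Size([2, 9, 3])

# A last dimension that differs from in_features cannot be multiplied
try:
    shape_demo(torch.randn(9, 5))
except RuntimeError as err:
    print("shape mismatch:", type(err).__name__)
```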