# Layers

A layer in PyTorch is a transformation that maps input tensors to output tensors.

In [3]:
import torch
from torch import nn

import matplotlib.pyplot as plt

To perform a layer transformation on the tensor `X`, simply use the syntax `layer(X)`.

## Linear

The `torch.nn.Linear` layer performs the following operation:

$$X_{n \times l} \cdot \omega_{l \times k} + b_k$$

Where:

- $l$ - number of inputs
- $k$ - number of outputs
- $n$ - number of input samples
- $X_{n \times l}$ - input tensor
- $\omega_{l \times k}$ - weight matrix of the layer (PyTorch stores its transpose, of shape $k \times l$, in `layer.weight`)
- $b_k$ - bias vector of the layer

---

The following cell applies the layer to some data and then performs the same transformation manually. The results should be identical.

In [2]:
in_features = 5
out_features = 3

linear = nn.Linear(
    in_features = in_features, 
    out_features = out_features
)

X = torch.rand(in_features)

print("Layer transformation")
print(linear(X).tolist())
print("X@w+b")
print((linear.weight@X + linear.bias).tolist())

Layer transformation
[-0.6235317587852478, -1.1335619688034058, -0.70122230052948]
X@w+b
[-0.6235317587852478, -1.1335619688034058, -0.70122230052948]
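The cell above uses a single one-dimensional sample. As a minimal sketch of the batched case from the formula, the following example passes $n$ samples at once and compares the result with the manual computation. The sizes `n`, `l`, `k` and the seed are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

n, l, k = 4, 5, 3  # samples, inputs, outputs (illustrative values)

linear = nn.Linear(in_features=l, out_features=k)
X = torch.rand(n, l)

# PyTorch stores the weight as a (k, l) matrix, so it is transposed here
# to match the (l, k) matrix used in the formula.
manual = X @ linear.weight.T + linear.bias
layer_out = linear(X)

print(linear.weight.shape)  # shape of the stored weight
print(layer_out.shape)      # shape of the output
print(torch.allclose(layer_out, manual, atol=1e-6))
```

The bias vector of length $k$ is broadcast over all $n$ rows of the output.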


### Define values

To set custom values for the layer's tensors, access the `weight` and `bias` attributes after creating the layer. Both are `torch.nn.parameter.Parameter` objects that track gradients, so overwrite them in place with the `copy_` method inside a `torch.no_grad()` context.

---

Here’s an example of how to do it:

In [3]:
linear_layer = nn.Linear(in_features=3, out_features=4)

default_weights = torch.ones_like(linear_layer.weight)
default_biases = torch.zeros_like(linear_layer.bias)

with torch.no_grad():
    linear_layer.weight.copy_(default_weights)
    linear_layer.bias.copy_(default_biases)

After completing the process, we have the `weight` tensor initialized with ones and the `bias` tensor initialized with zeros:

In [4]:
print(linear_layer.weight)
print(linear_layer.bias)

Parameter containing:
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], requires_grad=True)
Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)
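As an alternative sketch, the functions in `torch.nn.init` fill a parameter in place without an explicit `torch.no_grad()` block. The constant `0.5` and the input vector below are arbitrary values chosen for illustration.

```python
import torch
from torch import nn

linear_layer = nn.Linear(in_features=3, out_features=4)

# The nn.init functions modify the parameter in place
# and do not record the operation for autograd.
nn.init.constant_(linear_layer.weight, 0.5)
nn.init.zeros_(linear_layer.bias)

x = torch.tensor([1.0, 2.0, 3.0])
out = linear_layer(x)

# With all weights equal to 0.5 and zero bias,
# every output component equals 0.5 * (1 + 2 + 3).
print(out)
```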


### dtype

By default, the parameters of `torch.nn.Linear` have the `float32` data type. Passing an input tensor with a different data type to the layer raises an error.

---

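The following cell defines a tensor with a `float16` data type. Because `torch.empty` returns uninitialized memory, the values are arbitrary and may include `nan`.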

In [27]:
tensor_size = 3

tensor = torch.empty(
    size=(tensor_size, tensor_size), 
    dtype=torch.float16
)
tensor

tensor([[ 4.3904e+04,  3.6992e+00,  3.5763e-07],
        [ 0.0000e+00, -5.1200e+02,         nan],
        [-5.1200e+02,  4.3789e+00,  2.0000e+00]], dtype=torch.float16)

The following cell creates a linear layer and shows that its parameters have the `float32` data type.

In [28]:
layer = nn.Linear(tensor_size, tensor_size)
for p in layer.parameters():
    print(p.dtype)

torch.float32
torch.float32


Trying to apply the `layer` to the tensor will result in an error stating that the data types are incompatible.

In [29]:
layer(tensor)

RuntimeError: mat1 and mat2 must have the same dtype, but got Half and Float

The following cell demonstrates how to change the data type of tensors used in `nn.Linear`. After modifying the data types, you can successfully apply the `layer` to the input tensor.

In [30]:
# Module.to casts every floating-point parameter of the layer in place
layer.to(torch.float16)

layer(tensor)

tensor([[-6652.0000, 24528.0000, -9664.0000],
        [       nan,        nan,        nan],
        [   76.9375,  -286.0000,   113.6875]], dtype=torch.float16,
       grad_fn=<AddmmBackward0>)
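Two other ways to resolve the mismatch are sketched below: casting the input to the data type of the layer, or creating the layer with the required data type through the `dtype` argument. The example uses `float64` instead of `float16` as an assumption made for portability, since half-precision matrix multiplication on CPU depends on the PyTorch version.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(3, 3)                         # float32 parameters by default
tensor = torch.rand(3, 3, dtype=torch.float64)  # input with a different dtype

# Option 1: cast the input to the dtype of the layer's parameters.
out_cast_input = layer(tensor.to(layer.weight.dtype))

# Option 2: create the layer with the dtype of the input.
layer64 = nn.Linear(3, 3, dtype=torch.float64)
out_cast_layer = layer64(tensor)

print(out_cast_input.dtype)
print(out_cast_layer.dtype)
```

Casting the input leaves the layer untouched, which is usually the simpler choice when the layer is part of a larger model.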

## Dropout

A dropout layer randomly sets each component of the input tensor to zero with probability $p$. During training, the remaining components are scaled by a factor of $\frac{1}{1-p}$ so that the expected value of each component stays the same. Formally, for a tensor $x_i$, where $i \in \mathbb{N}^k$ indexes the components of the $k$-dimensional tensor, the output after applying dropout is given by:

$$
x'_i = x_i \cdot m_i \cdot \frac{1}{1-p},
$$

where the mask $m_i$ is sampled from a Bernoulli distribution with parameter $1-p$, i.e., $m_i \sim \text{Bernoulli}(1-p)$, so each component is kept with probability $1-p$.
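The transformation can be sketched by hand with a random keep mask, where each entry is 1 with probability $1-p$ and 0 otherwise. This is an illustration of the formula, not the internal implementation of `nn.Dropout`, and it will not reproduce the same random mask as the layer. The names `mask` and `out` are chosen for this example.

```python
import torch

torch.manual_seed(0)

p = 0.3  # probability of zeroing a component
x = torch.arange(1.0, 10.0).reshape(3, 3)

# Keep mask: each entry is 1 with probability 1 - p and 0 with probability p.
mask = torch.bernoulli(torch.full_like(x, 1 - p))

# Zero the dropped components and rescale the kept ones by 1 / (1 - p).
out = x * mask / (1 - p)

print(mask)
print(out)
```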

---

This example demonstrates the transformation of a tensor after passing through a dropout layer. 

In [72]:
torch.manual_seed(111)

dropout_layer = nn.Dropout(p=0.3)
tensor = (torch.arange(3 * 3, dtype=float) + 1).reshape((3, 3))

print("Original tensor:")
print(tensor)
print()
print("Dropout result:")
print(dropout_layer(tensor))

Original tensor:
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]], dtype=torch.float64)

Dropout result:
tensor([[ 1.4286,  2.8571,  0.0000],
        [ 5.7143,  0.0000,  8.5714],
        [10.0000,  0.0000, 12.8571]], dtype=torch.float64)
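Dropout is active only while the layer is in training mode. The sketch below, using an arbitrary tensor of ones, checks that in training mode roughly a fraction $p$ of the components is zeroed, and that after calling `eval()` the layer returns its input unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)

dropout_layer = nn.Dropout(p=0.3)
tensor = torch.ones(1000, 100)

# Training mode (the default): components are zeroed at random.
dropout_layer.train()
train_out = dropout_layer(tensor)
zero_fraction = (train_out == 0).float().mean().item()
print(zero_fraction)  # close to p, but not exactly equal

# Evaluation mode: the layer acts as the identity.
dropout_layer.eval()
eval_out = dropout_layer(tensor)
print(torch.equal(eval_out, tensor))
```

When a dropout layer is part of a larger model, calling `model.eval()` on the model switches all of its submodules to evaluation mode.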
