# Torch

PyTorch is a library that provides convenient and flexible interfaces for building neural networks.

For installation check [this page](https://pytorch.org/get-started/locally/).

In [1]:
import torch
import torch.nn.functional as F

## Tensor

A tensor is a generalisation of a matrix to an arbitrary number of dimensions. The tensor is the basic entity that torch operates on. Find out more on the [dedicated page](torch/tensor.ipynb).

---

The following example creates a three-dimensional tensor. Each element is written as $\left[ijk\right]$, where $i$ is the layer index (the first, outermost axis of the tensor), $j$ is the row index, and $k$ is the column index.

In [5]:
torch.tensor([
    [
        [111,112,113,114],
        [121,122,123,124],
        [131,132,133,134]
    ],
    [
        [211,212,213,214],
        [221,222,223,224],
        [231,232,233,234]
    ],
])

tensor([[[111, 112, 113, 114],
         [121, 122, 123, 124],
         [131, 132, 133, 134]],

        [[211, 212, 213, 214],
         [221, 222, 223, 224],
         [231, 232, 233, 234]]])
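
To make the $\left[ijk\right]$ labelling concrete, here is a minimal sketch of how the axes map to `shape` and to indexing. It rebuilds a tensor of the same shape through broadcasting rather than by typing the values; the helper names `i`, `j`, `k`, and `t` are chosen for this illustration only. Note that the labels start at 1 while torch indices start at 0.

```python
import torch

# Label vectors shaped so that broadcasting yields a 2 x 3 x 4 tensor
i = torch.arange(1, 3).reshape(2, 1, 1)  # layer labels 1..2 (axis 0)
j = torch.arange(1, 4).reshape(1, 3, 1)  # row labels 1..3 (axis 1)
k = torch.arange(1, 5).reshape(1, 1, 4)  # column labels 1..4 (axis 2)

# Element [i, j, k] gets the value 100*i + 10*j + k, e.g. 123
t = 100 * i + 10 * j + k

print(t.shape)     # torch.Size([2, 3, 4])
print(t[0, 1, 2])  # tensor(123): layer 1, row 2, column 3
print(t[1].shape)  # torch.Size([3, 4]): the whole second layer
```

The first index selects the layer, so `t[1]` returns an ordinary 3 x 4 matrix.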

## Gradient

A key feature of PyTorch that sets it apart from NumPy is its ability to automatically compute gradients for tensors involved in computations. You just need to call the `backward` method on the result of your computations. The tensors that participated in these computations will then have a `grad` attribute containing the gradients. Find out more on the [relevant page](torch/differentiation.ipynb).

---

As an example, consider the function:

$$f(\overline{X})=\sum_i x_i^2, \overline{X} = (x_1, x_2, x_3)$$

Suppose we want to calculate the gradient of $f$ with respect to $\overline{X}$ at the point $(1,2,3)$:

$$\nabla f=(2x_1, 2x_2, 2x_3) \Rightarrow \nabla f(1,2,3)=(2,4,6)$$

Now repeat the same computation with torch.

In [8]:
X = torch.tensor([1,2,3], dtype=torch.float, requires_grad=True)
res = (X**2).sum()
res.backward()
X.grad

tensor([2., 4., 6.])
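
One detail worth checking when reusing a tensor is that `backward` adds to an existing `grad` rather than overwriting it. The sketch below first compares the autograd result with the analytic gradient $2x_i$ and then runs a second backward pass on the same tensor. The variable names `first_grad` and `accumulated_grad` are introduced here for illustration.

```python
import torch

X = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: the gradient of sum(x_i^2) is 2 * x
(X ** 2).sum().backward()
first_grad = X.grad.clone()
matches_analytic = torch.allclose(first_grad, 2 * X.detach())
print(matches_analytic)  # True

# A second backward pass adds to the stored gradient instead of replacing it
(X ** 2).sum().backward()
accumulated_grad = X.grad.clone()
print(torch.allclose(accumulated_grad, 2 * first_grad))  # True

# Reset the gradient in place before the next independent computation
X.grad.zero_()
```

This is why training loops usually clear gradients between steps.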

## Loss functions

Torch implements common loss functions. The following table shows some of them:

| Loss Function                         | Description                              |
|--------------------------------------|------------------------------------------|
| `torch.nn.functional.binary_cross_entropy` | Binary Cross Entropy                     |
| `torch.nn.functional.binary_cross_entropy_with_logits` | Binary Cross Entropy with Logits        |
| `torch.nn.functional.cross_entropy`       | Cross Entropy Loss                       |
| `torch.nn.functional.hinge_embedding_loss` | Hinge Embedding Loss                     |
| `torch.nn.functional.kl_div`              | Kullback-Leibler Divergence Loss         |
| `torch.nn.functional.l1_loss`             | Mean Absolute Error Loss                |
| `torch.nn.functional.mse_loss`            | Mean Squared Error Loss                  |
| `torch.nn.functional.margin_ranking_loss` | Margin Ranking Loss                      |
| `torch.nn.functional.multi_label_margin_loss` | Multi-Label Margin Loss                |
| `torch.nn.functional.multi_label_soft_margin_loss` | Multi-Label Soft Margin Loss           |
| `torch.nn.functional.smooth_l1_loss`      | Smooth L1 Loss                           |
| `torch.nn.functional.triplet_margin_loss` | Triplet Margin Loss                      |
| `torch.nn.functional.nll_loss`            | Negative Log Likelihood Loss            |
| `torch.nn.functional.cosine_embedding_loss` | Cosine Embedding Loss                   |


---

The following cell shows how to apply `mse_loss`.

In [17]:
F.mse_loss(
    torch.tensor([1,2,3], dtype=torch.float),
    torch.tensor([2,3,4], dtype=torch.float)
)

tensor(1.)
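
To see where the value comes from, the following sketch recomputes the mean squared error by hand and compares it with `F.mse_loss`. The names `pred` and `target` are illustrative; the values are the same as in the cell above.

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 3.0, 4.0])

# Mean of the element-wise squared differences: (1 + 1 + 1) / 3
manual = ((pred - target) ** 2).mean()
builtin = F.mse_loss(pred, target)

print(manual.item(), builtin.item())
print(torch.isclose(manual, builtin))  # tensor(True)
```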

### Reduction

The `reduction` parameter specifies how the element-wise losses are aggregated into the result. The commonly used values are `none` (no aggregation, one value per element), `mean` (the default), and `sum`.

---

The following cell demonstrates how different types of reduction are applied to the same inputs:

In [25]:
tens1 = torch.tensor([1,2,3], dtype=torch.float)
tens2 = torch.tensor([2,3,4], dtype=torch.float)

for reduction in ["mean", "sum", "none"]:
    res = F.mse_loss(tens1, tens2, reduction=reduction)
    print(f"reduction - {reduction}, res={res}")

reduction - mean, res=1.0
reduction - sum, res=3.0
reduction - none, res=tensor([1., 1., 1.])
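
The three modes are related: `sum` and `mean` are the sum and the mean of what `none` returns. A possible reason to ask for `none` is to weight the individual errors yourself, as sketched below. The inputs and the `weights` tensor are made-up values for this illustration.

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 4.0, 3.0])

per_element = F.mse_loss(pred, target, reduction="none")  # squared errors: 1, 4, 0
total = F.mse_loss(pred, target, reduction="sum")
average = F.mse_loss(pred, target, reduction="mean")

# "sum" and "mean" agree with reducing the "none" result by hand
print(torch.isclose(per_element.sum(), total))     # tensor(True)
print(torch.isclose(per_element.mean(), average))  # tensor(True)

# Hypothetical per-element weights applied before aggregation
weights = torch.tensor([0.5, 0.25, 1.0])
weighted = (weights * per_element).sum()  # 0.5*1 + 0.25*4 + 1.0*0
print(weighted.item())
```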


## Layers

PyTorch provides a variety of tools for creating neural network layers. Find out more on the [relevant page](torch/layers.ipynb).

**Note:** In theory, the term "layer" often refers to a combination of connections and activation functions. However, PyTorch has a more specific abstraction where there are dedicated layers for different functionalities. It's important to keep this in mind to avoid confusion.

The following table lists some of the layers available in PyTorch:

| Category          | Layer                    | Description                                  |
|-------------------|--------------------------|----------------------------------------------|
| **Linear Layers** | `torch.nn.Linear`        | Fully connected (dense) layer               |
| **Convolutional Layers** | `torch.nn.Conv1d`      | 1D convolutional layer                      |
|                   | `torch.nn.Conv2d`        | 2D convolutional layer                      |
|                   | `torch.nn.Conv3d`        | 3D convolutional layer                      |
| **Pooling Layers**| `torch.nn.MaxPool1d`     | 1D max pooling layer                       |
|                   | `torch.nn.MaxPool2d`     | 2D max pooling layer                       |
|                   | `torch.nn.MaxPool3d`     | 3D max pooling layer                       |
|                   | `torch.nn.AvgPool1d`     | 1D average pooling layer                   |
|                   | `torch.nn.AvgPool2d`     | 2D average pooling layer                   |
|                   | `torch.nn.AvgPool3d`     | 3D average pooling layer                   |
| **Normalization Layers** | `torch.nn.BatchNorm1d` | 1D batch normalization                      |
|                   | `torch.nn.BatchNorm2d`   | 2D batch normalization                      |
|                   | `torch.nn.BatchNorm3d`   | 3D batch normalization                      |
|                   | `torch.nn.LayerNorm`     | Layer normalization                         |
|                   | `torch.nn.InstanceNorm1d`| 1D instance normalization                   |
|                   | `torch.nn.InstanceNorm2d`| 2D instance normalization                   |
|                   | `torch.nn.InstanceNorm3d`| 3D instance normalization                   |
| **Activation Functions** | `torch.nn.ReLU`        | Rectified Linear Unit                       |
|                   | `torch.nn.Sigmoid`       | Sigmoid activation function                 |
|                   | `torch.nn.Tanh`          | Hyperbolic tangent activation function       |
|                   | `torch.nn.LeakyReLU`     | Leaky Rectified Linear Unit                 |
|                   | `torch.nn.Softmax`       | Softmax activation function                 |
|                   | `torch.nn.Softplus`      | Softplus activation function                |
|                   | `torch.nn.Softshrink`    | Softshrink activation function              |
| **Recurrent Layers** | `torch.nn.RNN`          | Recurrent Neural Network layer              |
|                   | `torch.nn.LSTM`          | Long Short-Term Memory layer                |
|                   | `torch.nn.GRU`           | Gated Recurrent Unit layer                  |
| **Other Layers**  | `torch.nn.Embedding`     | Lookup table for embeddings                 |
|                   | `torch.nn.Dropout`       | Dropout layer for regularization            |
|                   | `torch.nn.Transformer`   | Transformer model                           |
|                   | `torch.nn.TransformerEncoder` | Transformer encoder                     |
|                   | `torch.nn.TransformerDecoder` | Transformer decoder                     |


Consider the typical features of such objects. As an example, let's take a linear layer without going into its peculiarities.

---

The following cell shows that you can apply a layer to an input tensor by calling it like a function.

In [7]:
layer = torch.nn.Linear(10, 3)
layer(torch.rand(3, 10))

tensor([[ 0.5347, -0.0643, -0.2821],
        [ 0.2541,  0.2737, -0.2114],
        [ 0.4634,  0.2516, -0.2575]], grad_fn=<AddmmBackward0>)
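
As a rough check of what the call does, the sketch below reproduces the output of a `Linear` layer from its `weight` and `bias` attributes. `torch.nn.Linear(10, 3)` stores a weight of shape `(3, 10)`, so the input is multiplied by the transposed weight. The names `x`, `out`, and `manual` are illustrative.

```python
import torch

layer = torch.nn.Linear(10, 3)
x = torch.rand(3, 10)

# Calling the layer applies the affine map x @ W^T + b
out = layer(x)
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)  # torch.Size([3, 10])
print(out.shape)           # torch.Size([3, 3])
print(torch.allclose(out, manual, atol=1e-6))  # True
```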

You can use a layer as part of the computation, and it can participate in the `backward` pass to compute gradients. The following cell demonstrates how to obtain the gradient for the `weight` attribute of a layer.

In [10]:
layer(torch.rand(3, 10)).sum().backward()
layer.weight.grad
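
The cell above has no recorded output, so the sketch below shows what to expect for this particular computation. Because the result is the plain sum of all outputs, the derivative with respect to `weight[i, j]` is the sum of input column `j` over the batch, and the derivative with respect to each bias element is the batch size. The names `x` and `expected_weight_grad` are illustrative.

```python
import torch

layer = torch.nn.Linear(10, 3)
x = torch.rand(3, 10)

# Sum all outputs and backpropagate to the layer parameters
layer(x).sum().backward()

# The gradient has the same shape as the weight itself
print(layer.weight.grad.shape)  # torch.Size([3, 10])

# Every row of the weight gradient equals the column sums of the input
expected_weight_grad = x.sum(dim=0).expand(3, 10)
print(torch.allclose(layer.weight.grad, expected_weight_grad, atol=1e-6))  # True

# Each bias element contributes once per sample, so its gradient is 3 here
print(layer.bias.grad)
```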

## Network composition

Neural networks are built by composing layers. This section covers the main tools PyTorch provides for building such compositions.

### Sequential

You can use `torch.nn.Sequential` to combine multiple network layers into a sequential chain. Find out more on the [dedicated page](torch/sequential.ipynb).

---

The following cell demonstrates a basic example where a linear transformation is applied to the input, followed by a ReLU activation function.

In [18]:
size = 3

sequential = torch.nn.Sequential(
    torch.nn.Linear(size, size, bias=False),
    torch.nn.ReLU()
)

X = torch.randn([3, 3])
sequential(X)

tensor([[0.0000, 0.0000, 0.8781],
        [0.4362, 0.0000, 0.7350],
        [0.0000, 0.0000, 1.1225]], grad_fn=<ReluBackward0>)
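
A `Sequential` container passes the output of each module to the next one. The sketch below checks this by applying the two modules one at a time and by repeating the computation with raw tensor operations. The seed and the names `step_by_step` and `manual` are assumptions made for the illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make reruns repeatable
size = 3

sequential = torch.nn.Sequential(
    torch.nn.Linear(size, size, bias=False),
    torch.nn.ReLU(),
)
X = torch.randn(3, size)

out = sequential(X)

# Modules inside a Sequential can be accessed by position
linear, relu = sequential[0], sequential[1]
step_by_step = relu(linear(X))

# The same computation written with tensor operations (no bias term here)
manual = torch.relu(X @ linear.weight.T)

print(torch.equal(out, step_by_step))            # True
print(torch.allclose(out, manual, atol=1e-6))    # True
```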

### Separate class

You can define a neural network as a separate class, which allows you to add custom logic for initialization or network-specific procedures. To create a network class, follow these rules:

- **Inherit from `torch.nn.Module`:** This establishes your class as a PyTorch module, providing access to its functionality.
- **Call `super().__init__()` in the constructor:** This initializes the base `nn.Module` class, ensuring proper setup.
- **Define a `forward` method:** This method implements the computational procedure of your network. It defines how input data flows through your layers to produce output. 

---

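The following cell defines a network made of a stack of `Linear` layers; the number of layers and their width are set when an instance is created. The `forward` method standardizes each input column (zero mean and unit standard deviation across the batch) before applying the layers.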

In [12]:
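class ExampleNetwork(torch.nn.Module):
    def __init__(self, layers_number: int, neurons: int):
        # Initialize torch.nn.Module before registering any submodules
        super().__init__()

        # Stack `layers_number` square Linear layers of width `neurons`
        self.network = torch.nn.Sequential(*[
            torch.nn.Linear(neurons, neurons)
            for _ in range(layers_number)
        ])

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # Standardize each column over the batch dimension
        X = (X - X.mean(dim=0, keepdim=True)) / X.std(dim=0, keepdim=True)
        return self.network(X)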

Let's check if the network we've defined works as expected. 

In [14]:
ExampleNetwork(layers_number=10, neurons=3)(X = torch.randn([5, 3]))

tensor([[-0.2482,  0.0882,  0.4507],
        [-0.2465,  0.0897,  0.4466],
        [-0.2531,  0.0827,  0.4587],
        [-0.2463,  0.0899,  0.4459],
        [-0.2461,  0.0892,  0.4429]], grad_fn=<AddmmBackward0>)
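
Inheriting from `torch.nn.Module` also means that submodules assigned as attributes are registered automatically, so their parameters can be listed through the network object. The sketch below repeats a compact copy of the class so that it runs on its own, then counts the parameters. The names `n_params` and `names` are illustrative.

```python
import torch


class ExampleNetwork(torch.nn.Module):
    """Compact copy of the class defined above, repeated to keep this runnable."""

    def __init__(self, layers_number: int, neurons: int):
        super().__init__()
        self.network = torch.nn.Sequential(*[
            torch.nn.Linear(neurons, neurons) for _ in range(layers_number)
        ])

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        X = (X - X.mean(dim=0, keepdim=True)) / X.std(dim=0, keepdim=True)
        return self.network(X)


model = ExampleNetwork(layers_number=10, neurons=3)

# Each Linear(3, 3) holds a 3x3 weight and a 3-element bias: 10 * (9 + 3) = 120
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 120

# Parameter names follow the attribute path inside the module
names = [name for name, _ in model.named_parameters()]
print(names[:2])  # ['network.0.weight', 'network.0.bias']

# The output keeps the input shape because every layer is square
out = model(torch.randn(5, 3))
print(out.shape)  # torch.Size([5, 3])
```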

## Device

You can choose the device (for example, the CPU or a GPU) on which a tensor or a model is stored and processed. Find out more on the [dedicated page](torch/devices.ipynb).

---

The following example shows how to check the `device` of a tensor. By default, tensors are created on the CPU.

In [None]:
torch.randn([5, 5]).device

device(type='cpu')

Modules such as layers and `Sequential` containers do not expose a `device` attribute of their own. They hold tensors (their parameters), so you check the device through those tensors. The following example prints the device of the weights and biases of two linear layers.

In [None]:
example_sequential = torch.nn.Sequential(
    torch.nn.Linear(3,3),
    torch.nn.Linear(3,3)
)

for param in example_sequential.parameters():
    print(param.device)

cpu
cpu
cpu
cpu
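
Selecting a device is usually done with the `to` method, which exists on both tensors and modules. The sketch below is one common pattern, assuming a CUDA GPU should be used when available and the CPU otherwise. The names `device`, `model`, and `x` are illustrative.

```python
import torch

# Fall back to the CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Module.to moves the parameters of the module and returns the module
model = torch.nn.Sequential(
    torch.nn.Linear(3, 3),
    torch.nn.Linear(3, 3),
).to(device)

# Tensor.to returns a tensor on the target device
x = torch.randn(5, 3).to(device)

# Inputs and parameters must be on the same device for the forward pass
out = model(x)

param_device_types = {p.device.type for p in model.parameters()}
print(param_device_types)               # {'cpu'} or {'cuda'}, depending on the machine
print(out.device.type == device.type)   # True
```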
