# Recurrent layers

This page explores how recurrent layers are implemented in PyTorch. For more details, see the [`torch.nn.RNN` page](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) of the PyTorch documentation.

In [1]:
import torch
from torch.nn import RNN
from IPython.display import Latex

## Equivalent implementation

On the [RNN](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html) page of the PyTorch documentation, you can find a function that implements a transformation equivalent to `torch.nn.RNN`. To explore the different parameters of the recurrent layer, it is convenient to have a function that we can modify to gain a better understanding. This section considers how a basic version of this function can be used.
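As a reference, the transformation described on that page can be written as follows for layer $l$ and time step $t$ (dropout between layers, which is disabled by default, is ignored here):

$$
h_t^{(l)} = \tanh\left(x_t^{(l)} \big(W_{ih}^{(l)}\big)^{\top} + b_{ih}^{(l)} + h_{t-1}^{(l)} \big(W_{hh}^{(l)}\big)^{\top} + b_{hh}^{(l)}\right),
\qquad
x_t^{(0)} = x_t, \quad x_t^{(l)} = h_t^{(l-1)} \text{ for } l \ge 1
$$

Here $W_{ih}^{(l)}$, $b_{ih}^{(l)}$, $W_{hh}^{(l)}$ and $b_{hh}^{(l)}$ correspond to `weight_ih_l{l}`, `bias_ih_l{l}`, `weight_hh_l{l}` and `bias_hh_l{l}`, and at the first step $h_{t-1}^{(l)}$ is taken from the initial state `h_0` (zeros if it is not provided).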

---

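The following cell shows a modified version of this function. Instead of reading the attributes of a layer, it takes the weights, biases and other settings of the transformation as explicit arguments, which makes it easier to experiment with.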

In [84]:
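def forward(
    x: torch.Tensor,
    hidden_size: int,
    weight_ih: list[torch.Tensor],
    bias_ih: list[torch.Tensor],
    weight_hh: list[torch.Tensor],
    bias_hh: list[torch.Tensor],
    h_0: torch.Tensor | None = None,
    num_layers: int = 1,
    batch_first: bool = False
):
    # Work internally in the (L, N, H_in) layout
    if batch_first:
        x = x.transpose(0, 1)
    seq_len, batch_size, _ = x.size()
    if h_0 is None:
        h_0 = torch.zeros(num_layers, batch_size, hidden_size)
    # Hidden states of all layers from the previous time step;
    # unbinding (instead of writing in place) leaves the caller's h_0 untouched
    h_t_minus_1 = list(h_0.unbind(0))
    output = []
    for t in range(seq_len):
        h_t = []
        # The first layer reads x_t; every deeper layer reads
        # the hidden state just computed by the layer below it
        layer_input = x[t]
        for layer in range(num_layers):
            h = torch.tanh(
                layer_input @ weight_ih[layer].T
                + bias_ih[layer]
                + h_t_minus_1[layer] @ weight_hh[layer].T
                + bias_hh[layer]
            )
            h_t.append(h)
            layer_input = h
        # The output at step t is the hidden state of the last layer
        output.append(h_t[-1])
        h_t_minus_1 = h_t
    output = torch.stack(output)
    if batch_first:
        output = output.transpose(0, 1)
    return output, torch.stack(h_t_minus_1)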

The following code creates an `RNN` layer and a typical input for it:

In [85]:
sequence_len = 4
batch_size = 5

rnn = RNN(input_size=2, hidden_size=3)
x = torch.randn(sequence_len, batch_size, rnn.input_size)

To use the custom `forward`, pass the weights and biases as lists (a `torch.nn.RNN` can contain several stacked layers, each with its own parameters), along with the other parameters of the layer.

In [90]:
function_out = forward(
    x=x,
    hidden_size=rnn.hidden_size,
    weight_ih=[rnn.weight_ih_l0],
    bias_ih=[rnn.bias_ih_l0],
    weight_hh=[rnn.weight_hh_l0],
    bias_hh=[rnn.bias_hh_l0]
)

layer_out = rnn(x)

The following cell checks that the custom implementation and `torch.nn.RNN` produce the same results.

In [94]:
torch.testing.assert_close(
    actual=function_out[0],
    expected=layer_out[0]
)
torch.testing.assert_close(
    actual=function_out[1],
    expected=layer_out[1]
)
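The two values returned by the layer are related: the first one (`output`) holds the hidden state of the last layer at every time step, while the second one (`h_n`) holds the final hidden state of every layer. A minimal sketch checking this on a two-layer RNN (the sizes and the seed are arbitrary choices for illustration):

```python
import torch
from torch.nn import RNN

torch.manual_seed(0)
rnn = RNN(input_size=2, hidden_size=3, num_layers=2)
x = torch.randn(4, 5, rnn.input_size)  # (L, N, H_in)

output, h_n = rnn(x)
print(output.shape)  # torch.Size([4, 5, 3]) -> (L, N, H_out)
print(h_n.shape)     # torch.Size([2, 5, 3]) -> (num_layers, N, H_out)

# The last step of the output sequence is the final state of the last layer
torch.testing.assert_close(output[-1], h_n[-1])
```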

## Batch dimension

By default, `torch.nn.RNN` expects input tensors of shape $(L, N, H_{in})$, which can be viewed as a **sequence of batches**. The `batch_first=True` parameter makes the layer expect tensors of shape $(N, L, H_{in})$ instead, which can be viewed as a **batch of sequences**; this layout is more convenient in most cases.

Here:

- $L$: length of the sequence.
- $N$: batch size.
- $H_{in}$: dimensionality of each element of the sequence (`input_size`).
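Since `batch_first` only changes how the input and output tensors are laid out, a layer created with `batch_first=True` should give the same results as a default layer applied to the transposed input, provided both share the same weights. A minimal sketch checking this (the layer sizes and the seed are arbitrary):

```python
import torch
from torch.nn import RNN

torch.manual_seed(0)
rnn_seq_first = RNN(input_size=3, hidden_size=4)
rnn_batch_first = RNN(input_size=3, hidden_size=4, batch_first=True)
# batch_first does not change the parameters, so the weights can be copied over
rnn_batch_first.load_state_dict(rnn_seq_first.state_dict())

x = torch.randn(7, 5, 3)  # (N, L, H_in): 7 sequences of length 5
out_bf, h_bf = rnn_batch_first(x)
out_sf, h_sf = rnn_seq_first(x.transpose(0, 1))  # fed as (L, N, H_in)

# Outputs differ only by the order of the first two dimensions
torch.testing.assert_close(out_bf, out_sf.transpose(0, 1))
torch.testing.assert_close(h_bf, h_sf)
```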

---

Consider the difference using the tensor generated in the next cell:

In [11]:
X = torch.randn(5, 7, 3)

By default, the second dimension is treated as the batch, so the layer returns the last hidden state for 7 items in the batch.

In [14]:
rnn = RNN(input_size=3, hidden_size=10)
rnn(X)[1].shape

torch.Size([1, 7, 10])

The same code, differing only in the `batch_first=True` argument, treats the first dimension as the batch and returns the last hidden state for 5 items in the batch.

In [13]:
rnn = RNN(input_size=3, hidden_size=10, batch_first=True)
rnn(X)[1].shape

torch.Size([1, 5, 10])
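Note that `batch_first` applies to the input and output tensors only: according to the PyTorch documentation, the hidden states `h_0` and `h_n` keep the $(\text{num\_layers}, N, H_{out})$ layout either way. A short sketch passing an explicit initial state to a `batch_first` layer:

```python
import torch
from torch.nn import RNN

rnn = RNN(input_size=3, hidden_size=10, batch_first=True)
X = torch.randn(5, 7, 3)     # N=5 sequences of length L=7
h_0 = torch.zeros(1, 5, 10)  # (num_layers, N, H_out), not batch first

output, h_n = rnn(X, h_0)
print(output.shape)  # torch.Size([5, 7, 10]) -> (N, L, H_out)
print(h_n.shape)     # torch.Size([1, 5, 10]) -> (num_layers, N, H_out)
```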

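## Number of layers

The following cells create an RNN with four stacked layers and inspect the shapes of its weights.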

In [2]:
rnn = RNN(input_size=2, hidden_size=10, num_layers=4)
input = torch.randn(3, 10, rnn.input_size)

In [3]:
display(rnn.weight_hh_l0.shape)
display(rnn.weight_hh_l1.shape)
display(rnn.weight_hh_l2.shape)

torch.Size([10, 10])

torch.Size([10, 10])

torch.Size([10, 10])

In [4]:
display(rnn.weight_ih_l0.shape)
display(rnn.weight_ih_l1.shape)
display(rnn.weight_ih_l2.shape)

torch.Size([10, 2])

torch.Size([10, 10])

torch.Size([10, 10])
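The shapes above follow the convention of `torch.nn.Linear`: each matrix is stored as `(out_features, in_features)`. Only `weight_ih_l0` depends on `input_size`; every deeper layer receives the hidden state of the previous layer as its input, so `weight_ih_l1` to `weight_ih_l3` have shape `(hidden_size, hidden_size)`. A minimal sketch listing all parameters of the same four-layer configuration with `named_parameters()`:

```python
from torch.nn import RNN

rnn = RNN(input_size=2, hidden_size=10, num_layers=4)

# Parameters are named <kind>_l<layer index>, e.g. weight_ih_l0, bias_hh_l3
shapes = {name: tuple(param.shape) for name, param in rnn.named_parameters()}
for name, shape in shapes.items():
    print(f"{name:>12}: {shape}")
```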

In [7]:
def forward(
    x, 
    hidden_size, 
    weight_ih: list[torch.Tensor], 
    bias_ih: list[torch.Tensor], 
    weight_hh: list[torch.Tensor], 
    bias_hh: list[torch.Tensor], 
    num_layers=1
):
    seq_len, batch_size, _ = x.size()
    h_0 = torch.zeros(num_layers, batch_size, hidden_size)
    h_t_minus_1 = h_0
    h_t = h_0
    output = []
    for t in range(seq_len):
        print("="*80)
        display(Latex(f"Processing $x_{{{t}}}$ element of the sequence."))
        for layer in range(num_layers):
            h_t[layer] = torch.tanh(
                x[t] @ weight_ih[layer]
                + bias_ih[layer]
                + h_t_minus_1[layer] @ weight_hh[layer]
                + bias_hh[layer]
            )
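            display(Latex(f"Computed $h_{{{t}}}$ for layer {layer}."))
        # Clone: h_t is overwritten in place at the next time step
        output.append(h_t[-1].clone())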
        h_t_minus_1 = h_t
    output = torch.stack(output)
    return output, h_t

In [8]:
weight_hh = [rnn.weight_hh_l0, rnn.weight_hh_l1, rnn.weight_hh_l2]
weight_ih = [rnn.weight_ih_l0, rnn.weight_ih_l1, rnn.weight_ih_l2]
bias_hh = [rnn.bias_hh_l0, rnn.bias_hh_l1, rnn.bias_hh_l2]
bias_ih = [rnn.bias_ih_l0, rnn.bias_ih_l1, rnn.bias_ih_l2]

forward(
    x=input,
    hidden_size=rnn.hidden_size,
    weight_ih=weight_ih,
    bias_ih=bias_ih,
    weight_hh=weight_hh,
    bias_hh=bias_hh,
    num_layers=rnn.num_layers
)


RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x2 and 10x2)

In [15]:
input[0] @ weight_ih[0]

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x2 and 10x2)
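Both errors have the same cause: PyTorch stores `weight_ih_l0` with shape `(hidden_size, input_size)`, here `(10, 2)`, so a batch of inputs of shape `(10, 2)` has to be multiplied by its transpose, as the first `forward` on this page does. With four layers there is a second issue: `weight_ih_l1` to `weight_ih_l3` have shape `(10, 10)`, so those layers must consume the hidden state of the layer below instead of `x[t]`. A minimal sketch of the corrected loop, assuming the default `tanh` nonlinearity, `bias=True` and no dropout, compared directly against `torch.nn.RNN`:

```python
import torch
from torch.nn import RNN

torch.manual_seed(0)
rnn = RNN(input_size=2, hidden_size=10, num_layers=4)
x = torch.randn(3, 10, rnn.input_size)  # (L, N, H_in)

# Collect per-layer parameters by name: weight_ih_l0, bias_ih_l0, ...
params = [
    [getattr(rnn, f"{kind}_l{layer}") for kind in ("weight_ih", "bias_ih", "weight_hh", "bias_hh")]
    for layer in range(rnn.num_layers)
]

with torch.no_grad():
    h = [torch.zeros(x.size(1), rnn.hidden_size) for _ in range(rnn.num_layers)]
    outputs = []
    for x_t in x:  # iterate over time steps
        layer_input = x_t
        for layer, (w_ih, b_ih, w_hh, b_hh) in enumerate(params):
            # Weights are stored as (out_features, in_features), hence the transposes
            h[layer] = torch.tanh(layer_input @ w_ih.T + b_ih + h[layer] @ w_hh.T + b_hh)
            layer_input = h[layer]  # the next layer consumes this hidden state
        outputs.append(h[-1])
    output, h_n = torch.stack(outputs), torch.stack(h)
    expected_output, expected_h_n = rnn(x)

torch.testing.assert_close(output, expected_output)
torch.testing.assert_close(h_n, expected_h_n)
```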