# Recurrent layer

This page explains the concept of a recurrent layer.

The key idea is to create a mechanism where each input affects the processing and outcome of subsequent inputs.

![](recurent_layer_files/recurent_schema.svg)

An RNN is essentially a single layer that is applied repeatedly along the sequence. At each step $t$, it combines the current input $x_t$ with $h_{t-1}$, the state vector produced at the previous step.

Formally, the state is updated as follows:  

$$h_t = f(x_t W^T_1 + b_1 + h_{t-1} W^T_2 + b_2)$$  

Where:  
- $x_t$: input at the $t$-th step.  
- $h_t$: vector that describes the hidden state at the $t$-th step.  
- $W_1$: weights associated with the input.  
- $W_2$: weights associated with the state.  
- $b_1$: bias associated with the input.  
- $b_2$: bias associated with the state.  
- $f$: activation function, typically a hyperbolic tangent.  
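The update rule above can be sketched as a single function. This is a minimal illustration, not a reference implementation: the name `rnn_step` and the toy weights are assumptions chosen so that the result is easy to check by hand.

```python
import torch


def rnn_step(x_t, h_prev, W_1, b_1, W_2, b_2, f=torch.tanh):
    """One recurrent update: h_t = f(x_t W_1^T + b_1 + h_prev W_2^T + b_2)."""
    return f(x_t @ W_1.T + b_1 + h_prev @ W_2.T + b_2)


# Toy parameters (hypothetical): identity input weights, everything else zero.
W_1 = torch.eye(2)
W_2 = torch.zeros(2, 2)
b_1 = torch.zeros(2)
b_2 = torch.zeros(2)

x_t = torch.tensor([[0.5, -1.0]])  # one sample, two features
h_prev = torch.zeros(1, 2)         # initial state h_0

h_t = rnn_step(x_t, h_prev, W_1, b_1, W_2, b_2)
print(h_t)
```

With these particular weights, the state term and both biases vanish, so $h_t$ reduces to $\tanh(x_t)$.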

## Implementation in Python

In this section, we implement the computations performed by a recurrent layer step by step and compare the result with `torch.nn.RNN` as the reference.

The following cell defines the parameters of the recurrent procedure that we will use as an example.

We have:  

- A sequence of 15 elements: $\left\{ x_1, x_2, \dots, x_{15} \right\}$.  
- Each element is a vector of 5 elements: $x_t \in \mathbb{R}^5$.  
- We are working with a sample containing 10 sequences.  
- The state vector is a 3-element vector: $h_t \in \mathbb{R}^3$.  
- The activation function $f(x)$ is the hyperbolic tangent.  

In [11]:
import torch

samples_size = 10
element_size = 5
sequence_size = 15
state_size = 3
activation = torch.nn.Tanh()

input_data = torch.rand(samples_size, sequence_size, element_size)

For given inputs:

- Input weights $W_1$ should be a $3 \times 5$ matrix.  
- Input bias $b_1$ should be a vector with $3$ elements.  
- State weights $W_2$ should be a $3 \times 3$ matrix.  
- State bias $b_2$ should be a vector with $3$ elements.  

In [13]:
W_1 = torch.rand(state_size, element_size)
b_1 = torch.rand(state_size)
W_2 = torch.rand(state_size, state_size)
b_2 = torch.rand(state_size)
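The shapes listed above follow from the matrix products in the update rule. The sketch below rebuilds tensors of the same sizes on its own (so it does not depend on the cells above) and checks that the input term and the state term both end up with one `state_size`-element row per sequence.

```python
import torch

samples_size, element_size, state_size = 10, 5, 3

x_t = torch.rand(samples_size, element_size)     # one step of every sequence
h_prev = torch.zeros(samples_size, state_size)   # previous state

W_1 = torch.rand(state_size, element_size)
b_1 = torch.rand(state_size)
W_2 = torch.rand(state_size, state_size)
b_2 = torch.rand(state_size)

# (10, 5) @ (5, 3) -> (10, 3); the bias is broadcast over the rows.
input_term = x_t @ W_1.T + b_1
# (10, 3) @ (3, 3) -> (10, 3)
state_term = h_prev @ W_2.T + b_2

print(input_term.shape, state_term.shape)
```

Because both terms have the same shape, they can be added element-wise before the activation is applied.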

For the first step, with the initial state $h_0$ set to zeros, the expression $x_1 W^T_1 + b_1 + h_{0} W^T_2 + b_2$ is computed as follows:

In [12]:
state = torch.zeros(samples_size, state_size)
(input_data[:, 0, :] @ W_1.T) + b_1 + (state @ W_2.T) + b_2

tensor([[3.1063, 2.5003, 2.6427],
        [3.3605, 2.6550, 2.9545],
        [3.1519, 2.5479, 2.1526],
        [2.9270, 2.3460, 2.6681],
        [3.4005, 2.7839, 2.4954],
        [3.5589, 2.9070, 2.9149],
        [3.3401, 2.7549, 2.4623],
        [3.0115, 2.4006, 2.3053],
        [3.2647, 2.6268, 2.4131],
        [2.8917, 2.3394, 2.4433]])

The implementation of the full recurrent procedure for all 15 elements of the sequence is provided in the following cell:

In [15]:
states = [state]

for i in range(input_data.shape[1]):
    res = activation( 
        (input_data[:, i, :] @ W_1.T) + b_1
        + (states[-1] @ W_2.T) + b_2
    )
    states.append(res)

my_ans = (
    torch.stack(states[1:]).permute((1, 0, 2)), 
    states[-1][None, ...]
)
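The loop above can also be wrapped in a function, which makes the shapes of the two results easier to inspect. This is a sketch under assumptions: `manual_rnn` is a hypothetical helper name, and `torch.stack(..., dim=1)` is used in place of stacking followed by `permute`, which gives the same layout.

```python
import torch


def manual_rnn(input_data, W_1, b_1, W_2, b_2, f=torch.tanh):
    """Run the recurrent update over the sequence axis (dimension 1)."""
    state = torch.zeros(input_data.shape[0], W_2.shape[0])
    outputs = []
    for t in range(input_data.shape[1]):
        state = f(input_data[:, t, :] @ W_1.T + b_1 + state @ W_2.T + b_2)
        outputs.append(state)
    # All states: (samples, sequence, state); last state: (1, samples, state).
    return torch.stack(outputs, dim=1), state[None, ...]


torch.manual_seed(0)
x = torch.rand(10, 15, 5)
out, h_n = manual_rnn(
    x, torch.rand(3, 5), torch.rand(3), torch.rand(3, 3), torch.rand(3)
)
print(out.shape, h_n.shape)
```

The first result holds the state after every step, and the second holds only the final state with an extra leading axis, so the last slice of the first result equals the second.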

Now, we will do the same using the ready-made `torch.nn.RNN` class. Before computing, we need to set the weights of the instance to match those used in the custom procedure:

In [20]:
rnn = torch.nn.RNN(element_size, state_size, batch_first=True)

with torch.no_grad():
    # Copying the parameters that were used earlier:
    rnn.weight_ih_l0.copy_(W_1)
    rnn.bias_ih_l0.copy_(b_1)
    rnn.weight_hh_l0.copy_(W_2)
    rnn.bias_hh_l0.copy_(b_2)

    torch_ans = rnn(input_data)
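Copying the weights in place only works if the parameter shapes agree. As a quick check, the sketch below builds a fresh `torch.nn.RNN` with the same sizes and lists its parameters; `weight_ih_l0` and `bias_ih_l0` correspond to $W_1$ and $b_1$, while `weight_hh_l0` and `bias_hh_l0` correspond to $W_2$ and $b_2$.

```python
import torch

element_size, state_size = 5, 3
rnn = torch.nn.RNN(element_size, state_size, batch_first=True)

# Collect the shape of every learnable parameter of the layer.
shapes = {name: tuple(param.shape) for name, param in rnn.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```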

The following cell verifies that both outputs match within the default tolerances of `torch.testing.assert_close`.

In [19]:
torch.testing.assert_close(torch_ans[0], my_ans[0])
torch.testing.assert_close(torch_ans[1], my_ans[1])
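The comparison uses two indices because `torch.nn.RNN` returns a pair: the states for every step and the final state. The sketch below, which uses a freshly initialised layer with random weights rather than the one configured above, illustrates that for a single-layer, unidirectional RNN the final state matches the last step of the output.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(5, 3, batch_first=True)
x = torch.rand(10, 15, 5)

with torch.no_grad():
    output, h_n = rnn(x)

# output: state after each step; h_n: state after the last step only.
print(output.shape, h_n.shape)
print(torch.allclose(output[:, -1, :], h_n[0]))
```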