# Usage Patterns for LSTMs in PyTorch

The video material introduces three usage patterns for recurrent neural networks: as an *encoder*, as a *transducer*, and as a *decoder*. In this notebook you will learn how to realise the *Encoder* and the *Transducer* patterns in PyTorch with an LSTM architecture.

In [None]:
import torch

from torch import nn

## Sample input

To illustrate the two patterns, we use an input batch `x` containing a single sequence with three elements, each of which is a vector of size five.

In [None]:
torch.manual_seed(42)

In [None]:
x = torch.rand(1, 3, 5)

Here is what our concrete `x` looks like:

In [None]:
x

## Model

Next, we define the LSTM model. In PyTorch, the LSTM architecture is implemented by the class [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html).

We use an LSTM with an *input_size* of&nbsp;5 and a *hidden_size* of&nbsp;2. This LSTM will process the sequence of 5-dimensional vectors in&nbsp;`x` and map each input vector to a hidden state in the form of a 2-dimensional vector.

By default, an [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) expects its input to have the shape (*sequence_length*, *batch_size*, *input_size*). For our purposes, it is easier to instead take the input in the form (*batch_size*, *sequence_length*, *input_size*). To get this behaviour, we set the `batch_first` argument to `True`.

In [None]:
model = nn.LSTM(5, 2, batch_first=True)
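
To make the effect of `batch_first` concrete, here is a minimal sketch that runs the same data through two LSTMs with identical weights, one in each layout. The names `lstm_bf`, `lstm_sf` and `x_batch_first` are ours and do not appear elsewhere in this notebook. If the two layouts really differ only in the order of the first two axes, the final comparison should come out as `True`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# One sequence of three 5-dimensional vectors: (batch_size, sequence_length, input_size)
x_batch_first = torch.rand(1, 3, 5)

# LSTM that takes the batch dimension first
lstm_bf = nn.LSTM(5, 2, batch_first=True)
out_bf, _ = lstm_bf(x_batch_first)

# LSTM with the default layout (sequence_length, batch_size, input_size);
# we copy the weights so that both models compute the same function
lstm_sf = nn.LSTM(5, 2)
lstm_sf.load_state_dict(lstm_bf.state_dict())
out_sf, _ = lstm_sf(x_batch_first.transpose(0, 1).contiguous())

print(out_bf.shape)  # torch.Size([1, 3, 2])
print(out_sf.shape)  # torch.Size([3, 1, 2])

# Swapping the first two axes of one output should give the other
print(torch.allclose(out_bf, out_sf.transpose(0, 1), atol=1e-6))
```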

## Output

We are now ready to feed the example input to our model:

In [None]:
output, (h_n, c_n) = model(x)

The result of the `forward()` method has two components:

The first component is a tensor `output` that holds the hidden states computed by the LSTM, for each position of the input sequence. Consequently, the shape of `output` is (*batch_size*, *sequence_length*, *hidden_size*).

In [None]:
output.shape

In [None]:
output

The second component is a pair of tensors `h_n` and `c_n` which represent the final hidden state and cell state of the LSTM, respectively. These are the hidden state and cell state computed at the last position of the input sequence. Their common shape is (1, *batch_size*, *hidden_size*):

In [None]:
h_n.shape

In [None]:
c_n.shape

We can verify that (the only element of) `h_n` is indeed identical to the last row of `output`:

In [None]:
h_n[0]
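
Instead of comparing the printed values by eye, we can let PyTorch do the comparison. The following self-contained sketch rebuilds the example from above; the names `last_output` and `final_hidden` are ours.

```python
import torch
from torch import nn

# Rebuild the example input and model
torch.manual_seed(42)
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)
output, (h_n, c_n) = model(x)

# Hidden state at the last sequence position, shape (batch_size, hidden_size)
last_output = output[:, -1, :]

# Final hidden state of the (single) layer, shape (batch_size, hidden_size)
final_hidden = h_n[0]

print(torch.allclose(last_output, final_hidden))
```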

**🤔 Question 1: Batch size**

> How do the concrete shapes of `output`, `h_n` and `c_n` change when you process a batch of seven sequences instead of just one?

**🤔 Question 2: Stacked LSTMs**

> The [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class supports stacked LSTMs with multiple layers. How do the shapes of `output`, `h_n` and `c_n` change when you define the model to have three layers? How can you then get the final state of the final layer?

## Encoder

To realise the *Encoder* pattern, we simply return the final hidden state:

In [None]:
def encode(model, x):
    # Call the module itself rather than forward(), so that PyTorch hooks run
    output, (h_n, c_n) = model(x)
    return h_n[-1]

In [None]:
y = encode(model, x)

In [None]:
y.shape

In [None]:
y
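
One typical use of the encoder pattern is to pass the encoding on to a classifier. The sketch below illustrates this under assumptions of our own: the linear layer `classifier` and the choice of three classes are hypothetical and not part of this notebook.

```python
import torch
from torch import nn

torch.manual_seed(42)
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)

def encode(model, x):
    # Return the final hidden state of the last layer
    output, (h_n, c_n) = model(x)
    return h_n[-1]

# Hypothetical classification head: 2-dimensional encoding -> 3 class scores
classifier = nn.Linear(2, 3)

# One vector of class scores per sequence in the batch
logits = classifier(encode(model, x))
print(logits.shape)  # torch.Size([1, 3])
```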

**🤔 Question 3: Bi-directional LSTMs**

> In addition to stacking, the [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) class also supports bi-directional networks. How do the shapes of `output`, `h_n` and `c_n` change in that case? How can you get the final states for the two uni-directional networks?

## Transducer

To realise a *Transducer*, we return the complete output tensor `output`.

In [None]:
def transduce(model, x):
    # Call the module itself rather than forward(), so that PyTorch hooks run
    output, (h_n, c_n) = model(x)
    return output

In [None]:
y = transduce(model, x)

In [None]:
y.shape

In [None]:
y
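
A transducer is typically used when we need one prediction per sequence position, as in tagging. The sketch below illustrates this under assumptions of our own: the linear layer `tagger` and the choice of four tags are hypothetical and not part of this notebook.

```python
import torch
from torch import nn

torch.manual_seed(42)
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)

def transduce(model, x):
    # Return the hidden states for all sequence positions
    output, (h_n, c_n) = model(x)
    return output

# Hypothetical tagging head: 2-dimensional hidden state -> 4 tag scores.
# nn.Linear is applied to the last dimension, i.e. separately at each position.
tagger = nn.Linear(2, 4)

tag_scores = tagger(transduce(model, x))
print(tag_scores.shape)  # torch.Size([1, 3, 4])
```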

## Manual unrolling

Recall from Lecture&nbsp;2.3 that an RNN implements a recursive computation on sequences: Starting from an initial hidden state $h_0$, at each sequence position&nbsp;$i$, it consumes the previous hidden state $h_{i-1}$ and the current input $x_i$ to compute an output $y_i$ and a next hidden state $h_i$. We say that the RNN is ‘unrolled’ over a sequence of inputs.
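
Written as a formula, this recursion over a sequence of length $n$ reads as follows, where the function name $f$ is our own notation:

$$
(y_i, h_i) = f(h_{i-1}, x_i), \qquad i = 1, \dots, n
$$

For a single-layer LSTM as implemented by `nn.LSTM`, the state carried from one position to the next also includes the cell state $c_i$, and the output at each position is the hidden state itself:

$$
(h_i, c_i) = \mathrm{LSTM}\bigl(x_i, (h_{i-1}, c_{i-1})\bigr), \qquad y_i = h_i
$$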

In both the encoder and the transducer, the unrolling happened ‘behind the scenes’ when calling the `forward()` method. In some use cases, however, we may want to have more control and do the unrolling manually. (One example is the Encoder–Decoder architecture that you will learn about in Unit&nbsp;5.)

The code in the next cell implements a function `unroll()` that computes the unrolling step-by-step, and at each position&nbsp;$i$ yields the next output $y_i$.

In [None]:
def unroll(model, h_0, c_0, x):
    # Maintain the previous hidden state and cell state
    h, c = h_0, c_0

    # Loop over all positions in the sequence
    for i in range(x.shape[1]):
        # Get the one-element sub-sequence of x for the current position i
        x_i = x[:, i:i+1, :]

        # Do one step of the unrolling
        output, (h, c) = model(x_i, (h, c))

        # Yield the current output
        yield output

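When calling the `unroll()` function, we need to specify an initial hidden state and an initial cell state. If no initial states are passed to an `nn.LSTM`, it uses tensors of zeros, so we create such tensors explicitly here. For our single-layer model, their shape is (1, *batch_size*, *hidden_size*).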

In [None]:
h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)
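
As a quick sanity check, the following sketch compares a call without initial states to a call with explicit zero states. The names `out_default` and `out_explicit` are ours. We expect the two outputs to agree.

```python
import torch
from torch import nn

torch.manual_seed(42)
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)

# No initial states given: the LSTM falls back to zeros
out_default, _ = model(x)

# Explicit zero states of shape (num_layers, batch_size, hidden_size)
h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)
out_explicit, _ = model(x, (h_0, c_0))

print(torch.allclose(out_default, out_explicit))
```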

We can now verify that the manual unrolling produces the same output as the automatic unrolling that we used earlier:

In [None]:
for output in unroll(model, h_0, c_0, x):
    print(output)
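
To compare the two results programmatically rather than by eye, we can concatenate the step-wise outputs along the sequence dimension and check them against a single call of the model. This is a self-contained sketch; the names `manual` and `automatic` are ours, and the small tolerance `atol=1e-6` is our choice to allow for floating-point rounding.

```python
import torch
from torch import nn

torch.manual_seed(42)
x = torch.rand(1, 3, 5)
model = nn.LSTM(5, 2, batch_first=True)

def unroll(model, h_0, c_0, x):
    # Feed the sequence to the model one position at a time
    h, c = h_0, c_0
    for i in range(x.shape[1]):
        x_i = x[:, i:i+1, :]
        output, (h, c) = model(x_i, (h, c))
        yield output

h_0, c_0 = torch.zeros(1, 1, 2), torch.zeros(1, 1, 2)

# Each step yields a tensor of shape (1, 1, 2); join them along the sequence axis
manual = torch.cat(list(unroll(model, h_0, c_0, x)), dim=1)

# Automatic unrolling over the whole sequence
automatic, _ = model(x)

print(manual.shape)  # torch.Size([1, 3, 2])
print(torch.allclose(manual, automatic, atol=1e-6))
```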

That’s all, folks!