In [97]:
import torch
from torch.nn import Embedding, LSTM
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

In this notebook we explain how to prepare a batch of variable-length sequences so that it can be fed into an RNN (LSTM, GRU, etc.) layer.

Here are the steps:

1. Create a batch of variable-length sequences.
2. Pad the sequences so that they all have the same length.
3. Create an embedding for them.
4. Pack the embeddings (to speed up the RNN calculations).
5. Feed the (now packed) embeddings to the LSTM to get the outputs.

To achieve the goal, we are going to use two utility functions from PyTorch. 

- **pad_sequence** (simply appends zeros to the shorter sequences so that they all have the same length)
- **pack_padded_sequence** (not strictly required, but it lets the RNN skip the padded positions, which uses the GPU more efficiently and speeds up the calculations)

In simple terms, the first function pads the sequences (adds zeros), and the second one packs the previously padded sequences.
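As a minimal sketch of these two functions on their own, the example below pads and then packs two toy sequences. The names `toy`, `toy_lengths`, `toy_padded` and `toy_packed` are illustrative only and are not used elsewhere in this notebook.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Two toy sequences of different lengths (values are arbitrary).
toy = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
toy_lengths = torch.tensor([len(t) for t in toy])

# Step 1: pad with zeros up to the longest length.
toy_padded = pad_sequence(toy, batch_first=True)

# Step 2: pack, which drops the padded positions again.
toy_packed = pack_padded_sequence(
    toy_padded, toy_lengths, batch_first=True, enforce_sorted=False
)

print(toy_padded.shape)       # torch.Size([2, 3]): batch x max_len
print(toy_packed.data.shape)  # torch.Size([5]): one entry per real token
```

The padded tensor has 6 entries, while the packed data keeps only the 5 real ones.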

Let's first create our plain embedding and LSTM layers:

In [95]:
embedding = Embedding(num_embeddings=11, embedding_dim=6)
lstm = LSTM(input_size=6, hidden_size=2, batch_first=True) 

Now let's create some dummy sequences. You can assume these are word indices from a text.

In [112]:
sequences = [   
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9, 10]
]

Before padding, we first need to store the length of each sequence.
We need these lengths later to pack the batch and drop the extra zeros from each sequence.
This way the RNN does no work on the useless zeros (pad values), which speeds up the calculations.

In [113]:
# Convert each sequence to a tensor of indices
sequences = [torch.LongTensor(sequence) for sequence in sequences]

# Store the true (unpadded) length of each sequence for packing later
sequence_lengths = torch.LongTensor([len(sequence) for sequence in sequences])

In [101]:
# Padding: append zeros up to the longest sequence in the batch
sequences_padded = pad_sequence(sequences, batch_first=True)
sequences_padded

tensor([[ 1,  2,  3,  0,  0],
        [ 4,  5,  0,  0,  0],
        [ 6,  7,  8,  9, 10]])

In [103]:
# Embedding: look up a vector for every index (the pad index 0 gets one too)
sequences_embedded = embedding(sequences_padded)
sequences_embedded

tensor([[[ 0.0156, -1.5339, -1.2978,  0.1378,  1.8157, -1.3471],
         [ 0.0477, -0.4786,  0.3953, -0.2040,  0.0090,  0.2621],
         [-0.1940, -0.7023, -0.0594, -1.4842, -0.2198,  2.0790],
         [-0.5867,  0.1666, -0.3996,  0.1341, -0.1119,  0.7339],
         [-0.5867,  0.1666, -0.3996,  0.1341, -0.1119,  0.7339]],

        [[ 0.1332,  0.8129,  0.6422, -1.3937,  0.4574,  1.1438],
         [-1.0561, -0.0487, -1.0327,  1.3329, -0.3732,  0.2648],
         [-0.5867,  0.1666, -0.3996,  0.1341, -0.1119,  0.7339],
         [-0.5867,  0.1666, -0.3996,  0.1341, -0.1119,  0.7339],
         [-0.5867,  0.1666, -0.3996,  0.1341, -0.1119,  0.7339]],

        [[-0.0696,  0.6170, -1.0367,  1.3480, -0.2523,  1.3595],
         [ 0.1045, -0.9941,  0.3233,  0.6303, -0.1361,  0.7212],
         [ 1.0210,  0.4265,  1.2371, -0.1587, -0.6275, -0.1299],
         [-0.6500, -2.0193,  0.2288,  0.5275, -1.2682, -0.1638],
         [ 0.1543, -0.0838,  1.6952,  1.5009, -0.0633,  1.0448]]],
       grad_fn=<Emb

In [114]:
# Packing: enforce_sorted=False because the batch is not sorted by length
sequences_packed = pack_padded_sequence(
    sequences_embedded, sequence_lengths, batch_first=True, enforce_sorted=False
)
sequences_packed

PackedSequence(data=tensor([[-0.0696,  0.6170, -1.0367,  1.3480, -0.2523,  1.3595],
        [ 0.0156, -1.5339, -1.2978,  0.1378,  1.8157, -1.3471],
        [ 0.1332,  0.8129,  0.6422, -1.3937,  0.4574,  1.1438],
        [ 0.1045, -0.9941,  0.3233,  0.6303, -0.1361,  0.7212],
        [ 0.0477, -0.4786,  0.3953, -0.2040,  0.0090,  0.2621],
        [-1.0561, -0.0487, -1.0327,  1.3329, -0.3732,  0.2648],
        [ 1.0210,  0.4265,  1.2371, -0.1587, -0.6275, -0.1299],
        [-0.1940, -0.7023, -0.0594, -1.4842, -0.2198,  2.0790],
        [-0.6500, -2.0193,  0.2288,  0.5275, -1.2682, -0.1638],
        [ 0.1543, -0.0838,  1.6952,  1.5009, -0.0633,  1.0448]],
       grad_fn=<PackPaddedSequenceBackward>), batch_sizes=tensor([3, 3, 2, 1, 1]), sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
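The `batch_sizes` and `sorted_indices` fields in the output above can be reproduced by hand. The sketch below is plain Python and only mirrors the bookkeeping, assuming the sequences are ordered longest first and then read one time step at a time. The variable `lengths` is an illustrative copy of the sequence lengths used in this notebook.

```python
lengths = [3, 2, 5]  # lengths of the three sequences in this notebook

# Order the sequence indices by length, longest first.
sorted_indices = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

# At time step t, only the sequences longer than t still have a real token.
batch_sizes = [sum(1 for n in lengths if n > t) for t in range(max(lengths))]

print(sorted_indices)  # [2, 0, 1]
print(batch_sizes)     # [3, 3, 2, 1, 1]
```

The batch sizes sum to 10, which is the number of rows in the packed `data` tensor.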

In [104]:
# LSTM: returns the packed outputs plus the final hidden and cell states
output_packed, (hidden, cell) = lstm(sequences_packed)
output_packed

PackedSequence(data=tensor([[ 0.2482, -0.0007],
        [ 0.0330,  0.0671],
        [-0.5475, -0.1302],
        [ 0.0246, -0.4361],
        [ 0.1176,  0.0187],
        [-0.2820,  0.0623],
        [-0.0620, -0.0509],
        [ 0.1343,  0.0185],
        [ 0.1416, -0.0037],
        [ 0.0361, -0.2860]], grad_fn=<CatBackward>), batch_sizes=tensor([3, 3, 2, 1, 1]), sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))

The LSTM returns a packed output because it received a packed input. *output_packed* is in fact a named tuple (`PackedSequence`) that carries some additional information we might not care about. The actual outputs that we want are stored in compressed (packed) form in its *data* field. How should we uncompress it and get our actual output values?


**Unpacking**

*pad_packed_sequence* might seem a bit confusing at first, but its role is actually very simple. Whenever we pack something, we need to be able to unpack it again, right? (Think of zip and unzip.) This function just unpacks a sequence (which, of course, must already have been packed) and pads it back to a regular tensor.
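To make the zip/unzip analogy concrete, here is a small round-trip sketch: pad, pack, then unpack again, and check that we get the padded tensor back. The names `seqs`, `lens`, `padded`, `packed`, `restored` and `restored_lens` are illustrative and separate from the variables used in the rest of this notebook.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

seqs = [
    torch.tensor([1.0, 2.0, 3.0]),
    torch.tensor([4.0, 5.0]),
    torch.tensor([6.0, 7.0, 8.0, 9.0, 10.0]),
]
lens = torch.tensor([len(s) for s in seqs])

padded = pad_sequence(seqs, batch_first=True)
packed = pack_padded_sequence(padded, lens, batch_first=True, enforce_sorted=False)

# Unpack: returns the re-padded tensor and the original lengths.
restored, restored_lens = pad_packed_sequence(packed, batch_first=True)

print(torch.equal(restored, padded))  # True
print(restored_lens)                  # tensor([3, 2, 5])
```

The lengths come back in the original batch order, even though packing sorted the sequences internally.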

In [81]:
unpacked_out, input_sequence_sizes = pad_packed_sequence(output_packed, batch_first=True)
unpacked_out

tensor([[[ 0.1460, -0.0190],
         [ 0.0310,  0.1895],
         [-0.0563,  0.1111],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[ 0.1946, -0.0966],
         [ 0.3288,  0.0354],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[ 0.2749, -0.2889],
         [ 0.3955, -0.2364],
         [ 0.4496, -0.3379],
         [ 0.7398, -0.1794],
         [ 0.2583, -0.4079]]], grad_fn=<IndexSelectBackward>)

If we look closely at this unpacked output and compare it to the packed version above, we can see that the packed version has no results for the padded rows (i.e. the all-zero rows; remember that we used 0 for padding): it holds 10 rows, one per real token, instead of 15. This means that we have not done any unnecessary computations in the LSTM. That's the benefit of using a packed input for our recurrent layer.
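The saving can be checked by counting time steps. The sketch below uses a random tensor as a stand-in for the embeddings and a freshly created LSTM, so `demo_input`, `demo_lengths` and `demo_lstm` are assumptions made for illustration, not the objects defined earlier.

```python
import torch
from torch.nn import LSTM
from torch.nn.utils.rnn import pack_padded_sequence

demo_lengths = torch.tensor([3, 2, 5])
demo_input = torch.randn(3, 5, 6)  # (batch, max_len, features), stand-in for embeddings
demo_lstm = LSTM(input_size=6, hidden_size=2, batch_first=True)

demo_packed = pack_padded_sequence(
    demo_input, demo_lengths, batch_first=True, enforce_sorted=False
)
demo_out, _ = demo_lstm(demo_packed)

padded_steps = demo_input.shape[0] * demo_input.shape[1]  # steps if fed unpacked
packed_steps = demo_out.data.shape[0]                     # steps actually computed

print(padded_steps, packed_steps)  # 15 10
```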

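We can treat this output as a typical LSTM output and use it for any related calculations (e.g. encoder-decoder). Additionally, if we need the final *hidden state* of the last LSTM layer (often used as the context vector), we can get it as shown next. Each row is the output at the last real (non-padded) time step of the corresponding sequence, in the original batch order: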

In [116]:
hidden[-1]

tensor([[ 0.1343,  0.0185],
        [-0.2820,  0.0623],
        [ 0.0361, -0.2860]], grad_fn=<SelectBackward>)
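One way to convince ourselves of what `hidden[-1]` holds is to pick the last real time step of each sequence out of the unpacked output and compare. This is a hedged sketch on a random stand-in input with a fresh LSTM; `demo_input`, `demo_lengths`, `demo_lstm`, `h_n` and `last_steps` are illustrative names, not part of the cells above.

```python
import torch
from torch.nn import LSTM
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
demo_lengths = torch.tensor([3, 2, 5])
demo_input = torch.randn(3, 5, 6)  # stand-in for the embedded batch
demo_lstm = LSTM(input_size=6, hidden_size=2, batch_first=True)

packed = pack_padded_sequence(
    demo_input, demo_lengths, batch_first=True, enforce_sorted=False
)
out_packed, (h_n, c_n) = demo_lstm(packed)
out, _ = pad_packed_sequence(out_packed, batch_first=True)

# For every sequence, select the output at its last real time step.
last_steps = out[torch.arange(out.size(0)), demo_lengths - 1]

print(torch.allclose(last_steps, h_n[-1]))  # True
```

Indexing `out[:, -1]` instead would return zeros for the shorter sequences, because those positions are padding.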