In [2]:
import torch
from torch.nn import Embedding, LSTM
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

Let's create a dummy batch of sequences; you can assume the numbers are the indexes of our words.
The goal of the whole process is to turn variable-length sequences into a form that can be fed into an RNN (LSTM, GRU, etc.) layer.
Here are the steps:

1. Create variable-length sequences.
2. Pad the sequences so that they all have the same length.
3. Create an embedding for them.
4. Pack the embeddings (to speed up the RNN calculations).
5. Feed the (now packed) embeddings to the LSTM to get the outputs.

To achieve this, we are going to use two utility functions from PyTorch:

- **pad_sequence** (simply adds zeros to the sequences so that they all have the same size)
- **pack_padded_sequence** (not strictly required, but it lets us use the GPU more efficiently and speeds up the RNN calculations)

So the first function pads the sequences (adds zeros to them), and the second function packs the previously padded sequences.

In [3]:
sequences = [   
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9,10]
]
sequences

[[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

Before starting the padding, though, we first need to store the length of each sequence.
We need these lengths so that later on we know exactly how to pack the sequences and drop the extra zeros in each one.
This way, we don't have to do additional calculations on useless zeros (pad values), which speeds up our RNN calculations.
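As a rough illustration of the saving, the sketch below counts how many time steps an RNN would process with and without packing for the three sequences above. The variable names are only illustrative, and the count ignores every cost except the number of time steps.

```python
# Lengths of the three example sequences.
lengths = [3, 2, 5]

# Padded batch: every sequence is processed up to the longest length.
padded_steps = len(lengths) * max(lengths)

# Packed batch: only the real (non-pad) time steps are processed.
packed_steps = sum(lengths)

print(padded_steps, packed_steps)  # 15 10
```

For this tiny batch, a third of the padded time steps would be spent on pad values.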

In [4]:
sequence_lengths=torch.LongTensor([len(sequence) for sequence in sequences])

In [13]:
# embedding_dim = 6
# dictionary_size = 11 (indexes 0 to 10; 0 is also the value used for padding)

embedding = Embedding(num_embeddings=11, embedding_dim=6)
lstm = LSTM(input_size=6, hidden_size=2, batch_first=True) 

In [10]:
#Padding
sequences=[torch.LongTensor(sequence) for sequence in sequences]
sequences_padded = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
sequences_padded

tensor([[ 1,  2,  3,  0,  0],
        [ 4,  5,  0,  0,  0],
        [ 6,  7,  8,  9, 10]])
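To make the padding step concrete, here is a minimal pure-Python sketch of the same idea on plain lists. The helper `pad_to_longest` is hypothetical and written only for illustration; the real `pad_sequence` works on tensors and likewise accepts a `padding_value` argument.

```python
def pad_to_longest(seqs, padding_value=0):
    """Right-pad every list in seqs with padding_value up to the longest length."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [padding_value] * (max_len - len(s)) for s in seqs]

padded = pad_to_longest([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]])
print(padded)
# [[1, 2, 3, 0, 0], [4, 5, 0, 0, 0], [6, 7, 8, 9, 10]]
```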

In [11]:
#Embedding
sequences_embeded=embedding(sequences_padded)
sequences_embeded

tensor([[[-2.1085,  0.4195, -0.1613, -1.6155, -0.6733, -0.0546],
         [ 0.9498,  0.5771, -0.5823,  1.5888, -0.3406, -0.1107],
         [ 1.6079,  0.1302, -0.8248,  2.2530, -0.5772,  0.5517],
         [ 1.3814, -1.1445, -0.4823, -2.0504,  1.1077, -2.1600],
         [ 1.3814, -1.1445, -0.4823, -2.0504,  1.1077, -2.1600]],

        [[ 0.7458, -1.3027,  0.7759, -0.5413,  1.6839,  0.2756],
         [ 1.9954,  0.1202, -0.6038,  1.1295, -0.1091,  1.0352],
         [ 1.3814, -1.1445, -0.4823, -2.0504,  1.1077, -2.1600],
         [ 1.3814, -1.1445, -0.4823, -2.0504,  1.1077, -2.1600],
         [ 1.3814, -1.1445, -0.4823, -2.0504,  1.1077, -2.1600]],

        [[ 1.6938, -1.3024,  0.5783,  0.0236, -0.6317, -0.6636],
         [-0.0705,  1.1430, -1.3201,  1.0098, -0.5735, -0.3864],
         [-0.4300, -0.7471,  0.0043, -0.6478, -3.1524,  1.5213],
         [ 0.5949,  1.6353, -0.1353,  1.3254,  0.7167, -0.6114],
         [ 0.1142, -0.4261,  0.2744, -0.8125,  1.1292,  0.0725]]],
       grad_fn=<Emb
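Notice that every padded position (index 0) is mapped to the same non-zero vector in the output above. If you would rather have the pad index embedded as zeros, `Embedding` accepts a `padding_idx` argument. The sketch below assumes that index 0 is reserved for padding, which this notebook does not state explicitly; the name `embedding_with_pad` is only illustrative.

```python
import torch
from torch.nn import Embedding

# Assumption: index 0 is reserved for padding and never used for a real word.
embedding_with_pad = Embedding(num_embeddings=11, embedding_dim=6, padding_idx=0)

padded_batch = torch.tensor([[1, 2, 3, 0, 0],
                             [4, 5, 0, 0, 0]])
embedded = embedding_with_pad(padded_batch)

# Select the vectors at every position that holds the pad index.
pad_vectors = embedded[padded_batch == 0]
print(pad_vectors.abs().sum().item())  # 0.0
```

Since packing removes the padded positions before they reach the LSTM, this mostly matters when the padded embeddings are used without packing.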

In [14]:
#Packing
sequences_packed = pack_padded_sequence(sequences_embeded, sequence_lengths.cpu().numpy(), batch_first=True,enforce_sorted=False)
sequences_packed

PackedSequence(data=tensor([[ 1.6938, -1.3024,  0.5783,  0.0236, -0.6317, -0.6636],
        [-2.1085,  0.4195, -0.1613, -1.6155, -0.6733, -0.0546],
        [ 0.7458, -1.3027,  0.7759, -0.5413,  1.6839,  0.2756],
        [-0.0705,  1.1430, -1.3201,  1.0098, -0.5735, -0.3864],
        [ 0.9498,  0.5771, -0.5823,  1.5888, -0.3406, -0.1107],
        [ 1.9954,  0.1202, -0.6038,  1.1295, -0.1091,  1.0352],
        [-0.4300, -0.7471,  0.0043, -0.6478, -3.1524,  1.5213],
        [ 1.6079,  0.1302, -0.8248,  2.2530, -0.5772,  0.5517],
        [ 0.5949,  1.6353, -0.1353,  1.3254,  0.7167, -0.6114],
        [ 0.1142, -0.4261,  0.2744, -0.8125,  1.1292,  0.0725]],
       grad_fn=<PackPaddedSequenceBackward>), batch_sizes=tensor([3, 3, 2, 1, 1]), sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
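The extra fields in the output above can be reproduced by hand. The pure-Python sketch below recomputes them from the sequence lengths; `packing_layout` is a hypothetical helper written for illustration, not a PyTorch function.

```python
def packing_layout(lengths):
    # Order the sequences by length, longest first.
    sorted_indices = sorted(range(len(lengths)), key=lambda i: -lengths[i])

    # Inverse permutation: where each original sequence ended up after sorting.
    unsorted_indices = [0] * len(lengths)
    for position, original_index in enumerate(sorted_indices):
        unsorted_indices[original_index] = position

    # At time step t, only the sequences longer than t are still active.
    batch_sizes = [sum(1 for n in lengths if n > t) for t in range(max(lengths))]
    return batch_sizes, sorted_indices, unsorted_indices

batch_sizes, sorted_indices, unsorted_indices = packing_layout([3, 2, 5])
print(batch_sizes)       # [3, 3, 2, 1, 1]
print(sorted_indices)    # [2, 0, 1]
print(unsorted_indices)  # [1, 2, 0]
```

This also explains the row order of `data`: the first 3 rows are the first time step of sequences 2, 0 and 1, the next 3 rows are their second time step, and so on, with shorter sequences dropping out as they end.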

In [15]:
#LSTM
outputput_packed, (hidden,context)=lstm(sequences_packed)
outputput_packed

PackedSequence(data=tensor([[ 0.0749, -0.1175],
        [ 0.0207,  0.1377],
        [ 0.1525,  0.0380],
        [-0.0025, -0.1506],
        [-0.0076, -0.0971],
        [ 0.1263, -0.0889],
        [ 0.0208, -0.5788],
        [-0.0066, -0.1679],
        [-0.0239, -0.0715],
        [ 0.0845, -0.0212]], grad_fn=<CatBackward>), batch_sizes=tensor([3, 3, 2, 1, 1]), sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))

The LSTM output, *outputput_packed*, is a PackedSequence, a named tuple that carries some additional information that we might not care about. The actual output values that we want are in its *data* field, since we might need to use them as an input to our LSTM in the next iteration.
We can access the data (the actual output values of the LSTM) in two ways:

**1. Easy way**

In [16]:
outputput_packed.data

tensor([[ 0.0749, -0.1175],
        [ 0.0207,  0.1377],
        [ 0.1525,  0.0380],
        [-0.0025, -0.1506],
        [-0.0076, -0.0971],
        [ 0.1263, -0.0889],
        [ 0.0208, -0.5788],
        [-0.0066, -0.1679],
        [-0.0239, -0.0715],
        [ 0.0845, -0.0212]], grad_fn=<CatBackward>)

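We might also need the *hidden state* of the last layer of the LSTM (the context vector), possibly for a decoder. Note that the variable named *context* in the code above actually holds the LSTM's cell state, not this context vector.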

In [17]:
hidden[-1:]

tensor([[[-0.0066, -0.1679],
         [ 0.1263, -0.0889],
         [ 0.0845, -0.0212]]], grad_fn=<SliceBackward>)
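Each row of this hidden state is the LSTM output at the last real time step of the corresponding sequence (compare it with rows of the packed data above). The sketch below checks this on a random stand-in batch; the tensor `inputs` and the seed are assumptions made only so that the example is self-contained.

```python
import torch
from torch.nn import LSTM
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Stand-in for the embedded batch: 3 sequences padded to length 5, 6 features each.
lengths = torch.tensor([3, 2, 5])
inputs = torch.randn(3, 5, 6)
lstm = LSTM(input_size=6, hidden_size=2, batch_first=True)

packed = pack_padded_sequence(inputs, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)
padded_out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# For each sequence, pick the output at its last real (non-pad) time step.
batch_index = torch.arange(padded_out.size(0))
last_outputs = padded_out[batch_index, out_lengths - 1]

print(torch.allclose(last_outputs, h_n[-1]))  # True
```

Without packing, the final hidden state of the shorter sequences would instead be computed after running over their pad positions.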

**2. Slightly more involved way**

*pad_packed_sequence* might seem a bit confusing at first, but its role is actually very simple. Whenever we pack something, we need to be able to unpack it again (think of zip and unzip). This function does exactly that: it unpacks a sequence that has already been packed. We just pass our packed sequence to the function and get its padded (unpacked) version back. It also returns the original length of each sequence for our convenience.

In [18]:
output, input_sequence_sizes = pad_packed_sequence(outputput_packed, batch_first=True)
output

tensor([[[ 0.0207,  0.1377],
         [-0.0076, -0.0971],
         [-0.0066, -0.1679],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[ 0.1525,  0.0380],
         [ 0.1263, -0.0889],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[ 0.0749, -0.1175],
         [-0.0025, -0.1506],
         [ 0.0208, -0.5788],
         [-0.0239, -0.0715],
         [ 0.0845, -0.0212]]], grad_fn=<IndexSelectBackward>)
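The zip and unzip analogy can be checked directly. The following sketch packs a small dummy tensor and unpacks it again; the tensor `x` and the `mask` are made up for illustration. The round trip returns the original lengths and the original values, with zeros at every padded position.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Dummy padded batch: 3 sequences, 5 time steps, 1 feature, values 1..15.
x = torch.arange(1, 16, dtype=torch.float32).view(3, 5, 1)
lengths = torch.tensor([3, 2, 5])

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)

# True at real time steps, False at padded ones.
mask = torch.arange(x.size(1)).unsqueeze(0) < lengths.unsqueeze(1)

print(out_lengths.tolist())                            # [3, 2, 5]
print(torch.equal(unpacked, x * mask.unsqueeze(-1)))   # True
```

The same `mask` can be reused later, for example to exclude padded positions when averaging the outputs or computing a loss.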