In [1]:
import torch
from torch.nn import Embedding, LSTM
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

Let's create a dummy batch of sequences. You can think of the numbers as the indices of our words in a vocabulary.
The objective of the whole process is to turn variable-length sequences into a form that can be fed into an RNN (LSTM, GRU, etc.) layer.
Here are the steps:

1. Create a variable-length sequence.
2. Pad the sequences so that they all have the same length.
3. Create an embedding for them.
4. Pack the embeddings (to speed up the RNN calculations).
5. Feed the (now packed) embeddings to the LSTM to get the outputs.

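To achieve the goal, we are going to use three utility functions from PyTorch (all in *torch.nn.utils.rnn*).

- **pad_sequence** (adds zeros to the sequences so that they all have the same length)
- **pack_padded_sequence** (not strictly required, but it lets the RNN skip the padded positions, which uses the GPU more efficiently and speeds up the RNN calculations)
- **pad_packed_sequence** (the inverse of packing: turns the packed RNN output back into a padded tensor)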

In [2]:
sequences = [   
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9,10]
]
sequences

[[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

First, we need to store the length of each sequence before doing any padding.
We need these lengths later on so that we know exactly how to pack the batch and get rid of the extra zeros in each sequence.
This way the RNN does no work on the useless zeros (pad values), which speeds up the calculations.
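
To make the saving concrete, here is a small plain-Python sketch (no PyTorch needed) that counts how many time steps are processed with and without packing for the three example lengths. The names `batch_sizes`, `padded_steps` and `real_steps` are only illustrative; `batch_sizes` mirrors the field of the same name that shows up in the packed output later on.

```python
# Lengths of the three example sequences.
lengths = [3, 2, 5]
max_len = max(lengths)

# At time step t, only sequences longer than t still have a real token.
batch_sizes = [sum(1 for n in lengths if n > t) for t in range(max_len)]

padded_steps = len(lengths) * max_len  # steps processed on the padded batch
real_steps = sum(lengths)              # steps processed on the packed batch

print(batch_sizes)               # [3, 3, 2, 1, 1]
print(padded_steps, real_steps)  # 15 10
```

For this toy batch, a third of the padded positions carry no information.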

In [12]:
sequence_lengths = torch.LongTensor([len(sequence) for sequence in sequences])

In [10]:
embedding = Embedding(num_embeddings=11, embedding_dim=4)
lstm = LSTM(input_size=4, hidden_size=2, batch_first=True) 

In [13]:
# Padding (pad_sequence fills the shorter sequences with zeros by default)
sequences = [torch.LongTensor(sequence) for sequence in sequences]
sequences_padded = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
sequences_padded

tensor([[ 1,  2,  3,  0,  0],
        [ 4,  5,  0,  0,  0],
        [ 6,  7,  8,  9, 10]])

In [14]:
# Embedding (each index is replaced by its 4-dimensional vector)
sequences_embeded = embedding(sequences_padded)
sequences_embeded

tensor([[[ 0.7392, -0.5116,  1.6044,  0.2385],
         [ 0.0548, -0.3227,  1.1412,  1.2520],
         [ 0.1933,  0.1486, -0.5080, -1.4867],
         [ 0.6410, -0.8795,  0.3313,  0.6747],
         [ 0.6410, -0.8795,  0.3313,  0.6747]],

        [[ 0.7591,  2.0299, -0.5631, -0.4719],
         [ 0.3002, -0.5866, -0.5725,  0.0149],
         [ 0.6410, -0.8795,  0.3313,  0.6747],
         [ 0.6410, -0.8795,  0.3313,  0.6747],
         [ 0.6410, -0.8795,  0.3313,  0.6747]],

        [[ 0.0741,  2.9282,  0.9944, -1.6001],
         [ 0.8746,  0.2755,  0.2825,  1.8457],
         [ 0.5073, -1.2891,  1.3445, -0.0612],
         [ 0.1452, -1.5582, -0.1543, -1.0414],
         [-1.4174, -0.7062, -0.4610, -1.4128]]], grad_fn=<EmbeddingBackward>)
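
Notice that every padded position above received the same non-zero vector: the learned embedding of index 0. If index 0 is reserved for padding (an assumption; the original embedding layer does not do this), `Embedding` accepts a `padding_idx` argument that keeps that row at zero. A minimal sketch, with `embedding_with_pad` as an illustrative name:

```python
import torch
from torch.nn import Embedding

# Assumption: index 0 is reserved for padding and never used for a real word.
embedding_with_pad = Embedding(num_embeddings=11, embedding_dim=4, padding_idx=0)

padded = torch.LongTensor([[1, 2, 3, 0, 0],
                           [4, 5, 0, 0, 0]])
embedded = embedding_with_pad(padded)

# Rows for the pad index are all zeros; rows for real tokens are randomly initialised.
print(embedded[0, 3:])
```

When the batch is packed afterwards, the padded positions are dropped anyway, so this mainly matters if the padded tensor is also used somewhere without packing.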

In [15]:
# Packing (lengths must live on the CPU; enforce_sorted=False lets PyTorch sort the batch by length internally)
sequences_packed = pack_padded_sequence(sequences_embeded, sequence_lengths.cpu(), batch_first=True, enforce_sorted=False)
sequences_packed

PackedSequence(data=tensor([[ 0.0741,  2.9282,  0.9944, -1.6001],
        [ 0.7392, -0.5116,  1.6044,  0.2385],
        [ 0.7591,  2.0299, -0.5631, -0.4719],
        [ 0.8746,  0.2755,  0.2825,  1.8457],
        [ 0.0548, -0.3227,  1.1412,  1.2520],
        [ 0.3002, -0.5866, -0.5725,  0.0149],
        [ 0.5073, -1.2891,  1.3445, -0.0612],
        [ 0.1933,  0.1486, -0.5080, -1.4867],
        [ 0.1452, -1.5582, -0.1543, -1.0414],
        [-1.4174, -0.7062, -0.4610, -1.4128]],
       grad_fn=<PackPaddedSequenceBackward>), batch_sizes=tensor([3, 3, 2, 1, 1]), sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
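
The ordering of the rows in *data* is easier to see with recognisable numbers instead of random embeddings. The sketch below packs the padded indices themselves (as floats with a feature dimension of size 1, which is an illustrative stand-in for the embeddings) and then unpacks them again:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Same layout as the padded batch above; the last dimension plays the role of the embedding.
padded = torch.tensor([[1., 2., 3., 0., 0.],
                       [4., 5., 0., 0., 0.],
                       [6., 7., 8., 9., 10.]]).unsqueeze(-1)
lengths = torch.tensor([3, 2, 5])

packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

# Rows are stored time step by time step, longest sequence first.
print(packed.data.squeeze(-1).tolist())  # [6.0, 1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 9.0, 10.0]
print(packed.batch_sizes.tolist())       # [3, 3, 2, 1, 1]
print(packed.sorted_indices.tolist())    # [2, 0, 1]

# Unpacking restores the padded layout and the original batch order.
unpacked, unpacked_lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(unpacked, padded))     # True
print(unpacked_lengths.tolist())         # [3, 2, 5]
```

The first three rows are the first tokens of the three sequences (6, 1, 4), the next three are the second tokens, and so on; *batch_sizes* says how many sequences are still active at each time step.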

In [16]:
# LSTM (returns the packed output plus the final hidden state and cell state)
outputput_packed, (hidden, cell) = lstm(sequences_packed)
outputput_packed

PackedSequence(data=tensor([[ 0.0060, -0.0090],
        [ 0.0550,  0.0006],
        [ 0.0413, -0.0685],
        [ 0.2042, -0.0397],
        [ 0.1922, -0.0140],
        [ 0.2770, -0.0185],
        [ 0.2018, -0.0015],
        [ 0.0879,  0.0264],
        [ 0.1439,  0.0429],
        [ 0.1171,  0.1591]], grad_fn=<CatBackward>), batch_sizes=tensor([3, 3, 2, 1, 1]), sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))

The LSTM output, *outputput_packed*, is a *PackedSequence*: a named tuple that carries some additional information (*batch_sizes*, *sorted_indices*, *unsorted_indices*) that we might not care about. The actual output values are in the *data* field, stored time step by time step in the packed order rather than one row per sequence. We might need them, for example, as the input to our LSTM in the next iteration.
We can access the data (the actual output values of the LSTM) in two ways:

**1. Easy way**

In [None]:
outputput_packed.data

We might also need the *hidden state* of the last layer of the LSTM (the context vector), possibly to pass to your decoder.

In [None]:
hidden[-1:]

**2. Slightly more involved way**

In [None]:
output, input_sequence_sizes = pad_packed_sequence(outputput_packed, batch_first=True)
output
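
The unpacked output and the final hidden state are related: for a single-layer, unidirectional LSTM fed with a packed batch, the hidden state of each sequence should equal the output at that sequence's last real time step, and the positions past the end of a sequence are filled with zeros. The sketch below checks this on random inputs; the seed, the random `inputs` tensor and the names `h_n`, `c_n` and `last_outputs` are illustrative and not part of the cells above.

```python
import torch
from torch.nn import LSTM
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = LSTM(input_size=4, hidden_size=2, batch_first=True)

lengths = torch.tensor([3, 2, 5])
inputs = torch.randn(3, 5, 4)  # random stand-in for the padded embeddings

packed = pack_padded_sequence(inputs, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Pick, for each sequence, the output at its last real time step.
last_step = out_lengths - 1
last_outputs = output[torch.arange(output.size(0)), last_step]

print(torch.allclose(last_outputs, h_n[-1]))  # True
print(output[1, 2:])  # positions past the end of the second sequence are zeros
```

This is why the hidden state is a convenient summary of each sequence: it already ignores the padding, with no manual indexing by length.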