# PyTorch PackedSequence Tutorial
---

## Contents

1. [Preprocessing](#1.-Preprocessing)
2. [How to use PackedSequence object in pytorch](#2.-How-to-use-PackedSequence-object-in-pytorch)

---

![fig1](./figs/0705img1.png)

figure from: https://medium.com/huggingface/understanding-emotions-from-keras-to-pytorch-3ccb61d5a983 

## 1. Preprocessing

You almost always need the following preprocessing steps when working on NLP tasks:

* Build a vocabulary in which each token maps to a single unique index.
* Add a `<pad>` token.
* Convert every token to its vocabulary index.

In [1]:
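import torch
import torch.nn as nn

batch_data = ["I love Mom ' s cooking", "I love you too !", "No way", "This is the shit", "Yes"]
# Tokenize each sentence on whitespace.
input_seq = [s.split() for s in batch_data]
# Length of the longest sentence in the batch.
max_len = max(len(s) for s in input_seq)
# Map each unique token to an index starting at 1; index 0 is reserved for <pad>.
# Set ordering is arbitrary, so the exact indices can differ between runs.
vocab = {w: i for i, w in enumerate(set(t for s in input_seq for t in s), 1)}
vocab["<pad>"] = 0
# Right-pad every sentence to max_len, then convert tokens to indices.
input_seq = [s + ["<pad>"] * (max_len - len(s)) for s in input_seq]
input_seq2idx = torch.LongTensor([[vocab[t] for t in s] for s in input_seq])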
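
As an alternative to padding the token lists by hand, the three steps above can also be sketched with `torch.nn.utils.rnn.pad_sequence`. This is only a sketch: building the vocabulary in insertion order (so the indices are reproducible) and the variable names `seqs`, `padded`, and `lengths` are assumptions made for this example, not part of the original notebook.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

batch_data = ["I love Mom ' s cooking", "I love you too !", "No way", "This is the shit", "Yes"]
tokens = [s.split() for s in batch_data]

# Assumption: build the vocabulary in first-seen order so indices are reproducible.
vocab = {"<pad>": 0}
for sent in tokens:
    for tok in sent:
        vocab.setdefault(tok, len(vocab))

# One 1-D LongTensor of indices per sentence (variable length).
seqs = [torch.tensor([vocab[t] for t in sent], dtype=torch.long) for sent in tokens]

# pad_sequence right-pads every sequence to the longest one in the batch.
padded = pad_sequence(seqs, batch_first=True, padding_value=vocab["<pad>"])
lengths = torch.tensor([len(s) for s in seqs])

print(padded.shape)  # (batch, max_len)
print(lengths)
```

Keeping `lengths` around at this point avoids having to recover the sentence lengths from the padded tensor later.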

In [2]:
input_seq

[['I', 'love', 'Mom', "'", 's', 'cooking'],
 ['I', 'love', 'you', 'too', '!', '<pad>'],
 ['No', 'way', '<pad>', '<pad>', '<pad>', '<pad>'],
 ['This', 'is', 'the', 'shit', '<pad>', '<pad>'],
 ['Yes', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>']]

In [3]:
input_seq2idx

tensor([[ 13,   9,   3,  15,   1,   2],
        [ 13,   9,  10,   4,  16,   0],
        [  7,  11,   0,   0,   0,   0],
        [  6,  12,   8,  14,   0,   0],
        [  5,   0,   0,   0,   0,   0]])

---

## 2. How to use PackedSequence object in pytorch

1. [using pack_padded_sequence](#2.1-using-pack_padded_sequence)
2. [usage in RNN](#2.2-usage-in-RNN)
3. [unpack to get output](#2.3-unpack-to-get-output)
4. [last hidden state mapped to output](#2.4-last-hidden-state-mapped-to-output)

### 2.1 using pack_padded_sequence

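Sort the batch in decreasing order of sentence length. In recent PyTorch versions, `pack_padded_sequence` requires this ordering by default (`enforce_sorted=True`).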

![fig2](./figs/0705img2.png)

figure from: https://medium.com/huggingface/understanding-emotions-from-keras-to-pytorch-3ccb61d5a983 

In [4]:
from torch.nn.utils.rnn import pack_padded_sequence

In [5]:
# Sentence length = number of non-<pad> (non-zero) indices in each row.
input_lengths = (input_seq2idx != 0).sum(dim=1)
# Sort the lengths in decreasing order and reorder the batch rows to match.
input_lengths, sorted_idx = input_lengths.sort(0, descending=True)
input_seq2idx = input_seq2idx[sorted_idx]

In [6]:
input_seq2idx

tensor([[ 13,   9,   3,  15,   1,   2],
        [ 13,   9,  10,   4,  16,   0],
        [  6,  12,   8,  14,   0,   0],
        [  7,  11,   0,   0,   0,   0],
        [  5,   0,   0,   0,   0,   0]])

In [7]:
input_lengths  # length of each sentence in the batch

tensor([ 6,  5,  4,  2,  1])
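
If you would rather not sort the batch yourself, recent PyTorch versions accept `enforce_sorted=False`, in which case `pack_padded_sequence` sorts internally and remembers the permutation. The sketch below assumes such a version and hard-codes the unsorted index tensor from the preprocessing step; the names `padded`, `lengths`, and `packed` are chosen only for this example.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# The unsorted, padded batch from the preprocessing step (0 is <pad>).
padded = torch.tensor([[13,  9,  3, 15,  1,  2],
                       [13,  9, 10,  4, 16,  0],
                       [ 7, 11,  0,  0,  0,  0],
                       [ 6, 12,  8, 14,  0,  0],
                       [ 5,  0,  0,  0,  0,  0]])
lengths = (padded != 0).sum(dim=1)  # lengths in the original (unsorted) order

# enforce_sorted=False lets PyTorch sort the batch internally.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

print(packed.batch_sizes)     # number of active sequences at each time step
print(packed.sorted_indices)  # permutation that was applied to the batch
```

The manual sort used in this tutorial produces the same packed data; it just makes the reordering explicit.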

In [8]:
packed_input = pack_padded_sequence(input_seq2idx, input_lengths.tolist(), batch_first=True)

In [9]:
print(type(packed_input))
print(packed_input[0])  # packed data
print(packed_input[1])  # batch_sizes

<class 'torch.nn.utils.rnn.PackedSequence'>
tensor([ 13,  13,   6,   7,   5,   9,   9,  12,  11,   3,  10,   8,
         15,   4,  14,   1,  16,   2])
tensor([ 5,  4,  3,  3,  2,  1])
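
To see where these two tensors come from, here is a rough pure-Python sketch of the packing logic for a batch that is already sorted by decreasing length. The helper `pack_by_hand` is hypothetical and written only for illustration; it is not how PyTorch implements packing internally.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# The sorted, padded batch shown above (0 is <pad>).
padded = torch.tensor([[13,  9,  3, 15,  1,  2],
                       [13,  9, 10,  4, 16,  0],
                       [ 6, 12,  8, 14,  0,  0],
                       [ 7, 11,  0,  0,  0,  0],
                       [ 5,  0,  0,  0,  0,  0]])
lengths = [6, 5, 4, 2, 1]

def pack_by_hand(padded, lengths):
    """Hypothetical helper: walk the batch time step by time step."""
    data, batch_sizes = [], []
    for t in range(max(lengths)):
        # Sequences still "alive" at step t; they are the first rows because
        # the batch is sorted by decreasing length.
        n_active = sum(1 for n in lengths if n > t)
        batch_sizes.append(n_active)
        data.extend(padded[:n_active, t].tolist())
    return data, batch_sizes

manual_data, manual_batch_sizes = pack_by_hand(padded, lengths)
packed = pack_padded_sequence(padded, lengths, batch_first=True)

print(manual_batch_sizes)
print(manual_data == packed.data.tolist())
```

The packed data therefore has `sum(lengths)` entries (18 here), and no `<pad>` token is ever fed to the RNN.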


### 2.2 usage in RNN

The same procedure works for any recurrent layer (RNN, LSTM, or GRU).

We normally use an `Embedding` layer to map each token index to a real-valued vector. During training, the network learns an embedding space suited to the task. If you are not familiar with the `Embedding` layer, see the references below.

* PyTorch documentation: https://pytorch.org/docs/stable/nn.html?highlight=embedding#torch.nn.Embedding
* My blog post (in Korean) with some figures showing how embeddings work: https://simonjisu.github.io/nlp/2018/04/20/allaboutwv2.html
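
As a small illustration of what the `Embedding` layer does with the `<pad>` index, the sketch below looks up one short padded sentence. The sizes (17 tokens, 5 dimensions) mirror this tutorial, while the toy input `ids` is made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# padding_idx=0 keeps the <pad> row at zero and excludes it from gradient updates.
embed = nn.Embedding(num_embeddings=17, embedding_dim=5, padding_idx=0)

ids = torch.tensor([[5, 0, 0]])  # one sentence: a real token followed by two <pad>
vectors = embed(ids)             # shape: (batch, seq_len, embedding_dim)

# Backpropagate a dummy loss to inspect the gradient of the <pad> row.
vectors.sum().backward()

print(vectors.shape)
print(vectors[0, 1])         # the <pad> embedding
print(embed.weight.grad[0])  # gradient of the <pad> row
```

Both the `<pad>` vector and its gradient are all zeros, so padding never contributes to what the embedding learns.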

In [10]:
vocab_size = len(vocab)
hidden_size = 2
embedding_size = 5
num_layers = 3

In [11]:
embed = nn.Embedding(vocab_size, embedding_size, padding_idx=0)
# A vanilla RNN is used here; nn.GRU or nn.LSTM accept the same arguments.
rnn = nn.RNN(input_size=embedding_size, hidden_size=hidden_size, num_layers=num_layers,
             bidirectional=False, batch_first=True)

In [12]:
# Embed the padded indices, pack them, then run the RNN on the packed input.
embedded = embed(input_seq2idx)
packed_input = pack_padded_sequence(embedded, input_lengths.tolist(), batch_first=True)
packed_output, hidden = rnn(packed_input)

In [13]:
packed_output[0].size(), packed_output[1]

(torch.Size([18, 2]), tensor([ 5,  4,  3,  3,  2,  1]))
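
To back up the claim that the recurrent layer type does not matter, here is a minimal sketch that feeds the same packed input to `nn.RNN`, `nn.GRU`, and `nn.LSTM`. The random tensor `embedded` is only a stand-in for the real embedding output, and the `layers` and `shapes` dictionaries are names invented for this example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

lengths = [6, 5, 4, 2, 1]
embedded = torch.randn(5, 6, 5)  # stand-in for embed(input_seq2idx): (batch, max_len, embedding_size)
packed_input = pack_padded_sequence(embedded, lengths, batch_first=True)

layers = {
    "rnn": nn.RNN(input_size=5, hidden_size=2, num_layers=3, batch_first=True),
    "gru": nn.GRU(input_size=5, hidden_size=2, num_layers=3, batch_first=True),
    "lstm": nn.LSTM(input_size=5, hidden_size=2, num_layers=3, batch_first=True),
}

shapes = {}
for name, layer in layers.items():
    packed_output, state = layer(packed_input)
    # LSTM returns a tuple (h_n, c_n); RNN and GRU return h_n only.
    h_n = state[0] if isinstance(state, tuple) else state
    shapes[name] = (tuple(packed_output.data.shape), tuple(h_n.shape))

print(shapes)
```

In every case the packed output has one row per real token (18 in total), and `h_n` has shape `(num_layers, batch, hidden_size)`.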

### 2.3 unpack to get output

In [14]:
from torch.nn.utils.rnn import pad_packed_sequence

In [15]:
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)

In [16]:
output.size(), output_lengths

(torch.Size([5, 6, 2]), tensor([ 6,  5,  4,  2,  1]))

`pad_packed_sequence` fills the output at every `<pad>` position with zeros.

In [17]:
packed_output[0]

tensor([[ 0.1857, -0.0526],
        [ 0.1857, -0.0526],
        [ 0.0081,  0.1482],
        [-0.2469,  0.2890],
        [ 0.0385,  0.0846],
        [ 0.2779,  0.0454],
        [ 0.2779,  0.0454],
        [-0.3101,  0.2290],
        [ 0.0850, -0.4064],
        [ 0.4876, -0.2039],
        [ 0.1064,  0.2356],
        [ 0.2403, -0.5423],
        [ 0.3256,  0.3797],
        [ 0.3147, -0.2829],
        [ 0.3291,  0.3525],
        [-0.1685,  0.2127],
        [ 0.2733,  0.3111],
        [ 0.0675, -0.3163]])

In [18]:
output

tensor([[[ 0.1857, -0.0526],
         [ 0.2779,  0.0454],
         [ 0.4876, -0.2039],
         [ 0.3256,  0.3797],
         [-0.1685,  0.2127],
         [ 0.0675, -0.3163]],

        [[ 0.1857, -0.0526],
         [ 0.2779,  0.0454],
         [ 0.1064,  0.2356],
         [ 0.3147, -0.2829],
         [ 0.2733,  0.3111],
         [ 0.0000,  0.0000]],

        [[ 0.0081,  0.1482],
         [-0.3101,  0.2290],
         [ 0.2403, -0.5423],
         [ 0.3291,  0.3525],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[-0.2469,  0.2890],
         [ 0.0850, -0.4064],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]],

        [[ 0.0385,  0.0846],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000],
         [ 0.0000,  0.0000]]])
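
The lengths returned by `pad_packed_sequence` can be turned into a boolean mask, which is handy when you need to ignore the zero-filled positions, for example when averaging over time. The sketch below uses random inputs in place of real embeddings and a single-layer RNN; `mask`, `pad_is_zero`, and `mean_output` are illustrative names.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

lengths = torch.tensor([6, 5, 4, 2, 1])
embedded = torch.randn(5, 6, 5)  # stand-in for the embedded batch
rnn = nn.RNN(input_size=5, hidden_size=2, batch_first=True)

packed_output, _ = rnn(pack_padded_sequence(embedded, lengths, batch_first=True))
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)

# mask[b, t] is True where time step t of sentence b is a real token.
steps = torch.arange(output.size(1)).unsqueeze(0)  # shape (1, max_len)
mask = steps < output_lengths.unsqueeze(1)         # shape (batch, max_len)

# Every position outside the mask was filled with zeros.
pad_is_zero = bool((output[~mask] == 0).all())

# Average over real time steps only.
mean_output = (output * mask.unsqueeze(-1)).sum(dim=1) / output_lengths.unsqueeze(1)

print(pad_is_zero, mean_output.shape)
```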

### 2.4 last hidden state mapped to output

|0|1|2|3|4|5|
|-|-|-|-|-|-|
|[ 0.1857, -0.0526]|[ 0.2779,  0.0454]|[ 0.4876, -0.2039]|[ 0.3256,  0.3797]|[-0.1685,  0.2127]|**[ 0.0675, -0.3163]**|
|[ 0.1857, -0.0526]|[ 0.2779,  0.0454]|[ 0.1064,  0.2356]|[ 0.3147, -0.2829]|**[ 0.2733,  0.3111]**||
|[ 0.0081,  0.1482]|[-0.3101,  0.2290]|[ 0.2403, -0.5423]|**[ 0.3291,  0.3525]**|||
|[-0.2469,  0.2890]|**[ 0.0850, -0.4064]**|||||
|**[ 0.0385,  0.0846]**||||||

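The **bolded vectors** are the last hidden vectors of each sentence, i.e. the output at its final non-padded time step. They match `hidden[-1]`, the final hidden state of the last RNN layer.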

In [19]:
hidden[-1]

tensor([[ 0.0675, -0.3163],
        [ 0.2733,  0.3111],
        [ 0.3291,  0.3525],
        [ 0.0850, -0.4064],
        [ 0.0385,  0.0846]])
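
The correspondence between the bolded vectors and `hidden[-1]` can also be checked in code by gathering, for each sentence, the output at index `length - 1`. This is a sketch with random inputs standing in for the embedded batch and holds for a unidirectional RNN like the one used here; `idx` and `last_outputs` are names introduced for the example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

lengths = torch.tensor([6, 5, 4, 2, 1])
embedded = torch.randn(5, 6, 5)  # stand-in for the embedded batch
rnn = nn.RNN(input_size=5, hidden_size=2, num_layers=3, batch_first=True)

packed_output, hidden = rnn(pack_padded_sequence(embedded, lengths, batch_first=True))
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)

# Index of the last real time step per sentence, expanded to (batch, 1, hidden_size).
idx = (output_lengths - 1).view(-1, 1, 1).expand(-1, 1, output.size(2))
last_outputs = output.gather(1, idx).squeeze(1)  # shape (batch, hidden_size)

# For a unidirectional RNN this equals the final hidden state of the last layer.
print(torch.allclose(last_outputs, hidden[-1]))
```

Without packing, `hidden[-1]` would instead hold the state after the final `<pad>` step, which is why packing matters when you use the last hidden state as a sentence representation.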

In [20]:
packed_output[0], packed_output[1]

(tensor([[ 0.1857, -0.0526],
         [ 0.1857, -0.0526],
         [ 0.0081,  0.1482],
         [-0.2469,  0.2890],
         [ 0.0385,  0.0846],
         [ 0.2779,  0.0454],
         [ 0.2779,  0.0454],
         [-0.3101,  0.2290],
         [ 0.0850, -0.4064],
         [ 0.4876, -0.2039],
         [ 0.1064,  0.2356],
         [ 0.2403, -0.5423],
         [ 0.3256,  0.3797],
         [ 0.3147, -0.2829],
         [ 0.3291,  0.3525],
         [-0.1685,  0.2127],
         [ 0.2733,  0.3111],
         [ 0.0675, -0.3163]]), tensor([ 5,  4,  3,  3,  2,  1]))