# Handling Variable-Length Sequences with a BLSTM in PyTorch

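Suppose we are using a BLSTM model for sentence-level sentiment polarity classification, and the BLSTM output at the last time step is the basis for the decision.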

Take the following sentences as an example:

    sentences = [['nice', 'day'], ['I', 'like', 'to', 'eat', 'apple'], ['I', 'hate', 'you']]

In practice, for computational efficiency, the sentences are padded to a uniform length (usually the maximum length within a batch):

    sentences = [
        ['nice', 'day', '_PAD', '_PAD', '_PAD'],
        ['I', 'like', 'to', 'eat', 'apple'],
        ['I', 'hate', 'you', '_PAD', '_PAD'],
    ]

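Consider the sequence `['I', 'hate', 'you', '_PAD', '_PAD']`. For the forward LSTM, we only need the `hidden state` at `you`. For the backward LSTM, we only need to encode from `you` back to `I`. The values at the `_PAD` positions do not need to be computed.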

PyTorch handles this variable-length sequence problem with the `torch.nn.utils.rnn.PackedSequence` class and the following two functions:
  - `torch.nn.utils.rnn.pack_padded_sequence`
  - `torch.nn.utils.rnn.pad_packed_sequence`
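
As a minimal sketch of how the two functions fit together, the snippet below packs a small padded batch and unpacks it again. The tensor values and the names `padded`, `packed`, and `restored` are made up for illustration and do not appear elsewhere in this article.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch of 3 sequences, already sorted by length (longest first); 0 is padding.
padded = torch.tensor([[1., 2., 3.],
                       [4., 5., 0.],
                       [6., 0., 0.]]).unsqueeze(-1)  # shape [3, 3, 1]
lengths = torch.tensor([3, 2, 1])

packed = pack_padded_sequence(padded, lengths, batch_first=True)
# packed.data keeps only the 6 real time steps, stored time step by time step.
print(packed.data.squeeze(-1))  # values: 1, 4, 6, 2, 5, 3
print(packed.batch_sizes)       # values: 3, 2, 1

# Unpacking restores the padded layout and returns the lengths.
restored, restored_lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(restored, padded))  # True
```

The padding positions never enter `packed.data`, which is why an RNN that consumes a `PackedSequence` does no work on them.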

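Next, using a set of simulated data, we show how a bidirectional LSTM in PyTorch handles variable-length sequences, and we verify that the result is correct.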

## Importing the packages

First, import the required packages and set the relevant parameters:

In [1]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

input_size = 64
hidden_size = 100
voc_size = 10

## Initializing the BLSTM

Next, initialize the BLSTM. Bias is disabled here to simplify weight initialization, and `batch_first` is set to `True` for ease of demonstration.

In [2]:
lstm = nn.LSTM(
    input_size=input_size, hidden_size=hidden_size, num_layers=1,
    bias=False, batch_first=True, bidirectional=True)
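
For reference, the parameters created by such a constructor call can be listed with `named_parameters()`. The sketch below rebuilds an LSTM with the same settings as above. The `shapes` dictionary is only an illustrative helper.

```python
import torch.nn as nn

# Same configuration as the BLSTM in this article.
lstm = nn.LSTM(input_size=64, hidden_size=100, num_layers=1,
               bias=False, batch_first=True, bidirectional=True)

# Each weight stacks the four gates (input, forget, cell, output) along dim 0,
# so the first dimension is 4 * hidden_size = 400.
shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# weight_ih_l0 (400, 64)
# weight_hh_l0 (400, 100)
# weight_ih_l0_reverse (400, 64)
# weight_hh_l0_reverse (400, 100)
```

The `_reverse` suffix marks the backward direction. With `bias=False` there are no `bias_ih_*` or `bias_hh_*` entries.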

To verify that PyTorch computes the bidirectional LSTM output correctly, we set the forward and backward LSTM weights to the same values:

In [3]:
weight_i = getattr(lstm, 'weight_ih_l{0}'.format(0))  # forward: (W_ii|W_if|W_ig|W_io)
weight_i_r = getattr(lstm, 'weight_ih_l{0}_reverse'.format(0))  # backward: (W_ii|W_if|W_ig|W_io)
weight_i_r.data.copy_(weight_i.data)
weight_h = getattr(lstm, 'weight_hh_l{0}'.format(0))  # forward: (W_hi|W_hf|W_hg|W_ho)
weight_h_r = getattr(lstm, 'weight_hh_l{0}_reverse'.format(0))  # backward: (W_hi|W_hf|W_hg|W_ho)
weight_h_r.data.copy_(weight_h.data)
print('Initialization is done!')

Initialization is done!


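This way, if a sequence reads the same forwards and backwards, the outputs of the forward and backward directions at their last time steps should be identical. If the backward LSTM had processed the `_PAD` values, the outputs would differ.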
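
The reasoning above can be checked in isolation. The following is a minimal sketch, not the code used in this article: the sizes, the seed, and the names `h_packed` and `h_padded` are chosen here for illustration. It feeds one palindromic sequence with trailing padding through a weight-tied BLSTM, once packed and once as a plain padded tensor.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=5, num_layers=1,
               bias=False, batch_first=True, bidirectional=True)
# Tie the backward weights to the forward weights.
with torch.no_grad():
    lstm.weight_ih_l0_reverse.copy_(lstm.weight_ih_l0)
    lstm.weight_hh_l0_reverse.copy_(lstm.weight_hh_l0)

# Palindrome 1 2 3 2 1 followed by three padding tokens (id 0).
embedding = nn.Embedding(10, 8, padding_idx=0)
tokens = torch.tensor([[1, 2, 3, 2, 1, 0, 0, 0]])
lengths = torch.tensor([5])

with torch.no_grad():
    inputs = embedding(tokens)
    # Packed: both directions only see the 5 real tokens.
    _, (h_packed, _) = lstm(pack_padded_sequence(inputs, lengths, batch_first=True))
    # Not packed: both directions run over all 8 positions.
    _, (h_padded, _) = lstm(inputs)

packed_match = torch.allclose(h_packed[0], h_packed[1], atol=1e-5)
padded_match = torch.allclose(h_padded[0], h_padded[1], atol=1e-5)
print(packed_match, padded_match)  # True False
```

With packing, the two final states agree. Without packing, the forward direction keeps updating its state over the padding steps, so its final state no longer matches the backward one. Note that in this particular bias-free setup with an all-zero padding embedding, the backward state stays at zero while it reads the trailing padding, so the mismatch in this sketch comes from the forward direction.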

## Building the simulated data

In [8]:
# Simulated input data; 0 is the padding value
sentences = [
    [3, 2, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    [3, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 2, 3, 4, 3, 2, 1, 0, 0, 0]]
sentences = torch.LongTensor(sentences)
# Compute the actual length of each sequence
lengths = torch.sum(torch.sign(sentences), dim=1)  # [3, 9, 2, 7]

# Sort by actual sentence length in descending order
lengths, indices = torch.sort(lengths, descending=True)
sentences = sentences[indices]

# Set up the embedding layer; padding_idx is the index of the padding value
embedding = nn.Embedding(voc_size, input_size, padding_idx=0)
inputs = embedding(sentences)  # size=[4, 10, 64]
print(inputs.size())

inputs_packed = pack_padded_sequence(inputs, lengths, batch_first=True)
print('inputs_packed.data.size: {0}'.format(inputs_packed.data.size()))
print('batch_sizes: {0}'.format(inputs_packed.batch_sizes))

torch.Size([4, 10, 64])
inputs_packed.data.size: torch.Size([21, 64])
batch_sizes: tensor([ 4,  4,  3,  2,  2,  2,  2,  1,  1])


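A `PackedSequence` holds two main values, `data` and `batch_sizes`. `data` stores the tensors from `inputs`, selected according to the `lengths` argument (the actual sequence lengths). `batch_sizes` has as many entries as the maximum actual length, and its `i`-th value is the batch size of the input at time step `i`.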
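
To make the meaning of `batch_sizes` concrete, the values printed above can be recomputed by hand from the sorted lengths. The helper `batch_sizes_from_lengths` below is a hypothetical illustration written for this article, not part of the PyTorch API.

```python
def batch_sizes_from_lengths(lengths):
    """For each time step t, count the sequences that are longer than t."""
    max_len = max(lengths)
    return [sum(1 for n in lengths if n > t) for t in range(max_len)]

lengths = [9, 7, 3, 2]  # the sorted actual lengths from the example above
sizes = batch_sizes_from_lengths(lengths)
print(sizes)       # [4, 4, 3, 2, 2, 2, 2, 1, 1]
print(sum(sizes))  # 21, the number of rows in inputs_packed.data
```

The sum of `batch_sizes` equals the total number of non-padding tokens, which explains the first dimension of `inputs_packed.data`.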

## Computing the LSTM output

In [5]:
lstm_output, lstm_hidden = lstm(inputs_packed)
lstm_hidden, lstm_cell_state = lstm_hidden[0], lstm_hidden[1]
lstm_hidden = lstm_hidden.transpose(0, 1)
print(lstm_hidden.size())  # torch.Size([batch_size, 2, hidden_size])

torch.Size([4, 2, 100])


`lstm_hidden` and `lstm_cell_state` hold the `hidden state` and `cell state` of the forward and backward directions at their respective last time steps.
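
Because the batch was sorted by length before packing, row `i` of `lstm_hidden` belongs to the `i`-th sentence in sorted order, not in the original order. One way to map results back is the inverse of the sorting permutation. The sketch below assumes nothing beyond `torch.sort` and `torch.argsort`, and `results_sorted` is a stand-in for any per-sentence tensor.

```python
import torch

lengths = torch.tensor([3, 9, 2, 7])  # actual lengths in the original order
sorted_lengths, indices = torch.sort(lengths, descending=True)
print(indices)  # tensor([1, 3, 0, 2])

# argsort of a permutation yields its inverse permutation.
inverse = torch.argsort(indices)
print(inverse)  # tensor([2, 0, 3, 1])

# Stand-in for per-sentence results computed in sorted order.
results_sorted = sorted_lengths.clone()
results_original = results_sorted[inverse]
print(results_original)  # tensor([3, 9, 2, 7])
```

Indexing with `inverse` along the batch dimension restores the original sentence order for any tensor whose first dimension follows the sorted order.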

In [6]:
sent_idx = 1  # after sorting, the sequence [1, 2, 3, 4, 3, 2, 1] is at index 1
print(lstm_hidden[sent_idx])  # size=[2, 100]

tensor([[-0.1895, -0.0591, -0.0064, -0.0480,  0.1802,  0.0137,  0.1195,
          0.0257,  0.1489, -0.1884, -0.0552,  0.0289,  0.0419,  0.1217,
          0.0829,  0.1347, -0.0646,  0.0147, -0.1876,  0.0510,  0.0343,
          0.1860, -0.1429,  0.0170,  0.0385, -0.1391, -0.1121,  0.0411,
          0.0431, -0.0673,  0.0521,  0.0839, -0.0124,  0.1610, -0.0013,
          0.0032,  0.1046, -0.1926,  0.2928, -0.0554, -0.1293, -0.0723,
          0.0040, -0.0405,  0.0169,  0.0761,  0.1834, -0.0678, -0.0866,
          0.1366,  0.0119,  0.2218, -0.0919, -0.1355,  0.2026, -0.2376,
         -0.0672,  0.1246,  0.0003,  0.0830, -0.1179,  0.0479, -0.0510,
          0.0578,  0.0679, -0.1204, -0.0149, -0.0913, -0.1322, -0.0127,
         -0.1617, -0.0983,  0.0193,  0.1545,  0.0088, -0.0458, -0.0972,
         -0.0192, -0.0621, -0.1678, -0.0300, -0.0309,  0.2895, -0.0182,
         -0.0695, -0.0671,  0.0769, -0.0576,  0.0898, -0.0772, -0.1087,
         -0.1412, -0.1832, -0.0380, -0.0752, -0.1688,  0.0132,  

As can be seen, the forward and backward outputs are identical.

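`lstm_output` holds the BLSTM output at every time step. Using the actual sentence lengths, the output at the last time step can also be extracted from it: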

In [7]:
# Restore the padded shape
lstm_output_pad, lengths = pad_packed_sequence(lstm_output, batch_first=True)
print(lstm_output_pad.size())  # size=[4, 9, 200]

# Forward and backward LSTM outputs for the sentence at sent_idx=1
hidden_last = lstm_output_pad[sent_idx][lengths[sent_idx]-1][:hidden_size]  # forward
hidden_last_r = lstm_output_pad[sent_idx][0][hidden_size:]  # backward
print(hidden_last)
print(hidden_last_r)

torch.Size([4, 9, 200])
tensor([-0.1895, -0.0591, -0.0064, -0.0480,  0.1802,  0.0137,  0.1195,
         0.0257,  0.1489, -0.1884, -0.0552,  0.0289,  0.0419,  0.1217,
         0.0829,  0.1347, -0.0646,  0.0147, -0.1876,  0.0510,  0.0343,
         0.1860, -0.1429,  0.0170,  0.0385, -0.1391, -0.1121,  0.0411,
         0.0431, -0.0673,  0.0521,  0.0839, -0.0124,  0.1610, -0.0013,
         0.0032,  0.1046, -0.1926,  0.2928, -0.0554, -0.1293, -0.0723,
         0.0040, -0.0405,  0.0169,  0.0761,  0.1834, -0.0678, -0.0866,
         0.1366,  0.0119,  0.2218, -0.0919, -0.1355,  0.2026, -0.2376,
        -0.0672,  0.1246,  0.0003,  0.0830, -0.1179,  0.0479, -0.0510,
         0.0578,  0.0679, -0.1204, -0.0149, -0.0913, -0.1322, -0.0127,
        -0.1617, -0.0983,  0.0193,  0.1545,  0.0088, -0.0458, -0.0972,
        -0.0192, -0.0621, -0.1678, -0.0300, -0.0309,  0.2895, -0.0182,
        -0.0695, -0.0671,  0.0769, -0.0576,  0.0898, -0.0772, -0.1087,
        -0.1412, -0.1832, -0.0380, -0.0752, -0.1688, 

The extracted values are equal to those in `lstm_hidden`.
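
The same comparison can be done for the whole batch at once instead of for a single `sent_idx`. The following is a sketch under stated assumptions: the sizes and random inputs are illustrative, and it uses the `enforce_sorted=False` option of `pack_padded_sequence` (available in recent PyTorch versions) so that the batch does not have to be sorted by hand.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
hidden_size = 6
lstm = nn.LSTM(input_size=4, hidden_size=hidden_size,
               batch_first=True, bidirectional=True)

inputs = torch.randn(3, 5, 4)      # [batch, max_len, input_size]
lengths = torch.tensor([2, 5, 3])  # deliberately not sorted

# With enforce_sorted=False, sorting happens internally and all results
# come back in the original batch order.
packed = pack_packed = pack_padded_sequence(
    inputs, lengths, batch_first=True, enforce_sorted=False)
with torch.no_grad():
    output_packed, (h_n, _) = lstm(packed)
output, out_lengths = pad_packed_sequence(output_packed, batch_first=True)

batch_idx = torch.arange(output.size(0))
# Forward direction: output at the last real time step of each sequence.
last_forward = output[batch_idx, out_lengths - 1, :hidden_size]
# Backward direction: output at the first time step of each sequence.
last_backward = output[batch_idx, 0, hidden_size:]

print(torch.allclose(last_forward, h_n[0], atol=1e-6))   # True
print(torch.allclose(last_backward, h_n[1], atol=1e-6))  # True
```

Either source can serve as the sentence representation: `h_n` directly, or the per-step output indexed by the actual lengths.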