# Handling Variable-Length Sequences with a BLSTM

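Suppose we are using a BLSTM model for a sentence classification task. The BLSTM's output at the last time step serves as the sentence representation.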

For example, consider the following batch:

    sentences = [['nice', 'day'], ['I', 'like', 'to', 'eat', 'apple'], ['can', 'a', 'can']]

In practice, for computational efficiency, the sentences are padded to a common length (usually the maximum length in the batch):

    sentences = [
        ['nice', 'day', '_PAD', '_PAD', '_PAD'],
        ['I', 'like', 'to', 'eat', 'apple'],
        ['can', 'a', 'can', '_PAD', '_PAD']
    ]
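
Instead of padding by hand, you can use PyTorch's `torch.nn.utils.rnn.pad_sequence` to pad a list of variable-length index tensors. The sketch below assumes the words have already been mapped to ids and uses 0 as the padding id. The ids are hypothetical and chosen only for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical word ids for the three sentences; 0 is reserved for _PAD
seqs = [
    torch.tensor([1, 2]),           # nice day
    torch.tensor([3, 4, 5, 6, 7]),  # I like to eat apple
    torch.tensor([8, 9, 8]),        # can a can
]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to the longest sequence in the batch -> shape [batch_size, max_len]
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
print(padded)
# tensor([[1, 2, 0, 0, 0],
#         [3, 4, 5, 6, 7],
#         [8, 9, 8, 0, 0]])
```

You still need to keep the `lengths` tensor, because the padded tensor alone no longer records where each sentence ends.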

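Consider the sequence `['nice', 'day', '_PAD', '_PAD', '_PAD']`. The forward LSTM only needs the `hidden state` at `day`. The reverse LSTM only needs to encode from `day` back to `nice`. The `_PAD` positions do not need to be computed at all. This matters most for the reverse direction: if the padded sequence were fed in as is, the reverse LSTM would start at the last `_PAD` and run through all the padding before reaching `day`, so its final state would be contaminated by the padding.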

PyTorch handles such variable-length sequences with the `torch.nn.utils.rnn.PackedSequence` class together with the following two functions:
  - `torch.nn.utils.rnn.pack_padded_sequence`
  - `torch.nn.utils.rnn.pad_packed_sequence`
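
Here is a minimal round trip through these two functions. It uses random data with the same lengths as the example batch (after sorting); the values and the feature size of 4 are arbitrary. Packing drops the padded positions, and unpacking restores the padded layout with zeros in them.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lengths = torch.tensor([5, 3, 2])  # already sorted in descending order
batch = torch.randn(3, 5, 4)       # [batch_size, max_len, feature_dim]
for i, n in enumerate(lengths.tolist()):
    batch[i, n:] = 0.0             # zero out the padded positions

packed = pack_padded_sequence(batch, lengths, batch_first=True)
print(packed.data.shape)   # only real steps: 5 + 3 + 2 = 10 rows
print(packed.batch_sizes)

unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(unpacked, batch), out_lengths.tolist())  # True [5, 3, 2]
```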

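Below, we use a small set of synthetic data to walk through how a bidirectional LSTM handles variable-length sequences in PyTorch, and we check that the results are correct.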

## Importing the Required Packages

First, import the required packages and set the relevant parameters:

```python
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

input_size = 64    # embedding dimension
hidden_size = 100  # hidden size of each LSTM direction
```

## Initializing the BLSTM

Next, initialize the BLSTM. For ease of demonstration, `batch_first` is set to `True`.

```python
lstm = nn.LSTM(
    input_size=input_size, hidden_size=hidden_size, num_layers=1,
    bias=True, batch_first=True, bidirectional=True)
```

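To verify that PyTorch computes the bidirectional LSTM outputs correctly, we set the weights and biases of the forward and reverse LSTMs to the same values: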

```python
# Layer-0 weights; PyTorch stacks the four gates as (input|forget|cell|output)
weight_i = getattr(lstm, 'weight_ih_l{0}'.format(0))  # forward: (W_ii|W_if|W_ig|W_io)
weight_i_r = getattr(lstm, 'weight_ih_l{0}_reverse'.format(0))  # reverse: (W_ii|W_if|W_ig|W_io)
weight_i_r.data.copy_(weight_i.data)

weight_h = getattr(lstm, 'weight_hh_l{0}'.format(0))  # forward: (W_hi|W_hf|W_hg|W_ho)
weight_h_r = getattr(lstm, 'weight_hh_l{0}_reverse'.format(0))  # reverse: (W_hi|W_hf|W_hg|W_ho)
weight_h_r.data.copy_(weight_h.data)

# Initialize the forward biases to a constant, then copy them to the reverse direction
bias_i = getattr(lstm, 'bias_ih_l{0}'.format(0))
torch.nn.init.constant_(bias_i, 0.1)
bias_h = getattr(lstm, 'bias_hh_l{0}'.format(0))
torch.nn.init.constant_(bias_h, 0.1)

bias_i_r = getattr(lstm, 'bias_ih_l{0}_reverse'.format(0))
bias_i_r.data.copy_(bias_i.data)
bias_h_r = getattr(lstm, 'bias_hh_l{0}_reverse'.format(0))
bias_h_r.data.copy_(bias_h.data)

print('Initialization is done!')
```

```
Initialization is done!
```


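Consider a sequence that reads the same forwards and backwards, such as the palindrome `['can', 'a', 'can']` used below. For such a sequence, the forward and reverse outputs at their respective last time steps should be identical. If the reverse LSTM also processed the `_PAD` positions, the two outputs would differ.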
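
The following self-contained sketch makes this check concrete. Instead of copying each parameter by name as above, it ties every `*_reverse` parameter to its forward counterpart by looping over `named_parameters()`; this loop is a choice of the sketch, not the original code. It then compares packed and unpacked input for the palindrome followed by two zero "pad" vectors. The sizes, seed, and embeddings are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
input_size, hidden_size = 8, 6
lstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)

with torch.no_grad():
    params = dict(lstm.named_parameters())
    for name, p in params.items():
        if name.endswith('_reverse'):
            # e.g. weight_ih_l0_reverse <- weight_ih_l0
            p.copy_(params[name[:-len('_reverse')]])

# Palindrome "can a can" as random embeddings, padded with two zero vectors
can, a = torch.randn(input_size), torch.randn(input_size)
pad = torch.zeros(input_size)
x = torch.stack([can, a, can, pad, pad]).unsqueeze(0)  # [1, 5, input_size]
length = torch.tensor([3])

with torch.no_grad():
    # Packed: the reverse direction starts at the last real token
    _, (h_packed, _) = lstm(pack_padded_sequence(x, length, batch_first=True))
    # Unpacked: both directions also run over the two pads
    _, (h_padded, _) = lstm(x)

print(torch.allclose(h_packed[0], h_packed[1]))  # True
print(torch.allclose(h_padded[0], h_padded[1]))  # False
```

Without packing, even the forward direction's final state is taken after the two pads, not at the last real token.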

## Building Synthetic Data

```python
# Synthetic input data
sentences = [['nice', 'day'], ['I', 'like', 'to', 'eat', 'apple'], ['can', 'a', 'can']]
test_sent_idx = 2  # index of ['can', 'a', 'can'] in sentences

# Build the vocabulary (alphabet); index 0 is reserved for padding
alphabet = {}
index = 1
for sentence in sentences:
    for word in sentence:
        if word not in alphabet:
            alphabet[word] = index
            index += 1
voc_size = len(alphabet) + 1  # +1 for the padding index 0

lengths = [len(s) for s in sentences]
max_len = max(lengths)
batch_size = len(sentences)

# Pad every sentence with 0 up to max_len
inputs = np.zeros((batch_size, max_len), dtype='int64')
for i, sentence in enumerate(sentences):
    inputs[i, :lengths[i]] = [alphabet[w] for w in sentence]
inputs = torch.from_numpy(inputs)
lengths = torch.LongTensor(lengths)

# Sort by actual sentence length in descending order
# (required by pack_padded_sequence with the default enforce_sorted=True)
lengths, indices = torch.sort(lengths, descending=True)
inputs = inputs[indices]

# Embedding layer; padding_idx is the index used for padding (its vector is all zeros)
embedding = nn.Embedding(voc_size, input_size, padding_idx=0)
inputs = embedding(inputs)
print(inputs.size())  # [3, 5, 64]

inputs_packed = pack_padded_sequence(inputs, lengths, batch_first=True)
print('inputs_packed.data.size: {0}'.format(inputs_packed.data.size()))
print('batch_sizes: {0}'.format(inputs_packed.batch_sizes))
```

```
torch.Size([3, 5, 64])
inputs_packed.data.size: torch.Size([10, 64])
batch_sizes: tensor([ 3,  3,  2,  1,  1])
```


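A `PackedSequence` mainly holds two fields, `data` and `batch_sizes`:

- **`data`**: using the `lengths` argument (the actual length of each sequence), it keeps only the non-padding time steps of `inputs`, laid out one time step at a time. First come the step-0 vectors of all sequences, then the step-1 vectors of the sequences that are still active, and so on. Here that gives 3 + 3 + 2 + 1 + 1 = 10 rows (the sum of the actual lengths 5 + 3 + 2), hence the shape `[10, 64]`.
- **`batch_sizes`**: its length equals the maximum actual length. Its `i`-th value is the batch size at time step `i`, i.e. the number of sequences longer than `i`.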
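
You can reproduce this layout by hand. The sketch below rebuilds `batch_sizes` and `data` from a padded tensor and its lengths, then compares them with what `pack_padded_sequence` returns. The values are random; the lengths match the sorted example batch.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lengths = [5, 3, 2]            # sorted in descending order
x = torch.randn(3, 5, 64)      # [batch_size, max_len, input_size]

# batch_sizes[t] = number of sequences whose length is greater than t
batch_sizes = [sum(1 for n in lengths if n > t) for t in range(max(lengths))]

# data: for each time step t, the step-t vectors of the first batch_sizes[t] sequences
rows = [x[b, t] for t in range(max(lengths)) for b in range(batch_sizes[t])]
data = torch.stack(rows)

packed = pack_padded_sequence(x, torch.tensor(lengths), batch_first=True)
print(batch_sizes)                     # [3, 3, 2, 1, 1]
print(data.shape)                      # torch.Size([10, 64])
print(torch.equal(data, packed.data))  # True
```

Because the batch is sorted by length, the sequences still active at step `t` are always the first `batch_sizes[t]` rows. This is why the sorted order is required.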

## Computing the LSTM Output

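```python
lstm_output, lstm_hidden = lstm(inputs_packed)
lstm_hidden, lstm_cell_state = lstm_hidden[0], lstm_hidden[1]
# h_n has shape [num_layers * num_directions, batch_size, hidden_size]
# regardless of batch_first, so move the batch dimension to the front
lstm_hidden = lstm_hidden.transpose(0, 1)
print(lstm_hidden.size())  # torch.Size([batch_size, 2, hidden_size])
```

```
torch.Size([3, 2, 100])
```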


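`lstm_hidden` and `lstm_cell_state` hold the final `hidden state` and `cell state` of the forward and reverse directions. Because the input is packed, "final" means different positions for each direction:

- **Forward**: the last real token (position `length - 1`).
- **Reverse**: position 0.

The `_PAD` positions are never processed. Note that the batch dimension of both follows the length-sorted order, not the original order of `sentences`.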
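
The same `h_n` layout extends to stacked BLSTMs. The PyTorch `nn.LSTM` documentation gives `h_n` the shape `[num_layers * num_directions, batch_size, hidden_size]`. You can view it as `[num_layers, num_directions, batch_size, hidden_size]`, with direction 0 forward and direction 1 reverse. The sketch below uses two layers and arbitrary random inputs, which are assumptions not in the original setup. It takes the top layer's two directions and concatenates them into one sentence vector per example.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
num_layers, batch_size, max_len, input_size, hidden_size = 2, 3, 5, 64, 100
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
               batch_first=True, bidirectional=True)

x = torch.randn(batch_size, max_len, input_size)
lengths = torch.tensor([5, 3, 2])  # sorted in descending order
_, (h_n, c_n) = lstm(pack_padded_sequence(x, lengths, batch_first=True))

# [num_layers * 2, batch, hidden] -> [num_layers, 2, batch, hidden]
h_n = h_n.view(num_layers, 2, batch_size, hidden_size)
top = h_n[-1]                                        # last layer: [2, batch, hidden]
sentence_repr = torch.cat([top[0], top[1]], dim=-1)  # [batch, 2 * hidden]
print(sentence_repr.shape)  # torch.Size([3, 200])
```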

```python
# Restore lstm_hidden to the original sentence order
# (argsort of a permutation gives its inverse permutation)
_, indices_recover = torch.sort(indices)
lstm_hidden_recover = lstm_hidden[indices_recover]

# Final forward and reverse LSTM outputs for the sentence ['can', 'a', 'can']
print(lstm_hidden_recover[test_sent_idx])  # [2, 100]
```

```
tensor([[ 0.0625,  0.3070, -0.1729, -0.2379, -0.0848,  0.2300,  0.0295,
          0.3735,  0.0307,  0.0912,  0.1333,  0.1341,  0.2309,  0.0204,
         -0.0117,  0.1841,  0.1630,  0.4388,  0.0381,  0.0551,  0.1766,
         -0.0415,  0.2978,  0.2218,  0.2320,  0.2776,  0.1525,  0.0690,
          0.1018, -0.0519, -0.0773, -0.0358,  0.2640, -0.1285,  0.0648,
          0.4546,  0.1061, -0.1082,  0.1457,  0.2305,  0.1844,  0.0870,
          0.0993,  0.2757, -0.1717, -0.0297,  0.1028,  0.1016,  0.0387,
          0.1672,  0.3925,  0.1485,  0.4487,  0.1449, -0.0291,  0.1742,
          0.1658, -0.1843,  0.2707,  0.0617,  0.2482, -0.0416,  0.2566,
          0.2021,  0.3221,  0.1658,  0.2014, -0.0924,  0.2102,  0.1036,
          0.2312,  0.0517,  0.1294,  0.1551,  0.1262, -0.0096,  0.0372,
          0.1485,  0.0742,  0.1351, -0.0850,  0.3438, -0.1237,  0.0226,
         -0.1902,  0.0077,  0.1242,  0.0630, -0.0051,  0.0149, -0.0595,
          0.1055,  0.2078,  0.3090,  0.2792,  0.2109, -0.0175,
...
```

As can be seen, the forward and reverse outputs are identical.
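
In PyTorch 1.1 and later, `pack_padded_sequence` also accepts `enforce_sorted=False`. With this option it sorts the batch internally, and the original order is restored in both `h_n` and the output of `pad_packed_sequence`. That makes the manual sort and the `indices_recover` step unnecessary. The sketch below, on arbitrary random data, checks that both routes give the same `h_n`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(16, 10, batch_first=True, bidirectional=True)
x = torch.randn(3, 5, 16)          # original (unsorted) batch order
lengths = torch.tensor([2, 5, 3])

# Route 1: sort manually, then restore the order of h_n afterwards
sorted_lengths, indices = torch.sort(lengths, descending=True)
packed = pack_padded_sequence(x[indices], sorted_lengths, batch_first=True)
_, (h_sorted, _) = lstm(packed)
_, indices_recover = torch.sort(indices)
h_manual = h_sorted[:, indices_recover]   # batch is dim 1 of h_n

# Route 2: let PyTorch handle the sorting
packed_unsorted = pack_padded_sequence(x, lengths, batch_first=True,
                                       enforce_sorted=False)
_, (h_auto, _) = lstm(packed_unsorted)

print(torch.allclose(h_manual, h_auto))  # True
```

The default `enforce_sorted=True` is still needed if you plan to export the model with ONNX.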

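`lstm_output` is a `PackedSequence` holding the BLSTM output at every time step. After converting it back to a padded tensor of shape `[batch_size, max_len, 2 * hidden_size]` with `pad_packed_sequence`, you can use each sentence's actual length to pick out the outputs at the last time step:

- **Forward**: the first `hidden_size` features at position `length - 1`.
- **Reverse**: the last `hidden_size` features at position 0.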

```python
# Convert back to the padded shape
lstm_output_pad, lengths = pad_packed_sequence(lstm_output, batch_first=True)
lstm_output_pad_recover = lstm_output_pad[indices_recover]
lengths_recover = lengths[indices_recover]
print(lstm_output_pad.size())  # size=[3, 5, 200]

# Final forward and reverse LSTM outputs for the sentence ['can', 'a', 'can']
hidden_last = lstm_output_pad_recover[test_sent_idx][lengths_recover[test_sent_idx]-1][:hidden_size]  # forward
hidden_last_r = lstm_output_pad_recover[test_sent_idx][0][hidden_size:]  # reverse
print(hidden_last)
print(hidden_last_r)
```

```
torch.Size([3, 5, 200])
tensor([ 0.0625,  0.3070, -0.1729, -0.2379, -0.0848,  0.2300,  0.0295,
         0.3735,  0.0307,  0.0912,  0.1333,  0.1341,  0.2309,  0.0204,
        -0.0117,  0.1841,  0.1630,  0.4388,  0.0381,  0.0551,  0.1766,
        -0.0415,  0.2978,  0.2218,  0.2320,  0.2776,  0.1525,  0.0690,
         0.1018, -0.0519, -0.0773, -0.0358,  0.2640, -0.1285,  0.0648,
         0.4546,  0.1061, -0.1082,  0.1457,  0.2305,  0.1844,  0.0870,
         0.0993,  0.2757, -0.1717, -0.0297,  0.1028,  0.1016,  0.0387,
         0.1672,  0.3925,  0.1485,  0.4487,  0.1449, -0.0291,  0.1742,
         0.1658, -0.1843,  0.2707,  0.0617,  0.2482, -0.0416,  0.2566,
         0.2021,  0.3221,  0.1658,  0.2014, -0.0924,  0.2102,  0.1036,
         0.2312,  0.0517,  0.1294,  0.1551,  0.1262, -0.0096,  0.0372,
         0.1485,  0.0742,  0.1351, -0.0850,  0.3438, -0.1237,  0.0226,
        -0.1902,  0.0077,  0.1242,  0.0630, -0.0051,  0.0149, -0.0595,
         0.1055,  0.2078,  0.3090,  0.2792,  0.2109,
...
```

The extracted values are equal to those in `lstm_hidden`.
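
The indexing above handles one sentence at a time. The sketch below is a batched version under the same conventions (forward half at position `length - 1`, reverse half at position 0). It uses advanced indexing to gather the last outputs of every sentence and checks them against `h_n`. The sizes and data are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
hidden_size = 100
lstm = nn.LSTM(64, hidden_size, batch_first=True, bidirectional=True)
x = torch.randn(3, 5, 64)
lengths = torch.tensor([5, 3, 2])  # sorted in descending order

output, (h_n, _) = lstm(pack_padded_sequence(x, lengths, batch_first=True))
output, out_lengths = pad_packed_sequence(output, batch_first=True)  # [3, 5, 200]

batch_idx = torch.arange(output.size(0))
last_fwd = output[batch_idx, out_lengths - 1, :hidden_size]  # forward: last real step
last_bwd = output[:, 0, hidden_size:]                        # reverse: step 0
last = torch.cat([last_fwd, last_bwd], dim=-1)               # [3, 200]

h_cat = torch.cat([h_n[0], h_n[1]], dim=-1)                  # [3, 200]
print(torch.allclose(last, h_cat))  # True
```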