# Handling Variable-Length Sequences with a BLSTM

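Suppose we are using a BLSTM model for sentence classification, with the BLSTM output at the last time step serving as the sentence representation.

Take the following example: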

    sentences = [['nice', 'day'], ['I', 'like', 'to', 'eat', 'apple'], ['can', 'a', 'can']]

In practice, for computational efficiency, the sentences are padded to a common length (usually the maximum length within the batch):

    sentences = [
        ['nice', 'day', '_PAD', '_PAD', '_PAD'],
        ['I', 'like', 'to', 'eat', 'apple'],
        ['can', 'a', 'can', '_PAD', '_PAD']
    ]

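Consider the sequence `['nice', 'day', '_PAD', '_PAD', '_PAD']`. The forward LSTM only needs the `hidden state` at `day`, and the backward LSTM only needs to encode from `day` back to `nice`. The values at the `_PAD` positions do not need to be computed.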
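
The padding step, and the positions that actually matter afterwards, can be sketched in plain Python. This is only an illustration: the names `PAD`, `padded`, and `last_positions` are introduced here and do not appear in the original notebook.

```python
sentences = [['nice', 'day'], ['I', 'like', 'to', 'eat', 'apple'], ['can', 'a', 'can']]
PAD = '_PAD'  # hypothetical name for the padding token

# Pad every sentence to the longest length in the batch.
lengths = [len(s) for s in sentences]
max_len = max(lengths)
padded = [s + [PAD] * (max_len - len(s)) for s in sentences]

# Index of the last real (non-padding) token in each padded sentence.
# The forward LSTM state is needed here; the backward LSTM should start here.
last_positions = [n - 1 for n in lengths]

for row, pos in zip(padded, last_positions):
    print(row, '-> last real token:', row[pos])
```

Keeping `lengths` alongside the padded batch is what later allows the padding positions to be skipped.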

PyTorch handles such variable-length sequences with the `torch.nn.utils.rnn.PackedSequence` class and the following two functions:
  - `torch.nn.utils.rnn.pack_padded_sequence`
  - `torch.nn.utils.rnn.pad_packed_sequence`
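
As a quick orientation, the two functions can be seen as inverses of each other. The sketch below uses a made-up toy batch (feature size 1, zeros standing in for padding) and assumes the lengths are already sorted in descending order.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch: 3 sequences padded to length 5, one feature per step.
padded = torch.tensor([
    [1., 2., 3., 4., 5.],
    [6., 7., 8., 0., 0.],
    [9., 10., 0., 0., 0.],
]).unsqueeze(-1)                      # shape (3, 5, 1)
lengths = torch.tensor([5, 3, 2])     # actual lengths, sorted descending

# Pack: padding positions are dropped and the data is stored time step by time step.
packed = pack_padded_sequence(padded, lengths, batch_first=True)
print(packed.data.squeeze(-1))
print(packed.batch_sizes)

# Unpack: the padded layout and the lengths are recovered.
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(unpacked, padded), out_lengths)
```

Note that the packed data is laid out in time-major order (all first steps, then all second steps, and so on), not sentence by sentence.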

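Next, we use a small set of mock data to show how a bidirectional LSTM in PyTorch processes variable-length sequences, and we verify that the result is correct.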

## Importing the packages

First import the required packages and set the relevant parameters:

In [1]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

input_size = 64
hidden_size = 100

## Initializing the BLSTM

Next, initialize the BLSTM. For ease of demonstration, `batch_first` is set to `True`.

In [2]:
lstm = nn.LSTM(
    input_size=input_size, hidden_size=hidden_size, num_layers=1,
    bias=True, batch_first=True, bidirectional=True)
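
Before touching the parameters, it can help to list what a single-layer bidirectional `nn.LSTM` actually contains. The sketch below is an illustration only: it rebuilds the same layer and prints every parameter name and shape. The four gate blocks are stacked along the first dimension, which is why that dimension is `4 * hidden_size`.

```python
import torch.nn as nn

lstm = nn.LSTM(
    input_size=64, hidden_size=100, num_layers=1,
    bias=True, batch_first=True, bidirectional=True)

# Collect parameter names and shapes; the '_reverse' suffix marks the backward direction.
shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# The stacked input-to-hidden weights can be split into the four gate blocks
# (input, forget, cell, output), each of shape (hidden_size, input_size).
W_ii, W_if, W_ig, W_io = lstm.weight_ih_l0.chunk(4, dim=0)
print(W_ig.shape)
```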

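To verify that PyTorch computes the bidirectional LSTM output correctly, we set the weights and biases of the forward and backward LSTMs to the same values: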

In [3]:
weight_i = getattr(lstm, 'weight_ih_l{0}'.format(0))  # forward: (W_ii|W_if|W_ig|W_io)
weight_i_r = getattr(lstm, 'weight_ih_l{0}_reverse'.format(0))  # backward: (W_ii|W_if|W_ig|W_io)
weight_i_r.data.copy_(weight_i.data)

weight_h = getattr(lstm, 'weight_hh_l{0}'.format(0))  # forward: (W_hi|W_hf|W_hg|W_ho)
weight_h_r = getattr(lstm, 'weight_hh_l{0}_reverse'.format(0))  # backward: (W_hi|W_hf|W_hg|W_ho)
weight_h_r.data.copy_(weight_h.data)

# Set the forward biases to a constant, then copy them to the backward direction
bias_i = getattr(lstm, 'bias_ih_l{0}'.format(0))
torch.nn.init.constant_(bias_i, 0.1)
bias_h = getattr(lstm, 'bias_hh_l{0}'.format(0))
torch.nn.init.constant_(bias_h, 0.1)

bias_i_r = getattr(lstm, 'bias_ih_l{0}_reverse'.format(0))
bias_i_r.data.copy_(bias_i.data)
bias_h_r = getattr(lstm, 'bias_hh_l{0}_reverse'.format(0))
bias_h_r.data.copy_(bias_h.data)

print('Initialization is done!')

Initialization is done!


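With this setup, if a sequence reads the same forward and backward, the final outputs of the forward and backward directions should be identical. If the backward LSTM had processed the `_PAD` values, the two outputs would differ.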
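
The reasoning above can be checked on a much smaller model. The following is a minimal sketch, not the notebook's own experiment: the layer sizes, the random vectors `a` and `b`, and the zero padding vector are assumptions made for illustration. It ties the backward parameters to the forward ones, builds a palindromic sequence `a, b, a` padded to length 5, and compares the final hidden states with and without packing.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True, bidirectional=True)

# Tie the backward-direction parameters to the forward-direction ones.
with torch.no_grad():
    for name in ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']:
        getattr(lstm, name + '_reverse').copy_(getattr(lstm, name))

# A palindromic sequence (a, b, a) of length 3, padded with zeros to length 5.
a, b = torch.randn(4), torch.randn(4)
pad = torch.zeros(4)
x = torch.stack([a, b, a, pad, pad]).unsqueeze(0)   # shape (1, 5, 4)
lengths = torch.tensor([3])

with torch.no_grad():
    # Packed input: both directions only see the 3 real steps.
    _, (h_packed, _) = lstm(pack_padded_sequence(x, lengths, batch_first=True))
    # Raw padded input: the backward direction starts from the padding.
    _, (h_padded, _) = lstm(x)

same_when_packed = torch.allclose(h_packed[0], h_packed[1], atol=1e-6)
same_when_padded = torch.allclose(h_padded[0], h_padded[1], atol=1e-6)
print(same_when_packed, same_when_padded)
```

With packing, the two directions should agree. Without it, the forward direction runs through the trailing padding and the backward direction starts from it, so the two final states are not expected to match.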

## Building the mock data

In [12]:
# Mock input data
sentences = [['nice', 'day'], ['I', 'like', 'to', 'eat', 'apple'], ['can', 'a', 'can']]
test_sent_idx = 2  # index of ['can', 'a', 'can'] in sentences

# Build the alphabet (word-to-index map); index 0 is reserved for padding
alphabet = {}
index = 1
for sentence in sentences:
    for word in sentence:
        if word not in alphabet:
            alphabet[word] = index
            index += 1
voc_size = len(alphabet) + 1

lengths = [len(s) for s in sentences]
max_len = max(lengths)
batch_size = len(sentences)

inputs = np.zeros((batch_size, max_len), dtype='int32')
for i, sentence in enumerate(sentences):
    ids = list(map(lambda w: alphabet[w], sentence))
    inputs[i, :lengths[i]] = ids
inputs = torch.LongTensor(inputs)
lengths = torch.LongTensor(lengths)

# Sort by actual sentence length in descending order
lengths, indices = torch.sort(lengths, descending=True)
inputs = inputs[indices]

# Embedding layer; padding_idx is the index of the padding token
embedding = nn.Embedding(voc_size, input_size, padding_idx=0)
inputs = embedding(inputs)
print(inputs.size())  # [3, 5, 64]

inputs_packed = pack_padded_sequence(inputs, lengths, batch_first=True)
print('inputs_packed.data.size: {0}'.format(inputs_packed.data.size()))
print('batch_sizes: {0}'.format(inputs_packed.batch_sizes))

torch.Size([3, 5, 64])
inputs_packed.data.size: torch.Size([10, 64])
batch_sizes: tensor([ 3,  3,  2,  1,  1])
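
The role of `padding_idx` in the cell above can be illustrated separately. In this small sketch (the vocabulary size, embedding dimension, and token ids are arbitrary values chosen for the example), every position holding index 0 is mapped to an all-zero vector.

```python
import torch
import torch.nn as nn

# Index 0 is treated as padding: its embedding row is initialized to zeros
# and receives no gradient updates.
embedding = nn.Embedding(num_embeddings=6, embedding_dim=4, padding_idx=0)

ids = torch.tensor([[1, 2, 0, 0],
                    [3, 4, 5, 1]])
vectors = embedding(ids)                 # shape (2, 4, 4)

print(vectors.shape)
print(vectors[0, 2:])                    # the two padding positions of the first row
```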


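A `PackedSequence` holds two values, `data` and `batch_sizes`. `data` stores the tensors from `inputs` at the valid positions only, as determined by the `lengths` argument (the actual sequence lengths); here that is 5 + 3 + 2 = 10 rows. `batch_sizes` has as many entries as the longest actual length, and its `i`-th entry records the batch size fed to the LSTM at time step `i`.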
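
The relationship between `lengths` and `batch_sizes` can be reproduced by hand: the batch size at step `t` is the number of sequences longer than `t`. The sketch below uses an all-zero dummy tensor with an arbitrary feature size of 4, which is an assumption made for the example.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

lengths = torch.tensor([5, 3, 2])        # sorted descending, as in the example above
max_len = int(lengths.max())

# Number of sequences still active at each time step.
expected_batch_sizes = [int((lengths > t).sum()) for t in range(max_len)]
print(expected_batch_sizes)

# Compare with what pack_padded_sequence produces for a dummy batch.
dummy = torch.zeros(3, max_len, 4)
packed = pack_padded_sequence(dummy, lengths, batch_first=True)
print(packed.batch_sizes.tolist(), packed.data.size(0))
```

The number of rows in `data` equals the sum of `batch_sizes`, which is also the sum of the sequence lengths.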

## Computing the LSTM output

In [13]:
lstm_output, lstm_hidden = lstm(inputs_packed)
lstm_hidden, lstm_cell_state = lstm_hidden[0], lstm_hidden[1]
lstm_hidden = lstm_hidden.transpose(0, 1)
print(lstm_hidden.size())  # torch.Size([batch_size, 2, hidden_size])

torch.Size([3, 2, 100])


`lstm_hidden` and `lstm_cell_state` hold, for both the forward and backward directions, the `hidden state` and `cell state` at each sequence's final time step.
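
For a single-layer bidirectional LSTM, the returned hidden state has shape `(2, batch, hidden_size)`, with index 0 for the forward direction and index 1 for the backward direction. The sketch below is an illustration with arbitrary sizes and unpadded random input: it shows how the two halves might be concatenated into a sentence representation, and where the same values sit in the per-step output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 6
lstm = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(3, 5, 8)                 # 3 full-length sequences, no padding
with torch.no_grad():
    output, (h_n, c_n) = lstm(x)

# h_n is (num_layers * num_directions, batch, hidden_size), even with batch_first=True.
h_forward, h_backward = h_n[0], h_n[1]

# One possible sentence representation: concatenate both directions.
sentence_repr = torch.cat([h_forward, h_backward], dim=-1)   # shape (3, 12)
print(h_n.shape, sentence_repr.shape)

# For unpadded input, the forward final state is the last step of the output
# and the backward final state is the first step of the output.
print(torch.allclose(output[:, -1, :hidden_size], h_forward))
print(torch.allclose(output[:, 0, hidden_size:], h_backward))
```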

In [14]:
# Restore lstm_hidden to the original sentence order
_, indices_recover = torch.sort(indices)
lstm_hidden_recover = lstm_hidden[indices_recover]

# Final forward and backward hidden states for the sentence ['can', 'a', 'can']
print(lstm_hidden_recover[test_sent_idx])  # [2, 100]

tensor([[-0.0466,  0.1979,  0.1674,  0.1115, -0.1394, -0.3406,  0.0461,
          0.1912,  0.0237, -0.0359, -0.0968, -0.2336, -0.0023, -0.1093,
          0.1821,  0.0722, -0.2315,  0.2709,  0.3707, -0.0824,  0.1123,
         -0.1798, -0.0687, -0.0292,  0.0763,  0.1033, -0.4348,  0.0839,
          0.2364,  0.2123,  0.2637, -0.1207,  0.1106,  0.2471,  0.0100,
          0.1297, -0.2408,  0.0088,  0.0758,  0.0115, -0.0822,  0.0283,
          0.1517,  0.1999,  0.0125, -0.0514, -0.0111,  0.0573,  0.1063,
          0.2532,  0.4012,  0.2466,  0.3506,  0.3177,  0.1393, -0.2369,
          0.0339,  0.1768, -0.0028,  0.2993,  0.3876, -0.1459,  0.1071,
         -0.0458,  0.1388, -0.1838, -0.0033,  0.2611, -0.0364,  0.0620,
          0.0998,  0.4584,  0.1556, -0.0932,  0.3981,  0.1540,  0.1624,
         -0.0202,  0.0649,  0.2544,  0.0112, -0.2623, -0.0341,  0.0671,
          0.0608, -0.1244, -0.1278, -0.0176,  0.2964, -0.1301,  0.2018,
          0.3965,  0.2164,  0.2954,  0.1335,  0.1810, -0.0348,  

As the output shows, the forward and backward hidden states are identical.
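
The order-restoring step used above relies on the fact that sorting the sort indices yields the inverse permutation. The sketch below illustrates this on arbitrary small sizes and random input, and compares the manual route with the `enforce_sorted=False` option that newer PyTorch releases provide for `pack_padded_sequence`, which sorts internally and returns the states in the original order.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True, bidirectional=True)

# Padded batch in its original order; values beyond each length are ignored once packed.
x = torch.randn(3, 5, 4)
lengths = torch.tensor([2, 5, 3])

# Manual route: sort by length, run the LSTM, then undo the sort.
sorted_lengths, indices = torch.sort(lengths, descending=True)
_, indices_recover = torch.sort(indices)          # inverse permutation of `indices`
with torch.no_grad():
    packed = pack_padded_sequence(x[indices], sorted_lengths, batch_first=True)
    _, (h_sorted, _) = lstm(packed)
h_manual = h_sorted.transpose(0, 1)[indices_recover]   # (batch, 2, hidden), original order

# Built-in route: let PyTorch handle the sorting and unsorting.
with torch.no_grad():
    packed_auto = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
    _, (h_auto, _) = lstm(packed_auto)
h_auto = h_auto.transpose(0, 1)

print(indices.tolist(), indices_recover.tolist())
print(torch.allclose(h_manual, h_auto, atol=1e-6))
```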

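`lstm_output` records the BLSTM output at every time step. Using the actual sentence lengths, the final-step output can also be extracted from it. For the forward direction this is the position of the last real token; for the backward direction it is position 0: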

In [11]:
# Restore the padded shape
lstm_output_pad, lengths = pad_packed_sequence(lstm_output, batch_first=True)
lstm_output_pad_recover = lstm_output_pad[indices_recover]
lengths_recover = lengths[indices_recover]
print(lstm_output_pad.size())  # size=[3, 5, 200]

# Final forward and backward outputs for the sentence ['can', 'a', 'can']
hidden_last = lstm_output_pad_recover[test_sent_idx][lengths_recover[test_sent_idx]-1][:hidden_size]  # forward
hidden_last_r = lstm_output_pad_recover[test_sent_idx][0][hidden_size:]  # backward
print(hidden_last)
print(hidden_last_r)

torch.Size([3, 5, 200])
tensor([ 0.4026,  0.0825, -0.0540,  0.1774, -0.2190,  0.3484, -0.0861,
        -0.1873,  0.1525, -0.2399, -0.1848,  0.3643,  0.0609, -0.1586,
         0.2929,  0.2200,  0.0576,  0.3060,  0.1052,  0.1356,  0.0735,
         0.0590,  0.0248,  0.1299,  0.3680, -0.0334,  0.2734,  0.1144,
         0.2946, -0.1674, -0.0226,  0.1603, -0.0575,  0.2273,  0.0773,
         0.3728,  0.0548,  0.0149,  0.0774, -0.1232,  0.1091,  0.3588,
         0.1203,  0.2008,  0.0501,  0.1625,  0.2344,  0.4315,  0.1669,
         0.1597,  0.1626,  0.1740, -0.0047,  0.0115, -0.0905,  0.2055,
        -0.1133, -0.1211, -0.0808,  0.1940,  0.1274, -0.0705,  0.3216,
         0.1586,  0.1263,  0.0813,  0.1671,  0.0787,  0.1572,  0.0974,
         0.2064,  0.1374,  0.0428,  0.0720,  0.1243,  0.3860,  0.1179,
         0.0071, -0.0972,  0.0142,  0.0914,  0.2337,  0.2610, -0.1452,
         0.1883,  0.1320, -0.0422,  0.2024, -0.0048, -0.0715,  0.2454,
        -0.1260,  0.1222,  0.1351,  0.0488,  0.2583, 

When the cells are run in order, the extracted values equal those in `lstm_hidden`.
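
The same extraction can be done for the whole batch at once and checked programmatically rather than by eye. The following is a minimal sketch on arbitrary small sizes and random input, assuming the lengths are already sorted in descending order. It gathers the forward output at each sequence's last real step and the backward output at step 0, then compares both with the returned hidden state.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
hidden_size = 3
lstm = nn.LSTM(input_size=4, hidden_size=hidden_size, batch_first=True, bidirectional=True)

x = torch.randn(3, 5, 4)
lengths = torch.tensor([5, 3, 2])        # sorted descending

with torch.no_grad():
    packed_out, (h_n, _) = lstm(pack_padded_sequence(x, lengths, batch_first=True))
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)   # (3, 5, 6)

# Forward direction: output at the last real token of each sequence.
batch_idx = torch.arange(output.size(0))
last_forward = output[batch_idx, out_lengths - 1, :hidden_size]
# Backward direction: output at the first token of each sequence.
last_backward = output[:, 0, hidden_size:]

print(torch.allclose(last_forward, h_n[0]))
print(torch.allclose(last_backward, h_n[1]))
```

Positions beyond a sequence's actual length are filled with zeros by `pad_packed_sequence`, so indexing with `max_len - 1` instead of `out_lengths - 1` would return zeros for the shorter sequences.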