[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/itmorn/AI.handbook/blob/main/DL/torch/nn/Recurrent/RNN.ipynb)

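# Difference between LSTM and LSTMCell
`nn.LSTM` processes an entire sequence in one call, whereas `nn.LSTMCell` processes only a single time step per call. `nn.LSTM` is more fully encapsulated and easier to use; `nn.LSTMCell` is more flexible. In fact, one way to implement an LSTM layer is to call an LSTMCell repeatedly, once per time step.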
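As a hedged illustration of that last point, the sketch below copies the layer-0 weights of a single-layer `nn.LSTM` into an `nn.LSTMCell` and then loops over the time steps by hand. Both modules use the same gate order, so the two results should agree up to floating-point error. The sizes and seed are arbitrary choices for this example, and this is not a description of how PyTorch's optimized (e.g. cuDNN) kernels work internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H_out = 5, 2, 3, 4  # arbitrary sizes for this sketch

lstm = nn.LSTM(input_size=H_in, hidden_size=H_out)      # one layer, unidirectional
cell = nn.LSTMCell(input_size=H_in, hidden_size=H_out)

# Copy the layer-0 parameters of nn.LSTM into the cell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(L, N, H_in)     # (time_steps, batch, input_size)
h0 = torch.randn(1, N, H_out)   # (num_layers, batch, hidden_size)
c0 = torch.randn(1, N, H_out)

# nn.LSTM: the whole sequence in a single call
out_lstm, (hn_lstm, cn_lstm) = lstm(x, (h0, c0))

# nn.LSTMCell: one time step per call; the cell expects (batch, hidden_size) states
h, c = h0[0], c0[0]
steps = []
for t in range(L):
    h, c = cell(x[t], (h, c))
    steps.append(h)
out_cell = torch.stack(steps)   # (time_steps, batch, hidden_size)

print(torch.allclose(out_lstm, out_cell, atol=1e-5))
print(torch.allclose(hn_lstm[0], h, atol=1e-5), torch.allclose(cn_lstm[0], c, atol=1e-5))
```

The explicit loop is what gives `LSTMCell` its flexibility: you can inspect or modify `h` and `c` between steps, at the cost of writing the loop yourself.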


# LSTM
**Definition**:  
torch.nn.LSTM(*args, **kwargs)


**Parameters**:  
- input_size (int) – The number of expected features in the input x, i.e. the length of the feature vector at each time step of the sequence.

- hidden_size (int) – The number of features in the hidden state h, i.e. the length of the hidden-state vector.

- num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

- bias (bool) – If False, then the layer does not use the learnable bias weights b_ih and b_hh. Default: True

- batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature), i.e. N comes before L. Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections of the official PyTorch documentation for details. Default: False

- dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0

- bidirectional – If True, becomes a bidirectional LSTM. Default: False

- proj_size – If > 0, will use LSTM with projections of corresponding size (proj_size must be smaller than hidden_size). Default: 0. This is the LSTMP variant: h_t is projected down to proj_size dimensions, which reduces the number of parameters and the amount of computation, usually with only a small loss in accuracy.
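To make the effect of `batch_first`, `bidirectional`, `num_layers` and `proj_size` on tensor shapes concrete, here is a small sketch. The sizes are arbitrary, `P` is just a local name for `proj_size`, and `(h_0, c_0)` is omitted so that PyTorch initializes both to zeros.

```python
import torch
import torch.nn as nn

L, N, H_in, H_out = 5, 2, 3, 4
num_layers, P = 2, 2  # P = proj_size, must be smaller than hidden_size

lstm = nn.LSTM(input_size=H_in, hidden_size=H_out, num_layers=num_layers,
               batch_first=True, bidirectional=True, proj_size=P)

x = torch.randn(N, L, H_in)     # batch_first=True -> (batch, seq, feature)
output, (h_n, c_n) = lstm(x)    # h_0 and c_0 default to zeros when omitted

print(output.shape)             # torch.Size([2, 5, 4]): (N, L, 2 * P), both directions concatenated
print(h_n.shape)                # torch.Size([4, 2, 2]): (2 * num_layers, N, P); batch_first does not apply
print(c_n.shape)                # torch.Size([4, 2, 4]): (2 * num_layers, N, H_out); the cell state is not projected
print(lstm.weight_hr_l0.shape)  # torch.Size([2, 4]): projection matrix (P, H_out), only present when proj_size > 0
```

Note that only the hidden state is compressed by the projection; the cell state keeps its full `hidden_size`.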


```python
# Compute the LSTM forward pass with the built-in nn.LSTM module
import torch
import torch.nn as nn
torch.manual_seed(666)

L = 2  # sequence_length, i.e. the number of time steps
N = 1  # batch_size
H_in = 3  # input_size: length of the input feature vector
H_out = 4  # hidden_size: length of the hidden-state vector
num_layers = 1

x = torch.randn(L, N, H_in)  # (time_steps, batch, input_size), since batch_first=False
h = torch.randn(num_layers, N, H_out)  # h_0: (num_layers, batch, hidden_size); feeds the gates that decide how the "memory" changes
c = torch.randn(num_layers, N, H_out)  # c_0: (num_layers, batch, hidden_size); the cell state that carries the "memory"
print("input:\n", x, "\n")
print("h_0:\n", h, "\n")
print("c_0:\n", c, "\n")

lstm = nn.LSTM(input_size=H_in, hidden_size=H_out, num_layers=num_layers,
               bias=True, batch_first=False, dropout=0, bidirectional=False)

# output holds h_t for every time step; hn is the tuple (h_n, c_n) of final states
output, hn = lstm(x, (h, c))
print("output:\n", output, "\n")
print("hn:\n", hn, "\n")
```


```
input:
 tensor([[[-2.1188,  0.0635, -1.4555]],

        [[-0.0126, -0.1548, -0.0927]]]) 

h_0:
 tensor([[[ 2.5916,  0.4542, -0.6890, -0.9962]]]) 

c_0:
 tensor([[[0.1856, 0.1476, 0.8628, 0.2379]]]) 

output:
 tensor([[[ 0.1933, -0.0371,  0.1991, -0.4055]],

        [[ 0.2121, -0.0874,  0.0834, -0.1616]]], grad_fn=<StackBackward0>) 

hn:
 (tensor([[[ 0.2121, -0.0874,  0.0834, -0.1616]]], grad_fn=<StackBackward0>), tensor([[[ 0.5562, -0.1607,  0.2118, -0.3740]]], grad_fn=<StackBackward0>)) 
```
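The cell above only calls the library. As a cross-check, here is a minimal from-scratch sketch of the same single-layer, unidirectional forward pass, written with the gate equations given in the PyTorch `nn.LSTM` documentation (gate order i, f, g, o). The helper `lstm_forward` is a hypothetical name introduced for this example. It reuses seed 666 and the same random-call order as the cell above, so on the same PyTorch version `out_ref` should reproduce the printed `output`; the check itself only compares the manual result against `nn.LSTM`.

```python
import torch
import torch.nn as nn

torch.manual_seed(666)
L, N, H_in, H_out = 2, 1, 3, 4

x = torch.randn(L, N, H_in)
h0 = torch.randn(1, N, H_out)
c0 = torch.randn(1, N, H_out)
lstm = nn.LSTM(input_size=H_in, hidden_size=H_out)

def lstm_forward(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """Hypothetical helper: single-layer, unidirectional LSTM forward pass by hand."""
    outputs = []
    for t in range(x.shape[0]):
        # All four gates in one matmul; rows of w_ih / w_hh are stacked as (i, f, g, o)
        gates = x[t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh  # (N, 4 * H_out)
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g        # c_t = f_t * c_{t-1} + i_t * g_t  (update the "memory")
        h = o * torch.tanh(c)    # h_t = o_t * tanh(c_t), also the output at step t
        outputs.append(h)
    return torch.stack(outputs), h, c

out_ref, (hn_ref, cn_ref) = lstm(x, (h0, c0))
out_manual, hn_manual, cn_manual = lstm_forward(
    x, h0[0], c0[0],
    lstm.weight_ih_l0, lstm.weight_hh_l0, lstm.bias_ih_l0, lstm.bias_hh_l0)

print(torch.allclose(out_ref, out_manual, atol=1e-5))
```

Computing all four gates with one matrix product and then splitting with `chunk` mirrors how the weights are stored (`weight_ih_l0` has shape `(4 * hidden_size, input_size)`), which is why no reshaping of the parameters is needed.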



# References
[Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)