## LSTM (Long Short-Term Memory)
* Mitigates the vanishing-gradient problem.  
* Extends how long information can be kept in memory.

* Forget gate (also called the remember gate)
* Input gate  
* Output gate  

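`C_t` can be read as the `memory` (cell state) and `h_t` as the `output` (hidden state).  
The three gates selectively filter the old memory, selectively admit new input, and selectively expose the output.  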
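To make "selective" concrete, here is a minimal sketch of what a gate does: a sigmoid produces values in (0, 1), and multiplying element-wise by those values decides how much of each entry gets through. The logits and the memory vector are made-up illustration values, not anything a trained LSTM would necessarily produce.

```python
import torch

# A gate is a vector of values in (0, 1) produced by a sigmoid.
gate_logits = torch.tensor([-10.0, 0.0, 10.0])  # illustrative values only
gate = torch.sigmoid(gate_logits)

memory = torch.tensor([2.0, 2.0, 2.0])          # illustrative values only
filtered = gate * memory  # element-wise: gate near 0 blocks, 0.5 halves, near 1 passes

print(gate)
print(filtered)
```

The forget, input, and output gates all work this way; they differ only in what they multiply (old memory, candidate memory, or the squashed cell state).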

$f_{t}=\sigma\left(W_{f} \cdot\left[h_{t-1}, x_{t}\right]+b_{f}\right)$  
$i_{t}=\sigma\left(W_{i} \cdot\left[h_{t-1}, x_{t}\right]+b_{i}\right)$  
$\tilde{C}_{t}=\tanh \left(W_{c} \cdot\left[h_{t-1}, x_{t}\right]+b_{c}\right)$  
$C_{t}=f_{t} * C_{t-1}+i_{t} * \tilde{C}_{t}$  

$o_{t}=\sigma\left(W_{o} \cdot\left[h_{t-1}, x_{t}\right]+b_{o}\right)$  
$h_{t}=o_{t} * \tanh \left(C_{t}\right)$  


$\begin{aligned}\left(\begin{array}{r}\mathbf{i}^{(t)} \\ \mathbf{f}^{(t)} \\ \mathbf{o}^{(t)} \\ \tilde{\mathbf{c}}^{(t)}\end{array}\right) &=\left(\begin{array}{c}\sigma \\ \sigma \\ \sigma \\ \tanh \end{array}\right) \mathbf{W}\left(\begin{array}{c}\mathbf{x}^{(t)} \\ \mathbf{h}^{(t-1)}\end{array}\right) \\ \mathbf{c}^{(t)} &=\mathbf{f}^{(t)} \circ \mathbf{c}^{(t-1)}+\mathbf{i}^{(t)} \circ \tilde{\mathbf{c}}^{(t)} \\ \mathbf{h}^{(t)} &=\mathbf{o}^{(t)} \circ \tanh \left(\mathbf{c}^{(t)}\right) \end{aligned}$
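The equations can be sketched directly as one time step in PyTorch. This is a rough illustration, not how `nn.LSTM` is implemented: `lstm_step` is a hypothetical helper, the stacked weight `W` and bias `b` are random, and the row order (input, forget, output, candidate) is an assumption chosen to match the compact form, not the layout PyTorch uses internally.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (hypothetical helper).

    x_t: [b, vec], h_prev / c_prev: [b, h]
    W:   [4*h, h + vec], rows stacked as (input, forget, output, candidate)
    b:   [4*h]
    """
    # Concatenate [h_{t-1}, x_t] and apply all four linear maps at once.
    z = torch.cat([h_prev, x_t], dim=1) @ W.T + b   # [b, 4*h]
    i, f, o, g = z.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                  # candidate memory
    c_t = f * c_prev + i * g           # keep part of the old memory, add new
    h_t = o * torch.tanh(c_t)          # expose part of the memory as output
    return h_t, c_t

torch.manual_seed(0)
batch, vec, hidden = 3, 100, 20
W = torch.randn(4 * hidden, hidden + vec) * 0.1  # random weights, illustration only
b = torch.zeros(4 * hidden)
x_t = torch.randn(batch, vec)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)

h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # torch.Size([3, 20]) torch.Size([3, 20])
```

Because `h_t` is a sigmoid gate times a `tanh`, every entry of `h` stays strictly inside (-1, 1), while `c` is not bounded in the same way.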


### The LSTM class and LSTMCell

RNN:

`out, ht = rnn(x, h0)`

ht: the state of `all layers at the last time step`.  
out: the state of `the last layer at all time steps`.
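A small check of the two return values, assuming a 2-layer `nn.RNN` with the default `[seq, b, vec]` layout (the sizes are arbitrary). The two views overlap in exactly one place: the last layer at the last time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=2)
x = torch.randn(10, 3, 100)   # [seq, b, vec]
h0 = torch.zeros(2, 3, 20)    # [num_layers, b, h]

out, ht = rnn(x, h0)
print(out.shape)  # torch.Size([10, 3, 20]): last layer, every time step
print(ht.shape)   # torch.Size([2, 3, 20]):  every layer, last time step

# out[-1] (last step of the last layer) and ht[-1] (last layer at the last step)
# describe the same state.
same = torch.allclose(out[-1], ht[-1])
print(same)
```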

LSTM:

`out, (ht, ct) = lstm(x, (ht_0, ct_0))`  

x: \[seq, b, vec\] (sequence length, batch size, input feature size)  
h/c: \[num_layer, b, h\] (number of layers, batch size, hidden size)  
out: \[seq, b, h\]
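The shapes above can be checked with explicit initial states. This is a sketch with arbitrary sizes; it also shows that passing zero states gives the same result as passing none, and that `out[-1]` coincides with the last layer of `ht`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq, b, vec, hidden, num_layers = 10, 3, 100, 20, 4
lstm = nn.LSTM(input_size=vec, hidden_size=hidden, num_layers=num_layers)

x = torch.randn(seq, b, vec)
h0 = torch.zeros(num_layers, b, hidden)
c0 = torch.zeros(num_layers, b, hidden)

# The initial states are passed together as one (h0, c0) tuple.
out, (ht, ct) = lstm(x, (h0, c0))
print(out.shape, ht.shape, ct.shape)

# Omitting the states is equivalent to passing zeros.
out_default, _ = lstm(x)
print(torch.allclose(out, out_default))

# The last time step of out is the last layer's final hidden state.
print(torch.allclose(out[-1], ht[-1]))
```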

LSTMCell:  

`ht, ct = lstmcell(xt, (ht_0, ct_0))`

xt: \[b, vec\]  
ht/ct: \[b, h\]  

The cell is fed one \[b, vec\] slice per call, for a total of seq calls.  
No `out` is returned, because `out can be rebuilt by stacking the ht of every step`.
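The claim that `out` is just the per-step `ht` stacked together can be checked numerically. The sketch below copies the parameters of a single-layer `nn.LSTM` into an `nn.LSTMCell` so both compute the same function; the sizes are arbitrary and the tolerance is an assumption to absorb floating-point differences.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=1)
cell = nn.LSTMCell(input_size=100, hidden_size=20)

# Give the cell the same parameters as layer 0 of the LSTM.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

x = torch.randn(10, 3, 100)   # [seq, b, vec]
h = torch.zeros(3, 20)
c = torch.zeros(3, 20)

steps = []
for xt in x:                  # one [b, vec] slice per call
    h, c = cell(xt, (h, c))
    steps.append(h)
out_from_cell = torch.stack(steps)   # [seq, b, h]

out, (ht, ct) = lstm(x)
matches = torch.allclose(out, out_from_cell, atol=1e-5)
print(out_from_cell.shape, matches)
```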

```python
import torch
import torch.nn as nn
```

```python
lstm = nn.LSTM(input_size=100, hidden_size=20, num_layers=4)
print(lstm)
x = torch.randn(10, 3, 100)  # [seq, b, vec]
print(x.shape)
out, (h, c) = lstm(x)  # h0 and c0 default to zeros when not passed
print(out.shape, h.shape, c.shape)  # [10, 3, 20] [4, 3, 20] [4, 3, 20]
```

Output:

```
LSTM(100, 20, num_layers=4)
torch.Size([10, 3, 100])
torch.Size([10, 3, 20]) torch.Size([4, 3, 20]) torch.Size([4, 3, 20])
```


```python
print('one layer lstm')
cell = nn.LSTMCell(input_size=100, hidden_size=20)
h = torch.zeros(3, 20)
c = torch.zeros(3, 20)
for xt in x:  # x: [10, 3, 100] from the previous cell; xt: [3, 100]
    h, c = cell(xt, (h, c))  # states are passed as an (h, c) tuple
print(h.shape, c.shape)
```

Output:

```
one layer lstm
torch.Size([3, 20]) torch.Size([3, 20])
```


```python
print('two layer lstm')
cell1 = nn.LSTMCell(input_size=100, hidden_size=30)
cell2 = nn.LSTMCell(input_size=30, hidden_size=20)
h1 = torch.zeros(3, 30)
c1 = torch.zeros(3, 30)
h2 = torch.zeros(3, 20)
c2 = torch.zeros(3, 20)
for xt in x:  # x: [10, 3, 100] from the earlier cell
    h1, c1 = cell1(xt, (h1, c1))  # layer 1: 100 -> 30
    h2, c2 = cell2(h1, (h2, c2))  # layer 2 takes layer 1's hidden state: 30 -> 20
print(h2.shape, c2.shape)
```

Output:

```
two layer lstm
torch.Size([3, 20]) torch.Size([3, 20])
```
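The two-layer loop generalizes to any number of cells. The sketch below wraps the pattern in a small module; `StackedLSTMCells` is a hypothetical name, not a PyTorch class. Unlike `nn.LSTM`, where every layer shares one `hidden_size`, stacked cells can use a different hidden size per layer.

```python
import torch
import torch.nn as nn

class StackedLSTMCells(nn.Module):
    """Hypothetical helper: a stack of LSTMCells with per-layer hidden sizes."""

    def __init__(self, input_size, hidden_sizes):
        super().__init__()
        sizes = [input_size] + list(hidden_sizes)
        self.cells = nn.ModuleList(
            [nn.LSTMCell(sizes[k], sizes[k + 1]) for k in range(len(hidden_sizes))]
        )

    def forward(self, x):
        # x: [seq, b, vec]; every layer starts from zero states.
        b = x.size(1)
        states = [
            (x.new_zeros(b, cell.hidden_size), x.new_zeros(b, cell.hidden_size))
            for cell in self.cells
        ]
        outputs = []
        for xt in x:
            inp = xt
            for k, cell in enumerate(self.cells):
                h, c = cell(inp, states[k])
                states[k] = (h, c)
                inp = h                  # the next layer consumes this layer's h
            outputs.append(inp)          # last layer's h at this time step
        return torch.stack(outputs), states

model = StackedLSTMCells(100, [30, 20])
x = torch.randn(10, 3, 100)
out, states = model(x)
print(out.shape)                     # torch.Size([10, 3, 20])
print([h.shape for h, _ in states])  # per-layer final hidden states
```

Here `out` plays the role of `nn.LSTM`'s `out` (last layer, every time step), and `states` holds each layer's final `(h, c)` pair.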
