[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/itmorn/AI.handbook/blob/main/DL/torch/nn/Recurrent/GRUCell.ipynb)

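# GRUCell vs. LSTMCell
A somewhat more dramatic variation on the LSTM is the Gated Recurrent Unit (GRU), introduced by Cho et al. (2014). It combines the forget and input gates into a single "update gate". It also merges the cell state and the hidden state, and makes some other changes. The resulting model is simpler than the standard LSTM and has become increasingly popular.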
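
The practical consequences of merging the two states can be seen directly in PyTorch. The sketch below is only illustrative (the sizes 3 and 4 are arbitrary choices, not values prescribed by the text): `nn.GRUCell` carries a single state tensor, while `nn.LSTMCell` carries a hidden state and a cell state, and the GRU cell has three stacked gate blocks instead of four, so it holds fewer parameters.

```python
import torch
import torch.nn as nn

input_size, hidden_size = 3, 4  # arbitrary illustrative sizes

gru_cell = nn.GRUCell(input_size, hidden_size)
lstm_cell = nn.LSTMCell(input_size, hidden_size)

x = torch.randn(1, input_size)     # (batch, input_size)
h0 = torch.zeros(1, hidden_size)   # (batch, hidden_size)
c0 = torch.zeros(1, hidden_size)   # only the LSTM needs a separate cell state

h1_gru = gru_cell(x, h0)                    # GRU: one state tensor in, one out
h1_lstm, c1_lstm = lstm_cell(x, (h0, c0))   # LSTM: (hidden, cell) in and out


def count_params(module):
    """Total number of learnable scalars in a module."""
    return sum(p.numel() for p in module.parameters())


gru_params = count_params(gru_cell)    # 3 gate blocks
lstm_params = count_params(lstm_cell)  # 4 gate blocks
print(gru_params, lstm_params)         # 108 144
```

With identical sizes, the GRU cell ends up with three quarters of the LSTM cell's parameters.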

# GRUCell

**Definition**:  
`torch.nn.GRUCell(input_size, hidden_size, bias=True, device=None, dtype=None)`

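**Parameters**:  
- input_size (int) – The number of expected features in the input x, i.e. the length of the feature vector at a single time step of the sequence.

- hidden_size (int) – The number of features in the hidden state h, i.e. the length of the hidden-state vector.

- bias (bool) – If False, then the layer does not use bias weights b_ih and b_hh. Default: True. Controls whether learnable bias terms are added.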
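
To make the effect of these arguments concrete, the following minimal sketch inspects the parameters that a `GRUCell` creates. The sizes (3 and 4) are arbitrary example values. The leading dimension of each tensor is `3 * hidden_size` because the reset, update, and new-gate blocks are stacked in one tensor.

```python
import torch.nn as nn

# Arbitrary example sizes: input_size=3, hidden_size=4
cell = nn.GRUCell(input_size=3, hidden_size=4, bias=True)

# Collect parameter names and shapes
shapes = {name: tuple(p.shape) for name, p in cell.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# weight_ih (12, 3)
# weight_hh (12, 4)
# bias_ih (12,)
# bias_hh (12,)

# With bias=False the bias attributes exist but are set to None
cell_no_bias = nn.GRUCell(3, 4, bias=False)
print(cell_no_bias.bias_ih, cell_no_bias.bias_hh)  # None None
```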


# Forward pass illustrated
<p align="center">
<img src="./imgs/LSTM2-notation.png"
    width="700" /></p>
    
<p align="center">
<img src="./imgs/LSTM3-var-GRU.png"
    width="1000" /></p>

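Because the overall procedure mirrors that of the LSTM, the manual verification and step-by-step diagrams are not repeated here; only the results of calling the library are shown.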
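
For readers who want a quick numerical check anyway, here is a minimal sketch that recomputes one `GRUCell` step by hand. It follows the formulation in the PyTorch documentation: r = σ(W_ir x + W_hr h), z = σ(W_iz x + W_hz h), n = tanh(W_in x + r ⊙ (W_hn h)), h' = (1 − z) ⊙ n + z ⊙ h. The seed, the sizes, and the local variable names are assumptions made for this illustration. PyTorch's convention may also differ from the figure above, for example in which of z and 1 − z multiplies the previous state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
H_in, H_out = 3, 4    # illustrative sizes

cell = nn.GRUCell(H_in, H_out, bias=False)
x = torch.randn(1, H_in)    # (batch, input_size)
h = torch.randn(1, H_out)   # (batch, hidden_size)

# weight_ih / weight_hh stack the reset, update and new-gate blocks along dim 0
W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)

with torch.no_grad():
    r = torch.sigmoid(x @ W_ir.T + h @ W_hr.T)      # reset gate
    z = torch.sigmoid(x @ W_iz.T + h @ W_hz.T)      # update gate
    n = torch.tanh(x @ W_in.T + r * (h @ W_hn.T))   # candidate state
    h_manual = (1 - z) * n + z * h                  # new hidden state

    h_torch = cell(x, h)                            # library result

print(torch.allclose(h_manual, h_torch, atol=1e-6))
```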

```python
# Compute the forward pass with the library
import torch
import torch.nn as nn
torch.manual_seed(666)

L = 2  # sequence_length, i.e. the number of time steps
N = 1  # batch_size
H_in = 3  # input_size: length of the input feature vector
H_out = 4  # hidden_size: length of the hidden-state vector

input = torch.randn(L, N, H_in)  # (time_steps, batch, input_size)
h = torch.randn(N, H_out)  # (batch, hidden_size); initial hidden state h_0, which in a GRU also serves as the memory
print("input:\n", input, "\n")
print("h_0:\n", h, "\n")

gru_cell = nn.GRUCell(H_in, H_out, bias=False)  # (input_size, hidden_size); bias disabled to keep the diagram simple
print("weight_hh:\n", gru_cell.weight_hh, "\n")
print("weight_ih:\n", gru_cell.weight_ih, "\n")

output = []  # stores the hidden state at every time step
for i in range(input.size(0)):
    h = gru_cell(input[i], h)  # one GRU step: consumes x_t and h_{t-1}, returns h_t
    print(f"h_{i+1}:\n", h, "\n")
    output.append(h)
```


```text
input:
 tensor([[[-2.1188,  0.0635, -1.4555]],

        [[-0.0126, -0.1548, -0.0927]]]) 

h_0:
 tensor([[ 2.5916,  0.4542, -0.6890, -0.9962]]) 

weight_hh:
 Parameter containing:
tensor([[-0.0440,  0.2744, -0.1232,  0.2019],
        [ 0.0582,  0.3487, -0.1404,  0.3816],
        [ 0.1317,  0.1924, -0.3796,  0.4379],
        [ 0.1783,  0.1848, -0.2799, -0.0428],
        [ 0.1167, -0.0991, -0.2574, -0.4477],
        [-0.0506, -0.1730, -0.3312,  0.0733],
        [-0.1884, -0.2347,  0.1158,  0.3620],
        [-0.1595,  0.2099, -0.4129,  0.0649],
        [ 0.4904, -0.2916, -0.2753, -0.2733],
        [ 0.1248,  0.1446, -0.4906, -0.3950],
        [-0.4422,  0.3924, -0.4710, -0.3778],
        [ 0.2299,  0.0562, -0.3475,  0.2820]], requires_grad=True) 

weight_ih:
 Parameter containing:
tensor([[-0.2366, -0.1316,  0.0035],
        [ 0.4089, -0.1196,  0.2539],
        [-0.0837, -0.4911, -0.4336],
        [ 0.3111, -0.2333, -0.4399],
        [-0.1921, -0.2115,  0.3916],
        [ 0.1119, -0.0959, 
```
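
Looping a `GRUCell` over the time steps, as done above, should reproduce what the sequence-level `nn.GRU` module computes in a single call. The sketch below checks this under a few assumptions made for illustration: a single-layer, unidirectional `nn.GRU` without bias, an arbitrary seed, and weights copied from the cell into the module's `weight_ih_l0` and `weight_hh_l0` parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
L, N, H_in, H_out = 2, 1, 3, 4  # time steps, batch, input size, hidden size

x = torch.randn(L, N, H_in)   # (time_steps, batch, input_size)
h0 = torch.randn(N, H_out)    # (batch, hidden_size)

gru_cell = nn.GRUCell(H_in, H_out, bias=False)
gru = nn.GRU(H_in, H_out, bias=False)  # one layer, unidirectional, batch_first=False

with torch.no_grad():
    # Give both modules identical weights
    gru.weight_ih_l0.copy_(gru_cell.weight_ih)
    gru.weight_hh_l0.copy_(gru_cell.weight_hh)

    # Step-by-step with the cell
    h = h0
    steps = []
    for t in range(L):
        h = gru_cell(x[t], h)
        steps.append(h)
    cell_out = torch.stack(steps)            # (L, N, H_out)

    # Whole sequence at once; nn.GRU expects h0 of shape (num_layers, N, H_out)
    gru_out, h_n = gru(x, h0.unsqueeze(0))

print(torch.allclose(cell_out, gru_out, atol=1e-6))  # per-step hidden states
print(torch.allclose(h, h_n[0], atol=1e-6))          # final hidden state
```

The cell form is useful when each step needs custom logic between time steps; otherwise `nn.GRU` is typically the more convenient choice.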

# References
[Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)