In [2]:
import torch
import torch.nn as nn

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following function:
\begin{array}{ll}
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t c_{(t-1)} + i_t g_t \\
h_t = o_t \tanh(c_t)
\end{array}
where $h_t$ is the hidden state at time $t$, $c_t$ is the cell
state at time $t$, $x_t$ is the input at time $t$, $h_{(t-1)}$
is the hidden state of the layer at time $t-1$ or the initial hidden
state at time $0$, and $i_t$, $f_t$, $g_t$,
$o_t$ are the input, forget, cell, and output gates, respectively.
$\sigma$ is the sigmoid function, and the products in the last two equations are elementwise.
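The gate equations above can be checked by hand against the library for a single time step. The following is a minimal sketch, assuming a one-layer, unidirectional `nn.LSTM`; the sizes and variable names are illustrative only. It relies on the gate weights being stacked in the order i, f, g, o, as listed under Attributes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size = 3, 4
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)  # 1 layer, 1 direction

x_t = torch.randn(1, 1, input_size)      # (seq_len=1, batch=1, input_size)
h_prev = torch.randn(1, 1, hidden_size)  # (num_layers=1, batch=1, hidden_size)
c_prev = torch.randn(1, 1, hidden_size)

# Reference result from the library for one time step.
_, (h_ref, c_ref) = lstm(x_t, (h_prev, c_prev))

# Manual computation of all four gates at once; rows are stacked as i, f, g, o.
gates = (x_t[0] @ lstm.weight_ih_l0.T + lstm.bias_ih_l0
         + h_prev[0] @ lstm.weight_hh_l0.T + lstm.bias_hh_l0)
i_t, f_t, g_t, o_t = gates.chunk(4, dim=1)
i_t, f_t, o_t = torch.sigmoid(i_t), torch.sigmoid(f_t), torch.sigmoid(o_t)
g_t = torch.tanh(g_t)

c_t = f_t * c_prev[0] + i_t * g_t   # elementwise products
h_t = o_t * torch.tanh(c_t)

lstm_step_matches = (torch.allclose(h_t, h_ref[0], atol=1e-5)
                     and torch.allclose(c_t, c_ref[0], atol=1e-5))
print(lstm_step_matches)
```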

Args:

input_size: The number of expected features in the input $x$

hidden_size: The number of features in the hidden state $h$

num_layers: Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in the outputs of the first LSTM and computing the final results. Default: 1

bias: If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True`

batch_first: If `True`, then the input and output tensors are provided as (batch, seq, feature). Default: `False`

dropout: If non-zero, introduces a `Dropout` layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to `dropout`. Default: 0

bidirectional: If `True`, becomes a bidirectional LSTM. Default: `False`
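As a small illustration of the `batch_first` argument, the sketch below (sizes chosen arbitrarily) shows that it changes the layout of the input and output tensors only. The returned hidden and cell states keep the `(num_layers * num_directions, batch, hidden_size)` layout.

```python
import torch
import torch.nn as nn

lstm_bf = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)

x_bf = torch.randn(4, 7, 3)          # (batch=4, seq_len=7, input_size=3)
out_bf, (h_bf, c_bf) = lstm_bf(x_bf)

print(out_bf.shape)  # (batch, seq_len, hidden_size)
print(h_bf.shape)    # (num_layers * num_directions, batch, hidden_size)
```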

Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence.
      The input can also be a packed variable length sequence.
      See :func:`torch.nn.utils.rnn.pack_padded_sequence` or
      :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial cell state for each element in the batch.

      If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.
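The packed-sequence input mentioned above can be sketched as follows, assuming a zero-padded batch of two sequences with lengths 4 and 2 (made-up data). With a packed input, `h_n` holds the hidden state at each sequence's last valid step, and the unpacked output is zero at padded positions.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm_pk = nn.LSTM(input_size=3, hidden_size=5)

# Padded batch of shape (seq_len=4, batch=2, input_size=3).
padded = torch.zeros(4, 2, 3)
padded[:4, 0] = torch.randn(4, 3)   # first sequence, length 4
padded[:2, 1] = torch.randn(2, 3)   # second sequence, length 2
lengths = torch.tensor([4, 2])      # sorted in decreasing order (enforce_sorted=True by default)

packed = pack_padded_sequence(padded, lengths)
packed_out, (h_pk, c_pk) = lstm_pk(packed)

# Back to a padded tensor of shape (seq_len, batch, hidden_size).
out_pk, out_lengths = pad_packed_sequence(packed_out)

print(out_pk.shape, out_lengths)
# For the shorter sequence, h_n is the output at its last valid step (t = 1).
print(torch.allclose(h_pk[0, 1], out_pk[1, 1]))
```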


Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features `(h_t)` from the last layer of the LSTM,
      for each t. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
      given as the input, the output will also be a packed sequence.

      For the unpacked case, the directions can be separated
      using ``output.view(seq_len, batch, num_directions, hidden_size)``,
      with forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the hidden state for `t = seq_len`.

      Like *output*, the layers can be separated using
      ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
    - **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the cell state for `t = seq_len`.
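The two `view` calls described above can be tried on a small bidirectional, two-layer LSTM. This is a sketch with arbitrary sizes. It also checks that the last layer's final forward state is the forward output at the last time step, and that its final backward state is the backward output at the first time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, hidden_size, num_layers, num_directions = 5, 3, 2, 2, 2
lstm_bi = nn.LSTM(input_size=1, hidden_size=hidden_size,
                  num_layers=num_layers, bidirectional=True)

x_bi = torch.randn(seq_len, batch, 1)
output, (h_n, c_n) = lstm_bi(x_bi)

# Separate directions in the output and layers/directions in h_n.
out_dirs = output.view(seq_len, batch, num_directions, hidden_size)
h_layers = h_n.view(num_layers, num_directions, batch, hidden_size)

forward_final = out_dirs[-1, :, 0]   # forward pass ends at the last time step
backward_final = out_dirs[0, :, 1]   # backward pass ends at the first time step

fwd_ok = torch.allclose(forward_final, h_layers[-1, 0])
bwd_ok = torch.allclose(backward_final, h_layers[-1, 1])
print(fwd_ok, bwd_ok)
```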

Attributes:

    weight_ih_l[k] : the learnable input-hidden weights of the $k^{th}$ layer
        `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`;
        for later layers the second dimension is `num_directions * hidden_size`

    weight_hh_l[k] : the learnable hidden-hidden weights of the $k^{th}$ layer
        `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`

    bias_ih_l[k] : the learnable input-hidden bias of the $k^{th}$ layer
        `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`

    bias_hh_l[k] : the learnable hidden-hidden bias of the $k^{th}$ layer
        `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
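To see these attributes on a concrete module, the sketch below lists the parameter names and shapes of a two-layer bidirectional LSTM with `input_size=1` and `hidden_size=2`. The backward direction's parameters carry a `_reverse` suffix. The input-hidden weights of the second layer have `2 * hidden_size` columns because that layer consumes both directions of the first layer.

```python
import torch
import torch.nn as nn

lstm_attr = nn.LSTM(input_size=1, hidden_size=2, num_layers=2, bidirectional=True)

# Map each parameter name to its shape.
shapes = {name: tuple(p.shape) for name, p in lstm_attr.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```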

In [11]:
rnn = nn.LSTM(input_size=1, hidden_size=2, num_layers=2, bidirectional=True)

In [20]:
# seq_len = 2, batch=1, input_size=1
i = torch.randn(2, 1, 1)
# num_layers * num_directions = 4, batch=1, hidden_size=2
h0 = torch.randn(4, 1, 2)
# c0 has the same shape as h0
c0 = torch.randn(4, 1, 2)
print(i)
print(h0)
print(c0)

tensor([[[-1.8739]],

        [[ 0.4936]]])
tensor([[[-0.5848, -1.9906]],

        [[-1.4846,  1.2139]],

        [[-1.3665,  1.0668]],

        [[ 0.1686, -0.9288]]])
tensor([[[ 0.6920, -0.0524]],

        [[-1.0819, -1.1228]],

        [[ 0.1523,  1.1516]],

        [[-0.2242, -0.2762]]])


In [22]:
output, (hn, cn) = rnn(i, (h0, c0))
# output shape: seq_len=2, batch=1, num_directions * hidden_size = 2 * 2 = 4
print(output)
# hn and cn have the same shapes as h0 and c0
print(hn)
print(cn)

tensor([[[ 0.0426,  0.5477, -0.0405,  0.0072]],

        [[-0.0620,  0.4924, -0.0441,  0.0820]]], grad_fn=<CatBackward>)
tensor([[[ 0.2159, -0.0716]],

        [[-0.0695, -0.0517]],

        [[-0.0620,  0.4924]],

        [[-0.0405,  0.0072]]], grad_fn=<ViewBackward>)
tensor([[[ 0.5696, -0.2519]],

        [[-0.2497, -0.5190]],

        [[-0.1223,  0.8234]],

        [[-0.1302,  0.0169]]], grad_fn=<ViewBackward>)


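`output[0, 0, 0:2]` is the forward-direction output for the first token, and `output[0, 0, 2:4]` is the backward-direction output for the first token (both from the second layer).
`output[1, 0, 0:2]` is the forward-direction output for the second token, and `output[1, 0, 2:4]` is the backward-direction output for the second token (second layer).

`output[0, 0, 2:4]` equals `hn[3][0]` and `output[1, 0, 0:2]` equals `hn[2][0]`: these are the final backward and forward hidden states of the second layer. `hn[0]` and `hn[1]` are the final forward and backward hidden states of the first layer.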




Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

For each element in the input sequence, each layer computes the following
function:

\begin{array}{ll}
r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t (W_{hn} h_{(t-1)}+ b_{hn})) \\
h_t = (1 - z_t) n_t + z_t h_{(t-1)} \\
\end{array}

where 

$h_t$ is the hidden state at time $t$, 

$x_t$ is the input at time $t$, $h_{(t-1)}$ is the hidden state of the layer at time $t-1$ or the initial hidden state at time $0$, 

$r_t$, $z_t$, $n_t$ are the reset, update, and new gates, respectively.

$\sigma$ is the sigmoid function.
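As with the LSTM, the GRU equations can be verified for one time step. The following is a minimal sketch, assuming a one-layer, unidirectional `nn.GRU` with illustrative sizes and with gate weights stacked in the order r, z, n. Note that the reset gate multiplies the hidden-side term including its bias `b_hn`, so the input-side and hidden-side contributions are kept separate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

gru_1 = nn.GRU(input_size=3, hidden_size=4)  # 1 layer, 1 direction

x_t = torch.randn(1, 1, 3)     # (seq_len=1, batch=1, input_size)
h_prev = torch.randn(1, 1, 4)  # (num_layers=1, batch=1, hidden_size)

# Reference result from the library for one time step.
_, h_ref = gru_1(x_t, h_prev)

# Input-side and hidden-side pre-activations; rows are stacked as r, z, n.
gi = x_t[0] @ gru_1.weight_ih_l0.T + gru_1.bias_ih_l0
gh = h_prev[0] @ gru_1.weight_hh_l0.T + gru_1.bias_hh_l0
i_r, i_z, i_n = gi.chunk(3, dim=1)
h_r, h_z, h_n = gh.chunk(3, dim=1)

r_t = torch.sigmoid(i_r + h_r)
z_t = torch.sigmoid(i_z + h_z)
n_t = torch.tanh(i_n + r_t * h_n)          # reset gate scales the hidden-side term
h_t = (1 - z_t) * n_t + z_t * h_prev[0]

gru_step_matches = torch.allclose(h_t, h_ref[0], atol=1e-5)
print(gru_step_matches)
```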

Args:
    input_size: The number of expected features in the input `x`
    
    hidden_size: The number of features in the hidden state `h`
    
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two GRUs together to form a `stacked GRU`,
        with the second GRU taking in outputs of the first GRU and
        computing the final results. Default: 1
        
    bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
        Default: ``True``
        
    batch_first: If ``True``, then the input and output tensors are provided
        as (batch, seq, feature). Default: ``False``
        
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
        GRU layer except the last layer, with dropout probability equal to
        :attr:`dropout`. Default: 0
        
    bidirectional: If ``True``, becomes a bidirectional GRU. Default: ``False``

Inputs: input, h_0
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features
      of the input sequence. The input can also be a packed variable length
      sequence. See :func:`torch.nn.utils.rnn.pack_padded_sequence`
      for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
      Defaults to zero if not provided.
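A quick check of the default initial state, as a sketch with arbitrary sizes: calling the GRU without `h_0` should give the same result as passing an explicit all-zero tensor of shape `(num_layers * num_directions, batch, hidden_size)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru_d = nn.GRU(input_size=3, hidden_size=4, num_layers=2, bidirectional=True)

x_d = torch.randn(5, 2, 3)             # (seq_len=5, batch=2, input_size=3)
out_default, h_default = gru_d(x_d)    # h_0 omitted

h0_zero = torch.zeros(2 * 2, 2, 4)     # (num_layers * num_directions, batch, hidden_size)
out_zero, h_zero = gru_d(x_d, h0_zero)

default_is_zero = (torch.allclose(out_default, out_zero)
                   and torch.allclose(h_default, h_zero))
print(default_is_zero)
```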

Outputs: output, h_n
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features h_t from the last layer of the GRU,
      for each t. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
      given as the input, the output will also be a packed sequence.
      For the unpacked case, the directions can be separated
      using ``output.view(seq_len, batch, num_directions, hidden_size)``,
      with forward and backward being direction `0` and `1` respectively.

      Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the hidden state for `t = seq_len`

      Like *output*, the layers can be separated using
      ``h_n.view(num_layers, num_directions, batch, hidden_size)``.

In [23]:
gru = nn.GRU(input_size=1, hidden_size=2, num_layers=2, bidirectional=True)

In [24]:
# seq_len = 2, batch=1, input_size=1
i = torch.randn(2, 1, 1)
# num_layers * num_directions = 4, batch=1, hidden_size=2
h0 = torch.randn(4, 1, 2)
print(i)
print(h0)

tensor([[[-0.0798]],

        [[-1.4793]]])
tensor([[[-0.3442,  1.2287]],

        [[-0.2276, -0.4527]],

        [[ 1.5438,  0.1433]],

        [[-0.0715, -0.4056]]])


In [25]:
output, hn = gru(i, h0)
# output shape: seq_len=2, batch=1, num_directions * hidden_size = 2 * 2 = 4
print(output)
# hn has the same shape as h0 (a GRU has no cell state)
print(hn)

tensor([[[ 1.0777,  0.4532, -0.3319, -0.5030]],

        [[ 0.6944,  0.5773, -0.4130, -0.4684]]], grad_fn=<CatBackward>)
tensor([[[-0.7958,  0.2209]],

        [[-0.5869, -0.3335]],

        [[ 0.6944,  0.5773]],

        [[-0.3319, -0.5030]]], grad_fn=<ViewBackward>)
