[54 Recurrent Neural Networks (RNN)](https://www.bilibili.com/video/BV1D64y1z7CA?spm_id_from=333.999.0.0)
- <img src="picture/屏幕截图 2022-05-31 191341.png">
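- New concept $h$: the hidden (latent) variable, a variable that really exists but cannot be observed directly.
- The output $o_t$ is computed from the hidden variable $h_t$, which in turn depends on $h_{t-1}$ and $x_{t-1}$; it cannot be produced directly from $x_t$.
- $h_t = \phi(W_{hh}h_{t-1}+W_{hx}x_{t-1}+b_h)$
- $o_t = \phi(W_{oh}h_t+b_o)$ (the video has a mistake here; the weight should be $W_{oh}$)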
- <img src="picture/屏幕截图 2022-05-31 191542.png">
- The output $o_t$ is the prediction made at the current time step: it predicts the value of $x_t$.
- <img src="picture/屏幕截图 2022-05-31 193028.png">
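The recurrence above can be sketched in a few lines of PyTorch. This is only an illustration under stated assumptions, not the course's implementation: the sizes, the random weights, the helper name `rnn_step`, and the choice of `tanh` for $\phi$ are all hypothetical, and the output is left as raw scores with no activation applied.

```python
import torch

torch.manual_seed(0)
num_inputs, num_hiddens, num_outputs = 4, 3, 4  # hypothetical sizes

# Weights follow the notes' index convention: W_hx maps x -> h, W_oh maps h -> o.
W_hx = torch.randn(num_hiddens, num_inputs) * 0.01
W_hh = torch.randn(num_hiddens, num_hiddens) * 0.01
b_h = torch.zeros(num_hiddens)
W_oh = torch.randn(num_outputs, num_hiddens) * 0.01
b_o = torch.zeros(num_outputs)


def rnn_step(x_prev, h_prev):
    """One step: h_t from (h_{t-1}, x_{t-1}), then o_t from h_t alone."""
    h_t = torch.tanh(W_hh @ h_prev + W_hx @ x_prev + b_h)  # tanh is an assumed phi
    o_t = W_oh @ h_t + b_o  # raw scores used to predict x_t
    return h_t, o_t


h = torch.zeros(num_hiddens)   # initial hidden state
xs = torch.eye(num_inputs)     # four one-hot inputs, one per time step
outputs = []
for x in xs:
    h, o = rnn_step(x, h)      # the hidden state carries information forward
    outputs.append(o)
outputs = torch.stack(outputs)
print(outputs.shape)  # torch.Size([4, 4])
```

Note that `x` never reaches the output directly: it only influences `o` through the hidden state `h`.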

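- Perplexity
- A language model's quality can be measured by the average cross-entropy:
- $$\pi = \frac{1}{n} \sum_{t=1}^n -\log P(x_t \mid x_{t-1}, \ldots, x_1)$$
- $P$ is the probability predicted by the language model and $x_t$ is the true token.
- For historical reasons, NLP reports $\exp(\pi)$ instead (exponentiating enlarges the value; the raw average would be too small to compare easily).
- 1 means a perfect model; infinity is the worst case.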
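As a small worked example of the definition above, the sketch below computes $\exp(\pi)$ from the probabilities a model assigned to the true tokens. The helper name `perplexity` and the input lists are hypothetical; a real model would supply these probabilities.

```python
import math


def perplexity(true_token_probs):
    """exp of the average negative log-probability assigned to the true tokens."""
    n = len(true_token_probs)
    avg_nll = sum(-math.log(p) for p in true_token_probs) / n
    return math.exp(avg_nll)


# A model that always puts probability 1 on the true token is perfect.
perfect = perplexity([1.0, 1.0, 1.0])

# A model that guesses uniformly over 28 tokens has perplexity of about 28.
uniform = perplexity([1 / 28] * 5)

print(perfect, uniform)
```

One way to read the number: a perplexity of $k$ means the model is, on average, as uncertain as if it were choosing uniformly among $k$ candidates for the next token.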

- <img src="picture/屏幕截图 2022-05-31 230201.png">
- Main applications
- <img src="picture/屏幕截图 2022-05-31 231212.png">

[55 Implementing a Recurrent Neural Network (RNN)](https://www.bilibili.com/video/BV1kq4y1H7sw/?spm_id_from=333.788.recommend_more_video.1)

In [1]:
# Let's start training!
%matplotlib inline
import math
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

In [2]:
# Load the dataset
# vocab
batch_size, num_steps = 32, 35
train_iter, vocab = d2l.load_data_time_machine(batch_size, num_steps)

In [3]:
for x in train_iter:
    print(x[0].shape)
    print(x[0][0])
    print(x[1].shape)
    print(x[1][0])
    break

torch.Size([32, 35])
tensor([15,  9,  5,  6,  2,  1, 21, 19,  1,  9,  1, 18,  1, 17,  2, 12, 12,  8,
         5,  3,  9,  2,  1,  3,  5, 13,  2,  1,  3, 10,  4, 22,  2, 12, 12])
torch.Size([32, 35])
tensor([ 9,  5,  6,  2,  1, 21, 19,  1,  9,  1, 18,  1, 17,  2, 12, 12,  8,  5,
         3,  9,  2,  1,  3,  5, 13,  2,  1,  3, 10,  4, 22,  2, 12, 12,  2])
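In the batch printed above, the second tensor is the first one shifted left by one position, so the label at each step is the next token. A toy sketch of building such a pair from one sequence follows; the helper `make_xy` and the stand-in corpus are hypothetical and are not part of `d2l`.

```python
import torch


def make_xy(corpus, start, num_steps):
    """Return an input window and its label window shifted by one token."""
    X = corpus[start : start + num_steps]
    Y = corpus[start + 1 : start + 1 + num_steps]
    return X, Y


corpus = torch.arange(20)  # stand-in for a sequence of token indices
X, Y = make_xy(corpus, start=3, num_steps=5)
print(X)  # tensor([3, 4, 5, 6, 7])
print(Y)  # tensor([4, 5, 6, 7, 8])
```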


In [4]:
vocab.token_freqs

[(' ', 29927),
 ('e', 17838),
 ('t', 13515),
 ('a', 11704),
 ('i', 10138),
 ('n', 9917),
 ('o', 9758),
 ('s', 8486),
 ('h', 8257),
 ('r', 7674),
 ('d', 6337),
 ('l', 6146),
 ('m', 4043),
 ('u', 3805),
 ('c', 3424),
 ('f', 3354),
 ('w', 3225),
 ('g', 3075),
 ('y', 2679),
 ('p', 2427),
 ('b', 1897),
 ('v', 1295),
 ('k', 1087),
 ('x', 236),
 ('z', 144),
 ('j', 97),
 ('q', 95)]

In [5]:
len(vocab)

28

In [6]:
# One-hot encoding
F.one_hot(torch.tensor([0, 2, 12]), len(vocab))

tensor([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0]])

In [9]:
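# A minibatch has shape (batch size, number of time steps)
# The number of time steps of an RNN can be thought of as tau; the labels are the same sequence shifted by one position
# Because text is sequential, a window of time steps is a run of consecutive tokens; e.g. with 5 time steps, 5 consecutive tokens form one sequence
# The batch size is the number of sequences grouped into one batch
X = torch.arange(10).reshape(2, 5)
F.one_hot(X.T, 28)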

tensor([[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0]],

        [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0]],

        [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0]],

        [[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0]],

        [[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0
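
Transposing before one-hot encoding puts the time dimension first, so the result has shape (number of time steps, batch size, vocabulary size), and iterating over the first axis yields one (batch size, vocabulary size) slice per time step. The check below is a minimal sketch of that layout; the `.float()` cast is an assumption about how the encoding would later be fed into matrix multiplications.

```python
import torch
from torch.nn import functional as F

vocab_size = 28  # matches len(vocab) above
X = torch.arange(10).reshape(2, 5)            # (batch size, time steps)
inputs = F.one_hot(X.T, vocab_size).float()   # (time steps, batch size, vocab size)
print(inputs.shape)  # torch.Size([5, 2, 28])

# Iterating over the first axis gives one minibatch slice per time step.
for t, x_t in enumerate(inputs):
    assert x_t.shape == (2, vocab_size)
    # Each row has a single 1 at the index of the token seen at step t.
    assert torch.equal(x_t.argmax(dim=1), X[:, t])
```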

In [10]:
# Initialize the model parameters
def get_params(vocab_size, num_hiddens, device):
    num_inputs = num_outputs = vocab_size

    def normal(shape):
        return torch.randn(size=shape, device=device) * 0.01

    # Hidden layer parameters
    W_xh = normal((num_inputs, num_hiddens))
    W_hh = normal((num_hiddens, num_hiddens))
    b_h = torch.zeros(num_hiddens, device=device)
    # Output layer parameters
    W_hq = normal((num_hiddens, num_outputs))
    b_q = torch.zeros(num_outputs, device=device)
    # Attach gradients
    params = [W_xh, W_hh, b_h, W_hq, b_q]
    for param in params:
        param.requires_grad_(True)
    return params
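
To see how parameters of these shapes fit together, the sketch below runs one time step with freshly created tensors that have the same shapes as those returned by `get_params`. It is an illustration under assumptions: `num_hiddens = 512`, the `tanh` activation, and the zero initial state are chosen for the example and are not taken from the cells above.

```python
import torch
from torch.nn import functional as F

vocab_size, num_hiddens, batch_size = 28, 512, 2  # num_hiddens is an assumed value

# Tensors with the same shapes as the parameters created in get_params.
W_xh = torch.randn(vocab_size, num_hiddens) * 0.01
W_hh = torch.randn(num_hiddens, num_hiddens) * 0.01
b_h = torch.zeros(num_hiddens)
W_hq = torch.randn(num_hiddens, vocab_size) * 0.01
b_q = torch.zeros(vocab_size)

# One time step of input: a one-hot minibatch of shape (batch size, vocab size).
X_t = F.one_hot(torch.tensor([0, 5]), vocab_size).float()
H = torch.zeros(batch_size, num_hiddens)  # assumed zero initial hidden state

# Row-vector convention: activations multiply the weights from the left.
H = torch.tanh(X_t @ W_xh + H @ W_hh + b_h)  # (batch size, num_hiddens)
Y = H @ W_hq + b_q                           # (batch size, vocab size)
print(H.shape, Y.shape)
```

Because the input is one-hot, `X_t @ W_xh` simply selects one row of `W_xh` per example, which is why `W_xh` has `vocab_size` rows.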