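# Contents
- Recurrent neural network structures and applications
- pytorch_lstm_input
- LSTM Cell computation
- Using a unidirectional LSTM in PyTorch
- PyTorch LSTM batch_first explained
- Data preparation
- LSTM network topology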

# Recurrent neural network structures and applications
![deque](pic/RNN结构.png)

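1. one to one: image classification
2. one to many: image captioning; an image goes in and a text description of it comes out
3. many to one: grammar checking; a sequence goes in and a single correctness judgment comes out
4. many to many (asynchronous): machine translation
5. many to many (synchronous): sequence labeling, e.g. tagging every frame of a video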
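
As a rough illustration of how these patterns differ in which steps consume input and which emit output, here is a minimal sketch built on a toy scalar RNN. The step function, its weights `w_x` and `w_h`, and every function name are assumptions made for this example; they do not come from any library.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    return math.tanh(w_x * x + w_h * h)

def one_to_one(x):
    """No recurrence: one input, one output."""
    return rnn_step(x, 0.0)

def one_to_many(x, steps):
    """Consume a single input, then keep unrolling to emit `steps` outputs."""
    h, outputs = 0.0, []
    for t in range(steps):
        h = rnn_step(x if t == 0 else 0.0, h)
        outputs.append(h)
    return outputs

def many_to_one(xs):
    """Read the whole sequence and keep only the final state."""
    h = 0.0
    for x in xs:
        h = rnn_step(x, h)
    return h

def many_to_many_async(xs, steps):
    """Encode the full input first, then decode `steps` outputs from the final state."""
    h, outputs = many_to_one(xs), []
    for _ in range(steps):
        h = rnn_step(0.0, h)
        outputs.append(h)
    return outputs

def many_to_many_sync(xs):
    """Emit one output per input step, e.g. one tag per frame or token."""
    h, outputs = 0.0, []
    for x in xs:
        h = rnn_step(x, h)
        outputs.append(h)
    return outputs

seq = [1.0, -0.5, 0.25]
print(len(one_to_many(1.0, steps=4)))          # 4 outputs from 1 input
print(len(many_to_many_async(seq, steps=2)))   # 2 outputs after reading 3 inputs
print(len(many_to_many_sync(seq)))             # 3 outputs, aligned with 3 inputs
```

The synchronous case is the one used by the tagger later in these notes: every token gets its own output.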

# pytorch_lstm_input
![deque](pic/Pytorch_LSTM_input.png)

# LSTM Cell computation
![deque](pic/lstm_cell1.png)
![deque](pic/lstm_cell2.png)
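
The figures are not reproduced in text, so as a cross-check here is a minimal pure-Python sketch of one LSTM cell step using the standard gate equations (the same form given in the PyTorch `nn.LSTM` documentation). The function name `lstm_cell`, the dictionary layout of `W`, `U`, `b`, and the gate labels are assumptions chosen for readability. PyTorch keeps two bias vectors per gate; the single `b` here stands for their sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step for a single example.

    x: list of length input_size; h_prev, c_prev: lists of length hidden_size.
    W[gate][j][k]: input weights, U[gate][j][k]: recurrent weights, b[gate][j]: bias.
    Gates: 'i' (input), 'f' (forget), 'g' (candidate), 'o' (output).
    """
    def pre(gate, j):
        # Pre-activation of one gate for hidden unit j
        return (sum(W[gate][j][k] * x[k] for k in range(len(x)))
                + sum(U[gate][j][k] * h_prev[k] for k in range(len(h_prev)))
                + b[gate][j])

    h_new, c_new = [], []
    for j in range(len(h_prev)):
        i = sigmoid(pre('i', j))
        f = sigmoid(pre('f', j))
        g = math.tanh(pre('g', j))
        o = sigmoid(pre('o', j))
        c = f * c_prev[j] + i * g        # keep part of the old cell state, add new candidate
        c_new.append(c)
        h_new.append(o * math.tanh(c))   # exposed hidden state
    return h_new, c_new

# With all-zero parameters every sigmoid gate is 0.5 and the candidate g is 0,
# so the cell state is simply halved.
gates = 'ifgo'
W = {gate: [[0.0, 0.0]] for gate in gates}   # hidden_size=1, input_size=2
U = {gate: [[0.0]] for gate in gates}
b = {gate: [0.0] for gate in gates}
h_new, c_new = lstm_cell([1.0, 2.0], [0.0], [1.0], W, U, b)
print(h_new, c_new)
```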
# Using a unidirectional LSTM in PyTorch
    --------------------------------
    import torch
    import torch.nn as nn

    rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)  # (input_size, hidden_size, num_layers)
    input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
    h0 = torch.randn(2, 3, 20)  # (num_layers, batch, hidden_size)
    c0 = torch.randn(2, 3, 20)  # (num_layers, batch, hidden_size)
    output, (hn, cn) = rnn(input, (h0, c0))
    --------------------------------
    output.shape #(seq_len, batch, hidden_size)
    torch.Size([5, 3, 20])
    --------------------------------
    hn.shape #(num_layers, batch, hidden_size)
    torch.Size([2, 3, 20])
    --------------------------------
    cn.shape #(num_layers, batch, hidden_size)
    torch.Size([2, 3, 20])
    --------------------------------
![deque](pic/单向lstm1.png)
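
To make the relationship between `output` and `hn` concrete, the following sketch checks that the last time step of `output` matches the final hidden state of the top layer. It assumes PyTorch is installed; the seed and the variable name `x` are arbitrary choices for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)      # (seq_len, batch, input_size)
output, (hn, cn) = rnn(x)      # omitted initial states default to zeros

# output: hidden state of the top layer at every time step
# hn:     hidden state of every layer at the final time step
print(output.shape, hn.shape, cn.shape)
print(torch.allclose(output[-1], hn[-1]))
```

For a unidirectional LSTM, `output[-1]` and `hn[-1]` therefore carry the same information, which is why many-to-one models can use either.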
# PyTorch LSTM batch_first explained
![deque](pic/lstm_batch_first.png)
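
A short sketch of what `batch_first=True` changes, assuming PyTorch is installed and reusing the sizes from the unidirectional example. Only the input and `output` tensors switch to `(batch, seq_len, feature)`; the hidden and cell states keep the `(num_layers, batch, hidden_size)` layout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(3, 5, 10)     # (batch, seq_len, input_size)
h0 = torch.zeros(2, 3, 20)    # still (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 3, 20)

output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape)           # (batch, seq_len, hidden_size)
print(hn.shape)               # (num_layers, batch, hidden_size)

# The last time step now sits on dimension 1 of `output`.
print(torch.allclose(output[:, -1, :], hn[-1]))
```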

In [2]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu' )

torch.manual_seed(1)

<torch._C.Generator at 0x211805ca510>

In [3]:
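# Data preparation
def prepare_sequence(seq, word_vocabulary):
    # Map each token to its integer index in the vocabulary
    idxs = [word_vocabulary[w] for w in seq]
    # autograd.Variable is deprecated since PyTorch 0.4; a plain tensor is enough
    return torch.tensor(idxs, dtype=torch.long)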

training_data = [
    ('the dog ate the apple'.split(), ['DET','NN','V','DET','NN'] ),
    ('Everybody read that book'.split(), ['NN', 'V', 'DET','NN'])
]

word_to_idx = {}
# Build the vocabulary: each new word gets the next free index
for sentence, tags in training_data:
    for word in sentence:
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)

tag_to_idx = {'DET':0,'NN':1,'V':2}        
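
To see what the vocabulary and the encoded sequences look like, here is a small sketch that repeats the vocabulary construction in plain Python, without tensors. The helpers `encode` and `idx_to_tag` are hypothetical names introduced for this example only.

```python
training_data = [
    ('the dog ate the apple'.split(), ['DET', 'NN', 'V', 'DET', 'NN']),
    ('Everybody read that book'.split(), ['NN', 'V', 'DET', 'NN']),
]

# Same construction as above: first occurrence of a word fixes its index
word_to_idx = {}
for sentence, _tags in training_data:
    for word in sentence:
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)

tag_to_idx = {'DET': 0, 'NN': 1, 'V': 2}
idx_to_tag = {idx: tag for tag, idx in tag_to_idx.items()}  # hypothetical inverse map

def encode(tokens, vocab):
    """Hypothetical helper: list of tokens -> list of integer ids."""
    return [vocab[t] for t in tokens]

sentence, tags = training_data[0]
print(word_to_idx)
print(encode(sentence, word_to_idx))   # [0, 1, 2, 0, 3]
print(encode(tags, tag_to_idx))        # [0, 1, 2, 0, 1]
```

Note that a word outside the training sentences raises a `KeyError` with this lookup; the notes do not handle unknown words.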

In [4]:
# LSTM network topology
class LSTMTagger(nn.Module):
    def __init__(self, input_size, hidden_size, vocab_size, output_size, num_layers):
        super(LSTMTagger, self).__init__()
        # Build the network topology
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.vocab_size = vocab_size
        self.output_size = output_size
        
        self.word_embeddings = nn.Embedding(self.vocab_size, self.input_size)
        self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers)
        
        # The linear layer maps the hidden state space to the tag space
        # Linear layer input dimension: hidden_size, output dimension: output_size
        self.fc = nn.Linear(self.hidden_size, self.output_size)
       
    def forward(self, inputs):
        
        embeds = self.word_embeddings(inputs)
        
        # Initial hidden state, h0.shape = (num_layers, batch, hidden_size); batch is fixed to 1 here
        h0 = torch.zeros(self.num_layers, 1, self.hidden_size, device=device)
        
        # Initial cell state, c0.shape = (num_layers, batch, hidden_size)
        c0 = torch.zeros(self.num_layers, 1, self.hidden_size, device=device)
        
        outs, (_,_) = self.lstm(embeds.view(len(inputs), 1, -1),(h0, c0))
        # outs.shape = (seq_len, batch_size, hidden_size)
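        '''
        LSTM input: three dimensions
        
        Every LSTM input is a 3D tensor. Each dimension has a fixed meaning, and the order must not be mixed up.
        Dimension 1: sequence length, seq_len
        Dimension 2: batch size of the input data, batch_size
        Dimension 3: feature size of each input element, input_size
         
        LSTM output: two return values
        
        The whole sequence can be passed through the LSTM in a single call. The first return value
        holds the hidden state of the last layer at every time step. The second is the tuple (h_n, c_n)
        of the most recent hidden and cell states, discarded as "_" above (so the last time step of
        "outs" equals h_n of the last layer).
        This design lets "outs" expose the hidden states at all time steps, while (h_n, c_n) lets you
        continue the sequence and backpropagate through it by passing the tuple as the initial state
        of a later LSTM call.
        '''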
        outs = self.fc(outs.view(len(inputs), -1))
        
        outs = F.log_softmax(outs, dim=1)
        
        return outs
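
The last step of `forward` turns each row of raw scores into log-probabilities. As a minimal sketch of what `F.log_softmax(outs, dim=1)` computes for one row, here is a pure-Python version; the function name `log_softmax` and the sample scores are assumptions for this example.

```python
import math

def log_softmax(scores):
    """log(softmax(scores)) for one row, using the max-shift trick for numerical stability."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

row = log_softmax([2.0, 1.0, 0.1])       # one word, three tag scores
print(row)                               # all values are <= 0
print(sum(math.exp(v) for v in row))     # exponentials sum to 1 (up to rounding)
```

Because every row is already a log-probability distribution over tags, the output can be passed directly to `nn.NLLLoss`.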

In [5]:
# Model training

# Model initialization / loss function / optimizer

# Hyperparameters
input_size = 6 # embedding_dim
hidden_size = 6
vocab_size = len(word_to_idx)
output_size = len(tag_to_idx)
num_layers = 1 # number of stacked LSTM layers
learning_rate = 0.01

model = LSTMTagger(input_size, hidden_size, vocab_size, output_size, num_layers).to(device)
# Loss function: negative log-likelihood, matching the log_softmax output of the model
loss_function = nn.NLLLoss()
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

inputs = prepare_sequence(training_data[0][0], word_to_idx)

tag_scores = model(inputs.to(device))

print('Input sentence:','\n',training_data[0][0])
print('Model input (inputs):','\n',inputs)

# Element (i, j) of the output matrix is the score (log-probability) of tag j for word i
print('before training:','\n',tag_scores)

for epoch in range(2):
    for sentence, tags in training_data:
        
        # PyTorch accumulates gradients, so clear the parameter gradients before each training step
        model.zero_grad()
        
        sentence_in = prepare_sequence(sentence, word_to_idx)
        
        targets = prepare_sequence(tags, tag_to_idx)
        
        tag_scores = model(sentence_in.to(device))
        
        loss = loss_function(tag_scores, targets.to(device))
        # Backpropagate: compute the gradient of the loss with respect to every model parameter
        loss.backward()
        
        # Update the parameters using the computed gradients
        optimizer.step()
        
inputs = prepare_sequence(training_data[0][0], word_to_idx)
tag_scores = model(inputs.to(device))

print('after training :','\n',tag_scores)

Input sentence: 
 ['the', 'dog', 'ate', 'the', 'apple']
Model input (inputs): 
 tensor([0, 1, 2, 0, 3])
before training: 
 tensor([[-1.0069, -1.3650, -0.9696],
        [-0.9032, -1.3852, -1.0657],
        [-0.9662, -1.4467, -0.9567],
        [-1.0182, -1.4220, -0.9225],
        [-0.9647, -1.4282, -0.9698]], device='cuda:0',
       grad_fn=<LogSoftmaxBackward>)
after training : 
 tensor([[-1.0060, -1.3554, -0.9770],
        [-0.9032, -1.3753, -1.0730],
        [-0.9670, -1.4365, -0.9623],
        [-1.0170, -1.4118, -0.9299],
        [-0.9654, -1.4172, -0.9760]], device='cuda:0',
       grad_fn=<LogSoftmaxBackward>)
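
To read the two printed score matrices, the sketch below recomputes the mean negative log-likelihood (what `nn.NLLLoss` returns with its default mean reduction) and decodes the most likely tag per word. The values are copied from the output above; the helper names `nll` and `decode` are hypothetical.

```python
tag_to_idx = {'DET': 0, 'NN': 1, 'V': 2}
idx_to_tag = {i: t for t, i in tag_to_idx.items()}
gold = ['DET', 'NN', 'V', 'DET', 'NN']   # tags of 'the dog ate the apple'

before = [[-1.0069, -1.3650, -0.9696],
          [-0.9032, -1.3852, -1.0657],
          [-0.9662, -1.4467, -0.9567],
          [-1.0182, -1.4220, -0.9225],
          [-0.9647, -1.4282, -0.9698]]
after = [[-1.0060, -1.3554, -0.9770],
         [-0.9032, -1.3753, -1.0730],
         [-0.9670, -1.4365, -0.9623],
         [-1.0170, -1.4118, -0.9299],
         [-0.9654, -1.4172, -0.9760]]

def nll(log_probs, tags):
    """Mean negative log-probability assigned to the gold tags."""
    return -sum(row[tag_to_idx[t]] for row, t in zip(log_probs, tags)) / len(tags)

def decode(log_probs):
    """Pick the highest-scoring tag in each row."""
    return [idx_to_tag[max(range(len(row)), key=row.__getitem__)] for row in log_probs]

print(nll(before, gold), nll(after, gold))
print(decode(after))
```

The loss on this sentence only moves from about 1.159 to about 1.156, and the decoded tags still match the gold tags at a single position (`ate` as `V`). Two epochs at a learning rate of 0.01 barely change the model, so noticeably more training would be needed before the tagger fits even this toy data.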
