# RNN
Recurrent neural network

## Reading material

[Fundamentals of Deep Learning – Introduction to Recurrent Neural Networks](https://www.analyticsvidhya.com/blog/2017/12/introduction-to-recurrent-neural-networks/)


## Notes

1. A single time step of the input, `x_t`, is supplied to the network.
2. The current state `h_t` is calculated from the current input and the previous state.
3. The current `h_t` becomes `h_{t-1}` for the next time step.
4. This repeats for as many time steps as the problem demands, combining the information from all previous states.
5. Once all time steps are completed, the final state is used to calculate the output `y_t`.
6. The output is compared with the actual (target) output, and the error is computed.
7. The error is backpropagated through the network to update the weights, which trains the network (the article covers backpropagation in detail in a later section).
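
The steps above can be sketched in plain Python with scalar weights. This is only an illustration: the weight values, the squared-error loss, and the names `w_xh`, `w_hh`, `w_hy`, `b`, and `forward` are assumptions made for this sketch and do not come from the article. Step 7 (backpropagation) is left out.

```python
import math

# Hypothetical scalar weights, chosen only for illustration.
w_xh, w_hh, w_hy, b = 0.5, 0.4, 1.2, 0.1

def forward(inputs, target):
    h = 0.0                                       # no previous state before step 1
    for x_t in inputs:                            # step 1: feed one time step
        h = math.tanh(w_xh * x_t + w_hh * h + b)  # steps 2-3: new state replaces the old one
    y = w_hy * h                                  # step 5: output from the final state
    error = (y - target) ** 2                     # step 6: squared error (an assumed loss)
    return h, y, error

h, y, error = forward([1.0, 0.0, 0.5], target=1.0)
print(h, y, error)
```

Because `tanh` squashes its argument, the state always stays strictly between -1 and 1, however many time steps are processed.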

## Simple implementation

The sample is the word 'hello'. The goal is to feed in 'hell' and have the network infer 'o'.

In [1]:
word = 'hello'
letters = list(word)
print(f"letters:{letters}")
words = list(set(letters))
voc_size = len(words)
print(f"vocabulary:{words}")
print(f"voc size:{voc_size}")

letters:['h', 'e', 'l', 'l', 'o']
vocabulary:['h', 'e', 'o', 'l']
voc size:4


In [4]:
# Hard-code word-to-idx so it matches the article and is easy to cross-check
word_to_idx = {
    'h':0,
    'e':1,
    'l':2,
    'o':3
}
print(word_to_idx)

{'h': 0, 'e': 1, 'l': 2, 'o': 3}
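
The printed vocabulary above comes from a `set`, whose iteration order is not guaranteed, which is one reason to hard-code the mapping. As an alternative sketch (not what the article does), the same mapping can be derived deterministically with `dict.fromkeys`, which keeps first-seen order; the name `vocab` is introduced here for illustration.

```python
word = 'hello'

# dict.fromkeys keeps first-seen order and drops the repeated 'l'
vocab = list(dict.fromkeys(word))
word_to_idx = {ch: i for i, ch in enumerate(vocab)}

print(vocab)
print(word_to_idx)
```

For 'hello' this yields the same indices as the hard-coded dictionary, so the rest of the notebook would be unaffected.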


In [5]:
import torch

def one_hot_encoding(word):
    tensor = torch.zeros(len(word_to_idx))
    tensor[word_to_idx[word]] = 1
    return tensor

encoded_words = [(one_hot_encoding(word), word) for word in words]
print(encoded_words)

[(tensor([1., 0., 0., 0.]), 'h'), (tensor([0., 1., 0., 0.]), 'e'), (tensor([0., 0., 0., 1.]), 'o'), (tensor([0., 0., 1., 0.]), 'l')]


In [6]:
Wxh = torch.tensor([
    [0.287027,0.84606,0.572392,0.486813],
    [0.902874,0.871522,0.691079,0.18998],
    [0.537524,0.09224,0.558159,0.491528]
],dtype=torch.float32)

print(Wxh)

tensor([[0.2870, 0.8461, 0.5724, 0.4868],
        [0.9029, 0.8715, 0.6911, 0.1900],
        [0.5375, 0.0922, 0.5582, 0.4915]])


In [7]:
Xt = torch.tensor([1,0,0,0],dtype=torch.float32)
print(Xt)
print(Xt.shape)
# print(Xt.view(-1,1))

result = torch.matmul(Wxh,Xt)

print(result)
print(result.shape)
result1 = torch.matmul(Wxh,Xt.view(-1,1))
print(result1)
print(result1.shape)


tensor([1., 0., 0., 0.])
torch.Size([4])
tensor([0.2870, 0.9029, 0.5375])
torch.Size([3])
tensor([[0.2870],
        [0.9029],
        [0.5375]])
torch.Size([3, 1])
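
The result above equals the first column of `Wxh`, because multiplying by a one-hot vector simply selects one column of the matrix. A small plain-Python sketch of the same product (nested lists instead of tensors; `product` and `column0` are names introduced here) makes this visible:

```python
Wxh = [
    [0.287027, 0.84606, 0.572392, 0.486813],
    [0.902874, 0.871522, 0.691079, 0.18998],
    [0.537524, 0.09224, 0.558159, 0.491528],
]
Xt = [1.0, 0.0, 0.0, 0.0]  # one-hot vector for 'h'

# Matrix-vector product: one dot product per row of Wxh
product = [sum(w * x for w, x in zip(row, Xt)) for row in Wxh]

# Column 0 of Wxh, i.e. the column at the index of 'h'
column0 = [row[0] for row in Wxh]

print(product)
print(column0)
```

In other words, `Wxh` can be read as a lookup table with one 3-dimensional column per letter.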


In [8]:
# Explore a matrix versus a row vector; GPT generated this example

# Define a 1x3 matrix
matrix = torch.tensor([[1, 2, 3]])

# Define a row vector of dim 3
row_vector = torch.tensor([1, 2, 3])

print(matrix)
print(row_vector)

tensor([[1, 2, 3]])
tensor([1, 2, 3])


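> In this example we define a 1x3 matrix `matrix` (a 2-D tensor of shape `(1, 3)`) and a row vector `row_vector` (a 1-D tensor of shape `(3,)`). They hold the same values and can often be used interchangeably.

> However, shape-sensitive operations such as matrix multiplication treat them differently, so check each tensor's type and dimensions before such operations.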
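
To make the difference concrete, here is a small sketch of how `torch.matmul` handles the two shapes. Float values are used instead of the integers above purely as a precaution for this sketch; the names `dot`, `mv`, `mm`, and `mismatch` are introduced here for illustration.

```python
import torch

matrix = torch.tensor([[1., 2., 3.]])    # 2-D tensor, shape (1, 3)
row_vector = torch.tensor([1., 2., 3.])  # 1-D tensor, shape (3,)

# 1-D @ 1-D is a dot product and returns a 0-dim tensor
dot = torch.matmul(row_vector, row_vector)

# 2-D @ 1-D is a matrix-vector product and returns a 1-D tensor
mv = torch.matmul(matrix, row_vector)

# 2-D @ 2-D is a matrix-matrix product and returns a 2-D tensor
mm = torch.matmul(matrix, row_vector.view(-1, 1))

print(dot.shape, mv.shape, mm.shape)

# (1, 3) @ (1, 3) is not a valid matrix product, so this raises
try:
    torch.matmul(matrix, matrix)
    mismatch = False
except RuntimeError:
    mismatch = True
print(mismatch)
```

All three products hold the value 14, but their shapes differ, and replacing `row_vector` with `matrix` on the right-hand side fails outright.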


In [None]:
# Test random initialization

rnd = torch.rand(1,1)

print(rnd)

In [None]:
# Randomly initialize the variables
Whh = torch.rand(1,1)
bias = torch.rand(1,1)

print(Whh)
print(bias)

In [None]:
# Try out the computation

# state = torch.matmul(Whh,result1).add(bias)

print(Whh * result1 + bias)
#print(state)

In [None]:
print(torch.tanh(Whh * result1 + bias))

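I am unsure about `Why` in the original article, and parts of the full computation are still unclear to me, so I work backwards by running the calculation with the article's data.

Run the complete computation.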
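
Read off the `calc` helper and the `torch.tanh` calls in the cells that follow, the recurrence being evaluated appears to be the one below. The symbols $h_t$, $x_t$, and $b$ are shorthand introduced here for the state, the one-hot input, and `bias`. Since `Whh` and `bias` are 1x1, they act as scalars broadcast over the three state components.

$$
h_t = \tanh\left(W_{hh}\,h_{t-1} + b + W_{xh}\,x_t\right), \qquad h_0 = (0, 0, 0)^\top
$$

For the first step, with $x_1$ the one-hot vector for 'h', $W_{xh}\,x_1$ is the first column of `Wxh` and $W_{hh}\,h_0 = 0$, so:

$$
h_1 = \tanh\begin{pmatrix} 0.577 + 0.2870 \\ 0.577 + 0.9029 \\ 0.577 + 0.5375 \end{pmatrix}
\approx \begin{pmatrix} 0.6983 \\ 0.9014 \\ 0.8057 \end{pmatrix}
$$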

In [9]:
Whh = torch.tensor([[0.427043]])
bias = torch.tensor([[0.57700]])

print(f"Whh={Whh},bias={bias}")

Whh=tensor([[0.4270]]),bias=tensor([[0.5770]])


In [10]:
# At the first step there is no previous state, so use [0,0,0]

h0 = torch.tensor([[0],[0],[0]])
print(h0)

tensor([[0],
        [0],
        [0]])


In [11]:
def calc(previous_state):
    return Whh * previous_state + bias

In [12]:
# step 1, compute 'h'
calced_h0 = calc(h0)
print(calced_h0)

Wxh_Xt = torch.matmul(Wxh,one_hot_encoding('h').view(-1,1))
print(Wxh_Xt)

print(Wxh)

h1 = torch.tanh( calc(h0)+ torch.matmul(Wxh,one_hot_encoding('h').view(-1,1)))
print(h1)


tensor([[0.5770],
        [0.5770],
        [0.5770]])
tensor([[0.2870],
        [0.9029],
        [0.5375]])
tensor([[0.2870, 0.8461, 0.5724, 0.4868],
        [0.9029, 0.8715, 0.6911, 0.1900],
        [0.5375, 0.0922, 0.5582, 0.4915]])
tensor([[0.6983],
        [0.9014],
        [0.8057]])


In [13]:
# step 2, compute 'e'
h2 = torch.tanh( calc(h1)+ torch.matmul(Wxh,one_hot_encoding('e').view(-1,1)))
print(h2)

tensor([[0.9380],
        [0.9502],
        [0.7671]])


In [14]:
# step 3, compute 'l'
h3 = torch.tanh( calc(h2)+ torch.matmul(Wxh,one_hot_encoding('l').view(-1,1)))
print(h3)

tensor([[0.9138],
        [0.9321],
        [0.8982]])


In [15]:
# step 4, compute 'l'
h4 = torch.tanh( calc(h3)+ torch.matmul(Wxh,one_hot_encoding('l').view(-1,1)))
print(h4)

tensor([[0.9121],
        [0.9310],
        [0.9085]])


In [None]:
# step 5: this cell was not run and repeats the step-4 computation for 'l' ('hell' has only four input letters)
h4 = torch.tanh( calc(h3)+ torch.matmul(Wxh,one_hot_encoding('l').view(-1,1)))
print(h4)
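
The four steps above can also be written as a single loop. The sketch below uses plain Python lists rather than tensors so that the arithmetic is explicit; the weights are copied from the earlier cells, while the helper `step` and the list `states` are names introduced here. It stops at the hidden states, since no values for `Why` appear in this notebook.

```python
import math

# Values copied from the cells above
Wxh = [
    [0.287027, 0.84606, 0.572392, 0.486813],
    [0.902874, 0.871522, 0.691079, 0.18998],
    [0.537524, 0.09224, 0.558159, 0.491528],
]
Whh = 0.427043   # 1x1 weight, so it acts as a scalar
bias = 0.57700
word_to_idx = {'h': 0, 'e': 1, 'l': 2, 'o': 3}

def step(prev_state, letter):
    """One time step: tanh(Whh * h_prev + bias + Wxh column for the letter)."""
    col = word_to_idx[letter]
    return [math.tanh(Whh * h + bias + row[col])
            for h, row in zip(prev_state, Wxh)]

state = [0.0, 0.0, 0.0]   # no previous state before the first letter
states = []
for letter in 'hell':
    state = step(state, letter)
    states.append(state)
    print(letter, [round(v, 4) for v in state])
```

The printed states should agree with `h1` through `h4` above up to rounding, since Python floats are double precision and the tensors are `float32`.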