### RNN

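Pros: simple.

Cons: each step depends on the previous one, so the computation is serial and slow; a unidirectional RNN only sees the preceding context (a bidirectional RNN is needed to see both sides), and it implicitly treats the nearest elements as the most important, which does not match how language works, since there is no learned weight per position; it is only suitable for short sequences, because information from distant positions is lost; training is hard to converge, because gradients tend to vanish or explode over many time steps.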

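* Weight sharing: every element of the sequence passes through the same layer, so `w` and `b` are shared across time steps. An RNN can also be stacked into multiple layers, in which case `ht` is the input to the next layer.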
  
  ![rnn_wight_sharing](../images/rnn_wight_sharing.png)
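* Consistent memory: the output produced for one element is fed back in together with the next element, so the hidden state acts as a record of everything seen so far. `h0` is usually initialized to `[0, 0, 0...]`.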
  
  ![rnn_consistent_memory](../images/rnn_consistent_memory.png)

  ![rnn_unit](../images/rnn_unit.png)

  ![rnn_unit_formulation](../images/rnn_unit_formulation.png)
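
The two ideas above (shared weights and a hidden state carried from step to step) can be sketched in plain NumPy. This is a minimal illustration, assuming the usual `tanh` formulation `h_t = tanh(x_t @ W_xh + h_prev @ W_hh + b)`; the names `rnn_step`, `w_xh` and `w_hh` are made up for this sketch and are not part of any library.

```python
import numpy as np

def rnn_step(x_t, h_prev, w_xh, w_hh, b):
    """One step of a plain RNN: h_t = tanh(x_t @ W_xh + h_prev @ W_hh + b)."""
    return np.tanh(x_t @ w_xh + h_prev @ w_hh + b)

rng = np.random.default_rng(0)
batch, steps, features, units = 4, 80, 100, 64

x = rng.normal(size=(batch, steps, features))
w_xh = rng.normal(scale=0.1, size=(features, units))  # input-to-hidden weights
w_hh = rng.normal(scale=0.1, size=(units, units))     # hidden-to-hidden weights
b = np.zeros(units)

h = np.zeros((batch, units))  # h0 = [0, 0, 0, ...]
for t in range(steps):
    # the same w_xh, w_hh and b are reused at every time step (weight sharing),
    # and h carries the state from one step to the next (memory)
    h = rnn_step(x[:, t, :], h, w_xh, w_hh, b)

print(h.shape)  # (4, 64)
```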
  
### Gradient

![rnn_gradient](../images/rnn_gradient.png)
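
A small numeric sketch of why gradients are fragile over long sequences. It assumes a scalar recurrence `h_t = f(w * h_{t-1})` with no input term, which is a simplification chosen for this illustration: backpropagating from step T to step 0 multiplies one factor per step, so the product shrinks towards zero when the factors are below 1 and blows up when they are above 1. The function name `grad_wrt_h0` is hypothetical.

```python
import math

def grad_wrt_h0(w, steps, h0=0.5, use_tanh=True):
    """Return d h_T / d h_0 for the scalar recurrence h_t = f(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        pre = w * h
        if use_tanh:
            h = math.tanh(pre)
            grad *= w * (1.0 - h * h)  # chain rule: tanh'(pre) * w
        else:
            h = pre                    # linear recurrence, f is the identity
            grad *= w
    return grad

vanishing = grad_wrt_h0(w=0.5, steps=50)                   # every factor is at most 0.5
exploding = grad_wrt_h0(w=1.5, steps=50, use_tanh=False)   # every factor is 1.5

print(vanishing, exploding)
```

With 50 steps the first value is far below 1e-12 and the second is above 1e6, which is the vanishing and exploding behaviour in miniature.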


### Recap

![rnn_recap](../images/rnn_recap.png)

In [2]:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow import keras

In [3]:
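# SimpleRNNCell is the simplest RNN unit, the one shown above
# 3 is the number of units, i.e. the dimension of the hidden state h (not the number of time steps);
# the input feature dimension, 4, is given to build() below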
cell = layers.SimpleRNNCell(3)
cell.build(input_shape=(None, 4))

cell.trainable_variables

[<tf.Variable 'kernel:0' shape=(4, 3) dtype=float32, numpy=
 array([[ 0.72934437, -0.08529466,  0.08860195],
        [ 0.06994289,  0.30748487, -0.8542601 ],
        [ 0.6114085 , -0.7415279 , -0.6091521 ],
        [-0.65040267,  0.34583807, -0.32419276]], dtype=float32)>,
 <tf.Variable 'recurrent_kernel:0' shape=(3, 3) dtype=float32, numpy=
 array([[ 0.9317492 , -0.02076747, -0.36250818],
        [ 0.11296385,  0.96539825,  0.23504323],
        [-0.34508353,  0.25995165, -0.901855  ]], dtype=float32)>,
 <tf.Variable 'bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
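
The shapes printed above follow a simple pattern: `kernel` is `(input_dim, units)`, `recurrent_kernel` is `(units, units)` and `bias` is `(units,)`. A tiny helper makes the arithmetic explicit; `simple_rnn_param_count` is a hypothetical name used only for this sketch.

```python
def simple_rnn_param_count(input_dim, units):
    """Count the trainable parameters of one SimpleRNNCell-style unit."""
    kernel = input_dim * units        # input-to-hidden weights
    recurrent_kernel = units * units  # hidden-to-hidden weights
    bias = units
    return kernel + recurrent_kernel + bias

print(simple_rnn_param_count(4, 3))  # 24
```

For the cell above this gives 12 + 9 + 3 = 24 parameters, regardless of how many time steps the sequence has.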

In [5]:
x = tf.random.normal([4, 80, 100])
xt0 = x[:, 0, :]
print("xt0 shape:", xt0.shape)

cell = tf.keras.layers.SimpleRNNCell(64)

out, xt1 = cell(xt0, [tf.zeros([4, 64])])

out.shape, xt1[0].shape
# out and xt1[0] (h1) are the same object, so their ids are equal
# xt1 holds a single element here; TensorFlow always represents the state as a list
id(out), id(xt1[0])

xt0 shape: (4, 100)


(1525508470528, 1525508470528)
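
The cell above only consumes the first time step. A sketch of how the whole sequence could be processed is to loop over the time axis by hand, carrying the state forward. For comparison, the same cell is then wrapped in `tf.keras.layers.RNN`, which runs the loop itself; the two results are expected to agree up to floating-point error because they share the same weights.

```python
import tensorflow as tf

x = tf.random.normal([4, 80, 100])
cell = tf.keras.layers.SimpleRNNCell(64)

# manual unrolling: one cell call per time step, state carried forward
state = [tf.zeros([4, 64])]
for xt in tf.unstack(x, axis=1):  # 80 tensors of shape (4, 100)
    out, state = cell(xt, state)

# the RNN wrapper iterates over the time axis with the same cell
out_layer = tf.keras.layers.RNN(cell)(x)

print(out.shape, out_layer.shape)
max_diff = float(tf.reduce_max(tf.abs(out - out_layer)))
```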

### Multi-layer Cell

![multi_rnn_cell](../images/multi_rnn_cell.png)

In [6]:
x = tf.random.normal([4, 80, 100])
xt0 = x[:, 0, :]
cell = tf.keras.layers.SimpleRNNCell(64)
cell2 = tf.keras.layers.SimpleRNNCell(64)

state0 = [tf.zeros([4, 64])]
state1 = [tf.zeros([4, 64])]

out0, state0 = cell(xt0, state0)
# the second cell takes the first cell's output as its input and keeps its own state
out1, state1 = cell2(out0, state1)

In [7]:
rnn = keras.Sequential([
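    # SimpleRNN wraps a single cell; unlike SimpleRNNCell, it iterates over the time steps itself, so the input does not have to be split by hand
    # return_sequences=True because the first layer's per-step outputs are the input of the next layer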
    layers.SimpleRNN(units=64, dropout=0.5, return_sequences=True, unroll=True),
    layers.SimpleRNN(units=64, dropout=0.5, unroll=True)
])
x = tf.random.normal([4, 80, 100])
x = rnn(x)
x.shape

TensorShape([4, 64])
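
To make the role of `return_sequences` concrete, here is a short sketch comparing the two output shapes on the same input. The variable names `all_steps` and `last_step` are chosen for this illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([4, 80, 100])

# one hidden state per time step: needed when another RNN layer follows
all_steps = layers.SimpleRNN(64, return_sequences=True)(x)
# only the hidden state of the final time step
last_step = layers.SimpleRNN(64)(x)

print(all_steps.shape)  # (4, 80, 64)
print(last_step.shape)  # (4, 64)
```

Note that the `dropout=0.5` in the model above is only active when the model is called with `training=True`; a plain call such as `rnn(x)` runs in inference mode.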