# RNN: Recurrent Neural Networks


## RNN Overview
    
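In a fully connected or convolutional neural network, data flows from the input layer through the hidden layers to the output layer. Adjacent layers are fully or partially connected, but the nodes within a layer are not connected to each other. Such networks cannot handle problems where the input is a sequence and solving them requires "memory". Take next-word prediction: given the first few words of a sentence, predict the next one. This usually requires the current word and the words before it, because the words in a sentence are not independent. For example, if the previous word is "sky" and the current word is "very", the next word is most likely "blue".

This is where recurrent neural networks (RNNs) come in. RNNs are typically used when the training inputs are sequences, possibly of different lengths, and they model how the current output of a sequence depends on earlier information. Structurally, an RNN remembers earlier information and uses it to influence later outputs: the hidden layer is connected to itself across time steps, so its input includes not only the output of the input layer but also the hidden layer's own output from the previous time step.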
    
![RNN_8.png](RNN_8.jpeg)
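
RNNs are now widely used in natural language processing (language modeling and text generation), speech recognition, machine translation, and other fields.

For example, an RNN for stock prediction takes the prices of the previous N days as input and outputs tomorrow's price.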
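
To make the stock example concrete, the training data can be built as sliding windows, each pairing N consecutive daily prices (the input sequence, one value per time step) with the following day's price (the target). A minimal NumPy sketch; `make_windows` and the price values are made up for illustration:

```python
import numpy as np

def make_windows(prices, n_days):
    """Split a 1-D price series into (previous n_days prices, next-day price) pairs.

    Hypothetical helper: returns X with shape [num_samples, n_days, 1]
    (one feature per time step, the layout an RNN expects) and y with
    shape [num_samples].
    """
    prices = np.asarray(prices, dtype=np.float32)
    num_samples = len(prices) - n_days
    if num_samples <= 0:
        raise ValueError("series must be longer than n_days")
    X = np.stack([prices[i:i + n_days] for i in range(num_samples)])
    y = prices[n_days:]                 # the day right after each window
    return X[:, :, np.newaxis], y

# Toy series (made-up numbers), window of N = 3 days
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
X, y = make_windows(prices, n_days=3)
print(X.shape)   # (3, 3, 1)
```

The first sample's input is the first three prices and its target is the fourth price.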
    
    
## RNN structure

![RNN_3.png](RNN_3.png)

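The figure above shows the classic RNN model. Its components are as follows.

For sequence index (time step) t:
1. x(t) is the training input at sequence index t. Likewise, x(t-1) and x(t+1) are the inputs at indices t-1 and t+1.
2. h(t) is the model's hidden state at index t, which can be thought of intuitively as its "memory". h(t) is determined jointly by x(t) and h(t-1): h(t) = f(U * x(t) + W * h(t-1)), where f is a nonlinear activation function such as tanh.
3. o(t) is the model's output at index t. It depends only on the current hidden state h(t), through the output matrix V (o(t) = V * h(t), optionally plus a bias).
4. L(t) is the model's loss at index t.
5. y(t) is the true output (target) of the training sequence at index t.
6. U, W, and V are the model's linear parameters, and they are shared across the entire RNN, i.e. across all time steps. This is very different from a DNN, where each layer has its own weights. The sharing embodies the "recurrent feedback" idea of the RNN: the model applies the same transformation at every step and only the input changes. It also greatly reduces the number of parameters to learn, since the parameter count does not grow with sequence length.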
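
The recurrence above can be written out directly. A minimal NumPy sketch of the forward pass, assuming f = tanh, o(t) = V h(t) with no bias terms, and h(0) = 0 (the text does not fix these choices); `rnn_forward` is a hypothetical helper name. It also illustrates parameter sharing: sequences of length 5 and 50 run through the same U, W, V.

```python
import numpy as np

def rnn_forward(xs, U, W, V, h0=None):
    """Run a vanilla RNN over a sequence xs of shape [T, input_dim].

    Assumptions: f = tanh, o(t) = V h(t), no bias terms.
    The SAME U, W, V are used at every time step.
    """
    hidden_dim = W.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    hs, outs = [], []
    for x_t in xs:                      # one step per sequence index t
        h = np.tanh(U @ x_t + W @ h)    # h(t) = f(U x(t) + W h(t-1))
        hs.append(h)
        outs.append(V @ h)              # o(t) depends only on h(t)
    return np.array(hs), np.array(outs)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(output_dim, hidden_dim))

# Sequences of different lengths reuse the same parameters
hs_short, os_short = rnn_forward(rng.normal(size=(5, input_dim)), U, W, V)
hs_long, os_long = rnn_forward(rng.normal(size=(50, input_dim)), U, W, V)
print(os_short.shape, os_long.shape)   # (5, 2) (50, 2)

# The parameter count is independent of sequence length
num_params = U.size + W.size + V.size
print(num_params)                      # 12 + 16 + 8 = 36
```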



## Examples of the main RNN layers

### Example: car image recognition with a CNN
![RNN_1.png](RNN_1.png)


### Local feature extraction via convolution
![RNN_2.gif](RNN_2.gif)
![CNN_3.png](CNN_3.png)
![RNN_4.jpeg](RNN_4.jpeg)
![RNN_5.gif](RNN_5.gif)
![RNN_6.gif](RNN_6.gif)
![RNN_8.png](RNN_8.jpeg)
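
Convolution extracts local features: each output value is computed from a small patch of the input, using the same kernel at every position. A minimal NumPy sketch of a 2D "valid" convolution with stride 1 and no padding; like most deep-learning libraries it does not flip the kernel (strictly speaking, cross-correlation). `conv2d_valid`, the toy image, and the edge-detector kernel are illustrative assumptions:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D 'valid' convolution as used in CNNs (kernel not flipped),
    stride 1, no padding. Each output value depends only on a local patch."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 4x4 image with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# Hypothetical vertical-edge detector
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d_valid(image, kernel))   # every row is [0, 2, 0]: the response peaks at the edge
```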



### Pooling example (max pooling)
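
Max pooling slides a window over a feature map and keeps only the largest value in each window, shrinking the map while keeping the strongest responses. A minimal NumPy sketch, assuming a 2x2 window with stride 2 and no padding (`max_pool2d` and the feature map are made up for illustration):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over non-padded windows of a 2-D feature map."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()     # keep only the strongest response
    return out

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]])
print(max_pool2d(fmap))
# [[6 5]
#  [7 9]]
```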



### Summary of RNN algorithms
![CNN_7.png](CNN_7.jpeg)


### Classic RNN papers
- Recurrent neural network based language model
- Extensions of Recurrent neural network based language model
- Generating Text with Recurrent Neural Networks
- A Recursive Recurrent Neural Network for Statistical Machine Translation
- Sequence to Sequence Learning with Neural Networks
- Joint Language and Translation Modeling with Recurrent Neural Networks
- Towards End-to-End Speech Recognition with Recurrent Neural Networks

---

### Example 1. Building an LSTM-RNN model in TensorFlow that teaches a neural network binary addition
![RNN_7.png](RNN_7.gif)

Reference: [Anyone Can learn To Code LSTM-RNN in Python(Part 1: RNN)](https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/)


```python
import tensorflow as tf
import numpy as np

print("tensorflow version: " + str(tf.__version__))


# Dictionary mapping an integer to its binary representation,
# e.g. int2binary[3] = [0,0,0,0,0,0,1,1]
int2binary = {}

# Use at most 8 binary digits
binary_dim = 8

# With 8 bits there are 2^8 = 256 values, i.e. the integers 0..255
largest_number = pow(2, binary_dim)

# Binary representation of every integer in [0, 256)
binary = np.unpackbits(
    np.array([range(largest_number)], dtype=np.uint8).T, axis=1)

# Build the dictionary
for i in range(largest_number):
    int2binary[i] = binary[i]

def binary_generation(numbers, reverse=False):
    '''
    Return the binary representation of every number in `numbers`,
    e.g. numbers = [3, 2, 1]
    returns: [[0,0,0,0,0,0,1,1],
              [0,0,0,0,0,0,1,0],
              [0,0,0,0,0,0,0,1]]

    If reverse = True, each binary representation is reversed
    (least significant bit first). This makes training easier,
    because the network reads its input starting from the lowest bit.

    numbers : a sequence of integers
    reverse : whether to reverse each binary representation
    '''
    binary_x = np.array([int2binary[num] for num in numbers], dtype=np.uint8)

    if reverse:
        binary_x = np.fliplr(binary_x)

    return binary_x

def batch_generation(batch_size, largest_number):
    '''
    Generate a batch of batch_size examples for training or validation.

    batch_x has shape [batch_size, binary_dim, 2]
    batch_y has shape [batch_size, binary_dim]
    '''

    # Draw batch_size random operands below largest_number // 2,
    # so that their sum still fits in binary_dim bits
    n1 = np.random.randint(0, largest_number // 2, batch_size)
    n2 = np.random.randint(0, largest_number // 2, batch_size)
    # Compute the sums
    add = n1 + n2

    # int to binary (least significant bit first)
    binary_n1 = binary_generation(n1, True)
    binary_n2 = binary_generation(n2, True)
    batch_y = binary_generation(add, True)

    # Stack along the last axis: the network receives two bits per time step
    batch_x = np.dstack((binary_n1, binary_n2))

    return batch_x, batch_y, n1, n2, add

def binary2int(binary_array):
    '''
    Convert a binary array (most significant bit first) to an integer.
    '''
    out = 0
    for index, x in enumerate(reversed(binary_array)):
        out += x * pow(2, index)
    return out
```

```
1.8.0
```
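
The docstring of `binary_generation` says the bits are reversed because the network reads the lowest bit first. A hand-written ripple-carry adder shows why this order helps: going from least to most significant bit, each output bit depends only on the two current input bits and a carry from the previous step, which is exactly the kind of state an RNN hidden layer can carry forward. Read highest bit first, an output bit would depend on carries from bits not yet seen. The helpers below (`add_lsb_first`, `to_bits_lsb_first`, `from_bits_lsb_first`) are hypothetical, plain Python with no TensorFlow:

```python
def add_lsb_first(a_bits, b_bits):
    """Add two equal-length bit lists given least-significant bit first.

    The carry is the only state passed from one step to the next, the
    kind of 'memory' the RNN has to learn in its hidden state.
    """
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):     # step t reads bit t of each operand
        s = a + b + carry
        out.append(s % 2)                # output bit for this step
        carry = s // 2                   # state carried to step t+1
    return out

def to_bits_lsb_first(n, width=8):
    return [(n >> i) & 1 for i in range(width)]

def from_bits_lsb_first(bits):
    return sum(bit << i for i, bit in enumerate(bits))

a, b = 39, 17
c_bits = add_lsb_first(to_bits_lsb_first(a), to_bits_lsb_first(b))
print(c_bits, from_bits_lsb_first(c_bits))   # [0, 0, 0, 1, 1, 1, 0, 0] 56
```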


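Set the hyperparameters

```python
batch_size = 64

# Number of units in each LSTM layer (neurons per hidden layer)
lstm_size = 20

# Number of stacked LSTM (hidden) layers
lstm_layers = 2
```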

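Define the inputs and outputs

```python
# Start from an empty graph so that re-running these cells does not try to
# create the LSTM variables a second time (re-run this cell before the model cell)
tf.reset_default_graph()

# Input with shape [None, binary_dim, 2]:
# None is the batch size, binary_dim the sequence length,
# and 2 the number of inputs (one bit from each operand) per time step
x = tf.placeholder(tf.float32, [None, binary_dim, 2], name='input_x')

# Target bits, shape [None, binary_dim]
y_ = tf.placeholder(tf.float32, [None, binary_dim], name='input_y')
# Dropout keep probability
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
```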

Build the model

```python
# Build one LSTM layer (a hidden layer) with lstm_size units,
# applying dropout to its outputs
def lstm_cell():
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

# One layer is not enough, so stack lstm_layers of them
cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(lstm_layers)])

# Initial state, i.e. the initial "memory", sized from the actual batch
initial_state = cell.zero_state(tf.shape(x)[0], tf.float32)

# Forward pass through time; outputs has shape [batch_size, binary_dim, lstm_size]
outputs, final_state = tf.nn.dynamic_rnn(cell, x, initial_state=initial_state)

# Output layer, shared by all time steps
weights = tf.Variable(tf.truncated_normal([lstm_size, 1], stddev=0.01))
bias = tf.Variable(tf.zeros([1]))

# [batch_size, binary_dim, lstm_size] ==> [batch_size*binary_dim, lstm_size]
outputs = tf.reshape(outputs, [-1, lstm_size])
# logits has shape [batch_size*binary_dim, 1]; sigmoid turns them into bit probabilities
logits = tf.matmul(outputs, weights) + bias
probs = tf.sigmoid(logits)
# [batch_size*binary_dim, 1] ==> [batch_size, binary_dim]
predictions = tf.reshape(probs, [-1, binary_dim])
```
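
The reshape to `[-1, lstm_size]` lets a single weight matrix serve as the output layer at every time step, the same parameter sharing described in the structure section. A NumPy sketch with random stand-in values (no TensorFlow) checking that flattening time into the batch axis gives the same result as applying the layer step by step:

```python
import numpy as np

batch_size, binary_dim, lstm_size = 4, 8, 20
rng = np.random.default_rng(0)
outputs = rng.normal(size=(batch_size, binary_dim, lstm_size))  # stand-in for dynamic_rnn outputs
weights = rng.normal(size=(lstm_size, 1))
bias = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Flatten time into the batch axis, apply one shared linear layer, then restore
flat = outputs.reshape(-1, lstm_size)                 # [batch*T, lstm_size]
pred = sigmoid(flat @ weights + bias).reshape(-1, binary_dim)  # [batch, T]

# Same result as applying the layer separately at every time step
per_step = np.stack([sigmoid(outputs[:, t, :] @ weights + bias)[:, 0]
                     for t in range(binary_dim)], axis=1)
print(np.allclose(pred, per_step))   # True
```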



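Loss function and optimizer

```python
# Mean squared error between the target bits and the predicted bit probabilities
cost = tf.losses.mean_squared_error(y_, predictions)
optimizer = tf.train.AdamOptimizer().minimize(cost)
```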
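
With its default arguments, `tf.losses.mean_squared_error` should amount to the mean of the squared differences over every bit in the batch. Because the loss value alone is hard to interpret, a per-bit accuracy (round each probability, compare with the target) is a handy companion metric. A NumPy sketch; `mse`, `bit_accuracy`, and the numbers are hypothetical:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences over all bits in the batch."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def bit_accuracy(y_true, y_pred):
    """Fraction of bits that are correct after rounding the sigmoid outputs."""
    return np.mean(np.round(y_pred) == y_true)

y_true = np.array([[1, 0, 1, 0], [0, 0, 1, 1]], dtype=float)
y_pred = np.array([[0.9, 0.2, 0.6, 0.6], [0.1, 0.3, 0.8, 0.7]])
print(mse(y_true, y_pred))           # about 0.1
print(bit_accuracy(y_true, y_pred))  # 0.875 (only the last bit of row 0 is wrong)
```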

Training

```python
steps = 2000
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for iteration in range(1, steps + 1):
        # Fetch a batch of training data
        input_x, input_y, _, _, _ = batch_generation(batch_size, largest_number)
        _, loss = sess.run([optimizer, cost],
                           feed_dict={x: input_x, y_: input_y, keep_prob: 0.5})

        if iteration % 1000 == 0:
            print('Iter:{}, Loss:{}'.format(iteration, loss))

    # Training finished: evaluate on a fresh batch with dropout disabled
    val_x, val_y, n1, n2, add = batch_generation(batch_size, largest_number)
    result = sess.run(predictions, feed_dict={x: val_x, y_: val_y, keep_prob: 1.0})

    # The network outputs the lowest bit first, while numbers are normally
    # written highest bit first, so flip the bit arrays left to right
    result = np.fliplr(np.round(result))
    result = result.astype(np.int32)

    for b_x, b_p, a, b, s in zip(np.fliplr(val_x), result, n1, n2, add):
        print('{}:{}'.format(b_x[:, 0], a))
        print('{}:{}'.format(b_x[:, 1], b))
        print('{}:{} (true sum: {})\n'.format(b_p, binary2int(b_p), s))
```

The same example implemented in plain Python with NumPy

```python
import copy
import numpy as np
np.random.seed(0)

# compute sigmoid nonlinearity
def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output

# convert output of sigmoid function to its derivative
def sigmoid_output_to_derivative(output):
    return output*(1-output)


# training dataset generation
int2binary = {}
binary_dim = 8

largest_number = pow(2,binary_dim)
binary = np.unpackbits(
    np.array([range(largest_number)],dtype=np.uint8).T,axis=1)
for i in range(largest_number):
    int2binary[i] = binary[i]

print("---binary---")
print(binary)
print("\n")

print("---int2binary---")
print(int2binary)
print("\n")

# input variables
alpha = 0.1
input_dim = 2
hidden_dim = 16
output_dim = 1


# initialize neural network weights
synapse_0 = 2*np.random.random((input_dim,hidden_dim)) - 1   # input -> hidden (U)
synapse_1 = 2*np.random.random((hidden_dim,output_dim)) - 1  # hidden -> output (V)
synapse_h = 2*np.random.random((hidden_dim,hidden_dim)) - 1  # hidden -> hidden (W)

synapse_0_update = np.zeros_like(synapse_0)
synapse_1_update = np.zeros_like(synapse_1)
synapse_h_update = np.zeros_like(synapse_h)

# training logic
for j in range(10000):

    # generate a simple addition problem (a + b = c)
    a_int = np.random.randint(largest_number // 2) # int version
    a = int2binary[a_int] # binary encoding

    b_int = np.random.randint(largest_number // 2) # int version
    b = int2binary[b_int] # binary encoding

    # true answer
    c_int = a_int + b_int
    c = int2binary[c_int]

    # where we'll store our best guess (binary encoded)
    d = np.zeros_like(c)

    overallError = 0

    layer_2_deltas = list()
    layer_1_values = list()
    layer_1_values.append(np.zeros(hidden_dim))

    # forward pass: moving along the positions in the binary encoding (lowest bit first)
    for position in range(binary_dim):

        # generate input and output
        X = np.array([[a[binary_dim - position - 1],b[binary_dim - position - 1]]])
        y = np.array([[c[binary_dim - position - 1]]]).T

        # hidden layer (input ~+ prev_hidden)
        layer_1 = sigmoid(np.dot(X,synapse_0) + np.dot(layer_1_values[-1],synapse_h))

        # output layer (new binary representation)
        layer_2 = sigmoid(np.dot(layer_1,synapse_1))

        # did we miss?... if so by how much?
        layer_2_error = y - layer_2
        layer_2_deltas.append((layer_2_error)*sigmoid_output_to_derivative(layer_2))
        overallError += np.abs(layer_2_error[0])

        # decode estimate so we can print it out
        d[binary_dim - position - 1] = np.round(layer_2[0][0])

        # store hidden layer so we can use it in the next timestep
        layer_1_values.append(copy.deepcopy(layer_1))

    # backpropagation through time: revisit the time steps in reverse order
    future_layer_1_delta = np.zeros(hidden_dim)

    for position in range(binary_dim):

        X = np.array([[a[position],b[position]]])
        layer_1 = layer_1_values[-position-1]
        prev_layer_1 = layer_1_values[-position-2]

        # error at output layer
        layer_2_delta = layer_2_deltas[-position-1]
        # error at hidden layer (from the output and from the next time step)
        layer_1_delta = (future_layer_1_delta.dot(synapse_h.T) + \
            layer_2_delta.dot(synapse_1.T)) * sigmoid_output_to_derivative(layer_1)
        # accumulate the weight updates over all time steps
        synapse_1_update += np.atleast_2d(layer_1).T.dot(layer_2_delta)
        synapse_h_update += np.atleast_2d(prev_layer_1).T.dot(layer_1_delta)
        synapse_0_update += X.T.dot(layer_1_delta)

        future_layer_1_delta = layer_1_delta

    # apply the accumulated updates to the shared weights
    synapse_0 += synapse_0_update * alpha
    synapse_1 += synapse_1_update * alpha
    synapse_h += synapse_h_update * alpha

    synapse_0_update *= 0
    synapse_1_update *= 0
    synapse_h_update *= 0

    # print out progress
    if(j % 1000 == 0):
        print("Error:" + str(overallError))
        print("Pred:" + str(d))
        print("True:" + str(c))
        out = 0
        for index,x in enumerate(reversed(d)):
            out += x*pow(2,index)
        print(str(a_int) + " + " + str(b_int) + " = " + str(out))
        print("------------")
```

```
---binary---
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]
 ...
 [1 1 1 ... 1 0 1]
 [1 1 1 ... 1 1 0]
 [1 1 1 ... 1 1 1]]


---int2binary---
{0: array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), 1: array([0, 0, 0, 0, 0, 0, 0, 1], dtype=uint8), 2: array([0, 0, 0, 0, 0, 0, 1, 0], dtype=uint8), 3: array([0, 0, 0, 0, 0, 0, 1, 1], dtype=uint8), 4: array([0, 0, 0, 0, 0, 1, 0, 0], dtype=uint8), 5: array([0, 0, 0, 0, 0, 1, 0, 1], dtype=uint8), 6: array([0, 0, 0, 0, 0, 1, 1, 0], dtype=uint8), 7: array([0, 0, 0, 0, 0, 1, 1, 1], dtype=uint8), 8: array([0, 0, 0, 0, 1, 0, 0, 0], dtype=uint8), 9: array([0, 0, 0, 0, 1, 0, 0, 1], dtype=uint8), 10: array([0, 0, 0, 0, 1, 0, 1, 0], dtype=uint8), 11: array([0, 0, 0, 0, 1, 0, 1, 1], dtype=uint8), 12: array([0, 0, 0, 0, 1, 1, 0, 0], dtype=uint8), 13: array([0, 0, 0, 0, 1, 1, 0, 1], dtype=uint8), 14: array([0, 0, 0, 0, 1, 1, 1, 0], dtype=uint8), 15: array([0, 0, 0, 0, 1, 1, 1, 1], dtype=uint8), 16: array([0, 0, 0, 1, 0, 0, 0, 0], dtype=uint8), 17: arr

Error:[3.63389116]
Pred:[1 1 1 1 1 1 1 1]
True:[0 0 1 1 1 1 1 1]
28 + 35 = 255
------------
Error:[3.91366595]
Pred:[0 1 0 0 1 0 0 0]
True:[1 0 1 0 0 0 0 0]
116 + 44 = 72
------------
Error:[3.72191702]
Pred:[1 1 0 1 1 1 1 1]
True:[0 1 0 0 1 1 0 1]
4 + 73 = 223
------------
Error:[3.5852713]
Pred:[0 0 0 0 1 0 0 0]
True:[0 1 0 1 0 0 1 0]
71 + 11 = 8
------------
Error:[2.53352328]
Pred:[1 0 1 0 0 0 1 0]
True:[1 1 0 0 0 0 1 0]
81 + 113 = 162
------------
Error:[0.57691441]
Pred:[0 1 0 1 0 0 0 1]
True:[0 1 0 1 0 0 0 1]
81 + 0 = 81
------------
Error:[1.42589952]
Pred:[1 0 0 0 0 0 0 1]
True:[1 0 0 0 0 0 0 1]
4 + 125 = 129
------------
Error:[0.47477457]
Pred:[0 0 1 1 1 0 0 0]
True:[0 0 1 1 1 0 0 0]
39 + 17 = 56
------------
Error:[0.21595037]
Pred:[0 0 0 0 1 1 1 0]
True:[0 0 0 0 1 1 1 0]
11 + 3 = 14
------------
```
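
The backward pass above multiplies errors by `sigmoid_output_to_derivative(layer_1)` and `sigmoid_output_to_derivative(layer_2)`. This works because the sigmoid derivative can be written in terms of its output s = sigmoid(z) as s * (1 - s), so the activations stored during the forward pass are all the backward pass needs. A quick finite-difference check of that identity, using NumPy only:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    # Derivative of the sigmoid written in terms of its OUTPUT s = sigmoid(z):
    # d sigmoid / dz = s * (1 - s)
    return output * (1 - output)

z = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference
analytic = sigmoid_output_to_derivative(sigmoid(z))
print(np.max(np.abs(numeric - analytic)))   # tiny: the two agree to many decimal places
```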
