## Bidirectional Recurrent Neural Network (BiRNN)

Build a bidirectional recurrent neural network (LSTM) with TensorFlow.

- Author: Aymeric Damien
- Code: https://github.com/aymericdamien/TensorFlow-Examples/

#### BiRNN Overview

<img src="https://ai2-s2-public.s3.amazonaws.com/figures/2016-11-08/191dd7df9cb91ac22f56ed0dfa4a5651e8767a51/1-Figure2-1.png" alt="nn" style="width: 600px;"/>

References:
- [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jurgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.
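
As a rough illustration of the architecture in the figure, the sketch below runs a recurrent cell over a sequence once left to right and once right to left, then concatenates the two hidden states at each time step. It makes several simplifying assumptions: NumPy instead of TensorFlow, a plain tanh cell instead of an LSTM, and random weights. The names `rnn_pass`, `birnn`, and `make_params` are illustrative and do not belong to any library.

```python
import numpy as np

def rnn_pass(xs, W_x, W_h, b):
    """Run a plain tanh RNN over a list of (batch, num_input) arrays."""
    h = np.zeros((xs[0].shape[0], W_h.shape[0]))
    outputs = []
    for x_t in xs:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    return outputs

def birnn(xs, fw_params, bw_params):
    """Concatenate forward and backward hidden states at each time step."""
    fw = rnn_pass(xs, *fw_params)
    # Run the backward pass on the reversed sequence, then restore time order.
    bw = rnn_pass(xs[::-1], *bw_params)[::-1]
    return [np.concatenate([f, b], axis=1) for f, b in zip(fw, bw)]

rng = np.random.default_rng(0)
batch, timesteps, num_input, num_hidden = 4, 28, 28, 8

def make_params():
    # Small random weights; a trained model would learn these instead.
    return (rng.normal(size=(num_input, num_hidden)) * 0.1,
            rng.normal(size=(num_hidden, num_hidden)) * 0.1,
            np.zeros(num_hidden))

fw_params, bw_params = make_params(), make_params()
xs = [rng.normal(size=(batch, num_input)) for _ in range(timesteps)]
outputs = birnn(xs, fw_params, bw_params)
print(len(outputs), outputs[-1].shape)  # 28 (4, 16)
```

Because the outputs in this sketch are aligned by time step, the forward half of `outputs[-1]` has seen the whole sequence, while the backward half has processed only the final input.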


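#### MNIST Dataset Overview

This example uses the MNIST database of handwritten digits. The dataset contains 70,000 examples: 60,000 for training and 10,000 for testing. The digits have been size-normalized and centered in fixed-size images (28\*28 pixels), with pixel values scaled to the range [0, 1]. For simplicity, each image is flattened into a 1-D NumPy array of 784 features (28\*28).

![MNIST dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)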
  
More details: http://yann.lecun.com/exdb/mnist/
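
To make the flattening concrete, here is a minimal NumPy sketch. It uses a random stand-in image rather than real MNIST data: it scales 8-bit pixel values to [0, 1], flattens the 28\*28 matrix into 784 features, and reshapes the result back.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one 28x28 grayscale digit with 8-bit pixel values (not real MNIST data).
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Scale pixel values to the [0, 1] range and flatten row by row.
flat = (image.astype(np.float32) / 255.0).reshape(-1)
print(flat.shape)  # (784,)

# Flattening only changes the layout: reshaping restores the 28x28 image.
restored = flat.reshape(28, 28)
print(np.array_equal(np.round(restored * 255).astype(np.uint8), image))  # True
```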

In [1]:
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

#### 1. Import the MNIST Data

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/", one_hot=True)   # the files are stored in a data/ folder under the current directory

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
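
With `one_hot=True`, each label is a length-10 vector holding a single 1 at the index of the digit. A small NumPy sketch of that encoding, and of recovering the digit with `argmax`, follows; the label values are made up for illustration.

```python
import numpy as np

labels = np.array([3, 0, 9])                      # hypothetical digit labels
one_hot = np.eye(10, dtype=np.float32)[labels]    # one row of the identity matrix per label
print(one_hot[0])                                 # 1.0 at index 3, zeros elsewhere
print(np.argmax(one_hot, axis=1))                 # [3 0 9]
```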


#### 2. Define Parameters

In [8]:
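# Training parameters
learning_rate = 0.001
training_step = 10000
batch_size = 128
display_step = 200

# Network parameters
num_input = 28    # MNIST data input (images are 28*28 pixels; one row per time step)
timesteps = 28    # number of time steps (one per image row)
num_hidden = 128  # number of features in the hidden layer
num_classes = 10  # number of MNIST classes (digits 0-9)

# TensorFlow graph inputs
X = tf.placeholder('float',[None,timesteps,num_input])
Y = tf.placeholder('float',[None,num_classes])

# Output-layer weights and biases
weights = {
    # 2 * num_hidden rows, because the forward and backward cell outputs are concatenated
    'out':tf.Variable(tf.random_normal([2*num_hidden,num_classes]))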
}
biases = {
    'out':tf.Variable(tf.random_normal([num_classes]))
}
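
The shapes declared above can be checked with plain NumPy. The sketch below uses random arrays in place of real images and LSTM outputs. It splits a batch of shape `(batch_size, timesteps, num_input)` into a list of `timesteps` arrays, which is the layout `tf.unstack(x, timesteps, 1)` produces, and shows why the output weight matrix needs `2*num_hidden` rows. The names `steps`, `fw_out`, `bw_out`, `W_out`, and `b_out` are chosen here for illustration.

```python
import numpy as np

batch_size, timesteps, num_input = 128, 28, 28
num_hidden, num_classes = 128, 10
rng = np.random.default_rng(0)

# A batch of flattened images, reshaped so that each image row is one time step.
batch_x = rng.random((batch_size, timesteps * num_input)).reshape(
    (batch_size, timesteps, num_input))

# NumPy analogue of tf.unstack(x, timesteps, 1): one (batch, num_input) array per step.
steps = [batch_x[:, t, :] for t in range(timesteps)]
print(len(steps), steps[0].shape)  # 28 (128, 28)

# Random stand-ins for the forward and backward cell outputs at one time step.
fw_out = rng.normal(size=(batch_size, num_hidden))
bw_out = rng.normal(size=(batch_size, num_hidden))
combined = np.concatenate([fw_out, bw_out], axis=1)  # shape (128, 256)

W_out = rng.normal(size=(2 * num_hidden, num_classes))
b_out = rng.normal(size=(num_classes,))
logits = combined @ W_out + b_out
print(logits.shape)  # (128, 10)
```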

#### 3. Build the Model

In [13]:
def BiRNN(x,weights,biases):
    # Prepare the data in the shape the RNN function expects
    # Current input shape: (batch_size, timesteps, num_input)
    # Required shape: a list of 'timesteps' tensors, each of shape (batch_size, num_input)


    # Unstack along the time axis to get that list of 'timesteps' tensors
    x = tf.unstack(x,timesteps,1)
    
    
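    # Define the LSTM cells with TensorFlow
    # reuse=tf.AUTO_REUSE creates the variables on the first run and reuses them when the
    # notebook cell is re-run; reuse=True fails on a fresh graph where no variables exist yet
    # Forward cell
    lstm_fw_cell = rnn.BasicLSTMCell(num_hidden,forget_bias = 1.0,reuse=tf.AUTO_REUSE)
    # Backward cell
    lstm_bw_cell = rnn.BasicLSTMCell(num_hidden,forget_bias = 1.0,reuse=tf.AUTO_REUSE)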
    
    
    # Get the LSTM cell outputs
    try:
        outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                              dtype=tf.float32)
    # Older TensorFlow versions return only outputs, not the states
    except Exception:
        outputs = rnn.static_bidirectional_rnn(lstm_fw_cell,lstm_bw_cell,x,
                                              dtype = tf.float32)

    # Linear activation, using the last output of the RNN inner loop
    return tf.matmul(outputs[-1],weights['out']) + biases['out']



logits = BiRNN(X,weights,biases)
prediction = tf.nn.softmax(logits)

# Define the loss and the optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits= logits,labels = Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate= learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate the model (with test logits; dropout is not used)
correct_pred = tf.equal(tf.argmax(prediction,1),tf.argmax(Y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32)) # tf.cast converts the tensor to the given dtype

# Initialize the variables
init = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.
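
For reference, the loss and accuracy ops defined in this cell can be approximated in NumPy as shown below. This is a sketch with made-up logits and labels, not TensorFlow's implementation; `softmax` and `cross_entropy` are helper names chosen here.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, one_hot_labels):
    # Batch mean of -sum(labels * log(softmax(logits))).
    log_probs = np.log(softmax(logits))
    return -np.mean(np.sum(one_hot_labels * log_probs, axis=1))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(logits)
loss = cross_entropy(logits, labels)
# Fraction of rows where the predicted class matches the labeled class.
accuracy = np.mean(np.argmax(probs, axis=1) == np.argmax(labels, axis=1))
print(accuracy)  # 0.5
```

The first row is classified correctly and the second is not, so the accuracy is 0.5, mirroring what `tf.equal` followed by `tf.reduce_mean` computes over a batch.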



#### 4. Train the Model

In [15]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1 , training_step + 1):
        batch_x ,batch_y = mnist.train.next_batch(batch_size)
        # Reshape each image into a sequence of 28 time steps with 28 elements each
        batch_x = batch_x.reshape((batch_size,timesteps,num_input))
        # Run the optimizer (backpropagation)
        sess.run(train_op,feed_dict = {X : batch_x,Y : batch_y})
        if step % display_step == 0 or step == 1 :
            # Compute the batch loss and accuracy
            loss, acc = sess.run([loss_op,accuracy],feed_dict = {X : batch_x,Y : batch_y})
            print('step ' +str(step) + ', Minibatch Loss = ' + \
                 '{:.4f}'.format(loss) + ', Training Accuracy = ' + \
                 '{:.3f}'.format(acc))
    print ('Training finished!')


    # Compute the accuracy on the first 128 MNIST test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1,timesteps,num_input))
    test_label = mnist.test.labels[:test_len]
    print ('Testing Accuracy :', sess.run(accuracy,feed_dict = {X:test_data,Y:test_label}))

step 1, Minibatch Loss = 2.5691, Training Accuracy = 0.094
step 200, Minibatch Loss = 2.0525, Training Accuracy = 0.320
step 400, Minibatch Loss = 1.8990, Training Accuracy = 0.344
step 600, Minibatch Loss = 1.8753, Training Accuracy = 0.352
step 800, Minibatch Loss = 1.7142, Training Accuracy = 0.383
step 1000, Minibatch Loss = 1.5996, Training Accuracy = 0.398
step 1200, Minibatch Loss = 1.4993, Training Accuracy = 0.539
step 1400, Minibatch Loss = 1.4508, Training Accuracy = 0.469
step 1600, Minibatch Loss = 1.5240, Training Accuracy = 0.523
step 1800, Minibatch Loss = 1.2338, Training Accuracy = 0.625
step 2000, Minibatch Loss = 1.2532, Training Accuracy = 0.531
step 2200, Minibatch Loss = 1.1183, Training Accuracy = 0.625
step 2400, Minibatch Loss = 1.3090, Training Accuracy = 0.570
step 2600, Minibatch Loss = 1.0804, Training Accuracy = 0.633
step 2800, Minibatch Loss = 1.0308, Training Accuracy = 0.727
step 3000, Minibatch Loss = 1.1369, Training Accuracy = 0.656
step 3200, Mini