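A BRNN (Bidirectional Recurrent Neural Network) consists of one RNN that processes the sentence from left to right and a separate RNN that processes it from right to left. The hidden vectors of the two RNNs are concatenated (or averaged) to produce a representation at each word. In this way, the representation of each word is enriched by a variable-sized context around that word. The exact form of the BRNN we use, for $t = 1, \ldots, T$, is as follows.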
$$
\begin{align}
	&x_t = W_w \mathbb{I}_t \\
	&h_t^f = f(W_{fx}x_{t} + W_{f}h_{t-1}^f + b_f ) \\
	&h_t^b = f(W_{bx}x_{t} + W_{b}h_{t+1}^b + b_b ) \\
	&s_t = f(W_{d}(h_t^f + h_t^b) + b_d)
\end{align}
$$

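The BRNN consists of two independent processing streams, one moving left to right ($h_t^f$) and one moving right to left ($h_t^b$). The final $h$-dimensional representation $s_t$ for the $t$-th word is a function of both the word at that position and the surrounding context in the sentence. Common choices for the activation function $f$ are ReLU or tanh. $s_t$ serves as an individual fragment vector that encodes word $t$ and its context from both sides, by way of the two RNN streams.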
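
The recurrences above can be sketched in plain NumPy. This is only an illustrative sketch, not the implementation used in the example: the function name `brnn_forward`, the `params` dictionary layout, and the random weights are assumptions made here, and `X` is assumed to already hold the embedded inputs $x_t$ (the lookup $x_t = W_w \mathbb{I}_t$ is skipped). The backward stream is run from the last position to the first, so its state at position $t$ is computed from the state at $t+1$.

```python
import numpy as np

def brnn_forward(X, params, f=np.tanh):
    """X: (T, d_in) array of inputs x_1..x_T. Returns S: (T, h) with rows s_t."""
    T = X.shape[0]
    h = params["W_f"].shape[0]
    h_f = np.zeros((T, h))
    h_b = np.zeros((T, h))

    # Forward stream: left to right, h_t^f depends on h_{t-1}^f.
    prev = np.zeros(h)
    for t in range(T):
        prev = f(params["W_fx"] @ X[t] + params["W_f"] @ prev + params["b_f"])
        h_f[t] = prev

    # Backward stream: right to left, h_t^b depends on h_{t+1}^b.
    nxt = np.zeros(h)
    for t in reversed(range(T)):
        nxt = f(params["W_bx"] @ X[t] + params["W_b"] @ nxt + params["b_b"])
        h_b[t] = nxt

    # Combine both streams: s_t = f(W_d (h_t^f + h_t^b) + b_d).
    return f((h_f + h_b) @ params["W_d"].T + params["b_d"])

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, d_in, h = 5, 3, 4
params = {
    "W_fx": rng.normal(scale=0.5, size=(h, d_in)),
    "W_bx": rng.normal(scale=0.5, size=(h, d_in)),
    "W_f": rng.normal(scale=0.5, size=(h, h)),
    "W_b": rng.normal(scale=0.5, size=(h, h)),
    "W_d": rng.normal(scale=0.5, size=(h, h)),
    "b_f": np.zeros(h),
    "b_b": np.zeros(h),
    "b_d": np.zeros(h),
}
X = rng.normal(size=(T, d_in))
S = brnn_forward(X, params)
print(S.shape)  # (5, 4)
```

Because of the backward stream, changing only the last input `X[-1]` also changes the first representation `S[0]`, which a purely left-to-right RNN would not do.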

![](http://cdn.images.postach.io/34375d19-06e8-41ef-90b9-f7fe3fc561d2/9b4518a9-f54a-41e2-9457-85807ac98116/223da3af-d22f-4e29-858d-2f9c82ef1d89.png)

* Structure of the example
![image](https://cloud.githubusercontent.com/assets/1518919/21098227/89a7d22a-c0ab-11e6-9d09-71da3610bd44.png)


In [1]:
import tensorflow as tf
import numpy as np

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


To classify images using a bidirectional recurrent neural network, we treat
every image as a sequence of pixel rows. Because an MNIST image is 28*28 px,
each sample becomes a sequence of 28 time steps with 28 input values per step.
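
As a small illustration of this row-as-time-step view, the sketch below reshapes a batch of flattened 784-pixel vectors into `(batch_size, n_steps, n_input)`. The array `flat` is made-up stand-in data, not real MNIST pixels, and the small batch size is chosen only to keep the example short.

```python
import numpy as np

batch_size, n_steps, n_input = 4, 28, 28

# Stand-in for a batch of flattened MNIST images (hypothetical values).
flat = np.arange(batch_size * n_steps * n_input, dtype=np.float32)
flat = flat.reshape(batch_size, n_steps * n_input)

# Each image becomes a sequence of 28 rows, each row holding 28 pixel values.
seq = flat.reshape(batch_size, n_steps, n_input)

# seq[i, t] is row t of image i, i.e. the input at time step t.
print(seq.shape)                                   # (4, 28, 28)
print(np.array_equal(seq[0, 1], flat[0, 28:56]))   # True
```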

In [2]:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)
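
With these settings, the number of optimization steps and logged lines follows from the loop condition `step * batch_size < training_iters` used in the training cell. The short sketch below simply replays that counter without running any training.

```python
batch_size = 128
training_iters = 100000
display_step = 10

# Replay the step counter of the training loop.
steps = []
step = 1
while step * batch_size < training_iters:
    steps.append(step)
    step += 1

# Steps at which a progress line is printed, expressed as "Iter" values.
logged = [s * batch_size for s in steps if s % display_step == 0]
print(len(steps), len(logged), logged[:3])  # 781 78 [1280, 2560, 3840]
```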


In [3]:
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

In [4]:
# Define weights
# Hidden layer weights => 2*n_hidden because of forward + backward cells
weights = tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))

In [5]:
def BiRNN(x, weights, biases):

    # Prepare data shape to match `bidirectional_rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2]) # shape=(28, ?, 28)
    # Reshape to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input]) # shape=(?, 28)
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x) # split along dimension 0 into n_steps pieces: 28 steps * (batch_size, n_input)

    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    outputs, _, _ = tf.nn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                              dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + biases
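
The transpose, reshape, and split steps inside `BiRNN` can be checked with NumPy equivalents. This is a sketch of the shape manipulation only: the input is random data, and `fw_last` / `bw_last` are zero-filled placeholders standing in for the forward and backward LSTM outputs at the last time step, which are concatenated to width `2*n_hidden` before the final matrix multiplication.

```python
import numpy as np

batch_size, n_steps, n_input, n_hidden, n_classes = 3, 28, 28, 128, 10
x = np.random.default_rng(0).normal(size=(batch_size, n_steps, n_input))

# Same three steps as in BiRNN, written with NumPy.
x_t = np.transpose(x, (1, 0, 2))           # (n_steps, batch_size, n_input)
x_flat = x_t.reshape(-1, n_input)          # (n_steps*batch_size, n_input)
steps = np.split(x_flat, n_steps, axis=0)  # n_steps arrays of (batch_size, n_input)

# Each list element is one time step (one image row) for the whole batch.
same = all(np.array_equal(steps[t], x[:, t, :]) for t in range(n_steps))
print(len(steps), steps[0].shape, same)    # 28 (3, 28) True

# Placeholders for the last-step outputs of the two cells (not real LSTM outputs).
fw_last = np.zeros((batch_size, n_hidden))
bw_last = np.zeros((batch_size, n_hidden))
last = np.concatenate([fw_last, bw_last], axis=1)   # (batch_size, 2*n_hidden)
W = np.zeros((2 * n_hidden, n_classes))
b = np.zeros(n_classes)
logits = last @ W + b
print(logits.shape)                         # (3, 10)
```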

In [6]:
pred = BiRNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
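
To make the loss and accuracy definitions concrete, here is a NumPy sketch of the same two quantities: mean softmax cross-entropy over one-hot labels, and the fraction of samples whose arg-max prediction matches the label. The helper names and the tiny `logits` / `labels` arrays are invented for illustration and are not part of the TensorFlow example.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-sample cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def accuracy(logits, labels):
    """Fraction of rows where the arg-max of logits matches the label."""
    return np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1))

# Made-up logits and one-hot labels for three samples and three classes.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 2.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])

cost = softmax_cross_entropy(logits, labels).mean()
acc = accuracy(logits, labels)
print(acc)  # two of the three predictions are correct
```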

In [7]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Iter 1280, Minibatch Loss= 1.858926, Training Accuracy= 0.36719
Iter 2560, Minibatch Loss= 1.536841, Training Accuracy= 0.47656
Iter 3840, Minibatch Loss= 1.242447, Training Accuracy= 0.56250
Iter 5120, Minibatch Loss= 0.914665, Training Accuracy= 0.72656
Iter 6400, Minibatch Loss= 0.793335, Training Accuracy= 0.75000
Iter 7680, Minibatch Loss= 1.123333, Training Accuracy= 0.61719
Iter 8960, Minibatch Loss= 0.782696, Training Accuracy= 0.71094
Iter 10240, Minibatch Loss= 0.585669, Training Accuracy= 0.79688
Iter 11520, Minibatch Loss= 0.377622, Training Accuracy= 0.92188
Iter 12800, Minibatch Loss= 0.667088, Training Accuracy= 0.79688
Iter 14080, Minibatch Loss= 0.531336, Training Accuracy= 0.83594
Iter 15360, Minibatch Loss= 0.365528, Training Accuracy= 0.87500
Iter 16640, Minibatch Loss= 0.473822, Training Accuracy= 0.85938
Iter 17920, Minibatch Loss= 0.316416, Training Accuracy= 0.89062
Iter 19200, Minibatch Loss= 0.271850, Training Accuracy= 0.88281
Iter 20480, Minibatch Loss= 0.14