# Recurrent Neural Network Example

Build a recurrent neural network (LSTM) with TensorFlow.

- Author: Guorui Shen; original script: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb
- Project: https://github.com/aymericdamien/TensorFlow-Examples/
- Detailed description of this code: https://jasdeep06.github.io/posts/Understanding-LSTM-in-Tensorflow-MNIST/

## LSTM
Reference: https://en.wikipedia.org/wiki/LSTM
\begin{align}
&f_t = \sigma_g(W_fx_t+U_fh_{t-1}+b_f)\cr
&i_t = \sigma_g(W_ix_t+U_ih_{t-1}+b_i)\cr
&o_t = \sigma_g(W_ox_t+U_oh_{t-1}+b_o)\cr
&c_t = f_t\circ c_{t-1}+i_t\circ\sigma_c(W_cx_t+U_ch_{t-1}+b_c)\cr
&h_t = o_t\circ \sigma_h(c_t)
\end{align}
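where $\sigma_g$ is the sigmoid function, $\sigma_c$ and $\sigma_h$ are the hyperbolic tangent, $\circ$ denotes the element-wise (Hadamard) product, and the initial values are $c_0=0$ and $h_0=0$, with $c_t\in\mathbb{R}^h$ and $h_t\in\mathbb{R}^h$ ($h$ is the number of hidden units). An LSTM unit is shown as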
<img src="./pictures/lstm_structure.png" alt="structure of an LSTM unit" style="width: 300px;"/>
and, unrolled `timesteps` times, as
<img src="./pictures/unroll.png" alt="unrolling/unfolding a lstm unit multiple times" style="width: 900px;"/>
and finally, once the LSTM unit has been unrolled `timesteps` times, the information from all inputs $x_1, x_2, \cdots, x_{timesteps}$ is accumulated into the final hidden state $h_{timesteps}$.

Both pictures are from the Stanford lecture slides by Fei-Fei Li et al.
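The equations above can be sketched directly in NumPy. This is a minimal illustration, not the TensorFlow implementation: the helper names (`lstm_step`, `lstm_unroll`, `init_params`), the per-gate weight dictionary, and the random initialization are all assumptions made for the example. It applies one step per input and starts from $h_0 = c_0 = 0$, so the returned `h_T` is the $h_{timesteps}$ that accumulates the whole sequence.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the equations above (sigma_g = sigmoid, sigma_c = sigma_h = tanh)."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])       # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])       # input gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])       # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # '*' is the element-wise product (the circ operator)
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def lstm_unroll(xs, p, hidden_size):
    """Unroll the cell over xs = [x_1, ..., x_T], starting from h_0 = c_0 = 0."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
    return h, c   # h_T summarizes x_1 .. x_T

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random W, U and zero b for each of the four gates."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fioc":
        p["W_" + g] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        p["U_" + g] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        p["b_" + g] = np.zeros(hidden_size)
    return p

# Illustrative sizes matching the model below: 28 timesteps of 28 features, 128 hidden units
params = init_params(input_size=28, hidden_size=128)
sequence = np.random.default_rng(1).random((28, 28))   # one row per timestep
h_T, c_T = lstm_unroll(sequence, params, hidden_size=128)
print(h_T.shape, c_T.shape)  # (128,) (128,)
```

Because $h_t$ is a sigmoid output multiplied by a tanh, every entry of `h_T` lies strictly between -1 and 1.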

## RNN Overview

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" alt="nn" style="width: 600px;"/>

where $A$ in this picture represents an LSTM unit.

References:
- [Long Short-Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jürgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.

## MNIST Dataset Overview

This example uses the MNIST dataset of handwritten digits. It contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with pixel values from 0 to 1. For simplicity, each image has been flattened and converted to a 1-D NumPy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

To classify images using a recurrent neural network, we treat each image as a sequence of rows. Because MNIST images are 28x28 pixels, every sample becomes a sequence of 28 timesteps, and at each timestep the network receives one row of 28 pixels.

More info: http://yann.lecun.com/exdb/mnist/
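The row-by-row view can be sketched in NumPy. This minimal illustration assumes a random stand-in batch (`batch_x` here is synthetic, not real MNIST data, and the batch size of 4 is arbitrary); it reproduces the reshape used later in the training loop and checks that timestep `t` is exactly image row `t`.

```python
import numpy as np

timesteps, num_input = 28, 28   # 28 rows of 28 pixels
batch_size = 4                  # small illustrative batch (assumption)

# Stand-in for a batch of flattened MNIST images: shape (batch_size, 784), values in [0, 1)
batch_x = np.random.default_rng(0).random((batch_size, timesteps * num_input))

# Same reshape as the training loop: one image row per timestep
seq = batch_x.reshape((batch_size, timesteps, num_input))
print(seq.shape)  # (4, 28, 28)

# Row t of image 0 is pixels t*28 .. t*28+27 of the flattened vector
t = 5
assert np.array_equal(seq[0, t], batch_x[0, t * num_input:(t + 1) * num_input])
```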

In [1]:
from __future__ import print_function

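# Requires TensorFlow 1.x: tf.contrib and tensorflow.examples.tutorials do not exist in TensorFlow 2.x
import tensorflow as tf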
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [2]:
# Training Parameters
learning_rate = 0.001
training_steps = 10000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input: one image row of 28 pixels per timestep (img shape: 28*28)
timesteps = 28 # number of timesteps: one per image row
num_hidden = 128 # number of LSTM hidden units (size of h_t and c_t)
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

In [3]:
# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}

In [4]:
# tf.nn.static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)

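# input:
#     cell: the RNN cell used by the network, e.g. BasicRNNCell or BasicLSTMCell
#     inputs: a list of length T; each element is a Tensor of shape [batch_size, input_size]
#     initial_state: (optional) initial state of the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of Tensors, one of shape [batch_size, s] for each s in cell.state_size.
#     dtype: (optional) data type of the initial state and the expected output; required if initial_state is not provided.
#     sequence_length: (optional) length of each sequence in the batch; a vector of size batch_size.
#     scope: variable scope for the created subgraph
# output:
#     (outputs, state)
#     outputs: a list of length T holding the output for each input, i.e. one output per time step.
#     state: the final state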
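The `(outputs, state)` contract described above can be illustrated with a plain-Python stand-in. `static_rnn_sketch` and `running_sum_cell` are hypothetical names invented for this example and are not part of TensorFlow; the sketch only shows the control flow: one cell call per list element, every output collected, and only the last state returned.

```python
def static_rnn_sketch(cell, inputs, initial_state):
    """Hypothetical pure-Python stand-in for the (outputs, state) contract of tf.nn.static_rnn."""
    state = initial_state
    outputs = []
    for x_t in inputs:              # inputs: list of length T
        output, state = cell(x_t, state)
        outputs.append(output)      # one output per time step
    return outputs, state           # state: the final state only

def running_sum_cell(x_t, state):
    """Toy cell (assumption): the state is a running sum and the output equals the new state."""
    new_state = state + x_t
    return new_state, new_state

outputs, state = static_rnn_sketch(running_sum_cell, [1, 2, 3, 4], initial_state=0)
print(outputs, state)  # [1, 3, 6, 10] 10
```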

In [5]:
def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

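    # Run the LSTM over the sequence:
    #   outputs: list of `timesteps` tensors of shape (batch_size, num_hidden), the hidden states h_1 .. h_timesteps
    #   states: final LSTMStateTuple (c, h), each of shape (batch_size, num_hidden)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)  # x: list of `timesteps` tensors of shape (batch_size, num_input)

    # Linear activation on the last output h_timesteps, which summarizes the whole sequence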
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
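As a rough NumPy sketch of what `RNN()` computes, the example below unstacks the input along axis 1, runs an LSTM, and applies a linear layer to the last output. The gate layout (one fused kernel over `[x_t, h_{t-1}]`, split into `i, j, f, o`, with `forget_bias` added to the forget-gate pre-activation) is our understanding of how `BasicLSTMCell` is organized, so treat it as an assumption; the helper `rnn_forward` and all weights here are hypothetical and untrained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(x, kernel, bias, w_out, b_out, num_hidden, forget_bias=1.0):
    """Hypothetical NumPy sketch of RNN(x): unstack over time, run an LSTM, linear layer on the last output."""
    batch_size, timesteps, _ = x.shape
    inputs = [x[:, t, :] for t in range(timesteps)]   # like tf.unstack(x, timesteps, 1)
    c = np.zeros((batch_size, num_hidden))            # c_0 = 0
    h = np.zeros((batch_size, num_hidden))            # h_0 = 0
    outputs = []
    for x_t in inputs:
        # Fused gate pre-activations: [x_t, h_{t-1}] @ kernel + bias -> 4 blocks of num_hidden columns
        z = np.concatenate([x_t, h], axis=1) @ kernel + bias
        i, j, f, o = np.split(z, 4, axis=1)
        c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
        h = np.tanh(c) * sigmoid(o)
        outputs.append(h)
    logits = outputs[-1] @ w_out + b_out              # linear layer on h_timesteps
    return logits, outputs, (c, h)

# Illustrative sizes and random weights (assumptions, not trained values)
batch_size, timesteps, num_input, num_hidden, num_classes = 2, 28, 28, 128, 10
rng = np.random.default_rng(0)
x = rng.random((batch_size, timesteps, num_input))
kernel = rng.normal(scale=0.1, size=(num_input + num_hidden, 4 * num_hidden))
bias = np.zeros(4 * num_hidden)
w_out = rng.normal(size=(num_hidden, num_classes))
b_out = rng.normal(size=num_classes)

logits, outputs, (c_T, h_T) = rnn_forward(x, kernel, bias, w_out, b_out, num_hidden)
print(logits.shape, len(outputs), outputs[0].shape)  # (2, 10) 28 (2, 128)
```

This also shows why `outputs[-1]` is the natural input to the classifier: for an LSTM cell it is the same value as the `h` component of the final state.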

In [6]:
logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model: fraction of samples whose predicted class matches the label (this model uses no dropout)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.
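For intuition, the `loss_op` and `accuracy` ops defined above can be sketched in NumPy. The function names and the toy logits are assumptions for illustration; the math is the standard softmax cross-entropy with one-hot labels, averaged over the batch, plus the argmax comparison used for accuracy. Logits that are all equal give every class probability 0.1, so the loss is $\log 10 \approx 2.30$.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, labels):
    """Mean over the batch of -sum_k y_k * log(softmax(logits)_k)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))   # log-softmax
    return float(np.mean(-(labels * log_probs).sum(axis=1)))

def accuracy(logits, labels):
    """Fraction of rows where argmax(prediction) == argmax(label)."""
    return float(np.mean(np.argmax(softmax(logits), 1) == np.argmax(labels, 1)))

labels = np.eye(10)[[3, 7]]      # two one-hot labels: digits 3 and 7
uniform = np.zeros((2, 10))      # every class gets probability 0.1
print(softmax_cross_entropy(uniform, labels))   # about 2.3026, i.e. log(10)

confident = np.zeros((2, 10))
confident[0, 3] = 5.0            # first sample predicted as 3 (correct)
confident[1, 2] = 5.0            # second sample predicted as 2 (wrong, label is 7)
print(accuracy(confident, labels))   # 0.5
```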



In [7]:
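# Optional (safe to delete): demonstrates what tf.unstack does
import numpy as np
a = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[13,14,15],[16,17,18],[19,20,21],[22,23,24]]])
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)
    x = tf.constant(a)  # shape (2, 4, 3): batch_size=2, timesteps=4, num_input=3
    x = tf.unstack(x, 4, 1)  # split along axis 1 -> list of 4 tensors of shape (2, 3)
    x_val = sess.run(x)  # 4 arrays, each of shape (2, 3)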
    print(len(x_val), len(x_val[0]), len(x_val[0][0]))
    print(x_val)

4 2 3
[array([[ 1,  2,  3],
       [13, 14, 15]]), array([[ 4,  5,  6],
       [16, 17, 18]]), array([[ 7,  8,  9],
       [19, 20, 21]]), array([[10, 11, 12],
       [22, 23, 24]])]
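The same split can be reproduced with plain NumPy slicing, which may make the axis semantics easier to see. This is only an analogy for illustration (NumPy does not call `tf.unstack`): taking `a[:, t, :]` for each `t` yields one `(batch_size, num_input)` array per timestep, and `np.moveaxis(a, 1, 0)` gives the same arrays stacked along a new leading axis.

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
              [[13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]]])   # shape (2, 4, 3)

# Analogue of tf.unstack(x, 4, 1): slice along axis 1, one (2, 3) array per timestep
unstacked = [a[:, t, :] for t in range(a.shape[1])]
print(len(unstacked), unstacked[0].shape)   # 4 (2, 3)
print(unstacked[0].tolist())                 # [[1, 2, 3], [13, 14, 15]]

# Equivalent view: move the time axis to the front
time_major = np.moveaxis(a, 1, 0)            # shape (4, 2, 3)
```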


In [8]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy on the first 128 MNIST test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

Step 1, Minibatch Loss= 2.7322, Training Accuracy= 0.078
Step 200, Minibatch Loss= 2.0772, Training Accuracy= 0.266
Step 400, Minibatch Loss= 1.9767, Training Accuracy= 0.336
Step 600, Minibatch Loss= 1.8273, Training Accuracy= 0.422
Step 800, Minibatch Loss= 1.7489, Training Accuracy= 0.438
Step 1000, Minibatch Loss= 1.6559, Training Accuracy= 0.391
Step 1200, Minibatch Loss= 1.4982, Training Accuracy= 0.523
Step 1400, Minibatch Loss= 1.3998, Training Accuracy= 0.539
Step 1600, Minibatch Loss= 1.3682, Training Accuracy= 0.562
Step 1800, Minibatch Loss= 1.2685, Training Accuracy= 0.586
Step 2000, Minibatch Loss= 1.3515, Training Accuracy= 0.586
Step 2200, Minibatch Loss= 1.0189, Training Accuracy= 0.758
Step 2400, Minibatch Loss= 1.0394, Training Accuracy= 0.688
Step 2600, Minibatch Loss= 1.0661, Training Accuracy= 0.695
Step 2800, Minibatch Loss= 1.1419, Training Accuracy= 0.633
Step 3000, Minibatch Loss= 1.0063, Training Accuracy= 0.641
Step 3200, Minibatch Loss= 1.0878, Training Acc