# Recurrent Neural Network Example

Build a recurrent neural network (LSTM) with TensorFlow.

## RNN Overview

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" alt="nn" style="width: 600px;"/>

References:
- [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jurgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.
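To make the LSTM cell concrete, here is a minimal pure-Python sketch of a single LSTM time step for one sample. It follows the standard LSTM gate equations and, modelled on the `forget_bias=1.0` argument passed to `BasicLSTMCell` later in this notebook, adds a constant to the forget-gate pre-activation. The helper name `lstm_step`, the row layout of `W` and the gate order (input, candidate, forget, output) are illustrative assumptions, not TensorFlow's API.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, W, b, forget_bias=1.0):
    """One LSTM time step (hypothetical helper, plain lists instead of tensors).

    x: input features at this timestep; h, c: previous hidden and cell state (length H).
    W: 4*H weight rows, each of length len(x) + H; b: 4*H biases.
    Rows are assumed to be ordered as input gate, candidate, forget gate, output gate.
    """
    H = len(h)
    xh = x + h  # concatenate the current input with the previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + bias for row, bias in zip(W, b)]
    i, j, f, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
    # New cell state: keep part of the old memory, add part of the candidate
    new_c = [c_k * sigmoid(f_k + forget_bias) + sigmoid(i_k) * math.tanh(j_k)
             for c_k, i_k, j_k, f_k in zip(c, i, j, f)]
    # New hidden state: squashed cell state, scaled by the output gate
    new_h = [math.tanh(c_k) * sigmoid(o_k) for c_k, o_k in zip(new_c, o)]
    return new_h, new_c


# With all-zero weights every gate pre-activation is 0, so only forget_bias matters
H, n_in = 2, 3
W = [[0.0] * (n_in + H) for _ in range(4 * H)]
b = [0.0] * (4 * H)
h, c = lstm_step([0.5, -0.2, 0.1], [0.0, 0.0], [1.0, -1.0], W, b)
print(c)  # each entry is the old cell value scaled by sigmoid(1.0)
```

Unrolling the network, as in the figure above, amounts to calling this step once per timestep and feeding the returned `h` and `c` into the next call.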

## MNIST Dataset Overview

This example uses the MNIST dataset of handwritten digits.
The dataset contains 60,000 examples for training and 10,000 examples for testing.
The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 1.
For simplicity, each image has been flattened and converted to a 1-D numpy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

<u>To classify images using a recurrent neural network, we treat each image as a sequence of its rows.
Because an MNIST image is 28x28 pixels, every sample becomes a sequence of 28 timesteps, and at each timestep the network receives one row of 28 pixels.</u>
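As a plain-Python illustration of this row-by-row view, a flattened 784-pixel image can be cut into 28 timesteps of 28 pixels (the helper `to_timesteps` is hypothetical). Because the images are flattened row by row, timestep `t` holds pixels `28*t` through `28*t + 27`, which is what NumPy's `reshape((batch_size, timesteps, num_input))` in the training loop produces for each image.

```python
def to_timesteps(flat_image, timesteps=28, num_input=28):
    """Split a flattened image into `timesteps` rows of `num_input` pixels (row-major)."""
    if len(flat_image) != timesteps * num_input:
        raise ValueError("expected %d pixels, got %d" % (timesteps * num_input, len(flat_image)))
    return [flat_image[t * num_input:(t + 1) * num_input] for t in range(timesteps)]


# Toy "image" whose pixel values are just their flat indices 0..783
image = list(range(28 * 28))
rows = to_timesteps(image)
print(len(rows), len(rows[0]))  # 28 28
print(rows[1][:3])              # [28, 29, 30]: the 2nd timestep starts at pixel 28
```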

More info: http://yann.lecun.com/exdb/mnist/

```python
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn  # TensorFlow 1.x only; tf.contrib was removed in TensorFlow 2.x

# Import MNIST data (labels are one-hot encoded)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)
```

```
Extracting /data/train-images-idx3-ubyte.gz
Extracting /data/train-labels-idx1-ubyte.gz
Extracting /data/t10k-images-idx3-ubyte.gz
Extracting /data/t10k-labels-idx1-ubyte.gz
```
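With `one_hot=True`, each label is returned as a 10-element vector with a single 1 at the index of the digit, which is the format the `Y` placeholder and the cross-entropy loss expect. A minimal sketch of that encoding, using a hypothetical `one_hot` helper that is not part of `input_data`:

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list of floats."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```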


```python
# Training Parameters
learning_rate = 0.001
training_epochs = 1000  # number of training steps (one mini-batch each), not full passes over the data
batch_size = 128
display_step = 200      # log loss and accuracy every `display_step` steps

# Network Parameters
num_input = 28    # MNIST data input: pixels per image row (img shape: 28*28)
timesteps = 28    # number of timesteps: one image row per step
num_hidden = 128  # number of units in the LSTM hidden state
num_classes = 10  # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')
```
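A few quantities follow directly from these parameters. The sketch below computes how many passes over the training set the 1,000 steps amount to, assuming `mnist.train` holds 55,000 images (by default `read_data_sets` sets aside 5,000 of the 60,000 training images for validation), and how many trainable parameters the model has, assuming the standard LSTM layout of four gates, each with weights over the concatenated input and hidden state plus a bias. Both helper names are hypothetical.

```python
def epochs_covered(steps, batch_size, train_size):
    """Number of full passes over the training set made by `steps` mini-batches."""
    return steps * batch_size / float(train_size)


def count_params(num_input, num_hidden, num_classes):
    """Trainable parameters of one LSTM layer plus the linear output layer."""
    lstm = 4 * ((num_input + num_hidden) * num_hidden + num_hidden)  # 4 gates: weights + bias
    output = num_hidden * num_classes + num_classes                  # weights['out'] + biases['out']
    return lstm, output


print(round(epochs_covered(1000, 128, 55000), 2))  # 2.33
print(count_params(28, 128, 10))                   # (80384, 1290)
```

Under these assumptions, the `training_epochs = 1000` setting covers only a little over two epochs of data.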

```python
# Define the output layer, which maps the last LSTM output to one score per class
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]), name='Weights')
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]), name='Bias')
}

# Record histograms of the output-layer parameters for TensorBoard
tf.summary.histogram("weights", weights['out'])
tf.summary.histogram("biases", biases['out'])
```

```
<tf.Tensor 'biases:0' shape=() dtype=string>
```
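The `weights['out']` matrix of shape `[num_hidden, num_classes]` and the `biases['out']` vector form a plain linear layer: the score for class `k` is the dot product of the LSTM's last output with column `k` of the weight matrix, plus bias `k`. A pure-Python sketch for a single sample, using a hypothetical `linear` helper and toy sizes instead of 128 and 10:

```python
def linear(h, W, b):
    """Compute h @ W + b for one sample: h has length H, W is H x C, b has length C."""
    return [sum(h[j] * W[j][k] for j in range(len(h))) + b[k] for k in range(len(b))]


h = [1.0, 2.0]            # toy last LSTM output (H = 2)
W = [[1.0, 0.0, -1.0],    # H x C weight matrix (C = 3 classes)
     [0.5, 1.0, 0.0]]
b = [0.0, 0.5, 0.25]
print(linear(h, W, b))    # [2.0, 2.5, -0.75]
```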

```python
def RNN(x, weights, biases):
    with tf.name_scope('Model'):
        # Prepare data shape to match `static_rnn` requirements
        # Current data input shape: (batch_size, timesteps, num_input)
        # Required shape: a list of `timesteps` tensors, each of shape (batch_size, num_input)

        # Unstack along axis 1. With batch_size = 128, x becomes 28 tensors of shape 128 x 28:
        # the 1st tensor holds the 1st row (28 pixels) of all 128 images,
        # the 2nd tensor holds the 2nd row, and so on.
        x = tf.unstack(x, timesteps, 1)

        # Define an LSTM cell with `num_hidden` units
        lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

        # Unroll the cell over the list of inputs
        # https://www.tensorflow.org/api_docs/python/tf/nn/static_rnn
        # static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)
        # `inputs` is a length-T list of tensors, each of shape [batch_size, input_size].
        outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

        # Linear activation on the output of the last timestep, outputs[-1]
        return tf.matmul(outputs[-1], weights['out']) + biases['out']
```
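`tf.unstack(x, timesteps, 1)` splits the batch along axis 1, turning one `(batch_size, timesteps, num_input)` tensor into a list of `timesteps` tensors of shape `(batch_size, num_input)`. The same regrouping on nested Python lists, with a hypothetical `unstack_axis1` helper and toy sizes, looks like this:

```python
def unstack_axis1(batch):
    """Turn a [batch][timestep][feature] nested list into a list indexed by timestep.

    Element t of the result holds row t of every sample, mirroring tf.unstack(x, num, 1).
    """
    num_steps = len(batch[0])
    return [[sample[t] for sample in batch] for t in range(num_steps)]


# Two samples, three timesteps, two features per timestep
batch = [
    [[1, 1], [2, 2], [3, 3]],        # sample 0
    [[10, 10], [20, 20], [30, 30]],  # sample 1
]
steps = unstack_axis1(batch)
print(len(steps))  # 3: one entry per timestep
print(steps[0])    # [[1, 1], [10, 10]]: the first row of every sample
```

`static_rnn` then feeds `steps[0]`, `steps[1]`, ... to the LSTM cell in order, and only `outputs[-1]`, the output after the last row, reaches the linear layer.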

```python
logits = RNN(X, weights, biases)  # shape: (batch_size, 10)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
tf.summary.scalar("loss", loss_op)  # Create a summary to monitor loss_op tensor
with tf.name_scope('optimizer'):
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.9, beta2=0.999,
                                       epsilon=1e-08, use_locking=False, name='Adam')
train_op = optimizer.minimize(loss_op)

# Evaluate model: compare the predicted class with the true class
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
tf.summary.scalar("accuracy", accuracy)  # Create a summary to monitor accuracy tensor

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
```
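Per example, `tf.nn.softmax`, `softmax_cross_entropy_with_logits` and the argmax-based accuracy reduce to a few lines of arithmetic. The following pure-Python sketch (all helper names are hypothetical) shows what the graph computes before `tf.reduce_mean` averages over the batch. It subtracts the largest logit before exponentiating, a common trick for numerical stability. Since softmax preserves the ordering of the logits, taking the argmax of `prediction` or of `logits` picks the same class.

```python
import math


def softmax(logits):
    m = max(logits)  # shift by the largest logit to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """-sum(y * log(softmax(logits))) for one sample; zero entries of y contribute nothing."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])


print(softmax([0.0, 0.0]))                     # [0.5, 0.5]
print(cross_entropy([0.0, 0.0], [1.0, 0.0]))   # log(2), about 0.6931

# Accuracy over a toy batch: the fraction of samples whose predicted class matches the label
batch_logits = [[2.0, 0.5], [0.1, 1.0], [3.0, -1.0]]
batch_labels = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
correct = [argmax(l) == argmax(y) for l, y in zip(batch_logits, batch_labels)]
print(sum(correct) / float(len(correct)))      # 2 of 3 samples are correct
```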

```python
import os
import shutil

if os.path.exists("mnist_rnn_logs/"):
    shutil.rmtree("mnist_rnn_logs/")
```

```python
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    summary_op = tf.summary.merge_all()  # Merge all summaries into a single op
    summary_writer = tf.summary.FileWriter("mnist_rnn_logs/", graph=tf.get_default_graph())

    saver = tf.train.Saver(max_to_keep=2)  # keep only the two most recent checkpoints

    # Each iteration trains on one mini-batch, so `epoch` counts training steps
    for epoch in range(1, training_epochs + 1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape each flat 784-pixel image into 28 timesteps of 28 pixels
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if epoch % display_step == 0 or epoch == 1:
            # Calculate batch loss, accuracy and summaries in a single run
            loss, acc, summary_str = sess.run([loss_op, accuracy, summary_op],
                                              feed_dict={X: batch_x, Y: batch_y})
            print("Epoch {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}".format(epoch, loss, acc))

            # Write logs every `display_step` steps
            summary_writer.add_summary(summary_str, epoch)

            # Save checkpoint
            saver.save(sess, "mnist_rnn_logs/model-checkpoint", global_step=epoch)

    summary_writer.close()  # flush pending summaries to disk
    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
```
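Each `sess.run(train_op, ...)` call in the loop above applies one Adam update to every trainable variable. As a rough illustration, here is the textbook Adam rule for a single scalar parameter in pure Python, with the same default `beta1`, `beta2` and `epsilon` values passed to `tf.train.AdamOptimizer`. This is a sketch, not TensorFlow's code: TensorFlow folds the bias corrections into the step size, so `epsilon` enters slightly differently. The helper `adam_step` and the toy quadratic loss are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimise the toy loss (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * (theta - 3.0), m, v, t, lr=0.1)
print(round(theta, 3))
```

The commented-out `GradientDescentOptimizer` line in the model cell would instead apply `theta -= learning_rate * grad`, with no running averages.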

```
Epoch 1, Minibatch Loss= 2.1852, Training Accuracy= 0.125
Epoch 200, Minibatch Loss= 0.2927, Training Accuracy= 0.883
Epoch 400, Minibatch Loss= 0.1874, Training Accuracy= 0.922
Epoch 600, Minibatch Loss= 0.0606, Training Accuracy= 0.992
Epoch 800, Minibatch Loss= 0.0818, Training Accuracy= 0.961
Epoch 1000, Minibatch Loss= 0.0303, Training Accuracy= 0.992
Optimization Finished!
Testing Accuracy: 0.984375
```
