# Recurrent Neural Network Example

Build a recurrent neural network (LSTM) with TensorFlow.
- https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb

## RNN Overview

![RNN-unrolled](RNN-unrolled.png)

References:
- [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jürgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.

## MNIST Dataset Overview

This example uses MNIST handwritten digits.
The dataset contains 60,000 examples for training and 10,000 examples for testing.
The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 1.
For simplicity, each image has been flattened and converted to a 1-D numpy array of 784 features (28*28).

![MNIST Dataset](mnist_100_digits.png)

<u>To classify images using a recurrent neural network, we treat each image as a sequence of its rows: every row of 28 pixels is the input at one timestep.
Because MNIST images are 28x28 px, every sample becomes one sequence of 28 timesteps with 28 features (pixels) per timestep.</u>
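The row-as-timestep view is just a reshape of the flattened 784-value vectors. The NumPy sketch below uses a small made-up batch of 4 "images" (consecutive integers instead of real pixels, an assumption for readability) to show that timestep `t` of an image is exactly pixels `t*28` to `t*28+27` of its flattened form.

```python
import numpy as np

# Illustrative stand-in for a batch of flattened MNIST images: 4 images of
# 784 values each (the notebook itself uses batch_size = 128).
batch_size, timesteps, num_input = 4, 28, 28
flat = np.arange(batch_size * 784, dtype=np.float32).reshape(batch_size, 784)

# Reshape each 784-vector into 28 rows of 28 pixels: (batch, timesteps, features).
seq = flat.reshape((batch_size, timesteps, num_input))

# Timestep t of image 0 is exactly row t of that image, i.e. pixels t*28 .. t*28+27.
t = 5
assert np.array_equal(seq[0, t], flat[0, t * num_input:(t + 1) * num_input])
print(seq.shape)  # (4, 28, 28)
```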

More info: http://yann.lecun.com/exdb/mnist/

In [1]:
from __future__ import print_function

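# This notebook uses the TensorFlow 1.x API (tf.contrib, tf.placeholder, tf.Session),
# which is not available in TensorFlow 2.x.
import tensorflow as tf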
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../../Datasets/MNIST_data/", one_hot=True)

Extracting ../../../Datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting ../../../Datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting ../../../Datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ../../../Datasets/MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
# Training Parameters
learning_rate = 0.001
training_epochs = 1000 # number of training steps (one mini-batch each), not full passes over the data
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28): pixels per row = features per timestep
timesteps = 28 # rows per image = sequence length
num_hidden = 128 # number of units in the LSTM cell (size of each output vector)
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input], name='InputData')
Y = tf.placeholder("float", [None, num_classes], name='LabelData')
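Despite its name, `training_epochs` counts optimization steps: each iteration of the training loop draws a single mini-batch of `batch_size` images. A quick calculation shows how much data the model actually sees, assuming `mnist.train` holds 55,000 images (TensorFlow 1.x's `read_data_sets` holds out 5,000 of the 60,000 training images for validation by default).

```python
training_steps = 1000   # value of training_epochs above
batch_size = 128
train_examples = 55000  # assumed size of mnist.train (60,000 minus the default 5,000 validation images)

examples_seen = training_steps * batch_size
passes = examples_seen / train_examples
print(examples_seen, round(passes, 2))  # 128000 2.33
```

So the run below corresponds to roughly two and a third true epochs.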

In [3]:
# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]), name='Weights')
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]), name='Bias')
}

tf.summary.histogram("weights", weights['out'])
tf.summary.histogram("biases", biases['out'])

<tf.Tensor 'biases:0' shape=() dtype=string>

In [4]:
sess = tf.Session()
#print(sess.run(tf.random_normal([num_hidden, num_classes])))
#print(sess.run(tf.random_normal([num_classes])))
print(weights['out'])
print(biases['out'])

<tf.Variable 'Weights:0' shape=(128, 10) dtype=float32_ref>
<tf.Variable 'Bias:0' shape=(10,) dtype=float32_ref>


In [5]:
def RNN(x, weights, biases):
    with tf.name_scope('Model'):
        # Prepare data shape to match `rnn.static_rnn` requirements
        # Current data input shape: (batch_size, timesteps, num_input)
        # Required shape: a list of 'timesteps' tensors, each of shape (batch_size, num_input)

        # Unstack along axis 1 to get a list of 'timesteps' tensors of shape (batch_size, num_input)
        """
        x will contain 28 tensors of shape (batch_size, 28), e.g. 128 x 28
        The 1st tensor holds the 1st row (28 pixels) of every image in the batch,
        the 2nd tensor holds the 2nd row (28 pixels) of every image, and so on.
        ...
        """
        x = tf.unstack(x, timesteps, 1)

        # Define a lstm cell with tensorflow
        """
        https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell
        BasicLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None)
        The value of "num_units" is up to you: too high a value may lead to overfitting,
        while too low a value may yield very poor results.
        Each element of "outputs" has shape (batch_size, num_units), so define the shape of
        "weights" accordingly: "outputs[-1]" is multiplied by "weights", which must be (num_units, num_classes).
        """
        lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

        # Get lstm cell output
        """
        https://www.tensorflow.org/api_docs/python/tf/nn/static_rnn
        static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)
        inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.
        """
        outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

        # Linear activation, using rnn inner loop last output
        """
        Last tensor: outputs[-1], the LSTM output after the final (28th) row
        """
        # Build the output projection once and reuse it for printing and returning
        logits = tf.matmul(outputs[-1], weights['out']) + biases['out']
        print("x[-1]: ", x[-1])
        print("outputs: ", outputs)
        print("outputs[-1]: ", outputs[-1])
        print("tf.matmul(outputs[-1], weights['out']) + biases['out']: ", logits)
    return logits
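To see why `weights['out']` must have shape `(num_hidden, num_classes)`, the following NumPy sketch unrolls a basic LSTM over the 28 rows of each image and applies the same output projection. It uses the standard LSTM update with `forget_bias` added to the forget gate, in the spirit of `BasicLSTMCell`; the single fused kernel, the (i, j, f, o) gate order, the random initialization, and all sizes are illustrative assumptions, not TensorFlow's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_unroll(inputs, kernel, bias, num_hidden, forget_bias=1.0):
    """Run a basic LSTM over a list of (batch, num_input) arrays.

    Returns the list of per-timestep outputs h_t and the final (c, h) state.
    """
    batch = inputs[0].shape[0]
    c = np.zeros((batch, num_hidden))
    h = np.zeros((batch, num_hidden))
    outputs = []
    for x_t in inputs:
        # One matmul computes the pre-activations of all four gates at once.
        z = np.concatenate([x_t, h], axis=1) @ kernel + bias
        i, j, f, o = np.split(z, 4, axis=1)  # input, candidate, forget, output
        c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
        h = np.tanh(c) * sigmoid(o)
        outputs.append(h)
    return outputs, (c, h)

rng = np.random.default_rng(0)
batch_size, timesteps, num_input, num_hidden, num_classes = 4, 28, 28, 128, 10

images = rng.random((batch_size, timesteps, num_input))
inputs = [images[:, t, :] for t in range(timesteps)]  # like tf.unstack(x, timesteps, 1)

kernel = rng.normal(scale=0.1, size=(num_input + num_hidden, 4 * num_hidden))
bias = np.zeros(4 * num_hidden)
outputs, (c, h) = lstm_unroll(inputs, kernel, bias, num_hidden)

# outputs[-1] is (batch, num_hidden), so the output weights must be
# (num_hidden, num_classes) to produce (batch, num_classes) logits.
w_out = rng.normal(size=(num_hidden, num_classes))
b_out = rng.normal(size=num_classes)
logits = outputs[-1] @ w_out + b_out
print(len(outputs), outputs[-1].shape, logits.shape)  # 28 (4, 128) (4, 10)
```

Every element of `outputs`, including `outputs[-1]`, is `(batch, num_hidden)`, which is why changing `num_hidden` also requires changing the first dimension of `weights['out']`.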

In [6]:
batch_x, batch_y = mnist.train.next_batch(batch_size)
print("batch_x: ", batch_x)
print(batch_x.shape)
# Reshape data to get 28 seq of 28 elements
batch_x = batch_x.reshape((batch_size, timesteps, num_input)) # 128, 28, 28
print("batch_x: ", batch_x[0][0])
print(batch_x.shape)
# Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
x = tf.unstack(batch_x, timesteps, 1) # 128, 28
print("x: ", x)
print("x[-1]: ", x[-1])
#print(sess.run(x))
lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)


batch_x:  [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
(128, 784)
batch_x:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0.]
(128, 28, 28)
x:  [<tf.Tensor 'unstack:0' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:1' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:2' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:3' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:4' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:5' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:6' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:7' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:8' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:9' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:10' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:11' shape=(128, 28) dtype=float32>, <tf.Tensor 'unstack:12' shape=(128, 28) dtype

In [7]:
# o = tf.random_normal([num_hidden, num_input])
# w = tf.random_normal([28, num_classes])
# b = tf.random_normal([num_classes])

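# o stands in for outputs[-1]: here its first dimension is 28 (num_input), whereas in
# the model it is batch_size. Only the inner dimensions (num_hidden) must match.
o = tf.random_normal([num_input, num_hidden])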
w = tf.random_normal([num_hidden, num_classes])
b = tf.random_normal([num_classes])

mul = tf.matmul(o, w) + b
print(mul)
sess = tf.Session()
print(sess.run(mul))

Tensor("add:0", shape=(28, 10), dtype=float32)
[[ 1.57712336e+01  5.01923609e+00  5.50983572e+00 -1.16721058e+01
   1.59632397e+01 -1.61846423e+00  2.99074411e+00  8.10353374e+00
   8.27149296e+00 -4.55466080e+00]
 [-8.71562481e+00 -1.61945953e+01  1.44565248e+00  5.63215256e-01
   2.02663975e+01 -4.38008785e+00  5.29723120e+00  4.46333599e+00
  -8.26929665e+00  1.79043427e+01]
 [-1.14358540e+01  1.81356583e+01  7.64720774e+00 -5.16752958e+00
   1.61236248e+01 -7.86019611e+00 -7.01574993e+00 -8.87354183e+00
   1.10761900e+01 -5.56854677e+00]
 [ 1.09309778e+01  1.69290829e+00 -1.29086838e+01  1.62330456e+01
  -2.89666233e+01  7.13659668e+00  2.96516180e+00  2.33518353e+01
  -1.07007074e+01  1.14885175e+00]
 [ 4.60101986e+00  1.15139446e+01 -1.55585027e+00 -1.52930870e+01
  -1.07653790e+01  7.06326842e-01  2.83944416e+00  7.92920828e+00
  -4.52332354e+00 -2.53119993e+00]
 [-2.75440264e+00 -6.13933516e+00 -1.55492153e+01  1.08188438e+01
   1.48510218e+00  7.69097805e+00  5.47162247e+00 -1

In [9]:
list_temp = [
    [
        [1, 2, 3], [11, 2, 3]
    ],
    [
        [4, 5, 6], [44, 5, 6]
    ],
    [
        [7, 8, 9], [77, 8, 9]
    ]
] # (3, 2, 3)

"""
https://www.tensorflow.org/api_docs/python/tf/unstack
unstack(value, num, axis)
'num' must equal the size of dimension 'axis' of 'value'
Returns a list of 'num' tensors, each with dimension 'axis' removed
"""

x_temp = tf.unstack(list_temp, 2, 1) # 2 tensors of shape (3, 3)

print("x_temp: ", x_temp)
sess = tf.Session()
print(sess.run(x_temp))

x_temp:  [<tf.Tensor 'unstack_2:0' shape=(3, 3) dtype=int32>, <tf.Tensor 'unstack_2:1' shape=(3, 3) dtype=int32>]
[array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]), array([[11,  2,  3],
       [44,  5,  6],
       [77,  8,  9]])]
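The same split can be reproduced in NumPy, which makes it easy to check what `tf.unstack` returns without a session. The helper name `unstack` below is hypothetical; it simply takes slice `i` along `axis` for each `i`.

```python
import numpy as np

list_temp = np.array([
    [[1, 2, 3], [11, 2, 3]],
    [[4, 5, 6], [44, 5, 6]],
    [[7, 8, 9], [77, 8, 9]],
])  # shape (3, 2, 3)

def unstack(value, num, axis):
    """NumPy analogue of tf.unstack: split `value` into `num` slices along `axis`."""
    assert value.shape[axis] == num, "num must equal the size of the unstacked axis"
    # np.take with a scalar index drops that axis, like tf.unstack does.
    return [np.take(value, i, axis=axis) for i in range(num)]

parts = unstack(list_temp, 2, 1)
print([p.shape for p in parts])  # [(3, 3), (3, 3)]
print(parts[1])
```

In the model, the same call with `axis=1` on a `(batch_size, 28, 28)` batch yields 28 arrays of shape `(batch_size, 28)`, one per image row.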


In [10]:
#         print("x[-1]: ", x[-1])
#         print("outputs: ", outputs)
#         print("outputs[-1]: ", outputs[-1])
#         print("tf.matmul(outputs[-1], weights['out']) + biases['out']: ", tf.matmul(outputs[-1], weights['out']) + biases['out'])

logits = RNN(X, weights, biases) # [batch_size, num_classes]: ?, 10
print("logits: ", logits)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
tf.summary.scalar("loss", loss_op) # Create a summary to monitor loss_op tensor

with tf.name_scope('optimizer'):
    #optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
train_op = optimizer.minimize(loss_op)

# Evaluate model: a prediction is correct when its arg-max class matches the label's
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
with tf.name_scope('Accuracy'):
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
tf.summary.scalar("accuracy", accuracy) # Create a summary to monitor accuracy tensor

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

x[-1]:  Tensor("Model/unstack:27", shape=(?, 28), dtype=float32)
outputs:  [<tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_2:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_5:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_8:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_11:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_14:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_17:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_20:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_23:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_26:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_29:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Model/rnn/rnn/basic_lstm_cell/mul_32:0' shape=(?, 128) dtype=float32>, <tf.Tensor 'Mo
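For reference, the loss above is the batch mean of the softmax cross-entropy between the logits and the one-hot labels, and accuracy is the fraction of rows whose arg-max class matches the label. The NumPy sketch below spells out that math on a made-up 3-class example; it is an illustration of the formulas, not TensorFlow's implementation. Because softmax is monotonic, taking the arg-max of `prediction` or of `logits` gives the same class.

```python
import numpy as np

def log_softmax(logits):
    # Subtracting the row max keeps exp() from overflowing; the result is unchanged.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def mean_softmax_cross_entropy(logits, labels_one_hot):
    # Per-example loss -sum_k y_k * log softmax(logits)_k, averaged over the batch.
    return float(np.mean(-np.sum(labels_one_hot * log_softmax(logits), axis=1)))

def accuracy(logits, labels_one_hot):
    # A row is correct when its arg-max class matches the arg-max of the one-hot label.
    return float(np.mean(np.argmax(logits, 1) == np.argmax(labels_one_hot, 1)))

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]])
labels = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(accuracy(logits, labels))  # 0.5
print(mean_softmax_cross_entropy(logits, labels))
```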

In [6]:
import shutil, os
if os.path.exists("rnn_mnist_logs/"):
    shutil.rmtree("rnn_mnist_logs/")

In [7]:
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)
    
    summary_op = tf.summary.merge_all() # Merge all summaries into a single op
    summary_writer = tf.summary.FileWriter("rnn_mnist_logs/", graph=tf.get_default_graph())
    
    saver = tf.train.Saver(max_to_keep=2)

    for epoch in range(1, training_epochs+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if epoch % display_step == 0 or epoch == 1:
            # Calculate batch loss, accuracy and summaries in a single run
            loss, acc, summary_str = sess.run([loss_op, accuracy, summary_op],
                                              feed_dict={X: batch_x, Y: batch_y})
            print("Epoch " + str(epoch) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

            # Write logs every display_step iterations (and at the first one)
            summary_writer.add_summary(summary_str, epoch)
            
            # Save checkpoint
            saver.save(sess, "rnn_mnist_logs/model-checkpoint", epoch)

    summary_writer.close()  # flush pending events to disk for TensorBoard
    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

Epoch 1, Minibatch Loss= 2.5150, Training Accuracy= 0.133
Epoch 200, Minibatch Loss= 0.2169, Training Accuracy= 0.922
Epoch 400, Minibatch Loss= 0.2320, Training Accuracy= 0.930
Epoch 600, Minibatch Loss= 0.0587, Training Accuracy= 0.977
Epoch 800, Minibatch Loss= 0.0694, Training Accuracy= 0.977
Epoch 1000, Minibatch Loss= 0.1639, Training Accuracy= 0.969
Optimization Finished!
Testing Accuracy: 0.984375
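The reported test accuracy is measured on only the first 128 of the 10,000 test images, so it is a noisy estimate. One way to score the whole test set is to evaluate it in chunks and sum the correct predictions, as in the sketch below. `batched_accuracy` and `predict_fn` are hypothetical names; `predict_fn` is any callable that maps a batch of images to predicted class indices, and the demo uses a toy predictor on synthetic one-hot data rather than the trained model.

```python
import numpy as np

def batched_accuracy(predict_fn, images, labels_one_hot, batch_size=1000):
    """Accuracy over a whole dataset, evaluated batch by batch.

    Correct counts are summed across batches, so a final partial batch
    is weighted correctly.
    """
    correct = 0
    for start in range(0, len(images), batch_size):
        x = images[start:start + batch_size]
        y = labels_one_hot[start:start + batch_size]
        correct += int(np.sum(predict_fn(x) == np.argmax(y, axis=1)))
    return correct / len(images)

# Toy demo: 2,500 synthetic one-hot "images" whose arg-max is their class.
rng = np.random.default_rng(0)
labels = np.eye(10)[rng.integers(0, 10, size=2500)]
images = labels.copy()
perfect = lambda x: np.argmax(x, axis=1)
print(batched_accuracy(perfect, images, labels, batch_size=1000))  # 1.0
```

Inside the training session, `predict_fn` could wrap one `sess.run` of a `tf.argmax(prediction, 1)` op built once beforehand, with each chunk reshaped to `(-1, timesteps, num_input)` before feeding it to `X`.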
