A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior.

Recurrent networks differ from feedforward networks in that feedback loop: at each time step they take their own previous output as input alongside the new data. This is why recurrent networks are often said to have memory. The memory serves a purpose: the order of a sequence carries information, and recurrent nets use it for tasks that feedforward networks can’t handle.
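
The feedback loop can be sketched in a few lines of NumPy. This is a minimal illustration of a plain tanh RNN, not the LSTM used later in this notebook; the function name `rnn_forward`, the weight names and the tiny sizes are all assumptions made for the example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a plain tanh RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])          # initial state: no memory yet
    states = []
    for x_t in inputs:                   # one time step per sequence element
        # The new state depends on the current input AND the previous state.
        h = np.tanh(np.dot(W_xh, x_t) + np.dot(W_hh, h) + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.RandomState(0)
n_in, n_hid, n_steps = 3, 4, 5           # illustrative sizes only
W_xh = rng.normal(size=(n_hid, n_in))    # input-to-hidden weights
W_hh = rng.normal(size=(n_hid, n_hid))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(n_hid)

seq = rng.normal(size=(n_steps, n_in))
states = rnn_forward(seq, W_xh, W_hh, b_h)
print(states.shape)                      # (5, 4)
```

Because `h` is fed back in at every step, the last row of `states` is a function of the whole sequence, which is the "memory" described above.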

For a longer introduction, see Andrej Karpathy's article "The Unreasonable Effectiveness of Recurrent Neural Networks": http://karpathy.github.io/2015/05/21/rnn-effectiveness/

In [1]:
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

In [2]:
#Import MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [3]:
#Setting up TensorFlow training parameters
learning_rate =  0.001
training_iterations = 100000 #upper bound on training examples seen (step*batch_size)
batch_size = 128

#Network parameters
n_hidden = 128 #number of units in the LSTM hidden layer
n_steps = 28 #number of time steps: one per image row
n_inputs = 28 #inputs per time step: the 28 pixels in one row
n_classes = 10 #one output class per digit, 0 to 9
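
To make the choice of n_steps and n_inputs concrete, here is a small NumPy sketch of how one flattened 784-value image is viewed as a sequence of 28 rows. The array `flat_image` is a stand-in filled with counting numbers, not real MNIST pixels.

```python
import numpy as np

n_steps, n_inputs = 28, 28

# Stand-in for one flattened MNIST image (784 values); real pixels are not needed here.
flat_image = np.arange(n_steps * n_inputs, dtype=np.float32)

# Row-major reshape: row t of the image becomes the input vector at time step t.
sequence = flat_image.reshape(n_steps, n_inputs)

print(sequence.shape)     # (28, 28)
print(sequence[1][0])     # 28.0, the first pixel of the second row
```

The network therefore reads an image top to bottom, one row of pixels per time step.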


In [4]:
#TF graph inputs

#Each MNIST image is 28*28 = 784 pixels, fed as n_steps rows of n_inputs pixels
#The batch dimension is None because the number of images differs between training and testing
x = tf.placeholder("float", [None, n_steps, n_inputs])

#y is the label, a digit from 0 to 9 in one-hot encoded format
y = tf.placeholder("float", [None, n_classes]) 

In [5]:
#Defining weights and bias
W = tf.Variable(tf.random_normal([n_hidden, n_classes]))
b = tf.Variable(tf.random_normal([n_classes]))

In [6]:
def rnn_model(xs, w, b):
    """
    LSTM (long short-term memory) model. static_rnn expects its input as a
    Python list of n_steps tensors, each of shape (batch_size, n_inputs),
    so xs of shape (batch_size, n_steps, n_inputs) is rearranged first.
    """
    x = tf.transpose(xs, [1, 0, 2]) #swap axes 0 and 1: (n_steps, batch_size, n_inputs)
    x = tf.reshape(x, [-1, n_inputs]) #flatten to (n_steps*batch_size, n_inputs);
    #-1 lets TensorFlow infer the number of rows
    
    ##Split along axis 0 into one chunk per time step
    ##Result: a list of n_steps tensors of shape (batch_size, n_inputs)
    x = tf.split(x, n_steps, 0)
    
    lstm_obj = rnn.BasicLSTMCell(n_hidden, forget_bias = 1.0)
    outputs, states = rnn.static_rnn(lstm_obj, x, dtype = tf.float32)
    
    #Linear activation on the output of the last time step
    return tf.add(tf.matmul(outputs[-1], w), b)
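
The transpose, reshape and split steps in rnn_model are easy to get wrong, so here is the same sequence of operations in NumPy with the shapes checked. This is a sketch with a small made-up batch of random values; it mirrors the TensorFlow calls but is not part of the model.

```python
import numpy as np

batch_size, n_steps, n_inputs = 4, 28, 28   # small batch for illustration

xs = np.random.RandomState(0).rand(batch_size, n_steps, n_inputs)

x = np.transpose(xs, (1, 0, 2))        # (n_steps, batch_size, n_inputs)
x = x.reshape(-1, n_inputs)            # (n_steps * batch_size, n_inputs)
steps = np.split(x, n_steps, axis=0)   # list of n_steps arrays, each (batch_size, n_inputs)

print(len(steps))          # 28
print(steps[0].shape)      # (4, 28)

# steps[t][i] is row t of image i
print(np.array_equal(steps[5][2], xs[2, 5]))   # True
```

Each list element holds the same row from every image in the batch, which is the per-time-step input static_rnn expects.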


In [7]:
pred = rnn_model(x, W, b)
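
For intuition about what the LSTM cell computes at each time step, here is a minimal NumPy sketch of one step under the standard gate formulation, with the forget bias added to the forget gate before the sigmoid. The function `lstm_step`, the stacked weight layout and the sizes are assumptions for illustration and are not meant to mirror TensorFlow's internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM step. W maps [x_t, h_prev] to the four stacked gate pre-activations."""
    z = np.dot(np.concatenate([x_t, h_prev], axis=1), W) + b
    i, g, f, o = np.split(z, 4, axis=1)   # input gate, candidate, forget gate, output gate
    # Cell state: keep part of the old memory, add part of the new candidate.
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(g)
    # Hidden state: a gated view of the cell state.
    h = sigmoid(o) * np.tanh(c)
    return h, c

batch, n_inputs, n_hidden = 2, 28, 8      # small sizes for illustration
rng = np.random.RandomState(0)
W = rng.normal(scale=0.1, size=(n_inputs + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros((batch, n_hidden))
c = np.zeros((batch, n_hidden))
for t in range(28):                       # one step per image row
    x_t = rng.rand(batch, n_inputs)       # random stand-in for a row of pixels
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape)    # (2, 8)
```

A positive forget bias pushes the forget gate toward 1 at the start of training, so the cell tends to retain its memory until it learns otherwise.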

In [8]:
#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

#Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

#Initialization
init = tf.global_variables_initializer()
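
The cost and accuracy defined above can be reproduced by hand. The following NumPy sketch computes the mean softmax cross-entropy and the argmax accuracy for a tiny made-up batch; the helper names and the example logits are assumptions for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

def accuracy(logits, labels):
    """Fraction of rows whose largest logit matches the one-hot label."""
    return (logits.argmax(axis=1) == labels.argmax(axis=1)).mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 1.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])

print(softmax_cross_entropy(logits, labels))
print(accuracy(logits, labels))   # 2 of the 3 rows are classified correctly
```

Note that the model outputs raw logits; the softmax is applied inside the loss, which is why rnn_model ends with a linear layer.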

In [9]:
with tf.Session() as sess:
    sess.run(init)
    step = 1
    
    #Train till maximum iterations
    while step*batch_size < training_iterations:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        
        #Reshape data to get 28 sequences of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_inputs))
        sess.run(optimizer, feed_dict = {x:batch_x, y: batch_y})
        
        if step%10 == 0:
            #Fetch loss and accuracy on the current batch in a single run
            loss, acc = sess.run([cost, accuracy], feed_dict = {x: batch_x, y: batch_y})
            print("Iteration:  %s loss:  %s Accuracy:  %s" % (step*batch_size, loss, acc))
    
        step += 1
    
    print("Done Training")
    
    test_data = mnist.test.images[:256].reshape((-1, n_steps, n_inputs))
    test_label = mnist.test.labels[:256]
    print("Test Accuracy: %s" % sess.run(accuracy, feed_dict = {x: test_data, y: test_label}))

Iteration:  1280 loss:  1.91178 Accuracy:  0.367188
Iteration:  2560 loss:  1.70376 Accuracy:  0.421875
Iteration:  3840 loss:  1.3589 Accuracy:  0.5
Iteration:  5120 loss:  1.17619 Accuracy:  0.59375
Iteration:  6400 loss:  0.887656 Accuracy:  0.71875
Iteration:  7680 loss:  1.30655 Accuracy:  0.585938
Iteration:  8960 loss:  0.983267 Accuracy:  0.648438
Iteration:  10240 loss:  0.750213 Accuracy:  0.75
Iteration:  11520 loss:  0.465928 Accuracy:  0.890625
Iteration:  12800 loss:  0.737447 Accuracy:  0.726562
Iteration:  14080 loss:  0.684177 Accuracy:  0.765625
Iteration:  15360 loss:  0.382037 Accuracy:  0.90625
Iteration:  16640 loss:  0.451465 Accuracy:  0.898438
Iteration:  17920 loss:  0.312327 Accuracy:  0.882812
Iteration:  19200 loss:  0.343009 Accuracy:  0.835938
Iteration:  20480 loss:  0.176603 Accuracy:  0.960938
Iteration:  21760 loss:  0.486304 Accuracy:  0.84375
Iteration:  23040 loss:  0.237851 Accuracy:  0.921875
Iteration:  24320 loss:  0.385988 Accuracy:  0.859375
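
The "Iteration" column in the log is the number of training examples seen (step * batch_size), not the number of parameter updates. The short sketch below replays the loop's counter logic with the values from cell 3, without any training, to show how many updates and log lines a full run produces.

```python
training_iterations = 100000
batch_size = 128

step = 1
logged = []
while step * batch_size < training_iterations:
    if step % 10 == 0:                    # same logging condition as the training loop
        logged.append(step * batch_size)
    step += 1

print(step - 1)       # 781 parameter updates
print(logged[:3])     # [1280, 2560, 3840]
print(len(logged))    # 78 log lines
```

The log shown above stops at 24320, so it covers only the first 19 of those 78 lines.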
