# 7 Recurrent neural networks
In this exercise we will try a simple experiment with a recurrent neural network. One of the best-known recurrent neural network models is the so-called Long Short-Term Memory (LSTM) network. More information on LSTMs can be found in the article [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
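
The linked article describes the gates of an LSTM cell. As a rough illustration, here is a NumPy sketch of one step of a standard LSTM cell and of running it over the 28 rows of an image-sized input. The packed weight matrix `W`, the gate order (input, forget, output, candidate) and the random weights are assumptions made for this sketch only. TensorFlow's `BasicLSTMCell` may lay out its weights and gates differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative layout, not TensorFlow's).

    x_t: (batch, n_input); h_prev, c_prev: (batch, n_hidden)
    W: (n_input + n_hidden, 4 * n_hidden); b: (4 * n_hidden,)
    """
    n_hidden = h_prev.shape[1]
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i = sigmoid(z[:, :n_hidden])                  # input gate
    f = sigmoid(z[:, n_hidden:2 * n_hidden])      # forget gate
    o = sigmoid(z[:, 2 * n_hidden:3 * n_hidden])  # output gate
    g = np.tanh(z[:, 3 * n_hidden:])              # candidate cell values
    c_t = f * c_prev + i * g                      # new cell state
    h_t = o * np.tanh(c_t)                        # new hidden state (the output)
    return h_t, c_t

# feed 28 random "rows" of 28 values each, one row per step
rng = np.random.default_rng(0)
n_input, n_hidden, batch = 28, 4, 2
W = rng.normal(scale=0.1, size=(n_input + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros((batch, n_hidden))
c = np.zeros((batch, n_hidden))
image_rows = rng.random((batch, 28, n_input))
for t in range(28):
    h, c = lstm_step(image_rows[:, t, :], h, c, W, b)
print(h.shape)  # (2, 4)
```

Only the final `h` is kept here, which mirrors using the output after the last row for classification.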

## 7.1 The MNIST dataset revisited (1)
In one of the previous exercises, the MNIST dataset was used to demonstrate the use of a multilayer perceptron. Here we are going to apply a recurrent neural network to the problem of digit classification. To keep things simple, we will use a basic LSTM network that is fed one row of the image at a time. With each new row, it updates its internal state and produces a prediction. What we are interested in is its prediction after the last row, i.e. after it has seen the whole image.
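
The row-by-row input can be sketched in NumPy: a flattened 784-pixel MNIST image is reshaped into 28 rows of 28 pixels, and the rows are then split into a list of per-timestep arrays, which is what `tf.unstack(x, timesteps, 1)` does in the TensorFlow code. The random `flat_batch` is a stand-in for real MNIST images, used only for illustration.

```python
import numpy as np

timesteps, n_input = 28, 28
# stand-in for a batch of 3 flattened MNIST images (random values, not real data)
flat_batch = np.random.default_rng(0).random((3, timesteps * n_input))

# same reshape as in the training loop: (batch, 784) -> (batch, 28 rows, 28 columns)
images = flat_batch.reshape((-1, timesteps, n_input))

# NumPy analogue of tf.unstack(x, timesteps, 1): a list of 28 arrays, one per row
rows = [images[:, t, :] for t in range(timesteps)]

print(len(rows), rows[0].shape)  # 28 (3, 28)
# row t of every image holds pixels t*28 .. t*28+27 of the flat vector
assert np.array_equal(rows[5], flat_batch[:, 5 * n_input:6 * n_input])
```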

```python
# NOTE: this code requires TensorFlow 1.x (tf.contrib and
# tensorflow.examples.tutorials are not available in TensorFlow 2)
import tensorflow as tf
from tensorflow.contrib import rnn

from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets("mnist/", one_hot=True)

learning_rate=0.001
training_epochs_count=10
batch_size=100
batches_count=mnist.train.num_examples//batch_size
display_step=1

#we will feed a row at a time to the LSTM and there are 28 rows per image
timesteps=28
#each row has 28 columns whose values are simultaneously passed to the LSTM
n_input=28 # MNIST data input (img shape: 28*28)
#the number of hidden units in the LSTM (the size of its hidden state)
n_hidden=128
n_classes=10

x=tf.placeholder("float", [None, timesteps, n_input])
y=tf.placeholder("float", [None, n_classes])

#split each image into a list of 28 row tensors, each of shape [batch_size, n_input]
unstacked=tf.unstack(x, timesteps, 1)

#prepare the LSTM
lstm_cell=rnn.BasicLSTMCell(n_hidden)
#feed the rows iteratively to the LSTM
outputs, states=rnn.static_rnn(lstm_cell, unstacked, dtype=tf.float32)
#take the last output (index -1), i.e. the output after the last row, and use it for classification
logits=tf.layers.dense(outputs[-1], n_classes)
y_predicted=tf.nn.softmax(logits)

cost=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

correct_y_predicted=tf.equal(tf.argmax(y_predicted, 1), tf.argmax(y, 1))
accuracy=tf.reduce_mean(tf.cast(correct_y_predicted, tf.float32))

#with such a block we don't need to close the session later - it will be closed automatically
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for epoch in range(training_epochs_count):
        for i in range(batches_count):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            #reshape the flat 784-pixel images into (batch, 28 rows, 28 columns)
            session.run(optimizer, feed_dict={x:batch_x.reshape((-1, timesteps, n_input)), y:batch_y})
        if ((epoch+1)%display_step==0):
            print("Epoch #"+str(epoch+1)+" "+str(session.run(accuracy, feed_dict={x: mnist.test.images.reshape((-1, timesteps, n_input)), y: mnist.test.labels})))
```
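
The `cost` and `accuracy` nodes in the graph can be mirrored in NumPy to show what they compute: the softmax cross-entropy averaged over the batch, and the fraction of examples whose highest-scoring class matches the one-hot label. The `logits` and `labels` below are made-up values for illustration.

```python
import numpy as np

def log_softmax(logits):
    # subtract the row maximum for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def softmax(logits):
    return np.exp(log_softmax(logits))

def cross_entropy(logits, labels):
    # mean over the batch of -sum_k y_k * log(softmax(logits)_k), with one-hot labels
    return -np.mean(np.sum(labels * log_softmax(logits), axis=1))

def accuracy(logits, labels):
    # fraction of examples whose highest-scoring class matches the one-hot label
    return np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
print(accuracy(logits, labels))  # 0.5
print(cross_entropy(logits, labels))
```

Since softmax preserves the ordering within each row, `argmax` of `y_predicted` picks the same class as `argmax` of `logits`. The softmax only matters when the probabilities themselves are needed.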


**Tasks**

1. Study and run the code above.
2. Adjust the parameters so that the model obtains an accuracy above 0.99 on the test set.
3. Draw a plot that shows how the final accuracy on the test set depends on the number of rows given to the network.
4. What happens if we use gradient descent instead of Adam?
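
For task 4, in TensorFlow 1.x the swap amounts to replacing `tf.train.AdamOptimizer` with `tf.train.GradientDescentOptimizer(learning_rate=...)`. To see why the two can behave differently, here is a NumPy sketch of both update rules on a toy, badly scaled quadratic loss. The loss and its coefficients are assumptions chosen for illustration and have nothing to do with the LSTM. The Adam update is the standard one (moving averages of the gradient and of its square, with bias correction) using the same default `beta1`, `beta2` and `epsilon` as `tf.train.AdamOptimizer`. TensorFlow may arrange the computation slightly differently internally.

```python
import numpy as np

def grad(w, a):
    # gradient of the toy loss f(w) = 0.5 * sum(a * w**2)
    return a * w

def gradient_descent(w, a, lr, steps):
    for _ in range(steps):
        w = w - lr * grad(w, a)
    return w

def adam(w, a, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    m = np.zeros_like(w)  # running mean of gradients
    v = np.zeros_like(w)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(w, a)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction
        v_hat = v / (1 - beta2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

a = np.array([100.0, 0.01])  # one steep and one very flat direction
w0 = np.array([1.0, 1.0])
gd_w = gradient_descent(w0, a, lr=0.001, steps=1000)
adam_w = adam(w0, a, lr=0.001, steps=1000)
print("gradient descent:", gd_w)
print("Adam:", adam_w)
```

With the same learning rate, plain gradient descent barely moves along the flat direction, because its step is proportional to the gradient. Adam divides by a running estimate of the gradient's magnitude, so its early steps are roughly `lr` in every direction. How this plays out for the LSTM is what the task asks you to find out.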

NOTE: before rerunning the code, first shut down the kernel, e.g. by choosing Kernel in the menu above and then Restart & Clear Output, so that the TensorFlow graph built in the previous run is discarded.