# LSTM Tutorial

In this notebook we will train a simple LSTM network in TensorFlow to classify digits from the well-known MNIST dataset (http://yann.lecun.com/exdb/mnist/).
The code is based on the work of Aymeric Damien (https://github.com/aymericdamien/TensorFlow-Examples).

There will be blanks left all over the code for you to fill out. If you ever get stuck, there's also a complete version in the repository.

Ready? **Let's go!**

Let's start by importing the needed packages.

In [None]:
from __future__ import print_function

import tensorflow as tf
import numpy as np
import random

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

random.seed(1)
np.random.seed(1)
tf.set_random_seed(1)

## Download MNIST dataset

Fortunately, TensorFlow can download the MNIST dataset for us automatically. The data is already divided into training, validation and test sets. We can also ask for the labels to be converted to one-hot form right away, which is what a softmax classifier needs.

If you're not familiar with one-hot representation, read this great Quora answer (https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science). 
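As a quick illustration of the idea, here is a minimal NumPy sketch of one-hot encoding and decoding. The helper name `one_hot` is made up for this example and is not part of the notebook or of TensorFlow.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a single 1.0 in each row, at the column given by the label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

digits = [3, 0, 9]
encoded = one_hot(digits, num_classes=10)
decoded = np.argmax(encoded, axis=1)  # index of the 1.0 in each row

print(encoded[0])
print(decoded)
```

Taking the `argmax` of a one-hot row recovers the original integer label, which is useful whenever a human-readable label is needed.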


The download may take a while.

In [None]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

## Check training and test set

It is always a good idea to check the data we work with.

#### Number of train/test examples

Let's start by checking the number of examples we have available in both the training and test sets (we will not be using the validation set in this tutorial). To do this, explore the ```mnist``` object.

_(If you're not familiar with Jupyter notebooks, you can do this by typing ```mnist.``` and pressing **Tab** to see the member fields and methods.)_

In [None]:
### [TASK] Get number of train/test examples
train_num_examples = None
test_num_examples = None
###
print('Number of train examples: {0}'.format(train_num_examples))
print('Number of test examples: {0}'.format(test_num_examples))

If you did this correctly, you should get the values below:

Name | Value
--- | --- 
Number of train examples     | 55000 
Number of test examples      | 10000 

#### Image shape

So we know how many samples we have, but what about a single image?

Let's start by getting a single example and checking its shape. It doesn't matter whether it comes from the training or the test set.

In [None]:
### [TASK] Get a random example from MNIST dataset
mnist_example = None
###

print('Single example shape: {0}'.format(mnist_example.shape))

Name | Value
--- | --- 
Single example shape     | (784,) 

#### Vector to matrix

As you can see, the image is in vector form. Flattening images like this is common practice, since a flat vector is easy to store and to feed to a model.

That being said, we'd like to see what the digit looks like, and for this we need to restore the image to its original form. We have to use our knowledge of the MNIST dataset to achieve this: each digit is a square grayscale image (1 channel, not 3 as in typical color images).

With that knowledge in mind, let's use the reshape function to change ```mnist_example``` to a square matrix form.
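To see how reshaping behaves, here is a small sketch on a toy 16-element vector rather than on an MNIST image. It assumes the vector was produced by flattening a square matrix row by row, which is NumPy's default order.

```python
import numpy as np

flat = np.arange(16, dtype=np.float32)  # stand-in for a flattened 4x4 image
side = int(np.sqrt(flat.size))          # side length of the square image
assert side * side == flat.size, "vector length is not a perfect square"

square = flat.reshape(side, side)       # the first `side` values become row 0
restored = square.reshape(-1)           # flattening again gives back the vector

print(square.shape)
print(square)
```

Reshaping does not change the data, only how it is indexed, so going back and forth between the two forms is lossless.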

In [None]:
### [TASK] Reshape mnist_example to square matrix.
mnist_example = None
###

print('MNIST digit shape: {0}'.format(mnist_example.shape))

Name | Value
--- | --- 
MNIST digit shape     | (28, 28) 

#### Show a random MNIST training image

Now that we know how to restore digits to their original form, let's write a simple function that samples a random example from the training set and shows it using matplotlib. Your job is to retrieve the correct label for the randomly chosen image (watch out: the labels are one-hot vectors!).

In [None]:
def show_random_mnist_train_example(mnist):
    """Draws a random training image from MNIST dataset and displays it.
    
    Args:
        mnist: MNIST dataset.
    """
    # random.randint includes both endpoints, so the upper bound must be num_examples - 1
    random_idx = random.randint(0, mnist.train.num_examples - 1)
    image = mnist.train.images[random_idx].reshape(28, 28)
    imgplot = plt.imshow(image, cmap='Greys')
    ### [TASK] Get a correct label for the image 
    label = None
    ###
    print('Correct label for image #{0}: {1}'.format(random_idx, label))

show_random_mnist_train_example(mnist)

Name | Value
--- | --- 
Correct label     | 7 

## Model

We will be using a simple LSTM network with `timesteps` inputs of size `num_input` (slices of the image, see preprocessing). The output of the final unrolled cell is then fed through a simple linear layer to get the required number of outputs (`num_classes` = 10 for MNIST). Finally, we use a softmax cross-entropy loss to fit our model.

Here's a picture representing our model on a 5x5 image.

![model_mnist_lstm](img/model_mnist_lstm.png)

### Preprocessing

Since an RNN requires a sequence of inputs and we have only one input (the image), we have to divide the image into smaller parts. We will do this by moving a small window of arbitrary size (the kernel) across the image. For each position of the kernel, we take the corresponding patch of the image, reshape it to a vector (for ease of training) and append it to the list of patches.

We define 2 functions to do this:<br>
    1) **get_patches** - retrieves all patches from an image using a kernel of shape `kernel_shape`, which is moved by `stride` pixels in the x and y directions at each step; if the kernel extends partially outside the image, zero-padding is used <br>
    2) **get_batch_patches** - does the same as get_patches, but for a batch of vector-shaped images
    
Your task is to write the get_batch_patches function, using get_patches.

In [None]:
def get_patches(image, kernel_shape, stride):
    """Get all patches from image using a moving kernel.
    
    Args:
        image (matrix): Matrix of arbitrary shape.
        kernel_shape (tuple): The shape of the kernel.
        stride (tuple): Number of units to move the kernel in x and y.
        
    Returns:
        np.array: Array of vector-shaped patches.
    """
    image_h, image_w = image.shape
    kernel_h, kernel_w = kernel_shape
    stride_h, stride_w = stride
    h = w = 0
    patches = []
    while h < image_h:
        w = 0
        while w < image_w:
            patch = image[h:(h + kernel_h), w:(w + kernel_w)]
            if patch.shape != (kernel_h, kernel_w):
                # use zero padding
                patch = np.pad(patch, ((0, kernel_h - patch.shape[0]), 
                                         (0, kernel_w - patch.shape[1])), 'constant')
            patches.append(patch.reshape(-1))
            w += stride_w
        h += stride_h
    return np.array(patches)

def get_batch_patches(batch, kernel_shape, stride, image_shape=(28, 28)):
    """Gets patches from a batch of images.
    
    Args:
        batch (matrix): Matrix of images, shape [batch_size, image_vector_size].
        kernel_shape (tuple): The shape of the kernel.
        stride (tuple): Number of units to move the kernel in x and y.
        image_shape (tuple): Shape of a single image in matrix form.
    
    Returns:
        np.array: Array of patches from the batch.
    """
    batch_out = []
    ### [TASK] Retrieve patches for every image in batch:
    for image in batch:
        batch_out.append(None)
    ###
    return np.array(batch_out)


batch_patches = get_batch_patches(mnist.train.images[:128], kernel_shape=[4, 4], stride=[4, 4])
print('Number of batch samples: {0}'.format(len(batch_patches)))
print('Batch of patches shape: {0}'.format(batch_patches.shape))

Name | Value
--- | --- 
Number of batch samples    | 128 
Batch of patches shape     | (128, 49, 16) 

For the purpose of this notebook, we will use patches of shape (28, 1). In other words, we will divide each image into columns, giving 28 timesteps, each with a 28-element vector as input. Run the cell below to calculate the number of timesteps and the input size.

After finishing this notebook you should also try playing around with these parameters to find an optimal kernel shape and stride.
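When experimenting with kernel shapes and strides, it helps to predict the resulting number of timesteps and the input size before running anything. The sketch below does that arithmetic for a zero-padded sliding window that starts at the top-left corner and keeps moving while its corner is still inside the image, as `get_patches` does. The helper name `expected_patches_shape` is hypothetical.

```python
def expected_patches_shape(image_shape, kernel_shape, stride):
    """Return (number of patches, patch vector size) for one image.

    Each axis yields ceil(image_size / stride) window positions, because the
    window advances by `stride` until its corner leaves the image.
    """
    rows = (image_shape[0] + stride[0] - 1) // stride[0]  # integer ceiling
    cols = (image_shape[1] + stride[1] - 1) // stride[1]
    return rows * cols, kernel_shape[0] * kernel_shape[1]

print(expected_patches_shape((28, 28), (4, 4), (4, 4)))    # (49, 16)
print(expected_patches_shape((28, 28), (28, 1), (28, 1)))  # (28, 28)
```

The first result matches the `(128, 49, 16)` batch shape shown earlier, and the second matches the 28 timesteps of 28 elements used in the rest of the notebook.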

In [None]:
kernel_shape = (28, 1)
stride = (28, 1)
timesteps, num_input = get_patches(mnist.train.images[0].reshape(28, 28), kernel_shape, stride).shape
print('Timesteps: {0}'.format(timesteps))
print('Input vector size: {0}'.format(num_input))

Name | Value
--- | --- 
Timesteps     | 28 
Input vector size     | 28 

#### Config
It is often a good idea to put all configurable parameters in one dictionary. Run the cell below to do this.

In [None]:
config = {'num_input': num_input, 'timesteps': timesteps, 'kernel_shape': kernel_shape, 'stride': stride,
          'num_hidden': 128, 'num_classes': 10, 'learning_rate': 0.001, 'training_steps': 10000, 
          'batch_size': 128, 'display_step': 200}

### Model class

Let's create a class representing our model. It is able to do the following:<br>
    1) Perform forward propagation.  
    2) Compute loss.  
    3) Minimize loss function (train).  
    4) Calculate accuracy.  
   
Note: The methods were moved outside of the class to make them easier to test. Normally, they would be inside the class.

In [None]:
class MnistLstmModel(object):
    def __init__(self, config):
        # Retrieve config parameters
        self._config = config
        timesteps, num_input, num_hidden, num_classes, learning_rate = config['timesteps'], \
            config['num_input'], config['num_hidden'], config['num_classes'], config['learning_rate']
        
        self.X, self.Y = get_placeholders(timesteps, num_input, num_classes)
        self.logits = forward_propagation(self.X, timesteps, num_hidden, num_classes)
        self.loss = compute_loss(self.logits, self.Y)
        self.train_op = optimize(self.loss, learning_rate)
        self.accuracy, self.predictions = predict(self.logits, self.Y)

#### Create placeholders

Before we do anything else, we must declare the inputs of the graph: the input features and the output labels. TensorFlow needs to know their type and shape when the graph is built, before any computation begins. We can do this using ```tf.placeholder```:
```python
    p = tf.placeholder(type, shape)
```
where type is usually set to ```tf.float32``` and shape is a list specifying the shape of the tensor. Note that the model is given a batch of images at each iteration, not a single image or the whole dataset. To allow an arbitrary batch size, we put ```None``` in the first position of ```shape```, e.g. for a single input of shape ```[a, b]```, the batch input placeholder will have shape ```[None, a, b]```.
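To make the role of ```None``` concrete, here is a tiny pure-Python sketch of the matching rule: a ```None``` entry accepts any size, while every other entry must match exactly. The helper `matches_placeholder_shape` is made up for illustration and is not how TensorFlow implements the check.

```python
def matches_placeholder_shape(spec, actual):
    """Return True if a concrete shape fits a spec where None means 'any size'."""
    if len(spec) != len(actual):
        return False
    return all(s is None or s == a for s, a in zip(spec, actual))

spec = [None, 28, 28]
print(matches_placeholder_shape(spec, (128, 28, 28)))  # a full batch fits
print(matches_placeholder_shape(spec, (1, 28, 28)))    # so does a single image
print(matches_placeholder_shape(spec, (128, 28, 14)))  # wrong input size does not
```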

Please create placeholders for X and Y of the MNIST dataset (type: ```tf.float32```).

In [None]:
def get_placeholders(timesteps, num_input, num_classes):
    ### [TASK] Create placeholders for input and output of MNIST dataset
    x = None
    y = None
    ###
    return x, y

After you're done, run the cell below to test your code.

In [None]:
### get_placeholders test

tf.reset_default_graph()
X, Y = get_placeholders(config['timesteps'], config['num_input'], config['num_classes'])
print ("X = {0}".format(X))
print ("Y = {0}".format(Y))

Name | Value
--- | --- 
X | Tensor("Placeholder:0", shape=(?, 28, 28), dtype=float32) 
Y | Tensor("Placeholder_1:0", shape=(?, 10), dtype=float32) 

#### Forward propagation

The forward pass consists of 2 steps:  
1) LSTM forward pass.  
2) Linear layer forward pass, converting the LSTM output to a vector of size ```num_classes``` for loss computation. Only the output of the last timestep is used.

In [None]:
def forward_propagation(x, timesteps, num_hidden, num_classes):
    """Forward pass."""
    outputs, states = get_lstm(x, timesteps, num_hidden)
    logits = get_linear_layer(X=outputs[-1], num_in=num_hidden, num_out=num_classes)
    return logits

#### LSTM

Let's start with the LSTM part. We have to create a multi-layer LSTM cell. You can do this in TF in 2 steps. <br>
1) Get a list of cells. A cell in this context is a single LSTM layer. To create one, use ```tf.contrib.rnn.BasicLSTMCell```. <br>
2) Use the list of cells to create a ```tf.contrib.rnn.MultiRNNCell```, which stacks the cells into a single multi-layer cell. <br>

Afterwards, we use ```tf.contrib.rnn.static_rnn``` to connect our LSTM model with the input.

Note: ```tf.contrib.rnn.static_rnn``` requires input to be in time-major form, i.e. instead of a tensor of shape ```[batch_size, timesteps, num_input]```, we have to get _timesteps_ number of tensors of shape ```[batch_size, num_input]```. We can do this by using ```tf.unstack```.
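The NumPy sketch below illustrates what happens conceptually: the batch is split into one array per timestep, and a single-layer LSTM cell is applied step by step while carrying its hidden and cell state forward. It uses the standard LSTM equations with randomly initialized weights. The gate ordering and the absence of a forget-gate bias are assumptions of this sketch, not a description of TensorFlow's internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape [num_input + num_hidden, 4 * num_hidden] and holds the
    weights of all four gates side by side.
    """
    z = np.dot(np.concatenate([x_t, h_prev], axis=1), W) + b
    i, g, f, o = np.split(z, 4, axis=1)  # input gate, candidate, forget gate, output gate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

batch_size, timesteps, num_input, num_hidden = 2, 5, 3, 4
rng = np.random.RandomState(1)
x = rng.randn(batch_size, timesteps, num_input)

# Split along the time axis: a list of `timesteps` arrays,
# each of shape [batch_size, num_input].
x_steps = [x[:, t, :] for t in range(timesteps)]

W = rng.randn(num_input + num_hidden, 4 * num_hidden) * 0.1
b = np.zeros(4 * num_hidden)
h = np.zeros((batch_size, num_hidden))
c = np.zeros((batch_size, num_hidden))

outputs = []
for x_t in x_steps:
    h, c = lstm_step(x_t, h, c, W, b)
    outputs.append(h)

print(len(outputs), outputs[-1].shape)
```

The list `outputs` has one entry per timestep, and only its last element would be passed on to the linear layer.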

In [None]:
def get_lstm(x, timesteps, num_hidden, num_layers=1):
    cells = []
    for _ in range(num_layers):
        ### [TASK] Append a basic LSTM cell.
        cells.append(None)
        ###
    ### [TASK] Construct a multi RNN cell.
    lstm_cell = None
    ###
    
    ### [TASK] Unstack input x.
    x = None
    ###
    
    ### [TASK] Create static RNN of cells.
    outputs, states = None, None
    ###
    return outputs, states

After you're done, run the cell below to test your code.

In [None]:
### get_lstm test

tf.reset_default_graph()
X, _ = get_placeholders(config['timesteps'], config['num_input'], config['num_classes'])
outputs, states = get_lstm(X, config['timesteps'], config['num_hidden'])
print("Outputs shape: {0}".format(len(outputs)))
print("States type: {0}".format(type(states[0])))

Name | Value
--- | --- 
Outputs shape | 28 
States type   | &lt;class 'tensorflow.python.ops.rnn_cell_impl.LSTMStateTuple'> 

#### Linear layer

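We need to convert the output of the last LSTM step to a vector of size ```num_classes```. To do this, we will create a linear layer with weights and biases. You can use ```tf.get_variable``` to create the weight and bias tensors, specifying their shape and initializer. For this example, please use ```tf.random_normal_initializer(seed=1)``` (seed=1 to get the same results). The expected output of the optimizer test further down refers to these variables as ```weights_out``` and ```bias_out```, so use those names if you want your output to match.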
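The shape bookkeeping of a linear layer can be checked quickly in NumPy. This is only a sketch of the arithmetic with made-up random values, not the TensorFlow solution to the task.

```python
import numpy as np

rng = np.random.RandomState(1)
batch_size, num_in, num_out = 100, 200, 10

X = rng.randn(batch_size, num_in)
W = rng.randn(num_in, num_out)  # one column of weights per output unit
b = np.zeros(num_out)           # broadcast across the batch dimension

output = np.dot(X, W) + b
print(output.shape)
```

The weight matrix maps `num_in` features to `num_out` values, and the bias has one entry per output, so the batch dimension passes through unchanged.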

In [None]:
def get_linear_layer(X, num_in, num_out):
    ### [TASK] Create weights and bias variables and compute X*W + b
    W = None
    b = None
    output = None
    ###
    return output

Once you have implemented get_linear_layer, forward_propagation should work as well.

In [None]:
### get_linear_layer test
tf.reset_default_graph()
X = tf.placeholder(tf.float32, (100, 200))
output = get_linear_layer(X, 200, 10)
print("Linear layer output: {0}".format(output))

# forward_propagation test
tf.reset_default_graph()
X, _ = get_placeholders(config['timesteps'], config['num_input'], config['num_classes'])
logits = forward_propagation(X, config['timesteps'], config['num_hidden'], config['num_classes'])
print("Logits: {0}".format(logits))

Name | Value
--- | --- 
Linear layer output | Tensor("add:0", shape=(100, 10), dtype=float32) 
Logits              | Tensor("add:0", shape=(?, 10), dtype=float32) 

#### Loss

Let's use ```tf.nn.softmax_cross_entropy_with_logits``` as our loss function. Please remember to return the mean value of the loss across the whole batch.
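For reference, the quantity being computed can be written out in NumPy. The sketch below evaluates the per-example cross-entropy between the softmax of the logits and one-hot labels, then averages over the batch. The values are toy data and the function name is chosen for this example only.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(logits, labels)  # one value per example
loss = per_example.mean()                            # scalar batch loss
print(per_example.shape, loss)
```

The second example has uniform logits over 3 classes, so its loss is log(3). The per-example result is a vector, which is why the mean is needed to obtain a scalar loss.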

In [None]:
def compute_loss(logits, labels):
    ### [TASK] Calculate softmax cross-entropy loss (cost).
    loss = None
    ###
    return loss

After you're done, run the cell below to test your code.

In [None]:
# compute_loss test
tf.reset_default_graph()
X, Y = get_placeholders(config['timesteps'], config['num_input'], config['num_classes'])
logits = forward_propagation(X, config['timesteps'], config['num_hidden'], config['num_classes'])
loss = compute_loss(logits, Y)
print("Loss: {0}".format(loss))
print("Loss shape: {0}".format(loss.shape))

Name | Value
---  | --- 
Loss       | Tensor("Mean:0", shape=(), dtype=float32) 
Loss shape | () 

#### Optimizer

Let's minimize our loss function with ```tf.train.GradientDescentOptimizer```.
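As a reminder of what gradient descent does, here is a minimal pure-Python sketch on a one-dimensional toy objective. The function `gradient_descent` and the objective are illustrative assumptions. In the notebook, the optimizer computes the gradients of the loss with respect to all trainable variables for us.

```python
def gradient_descent(grad_fn, w, learning_rate, steps):
    """Repeatedly move w a small step against its gradient."""
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w=0.0, learning_rate=0.1, steps=100)
print(w_final)
```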

In [None]:
def optimize(loss, learning_rate):
    ### [TASK] Minimize loss function using SGD.
    train_op = None
    ###
    return train_op

After you're done, run the cell below to test your code.

In [None]:
# optimize test
tf.reset_default_graph()
X, Y = get_placeholders(config['timesteps'], config['num_input'], config['num_classes'])
logits = forward_propagation(X, config['timesteps'], config['num_hidden'], config['num_classes'])
loss = compute_loss(logits, Y)
train_op = optimize(loss, config['learning_rate'])
print(train_op)

Name | Value
---  | --- 
name | "GradientDescent"
op | "NoOp"
input | "^GradientDescent/update_rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel/ApplyGradientDescent"
input | "^GradientDescent/update_rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias/ApplyGradientDescent"
input | "^GradientDescent/update_weights_out/ApplyGradientDescent"
input | "^GradientDescent/update_bias_out/ApplyGradientDescent"


#### Prediction

We want to know how well our model is performing in human-readable form. We will use a simple accuracy metric for this purpose:  
1) Calculate softmax with ```tf.nn.softmax```  
2) Get the predicted (most probable) labels. (_hint: use ```tf.argmax```_)  
3) Get a vector of correct predictions: if correct_pred[i] is true, the model predicted the label of example i correctly. (_hint: use ```tf.equal```_)  
4) Take the mean of the above to calculate accuracy (_hint: you may need to convert the tensor to ```tf.float32``` first_).
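The four steps above can be sketched in NumPy on a toy batch of 3 examples and 3 classes. This illustrates the arithmetic only. The TensorFlow version is left as the task.

```python
import numpy as np

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0],
                   [1.5, 1.0, 0.0]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)    # 1) softmax
predictions = probs.argmax(axis=1)              # 2) most probable class
correct = predictions == labels.argmax(axis=1)  # 3) boolean vector
accuracy = correct.astype(np.float32).mean()    # 4) cast, then average
print(predictions, accuracy)
```

Two of the three toy examples are classified correctly, so the accuracy is 2/3. Averaging a boolean vector only makes sense after casting it to a numeric type, which is what the hint in step 4 refers to.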

In [None]:
def predict(logits, labels):
    # Convert labels to numerical values instead of one-hot vector
    correct_labels = tf.argmax(labels, 1)
    ### [TASK] Calculate prediction accuracy
    y_pred = None  # 1) Calculate softmax.
    predictions = None  # 2) Get predicted labels.
    correct_pred = None  # 3) Check if predictions are correct.
    accuracy = None  # 4) Average across correct predictions.
    ###
    return accuracy, predictions

After you're done, run the cell below to test your code.

In [None]:
# predict test
tf.reset_default_graph()
X, Y = get_placeholders(config['timesteps'], config['num_input'], config['num_classes'])
logits = forward_propagation(X, config['timesteps'], config['num_hidden'], config['num_classes'])
accuracy, predictions = predict(logits, Y)
print("Accuracy: {0}".format(accuracy))
print("Predictions: {0}".format(predictions))

Name | Value
---  | --- 
Accuracy | Tensor("Mean:0", shape=(), dtype=float32)
Predictions | Tensor("ArgMax_1:0", shape=(?,), dtype=int64)

## Training

Let's train our model and see how well it performs on the MNIST dataset!

In [None]:
tf.reset_default_graph()

tf.set_random_seed(1)
model = MnistLstmModel(config)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    sess.run(init)
    losses = []
    accs = []
    log_steps = []
    # config['training_steps'] = 200
    for step in range(1, config['training_steps'] + 1):
        batch_x, batch_y = mnist.train.next_batch(config['batch_size'])
        batch_x = get_batch_patches(batch_x, config['kernel_shape'], config['stride'])
        sess.run(model.train_op, feed_dict={model.X: batch_x, model.Y: batch_y})
        if step % config['display_step'] == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([model.loss, model.accuracy], feed_dict={model.X: batch_x,
                                                                          model.Y: batch_y})
            losses.append(loss)
            accs.append(acc)
            log_steps.append(step)
            print("Step {0}, Minibatch Loss= {1:.4f}, Training Accuracy= {2:.3f}".format(
                step, loss, acc))

    print("Optimization Finished!")
    
    # plot the loss
    plt.plot(log_steps, losses)
    plt.ylabel('loss')
    plt.xlabel('iterations')
    plt.title('Loss')
    plt.show()
    
    # plot the accuracy
    plt.plot(log_steps, accs)
    plt.ylabel('accuracy')
    plt.xlabel('iterations')
    plt.title('Train accuracy')
    plt.show()

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_images = mnist.test.images[:test_len]
    test_data = get_batch_patches(test_images, config['kernel_shape'], config['stride'])
    test_labels = mnist.test.labels[:test_len]
    acc, predictions = sess.run([model.accuracy, model.predictions], 
                                 feed_dict={model.X: test_data, model.Y: test_labels})
    print("Testing Accuracy: {0}".format(acc))
            

Name | Value
--- | --- 
Step 1 | Minibatch Loss= 3.1658, Training Accuracy= 0.133 
Step 200 | Minibatch Loss= 2.1227, Training Accuracy= 0.367 
... | ... 
Step 10000 | Minibatch Loss= 0.2749, Training Accuracy= 0.922 
Testing Accuracy | 0.96875 

Let's see where our model fails.

In [None]:
correct_labels = np.argmax(test_labels, 1)
for i in range(len(predictions)):
    if predictions[i] != correct_labels[i]:
        print("Prediction: {0}, correct label: {1}".format(predictions[i], correct_labels[i]))
        image = test_images[i].reshape(28, 28)
        imgplot = plt.imshow(image, cmap='Greys')
        plt.show()

# Congratulations!

You have finished the assignment. Feel free to play around and experiment with the hyperparameters.