# Sequential MNIST using RNNs

* To classify images using an **RNN**, we treat **every image** as a **sequence of rows**, where each row is a vector of pixels.
* Because MNIST images are 28×28 px, every sample is **one sequence of 28 timesteps**, each carrying **28 input features**.

## System Information

In [1]:
from pathlib import Path
import random 
from datetime import datetime

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np
import data

### Load Data

#### Step 1: Read in data

In [2]:
batch_size=64
mnist_folder = '../data/mnist'
data.download_mnist(mnist_folder)
train, val, test = data.read_mnist(mnist_folder, flatten=False)

# Add the validation set to the training set, as we don't need it for this example
new_train_images = np.concatenate([train[0],val[0]],axis=0)
new_train_labels = np.concatenate([train[1],val[1]],axis=0)
train = (new_train_images,new_train_labels)
del val

# Flatten every image to a 784-element vector
train = (train[0].reshape(-1,784),train[1])
test = (test[0].reshape(-1,784),test[1])

../data/mnist/train-images-idx3-ubyte.gz already exists
../data/mnist/train-labels-idx1-ubyte.gz already exists
../data/mnist/t10k-images-idx3-ubyte.gz already exists
../data/mnist/t10k-labels-idx1-ubyte.gz already exists


In [3]:
print("Shape of:")
print("- Training-set:\t\t{}".format(train[0].shape))
print("- Test-set:\t\t{}".format(test[0].shape))
print("- Training-labels:\t\t{}".format(train[1].shape))
print("- Test-set-labels:\t\t{}".format(test[1].shape))

Shape of:
- Training-set:		(60000, 784)
- Test-set:		(10000, 784)
- Training-labels:		(60000, 10)
- Test-set-labels:		(10000, 10)


### Create Batch Iterators

In [4]:
def batch(iterable, batch_size=1):
    num_samples = len(iterable)
    for index in range(0, num_samples, batch_size):
        yield iterable[index:min(index + batch_size, num_samples)]        

def batch_sequences(iterable, batch_size=1,timesteps=28,num_input=28):
    num_samples = len(iterable)
    for index in range(0, num_samples, batch_size):
        batch = iterable[index:min(index + batch_size, num_samples)]
        # Reshape data to get 28 sequences of 28 elements
        yield batch.reshape((-1, timesteps, num_input))

In [5]:
# for x in batch(test[0][:10], 3):
#     print(x.shape)

# print()
    
# for x in batch_sequences(test[0][:10], 3):
#     print(x.shape)
    
# print()
    
# for x in batch(test[1][:10], 3):
#     print(x.shape)

#### Create our iterators
* Actually, as you'll see in the training loop, we're going to manually re-initialize them every epoch.
* This is meant to be illustrative only. For best performance, use the dataset API.
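
One way to avoid keeping two separate generators in lockstep is a single generator that yields aligned (images, labels) pairs and reshuffles on every call. The sketch below uses only NumPy and synthetic data. The name `paired_batches` and its parameters are hypothetical, and this is a stand-in for illustration, not the dataset API itself.

```python
import numpy as np

def paired_batches(images, labels, batch_size=64, timesteps=28, num_input=28, rng=None):
    """Yield (image_batch, label_batch) pairs for one epoch, shuffled together.

    Hypothetical helper: `images` is (N, timesteps * num_input), `labels` is (N, num_classes).
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(images))  # one shuffle shared by images and labels
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx].reshape((-1, timesteps, num_input)), labels[idx]

# Tiny synthetic stand-in for MNIST: 10 flattened images with one-hot labels
demo_images = np.arange(10 * 784, dtype=np.float32).reshape(10, 784)
demo_labels = np.eye(10, dtype=np.float32)

demo_shapes = [
    (x.shape, y.shape)
    for x, y in paired_batches(demo_images, demo_labels, batch_size=4,
                               rng=np.random.default_rng(0))
]
print(demo_shapes)
```

Calling the generator again at the start of each epoch gives a fresh shuffle, and the last batch is simply smaller when the sample count is not a multiple of the batch size.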

In [6]:
batch_x = batch_sequences(train[0],batch_size=batch_size)
batch_y = batch(train[1],batch_size=batch_size)

In [7]:
#print(next(batch_x).shape, next(batch_y).shape)

### Overview
Every example from the MNIST dataset is a 28x28 image. We will apply an RNN in two ways:
* **Row-by-row:** At step i, the RNN cell sees the ith row of the image, that is, a vector of size 28. The total number of time steps is 28.
* **Pixel-by-pixel:** At step i, the RNN cell sees the ith pixel (a single number, in row-first order). The total number of time steps is 28*28 = 784.
* The pixel-by-pixel case is a lot harder because a decent model has to retain information across hundreds of time steps.
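
The two input layouts differ only in how the same 784 values are reshaped. The sketch below illustrates this with a synthetic batch in NumPy. The variable names are made up for the example.

```python
import numpy as np

# Synthetic stand-in for a batch of 5 flattened MNIST images
flat_batch = np.random.rand(5, 784).astype(np.float32)

# Row-by-row: 28 time steps, each a 28-pixel row
row_sequences = flat_batch.reshape(-1, 28, 28)

# Pixel-by-pixel: 784 time steps, each a single pixel in row-first order
pixel_sequences = flat_batch.reshape(-1, 784, 1)

# Time step 1 of the row-by-row view holds the same values as
# time steps 28..55 of the pixel-by-pixel view
second_row = row_sequences[0, 1]
same_pixels = pixel_sequences[0, 28:56, 0]

print(row_sequences.shape, pixel_sequences.shape)
print(np.array_equal(second_row, same_pixels))
```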

We’re going to build four models (two models for each case):
* First we use a BasicLSTMCell class to build the LSTM layer.
* Then we refactor the first model by replacing BasicLSTMCell with LSTMBlockCell, and add some scaffolding that should help us debug and tune the model later.
* We can further speed up the recurrent layer by using CudnnGRU, because the long sequences of the pixel-by-pixel approach slow training down significantly. CudnnGRU is GPU-only, so GRUCell is used as the example here. TensorBoard support is also added.
* Finally we use the exact same GRU model on permuted sequential MNIST data, which shuffles the order of the pixels and makes the problem even harder.
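
As a rough illustration of the permuted variant, the sketch below applies one fixed random permutation to the pixel order of every flattened image. It assumes the same permutation is reused for all training and test samples. The helper name `permute_pixels` and the seed are hypothetical.

```python
import numpy as np

def permute_pixels(flat_images, seed=0):
    """Apply one fixed random pixel permutation to every flattened image.

    Hypothetical helper: returns the permuted images and the permutation used,
    so the same ordering can be applied to another split.
    """
    perm = np.random.RandomState(seed).permutation(flat_images.shape[1])
    return flat_images[:, perm], perm

# Synthetic stand-in for three flattened 28x28 images
demo_flat = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)
permuted, perm = permute_pixels(demo_flat)

# Undoing the permutation recovers the original pixel order
restored = np.empty_like(permuted)
restored[:, perm] = permuted
print(np.array_equal(restored, demo_flat))
```

The permuted images can then be reshaped to `(-1, 784, 1)` and fed to the pixel-by-pixel model unchanged.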

### Basic LSTM Model

In [8]:
# Training Parameters
learning_rate = 0.001
#training_steps = 10000
epochs = 40
batch_size = 128
display_step = batch_size * 3

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 256 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

In [9]:
# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}

In [10]:
def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
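
To make the recurrence more concrete, here is a minimal NumPy sketch of the standard LSTM update applied over 28 time steps, keeping only the last hidden state as the function above does with `outputs[-1]`. The weight layout, gate ordering, and all names here are assumptions chosen for illustration and are not guaranteed to match TensorFlow's internal layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM time step.

    W has shape (num_input + num_hidden, 4 * num_hidden); the four column
    blocks are taken as input gate, candidate, forget gate, output gate
    (an assumed ordering for this sketch).
    """
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, g, f, o = np.split(z, 4, axis=1)
    c_new = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.RandomState(0)
demo_batch, demo_input, demo_hidden, demo_steps = 2, 28, 8, 28
W_demo = rng.randn(demo_input + demo_hidden, 4 * demo_hidden) * 0.1
b_demo = np.zeros(4 * demo_hidden)
x_demo = rng.rand(demo_batch, demo_steps, demo_input)

# Start from zero state and feed one row per time step
h_state = np.zeros((demo_batch, demo_hidden))
c_state = np.zeros((demo_batch, demo_hidden))
for t in range(demo_steps):
    h_state, c_state = lstm_step(x_demo[:, t, :], h_state, c_state, W_demo, b_demo)

print(h_state.shape)  # last hidden state, one row per sample
```

A positive `forget_bias` pushes the forget gate toward 1 at the start of training, so the cell state is carried forward by default.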

In [11]:
logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model: fraction of samples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
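
For reference, the loss and accuracy defined above can be reproduced in a few lines of NumPy. This is a sketch of the underlying arithmetic (mean softmax cross-entropy over one-hot labels, and argmax agreement), not the TensorFlow implementation. The function names and the toy inputs are made up for the example.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row maximum before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())

def batch_accuracy(logits, one_hot_labels):
    """Fraction of rows where the largest logit matches the labelled class."""
    return float((logits.argmax(axis=1) == one_hot_labels.argmax(axis=1)).mean())

# Toy batch: three samples, three classes
demo_logits = np.array([[2.0, 0.0, 0.0],
                        [0.0, 0.0, 3.0],
                        [1.0, 1.0, 1.0]])
demo_targets = np.eye(3)[[0, 2, 1]]

demo_loss = softmax_cross_entropy(demo_logits, demo_targets)
demo_acc = batch_accuracy(demo_logits, demo_targets)
print(demo_loss, demo_acc)
```

Because softmax preserves the ordering of its inputs, taking the argmax of the logits gives the same predicted class as taking the argmax of the softmax output.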

In [12]:
display_epoch = 1
num_batches = len(train[0])//batch_size
print("num batches: ", num_batches)

# Start training
with tf.Session() as sess:
    
    # Run the initializer
    sess.run(init)

    for epoch in range(epochs):
        batch_x = batch_sequences(train[0],batch_size=batch_size)
        batch_y = batch(train[1],batch_size=batch_size)        
        
        # Visit every full batch once per epoch
        for step in range(num_batches):
            next_image = next(batch_x)
            next_label = next(batch_y)
            sess.run(train_op, feed_dict={X: next_image, Y: next_label})

        if epoch % display_epoch == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: next_image,
                                                                 Y: next_label})

            print("Epoch {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}".format(
                epoch, loss, acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = test[0][:test_len].reshape((-1, timesteps, num_input))
    test_label = test[1][:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

num batches:  468
Epoch 0, Minibatch Loss= 1.7356, Training Accuracy= 0.398
Epoch 1, Minibatch Loss= 1.4091, Training Accuracy= 0.500
Epoch 2, Minibatch Loss= 1.1588, Training Accuracy= 0.633
Epoch 3, Minibatch Loss= 0.9911, Training Accuracy= 0.680
Epoch 4, Minibatch Loss= 0.8732, Training Accuracy= 0.719
Epoch 5, Minibatch Loss= 0.7834, Training Accuracy= 0.758
Epoch 6, Minibatch Loss= 0.7127, Training Accuracy= 0.758
Epoch 7, Minibatch Loss= 0.6561, Training Accuracy= 0.789
Epoch 8, Minibatch Loss= 0.6102, Training Accuracy= 0.781
Epoch 9, Minibatch Loss= 0.5723, Training Accuracy= 0.805
Epoch 10, Minibatch Loss= 0.5406, Training Accuracy= 0.812
Epoch 11, Minibatch Loss= 0.5136, Training Accuracy= 0.805
Epoch 12, Minibatch Loss= 0.4900, Training Accuracy= 0.820
Epoch 13, Minibatch Loss= 0.4685, Training Accuracy= 0.820
Epoch 14, Minibatch Loss= 0.4484, Training Accuracy= 0.836
Epoch 15, Minibatch Loss= 0.4291, Training Accuracy= 0.844
Epoch 16, Minibatch Loss= 0.4104, Training Accur

KeyboardInterrupt: 