[View in Colaboratory](https://colab.research.google.com/github/sruan2/TCN/blob/master/sequential_mnist.ipynb)

# Simple RNN Models for Sequential MNIST with TensorFlow

Based on the work of [Aymeric Damien](https://github.com/aymericdamien/TensorFlow-Examples/) and [Sungjoon](https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/rnn_mnist_simple.ipynb)

## RNN Overview

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" alt="nn" style="width: 600px;"/>

References:
- [Long Short-Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf), Sepp Hochreiter & Jürgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.

## MNIST Dataset Overview

This example uses the MNIST handwritten digits. The dataset contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in a fixed-size 28x28-pixel image, with pixel values scaled to the range 0 to 1. For simplicity, each image is flattened into a 1-D NumPy array of 784 features (28*28). The `input_data` loader used below holds out 5,000 of the training examples as a validation split, which leaves 55,000 for training.

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

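To classify images with a recurrent neural network, we treat every image as a sequence. Because each MNIST image is 28*28 pixels, the row-by-row variant feeds each sample as one sequence of 28 timesteps with one 28-pixel row per timestep, while the pixel-by-pixel variant feeds 784 timesteps of a single pixel each.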
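
The two sequence views can be illustrated with a minimal NumPy sketch. It assumes flattened 784-value images like the ones `mnist.train.next_batch` returns; the `images` array here is random stand-in data, not real MNIST digits.

```python
import numpy as np

# Stand-in batch of 4 flattened "images" (random values, not real MNIST digits).
rng = np.random.RandomState(0)
images = rng.rand(4, 784).astype(np.float32)

# Row-by-row view: 28 timesteps, each carrying one image row of 28 pixels.
row_by_row = images.reshape((-1, 28, 28))

# Pixel-by-pixel view: 784 timesteps, each carrying a single pixel.
pixel_by_pixel = images.reshape((-1, 784, 1))

print(row_by_row.shape)      # (4, 28, 28)
print(pixel_by_pixel.shape)  # (4, 784, 1)

# Timestep t of the row view holds pixels 28*t .. 28*t + 27 of the flat image.
assert np.array_equal(row_by_row[0, 3], images[0, 3 * 28:4 * 28])
```

Both views contain exactly the same pixel values; only the number of timesteps and the number of features per timestep differ.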

More info: http://yann.lecun.com/exdb/mnist/

## System Information

In [1]:
!pip install watermark



In [1]:
from pathlib import Path
import random 
from datetime import datetime

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./TCN/mnist_pixel/data/mnist/raw", one_hot=True)

Extracting ./TCN/mnist_pixel/data/mnist/raw/train-images-idx3-ubyte.gz
Extracting ./TCN/mnist_pixel/data/mnist/raw/train-labels-idx1-ubyte.gz
Extracting ./TCN/mnist_pixel/data/mnist/raw/t10k-images-idx3-ubyte.gz
Extracting ./TCN/mnist_pixel/data/mnist/raw/t10k-labels-idx1-ubyte.gz


In [2]:
%load_ext watermark
%watermark -v -m -p tensorflow,numpy -g

CPython 3.6.3
IPython 4.0.1

tensorflow 1.8.0
numpy 1.14.3

compiler   : GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)
system     : Darwin
release    : 16.7.0
machine    : x86_64
processor  : i386
CPU cores  : 4
interpreter: 64bit
Git hash   : 00e4561211bf03416c7914b7c7a60219b71160de


## Row-by-row Sequential MNIST

The figure below, from [Sungjoon's notebook](https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/rnn_mnist_simple.ipynb), shows how an RNN processes an MNIST image row by row:
![](https://raw.githubusercontent.com/sjchoi86/Tensorflow-101/582cc1d946f0ecbce078e493b8ccb1d7b40684df/notebooks/images/etc/rnn_mnist_look.jpg)

### Basic LSTM Model
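Adapted from [Aymeric Damien's notebook](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb). Note that the cells below are configured for the pixel-by-pixel setting (784 timesteps of one pixel each, 130 hidden units) rather than the row-by-row setting.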

In [2]:
# Training Parameters
learning_rate = 1e-3
training_steps = 10000
batch_size = 64
display_step = 100

# Network Parameters
num_input = 1 # one pixel per timestep (flattened 28*28 image)
timesteps = 28*28 # 784 timesteps, one per pixel
num_hidden = 130 # hidden layer num of features as in TCN paper
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

In [3]:
# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}

In [4]:
def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

In [5]:
logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
#optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
#train_op = optimizer.minimize(loss_op)

# Change optimizer to RMSProp and add gradient clipping
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
gvs = optimizer.compute_gradients(loss_op)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
train_op = optimizer.apply_gradients(capped_gvs)


# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.



In [6]:
# count total number of parameters
total_parameters = 0
for variable in tf.trainable_variables():
    # shape is an array of tf.Dimension
    shape = variable.get_shape()
    variable_parameters = 1
    for dim in shape:
        variable_parameters *= dim.value
    total_parameters += variable_parameters
print(total_parameters)

69950


In [7]:
# another way to count total number of parameters
np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])

69950
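
The count of 69950 can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias (no peepholes), followed by the dense output layer defined by `weights['out']` and `biases['out']`. The helper names `lstm_params` and `dense_params` are hypothetical.

```python
def lstm_params(num_input, num_hidden):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * num_hidden * (num_input + num_hidden + 1)


def dense_params(num_in, num_out):
    # Weight matrix plus one bias per output unit.
    return num_in * num_out + num_out


# num_input = 1, num_hidden = 130, num_classes = 10, as in the cells above.
basic_total = lstm_params(1, 130) + dense_params(130, 10)
print(basic_total)  # 69950
```

The LSTM contributes 68,640 parameters and the output layer 1,310, so almost all of the capacity sits in the recurrent cell.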

In [8]:
# format test data
test_data = mnist.test.images.reshape((-1, timesteps, num_input))
test_label = mnist.test.labels

# Start training; the log file is closed even if training is interrupted
with open('test_acc_basic_lstm_sequential_mnist_784x1.txt', 'w') as f, \
        tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape each image into a sequence of 784 single-pixel timesteps
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x, Y: batch_y})
            # Calculate accuracy on the full test set
            test_acc = sess.run(accuracy, feed_dict={X: test_data, Y: test_label})

            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.4f}".format(acc)+ \
                  ", Testing Accuracy= " + "{:.4f}".format(test_acc))

            f.write("%d, %f\n" % (step, test_acc))
    print("Optimization Finished!")

    # Calculate accuracy on the full test set
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

Step 1, Minibatch Loss= 2.7015, Training Accuracy= 0.1562, Testing Accuracy= 0.1028
Step 100, Minibatch Loss= 2.2935, Training Accuracy= 0.1406, Testing Accuracy= 0.0958


KeyboardInterrupt: 

### Tidier Version of the LSTM Model

1. Use LSTMBlockCell, which should be faster than BasicLSTMCell
2. Replace manual weight definitions with tf.layers.Dense
3. Replace tf.nn.softmax_cross_entropy_with_logits with tf.nn.softmax_cross_entropy_with_logits_v2
4. Group graph definition together
5. Replace rnn.static_rnn with tf.nn.dynamic_rnn, so there is no need to unstack the tensor.
6. Add a batch_normalization layer between the LSTM and Dense layers.
7. Clip gradients by norm (tf.clip_by_norm) for every variable except those of the Dense layer
8. Add a checkpoint saver
9. Evaluate test accuracy every N steps (BAD PRACTICE: use a validation set instead)
10. Replace GradientDescentOptimizer with RMSPropOptimizer
11. Use tf.set_random_seed to control randomness
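
The basic model clips gradients element-wise with `tf.clip_by_value`, while this version rescales them with `tf.clip_by_norm`. The NumPy sketch below illustrates the difference; the two helper functions are stand-ins written for this example, not the TensorFlow implementations.

```python
import numpy as np


def clip_by_value(grad, low, high):
    # Clamp every element independently; this can change the gradient direction.
    return np.clip(grad, low, high)


def clip_by_norm(grad, clip_norm):
    # Rescale the whole tensor so its L2 norm is at most clip_norm; direction is kept.
    l2 = np.sqrt(np.sum(grad ** 2))
    return grad * clip_norm / max(l2, clip_norm)


grad = np.array([3.0, 4.0])            # L2 norm is 5
by_value = clip_by_value(grad, -1., 1.)
by_norm = clip_by_norm(grad, 2.)
print(by_value)                        # [1. 1.]
print(by_norm)                         # [1.2 1.6]
```

Clipping by value turns the (3, 4) gradient into (1, 1), which points in a different direction. Clipping by norm keeps the 3:4 ratio and only shrinks the length to 2.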

In [2]:
# Training Parameters
learning_rate = 0.02
training_steps = 5000
batch_size = 32
display_step = 250

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 64 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

In [3]:
def RNN(x):
    # Define a lstm cell with tensorflow
    lstm_cell = rnn.LSTMBlockCell(
        num_hidden, forget_bias=1.0)

    # Get lstm cell output
    # outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    outputs, states = tf.nn.dynamic_rnn(
        cell=lstm_cell, inputs=x, time_major=False, dtype=tf.float32)
    
    output_layer = tf.layers.Dense(
        num_classes, activation=None, 
        kernel_initializer=tf.orthogonal_initializer()
    )
    return output_layer(tf.layers.batch_normalization(outputs[:, -1, :]))

In [4]:
# Need to clear the default graph before moving forward
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(1)
    # tf Graph input
    X = tf.placeholder("float", [None, timesteps, num_input])
    Y = tf.placeholder("float", [None, num_classes])
    logits = RNN(X)
    prediction = tf.nn.softmax(logits)

    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=Y))
    optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
    gvs = optimizer.compute_gradients(loss_op)
    capped_gvs = [
        (tf.clip_by_norm(grad, 2.), var) if not var.name.startswith("dense") else (grad, var)
        for grad, var in gvs]
    for _, var in gvs:
        if var.name.startswith("dense"):
            print(var.name)    
    train_op = optimizer.apply_gradients(capped_gvs)  

    # Evaluate model
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    
    print("All parameters:", np.sum([np.product([xi.value for xi in x.get_shape()]) for x in tf.global_variables()]))
    print("Trainable parameters:", np.sum([np.product([xi.value for xi in x.get_shape()]) for x in tf.trainable_variables()]))    

dense/kernel:0
dense/bias:0
All parameters: 73886
Trainable parameters: 24586


In [5]:
# Start training
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
best_val_acc = 0.8
with tf.Session(graph=graph, config=config) as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            # Calculate accuracy for 128 mnist test images
            test_len = 128
            test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
            test_label = mnist.test.labels[:test_len]
            val_acc = sess.run(accuracy, feed_dict={X: test_data, Y: test_label})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc) + ", Test Accuracy= " + \
                  "{:.3f}".format(val_acc))
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                save_path = saver.save(sess, "/tmp/model.ckpt", global_step=step)
                print("Model saved in path: %s" % save_path)
    print("Optimization Finished!")

Step 1, Minibatch Loss= 2.3193, Training Accuracy= 0.000, Test Accuracy= 0.062
Step 250, Minibatch Loss= 0.2185, Training Accuracy= 0.938, Test Accuracy= 0.828
Model saved in path: /tmp/model.ckpt-250
Step 500, Minibatch Loss= 0.0273, Training Accuracy= 1.000, Test Accuracy= 0.938
Model saved in path: /tmp/model.ckpt-500
Step 750, Minibatch Loss= 0.0158, Training Accuracy= 1.000, Test Accuracy= 0.953
Model saved in path: /tmp/model.ckpt-750
Step 1000, Minibatch Loss= 0.0194, Training Accuracy= 1.000, Test Accuracy= 0.961
Model saved in path: /tmp/model.ckpt-1000
Step 1250, Minibatch Loss= 0.0117, Training Accuracy= 1.000, Test Accuracy= 0.953
Step 1500, Minibatch Loss= 0.0115, Training Accuracy= 1.000, Test Accuracy= 0.906
Step 1750, Minibatch Loss= 0.0073, Training Accuracy= 1.000, Test Accuracy= 0.969
Model saved in path: /tmp/model.ckpt-1750
Step 2000, Minibatch Loss= 0.0244, Training Accuracy= 1.000, Test Accuracy= 0.961
Step 2250, Minibatch Loss= 0.0306, Training Accuracy= 1.000, 
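
The training loop saves a checkpoint only when the accuracy on the 128-image test slice strictly exceeds the best value seen so far, starting from a threshold of 0.8. The sketch below replays that rule on the accuracies logged above; `steps_that_save` is a hypothetical helper written for this illustration.

```python
def steps_that_save(history, initial_best=0.8):
    """Return the steps at which a 'save when strictly better' rule fires."""
    best = initial_best
    saved = []
    for step, acc in history:
        if acc > best:
            best = acc
            saved.append(step)
    return saved


# (step, accuracy) pairs copied from the complete lines of the log above.
log = [(1, 0.062), (250, 0.828), (500, 0.938), (750, 0.953), (1000, 0.961),
       (1250, 0.953), (1500, 0.906), (1750, 0.969), (2000, 0.961)]
print(steps_that_save(log))  # [250, 500, 750, 1000, 1750]
```

The result matches the "Model saved in path" lines in the log. Because the accuracies come from test images, the selected checkpoint is tuned to the test data; the `mnist.validation` split provided by the same loader would be a more appropriate source for this signal.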

## Pixel-by-pixel Sequential MNIST

View every example as a 784 x 1 sequence.

### CudnnGRU Model

cuDNN's GRU implementation is much faster, but it does not support variational dropout and it has no CPU mode, so it needs a GPU.

This section also adds the following:

1. Use tf.summary to save logs for TensorBoard
2. Use tf.variable_scope to group variables and operations
3. Use AdamOptimizer instead of RMSPropOptimizer (the latter has some problems with CudnnGRU)
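
The `RNN` function in this section transposes its input with `tf.transpose(x, (1, 0, 2))` because CudnnGRU works on time-major tensors of shape (timesteps, batch, features), and then reads the last timestep with `outputs[-1, :, :]`. A minimal NumPy sketch of the same layout change, using small made-up dimensions:

```python
import numpy as np

batch_size, timesteps, num_input = 2, 5, 1
x = np.arange(batch_size * timesteps * num_input, dtype=np.float32)
x = x.reshape((batch_size, timesteps, num_input))   # batch-major layout

x_time_major = np.transpose(x, (1, 0, 2))           # (timesteps, batch, features)
print(x_time_major.shape)  # (5, 2, 1)

# The last timestep of every sequence is indexed along axis 0 in the
# time-major layout and along axis 1 in the batch-major layout.
assert np.array_equal(x_time_major[-1, :, :], x[:, -1, :])
```

This is why the row-by-row model indexes `outputs[:, -1, :]` while the CudnnGRU model indexes `outputs[-1, :, :]`: both select the output at the final timestep.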

In [6]:
# Training Parameters
learning_rate = 0.002
training_steps = 5000
batch_size = 32
display_step = 250
total_batch = int(mnist.train.num_examples / batch_size)
print("Total number of batches:", total_batch)

# Network Parameters
num_input = 1 # one pixel per timestep (flattened 28*28 image)
timesteps = 28 * 28 # 784 timesteps, one per pixel
num_hidden = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

Total number of batches: 1718


In [7]:
def RNN(x):    
    gru = tf.contrib.cudnn_rnn.CudnnGRU(
        1, num_hidden,
        kernel_initializer=tf.orthogonal_initializer())
    outputs, _ = gru(tf.transpose(x, (1, 0, 2)))    
    output_layer = tf.layers.Dense(
        num_classes, activation=None, 
        kernel_initializer=tf.orthogonal_initializer(),
        trainable=True
    )
    # Linear activation, using rnn inner loop last output
    return output_layer(tf.layers.batch_normalization(outputs[-1, :, :]))

In [8]:
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(10)
    # tf Graph input
    X = tf.placeholder("float", [None, timesteps, num_input])
    Y = tf.placeholder("float", [None, num_classes])
    # Build the model
    logits = RNN(X)
    prediction = tf.nn.softmax(logits)

    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=Y))
    
    with tf.variable_scope('optimizer'):
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.1)
        gvs = optimizer.compute_gradients(loss_op)
        capped_gvs = [
            (tf.clip_by_norm(grad, 1.), var) if not var.name.startswith("dense") else (grad, var)
            for grad, var in gvs]
        for _, var in gvs:
            if var.name.startswith("dense"):
                print(var.name)
        train_op = optimizer.apply_gradients(capped_gvs)  
    
    with tf.variable_scope('summarize_gradients'):
        for grad, var in gvs:
            norm = tf.norm(tf.clip_by_norm(grad, 10.), ord=2)
            tf.summary.histogram(var.name.replace(":", "_") + '/gradient_l2', 
                                 tf.where(tf.is_nan(norm), tf.zeros_like(norm), norm))
        for grad, var in capped_gvs:
            norm = tf.norm(grad, ord=2)
            tf.summary.histogram(var.name.replace(":", "_") + '/gradient_clipped_l2', 
                                 tf.where(tf.is_nan(norm), tf.zeros_like(norm), norm))

    merged_summary_op = tf.summary.merge_all()
    
    with tf.variable_scope('evaluate'):
        # Evaluate model
        correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

dense/kernel:0
dense/bias:0


In [9]:
# Start training
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
best_val_acc = 0.8
log_dir = "logs/cudnngru/%s" % datetime.now().strftime("%Y%m%d_%H%M")
Path(log_dir).mkdir(exist_ok=True, parents=True)
tb_writer = tf.summary.FileWriter(log_dir, graph)
with tf.Session(graph=graph, config=config) as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc, summary = sess.run(
                [loss_op, accuracy, merged_summary_op], 
                feed_dict={X: batch_x, Y: batch_y})
            tb_writer.add_summary(summary, global_step=step)
            tb_writer.flush()
            # Calculate accuracy for 128 mnist test images
            test_len = 128
            test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
            test_label = mnist.test.labels[:test_len]
            val_acc = sess.run(accuracy, feed_dict={X: test_data, Y: test_label})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc) + ", Test Accuracy= " + \
                  "{:.3f}".format(val_acc))
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                save_path = saver.save(sess, "/tmp/model.ckpt", global_step=step)
                print("Model saved in path: %s" % save_path)
    print("Optimization Finished!")

InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

	 [[Node: save/CudnnRNNCanonicalToParams = CudnnRNNCanonicalToParams[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", num_params=6, rnn_mode="gru", seed=0, seed2=0, _device="/device:GPU:0"](save/CudnnRNNCanonicalToParams/num_layers, save/CudnnRNNCanonicalToParams/num_units, save/CudnnRNNCanonicalToParams/input_size, save/Reshape, save/Reshape_1, save/Reshape_2, save/Reshape_3, save/Reshape_4, save/Reshape_5, save/split_3, save/split_3:1, save/RestoreV2:9, save/split_4, save/split_4:1, save/RestoreV2:10)]]

Caused by op 'save/CudnnRNNCanonicalToParams', defined at:
  File "<ipython-input-8-49c071f6efe8>", line 46, in <module>
    saver = tf.train.Saver()
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 494, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 288, in restore
    opaque_params = self._CanonicalToOpaqueParams(weights, biases)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 379, in _CanonicalToOpaqueParams
    direction=self._direction)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/ops/gen_cudnn_rnn_ops.py", line 458, in cudnn_rnn_canonical_to_params
    name=name)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/Users/sherryruan/github/TCN/tcn-env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access


## Permuted Pixel-by-pixel Sequential MNIST

Increase the difficulty by shuffling the order of the pixels within each sequence, applying the same fixed permutation to every sequence.

### CudnnGRU Model
The model is essentially the same as before; the only addition is a permutation operation in the graph definition.
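
The permutation step can be sketched in NumPy. The example assumes a fixed permutation of the 784 timesteps drawn once from a seeded generator, as the graph definition in this section does; the `batch` array is random stand-in data, and the fancy indexing on axis 1 is meant as the NumPy counterpart of gathering along the time axis.

```python
import numpy as np

# One fixed permutation of the 784 timesteps, shared by every sequence.
permute = np.random.RandomState(100).permutation(784)

# Stand-in batch of 3 pixel-by-pixel sequences (random values, not real MNIST).
batch = np.random.RandomState(0).rand(3, 784, 1)

# Reorder the time axis identically for every sample in the batch.
permuted = batch[:, permute, :]

# argsort of a permutation yields its inverse, which restores the original order.
inverse = np.argsort(permute)
restored = permuted[:, inverse, :]
assert np.array_equal(restored, batch)
```

Because the permutation is fixed, the task stays learnable, but neighbouring timesteps no longer correspond to neighbouring pixels, so the model cannot rely on local spatial structure.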

In [17]:
tf.reset_default_graph()
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(10)
    # tf Graph input
    X_ = tf.placeholder("float", [None, timesteps, num_input])
    Y = tf.placeholder("float", [None, num_classes])
    
    # Permute the time steps
    np.random.seed(100)
    permute = np.random.permutation(784)
    X = tf.gather(X_, permute, axis=1)   
    
    # Build the model
    logits = RNN(X)
    prediction = tf.nn.softmax(logits)

    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=Y))
    
    with tf.variable_scope('optimizer'):
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=0.1)
        gvs = optimizer.compute_gradients(loss_op)
        capped_gvs = [
            (tf.clip_by_norm(grad, 1.), var) if not var.name.startswith("dense") else (grad, var)
            for grad, var in gvs]
        for _, var in gvs:
            if var.name.startswith("dense"):
                print(var.name)
        train_op = optimizer.apply_gradients(capped_gvs)  
    
    with tf.variable_scope('summarize_gradients'):
        for grad, var in gvs:
            norm = tf.norm(tf.clip_by_norm(grad, 10.), ord=2)
            tf.summary.histogram(var.name.replace(":", "_") + '/gradient_l2', 
                                 tf.where(tf.is_nan(norm), tf.zeros_like(norm), norm))
        for grad, var in capped_gvs:
            norm = tf.norm(grad, ord=2)
            tf.summary.histogram(var.name.replace(":", "_") + '/gradient_clipped_l2', 
                                 tf.where(tf.is_nan(norm), tf.zeros_like(norm), norm))

    merged_summary_op = tf.summary.merge_all()
    
    with tf.variable_scope('evaluate'):
        # Evaluate model
        correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

dense/kernel:0
dense/bias:0


In [18]:
# Start training
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
best_val_acc = 0.8
log_dir = "logs/cudnngru/%s" % datetime.now().strftime("%Y%m%d_%H%M")
Path(log_dir).mkdir(exist_ok=True, parents=True)
tb_writer = tf.summary.FileWriter(log_dir, graph)
with tf.Session(graph=graph, config=config) as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X_: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc, summary = sess.run(
                [loss_op, accuracy, merged_summary_op], 
                feed_dict={X_: batch_x, Y: batch_y})
            tb_writer.add_summary(summary, global_step=step)
            tb_writer.flush()
            # Calculate accuracy for 128 mnist test images
            test_len = 128
            test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
            test_label = mnist.test.labels[:test_len]
            val_acc = sess.run(accuracy, feed_dict={X_: test_data, Y: test_label})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc) + ", Test Accuracy= " + \
                  "{:.3f}".format(val_acc))
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                save_path = saver.save(sess, "/tmp/model.ckpt", global_step=step)
                print("Model saved in path: %s" % save_path)
    print("Optimization Finished!")

Step 1, Minibatch Loss= 2.2866, Training Accuracy= 0.188, Test Accuracy= 0.078
Step 250, Minibatch Loss= 1.5686, Training Accuracy= 0.500, Test Accuracy= 0.508
Step 500, Minibatch Loss= 0.8842, Training Accuracy= 0.781, Test Accuracy= 0.555
Step 750, Minibatch Loss= 0.9858, Training Accuracy= 0.719, Test Accuracy= 0.648
Step 1000, Minibatch Loss= 0.9501, Training Accuracy= 0.750, Test Accuracy= 0.648
Step 1250, Minibatch Loss= 0.7780, Training Accuracy= 0.688, Test Accuracy= 0.688
Step 1500, Minibatch Loss= 1.0259, Training Accuracy= 0.688, Test Accuracy= 0.680
Step 1750, Minibatch Loss= 0.5693, Training Accuracy= 0.906, Test Accuracy= 0.781
Step 2000, Minibatch Loss= 0.6409, Training Accuracy= 0.844, Test Accuracy= 0.805
Model saved in path: /tmp/model.ckpt-2000
Step 2250, Minibatch Loss= 0.6085, Training Accuracy= 0.812, Test Accuracy= 0.719
Step 2500, Minibatch Loss= 0.5675, Training Accuracy= 0.812, Test Accuracy= 0.750
Step 2750, Minibatch Loss= 0.4034, Training Accuracy= 0.938, T