# Imports #

Here we import some Python libraries. The most important, of course, is the TensorFlow library itself. We also need the os library for some file path computations. Finally, the urllib library lets us download data stored somewhere on the web.

In [1]:
import os
import tensorflow as tf
import urllib.request

# Introduction #

This notebook demonstrates how to use the new TensorFlow 1.0 API from Google to classify the MNIST handwritten digits dataset.
More information about TensorFlow can be found at https://www.tensorflow.org/. This demo is heavily based on material presented by Dandelion Mané (of Google) at the TensorFlow Dev Summit 2017 in his excellent talk, Hands-on TensorBoard (https://www.youtube.com/watch?v=eBbEDRsCmv4&index=4&list=PLOU2XLYxmsIKGc_NBoIhTn2Qhraji53cv), which is well worth watching.

# We need to set up some file locations etc... #

First, we need to define a logging directory to be used in the experiments that follow:

In [2]:
LOGDIR = 'c:/Users/lau/tmp/Demo/'
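
The download calls later in the notebook write files directly into this directory, and `urllib.request.urlretrieve` does not create missing folders. A minimal sketch of creating the directory up front, assuming a throwaway location under the system temp folder rather than the hard-coded path above (the name `demo_logdir` is hypothetical):

```python
import os
import tempfile

# Hypothetical stand-in for LOGDIR: a 'Demo' folder under the system temp directory.
# Joining with '' keeps a trailing separator, so string concatenation such as
# demo_logdir + 'data' still yields a valid path.
demo_logdir = os.path.join(tempfile.gettempdir(), 'Demo', '')

# Create the directory and its 'data' subfolder; exist_ok avoids an error on re-runs.
os.makedirs(os.path.join(demo_logdir, 'data'), exist_ok=True)

print(os.path.isdir(demo_logdir))
```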

Dandelion Mané has very kindly set up a GitHub gist with everything we need in terms of data. (Think of a gist as a simplified GitHub repo, used for sharing small pieces of code and examples.) Here is the path to this gist:

In [3]:
GIST_URL = 'https://gist.githubusercontent.com/dandelionmane/4f02ab8f1451e276fea1f165a20336f1/raw/a20c87f5e1f176e9abf677b46a74c6f2581c7bd8/'

Now let's load the so-called MNIST data (more info at http://yann.lecun.com/exdb/mnist/). Notice that TensorFlow provides a function for loading these data:

In [6]:
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(train_dir = LOGDIR + 'data', one_hot = True)
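
With `one_hot = True`, each label is delivered as a vector with one entry per digit class instead of as a single integer. The pure-Python sketch below illustrates the idea only; the `one_hot` helper is hypothetical and is not how TensorFlow implements it.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

encoded = one_hot(3)
print(encoded)

# The original digit is recovered as the index of the largest entry (argmax).
print(encoded.index(max(encoded)))
```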


Dandelion Mané has kindly provided us with some data that we need to make the embedding demo work (more on this later...). Specifically, we download a sprite image and a labels file for the embedding projector.

In [None]:
urllib.request.urlretrieve(GIST_URL + 'labels_1024.tsv', LOGDIR + 'labels_1024.tsv')
urllib.request.urlretrieve(GIST_URL + 'sprite_1024.png', LOGDIR + 'sprite_1024.png')
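
Downloads can fail when the network is unavailable, so it may help to skip files that are already on disk and to report failures instead of stopping the notebook. The following is a sketch under that assumption; `fetch_if_missing` is a hypothetical helper, not part of urllib.

```python
import os
import urllib.request

def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists; return True on success."""
    if os.path.exists(dest):
        return True
    try:
        urllib.request.urlretrieve(url, dest)
        return True
    except OSError as err:
        # urllib.error.URLError is a subclass of OSError, so network
        # failures and local file errors both end up here.
        print('Could not download {0}: {1}'.format(url, err))
        return False
```

It could be called as `fetch_if_missing(GIST_URL + 'labels_1024.tsv', LOGDIR + 'labels_1024.tsv')`, and likewise for the sprite image.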

# Now we need to define a few convenience functions #

This first function makes it simple to compose an information string that describes a run...

In [None]:
def make_hparam_string(learning_rate, num_convs, num_fully_connected):
    return 'LR {0} Conv layers {1} Fully connected layers {2}'.format(learning_rate, num_convs, num_fully_connected)
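
As a quick check of what the string looks like, the function can be called with example values. Note that Python prints the float `5E-5` as `5e-05`.

```python
def make_hparam_string(learning_rate, num_convs, num_fully_connected):
    return 'LR {0} Conv layers {1} Fully connected layers {2}'.format(learning_rate, num_convs, num_fully_connected)

print(make_hparam_string(5E-5, 2, 1))
# prints: LR 5e-05 Conv layers 2 Fully connected layers 1
```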

This next function uses TensorFlow to set up a convolutional layer followed by max pooling...

In [None]:
def conv_layer(input, size_in, size_out, conv_size = 3, conv_stride = 1, pool_factor = 1, pool_stride = 1, name = "conv"):
    with tf.name_scope(name):
        # Initialize weights with a truncated normal distribution
        w = tf.Variable(tf.truncated_normal([conv_size, conv_size, size_in, size_out], stddev = 0.1), name = "W")
        # Initialize the biases to a constant value of 0.1
        b = tf.Variable(tf.constant(0.1, shape = [size_out]), name = "B")
        # Perform the actual convolution
        conv = tf.nn.conv2d(input, w, strides = [1, conv_stride, conv_stride, 1], padding = "SAME")
        # Apply a rectified linear unit to the convolution result
        act = tf.nn.relu(conv + b)
        # Dump information to the summary process
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        # Apply maxpooling and return
        return tf.nn.max_pool(act, ksize = [1, pool_factor, pool_factor, 1], strides = [1, pool_stride, pool_stride, 1], padding = "SAME")
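
With `"SAME"` padding, the spatial output size of a convolution or pooling step is the input size divided by the stride, rounded up. The sketch below works through that arithmetic for a 28 x 28 image passed through two such layers, assuming `conv_stride = 1` and `pool_stride = 2` in each; the `same_output_size` helper is hypothetical.

```python
import math

def same_output_size(size_in, stride):
    """Spatial output size of a 'SAME'-padded convolution or pooling op."""
    return math.ceil(size_in / stride)

size = 28  # MNIST images are 28 x 28 pixels
for layer in ('conv1', 'conv2'):
    size = same_output_size(size, 1)  # convolution with conv_stride = 1
    size = same_output_size(size, 2)  # max pooling with pool_stride = 2
    print(layer, size)
```

Under these assumptions the feature maps shrink from 28 to 14 and then to 7 pixels per side, while the number of channels is set by `size_out`.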

The next function sets up a fully connected layer...

In [None]:
def fc_layer(input, size_in, size_out, name = "fc"):
    with tf.name_scope(name):
        # Initialize weights with a truncated normal distribution
        w = tf.Variable(tf.truncated_normal([size_in, size_out], stddev = 0.1), name = "W")
        # Initialize the biases to a constant value of 0.1
        b = tf.Variable(tf.constant(0.1, shape = [size_out]), name = "B")
        # Apply a rectified linear unit to the output
        act = tf.nn.relu(tf.matmul(input, w) + b)
        # Dump information to the summary process
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        # Return
        return act
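
To make the computation concrete, here is a small pure-Python sketch of what the layer evaluates for a single input vector, namely relu(x · W + b). The helper names and the tiny weights are made up for illustration; the real layer operates on TensorFlow tensors and whole batches.

```python
def relu(values):
    """Replace negative entries with zero."""
    return [max(0.0, v) for v in values]

def dense_forward(x, w, b):
    """Compute relu(x @ w + b) for one input vector x.

    w is a list of rows with shape [size_in][size_out].
    """
    size_out = len(b)
    pre = [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
           for j in range(size_out)]
    return relu(pre)

x = [1.0, 2.0]
w = [[1.0, -1.0],
     [0.5, -2.0]]
b = [0.1, 0.1]
print(dense_forward(x, w, b))
```

The second output unit has a negative pre-activation, so the ReLU clips it to zero. Because the ReLU is part of the layer, every layer built this way produces non-negative outputs.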

Here goes a monster: it sets up the network we want and trains it. It takes a learning rate and a descriptive string as input.

In [None]:
def mnist_model(learning_rate, hparam):

    # Clear the default graph stack and reset the global default graph
    tf.reset_default_graph()
    
    # Set up a session
    sess = tf.Session()
    
    # Set up a placeholder for the flattened image data and reshape it to 28 x 28 x 1 images
    x = tf.placeholder(tf.float32, shape = [None, 784], name = "x")
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    
    # Dump information to the summary process
    tf.summary.image('input', x_image, 6)
    
    # Setup placeholders for the label data
    y = tf.placeholder(tf.float32, shape = [None, 10], name = "labels")

    # Define network with two convolutional layers and two fully connected layers
    conv1 =    conv_layer(x_image, 1,  32, conv_size = 3, conv_stride = 1, pool_factor = 2, pool_stride = 2, name = "conv1")
    conv2 =    conv_layer(conv1,  32,  64, conv_size = 3, conv_stride = 1, pool_factor = 2, pool_stride = 2, name = "conv2")

    # Flatten to prepare for the fully connected layers
    # (two stride-2 poolings reduce 28 x 28 to 7 x 7, with 64 channels)
    flattened = tf.reshape(conv2, [-1, 7 * 7 * 64])

    # Define fully connected layers
    fc1 = fc_layer(flattened, 7 * 7 * 64, 1024, "fc1")
    embedding_input = fc1
    embedding_size = 1024
    # Note: fc_layer applies a ReLU, so these logits are never negative
    logits = fc_layer(fc1, 1024, 10, "fc2")

    # Calculate the cross entropy and send to summary writer
    with tf.name_scope("xent"):
        xent = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = y), name = "xent")
        tf.summary.scalar("xent", xent)

    # Then train
    with tf.name_scope("train"):
        train_step = tf.train.AdamOptimizer(learning_rate).minimize(xent)

    # Calculate accuracy
    with tf.name_scope("accuracy"):
        correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("accuracy", accuracy)

    # Put all summaries into one chunk
    summ = tf.summary.merge_all()

    # Then prepare for showing the embedding (one row per test image, 1024 images in total)
    embedding = tf.Variable(tf.zeros([1024, embedding_size]), name = "test_embedding")
    assignment = embedding.assign(embedding_input)
    saver = tf.train.Saver()

    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(LOGDIR + hparam)
    writer.add_graph(sess.graph)

    config = tf.contrib.tensorboard.plugins.projector.ProjectorConfig()
    embedding_config = config.embeddings.add()
    embedding_config.tensor_name = embedding.name
    embedding_config.sprite.image_path = LOGDIR + 'sprite_1024.png'
    embedding_config.metadata_path = LOGDIR + 'labels_1024.tsv'
    
    # Specify the width and height of a single thumbnail.
    embedding_config.sprite.single_image_dim.extend([28, 28])
    tf.contrib.tensorboard.plugins.projector.visualize_embeddings(writer, config)

    # Now train for 20001 iterations (steps 0 through 20000)
    for i in range(20001):
        batch = mnist.train.next_batch(100)
        if i % 5 == 0:
            [train_accuracy, s] = sess.run([accuracy, summ], feed_dict = {x: batch[0], y: batch[1]})
            writer.add_summary(s, i)
        if i % 500 == 0:
            sess.run(assignment, feed_dict={x: mnist.test.images[:1024], y: mnist.test.labels[:1024]})
            saver.save(sess, os.path.join(LOGDIR, "model.ckpt"), i)
        sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})
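
The loss and accuracy nodes in the function above can be checked by hand on a tiny batch. The sketch below recomputes softmax cross entropy and argmax accuracy in pure Python; it uses three classes instead of ten for brevity, and the logits and labels are invented for illustration.

```python
import math

def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot_label):
    """Cross entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

def argmax(values):
    return values.index(max(values))

# Made-up batch of two examples with three classes each.
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

losses = [cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # plays the role of tf.reduce_mean over the batch

hits = [argmax(l) == argmax(y) for l, y in zip(batch_logits, batch_labels)]
accuracy = sum(hits) / len(hits)

print(mean_loss, accuracy)
```

The first example is classified correctly and the second is not, so the accuracy for this batch is 0.5, and the misclassified example contributes the larger share of the mean loss.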

# Finally run it all and enjoy #

In [None]:
# Define a learning rate
learning_rate = 5E-5

# Construct a hyperparameter string to describe what we are up to
hparam = make_hparam_string(learning_rate, 2, 1)
print('Starting run for %s' % hparam)

# Actually run with the new settings
mnist_model(learning_rate, hparam)