# Imports #

Here we import some Python libraries. The most important, of course, is the TensorFlow library itself. We also need the os library for handling file paths. Finally, the urllib library allows us to download data stored somewhere on the web.

In [8]:
import os
import tensorflow as tf
import urllib.request  # import the submodule explicitly so that urllib.request.urlretrieve is available

# Introduction #

This notebook demonstrates how to use the new TensorFlow 1.0 API from Google to classify the MNIST handwritten digits dataset.
More information about TensorFlow can be found at https://www.tensorflow.org/. This demo is heavily based on material presented by Dandelion Mané (of Google) at the TensorFlow Dev Summit 2017 in his excellent talk, Hands-on TensorBoard (https://www.youtube.com/watch?v=eBbEDRsCmv4&index=4&list=PLOU2XLYxmsIKGc_NBoIhTn2Qhraji53cv), which is well worth watching.

# We need to set up some file locations etc... #

First, we need to define a logging directory to be used in the experiments that follow.

In [9]:
LOGDIR = 'logdir/'

We now create the log directory if it doesn't exist. When you start a cell with '!' in a Jupyter notebook, the line is no longer executed as Python code but as a command in a shell, and $LOGDIR is replaced by the value of the Python variable LOGDIR. So the cell below is equivalent to going to your shell and executing "mkdir -p logdir/". Note that "-p" is a Unix shell option; the Windows command shell treats it as just another directory name.

In [10]:
! mkdir -p $LOGDIR
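
Because the shell command behaves differently across platforms, the directory can also be created from Python itself. The following is a minimal sketch using the standard os.makedirs function; it is an alternative to the shell cell, not what the original notebook ran.

```python
import os

LOGDIR = 'logdir/'  # same value as defined in the notebook

# exist_ok=True makes this a no-op when the directory already exists
os.makedirs(LOGDIR, exist_ok=True)
print(os.path.isdir(LOGDIR))
```

This works the same way on Windows, macOS, and Linux, since it does not depend on which shell the notebook uses.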


Dandelion Mané has very kindly set up a GitHub gist with everything we need in terms of data. (Think of a gist as a simplified GitHub repo, used for sharing small pieces of code and examples.) Here is the path to this gist:

In [11]:
GIST_URL = 'https://gist.githubusercontent.com/dandelionmane/4f02ab8f1451e276fea1f165a20336f1/raw/a20c87f5e1f176e9abf677b46a74c6f2581c7bd8/'

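Now let's load the so-called MNIST data (more info at http://yann.lecun.com/exdb/mnist/). Notice that TensorFlow provides a function for loading this data. With one_hot = True, each label is returned as a vector of length 10 with a single 1 at the position of the digit.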

In [12]:
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(train_dir = LOGDIR + 'data', one_hot = True)

Extracting logdir/data\train-images-idx3-ubyte.gz
Extracting logdir/data\train-labels-idx1-ubyte.gz
Extracting logdir/data\t10k-images-idx3-ubyte.gz
Extracting logdir/data\t10k-labels-idx1-ubyte.gz


Dandelion Mané has kindly provided us with some data that we need to make the embedding demo work (more on this later...). Specifically, we download a sprite image and a labels file for the embedding projector.

In [15]:
urllib.request.urlretrieve(GIST_URL + 'labels_1024.tsv', LOGDIR + 'labels_1024.tsv')
urllib.request.urlretrieve(GIST_URL + 'sprite_1024.png', LOGDIR + 'sprite_1024.png')

('logdir/sprite_1024.png', <http.client.HTTPMessage at 0x19072438>)

# Now we need to define a few convenience functions #

This first function builds a short string that describes the hyperparameters of a run. We use it later to name the log directory of the run.

In [16]:
def make_hparam_string(learning_rate, num_convs, num_fully_connected):
    return 'LR {0} Conv layers {1} Fully connected layers {2}'.format(learning_rate, num_convs, num_fully_connected)
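
As a quick illustration, here is what the function returns for the hyperparameters used later in this notebook. The function is repeated so that the snippet runs on its own; the variable name hparam_example is only used for this illustration.

```python
def make_hparam_string(learning_rate, num_convs, num_fully_connected):
    return 'LR {0} Conv layers {1} Fully connected layers {2}'.format(learning_rate, num_convs, num_fully_connected)

# Learning rate 1E-4, two convolutional layers, two fully connected layers
hparam_example = make_hparam_string(1E-4, 2, 2)
print(hparam_example)  # LR 0.0001 Conv layers 2 Fully connected layers 2
```

Since the string ends up in a directory name, each combination of hyperparameters gets its own run in TensorBoard.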

This next function sets up a convolutional layer: a convolution, followed by a ReLU activation and max pooling. It also records histograms of the weights, biases and activations for TensorBoard...

In [17]:
def conv_layer(input, size_in, size_out, conv_size = 3, conv_stride = 1, pool_factor = 1, pool_stride = 1, name = "conv"):
    with tf.name_scope(name):
        # Initialize weights with a truncated normal distribution
        w = tf.Variable(tf.truncated_normal([conv_size, conv_size, size_in, size_out], stddev = 0.1), name = "W")
        # Set the biases to be constants
        b = tf.Variable(tf.constant(0.1, shape = [size_out]), name = "B")
        # Perform the actual convolution
        conv = tf.nn.conv2d(input, w, strides = [1, conv_stride, conv_stride, 1], padding = "SAME")
        # Apply a rectified linear unit to the convolution result
        act = tf.nn.relu(conv + b)
        # Dump information to the summary process
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        # Apply maxpooling and return
        return tf.nn.max_pool(act, ksize = [1, pool_factor, pool_factor, 1], strides = [1, pool_stride, pool_stride, 1], padding = "SAME")

The next function sets up a fully connected layer with a ReLU activation...

In [18]:
def fc_layer(input, size_in, size_out, name = "fc"):
    with tf.name_scope(name):
        # Initialize weights with a truncated normal distribution
        w = tf.Variable(tf.truncated_normal([size_in, size_out], stddev = 0.1), name = "W")
        # Set the biases to be constants
        b = tf.Variable(tf.constant(0.1, shape = [size_out]), name = "B")
        # Apply a rectified linear unit to the output
        act = tf.nn.relu(tf.matmul(input, w) + b)
        # Dump information to the summary process
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        # Return
        return act

# Create Model for MNIST #
Now let's see how we can create the MNIST model.

In [19]:
# Set up a placeholder for the image data and reshape the data to 28x28 images for the convolutions and for display
x = tf.placeholder(tf.float32, shape = [None, 784], name = "x")
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Dump information to the summary process
tf.summary.image('input', x_image, 6)

# Set up a placeholder for the label data
y = tf.placeholder(tf.float32, shape = [None, 10], name = "labels")

# Define network with two convolutional layers and two fully connected layers
conv1 = conv_layer(x_image, 1,  32, conv_size = 5, conv_stride = 1, pool_factor = 2, pool_stride = 2, name = "conv1")
conv2 = conv_layer(conv1,  32,  64, conv_size = 5, conv_stride = 1, pool_factor = 2, pool_stride = 2, name = "conv2")

# Flatten to prepare for the fully connected layers
flattened = tf.reshape(conv2, [-1, 7 * 7 * 64])

# Define fully connected layers
feature_layer_size = 1024
fc1 = fc_layer(flattened, 7 * 7 * 64, feature_layer_size, "fc1")

# Note: fc_layer applies a ReLU, so these logits can never be negative
logits = fc_layer(fc1, feature_layer_size, 10, "fc2")
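
The value 7 * 7 * 64 used when flattening follows from the layer settings. As a small sketch in plain Python (no TensorFlow needed): with "SAME" padding, the output size along each spatial dimension is the input size divided by the stride, rounded up. The helper name same_padding_output_size is made up for this illustration.

```python
import math

def same_padding_output_size(size, stride):
    # With "SAME" padding the output size is ceil(input size / stride)
    return math.ceil(size / stride)

size = 28      # MNIST images are 28x28
channels = 1   # grayscale input
for name, out_channels in [("conv1", 32), ("conv2", 64)]:
    size = same_padding_output_size(size, 1)  # convolution with stride 1
    size = same_padding_output_size(size, 2)  # max pooling with stride 2
    channels = out_channels
    print(name, (size, size, channels))

flattened_size = size * size * channels
print(flattened_size)  # 3136, i.e. 7 * 7 * 64
```

Each pooling step halves the image, so 28 becomes 14 and then 7, and the second convolutional layer produces 64 channels.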

# Defining loss and optimizer #
Here we use the cross-entropy loss, which is commonly used for classification tasks.

In [20]:
# define cross-entropy loss
with tf.name_scope("xent"):
    xent = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = y), name = "xent")
    tf.summary.scalar("xent", xent)
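
To get a feeling for what this loss measures, here is a minimal sketch of softmax cross-entropy for a single example in plain Python. It uses a made-up problem with 3 classes instead of 10, and the function names are only for illustration; TensorFlow's own implementation is more careful numerically and works on whole batches.

```python
import math

def softmax(logits):
    # Subtract the maximum first for numerical stability
    shifted = [v - max(logits) for v in logits]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, one_hot_label):
    # Negative log-probability assigned to the true class
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

label = [1, 0, 0]  # the true class is class 0
confident_right = softmax_cross_entropy([5.0, 0.0, 0.0], label)
uniform = softmax_cross_entropy([0.0, 0.0, 0.0], label)
confident_wrong = softmax_cross_entropy([0.0, 5.0, 0.0], label)
print(confident_right, uniform, confident_wrong)
```

The loss is small when the network puts most of its probability on the correct class, equals log(3) when it is undecided between the three classes, and grows large when it is confidently wrong.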

Now for the optimization. Here we use an optimization method called [ADAM](https://arxiv.org/pdf/1412.6980.pdf). This is a variant of stochastic gradient descent (SGD) that adapts the step size of each parameter dynamically. Note that we don't even have to work out the derivatives ourselves: TensorFlow computes the gradients automatically by backpropagating through the graph!

In [21]:
# Define a learning rate
learning_rate = 1E-4

# Then define train operations - nothing runs yet, we are just defining the execution graph!
with tf.name_scope("train"):
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(xent)
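
To illustrate what "adjusting the step size dynamically" means, here is a minimal sketch of the ADAM update rule from the paper, applied to a single parameter and the toy function f(theta) = theta^2. The function adam_step, the toy problem and the learning rate of 0.1 are assumptions made for this illustration; the details of TensorFlow's implementation may differ.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient and of the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, since m and v start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The step is scaled by the recent gradient magnitude
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
history = [theta]
for t in range(1, 101):
    grad = 2 * theta  # derivative of theta^2, worked out by hand here
    theta, m, v = adam_step(theta, grad, m, v, t)
    history.append(theta)

print(history[1])  # the first step has a size of about lr
```

Because the gradient is divided by the square root of its running squared average, the first step has a size of about lr regardless of how large the gradient is.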

We also want to measure the accuracy, i.e. the fraction of examples that we classify correctly.

In [22]:
# Calculate accuracy
with tf.name_scope("accuracy"):
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)
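
The accuracy computation can be sketched in plain Python on a tiny, made-up batch of four examples with three classes. The helper argmax and the example values are assumptions for illustration only.

```python
def argmax(values):
    # Index of the largest entry
    return max(range(len(values)), key=lambda i: values[i])

# Made-up network outputs and one-hot labels for four examples
logits_batch = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.9], [0.3, 0.2, 0.1]]
labels_batch = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

# 1.0 where the predicted class matches the label, 0.0 otherwise
correct = [float(argmax(l) == argmax(t)) for l, t in zip(logits_batch, labels_batch)]
accuracy_value = sum(correct) / len(correct)
print(accuracy_value)  # 0.75
```

Three of the four predictions match their labels, so the mean of the 0/1 values is 0.75. This mirrors the tf.equal, tf.cast and tf.reduce_mean calls in the cell.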

In [23]:
# Put all summaries into one chunk
summ = tf.summary.merge_all()

# Embeddings #

Embeddings are a way to visualize the features the network is learning. Our features are the activations of the layer from which we make our predictions. As the network learns, the feature vectors of the different classes should become separated in space. This is only for visualization purposes; it will not affect training. (As you will see later, there seems to be some mismatch in the indexing of the images...)

In [24]:
# Then prepare for showing the embedding
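embedding_input = fc1 # take input from the last fully connected layer before prediction
embedding_size = 1024 # width of fc1 (feature_layer_size)

# One row per embedded image; the training loop feeds the first 1024 test images
embedding = tf.Variable(tf.zeros([1024, embedding_size]), name = "test_embedding")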
assignment = embedding.assign(embedding_input)


config = tf.contrib.tensorboard.plugins.projector.ProjectorConfig()
embedding_config = config.embeddings.add()
embedding_config.tensor_name = embedding.name
embedding_config.sprite.image_path = LOGDIR + 'sprite_1024.png'
embedding_config.metadata_path = LOGDIR + 'labels_1024.tsv'
    
# Specify the width and height of a single thumbnail.
embedding_config.sprite.single_image_dim.extend([28, 28])


# Let's start executing #
Up until now we have only been setting things up, telling TensorFlow what we want to do. To actually do something we need to create a session.

In [25]:
# Initialize a session
sess = tf.Session()

# All variables must be initialized! Now we actually start executing something
sess.run(tf.global_variables_initializer())

hparam = make_hparam_string(learning_rate, 2, 2) # create a string to add to the name of the logdir
writer = tf.summary.FileWriter(LOGDIR + hparam)

writer.add_graph(sess.graph) # add graph for visualization
tf.contrib.tensorboard.plugins.projector.visualize_embeddings(writer, config) # add embedding to visualizations

Now let's launch a TensorBoard server so we can visualize what happens.
Go to your shell and enter the output of the command below as a command.

In [27]:
! echo tensorboard --logdir $LOGDIR

tensorboard --logdir logdir/


Now in your web browser navigate to localhost:6006 (i.e. write "localhost:6006" in the address bar).
You may also use the provided bookmark in the virtual machine! So far, only the Graphs section contains any information. Try to right-click on the "train" label and remove it from the main graph. You will now get a clearer picture.

Let's start training!

In [28]:
saver = tf.train.Saver() # use to checkpoint the model

# Now train for 2501 iterations
for i in range(2501):
    batch = mnist.train.next_batch(100)
    if i % 5 == 0:
        [train_accuracy, s] = sess.run([accuracy, summ], feed_dict = {x: batch[0], y: batch[1]})
        writer.add_summary(s, i)
    if i % 500 == 0:
        sess.run(assignment, feed_dict={x: mnist.test.images[:1024], y: mnist.test.labels[:1024]}) 
        saver.save(sess, os.path.join(LOGDIR, "model.ckpt"), i)
    sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})
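
The loop runs for 2501 rather than 2500 iterations so that step 2500 is still logged and checkpointed. A short sketch in plain Python shows which steps are affected; the variable names are made up for this illustration.

```python
num_iterations = 2501

# Steps at which summaries are written (every 5th step)
summary_steps = [i for i in range(num_iterations) if i % 5 == 0]
# Steps at which the embedding is updated and a checkpoint is saved (every 500th step)
checkpoint_steps = [i for i in range(num_iterations) if i % 500 == 0]

print(len(summary_steps))  # 501
print(checkpoint_steps)    # [0, 500, 1000, 1500, 2000, 2500]
```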

Now go to TensorBoard and explore! Here we have plotted only the training loss and accuracy. What is a possible issue with this? Can you extend this to also plot the validation accuracy? (Hint: you may need a second "writer"!) For example, you could use the mnist.test.images dataset for validation.

You could also try a different learning rate, and see if that affects the results. Or change the number of layers!