# TensorFlow MNIST Tutorial

[TensorFlow](https://www.tensorflow.org) is a framework created by Google to make machine learning tasks simpler to program and easier to distribute, with an emphasis on deep learning and neural networks. Other frameworks with a similar aim include [Theano](http://deeplearning.net/software/theano/) and [Torch](http://torch.ch).

TensorFlow can be installed from the command line in several ways. The tutorial at [this address](https://www.tensorflow.org/get_started/os_setup#pip_installation) walks through installation. Note that the version with GPU support is a separate package.

Check that TensorFlow has been properly installed by performing the following import in Python:

In [2]:
import tensorflow as tf

# The Philosophy Behind TensorFlow

The most central idea in TensorFlow's organization is that machine learning updates occur across a graph that is defined by the user, and that actions that affect the graph are defined separately. In the same way one has to build a fountain before running water through it, the graph must be defined before we run computations across it.

For example, the following code defines a node in a graph:

In [5]:
x = tf.placeholder(tf.float32)

As you can tell, we haven't given this node a value. It is a placeholder that we will provide when we wish to run a computation using the graph. We must specify that when we provide an input to this node, it will be in the format of a 32-bit floating-point number. 

TensorFlow does this because running many computations in Python can be slow and unwieldy, so it farms this work out to its optimized backend, which is written in C/C++. By defining the graph outside of Python, we can use Python as an interface for a much faster backend.

We can obtain additional information using the print command.

In [6]:
print(x)

Tensor("Placeholder_2:0", dtype=float32)


This design also has a downside: simple operations become more complicated to write. For example, the following two code blocks perform the same operation.

In [11]:
2 + 3

5

In [15]:
with tf.name_scope("model") as scope:
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    add_xy = tf.add(x, y)

with tf.Session() as sess:
    result = sess.run([add_xy], feed_dict={x: [2.], y: [3.]})
    print(result)

[array([ 5.], dtype=float32)]


TensorFlow clearly adds significant overhead for a computation this small. However, the structure of this code stays essentially the same no matter what operation we perform with the package.

In the first block we define two placeholders, x and y, which we then use to perform the addition operation with the tf.add() function. This is our graph structure.

In the second block we start a session and run the computation described by the graph, asking specifically for the value of `add_xy` and feeding our placeholders the values x = 2 and y = 3.

Because we requested a list of fetches (`[add_xy]`), `result` is a list containing one NumPy array: the value of `add_xy` once the computation has run.
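
To make the "define first, run later" idea concrete without TensorFlow, here is a minimal sketch in plain Python. The `Placeholder` and `Add` classes and the `run` function are hypothetical stand-ins invented for this illustration; they only mimic the shape of the workflow and say nothing about how TensorFlow is implemented.

```python
# A toy "define, then run" graph in plain Python.
# Placeholder, Add and run are hypothetical names, not TensorFlow APIs.

class Placeholder:
    """A node with no value of its own; a value must be fed at run time."""

    def evaluate(self, feed):
        return feed[self]


class Add:
    """A node that adds the values of two other nodes when evaluated."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


def run(node, feed):
    """Evaluate a node, looking up placeholder values in the feed dict."""
    return node.evaluate(feed)


# Step 1: build the graph. Nothing is computed yet.
x = Placeholder()
y = Placeholder()
add_xy = Add(x, y)

# Step 2: run the graph with concrete inputs.
result = run(add_xy, {x: 2.0, y: 3.0})
print(result)
```

The same `add_xy` node can be run again with different inputs, which is what makes a graph worth building once and reusing.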

# Why on earth do we do this?

So why use this complicated formulation when a simple formula would be just as effective? The answer is that this framework lends itself to a major feature of machine learning, and of neural networks in particular: repeated updates over batches of data.

Once we have defined the relationships between our values, it is easy to repeat computations a number of times in order to train a model. It is also easy to introduce stochastic elements to the training process, like randomly censoring input nodes or values (known as dropout). 

In order to really understand the way we would do this, we must introduce the idea of `variables`. For a "simple" example, we can consider raising a number to a power.

In [16]:
with tf.name_scope('model') as scope:
  # a constant multiplier
  x = tf.constant(2)
  # initialize our variable at 1
  y = tf.Variable(1)
  # define an operation on the graph (tf.mul was renamed tf.multiply in TensorFlow 1.0)
  new_value = tf.multiply(x, y)
  update = tf.assign(y, new_value)
  # op that initializes all variables
  init = tf.global_variables_initializer()

with tf.Session() as session:
  session.run(init)
  print(session.run(y))
  for i in range(10):
    session.run(update)
    print(session.run(y))

1
2
4
8
16
32
64
128
256
512
1024


This example introduces a couple of new ideas. We define `y` to be a `Variable`, which allows its value to be updated over successive runs of the graph. Each time we run the `update` operation, which we built with `tf.assign`, it overwrites `y` with `x * y`. We then print the current value with `session.run(y)`, which retrieves the value stored in the variable at that point.
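
For comparison, the same stateful update can be sketched in plain Python. The `Variable` class below is a hypothetical stand-in written for this illustration, not TensorFlow's; it only shows that a variable is a container whose stored value persists between runs and is overwritten by an assign step.

```python
# Plain-Python analogue of the variable update loop (not TensorFlow code).

class Variable:
    """A container whose value persists between updates."""

    def __init__(self, value):
        self.value = value

    def assign(self, new_value):
        """Overwrite the stored value and return it."""
        self.value = new_value
        return self.value


x = 2              # plays the role of the constant
y = Variable(1)    # plays the role of the variable, initialized at 1

history = [y.value]
for _ in range(10):
    y.assign(x * y.value)   # compute the new value, then store it back in y
    history.append(y.value)

print(history)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
```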

# Factorial

We can extend the above example with a placeholder in order to compute factorials. Here the Python loop feeds one integer at a time through `feed_dict`; on each run the fed value is multiplied by the current value of `y`, and the result is stored back in `y`.

In [18]:
with tf.name_scope('model') as scope:
  # placeholder for the integer fed on each run
  x = tf.placeholder(tf.int32)
  # initialize our variable at 1
  y = tf.Variable(1)
  # define an operation on the graph (tf.mul was renamed tf.multiply in TensorFlow 1.0)
  new_value = tf.multiply(x, y)
  update = tf.assign(y, new_value)
  # op that initializes all variables
  init = tf.global_variables_initializer()

with tf.Session() as session:
  session.run(init)
  for i in range(1,10):
    session.run(update, feed_dict={x: i})
    print(session.run(y))

1
2
6
24
120
720
5040
40320
362880


Variables are used in machine learning applications for a variety of components that are updated behind the scenes in models. Things like weights and biases can be defined succinctly and manipulated with less mess than writing code from scratch. Placeholders generally serve to mark the nodes across which training and test data will be loaded. Instead of a series of numbers, this may represent a series of image data or other data streams. This organization allows the behaviors of these components to be specified in a more modular fashion, without needing to rewrite a framework from scratch every time a tweak is applied.

# Training a Basic Neural Network

While the previous examples are much simpler than a machine learning model, they illustrate the major operations that TensorFlow performs in order to train one. We define a graph and the relationships that link its nodes, then we pass data across that graph in order to update its variables and arrive at a solution.

We can extend this example into a full classifier that trains a neural network on the MNIST handwritten digit data set and can classify examples with decent results. This section of the tutorial will focus on a very slightly modified version of code made available on [TensorFlow's official page](https://www.tensorflow.org/tutorials/mnist/beginners/) (full code without tutorial available [here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/mnist_softmax.py)), and will attempt to explain the choices being made in the code in order to simplify its interpretation. We will not cover the specifics of the modeling choices made by the code, for example the differences between a softmax and ReLU function. Instead the focus will be on the implementation and organization of the code included in this example.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data

import tensorflow as tf

These first imports make Python 2 behave like Python 3 with respect to absolute imports, true division (`/` always returns a float), and the `print()` function syntax, respectively. We also import the MNIST data set loader and TensorFlow itself.

In [2]:
# Import data
mnist = input_data.read_data_sets('/tmp/tensorflow/mnist/input_data', one_hot=True)

Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz


This block of code imports and extracts the MNIST data set from its location within the TensorFlow tutorial section. The `one_hot` flag tells TensorFlow that we would like this data arranged in [one-hot encoding](https://en.wikipedia.org/wiki/One-hot). As we are dealing with digits 0-9, we are concerned with ten classes. For example...

In [4]:
# An example of the number 3 in one-hot encoding
[0,0,0,1,0,0,0,0,0,0]

# An example of the number 7 in one-hot encoding
[0,0,0,0,0,0,0,1,0,0]

[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
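
As a small illustration of the encoding, the helper below builds a one-hot list for any digit and converts it back. The function names `one_hot` and `decode` are made up for this sketch; the MNIST loader performs the encoding for us when `one_hot=True` is set.

```python
def one_hot(digit, num_classes=10):
    """Return a list of zeros with a single 1 at index `digit`."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be between 0 and num_classes - 1")
    encoded = [0] * num_classes
    encoded[digit] = 1
    return encoded


def decode(encoded):
    """Recover the digit: it is the position of the single 1."""
    return encoded.index(1)


print(one_hot(3))           # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(decode(one_hot(7)))   # 7
```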

Next we are tasked with defining the model we will apply to the data:

In [5]:
# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b

We are able to define a lot of information in these four lines, and ultimately this is the relationship that makes the neural network perform its job. The basic formulation is as follows:

`y = x*W + b`

...where `x` holds the input we train on (one row of 784 pixel values per image), `W` holds the weights on the edges of the neural network, and `b` is the bias applied at each output node. Here `x*W` denotes the matrix product computed by `tf.matmul`.

Our images are arranged in a 28x28 grid, giving us 784 pixels to train on. Each pixel will be an input to our neural net. The weights and biases are set to zero, but as TensorFlow variables they have the ability to be modified as the model is trained.
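
To see how the shapes fit together, here is a plain-Python sketch of the same computation on toy sizes: 2 examples, 3 inputs, and 2 outputs instead of 784 inputs and 10 outputs. The helper names `matmul` and `affine` and all the numbers are chosen for illustration only.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def affine(x, W, b):
    """Compute x*W + b, adding the bias vector to every row of the product."""
    return [[value + b[j] for j, value in enumerate(row)] for row in matmul(x, W)]


# Two examples with three input values each (shape 2 x 3).
x = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]
# One weight per (input, output) pair (shape 3 x 2).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
# One bias per output (shape 2).
b = [0.5, -0.5]

y = affine(x, W, b)
print(y)  # [[4.5, 4.5], [0.5, 0.5]]
```

Each row of the result holds one score per output class for the corresponding input row, which is why the real model produces a `[None, 10]` tensor from a `[None, 784]` input.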

Next we must define what actions we will apply to train the network.

In [6]:
# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

Here we set up code to compare the true labels (`y_`) to the predicted values (`y`). We define a cross-entropy function, which gives us the loss of the model: a measure of how far our predictions are from the true labels. Essentially, for each example it checks how much probability the classifier assigned to the correct class; the less probability, the larger the loss. The function applies a softmax to `y` internally to turn the raw scores into probabilities, and `tf.reduce_mean` averages the loss over the batch. (More information is available [here](http://colah.github.io/posts/2015-09-Visual-Information/))
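
The arithmetic behind this loss can be sketched in plain Python. This is a hand-written illustration of softmax followed by cross-entropy with one-hot labels, with function names chosen for this sketch; it is not TensorFlow's implementation.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(labels, logits):
    """Cross-entropy between a one-hot label and the softmax of the logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(labels, probs) if t > 0)


def mean_cross_entropy(batch_labels, batch_logits):
    """Average the per-example loss over a batch."""
    losses = [cross_entropy(t, z) for t, z in zip(batch_labels, batch_logits)]
    return sum(losses) / len(losses)


label = [0, 1, 0]
unsure = cross_entropy(label, [0.0, 0.0, 0.0])      # equal scores: loss is ln(3)
confident = cross_entropy(label, [0.0, 5.0, 0.0])   # high score on the true class
print(unsure, confident)
```

A classifier that spreads its probability evenly over three classes pays a loss of ln(3), while one that puts most of its probability on the correct class pays much less.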

Next we set up an operation called `train_step`, which uses the results of the `cross_entropy` calculation to adjust weights across the graph. Here we define that we want to use a gradient descent algorithm (one of many options given by TensorFlow) and we set it to have a learning rate of 0.5, asking it to minimize the `cross_entropy` value.

*It is worth pausing here for a note.* The above block of code is one of the most magical in the TensorFlow tutorial: in a few lines it sets up the entire backpropagation procedure. TensorFlow infers from the structure of the graph how to compute the gradients and apply the updates, given very little information from us. This is the real power that TensorFlow affords. By specifying our model in simple terms, we can build complex and powerful learning systems.
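
To appreciate what is being automated, here is one gradient descent step for this kind of model written out by hand in plain Python. It is a sketch on a toy problem (2 inputs, 3 classes, a single training example), and it relies on the standard result that the gradient of softmax cross-entropy with respect to the logits is the predicted probabilities minus the one-hot label. All function names and numbers are chosen for illustration; TensorFlow derives the equivalent gradients from the graph itself.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_gradients(x, label, W, b):
    """Return the cross-entropy loss and its gradients with respect to W and b."""
    n_in, n_out = len(W), len(b)
    logits = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    # Gradient with respect to the logits: probabilities minus the one-hot label.
    d_logits = [probs[j] - (1.0 if j == label else 0.0) for j in range(n_out)]
    grad_W = [[x[i] * d_logits[j] for j in range(n_out)] for i in range(n_in)]
    return loss, grad_W, d_logits


def gradient_descent_step(x, label, W, b, learning_rate):
    """Move W and b a small distance against their gradients."""
    _, grad_W, grad_b = loss_and_gradients(x, label, W, b)
    new_W = [[W[i][j] - learning_rate * grad_W[i][j] for j in range(len(b))]
             for i in range(len(W))]
    new_b = [b[j] - learning_rate * grad_b[j] for j in range(len(b))]
    return new_W, new_b


x = [1.0, 2.0]                       # one toy example with two input values
label = 1                            # index of its true class
W = [[0.0] * 3 for _ in range(2)]    # weights start at zero, as in the tutorial
b = [0.0] * 3

loss_before, _, _ = loss_and_gradients(x, label, W, b)
W, b = gradient_descent_step(x, label, W, b, learning_rate=0.5)
loss_after, _, _ = loss_and_gradients(x, label, W, b)
print(loss_before, loss_after)
```

After a single step the loss on this example drops, which is the behavior `train_step` repeats across many batches.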

Before we can run the actual model, we must perform some housekeeping...

In [7]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

Here we define that we will be running our code in an interactive session in the interpreter. We then initialize the variables that are encoded in our graph with the values that are specified for their setup. We are now ready to train the model.

In [8]:
# Train
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

Here we specify that we will be selecting batches of 100 digits from the training set, divided into inputs (`batch_xs`) and true labels (`batch_ys`). We compute one step of gradient descent in the model, based on our results from those 100 digits. This process is repeated 1000 times.

This simple process is very powerful and can be applied to many different kinds of learning models. Training on small random batches rather than the full data set may seem less than ideal, but it keeps each step cheap and works well when the training data are sufficiently large.
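
The batching idea can be sketched in plain Python as follows. This is a simplified stand-in written for this tutorial, assuming a shuffle-then-slice scheme; it is not the implementation of `mnist.train.next_batch`, and the name `iterate_batches` is hypothetical.

```python
import random


def iterate_batches(examples, batch_size, rng):
    """Shuffle the example order once, then yield consecutive slices of it."""
    order = list(range(len(examples)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]


data = list(range(10))                     # ten toy "examples"
batches = list(iterate_batches(data, 4, random.Random(0)))

for batch in batches:
    print(batch)                           # each batch would feed one training step
```

Every example appears in exactly one batch per pass, and the last batch is smaller when the data set size is not a multiple of the batch size.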

We can now go to our test set and see how well our model performed.

In [9]:
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))

0.9172


This block of code looks dense, but it is just a concise way of performing some basic calculations. We define a correct prediction to be a case where the class with the highest predicted score matches the true class. We then cast the resulting booleans to zeros and ones and take their mean, which gives us the accuracy. Finally, we ask the session to evaluate this computation on the graph, feeding it the test data set.

The resulting accuracy should fall between 90% and 93%; the run above gave 91.72%.
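
The accuracy calculation itself can be reproduced in plain Python on a handful of made-up scores. The helper names and the numbers below are illustrative only; they mirror the `argmax`, `equal`, `cast`, and `reduce_mean` steps from the TensorFlow code.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows whose highest score lands on the true class."""
    correct = [1.0 if argmax(p) == argmax(t) else 0.0
               for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Made-up scores for four examples over three classes.
scores = [[0.1, 2.0, -1.0],
          [3.0, 0.0, 0.5],
          [0.2, 0.1, 0.9],
          [1.0, 1.5, 0.0]]
# One-hot true labels: classes 1, 0, 1, 1.
one_hot_labels = [[0, 1, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 1, 0]]

acc = accuracy(scores, one_hot_labels)
print(acc)  # 0.75, since the third example is misclassified
```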

# Takeaways

The above model is a fully specified neural network with an evaluation step. Without imports it takes only *16 lines of code.* The model is also highly extensible, and many settings can be changed by editing a single line. While other packages let you define a similar model with less hassle (for example in R, or with the scikit-learn module in Python), TensorFlow offers considerably more flexibility and room for optimization.

TensorFlow is much more than a toolkit for running machine learning. It is a flexible *modeling framework* that allows many different models to be created and extended by thinking in the context of a graph-based system. Many different learning models can be implemented given that you have a firm grasp on the structure and supplied tools.

# Additional Exercises

To see how this basic example can be extended into a multilayer neural network (a "deep learning" convolutional neural network model), take a look at the code provided in [this TensorFlow tutorial](https://www.tensorflow.org/tutorials/mnist/pros/).

Many other examples are available at the [TensorFlow tutorial page](https://www.tensorflow.org/tutorials/), with much deeper explanations of the mathematical considerations for building models, especially neural networks.