## Import Data

In [146]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


Here ```mnist``` is a lightweight object which stores the training, validation, and testing sets as NumPy arrays. It also provides a function for iterating through data minibatches, which we will use below.
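
As a rough illustration of what iterating through minibatches involves, the sketch below shuffles the example indices and hands out consecutive slices. The ```MiniBatcher``` class and its toy data are hypothetical stand-ins written in NumPy; this is not the TensorFlow implementation behind ```mnist.train.next_batch```.

```python
import numpy as np

class MiniBatcher:
    """Hypothetical stand-in for a dataset object with a next_batch method."""

    def __init__(self, images, labels, seed=0):
        assert len(images) == len(labels)
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(len(images))
        self.pos = 0

    def next_batch(self, batch_size):
        # Reshuffle and start a new pass once the current one is exhausted.
        if self.pos + batch_size > len(self.order):
            self.order = self.rng.permutation(len(self.images))
            self.pos = 0
        idx = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return self.images[idx], self.labels[idx]

# Toy data: 10 "images" of 4 features each, with one-hot labels.
images = np.arange(10 * 4, dtype=np.float32).reshape(10, 4)
labels = np.eye(10, dtype=np.float32)

batcher = MiniBatcher(images, labels)
xs, ys = batcher.next_batch(5)
print(xs.shape, ys.shape)
```

Each call returns a pair of arrays, which matches how ```batch[0]``` and ```batch[1]``` are used in the training loop later on.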

## Start TensorFlow InteractiveSession

TensorFlow relies on a highly efficient C++ backend to do its computation. The connection to this backend is called a session. The common usage for TensorFlow programs is to first create a graph and then launch it in a session.

Here we instead use the convenient ```InteractiveSession``` class, which makes TensorFlow more flexible about how you structure your code. It allows you to interleave operations which build a [computation graph](http://www.tensorflow.org/get_started/basic_usage.html#the-computation-graph) with ones that run the graph. This is particularly convenient when working in interactive contexts like IPython. If you are not using an ```InteractiveSession```, then you should build the entire computation graph before starting a session and [launching the graph](http://www.tensorflow.org/get_started/basic_usage.html#launching-the-graph-in-a-session).

In [147]:
import tensorflow as tf
sess = tf.InteractiveSession()

### Computation Graph

To do efficient numerical computing in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python after every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where there can be a high cost to transferring data.

TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch.

The role of the Python code is therefore to build this external computation graph, and to dictate which parts of the computation graph should be run. See the [Computation Graph](http://www.tensorflow.org/get_started/basic_usage.html#the-computation-graph) section of [Basic Usage](http://www.tensorflow.org/get_started/basic_usage.html) for more detail.
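
The split between building a graph and running it can be illustrated with a toy example in plain Python. Everything here (```Node```, ```placeholder```, ```add```, ```mul```, ```run```) is a hypothetical miniature written for this illustration, not part of TensorFlow; it only shows that constructing nodes computes nothing until a specific node is evaluated with concrete inputs.

```python
class Node:
    """A hypothetical graph node: an operation plus the nodes it reads from."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)

def placeholder():
    return Node("placeholder")

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node, feed_dict):
    """Evaluate `node`, visiting only the nodes it depends on."""
    if node.op == "placeholder":
        return feed_dict[node]
    args = [run(n, feed_dict) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError("unknown op: %s" % node.op)

# Build phase: this only describes a*a + b; nothing is computed yet.
a = placeholder()
b = placeholder()
total = add(mul(a, a), b)

# Run phase: supply values for the placeholders and evaluate one node.
result = run(total, {a: 3.0, b: 1.0})
print(result)
```

In TensorFlow the evaluation step happens in the backend rather than in a Python loop, which is where the savings described above come from.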

## Build a Softmax Regression Model

In this section we will build a softmax regression model on top of a small fully connected network: 784 inputs, hidden layers of 100 and 30 units, and 10 output classes. In the next section, we will extend this to the case of softmax regression with a multilayer convolutional network.

### Placeholders

We start building the computation graph by creating nodes for the input images and target output classes.

In [148]:
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

Here $x$ and $y\_$ aren't specific values. Rather, they are each a ```placeholder``` -- a value that we'll input when we ask TensorFlow to run a computation.

The input images $x$ will consist of a 2d tensor of floating point numbers. Here we assign it a ```shape``` of ```[None, 784]```, where 784 is the dimensionality of a single flattened MNIST image, and ```None``` indicates that the first dimension, corresponding to the batch size, can be of any size. The target output classes $y\_$ will also consist of a 2d tensor, where each row is a one-hot 10-dimensional vector indicating which digit class the corresponding MNIST image belongs to.

The ```shape``` argument to ```placeholder``` is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.
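
To make the two shapes concrete, here is a small NumPy sketch of the arrays that would be fed into ```x``` and ```y_```. The digit labels and the all-zero images are made-up sample data; the only facts taken from the text are the 784-dimensional flattened image (28 by 28 pixels) and the 10-dimensional one-hot label.

```python
import numpy as np

# Three made-up digit labels and three blank 28x28 "images".
digits = np.array([3, 0, 9])
images_2d = np.zeros((3, 28, 28), dtype=np.float32)

# One-hot encode: row i of the identity matrix is the code for class i.
one_hot = np.eye(10, dtype=np.float32)[digits]

# Flatten each image into a single row of 784 values.
flat = images_2d.reshape(len(images_2d), -1)

print(flat.shape)     # batch dimension first, then 784 features
print(one_hot.shape)  # batch dimension first, then 10 classes
```

The first dimension (3 here) is the one declared as ```None``` in the placeholders, so any batch size is accepted.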

### Variables

We now define the weights $W$ and biases $b$ for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle them: ```Variable```. A ```Variable``` is a value that lives in TensorFlow's computation graph. It can be used and even modified by the computation. In machine learning applications, one generally has the model parameters be ```Variable```s.

In [149]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

In [150]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

We pass the initial value for each parameter in the call to ```tf.Variable```. In this case, ```weight_variable``` draws the initial weights from a truncated normal distribution with standard deviation 0.1, and ```bias_variable``` sets every bias to the constant 0.1. For the first layer, for example, the weight matrix is 784x100 (because we have 784 input features and 100 hidden units) and the bias is a 100-dimensional vector.
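
For readers unfamiliar with truncated normal initialization, the following NumPy sketch mimics the idea: sample from a normal distribution and redraw any value that falls more than two standard deviations from the mean. The ```truncated_normal``` function below is a hypothetical re-implementation for illustration, not TensorFlow's code, and the ```seed``` parameter is an addition made here for repeatability.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Sample N(0, stddev^2), redrawing values beyond 2 standard deviations."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    bad = np.abs(values) > 2 * stddev
    while bad.any():
        # Redraw only the out-of-range entries until none remain.
        values[bad] = rng.normal(0.0, stddev, size=int(bad.sum()))
        bad = np.abs(values) > 2 * stddev
    return values.astype(np.float32)

# Shapes for a layer with 784 inputs and 100 outputs.
W = truncated_normal([784, 100])
b = np.full([100], 0.1, dtype=np.float32)

print(W.shape, float(np.abs(W).max()))
print(b[:3])
```

Small random weights break the symmetry between hidden units, and a slightly positive bias is a common choice with ReLU units so that they start out active.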

Before ```Variables``` can be used within a session, they must be initialized using that session. This step takes the initial values that have already been specified (here the random weights and constant biases) and assigns them to each ```Variable```. This can be done for all ```Variables``` at once, as the training cell below does before its first step.

In [151]:
W_fc1 = weight_variable([784, 100])
b_fc1 = bias_variable([100])

h_fc1 = tf.nn.relu(tf.matmul(x, W_fc1) + b_fc1)

W_fc2 = weight_variable([100, 30])
b_fc2 = bias_variable([30])

h_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2

W_fc3 = weight_variable([30, 10])
b_fc3 = bias_variable([10])

h_fc3 = tf.matmul(h_fc2, W_fc3) + b_fc3
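
The same forward pass can be traced in NumPy to check the shapes at each layer. This is a sketch with randomly generated inputs and weights; ```layer_params``` is a hypothetical helper, and a plain normal distribution is used instead of a truncated one for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_params(n_in, n_out):
    # Hypothetical helper: small random weights, constant 0.1 biases.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.full(n_out, 0.1)

W1, b1 = layer_params(784, 100)
W2, b2 = layer_params(100, 30)
W3, b3 = layer_params(30, 10)

x = rng.random((5, 784))           # a batch of 5 made-up images

h1 = np.maximum(x @ W1 + b1, 0.0)  # ReLU, as in h_fc1
h2 = h1 @ W2 + b2                  # linear, as in h_fc2
h3 = h2 @ W3 + b3                  # linear, as in h_fc3

print(h1.shape, h2.shape, h3.shape)

# Two consecutive linear layers are equivalent to one affine map.
collapsed = h1 @ (W2 @ W3) + (b2 @ W3 + b3)
print(np.allclose(h3, collapsed))
```

One thing the sketch makes visible: as the cell is written, only the first layer applies ReLU, so the second and third layers compose into a single affine map. Adding a nonlinearity after the second layer would be one way to make the extra layer contribute capacity, though that is a design choice outside what the original code does.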
    

## Predicted Class and Cost Function

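We can now compute the model's prediction. The cell above already multiplies the vectorized input images $x$ by the first weight matrix, adds the bias, applies a ReLU, and passes the result through two further linear layers. It only takes one more line to compute the softmax probabilities that are assigned to each class from the output of the last layer, ```h_fc3```.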

In [152]:
y = tf.nn.softmax(h_fc3)

The cost function to be minimized during training can be specified just as easily. Our cost function will be the cross-entropy between the target and the model's prediction.
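
As a worked illustration, softmax and the summed cross-entropy can be written in a few lines of NumPy. The logits and targets are made-up values. Subtracting the row maximum before exponentiating and adding a small ```eps``` inside the logarithm are numerical-stability precautions assumed here; they are not part of the notebook's own cost expression.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max so exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(targets, probs, eps=1e-12):
    # Summed over classes and over all examples in the batch.
    return -np.sum(targets * np.log(probs + eps))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = cross_entropy(targets, probs)
print(probs.sum(axis=1))
print(loss)
```

Because the targets are one-hot, only the log-probability of the correct class contributes to each example's term in the sum.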

## Train and Evaluate the Model

How well does this model do? To train it we use the ADAM optimizer with a learning rate of 1e-4 rather than steepest gradient descent, and we log the training accuracy on the current minibatch at every 100th iteration of the training process. After training, we evaluate the accuracy on the test set.
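
For intuition about what the ADAM optimizer does at each step, here is a minimal NumPy sketch of the update rule as it is usually stated: running averages of the gradient and its square, bias correction, then a scaled step. The ```adam_step``` function and the toy quadratic objective are illustrative assumptions, as are the ```beta1```, ```beta2``` and ```eps``` defaults; TensorFlow's internal formulation may differ in detail.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    # A larger learning rate than 1e-4 so the toy problem converges quickly.
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)
```

The per-parameter scaling by the squared-gradient average is what distinguishes this from plain gradient descent, where every parameter shares one fixed step size.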

In [153]:
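# Cross-entropy, summed over all classes and all images in the minibatch.
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
# ADAM optimizer with a learning rate of 1e-4.
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Fraction of images whose most probable class matches the label.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))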
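
The accuracy computation can be checked by hand on a tiny example. The probabilities and targets below are made up; the NumPy calls mirror ```tf.argmax```, ```tf.equal```, ```tf.cast``` and ```tf.reduce_mean``` one for one.

```python
import numpy as np

# Predicted class probabilities for three examples (made-up values).
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4]])
# One-hot targets: the true classes are 1, 2 and 2.
targets = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0]])

# Compare the most probable class with the labeled class, row by row.
correct = np.argmax(probs, axis=1) == np.argmax(targets, axis=1)
# Cast booleans to floats and average to get the fraction correct.
accuracy = correct.astype(np.float32).mean()

print(correct)
print(accuracy)
```

Here the first and third predictions match their labels and the second does not, so two of the three examples are counted as correct.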

In [98]:
#graph_location = tempfile.mkdtemp()
#print('Saving graph to: %s' % graph_location)
#train_writer = tf.summary.FileWriter(graph_location)
#train_writer.add_graph(tf.get_default_graph())

Saving graph to: /var/folders/99/7j5v1c952h526sdx1plcnz2r0000gn/T/tmpk058qj2r


In [154]:

#Create a saver object which will save all the variables
saver = tf.train.Saver()

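# Initialize all variables before the first training step.
# (tf.initialize_all_variables() is the deprecated name for this call.)
sess.run(tf.global_variables_initializer())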
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1]})
    print ("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

#saver.save(sess, '/Users/bar/Downloads/tfMnist/my_test_model',global_step=1000)

print ("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels}))

step 0, training accuracy 0.1
step 100, training accuracy 0.46
step 200, training accuracy 0.66
step 300, training accuracy 0.76
step 400, training accuracy 0.94
step 500, training accuracy 0.88
step 600, training accuracy 0.92
step 700, training accuracy 0.9
step 800, training accuracy 0.86
step 900, training accuracy 0.86
step 1000, training accuracy 0.86
step 1100, training accuracy 0.88
step 1200, training accuracy 0.84
step 1300, training accuracy 0.88
step 1400, training accuracy 0.92
step 1500, training accuracy 0.9
step 1600, training accuracy 0.88
step 1700, training accuracy 0.9
step 1800, training accuracy 0.84
step 1900, training accuracy 0.9
step 2000, training accuracy 0.88
step 2100, training accuracy 0.94
step 2200, training accuracy 0.9
step 2300, training accuracy 0.94
step 2400, training accuracy 0.92
step 2500, training accuracy 0.96
step 2600, training accuracy 0.96
step 2700, training accuracy 0.96
step 2800, training accuracy 0.88
step 2900, training accuracy 0.8