This tutorial comes from 'MNIST For ML Beginners' in the TensorFlow learning guide. It explains, line by line, what happens in the mnist_softmax.py code.

In this tutorial we will:
* Learn about the MNIST data and softmax regression;
* Create a model that recognizes digits by looking at every pixel in the image;
* Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples;
* Check the model's accuracy with our test data.

The MNIST data is hosted on Yann LeCun's website (yann.lecun.com/exdb/mnist/). The code below downloads and reads in the data automatically.

In [2]:
%matplotlib inline
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
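Passing `one_hot=True` asks `read_data_sets` to return each label as a 10-element vector with a single 1 at the digit's index, rather than as a plain integer. A minimal NumPy sketch of that encoding, assuming integer labels in 0-9; the helper `to_one_hot` is hypothetical and not part of the TensorFlow API:

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Convert integer class labels to one-hot row vectors (hypothetical helper)."""
    labels = np.asarray(labels)
    one_hot = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in column `label` of each row.
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot

oh = to_one_hot([3, 0])
print(oh)  # row 0 has its 1 at index 3, row 1 at index 0
```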


The MNIST data is split into three parts: 55000 data points of training data, 10000 data points of test data, and 5000 data points of validation data.
It's essential in machine learning to keep separate data that we don't learn from, so that we can check whether what we've learned actually generalizes.
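A small NumPy sketch of holding out a validation set. If we recall the TensorFlow helper correctly, `read_data_sets` takes the first 5000 of the 60000 original training images as validation data (its `validation_size` argument); treat that as an assumption. The arrays here are zero-filled placeholders with 4 features instead of 784, not real MNIST data:

```python
import numpy as np

# Placeholder arrays standing in for the 60000 original MNIST training examples.
# 4 features instead of 784 keep the sketch small.
all_images = np.zeros((60000, 4), dtype=np.float32)
all_labels = np.arange(60000) % 10

validation_size = 5000  # assumed to mirror read_data_sets' default
# Hold out the first `validation_size` examples; train only on the rest.
val_images, val_labels = all_images[:validation_size], all_labels[:validation_size]
train_images, train_labels = all_images[validation_size:], all_labels[validation_size:]

print(len(train_images), len(val_images))
```

The validation examples never take part in the parameter updates, so accuracy measured on them is an honest estimate of how the model generalizes.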

In [7]:
# Check the 'mnist' datasets
print(mnist.train)
print(mnist.train.num_examples)
print(mnist.test)
print(mnist.test.num_examples)
print(mnist.validation)
print(mnist.validation.num_examples)

<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x65cc590>
55000
<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x65cca90>
10000
<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x65ccb50>
5000
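Each `DataSet` above stores its data in `.images` and `.labels` arrays (the test set's arrays are used at the end of this notebook). With the default settings each 28x28 image is already flattened into 784 values. The NumPy sketch below uses a random placeholder batch rather than real MNIST images to show that flattening, which is why the placeholder `x` below has shape `[None, 784]`:

```python
import numpy as np

batch = np.random.default_rng(0).random((100, 28, 28))  # placeholder for 100 images
# Row-major flatten: pixel (row r, column c) lands at index 28 * r + c.
flat = batch.reshape(len(batch), -1)
print(flat.shape)
```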


To do efficient numerical computing in Python, we typically use libraries like NumPy that do expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python after every operation.
TensorFlow also does its heavy lifting outside Python, but it takes things a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python.
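To make that overhead concrete, here is a NumPy sketch with random placeholder matrices that computes the same matrix product twice: once with pure-Python loops, once with a single `@` call. Exact timings depend on the machine, but the loop version is typically orders of magnitude slower:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((20, 784))   # placeholder: 20 flattened "images"
b = rng.random((784, 10))   # placeholder weight matrix

def matmul_python(a, b):
    """Naive triple loop: every multiply-add passes through the Python interpreter."""
    n, k = a.shape
    m = b.shape[1]
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for t in range(k):
                s += a[i, t] * b[t, j]
            out[i, j] = s
    return out

start = time.perf_counter()
slow = matmul_python(a, b)
t_slow = time.perf_counter() - start

start = time.perf_counter()
fast = a @ b  # one call; the loops run in NumPy's compiled code
t_fast = time.perf_counter() - start

print(f"python loops: {t_slow:.4f}s, numpy: {t_fast:.6f}s, same result: {np.allclose(slow, fast)}")
```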

The cell below builds the graph.
As in other frameworks such as Theano, we use Python (really a thin wrapper) only to construct the graph, and then run it on a C++ backend, which is much more efficient. The role of the Python code is therefore to build this external computation graph and to dictate which parts of it should be run.
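The build-then-run idea can be illustrated without TensorFlow. The pure-Python sketch below uses hypothetical names `Node`, `placeholder`, `add`, `mul` and `run`, which are not TensorFlow's API: building `y` only records operations, and arithmetic happens when `run` is called with values for the placeholders, much as `sess.run` takes a `feed_dict`:

```python
class Node:
    """A node in a tiny, hypothetical computation graph (not TensorFlow's API)."""
    def __init__(self, op, *inputs, name=None):
        self.op = op          # callable applied to the evaluated inputs (None for placeholders)
        self.inputs = inputs  # upstream Node objects
        self.name = name

def placeholder(name):
    # A placeholder has no op; its value must come from the feed dict at run time.
    return Node(None, name=name)

def add(a, b):
    return Node(lambda u, v: u + v, a, b)

def mul(a, b):
    return Node(lambda u, v: u * v, a, b)

def run(node, feed):
    """Evaluate `node` by recursively evaluating its inputs; only now does arithmetic happen."""
    if node.op is None:
        return feed[node]
    return node.op(*(run(i, feed) for i in node.inputs))

# Build phase: describe w * x + b; no numbers are computed yet.
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
y = add(mul(w, x), b)

# Run phase: supply values and evaluate.
print(run(y, {x: 3.0, w: 2.0, b: 1.0}))  # → 7.0
```

A real framework additionally ships the whole graph to compiled code and can optimize it before running, which is where the savings come from.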

In [9]:
import tensorflow as tf
#x is a 'placeholder', a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any
#number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point
#numbers with shape [None, 784]. (Here None means the dimension can be of any length.)
x = tf.placeholder(tf.float32, [None, 784])
print('x', x)
W = tf.Variable(tf.zeros([784, 10]))
print('W', W)
b = tf.Variable(tf.zeros([10]))
print('b', b)
# The line below defines our final model
y = tf.nn.softmax(tf.matmul(x, W) + b)
print('y', y)

('x', <tf.Tensor 'Placeholder_1:0' shape=(?, 784) dtype=float32>)
('W', <tensorflow.python.ops.variables.Variable object at 0x65de610>)
('b', <tensorflow.python.ops.variables.Variable object at 0x65de850>)
('y', <tf.Tensor 'Softmax_1:0' shape=(?, 10) dtype=float32>)
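The cell above only describes the model y = softmax(xW + b). The same forward pass written eagerly in NumPy makes the shapes explicit; the input batch is a random placeholder. With `W` and `b` initialized to zeros, every class gets probability 0.1 before training:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

x = np.random.default_rng(0).random((5, 784)).astype(np.float32)  # placeholder batch of 5 images
W = np.zeros((784, 10), dtype=np.float32)
b = np.zeros(10, dtype=np.float32)

y = softmax(x @ W + b)  # shape (5, 10): one probability distribution per image
print(y[0])
```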


We use cross-entropy to measure how good or bad our model is.
To implement cross-entropy, we first need a new placeholder to input the correct answers.

In [11]:
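y_ = tf.placeholder(tf.float32, [None, 10])
# Then we implement the cross-entropy function. The direct formula below can be numerically unstable (log(0)):
#cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
# softmax_cross_entropy_with_logits applies softmax itself, so it must get the unnormalized logits (xW + b), not y
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=tf.matmul(x, W) + b, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)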
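The commented-out line computes cross-entropy, H(y', y) = -Σ_i y'_i log(y_i), directly from probabilities. `softmax_cross_entropy_with_logits` fuses the softmax and the log, so it expects unnormalized scores; feeding it probabilities would apply softmax twice. A NumPy sketch on made-up numbers, with hypothetical helper names, checks both points:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy_from_logits(logits, labels):
    """Batch mean of -sum_i y'_i * log(softmax(logits)_i)."""
    return np.mean(-np.sum(labels * log_softmax(logits), axis=1))

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])  # made-up scores xW + b
labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # one-hot correct answers
probs = np.exp(log_softmax(logits))                     # y = softmax(xW + b)

naive = np.mean(-np.sum(labels * np.log(probs), axis=1))  # the direct formula
stable = cross_entropy_from_logits(logits, labels)        # fused version, same value
double = cross_entropy_from_logits(probs, labels)         # probabilities wrongly fed as logits

print(naive, stable, double)
```

`naive` and `stable` agree, while `double` differs because the second softmax squashes the already-normalized probabilities toward uniform.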

After building the whole graph, we start training to learn its parameters. All of this work is done in a session.
TensorFlow relies on a highly efficient C++ backend to do its computation; the connection to this backend is called a session. The common usage for TensorFlow programs is to first create a graph and then launch it in a session.
Here we instead use the convenient InteractiveSession class, which makes TensorFlow more flexible about how you structure your code: it allows you to interleave operations that build a computation graph with ones that run the graph.
This is particularly convenient in interactive contexts like IPython. If you're not using an InteractiveSession, you should build the entire computation graph before starting a session and launching the graph.

In [12]:
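# InteractiveSession installs itself as the default session, so .run() and .eval() work without passing sess
sess = tf.InteractiveSession()
# Initialize all variables (W and b, both zeros) in that default session
tf.global_variables_initializer().run()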

Now everything is ready: we have the graph and the session, so we can start the training iterations below.

In [15]:
for _ in range(1000):
    # Fetch the next mini-batch of 100 training examples and run one gradient-descent step on it
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # or, equivalently, using the default InteractiveSession:
    #batch = mnist.train.next_batch(100)
    #train_step.run(feed_dict={x: batch[0], y_: batch[1]})
print('W', sess.run(W))
print('b', sess.run(b))

('W', array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32))
('b', array([-0.22346789,  0.37578991, -0.03983562, -0.25641957,  0.18791199,
        0.95993721,  0.04604222,  0.51998931, -1.30617368, -0.26377633], dtype=float32))


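The printed W only looks like all zeros because NumPy abbreviates large arrays to their first and last three rows and columns. Those rows belong to corner pixels, which are 0 in essentially every MNIST image, so their weights receive no gradient and stay at their initial value of 0.

Using small batches of random data is called stochastic training, in this case stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.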
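A minimal NumPy sketch of the same mini-batch SGD loop, under stated assumptions: synthetic 5-feature, 3-class data stands in for MNIST, each step draws a fresh random batch (a simplification of `next_batch`), and the gradient of the batch-mean cross-entropy with respect to the logits is (softmax(logits) - labels) / batch_size, the standard result for softmax regression. The learning rate 0.5, batch size 100 and 1000 steps mirror the TensorFlow cell above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, linearly separable stand-in for MNIST: 3 classes, 5 features (hypothetical data).
n, d, k = 600, 5, 3
true_W = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))
Y = np.eye(k)[np.argmax(X @ true_W, axis=1)]  # one-hot labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(W, b):
    """Mean cross-entropy over the full dataset (epsilon guards against log(0))."""
    p = softmax(X @ W + b)
    return np.mean(-np.sum(Y * np.log(p + 1e-12), axis=1))

W = np.zeros((d, k))
b = np.zeros(k)
lr, batch_size = 0.5, 100
initial = loss(W, b)
for step in range(1000):
    idx = rng.choice(n, size=batch_size, replace=False)  # a fresh random mini-batch
    xb, yb = X[idx], Y[idx]
    grad_logits = (softmax(xb @ W + b) - yb) / batch_size  # d(mean loss)/d(logits)
    W -= lr * xb.T @ grad_logits
    b -= lr * grad_logits.sum(axis=0)
final = loss(W, b)
print(initial, final)
```

Each update looks at only 100 examples, yet the full-dataset loss still falls from log(3) at the zero initialization.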

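We also need to check whether our trained model predicts the correct labels. tf.argmax is an extremely useful function that gives us the index of the highest entry in a tensor along some axis. For example, tf.argmax(y, 1) is the label our model thinks is most likely for each input, while tf.argmax(y_, 1) is the correct label.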
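The accuracy computation in the next cell can be mirrored in NumPy on made-up predictions: compare the argmax of each predicted row with the argmax of each one-hot label, cast the booleans to floats, and average:

```python
import numpy as np

# Made-up predicted probabilities (rows) and one-hot true labels for 4 examples.
y = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.2, 0.5, 0.3]])
y_ = np.array([[0, 1, 0],
               [1, 0, 0],
               [1, 0, 0],
               [0, 1, 0]])

correct_prediction = np.argmax(y, axis=1) == np.argmax(y_, axis=1)  # [True, True, False, True]
accuracy = correct_prediction.astype(np.float32).mean()
print(accuracy)  # → 0.75
```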

In [16]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
# or, equivalently:
#print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9183
