In [1]:
# code for loading the format for the notebook
import os

# path : store the current path to convert back to it later
path = os.getcwd()
os.chdir( os.path.join('..', 'notebook_format') )
from formats import load_style
load_style()

In [2]:
os.chdir(path)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# change default figure and font size
plt.rcParams['figure.figsize'] = 8, 6 
plt.rcParams['font.size'] = 12

# 1. magic for inline plot
# 2. magic to print version
# 3. magic so that the notebook will reload external python modules
%matplotlib inline
%load_ext watermark
%load_ext autoreload 
%autoreload 2

import tensorflow as tf

%watermark -a 'Ethen' -d -t -v -p numpy,pandas,matplotlib,tensorflow

Ethen 2016-12-26 17:23:00 

CPython 3.5.2
IPython 4.2.0

numpy 1.11.3
pandas 0.18.1
matplotlib 1.5.1
tensorflow 0.11.0rc2


# TensorFlow

## Hello World

In [3]:
# note that this is simply telling tensorflow to 
# create a constant operation, nothing gets
# executed until we start a session and run it
hello = tf.constant('Hello, TensorFlow!')
hello

<tf.Tensor 'Const:0' shape=() dtype=string>

In [4]:
# start the session and run the graph
with tf.Session() as sess:
    print( sess.run(hello) )

b'Hello, TensorFlow!'


We can think of TensorFlow as a system for defining computations. The operations we define are assembled into a computation graph, where each operation becomes a node. The graph is not `run` until we give it a context and explicitly ask for it to be executed. Here, that context is a `Session`, which encapsulates the environment in which the operations in the graph are executed and Tensor objects are evaluated.
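To make the "define first, run later" idea concrete, here is a minimal sketch in plain Python. The `Node` class and the `constant` and `run` helpers are hypothetical names made up for this illustration; this is a toy, not how TensorFlow is actually implemented.

```python
# Toy illustration of deferred execution; NOT TensorFlow's implementation.
class Node:
    """A node in a tiny computation graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        # building the expression only records a new node
        return Node('add', (self, other))

    def __mul__(self, other):
        return Node('mul', (self, other))


def constant(value):
    return Node('const', value=value)


def run(node):
    """Evaluate a node by recursively evaluating its inputs first."""
    if node.op == 'const':
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == 'add' else left * right


a = constant(2)
b = constant(3)
c = a + b * a    # only builds nodes; nothing is computed yet
print(c.op)      # add
print(run(c))    # 8
```

Here `run` plays roughly the role that `sess.run` plays above: the arithmetic only happens when we explicitly ask for a node to be evaluated.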

Consider another example that simply adds and multiplies two constant numbers.

In [5]:
a = tf.constant(2)
b = tf.constant(3)
c = a + b

with tf.Session() as sess:
    print( 'multiply: ', sess.run(a * b) )
    print( 'add: ', sess.run(c) ) # note that we can define the add operation outside 
    print( 'add: ', sess.run(a + b) ) # or inside the .run()

multiply:  6
add:  5
add:  5


We can perform the same operations by first defining a `placeholder` (its data type must be specified) and then feeding in values through the `feed_dict` argument when we `run` the graph.

In [6]:
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)

# define some operations
add = a + b
mul = a * b

with tf.Session() as sess:
    print( 'multiply: ', sess.run( mul, feed_dict = { a: 2, b: 3 } ) )
    print( 'add: ', sess.run( add, feed_dict = { a: 2, b: 3 } ) )

multiply:  6.0
add:  5.0


Note that there is also `tf.Variable`, which we'll use later and which differs slightly from `tf.placeholder`.

> [Stackoverflow](http://stackoverflow.com/questions/36693740/whats-the-difference-between-tf-placeholder-and-tf-variable). The difference is that with `tf.Variable` you have to provide an initial value when you declare it. With `tf.placeholder` you don't have to provide an initial value and you can specify it at run time with the `feed_dict` argument inside `Session.run`.
> In short, we will use `tf.Variable` for trainable variables such as weights (W) and biases (B) for our model. On the other hand, `tf.placeholder` is used to feed actual training examples.

Some matrix operations work the same way as their NumPy counterparts, e.g.

In [7]:
c = np.array([[3.,4], [5.,6], [6.,7]])
print(c)
print( np.mean(c, 1) )
print( np.argmax(c, 1) )

with tf.Session() as sess:
    result = sess.run( tf.reduce_mean(c, 1) )
    print(result)
    print( sess.run( tf.argmax(c, 1) ) )

[[ 3.  4.]
 [ 5.  6.]
 [ 6.  7.]]
[ 3.5  5.5  6.5]
[1 1 1]
[ 3.5  5.5  6.5]
[1 1 1]


`numpy.mean` and `tensorflow.reduce_mean` do the same thing. When the `axis` (NumPy) or `reduction_indices` (TensorFlow) parameter is 1, the mean is computed within each row, i.e. over (3,4), (5,6) and (6,7). When it is 0, the mean is computed within each column, i.e. over (3,5,6) and (4,6,7). The same applies to `argmax`, which returns the index of the maximum value along the given axis.
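The cell above only shows axis 1. As a small NumPy-only sketch, here is the same array reduced along both axes; the TensorFlow calls are expected to behave the same way, but only NumPy is exercised here.

```python
import numpy as np

c = np.array([[3., 4.], [5., 6.], [6., 7.]])

# axis=1 collapses the columns: one value per row -> 3.5, 5.5, 6.5
row_means = np.mean(c, axis=1)

# axis=0 collapses the rows: one value per column -> 14/3 and 17/3
col_means = np.mean(c, axis=0)

# index of the largest entry in each column (the last row in both cases)
col_argmax = np.argmax(c, axis=0)

print(row_means)
print(col_means)
print(col_argmax)
```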

## MNIST Using Softmax

MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:

<img src='images/mnist.png'>

Each image is 28 pixels by 28 pixels, which is essentially a 28 * 28 array of numbers. To use it in a machine learning problem, we can flatten this array into a vector of 28 * 28 = 784 numbers, which will be the number of features for each image. It doesn't matter how we flatten the array, as long as we're consistent between images.
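As a quick sketch of the flattening step, the snippet below uses a made-up 28 * 28 array as a stand-in for a real image (it is not MNIST data) and flattens it in NumPy's default row-major order. Any other order would work equally well, provided it is applied to every image.

```python
import numpy as np

# stand-in "image": 28 x 28 array whose entries are just 0, 1, ..., 783
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# flatten into a single feature vector of length 784 (row-major order)
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# with row-major order, pixel (row, col) lands at index row * 28 + col
row, col = 3, 5
print(flat[row * 28 + col] == image[row, col])  # True

# the flattening is reversible as long as we remember the original shape
restored = flat.reshape(28, 28)
```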

The dataset also includes a label for each image, telling us which digit it shows. For example, the labels for the above images are 5, 0, 4, and 1. Here we're going to train a softmax model to look at images and predict what digits they are. The possible label values in the MNIST dataset are the integers 0 through 9, hence this will be a 10-class classification problem.

In [8]:
# convenient one-liner to load the dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data/', one_hot = True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


The downloaded data is split into three parts: 55,000 data points of training data (`mnist.train`), 10,000 points of test data (`mnist.test`), and 5,000 points of validation data (`mnist.validation`).

Every part of the dataset contains the data and the labels, which we can access via `.images` and `.labels`. For example, the training images are `mnist.train.images` and the training labels are `mnist.train.labels` (one-hot encoded).
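For readers unfamiliar with one-hot encoding, here is a minimal NumPy sketch using the four example labels mentioned earlier (5, 0, 4, 1). Indexing `np.eye` is just one convenient way to build the encoding and is not necessarily how the MNIST loader does it.

```python
import numpy as np

labels = np.array([5, 0, 4, 1])
n_class = 10

# row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_class)[labels]
print(one_hot.shape)  # (4, 10)
print(one_hot[0])     # a single 1 at position 5, zeros elsewhere

# argmax along axis 1 recovers the original integer labels
decoded = np.argmax(one_hot, axis=1)
print(decoded)
```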

In [9]:
# pixels 
print(mnist.train.images.shape)
mnist.train.images

(55000, 784)


array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32)

In [10]:
# labels
print(mnist.train.labels.shape)
mnist.train.labels

(55000, 10)


array([[ 0.,  0.,  0., ...,  1.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ...,  0.,  1.,  0.]])

In [11]:
n_features = mnist.train.images.shape[1]
n_class = mnist.train.labels.shape[1]

# define the input and output 
# here None means that a dimension can be of any length,
# which is what we want, since the number of observations we have can vary
X = tf.placeholder( tf.float32, [None, n_features] )
y = tf.placeholder( tf.float32, [None, n_class] )

# initialize both W and b as tensors full of zeros. 
# these are parameters that the model is later going to learn 
W = tf.Variable( tf.zeros([n_features, n_class]) )
b = tf.Variable( tf.zeros([n_class]) )

# matrix multiplication using the .matmul command
# and add the softmax output
output = tf.nn.softmax( tf.matmul(X, W) + b )

# cost function
cross_entropy = tf.reduce_mean( -tf.reduce_sum( y * tf.log(output), 1 ) )

Now that we defined the structure of our model, we'll:

1. Define an optimization algorithm to train it. In this case, we ask TensorFlow to minimize our `cross_entropy` cost using the gradient descent algorithm with a learning rate of 0.5.
2. Add an operation to initialize the variables we created.
3. Define a helper "function" to evaluate the prediction accuracy.
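To see what the softmax output, the cross-entropy cost and the accuracy measure compute, here is a rough NumPy equivalent on a tiny made-up example. The helper names and the toy numbers are assumptions for illustration only. Subtracting the row maximum inside `softmax` is a common numerical-stability tweak that is not part of the TensorFlow cell above; it does not change the result.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability (result is unchanged)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(y, output):
    # mean over observations of -sum_k y_k * log(output_k)
    return np.mean(-np.sum(y * np.log(output), axis=1))

def accuracy(y, output):
    # fraction of rows where the predicted class matches the target class
    return np.mean(np.argmax(y, axis=1) == np.argmax(output, axis=1))

# two made-up observations with three classes
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
y = np.array([[1., 0., 0.], [0., 0., 1.]])

output = softmax(logits)
print(output.sum(axis=1))        # each row sums to 1
print(cross_entropy(y, output))
print(accuracy(y, output))       # 0.5, only the first row is predicted correctly

# with all-zero logits (as with zero-initialised W and b) every class gets
# probability 1 / n_class, so the cost equals log(n_class)
uniform = softmax(np.zeros((2, 3)))
print(np.isclose(cross_entropy(y, uniform), np.log(3)))  # True
```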

In [12]:
learning_rate = 0.5 
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
init = tf.initialize_all_variables()

# here we return the predicted class of each observation using argmax
# and check whether the output (prediction) equals the target variable (y);
# since the result of equal is a boolean tensor, we cast it to float.
correct_prediction = tf.equal( tf.argmax(y, 1), tf.argmax(output, 1) )
accuracy = tf.reduce_mean( tf.cast(correct_prediction, tf.float32) )

Now it's time to run it. During each step of the loop, we get a "batch" of one hundred random data points (defined by `batch_size`) from our training set. We then run `optimizer`, feeding in the batch data to replace the placeholders.

Using small batches of random data is called stochastic training -- in this case, stochastic gradient descent. Ideally, we'd like to use all our data for every step of training because that would give us a better sense of what we should be doing, but that's expensive. So, instead, we use a different subset every time. Doing this is cheap and has much of the same benefit.
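The mini-batch idea can be sketched without TensorFlow. The example below trains a softmax classifier with hand-written gradient steps on a small synthetic dataset. The data, the learning rate of 0.1 and the way batches are sampled are assumptions chosen for this illustration, not the MNIST setup used in this notebook.

```python
import numpy as np

rng = np.random.RandomState(0)

# hypothetical synthetic data: 300 points from 3 well-separated 2-D clusters
centers = np.array([[-2., 0.], [2., 0.], [0., 3.]])
labels = rng.randint(0, 3, size=300)
X_data = centers[labels] + 0.5 * rng.randn(300, 2)
y_data = np.eye(3)[labels]  # one-hot targets

def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

W = np.zeros((2, 3))
b = np.zeros(3)
learning_rate, batch_size, iterations = 0.1, 32, 300

for i in range(iterations):
    # draw a different random subset of the data at every step
    idx = rng.choice(len(X_data), size=batch_size, replace=False)
    X_batch, y_batch = X_data[idx], y_data[idx]

    # gradient of the mean cross-entropy with respect to the logits
    error = softmax(X_batch.dot(W) + b) - y_batch
    W -= learning_rate * X_batch.T.dot(error) / batch_size
    b -= learning_rate * error.mean(axis=0)

pred = np.argmax(softmax(X_data.dot(W) + b), axis=1)
train_accuracy = np.mean(pred == labels)
print(train_accuracy)
```

Each step only touches `batch_size` rows, so its cost does not grow with the size of the full dataset.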

In [13]:
# define some global variables
batch_size = 100
iterations = 1000

with tf.Session() as sess:
    
    # initialize the variables, run mini-batch gradient descent
    # for a specified number of iterations and evaluate the accuracy;
    # remember the keys of the feed_dict dictionary must be the placeholders
    # that we defined for the data in the beginning
    sess.run(init)
    for i in range(iterations):
        X_batch, y_batch = mnist.train.next_batch(batch_size)
        _, acc = sess.run( [optimizer, accuracy], feed_dict = { X: X_batch, y: y_batch } )
        
        # print the accuracy on the current training batch every 100 iterations
        if i % 100 == 0:
            print(acc)
    
    # after training evaluate the accuracy on the testing data
    acc = sess.run( accuracy, feed_dict = { X: mnist.test.images, y: mnist.test.labels } )
    print('test:', acc)

0.12
0.95
0.93
0.95
0.88
0.92
0.93
0.93
0.89
0.88
test: 0.9199


Notice that we did not have to compute the gradients needed to update the model. Once we've defined the structure of our model, TensorFlow can automatically differentiate the mathematical expressions in it, so we no longer need to derive the gradients ourselves!
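For a sense of what automatic differentiation saves us, the sketch below writes out the gradient of the softmax cross-entropy cost by hand and checks it against a finite-difference approximation. All names and the random toy data are assumptions for illustration. The finite-difference check is only a sanity test of the hand-derived formula; it is not how TensorFlow computes gradients.

```python
import numpy as np

def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def loss(W, X, y):
    output = softmax(X.dot(W))
    return np.mean(-np.sum(y * np.log(output), axis=1))

def manual_gradient(W, X, y):
    # hand-derived: d(loss)/dW = X^T (softmax(XW) - y) / n_observations
    return X.T.dot(softmax(X.dot(W)) - y) / X.shape[0]

def numerical_gradient(W, X, y, eps=1e-6):
    # central finite differences, one weight at a time
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (loss(W_plus, X, y) - loss(W_minus, X, y)) / (2 * eps)
    return grad

rng = np.random.RandomState(1)
X = rng.randn(5, 4)                        # 5 observations, 4 features
y = np.eye(3)[rng.randint(0, 3, size=5)]   # one-hot targets, 3 classes
W = 0.1 * rng.randn(4, 3)

max_diff = np.max(np.abs(manual_gradient(W, X, y) - numerical_gradient(W, X, y)))
print(max_diff < 1e-6)  # True
```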

In this example, our softmax classifier reached a test accuracy of around 92%.

## Reference

- [TensorFlow MNIST For ML Beginners](https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html)
- [CS224D Lecture 7 - Introduction to TensorFlow](https://www.youtube.com/watch?v=L8Y2_Cq2X5s&index=7&list=PLmImxx8Char9Ig0ZHSyTqGsdhb9weEGam)