# Intro to TensorFlow

[TensorFlow](https://www.tensorflow.org/) is an open-source machine learning library from Google. We have been using it as the backend for Keras. It is a powerful library that lets you design and build entirely new machine learning models. It is also designed for large-scale and production use cases, while Keras focuses more on quick prototyping and testing ideas. Here we introduce the basic ideas of TensorFlow by implementing simple models for MNIST: first a single softmax layer, then, as an exercise, an MLP with one hidden layer. We've already seen the Keras equivalent.

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from keras.datasets import mnist
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)
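
The preprocessing above flattens each 28x28 image into a 784-vector, scales pixel values to the range 0 to 1, and one-hot encodes the labels. As a rough illustration of what those steps do, here is a NumPy-only sketch on two toy "images". The helper `one_hot` is a hypothetical stand-in written for this example, not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Put a 1 in the column given by each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Toy stand-ins for MNIST: one all-black and one all-white 28x28 image
images = np.array([np.zeros((28, 28)), np.full((28, 28), 255)], dtype='uint8')

# Flatten to 784 features per image and scale pixels from 0..255 to 0..1
flat = images.reshape(2, 784).astype('float32') / 255

# Labels 3 and 0 become rows with a single 1 in column 3 and column 0
targets = one_hot([3, 0], 10)
```

Each row of `targets` sums to 1, which is what the cross entropy loss used later expects.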

TensorFlow is a lower-level library than Keras, so creating a model takes much more code. In exchange, you get a great deal of flexibility. Below is a simple model for MNIST: a single softmax layer that maps the 784 input pixels directly to the 10 classes, with no hidden layer.

In [20]:
graph = tf.Graph()
with graph.as_default():
    
    # Placeholders tell Tensorflow that we will provide this data
    # later when we run the model.
    x = tf.placeholder(tf.float32, shape=[None, 784]) # input data
    y_ = tf.placeholder(tf.float32, shape=[None, 10]) # one-hot labels

    # We need to make our variables explicitly
    W0 = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1), name='W0')
    bias0 = tf.Variable(tf.zeros([10]), name='bias0')

    # We write layers as we would mathematically
    y = tf.nn.softmax(tf.matmul(x, W0) + bias0)
    
    # Cross entropy loss
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

    # Gradient step, 0.5 learning rate
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    
    # Compute accuracy on batch
    prediction = tf.argmax(y, 1)
    groundtruth = tf.argmax(y_, 1)
    same = tf.equal(prediction, groundtruth)
    accuracy = tf.reduce_mean(tf.cast(same, tf.float32))
    
    # Op to init variables
    init = tf.initialize_all_variables()
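
To make the math in the graph concrete, the forward pass and loss can be written directly in NumPy. This is only a sketch of the same formulas, not how TensorFlow evaluates them. The zero-valued weights and the random batch are assumptions chosen for illustration (the graph above initializes `W0` from a truncated normal instead).

```python
import numpy as np

def softmax(logits):
    # Subtracting the row max does not change the result but avoids overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, one_hot_labels):
    # Batch mean of -sum_k y_k * log(p_k), the same formula as in the graph
    return np.mean(-np.sum(one_hot_labels * np.log(probs), axis=1))

rng = np.random.RandomState(0)
x_batch = rng.rand(4, 784).astype('float32')         # 4 fake flattened images
labels = np.eye(10, dtype='float32')[[1, 7, 3, 3]]   # one-hot labels
W = np.zeros((784, 10), dtype='float32')             # zero weights, for illustration only
b = np.zeros(10, dtype='float32')

probs = softmax(x_batch.dot(W) + b)   # shape (4, 10), each row sums to 1
loss = cross_entropy(probs, labels)
```

With all-zero weights every class gets probability 0.1, so the loss equals log(10), about 2.30. That is a useful reference point: a loss well above this value at the start of training usually means the initial weights are making confident wrong predictions.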


After defining a graph of operations that implements the computation in the model, we can run this graph by creating a Session. Sessions manage the execution of the graph. Below we create a Session and run gradient descent.
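
Before that, the split between building a graph and running it can be mimicked in a few lines of plain Python. The sketch below is a toy written for this explanation: `Node`, `placeholder`, `add`, `multiply`, and `run` are hypothetical names, and the real TensorFlow machinery is far more involved. It only shows the idea that constructing operations computes nothing until values are fed in.

```python
class Node(object):
    """Toy graph node: stores an operation and its inputs, computes nothing yet."""
    def __init__(self, op=None, inputs=()):
        self.op = op
        self.inputs = inputs

def placeholder():
    # A node with no operation; its value must be supplied at run time
    return Node()

def add(a, b):
    return Node(lambda u, v: u + v, (a, b))

def multiply(a, b):
    return Node(lambda u, v: u * v, (a, b))

def run(node, feed_dict):
    """Evaluate a node by recursively evaluating its inputs."""
    if node in feed_dict:
        return feed_dict[node]
    if node.op is None:
        raise ValueError('placeholder was not fed a value')
    args = [run(inp, feed_dict) for inp in node.inputs]
    return node.op(*args)

# Building the graph: no arithmetic happens here
a = placeholder()
b = placeholder()
c = add(multiply(a, b), a)

# Running the graph: values are supplied and c = a * b + a is computed
result = run(c, {a: 3, b: 4})   # 3 * 4 + 3 = 15
```

The same graph can be run again with different inputs, which is the role `feed_dict` plays in the training loop that follows.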

In [21]:
sess = tf.Session(graph=graph)

# Initialize variables
sess.run(init)

batch_size = 128
num_steps = 1000

batch_start = 0
for i in range(num_steps):

    # Get a batch of data
    X_batch = X_train[batch_start:batch_start + batch_size, :]
    y_batch = y_train[batch_start:batch_start + batch_size, :]

    batch_start += batch_size
    if batch_start > X_train.shape[0] - batch_size:
        batch_start = 0
    
    # Run the optimizer operation to take a gradient step
    sess.run(train_step, feed_dict={x: X_batch, y_: y_batch})
    
    # Print out accuracy and loss
    if not i % 200:
        loss, acc = sess.run([cross_entropy, accuracy],
                             feed_dict={x: X_batch, y_: y_batch})
        print 'Loss: ', loss, '   Accuracy: ', acc

Loss:  1.87679    Accuracy:  0.375
Loss:  0.215099    Accuracy:  0.9375
Loss:  0.337602    Accuracy:  0.890625
Loss:  0.285959    Accuracy:  0.90625
Loss:  0.324795    Accuracy:  0.929688
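
For this particular model, the gradient step that `train_step` performs can also be written out by hand. The sketch below assumes the standard result that, for softmax with cross entropy, the gradient of the batch-mean loss with respect to the logits is `(probs - labels) / batch_size`. It uses small synthetic data rather than MNIST, and `sgd_step` and `loss_fn` are hypothetical helper names. The batch cycling mirrors the loop above.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_fn(X, Y, W, b):
    probs = softmax(X.dot(W) + b)
    return np.mean(-np.sum(Y * np.log(probs), axis=1))

def sgd_step(X, Y, W, b, learning_rate=0.5):
    """One gradient descent step on the batch-mean cross entropy."""
    probs = softmax(X.dot(W) + b)
    delta = (probs - Y) / X.shape[0]   # gradient w.r.t. the logits
    grad_W = X.T.dot(delta)            # chain rule through X.dot(W)
    grad_b = delta.sum(axis=0)
    return W - learning_rate * grad_W, b - learning_rate * grad_b

# Synthetic data (an assumption for this sketch): 256 samples, 20 features, 3 classes
rng = np.random.RandomState(0)
X = rng.rand(256, 20)
true_W = rng.randn(20, 3)
Y = np.eye(3)[np.argmax(X.dot(true_W), axis=1)]

W = np.zeros((20, 3))
b = np.zeros(3)
batch_size = 64
batch_start = 0
losses = []
for i in range(50):
    X_batch = X[batch_start:batch_start + batch_size]
    Y_batch = Y[batch_start:batch_start + batch_size]

    # Advance to the next batch, wrapping around when a full batch no longer fits
    batch_start += batch_size
    if batch_start > X.shape[0] - batch_size:
        batch_start = 0

    W, b = sgd_step(X_batch, Y_batch, W, b)
    losses.append(loss_fn(X, Y, W, b))
```

TensorFlow derives the equivalent gradients automatically from the graph, which is why the model code only needs to define the loss.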


Finally, we can compute the accuracy on the test data.

In [22]:
print 'Test accuracy: ', sess.run(accuracy, feed_dict={ x: X_test, y_: y_test })

Test accuracy:  0.9138
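
The accuracy op compares the most probable predicted class with the true class for each example and averages the matches. A small NumPy sketch of the same calculation, using made-up probabilities for four examples and three classes:

```python
import numpy as np

# Made-up model outputs: one row of class probabilities per example
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])

# One-hot ground truth for the same four examples
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]], dtype='float32')

prediction = np.argmax(probs, axis=1)     # most probable class per row
groundtruth = np.argmax(labels, axis=1)   # position of the 1 in each label row
same = prediction == groundtruth          # True where the model was right
accuracy = same.astype('float32').mean()  # 3 of 4 correct, so 0.75
```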


- - -
### Exercise 1 - Add a hidden layer

Copy the model and training code from above and add a ReLU hidden layer before the softmax. Train this deeper model. With a good choice of hidden layer size, you should get over 95% accuracy, and higher if you train longer.
- - -

In [23]:
graph = tf.Graph()
with graph.as_default():
    
    # Placeholders tell Tensorflow that we will provide this data
    # later when we run the model.
    x = tf.placeholder(tf.float32, shape=[None, 784]) # input data
    y_ = tf.placeholder(tf.float32, shape=[None, 10]) # one-hot labels

    # We need to make our variables explicitly
    W0 = tf.Variable(tf.truncated_normal([784, 512], stddev=0.1), name='W0')
    bias0 = tf.Variable(tf.zeros([512]), name='bias0')

    # We can write layers as we would mathematically
    hidden = tf.nn.relu(tf.matmul(x, W0) + bias0)
    
    # Next layer weights
    W1 = tf.Variable(tf.truncated_normal([512, 10], stddev=0.1), name='W1')
    bias1 = tf.Variable(tf.zeros([10]), name='bias1')
    
    # Softmax
    y = tf.nn.softmax(tf.matmul(hidden, W1) + bias1)
    
    # Cross entropy loss
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

    # Gradient step
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    
    # Compute accuracy on batch
    prediction = tf.argmax(y, 1)
    groundtruth = tf.argmax(y_, 1)
    same = tf.equal(prediction, groundtruth)
    accuracy = tf.reduce_mean(tf.cast(same, tf.float32))
    
    # Op to init variables
    init = tf.initialize_all_variables()


sess = tf.Session(graph=graph)

# Initialize variables
sess.run(init)

batch_size = 128
num_steps = 1000

batch_start = 0
for i in range(num_steps):

    # Get a batch of data
    X_batch = X_train[batch_start:batch_start + batch_size, :]
    y_batch = y_train[batch_start:batch_start + batch_size, :]

    batch_start += batch_size
    if batch_start > X_train.shape[0] - batch_size:
        batch_start = 0
    
    # Run the optimizer operation to take a gradient step
    sess.run(train_step, feed_dict={x: X_batch, y_: y_batch})
    
    # Print out accuracy and loss
    if not i % 200:
        loss, acc = sess.run([cross_entropy, accuracy],
                             feed_dict={x: X_batch, y_: y_batch})
        print 'Loss: ', loss, '   Accuracy: ', acc
        
print 'Test accuracy: ', sess.run(accuracy, feed_dict={ x: X_test, y_: y_test })

Loss:  5.56055    Accuracy:  0.28125
Loss:  0.0990738    Accuracy:  0.992188
Loss:  0.106638    Accuracy:  0.96875
Loss:  0.0585083    Accuracy:  0.984375
Loss:  0.0658509    Accuracy:  0.984375
Test accuracy:  0.9684
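
One caveat about the loss used in both models: taking `tf.log` of a softmax output can produce `log(0)` when a predicted probability underflows to zero. A common remedy is to compute the log of the softmax directly from the logits with the log-sum-exp trick. The NumPy sketch below shows that idea for the two-layer forward pass. The helper names are hypothetical and the weights are drawn from a plain normal distribution rather than a truncated one. TensorFlow also provides fused ops for this purpose, such as `tf.nn.softmax_cross_entropy_with_logits`.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W0, bias0, W1, bias1):
    # Hidden ReLU layer followed by a linear layer that outputs logits
    hidden = relu(x.dot(W0) + bias0)
    return hidden.dot(W1) + bias1

def log_softmax(logits):
    # log(softmax(z)) = z - logsumexp(z), computed after shifting by the row max
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def stable_cross_entropy(logits, one_hot_labels):
    return np.mean(-np.sum(one_hot_labels * log_softmax(logits), axis=1))

rng = np.random.RandomState(0)
x = rng.rand(5, 784)                              # 5 fake flattened images
labels = np.eye(10)[[0, 4, 4, 9, 2]]
W0 = rng.normal(scale=0.1, size=(784, 512))
bias0 = np.zeros(512)
W1 = rng.normal(scale=0.1, size=(512, 10))
bias1 = np.zeros(10)

logits = forward(x, W0, bias0, W1, bias1)         # shape (5, 10)
loss = stable_cross_entropy(logits, labels)

# Extreme logits: the softmax probability of the true class underflows to 0,
# so log(softmax) computed naively would be -inf. Here the loss stays finite.
extreme_logits = np.array([[1000.0, 0.0, -1000.0]])
extreme_loss = stable_cross_entropy(extreme_logits, np.array([[0.0, 0.0, 1.0]]))
```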
