<h1>Using TensorFlow With GPU</h1>
<p>Assuming you have an NVIDIA GPU with CUDA Compute Capability 3.0 or above...</p>
<p>Build TensorFlow from <a href="https://www.tensorflow.org/versions/master/get_started/os_setup.html#source">source</a> and configure it using the following command:</p>

In [1]:
%%bash
# cd to tensorflow root, do the following... The unofficial setting lets you use 3.0 GPUs instead of minimum 3.5
# TF_UNOFFICIAL_SETTING=1 ./configure

<p>Note that the above has some interactive prompts you need to fill out, so you can't do it from within this notebook. Then build the pip package and install it like this:</p>

In [2]:
%%bash
# Make sure you're using python 2.7
# python --version
# download and install tensorflow with gpu capability (from the pip package you build from source!!!)
# see: https://www.tensorflow.org/versions/master/get_started/os_setup.html#create-pip

# ====================== UNCOMMENT THIS LINE ===================== #
# pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl

# Note: the name of the .whl may change in the future...

<p>Now your code can run on the GPU from your IPython notebook. Next, import the libraries and define the device names:</p>

In [3]:
import numpy as np
import tensorflow as tf
CPU = "/cpu:0"
GPU = "/gpu:0"

<h2>Computational Graphs with TensorFlow</h2>
<p>TensorFlow uses graphs to define computations. You create constants and operations, and a <code>Session()</code> object lets TensorFlow handle the overhead of allocating resources and calling external libraries for you. When the session is closed, its resources are released.</p>
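<p>To make the build-then-run idea concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented: <code>Node</code>, <code>constant</code>, <code>matmul</code> and <code>run</code> are hypothetical stand-ins that only mimic the pattern of recording operations first and evaluating them later.</p>

```python
# Toy illustration of "build a graph first, run it later".
# Node, constant, matmul and run are hypothetical helpers, not TensorFlow APIs.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const" or "matmul"
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # payload for constant nodes


def constant(value):
    return Node("const", value=value)


def matmul(a, b):
    # Only records the operation; nothing is computed here.
    return Node("matmul", inputs=(a, b))


def run(node):
    # Evaluate a node by recursively evaluating its inputs.
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*right)]
            for row in left]


matrix1 = constant([[3.0, 3.0]])      # 1 x 2
matrix2 = constant([[2.0], [2.0]])    # 2 x 1
product = matmul(matrix1, matrix2)    # a graph node, no result yet
print(run(product))                   # [[12.0]]
```

<p>In this sketch <code>run</code> plays the role that <code>sess.run</code> plays in the cells that follow: the product only exists once the graph is evaluated.</p>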

In [4]:
# some constants and an operation (variables...)
matrix1 = tf.constant([[3.,3.]]) # 1 x 2 matrix
matrix2 = tf.constant([[2.],[2.]]) # 2 x 1 matrix
product = tf.matmul(matrix1, matrix2) # (1 x 2) * (2 x 1)

# create/run the session
sess = tf.Session()
result = sess.run(product)

print(result)

#close session
sess.close()

[[ 12.]]


<p>The same computation using a <code>with</code> block, which closes the session automatically:</p>

In [5]:
with tf.Session() as sess:
    result = sess.run(product)
    print(result)

[[ 12.]]


<p>You can also pin the computation to a specific device, either the CPU or the GPU:</p>

In [6]:
def run_on_dev(dev="/gpu:0"):
    with tf.Session() as sess:
        with tf.device(dev):
            A = tf.constant([[3.,3.]])
            B = tf.constant([[2.],[2.]])
            product = tf.matmul(A, B)
            result = sess.run(product)
            print(result)

dev1 = CPU
dev2 = GPU
run_on_dev(dev1)
run_on_dev(dev2)

[[ 12.]]
[[ 12.]]


## Interactive Session

In [7]:
# create interactive session
sess = tf.InteractiveSession()
x = tf.Variable([1.0, 2.0])
a = tf.constant([3.0, 3.0])

# initialize x
x.initializer.run()

# add an op to subtract a from x
sub = tf.sub(x, a)
print(sub.eval())

sess.close()

[-2. -1.]


### Data Types

In [8]:
# floats
# print(tf.float32)
# print(tf.float64)

# ints
# print(tf.int64)
# print(tf.int32)
# print(tf.int16)
# print(tf.int8)
# print(tf.uint8)

# other
# print(tf.string)
# print(tf.bool)
# print(tf.complex64)

# quantized
# print(tf.qint32)
# print(tf.qint8)
# print(tf.quint8)

### Device Allocation & Logging

In [9]:
def dev_log(dev):
    with tf.device(dev):
        a = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0], shape=[9,1], name='a')
        b = tf.constant([1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0], shape=[1,9], name='b')
        c = tf.matmul(a, b)
    # run
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess: 
        print(sess.run(c))

In [10]:
# run on CPU
dev_log(CPU)

# run on GPU
dev_log(GPU)

[[  1.   2.   3.   4.   5.   6.   7.   8.   9.]
 [  2.   4.   6.   8.  10.  12.  14.  16.  18.]
 [  3.   6.   9.  12.  15.  18.  21.  24.  27.]
 [  4.   8.  12.  16.  20.  24.  28.  32.  36.]
 [  5.  10.  15.  20.  25.  30.  35.  40.  45.]
 [  6.  12.  18.  24.  30.  36.  42.  48.  54.]
 [  7.  14.  21.  28.  35.  42.  49.  56.  63.]
 [  8.  16.  24.  32.  40.  48.  56.  64.  72.]
 [  9.  18.  27.  36.  45.  54.  63.  72.  81.]]
[[  1.   2.   3.   4.   5.   6.   7.   8.   9.]
 [  2.   4.   6.   8.  10.  12.  14.  16.  18.]
 [  3.   6.   9.  12.  15.  18.  21.  24.  27.]
 [  4.   8.  12.  16.  20.  24.  28.  32.  36.]
 [  5.  10.  15.  20.  25.  30.  35.  40.  45.]
 [  6.  12.  18.  24.  30.  36.  42.  48.  54.]
 [  7.  14.  21.  28.  35.  42.  49.  56.  63.]
 [  8.  16.  24.  32.  40.  48.  56.  64.  72.]
 [  9.  18.  27.  36.  45.  54.  63.  72.  81.]]


In [11]:
def big_tensor_multiply(dev,dim=1000):
    with tf.device(dev):
        a = tf.constant(np.random.rand(dim,dim), shape=[dim,dim], name='a')
        b = tf.constant(np.random.rand(dim,dim), shape=[dim,dim], name='b')
        c = tf.matmul(a, b)
        d = tf.matrix_inverse(c)
    # allow_soft_placement lets TensorFlow fall back to another device when an op cannot run on the requested one
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
        print(sess.run(d))

In [12]:
big_tensor_multiply(GPU)

[[ 0.20990674 -0.20230829  0.17776081 ..., -0.20146913  0.10701872
  -0.00976813]
 [ 0.29962251  0.66250558 -0.87355327 ...,  0.49991114 -0.54095028
  -0.09736525]
 [-0.78200617  0.8090682   0.42642459 ..., -0.66816305  0.90810963
   0.53495434]
 ..., 
 [ 1.40751078 -2.00168374 -0.41832389 ...,  0.78288914 -0.85603356
  -1.32163825]
 [ 0.42822143 -0.65827047 -0.15779854 ...,  0.28139009 -0.16366303
  -0.30078936]
 [ 0.25333576 -0.05197653  0.18205467 ..., -0.07066444  0.09918352
   0.41477799]]


In [13]:
def use_multiple_devices(devices):
    for d in devices:
        print("Using device: " + d)
        big_tensor_multiply(d)            

In [14]:
use_multiple_devices([GPU,CPU])

Using device: /gpu:0
[[  1.67987110e+01   4.59042604e+00  -7.03502289e+00 ...,  -5.90590991e+00
   -2.61779107e+01   3.09562784e+01]
 [  2.93990567e+00   8.93633654e-01  -1.23732048e+00 ...,  -1.13977535e+00
   -4.85073875e+00   5.58757069e+00]
 [  9.93436299e+00   2.65019172e+00  -4.14363473e+00 ...,  -3.50918142e+00
   -1.50688831e+01   1.81543764e+01]
 ..., 
 [  6.40669165e+00   1.68839968e+00  -2.82182753e+00 ...,  -2.31056570e+00
   -9.34648475e+00   1.20442782e+01]
 [  9.08421470e-01   1.51617430e-01  -2.09308943e-01 ...,  -3.96352446e-03
   -1.90075986e+00   1.47208046e+00]
 [ -3.94115146e-01  -1.37250877e-04   8.38180684e-02 ...,   1.64958373e-01
    4.36865552e-01  -4.86890235e-01]]
Using device: /cpu:0
[[  81.94335386   36.65881484 -111.14857879 ..., -144.25263076
   -21.26620605   48.12862562]
 [  13.93349061    5.93656083  -18.65762119 ...,  -24.61643898
    -3.57620121    7.52201247]
 [  71.14872792   32.42690571  -96.95856142 ..., -125.21547656
   -18.57256676   42.823330

## Using Variables

In [15]:
def counter_step(step):
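    # deliberately over-engineered counter: a variable starting at 0
    state = tf.Variable(0, name="counter")
    
    # constant holding the increment
    one = tf.constant(step)
    
    # op: state + step
    new_val = tf.add(state, one)
    
    # op: write the new value back into state
    update = tf.assign(state, new_val)

    # op: initialize all variables (must run before the variable is read)
    init_operation = tf.initialize_all_variables()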

    # run graph
    with tf.Session() as sess:
        sess.run(init_operation)
        print(sess.run(state))
        for _ in range(3):
            sess.run(update)
            print(sess.run(state))

In [16]:
counter_step(4)

0
4
8
12


<h2>MNIST example: Handwritten Digit Recognition</h2>
<p>First, download and load the MNIST dataset. This code will be reused later as well...</p>

In [17]:
import input_data # comes from the file provided in the tutorial...
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# mnist.train -- training data
# mnist.test -- testing data
# mnist.train.images -- training images
# mnist.train.labels -- training labels

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [18]:
print("Here's what the training data looks like:\n")
print(mnist.train.images)
print("\nNum images: " + str(len(mnist.train.images)))
print("Num labels: " + str(len(mnist.train.labels)))

Here's what the training data looks like:

[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]

Num images: 55000
Num labels: 55000


<h2>Model Parameters</h2>
<p>In this example we will only use a single layer model...</p>

In [19]:
# create an input vector for flattened images...
x = tf.placeholder(tf.float32, [None, 784])

# weight matrix 784 x 10 
W = tf.Variable(tf.zeros([784,10]))

# Biases
b = tf.Variable(tf.zeros([10]))

### Softmax Regression

In [20]:
# y = output = softmax(x W + b), one row of class probabilities per input image
y = tf.nn.softmax(tf.matmul(x, W) + b)
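
For intuition, softmax turns a vector of scores into probabilities that sum to 1. The sketch below is plain Python; the `softmax` helper is illustrative only and is not `tf.nn.softmax` itself.

```python
import math

def softmax(scores):
    # Subtract the largest score first for numerical stability.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # larger scores get larger probabilities
print(sum(probs))  # the probabilities sum to 1 (up to rounding)

# With all-zero scores every class is equally likely.
print(softmax([0.0] * 10))
```

The last line mirrors the starting point of the model above: since `W` and `b` are initialized to zeros, every digit class initially gets probability 0.1.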

### Cross-Entropy Output
General form of the cross-entropy loss function, where $y^{\prime}$ is the true distribution and $y$ is the predicted one:
\begin{align}
H_{y^{\prime}}\left(y\right) &= -\sum_i y^{\prime}_i \log\left(y_i\right)
\end{align}
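
As a worked instance of the formula, here is a small plain-Python sketch for a single example. The `cross_entropy` helper is hypothetical, and skipping terms where the true value is 0 (treating 0 * log(0) as 0) is an assumption made here to avoid taking the log of zero.

```python
import math

def cross_entropy(truth, predicted):
    # H_{y'}(y) = -sum_i y'_i * log(y_i); terms with y'_i == 0 contribute nothing.
    return -sum(t * math.log(p) for t, p in zip(truth, predicted) if t > 0)

truth = [0.0, 1.0, 0.0]        # one-hot label: the true class is index 1
predicted = [0.1, 0.7, 0.2]    # model output, e.g. from a softmax

loss = cross_entropy(truth, predicted)
print(loss)                    # equals -log(0.7)
```

With a one-hot label the sum collapses to the negative log of the probability assigned to the true class, so a more confident correct prediction gives a smaller loss.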

In [21]:
# OP: Truth value
t = tf.placeholder(tf.float32, [None, 10])

# OP: Loss function
cross_entropy = -tf.reduce_sum(t * tf.log(y))

### Backpropagation Training

In [22]:
# OP: GD optimization
learning_rate = 0.01
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
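
To show what a single gradient descent update does, here is a simplified plain-Python sketch. As an assumption for brevity it updates the logits directly rather than `W` and `b`, and the helper functions are hypothetical; the optimizer above computes gradients for the real variables automatically.

```python
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(truth, predicted):
    return -sum(t * math.log(p) for t, p in zip(truth, predicted))

learning_rate = 0.01
truth = [0.0, 1.0, 0.0]      # one-hot label
logits = [0.0, 0.0, 0.0]     # stand-in for x W + b, starting from zeros

loss_before = cross_entropy(truth, softmax(logits))

# For softmax followed by cross-entropy, d(loss)/d(logit_i) = y_i - t_i.
predicted = softmax(logits)
gradient = [y - t for y, t in zip(predicted, truth)]

# Gradient descent: move each parameter against its gradient.
logits = [z - learning_rate * g for z, g in zip(logits, gradient)]

loss_after = cross_entropy(truth, softmax(logits))
print(loss_before)
print(loss_after)            # slightly smaller than loss_before
```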

### Initialization

In [23]:
# OP: initialize all variables
init = tf.initialize_all_variables()

### Training & Testing

In [24]:
# 10000 training iterations (each "epoch" below is one mini-batch step)
def train_test(dev):
    batch_size = 100
    num_epochs = 10000

    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        with tf.device(dev):
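            # Run the session. Note: tf.device only affects ops created inside this
            # block (the accuracy ops below); init and train_step were built earlier.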
            sess.run(init)
            for i in range(num_epochs):
                # periodic print out
                if i % (num_epochs/10.0) == 0: print("Epoch: " + str(i) + "...")
                batch_inputs, truth_values = mnist.train.next_batch(batch_size)
                sess.run(train_step, feed_dict={x: batch_inputs, t: truth_values})

            # OP: compare truth values to predictions
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(t, 1))

            # OP: calculate accuracy
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

            # RUN: print the accuracy
            test_result = sess.run(accuracy, feed_dict={x: mnist.test.images, t: mnist.test.labels})
            print("Accuracy on Test set: " + str(test_result))

In [25]:
train_test(GPU)

Epoch: 0...
Epoch: 1000...
Epoch: 2000...
Epoch: 3000...
Epoch: 4000...
Epoch: 5000...
Epoch: 6000...
Epoch: 7000...
Epoch: 8000...
Epoch: 9000...
Accuracy on Test set: 0.9193
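
The accuracy figure above is the fraction of test images whose most probable predicted class matches the label. A minimal plain-Python sketch of that calculation, using made-up predictions and hypothetical `argmax` and `accuracy` helpers:

```python
def argmax(values):
    # Index of the largest entry.
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, one_hot_labels):
    # 1 for every row where predicted class == true class, then the mean.
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, one_hot_labels)]
    return sum(hits) / float(len(hits))

# Made-up data: 4 examples, 3 classes.
preds = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(preds, labels))  # 0.75
```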


# Using a Deeper Model: MNIST
But first a recap...

In [26]:
import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [27]:
# Input & Truth Vector
def get_placeholders():
    x = tf.placeholder("float",shape=[None,784])
    t = tf.placeholder("float",shape=[None,10])
    return (x, t)

# Weights & Bias
def get_model_params():
    W = tf.Variable(tf.zeros([784,10]))
    b = tf.Variable(tf.zeros([10]))
    return (W, b)

# Softmax Layer
def get_softmax_layer(x, W, b):
    return tf.nn.softmax(tf.matmul(x, W) + b)

# Cost Function
def get_cross_entropy_function(t, y):
    return -tf.reduce_sum(t * tf.log(y))

# Training module
def get_training_module(learning_rate, cost_function):
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_function)

# Test the model
def do_test_model(inputs, outputs, truth):
    correct_prediction = tf.equal(tf.argmax(outputs,1), tf.argmax(truth,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    test_accuracy = accuracy.eval(feed_dict={inputs: mnist.test.images, truth: mnist.test.labels})
    print("Test Accuracy: " + str(test_accuracy))

# Training iterations
def do_train_model(training_algo, input_values, truth_values, batch_size, num_epochs):
    for i in range(num_epochs):
        if i % (num_epochs/10) == 0: print("Epoch " + str(i) + "...")
        batch = mnist.train.next_batch(batch_size)
        training_algo.run(feed_dict={input_values: batch[0], truth_values: batch[1]})

# Train/test
def do_train_test(learning_rate, batch_size, num_epochs):
    x, t = get_placeholders()
    W, b = get_model_params()
    y = get_softmax_layer(x, W, b)
    ce = get_cross_entropy_function(t, y)
    training_algo = get_training_module(learning_rate, ce)
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        do_train_model(training_algo, x, t, batch_size, num_epochs)
        do_test_model(x, y, t)

In [28]:
learning_rate = 0.01
batch_size = 100
num_epochs = 1000
do_train_test(learning_rate, batch_size, num_epochs)

Epoch 0...
Epoch 100...
Epoch 200...
Epoch 300...
Epoch 400...
Epoch 500...
Epoch 600...
Epoch 700...
Epoch 800...
Epoch 900...
Test Accuracy: 0.9186


## Convolutional Model
First, rebuild the placeholders and the baseline model ops we'll need:

In [32]:
x, t = get_placeholders()
W, b = get_model_params()
y = get_softmax_layer(x, W, b)
ce = get_cross_entropy_function(t, y)
training_algo = get_training_module(learning_rate, ce)

Then a few helper functions for creating weights and biases, and for convolution and pooling:

In [36]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

### Convolutional Layers
This will create a convolution layer with 32 filters, each being a 5x5 pixel patch. The shape will therefore be [5,5,1,32] which indicates the size of our filters, the number of input channels (1), and the number of output channels (32). There is also a bias vector for each output channel, so a 32-dim vector of bias terms. 
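
As a quick check of what that shape implies, the number of trainable parameters in this layer can be counted in plain Python (the variable names here are illustrative):

```python
# Shape of the first convolutional layer's weights, as given in the text:
# [filter_height, filter_width, in_channels, out_channels]
filter_shape = [5, 5, 1, 32]

num_weights = 1
for dim in filter_shape:
    num_weights *= dim

num_biases = filter_shape[-1]    # one bias per output channel

print(num_weights)               # 800
print(num_biases)                # 32
print(num_weights + num_biases)  # 832
```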

In [37]:
# First convolutional layer
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])

To apply this layer, we reshape x to a 4d tensor, with the 2nd and 3rd dimensions corresponding to the image height and width, and the final dimension to the number of color channels (1 for greyscale). The -1 in the first dimension lets TensorFlow infer the batch size.

In [38]:
x_image = tf.reshape(x, [-1,28,28,1])
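
To see what this reshape does to a single image, here is a plain-Python sketch with dummy pixel values. It assumes the 784 values are stored in row-major order, which is how `tf.reshape` fills the new shape.

```python
# A flattened 28x28 image: 784 values in row-major order (dummy data here).
flat = list(range(784))

# Rebuild [height][width][channel] nesting for a single greyscale image.
image = [[[flat[row * 28 + col]] for col in range(28)] for row in range(28)]

print((len(image), len(image[0]), len(image[0][0])))  # (28, 28, 1)
print(image[1][0][0])   # 28: the first pixel of the second row
```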

Convolve x_image with the weight tensor, add the bias, apply ReLU, and finally max-pool the result:

In [39]:
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
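
It can help to track the tensor shapes through these two steps. With 'SAME' padding the spatial output size is the input size divided by the stride, rounded up. The sketch below is plain Python with a hypothetical helper `same_padding_output`, and it ignores the batch dimension.

```python
import math

def same_padding_output(size, stride):
    # With 'SAME' padding the output size is ceil(input_size / stride).
    return int(math.ceil(size / float(stride)))

height, width = 28, 28

# conv2d: stride 1, 'SAME' padding, 32 output channels
conv_h = same_padding_output(height, 1)
conv_w = same_padding_output(width, 1)
print((conv_h, conv_w, 32))   # (28, 28, 32)

# max_pool_2x2: stride 2, 'SAME' padding, channels unchanged
pool_h = same_padding_output(conv_h, 2)
pool_w = same_padding_output(conv_w, 2)
print((pool_h, pool_w, 32))   # (14, 14, 32)
```

So `h_conv1` keeps the 28x28 spatial size with 32 channels, and `h_pool1` halves each spatial dimension to 14x14.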