## Intro:

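This notebook works through the more advanced TensorFlow MNIST tutorial: first a softmax regression baseline, then a multilayer convolutional network.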

In [38]:
%matplotlib inline
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np

In [39]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [40]:
sess = tf.InteractiveSession()

"TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. This approach is similar to that used in Theano or Torch."

In [41]:
x = tf.placeholder(tf.float32, shape=[None,784])
y_ = tf.placeholder(tf.float32, shape=[None,10])

In [42]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
# Variables must be initialized before they can be used:
sess.run(tf.initialize_all_variables())

In [43]:
y = tf.matmul(x,W) + b # predicted output
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
# I think the reason we use the log version is that, as the ML book says,
# the product formulation is susceptible to "numerical underflow"
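
The underflow remark in the comment above can be checked directly. This is a minimal sketch with made-up numbers (1000 predictions, each given probability 0.01), not part of the tutorial: the product of the probabilities collapses to zero in floating point, while the sum of their logs stays representable.

```python
import math

# Hypothetical data: 1000 independent predictions, each assigned probability 0.01.
probs = [0.01] * 1000

# Multiplying the probabilities directly underflows to exactly 0.0 in float64.
product = 1.0
for p in probs:
    product *= p

# Summing the logs carries the same information in a representable range.
log_likelihood = sum(math.log(p) for p in probs)

print(product)         # 0.0
print(log_likelihood)  # roughly -4605.17
```

Once the product is 0.0 its gradient carries no information, which is one practical reason to optimize the log form.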

## Training

In [44]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In [45]:
for i in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict = {x: batch[0], y_: batch[1]})

In [46]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

In [47]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

In [48]:
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9184
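
To see what the argmax / equal / cast / mean chain above computes, here is a plain-Python sketch on a toy batch. The helper names and the numbers are made up for illustration, and it uses 3 classes instead of 10.

```python
def argmax(values):
    """Index of the largest entry (the first one wins on ties in this helper)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits, one_hot_labels):
    """Fraction of rows where the predicted class matches the labelled class."""
    hits = [float(argmax(p) == argmax(t)) for p, t in zip(logits, one_hot_labels)]
    return sum(hits) / len(hits)

# Toy batch: 4 examples, 3 classes.
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [0.0, 1.5, 0.4], [1.0, 0.9, 0.8]]
toy_labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0]]

print(accuracy(toy_logits, toy_labels))  # 0.75
```

Three of the four rows have matching argmax positions, so the mean of the 0/1 hits is 0.75.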


## Build a Multilayer Convolutional Network

This model needs many more weights and biases than the softmax model. Following the tutorial, the weights are initialized with a small amount of noise for "symmetry breaking" and to prevent zero gradients.

The network uses ReLU (rectified linear unit) neurons, and the tutorial says it is good practice to initialize these with a slightly positive bias to avoid "dead neurons".

In [49]:
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
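
The "dead neuron" remark can be illustrated without TensorFlow. In this sketch the inputs, weights, and the negative bias are made-up values. Only the 0.1 bias matches bias_variable above. A ReLU unit whose pre-activation is negative outputs zero and passes back a zero gradient, so it cannot learn from that example.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 where the unit is active, 0 where it is clamped.
    return 1.0 if z > 0 else 0.0

def preactivation(inputs, weights, bias):
    return sum(w * x for w, x in zip(weights, inputs)) + bias

inputs = [0.2, 0.5, 0.1]       # made-up non-negative inputs
weights = [0.05, -0.1, 0.02]   # made-up small, noisy weights

z_pos = preactivation(inputs, weights, 0.1)    # bias as in bias_variable
z_neg = preactivation(inputs, weights, -0.1)   # hypothetical negative bias

print(relu(z_pos), relu_grad(z_pos))  # active unit, gradient 1.0
print(relu(z_neg), relu_grad(z_neg))  # clamped unit, gradient 0.0
```

With small random weights the weighted sum is close to zero, so the sign of the bias largely decides whether the unit starts out active.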

The tutorial next discusses convolution and pooling, which raises two questions: how are the boundaries handled, and what is the stride size? In this model the stride size is one and the input is zero-padded, so that the output is the same size as the input. Pooling is "max pooling" over $2 \times 2$ blocks.

In [50]:
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = 'SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides = [1,2,2,1], padding = 'SAME')
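
To make "stride one, zero-padded, same size out as in" concrete, here is a plain-Python sketch of a single-channel convolution. The function name conv2d_same and the toy $3\times3$ image are assumptions for illustration. Like tf.nn.conv2d it computes a cross-correlation (the kernel is not flipped), but it ignores batches and channels.

```python
def conv2d_same(image, kernel):
    """Stride-1 cross-correlation of a 2-D list with zero padding ('SAME')."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - pad_top, j + dj - pad_left
                    if 0 <= r < h and 0 <= c < w:  # outside the image counts as 0
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]  # sums each 3x3 neighbourhood

result = conv2d_same(image, box)
print(result)  # [[12.0, 21.0, 16.0], [27.0, 45.0, 33.0], [24.0, 39.0, 28.0]]
```

The output has the same $3\times3$ shape as the input. The corner and edge values are smaller because part of each neighbourhood falls on the zero padding.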

The first convolution computes 32 features for each $5\times5$ patch of the image. The associated weight tensor has shape [5,5,1,32]: the first two dimensions are the patch size, the third is the number of input channels, and the last is the number of output channels.

### First convolutional layer

In [51]:
W_conv1 = weight_variable([5,5,1,32])
b_conv1 = bias_variable([32])

In [52]:
# Reshape x into a 4-D tensor: [batch, height, width, channels]
x_image = tf.reshape(x,[-1,28,28,1])
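
The -1 in the reshape asks for that dimension to be inferred from the total number of elements. A small sketch of that bookkeeping, with a hypothetical helper infer_shape that is not a TensorFlow function:

```python
def infer_shape(total_elements, shape):
    """Resolve a single -1 entry so the shape accounts for every element."""
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    if shape.count(-1) != 1 or total_elements % known != 0:
        raise ValueError("shape is not compatible with the number of elements")
    return [total_elements // known if d == -1 else d for d in shape]

batch_size = 50  # batch size used in the training loop of this notebook
print(infer_shape(batch_size * 784, [-1, 28, 28, 1]))  # [50, 28, 28, 1]
```

Each flat 784-pixel row becomes one $28\times28$ image with a single channel, and the batch dimension is whatever is left over.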

We convolve x_image with the weight tensor, add the bias, and apply the ReLU function. Finally, we apply max pooling: the function max_pool_2x2 reduces the image size to $14\times14$. This is something I'd like to visualize!

In [53]:
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
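
As a stand-in for a visualization, here is what $2\times2$ max pooling with stride 2 does to a tiny feature map. The helper max_pool_2x2_list and the $4\times4$ values are made up for illustration. It handles one channel and assumes even height and width.

```python
def max_pool_2x2_list(image):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    h, w = len(image), len(image[0])
    return [[max(image[i][j], image[i][j + 1],
                 image[i + 1][j], image[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]

print(max_pool_2x2_list(feature_map))  # [[4, 2], [2, 8]]
```

Each non-overlapping $2\times2$ block is replaced by its largest value, which halves both spatial dimensions. That is why $28\times28$ becomes $14\times14$ here.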

### Second convolutional layer

We stack several layers of this type. The second layer will have 64 features for each $5\times5$ patch. 

In [54]:
W_conv2 = weight_variable([5,5,32,64])
b_conv2 = bias_variable([64])

In [58]:
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
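
The spatial sizes through the two conv/pool stages can be traced with a little arithmetic. With 'SAME' padding the output size is the input size divided by the stride, rounded up. The helper name below is an assumption for illustration.

```python
import math

def same_output_size(size, stride):
    """Spatial size after a 'SAME'-padded op: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
trace = [size]
for name, stride in [("conv1", 1), ("pool1", 2), ("conv2", 1), ("pool2", 2)]:
    size = same_output_size(size, stride)
    trace.append(size)

print(trace)             # [28, 28, 14, 14, 7]
print(size * size * 64)  # 3136
```

The last number is the count of values per example after the second pooling layer: a $7\times7$ map for each of the 64 features.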

### Densely Connected Layer

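The image size is now reduced to $7\times7$. We add a "fully-connected" layer with 1024 neurons to allow processing on the entire image. The tensor from the pooling layer is reshaped into a batch of vectors, multiplied by a weight matrix, offset by a bias, and passed through a ReLU.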

In [63]:
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])

# Flatten each 7x7x64 feature map into one vector per example
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

### Dropout

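Dropout is applied before the readout layer to reduce overfitting. We create a placeholder for the probability that a neuron's output is kept during dropout, so that dropout can be turned on during training and off during testing.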

In [64]:
keep_prob = tf.placeholder(tf.float32)
h_fcl_drop = tf.nn.dropout(h_fc1, keep_prob)
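
A rough plain-Python sketch of what dropout does with keep_prob: each value is kept with probability keep_prob and the survivors are scaled by 1/keep_prob, which is the behaviour documented for tf.nn.dropout. The function below, the seed, and the activations are illustrative assumptions, not TensorFlow's implementation.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob; scale survivors by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)        # fixed seed so the sketch is repeatable
activations = [1.0] * 10000   # made-up activations

dropped = dropout(activations, 0.5, rng)
print(sum(dropped) / len(dropped))  # close to 1.0: the expected value is preserved

unchanged = dropout(activations, 1.0, rng)
print(unchanged == activations)     # True
```

Because of the rescaling, feeding keep_prob = 1.0 at test time needs no further correction: the layer simply passes its input through.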

### Readout Layer

In [66]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fcl_drop, W_fc2) + b_fc2
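
To get a feel for how many more weights this model has than the softmax model, the trainable parameters can be counted from the shapes used above. This is just arithmetic on those shapes. The dictionary layout and helper name are illustrative choices.

```python
# Weight shapes and bias sizes as defined in the cells above.
layers = {
    "conv1": ([5, 5, 1, 32], 32),
    "conv2": ([5, 5, 32, 64], 64),
    "fc1": ([7 * 7 * 64, 1024], 1024),
    "fc2": ([1024, 10], 10),
}

def count(shape):
    """Number of entries in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n

total = 0
for name, (w_shape, n_bias) in layers.items():
    params = count(w_shape) + n_bias
    total += params
    print(name, params)

print("total", total)  # total 3274634
```

Almost all of the parameters sit in the first fully-connected layer. The softmax model, by comparison, has 784*10 + 10 = 7850.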

### Train and Evaluate the Model

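Training and evaluation are nearly the same as for the softmax model above, with three differences. We replace the steepest gradient descent optimizer with the ADAM optimizer. We add the keep_prob parameter to feed_dict to control the dropout rate. We add "logging" at every 100th iteration of the training process.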
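
For reference, a minimal sketch of the textbook Adam update for a single scalar parameter. It uses the learning rate from this notebook (1e-4) and the commonly used defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8. The toy objective is made up, and this is not TensorFlow's exact implementation.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Toy objective f(theta) = theta**2 with gradient 2*theta (illustrative only).
first_theta, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, 1)

theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)

print(first_theta)  # about 1 - 1e-4
print(theta)        # has moved toward the minimum at 0
```

Unlike plain gradient descent, where the step is the learning rate times the gradient, the first Adam step has size close to the learning rate itself regardless of the gradient's magnitude, because the gradient is divided by a running estimate of its own scale.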

In [68]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.12
step 100, training accuracy 0.74
step 200, training accuracy 0.96
step 300, training accuracy 0.92
step 400, training accuracy 0.94
step 500, training accuracy 0.94
step 600, training accuracy 0.94
step 700, training accuracy 0.94
step 800, training accuracy 0.96
step 900, training accuracy 0.96
step 1000, training accuracy 0.98
step 1100, training accuracy 0.94
step 1200, training accuracy 0.98
step 1300, training accuracy 0.98
step 1400, training accuracy 0.96
step 1500, training accuracy 0.98
step 1600, training accuracy 1
step 1700, training accuracy 0.96
step 1800, training accuracy 0.98
step 1900, training accuracy 1
step 2000, training accuracy 0.96
step 2100, training accuracy 1
step 2200, training accuracy 0.94
step 2300, training accuracy 1
step 2400, training accuracy 1
step 2500, training accuracy 0.98
step 2600, training accuracy 0.96
step 2700, training accuracy 0.96
step 2800, training accuracy 0.96
step 2900, training accuracy 0.98
step 30

KeyboardInterrupt: 