## Data Prep

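Read in the data using the `read_data_sets` function from the `input_data` module. The images come back as an array of shape [$n$, 784], where $n$ is the number of points and 784 is the number of pixels in each flattened 28x28 image. Because `one_hot=True`, each label is a one-hot vector of length 10.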

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("mypath", one_hot=True)

Extracting /mypath/train-images-idx3-ubyte.gz
Extracting /mypath/train-labels-idx1-ubyte.gz
Extracting /mypath/t10k-images-idx3-ubyte.gz
Extracting /mypath/t10k-labels-idx1-ubyte.gz


In [4]:
print(mnist.train.images.shape)
print(mnist.test.images.shape)
print(mnist.validation.images.shape)

(55000, 784)
(10000, 784)
(5000, 784)
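To make the two encodings concrete, here is a minimal pure-Python sketch of one-hot labels and of the flat pixel index. The helper names `one_hot` and `flat_index` are hypothetical, and the index assumes the 28x28 image is flattened in row-major order.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def flat_index(row, col, width=28):
    """Index of pixel (row, col) in a row-major flattened image (assumed layout)."""
    return row * width + col


encoded = one_hot(3)
decoded = encoded.index(max(encoded))  # argmax recovers the digit

print(encoded)
print(decoded)
print(flat_index(27, 27))  # last pixel of a 28x28 image
```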


In [2]:
import tensorflow as tf
print(1)

1


## Simple Softmax Regression

We will create a simple one-layer neural network (i.e. of the form $y=\mathrm{softmax}(Wx+b)$).

Everything in TF is done via symbolic variables that build a graph to be executed.

We start with a placeholder `x` for the input (the `None` means the first dimension, the number of input rows, can be of any size). We also initialize $W$ and $b$ to zeros.

In [3]:

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

In [4]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
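As a rough illustration of what this graph computes for a single input row, the sketch below evaluates softmax(xW + b) in plain Python. The helpers `softmax` and `linear` are hypothetical stand-ins, not TensorFlow calls. With $W$ and $b$ at zero, every class gets the same probability, so the untrained model starts from a uniform guess.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, W, b):
    """Compute xW + b for one input row x (length d), W (d x k), b (length k)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


d, k = 784, 10
x_row = [0.5] * d                    # a made-up input image
W0 = [[0.0] * k for _ in range(d)]   # zero weights, as in the cell above
b0 = [0.0] * k                       # zero biases

probs = softmax(linear(x_row, W0, b0))
print(probs[0], sum(probs))
```

Note that the code multiplies a row vector by $W$ (`tf.matmul(x, W)`), which is the batched form of the $Wx$ written in the formula.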

### Training with Cross-Entropy

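We will train using a cross-entropy loss. To do this, we first need a placeholder for the true labels, which this notebook calls `yhat` ($\hat{y}$); the model's predicted probabilities are `y`.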

In [5]:
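# yhat holds the true one-hot labels; y holds the predicted probabilities
yhat = tf.placeholder(tf.float32, [None, 10])
# per-example cross-entropy (sum over the 10 classes), averaged over the batch
cross_entropy = tf.reduce_mean(-tf.reduce_sum(yhat * tf.log(y), reduction_indices=[1]))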
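A small worked example may help in reading the loss values. The sketch below computes the same per-example quantity, $-\sum_k \hat{y}_k \log y_k$, in plain Python; the function name `cross_entropy` and the sample probabilities are made up for illustration. A uniform prediction over 10 classes gives a loss of $\log 10 \approx 2.3$, which is roughly where training starts with zero weights.

```python
import math


def cross_entropy(true_one_hot, predicted_probs):
    """Cross-entropy for one example: -sum(true * log(predicted))."""
    return -sum(t * math.log(p) for t, p in zip(true_one_hot, predicted_probs))


label = [0.0] * 10
label[7] = 1.0                      # the true digit is 7

uniform = [0.1] * 10                # what a zero-initialized model predicts
confident = [0.01] * 10
confident[7] = 0.91                 # most of the mass on the right class

loss_uniform = cross_entropy(label, uniform)
loss_confident = cross_entropy(label, confident)
print(loss_uniform, loss_confident)
```

This direct formula breaks down if a predicted probability is exactly 0, since the logarithm is then undefined.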

In [6]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
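As a worked step, here is what gradient descent follows for a single example. TF derives these gradients automatically; the symbols $z = xW + b$ (the logits) and $\eta$ (the learning rate) are introduced here for illustration, and the derivation assumes the one-hot labels $\hat{y}$ sum to 1.

$$
L = -\sum_{k} \hat{y}_k \log y_k, \qquad y_k = \frac{e^{z_k}}{\sum_{j} e^{z_j}}
$$

$$
\frac{\partial L}{\partial z_k} = y_k - \hat{y}_k, \qquad
\frac{\partial L}{\partial W_{ik}} = x_i\,(y_k - \hat{y}_k), \qquad
\frac{\partial L}{\partial b_k} = y_k - \hat{y}_k
$$

Each training step then moves the parameters against the batch-averaged gradient, $W \leftarrow W - \eta\,\partial L/\partial W$ and $b \leftarrow b - \eta\,\partial L/\partial b$, with $\eta = 0.5$ in the cell above.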

### Execution

Up to this point, we have created a graph of objects telling TF what our input, weights and output look like, what the loss function is and how the model is trained. We now have to actually run this.

In [7]:
batch_size = 100
num_iterations = 1000

init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

for i in range(num_iterations):
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    sess.run(train_step, feed_dict = {x: batch_xs, yhat: batch_ys})
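To get a feel for how much data this loop touches, the sketch below mimics mini-batching over example indices. `iterate_minibatches` is a hypothetical helper that reshuffles at the start of each pass; it is an assumption about how such a batcher could behave, not the actual implementation of `next_batch`.

```python
import random


def iterate_minibatches(num_examples, batch_size, num_iterations, seed=0):
    """Yield lists of example indices, reshuffling when a pass is exhausted."""
    rng = random.Random(seed)
    order = list(range(num_examples))
    rng.shuffle(order)
    start = 0
    for _ in range(num_iterations):
        if start + batch_size > num_examples:
            rng.shuffle(order)      # begin a new pass over the data
            start = 0
        yield order[start:start + batch_size]
        start += batch_size


# Same sizes as the training loop above: 55000 training images,
# batches of 100, 1000 iterations.
batches = list(iterate_minibatches(55000, 100, 1000))
examples_seen = sum(len(b) for b in batches)
epochs = examples_seen / 55000
print(examples_seen, round(epochs, 2))
```

With these settings the loop processes 100,000 examples, a little under two passes over the training set.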

### Evaluation

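We can assess the model by comparing the predicted class (the argmax of `y`) with the true class (the argmax of `yhat`) on the test set.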

In [34]:
y_num = tf.argmax(y, 1) # predicted class: index of the largest probability
yhat_num = tf.argmax(yhat, 1) # true class: convert from one hot to numeric representation
is_correct = tf.equal(y_num, yhat_num) # symbolic; needs to be run to get into local environment


accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))
sess.run(accuracy, feed_dict={x:mnist.test.images, yhat: mnist.test.labels})

0.91949999
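The accuracy computation amounts to "take the argmax of each row, compare, and average". A minimal plain-Python sketch of that logic, using made-up 3-class predictions and hypothetical helper names, looks like this:

```python
def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predicted_probs, true_one_hot):
    """Fraction of rows where the predicted class matches the true class."""
    hits = [1.0 if argmax(p) == argmax(t) else 0.0
            for p, t in zip(predicted_probs, true_one_hot)]
    return sum(hits) / len(hits)


# Toy data: four examples, three classes (illustrative only).
preds = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

acc = accuracy(preds, labels)
print(acc)  # 3 of the 4 rows match
```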

In [35]:
import pandas as pd
# evaluate both tensors in a single run over the test set
test_feed = {x: mnist.test.images, yhat: mnist.test.labels}
y_vals, yhat_vals = sess.run([y_num, yhat_num], feed_dict=test_feed)
results = pd.DataFrame({'y': y_vals, 'yhat': yhat_vals})

pd.crosstab(results.y, results.yhat)

| y (rows) / yhat (cols) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 950 | 0 | 4 | 3 | 1 | 7 | 6 | 2 | 4 | 9 |
| 1 | 0 | 1109 | 7 | 1 | 2 | 4 | 3 | 7 | 12 | 6 |
| 2 | 3 | 3 | 929 | 26 | 4 | 2 | 4 | 24 | 9 | 1 |
| 3 | 2 | 2 | 11 | 893 | 1 | 22 | 1 | 6 | 18 | 8 |
| 4 | 0 | 0 | 6 | 0 | 903 | 6 | 9 | 6 | 8 | 31 |
| 5 | 7 | 2 | 6 | 37 | 0 | 774 | 10 | 1 | 30 | 10 |
| 6 | 12 | 4 | 16 | 4 | 17 | 21 | 922 | 0 | 12 | 1 |
| 7 | 3 | 2 | 11 | 17 | 3 | 8 | 1 | 952 | 14 | 32 |
| 8 | 3 | 13 | 34 | 23 | 9 | 42 | 2 | 2 | 864 | 12 |
| 9 | 0 | 0 | 8 | 6 | 42 | 6 | 0 | 28 | 3 | 899 |
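The crosstab above can be read as a confusion matrix: rows are the predicted class `y` and columns are the true class `yhat`. As a quick check, the sketch below copies the counts and recomputes the overall accuracy from the diagonal, plus the fraction of each true digit that was classified correctly. The variable names are chosen for illustration.

```python
# Counts copied from the crosstab: confusion[predicted][true]
confusion = [
    [950, 0, 4, 3, 1, 7, 6, 2, 4, 9],
    [0, 1109, 7, 1, 2, 4, 3, 7, 12, 6],
    [3, 3, 929, 26, 4, 2, 4, 24, 9, 1],
    [2, 2, 11, 893, 1, 22, 1, 6, 18, 8],
    [0, 0, 6, 0, 903, 6, 9, 6, 8, 31],
    [7, 2, 6, 37, 0, 774, 10, 1, 30, 10],
    [12, 4, 16, 4, 17, 21, 922, 0, 12, 1],
    [3, 2, 11, 17, 3, 8, 1, 952, 14, 32],
    [3, 13, 34, 23, 9, 42, 2, 2, 864, 12],
    [0, 0, 8, 6, 42, 6, 0, 28, 3, 899],
]

correct = sum(confusion[i][i] for i in range(10))   # diagonal entries
total = sum(sum(row) for row in confusion)
overall = correct / total

# Per-digit recall: correct predictions divided by the column (true-class) total
col_totals = [sum(row[j] for row in confusion) for j in range(10)]
recall = [confusion[j][j] / col_totals[j] for j in range(10)]
hardest = min(range(10), key=lambda j: recall[j])

print(correct, total, overall)
print(hardest, round(recall[hardest], 3))
```

The diagonal sums to 9195 out of 10000 test images, which agrees with the accuracy of about 0.9195 reported above. The digit with the lowest recall in this run is 5.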


## Multilayer ConvNet

Let's try a more complicated model. We first build some helper functions that will simplify the model building.

In [104]:
#we don't want zero weights to start this time; we will either want some noise or a non-zero constant

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# we need to define how we do convolutions and pooling
# this is max pooling on 2x2 blocks and convolutions of stride one, zero-padded
#these functions are solely for convenience and could be specified at each step instead

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides=[1,2,2,1], padding='SAME')
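`tf.truncated_normal` draws from a normal distribution but re-draws any value more than two standard deviations from the mean. The plain-Python sketch below imitates that behaviour with rejection sampling. The function is a hypothetical stand-in written for illustration, not TF's implementation.

```python
import random


def truncated_normal(n, stddev=0.1, mean=0.0, seed=0):
    """Draw n samples, rejecting any more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:
            samples.append(v)
    return samples


# Enough values to fill a [5, 5, 1, 32] weight tensor
weights = truncated_normal(5 * 5 * 1 * 32)
print(len(weights), max(abs(w) for w in weights) <= 0.2)
```

Small random weights break the symmetry between features, which all-zero weights would not, and the truncation keeps any single weight from starting out unusually large.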

### First and Second Conv Layers

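We add a first layer to the model: a convolution followed by max-pooling. The convolution computes 32 features for each 5x5 patch. The weight tensor has shape [5, 5, 1, 32]: patch height, patch width, number of input channels (only 1 here) and number of output features. There is also a bias vector with one entry per output feature.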

In [105]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

In [106]:
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1]) #convert image to the format we want

In [107]:
#add the actual convolution, ReLU and pooling

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

In [108]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
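The spatial size after each layer can be worked out by hand. With `'SAME'` padding the output height and width are `ceil(input / stride)`, so the stride-1 convolutions keep the size and each 2x2 pooling with stride 2 halves it. A short sketch of that arithmetic, where the helper name is hypothetical:

```python
import math


def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


size = 28  # input images are 28x28
shapes = []
for name, stride, channels in [("conv1", 1, 32), ("pool1", 2, 32),
                               ("conv2", 1, 64), ("pool2", 2, 64)]:
    size = same_output_size(size, stride)
    shapes.append((name, size, size, channels))

for shape in shapes:
    print(shape)

flat_size = size * size * 64  # number of values per image after pool2
print(flat_size)
```

The result is 28 → 28 → 14 → 14 → 7, so each image leaves the second pooling layer as a 7x7 grid with 64 features, 3136 values in total.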

### Dense Layer and Dropout and Softmax

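Finally, we add the fully connected layer, dropout and the output layer. After two rounds of 2x2 pooling, the 28x28 image is reduced to 7x7 with 64 features at each position, so the flattened input to the dense layer has 7 * 7 * 64 entries. The output layer produces unnormalized scores (logits); the softmax is applied inside the loss function during training.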

In [109]:
W_dense1 = weight_variable([7 * 7 * 64, 1024])
b_dense1 = bias_variable([1024])

h_pool_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_dense1 = tf.nn.relu(tf.matmul(h_pool_flat, W_dense1) + b_dense1) 

#dropout prob

keep_prob = tf.placeholder(tf.float32)
h_dense1_drop = tf.nn.dropout(h_dense1, keep_prob)

#output layer: y_conv holds logits; the softmax is applied inside the loss

W_dense2 = weight_variable([1024, 10])
b_dense2 = bias_variable([10])

y_conv = tf.matmul(h_dense1_drop, W_dense2) + b_dense2
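`tf.nn.dropout` keeps each unit with probability `keep_prob` and scales the kept units by `1 / keep_prob`, so the expected activation is unchanged. This is why `keep_prob` can simply be set to 1.0 at evaluation time. A plain-Python sketch of the idea, using a hypothetical `dropout` helper rather than TF's implementation:

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10000

dropped = dropout(activations, 0.5, rng)      # training-time behaviour
mean_after = sum(dropped) / len(dropped)      # stays close to the original mean of 1.0

unchanged = dropout(activations, 1.0, rng)    # evaluation: nothing is dropped
print(round(mean_after, 2), unchanged == activations)
```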

In [110]:
h_pool2 

<tf.Tensor 'MaxPool_7:0' shape=(?, 7, 7, 64) dtype=float32>
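To see where the model's capacity sits, the sketch below counts the trainable parameters from the variable shapes defined in the cells above. The dictionary and helper are written for illustration only.

```python
# Shapes copied from the weight_variable / bias_variable calls above
layer_shapes = {
    "W_conv1": [5, 5, 1, 32],
    "b_conv1": [32],
    "W_conv2": [5, 5, 32, 64],
    "b_conv2": [64],
    "W_dense1": [7 * 7 * 64, 1024],
    "b_dense1": [1024],
    "W_dense2": [1024, 10],
    "b_dense2": [10],
}


def num_elements(shape):
    """Product of the dimensions in a shape list."""
    count = 1
    for dim in shape:
        count *= dim
    return count


counts = {name: num_elements(shape) for name, shape in layer_shapes.items()}
total_params = sum(counts.values())

for name, count in counts.items():
    print(name, count)
print("total", total_params)
```

Almost all of the roughly 3.3 million parameters are in `W_dense1`, the first fully connected layer.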

### Train

In [121]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

yconv_num = tf.argmax(y_conv, 1) # predicted class: index of the largest logit
y_num = tf.argmax(y_, 1) # true class: convert from one hot to numeric representation
is_correct = tf.equal(yconv_num, y_num) # symbolic; needs to be run to get into local environment

accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

sess2 = tf.Session()
sess2.run(tf.global_variables_initializer())

num_iterations = 1000
batch_size = 50

for i in range(num_iterations):
    batch = mnist.train.next_batch(batch_size)
    if i%100 == 0:
        test_accuracy = accuracy.eval(session=sess2,feed_dict={x:mnist.test.images,y_: mnist.test.labels, keep_prob:1.0})
        print("Step %d, Current Testing Accuracy %g" %(i, test_accuracy))

    fd = {x:batch[0],y_: batch[1], keep_prob:0.5}
    sess2.run(train_step, feed_dict=fd)


test_accuracy = accuracy.eval(session=sess2,feed_dict={x:mnist.test.images,y_: mnist.test.labels, keep_prob:1.0})
print("Done! Test Accuracy %g" %(test_accuracy))

Step 0, Current Testing Accuracy 0.0658
Step 100, Current Testing Accuracy 0.8456
Step 200, Current Testing Accuracy 0.9081
Step 300, Current Testing Accuracy 0.9307
Step 400, Current Testing Accuracy 0.9395
Step 500, Current Testing Accuracy 0.941
Step 600, Current Testing Accuracy 0.9524
Step 700, Current Testing Accuracy 0.9579
Step 800, Current Testing Accuracy 0.9583
Step 900, Current Testing Accuracy 0.9621
Done! Test Accuracy 0.9643
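The training cell above computes the loss from the logits `y_conv` instead of taking the log of softmax probabilities, as the first model did. Working from logits allows a numerically stable formulation. The sketch below shows one standard way to do this, the log-sum-exp trick, in plain Python. It illustrates the idea only and is not TF's actual implementation; the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """log(softmax(logits)) computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def cross_entropy_with_logits(logits, labels):
    """Cross-entropy for one example, given logits and one-hot labels."""
    return -sum(t * ls for t, ls in zip(labels, log_softmax(logits)))


# Extreme logits: a direct softmax would overflow on exp(1000)
extreme_logits = [1000.0, 0.0, -1000.0]
extreme_labels = [0.0, 0.0, 1.0]
extreme_loss = cross_entropy_with_logits(extreme_logits, extreme_labels)

# Moderate logits: agrees with the direct formula -log(softmax)[true class]
logits = [2.0, 1.0, 0.0]
labels = [1.0, 0.0, 0.0]
stable_loss = cross_entropy_with_logits(logits, labels)
direct_loss = -math.log(math.exp(2.0) / (math.exp(2.0) + math.exp(1.0) + math.exp(0.0)))

print(extreme_loss)
print(stable_loss, direct_loss)
```

Subtracting the largest logit before exponentiating keeps every exponent at or below zero, so the sum cannot overflow.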


In [144]:
sess2.close()
sess.close()