## Import the Data

(MNIST is a subset of the larger NIST collection of handwritten digits.)
Create a new subdirectory called 'MNIST_data' in the same directory as this Jupyter Notebook.
Download these files from [Yann LeCun's Website](http://yann.lecun.com/exdb/mnist/) and save them in that new directory.

* [Training Data 1](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz)
* [Training Data 2](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)
* [Test Data 1](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz)
* [Test Data 2](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz)

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


# Begin an Interactive TensorFlow Session

An `InteractiveSession` installs itself as the default session, so operations in TensorFlow's [computational graph](https://www.tensorflow.org/versions/r0.7/get_started/basic_usage.html#launching-the-graph-in-a-session) can be run cell by cell with `.run()` and `.eval()`.


In [2]:
import tensorflow as tf
sess = tf.InteractiveSession()

## Build up a Softmax Regression Model

### Create Placeholders for Input Data and Labels

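*x* will act as a placeholder for the input images. Each image is a 28x28 matrix flattened into a vector of 784 values; the first dimension is `None` so that a batch of any size can be fed in.

*y_* will act as a placeholder for the labels: one 10-element one-hot vector per image, with a 1 at the index of the correct digit (0-9).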

In [3]:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
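
As a small illustration of the label format, the sketch below builds a one-hot vector in plain Python. The helper name `one_hot` is an assumption made for this example; the notebook itself gets its labels from `read_data_sets(..., one_hot=True)`.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at index `digit` and 0.0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit must be in the range 0..num_classes-1")
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]

label = one_hot(3)
print(label)  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Each row fed into `y_` has this shape: ten entries, exactly one of which is 1.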

### Set up variables

$W$ is a modifiable tensor (tf.Variable) that will contain the weights of the learned computation. Its shape is 784x10, so multiplying a flattened 28x28 input (784 values) by $W$ yields 10 scores, one per digit class, matching the length of the one-hot label. Since it's unknown what values $W$ will contain, it's set to zeros.

$b$ is a modifiable tensor (tf.Variable) that will contain the biases for the learned computation, one per digit class. As with $W$, its values are unknown at this point, so it is set to zeros.

In [4]:
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
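
To make the shape bookkeeping concrete, here is a minimal plain-Python sketch of the product that the model computes later on. The function `affine` and the `_demo` names are hypothetical and exist only for this illustration; TensorFlow performs the same computation with `tf.matmul`.

```python
def affine(x_batch, weights, biases):
    """Compute x_batch @ weights + biases, where each row of x_batch is one example."""
    n_in, n_out = len(weights), len(weights[0])
    assert all(len(row) == n_in for row in x_batch), "input width must match weights"
    return [[sum(row[k] * weights[k][j] for k in range(n_in)) + biases[j]
             for j in range(n_out)]
            for row in x_batch]

W_demo = [[0.0] * 10 for _ in range(784)]   # 784x10, all zeros like W
b_demo = [0.0] * 10                          # 10 biases, all zeros like b
x_demo = [[0.5] * 784 for _ in range(2)]     # a batch of 2 flattened images

scores = affine(x_demo, W_demo, b_demo)
print(len(scores), len(scores[0]))  # 2 10
```

A batch of N images with 784 values each comes out as N rows of 10 scores.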

### Initialize the Variables

In [5]:
sess.run(tf.initialize_all_variables())

### Cost Function and Regression

**Softmax** is a regression model based on the function: $$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

**Cross Entropy** is defined as: $$H_{y'}(y) = -\sum_{i} y'_{i} \log(y_i)$$

where $y$ is the *predicted* probability distribution, $y'$ is the true distribution given in the one-hot vector label, and $\log$ is the natural logarithm, as computed by `tf.log`.

In [6]:
y = tf.nn.softmax(tf.matmul(x,W) + b)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
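
The two formulas can be checked by hand with a short plain-Python sketch. The function names are chosen for this example only, and subtracting the maximum score before exponentiating is a common numerical-stability habit, not something the notebook's code does.

```python
import math

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    top = max(scores)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_of(y_true, y_pred):
    """Cross entropy between a one-hot label and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

# With W and b set to zeros, every class gets the same score.
probs = softmax([0.0] * 10)
label = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
start_loss = cross_entropy_of(label, probs)
print(round(start_loss, 4))  # 2.3026
```

Because the variables start at zero, the untrained model predicts 1/10 for every digit, so the starting loss per example is ln(10), roughly 2.3026.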

# Train the Model

Because TensorFlow knows the entire computation graph, it can use automatic differentiation to find the gradients of the cost with respect to each of the variables. TensorFlow has a variety of built-in optimization algorithms. For this example, we will use steepest gradient descent, with a step length of 0.01, to minimize the cross entropy.


In [7]:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
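
For intuition, a single gradient descent update can be written out by hand for one training example. This is a sketch with hand-derived gradients (for softmax followed by cross entropy, the gradient with respect to the scores is the prediction minus the label); TensorFlow derives them automatically. All names and the tiny 3-input, 2-class setup are assumptions for illustration.

```python
import math

def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    z = [sum(x[k] * W[k][j] for k in range(len(x))) + b[j] for j in range(len(b))]
    return softmax(z)

def loss(x, y_true, W, b):
    y = predict(x, W, b)
    return -sum(t * math.log(p) for t, p in zip(y_true, y) if t > 0)

def gradient_step(x, y_true, W, b, lr=0.01):
    """Update W and b in place by one steepest-descent step."""
    y = predict(x, W, b)
    grad_z = [y[j] - y_true[j] for j in range(len(b))]  # dLoss/dScores
    for k in range(len(x)):
        for j in range(len(b)):
            W[k][j] -= lr * x[k] * grad_z[j]
    for j in range(len(b)):
        b[j] -= lr * grad_z[j]

x_demo = [1.0, 0.5, -0.5]
y_demo = [1.0, 0.0]
W_demo = [[0.0, 0.0] for _ in x_demo]
b_demo = [0.0, 0.0]

before = loss(x_demo, y_demo, W_demo, b_demo)
gradient_step(x_demo, y_demo, W_demo, b_demo)
after = loss(x_demo, y_demo, W_demo, b_demo)
print(after < before)  # True
```

Running `train_step` performs the same kind of update, summed over every example in the batch that is fed in.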

## Iterate Over the Training Step

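Each iteration draws a batch of 50 training examples and runs the training step once, using a feed dictionary to substitute the batch's images and labels for the placeholders *x* and *y_*. After 1000 iterations the weights and biases have moved far enough from zero to give usable predictions.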

In [8]:
for i in range(1000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
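
The batching behavior can be pictured with a rough stand-in written in plain Python. The class `BatchFeeder` is hypothetical and is not the library's implementation; it only assumes what the loop relies on, namely that each call to `next_batch` returns a pair of equally long sequences (images, labels).

```python
import random

class BatchFeeder:
    """Hands out consecutive slices of a shuffled dataset, reshuffling when it runs out."""

    def __init__(self, images, labels, seed=0):
        self.data = list(zip(images, labels))
        self.rng = random.Random(seed)
        self.rng.shuffle(self.data)
        self.pos = 0

    def next_batch(self, batch_size):
        if self.pos + batch_size > len(self.data):
            self.rng.shuffle(self.data)  # start a new pass over the data
            self.pos = 0
        chunk = self.data[self.pos:self.pos + batch_size]
        self.pos += batch_size
        images, labels = zip(*chunk)
        return list(images), list(labels)

# Toy data: the "image" is an integer and its "label" is that integer modulo 10.
feeder = BatchFeeder(images=list(range(120)), labels=[i % 10 for i in range(120)])
demo_batch = feeder.next_batch(50)
print(len(demo_batch[0]), len(demo_batch[1]))  # 50 50
```

Index 0 of the returned pair plays the role of `batch[0]` (images) and index 1 the role of `batch[1]` (labels) in the loop above.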

# Evaluate the Model

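`tf.argmax` picks the index of the largest entry in each row, which is the predicted digit for *y* and the true digit for *y_*. `tf.equal` compares the two and creates a vector of boolean values; casting it to floats and passing it to `tf.reduce_mean` produces the fraction of correct predictions, which is the accuracy.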

In [9]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
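
The same accuracy calculation can be traced by hand on a few made-up rows. This is a plain-Python sketch; the helper names and the example numbers are invented for illustration.

```python
def argmax(values):
    """Index of the largest entry (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_of(predictions, labels):
    """Fraction of rows where the predicted class equals the true class."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(hits) / len(hits)

demo_predictions = [[0.1, 0.7, 0.2],
                    [0.8, 0.1, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.2, 0.5, 0.3]]
demo_labels = [[0, 1, 0],
               [1, 0, 0],
               [0, 1, 0],
               [0, 1, 0]]

print(accuracy_of(demo_predictions, demo_labels))  # 0.75
```

Three of the four rows have their largest probability at the labeled class, so the mean of the boolean vector is 0.75.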

## Print the Results

Evaluated on the test set, the softmax model built above should reach an accuracy of about 91%.

In [10]:
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9092


# Adding Additional Convolution Layers to Create a Neural Network

Until now, this has been an exercise in logistic regression, without the use of any kind of multilayer convolutional network.

## Weight Initialization

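The exercise above used one tensor for weights and one tensor for biases. The convolutional network needs several weight and bias tensors, so two helper functions create them: weights start as small random values drawn from a truncated normal distribution with a standard deviation of 0.1, and biases start at a constant 0.1.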

In [12]:
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
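
A truncated normal draw can be sketched in plain Python by rejecting samples that fall more than two standard deviations from the mean, which is the behavior documented for `tf.truncated_normal`. The function below is an illustrative stand-in with a made-up name, not TensorFlow's implementation.

```python
import random

def truncated_normal_sample(count, stddev=0.1, mean=0.0, rng=None):
    """Draw `count` normal samples, re-drawing any beyond 2 standard deviations."""
    rng = rng or random.Random(0)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples

demo_weights = truncated_normal_sample(800)
print(len(demo_weights), max(abs(w) for w in demo_weights) <= 0.2)  # 800 True
```

With `stddev=0.1`, every initial weight lies in the interval [-0.2, 0.2], so no weight starts out unusually large.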

## Convolutions and Pools

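The convolution layer of the neural network effectively slides a small window over the data and performs operations on the elements underneath it. The convolutions here use a stride of 1, meaning the window moves over 1 element at a time, and `'SAME'` padding, which pads the edges with zeros so that the output has the same width and height as the input. The pooling is max pooling over 2x2 blocks with a stride of 2, which halves the width and height.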

In [13]:
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
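
To show what 2x2 max pooling with a stride of 2 does to a single channel, here is a plain-Python sketch on a 4x4 grid. The function name and the numbers are made up for this example, and the sketch assumes even height and width so that no padding is needed.

```python
def max_pool_grid(grid):
    """2x2 max pooling with stride 2 on a grid with even height and width."""
    return [[max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]

demo_grid = [[1, 2, 5, 6],
             [3, 4, 7, 8],
             [9, 1, 2, 3],
             [0, 5, 4, 1]]

pooled = max_pool_grid(demo_grid)
print(pooled)  # [[4, 8], [9, 4]]
```

Each non-overlapping 2x2 block is replaced by its largest value, so a 4x4 grid becomes 2x2, just as a 28x28 feature map becomes 14x14.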

### Create the First Convolution Layer

The first convolutional layer will compute 32 features for each 5x5 patch. The first two elements in the shape vector are the 5x5 patch size, followed by the number of input channels, and the last element is the number of output channels.

The bias vector is set up with one entry per output channel.

In [21]:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

### Apply the Layer

In order to apply the layer, x is reshaped into a 4-dimensional tensor with the second and third dimensions corresponding to the original width and height of the image, and the last dimension corresponding to the number of color channels (1, since the images are grayscale). The first dimension is -1 so that it adapts to the batch size.

Convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.

In [22]:
x_image = tf.reshape(x, [-1,28,28,1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
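
The reshape only regroups the 784 values; it does not move them. A plain-Python sketch of the same row-by-row regrouping for one image follows (the helper name `to_rows` is an assumption for illustration):

```python
def to_rows(flat, width=28):
    """Split a flat list into consecutive rows of `width` values each."""
    if len(flat) % width != 0:
        raise ValueError("length must be a multiple of width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]

flat_image = list(range(784))      # stand-in for one flattened image
image_rows = to_rows(flat_image)

print(len(image_rows), len(image_rows[0]))            # 28 28
print(image_rows[2][5] == flat_image[2 * 28 + 5])     # True
```

The pixel at row `r` and column `c` of the reshaped image is the value at position `r * 28 + c` of the flattened vector.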

### Create the Second Convolution Layer

Building a deep network involves stacking multiple layers on top of each other.

This second layer has 64 features for each 5x5 patch.

In [23]:
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
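
The size of each intermediate tensor can be traced with a few lines of plain Python. The sketch assumes the rule for `'SAME'` padding, where each output dimension is the input dimension divided by the stride, rounded up; the helper names are made up for this example.

```python
import math

def conv_same_shape(shape, out_channels, stride=1):
    """Shape after a 'SAME'-padded convolution (height, width, channels)."""
    height, width, _ = shape
    return (math.ceil(height / stride), math.ceil(width / stride), out_channels)

def pool_same_shape(shape, stride=2):
    """Shape after 'SAME'-padded pooling; the channel count is unchanged."""
    height, width, channels = shape
    return (math.ceil(height / stride), math.ceil(width / stride), channels)

shape = (28, 28, 1)
shape_trace = [shape]
for out_channels in (32, 64):
    shape = conv_same_shape(shape, out_channels)
    shape_trace.append(shape)
    shape = pool_same_shape(shape)
    shape_trace.append(shape)

print(shape_trace)
# [(28, 28, 1), (28, 28, 32), (14, 14, 32), (14, 14, 64), (7, 7, 64)]
```

Each convolution changes only the channel count, and each pooling step halves the width and height, which is how a 28x28 image ends up as a 7x7 map with 64 channels.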

### Densely Connected Layer

At this point the image size has been reduced to 7x7, so it's time to add a fully connected layer of 1024 neurons that processes the entire image.

Reshape the tensor from the pooling layer into a batch of vectors, multiply by a weight matrix, add a bias, and apply a ReLU.

In [24]:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
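
A quick count of trainable values, worked out from the shapes used so far, shows where most of the parameters live. This is plain arithmetic in Python with hypothetical helper names.

```python
def conv_param_count(patch_h, patch_w, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return patch_h * patch_w * in_channels * out_channels + out_channels

def dense_param_count(n_in, n_out):
    """Weight matrix plus one bias per output neuron."""
    return n_in * n_out + n_out

flat_size = 7 * 7 * 64
conv1_params = conv_param_count(5, 5, 1, 32)
conv2_params = conv_param_count(5, 5, 32, 64)
fc1_params = dense_param_count(flat_size, 1024)

print(flat_size)     # 3136
print(conv1_params)  # 832
print(conv2_params)  # 51264
print(fc1_params)    # 3212288
```

The densely connected layer holds far more parameters than both convolutional layers combined.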

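### Reduce Overfitting with Dropout

To reduce overfitting, dropout is applied to the output of the densely connected layer. The probability that a neuron's output is kept is fed in through the placeholder `keep_prob`, so dropout can be switched on during training (0.5) and off during evaluation (1.0).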

In [25]:
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
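
The mechanics can be sketched in plain Python: each value is either zeroed or scaled up by `1 / keep_prob`, so the expected sum stays the same, which matches the behavior documented for `tf.nn.dropout`. The function `dropout_values` and the test data are assumptions for illustration.

```python
import random

def dropout_values(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale the survivors."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

dropped = dropout_values(activations, 0.5, rng)
mean_after = sum(dropped) / len(dropped)

print(sorted(set(dropped)))        # [0.0, 2.0]
print(abs(mean_after - 1.0) < 0.1)  # True
```

With `keep_prob` set to 1.0 every value is kept and left unscaled, so the layer passes its input through unchanged.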

### Readout

Add a softmax layer like the one used above.

In [26]:
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

## Train the Model

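(Draw the rest of the owl...)

Training and evaluation follow the same pattern as the softmax model above, with three changes: the Adam optimizer replaces plain gradient descent, `keep_prob` is added to the feed dictionary to control dropout, and the training accuracy is logged every 100 iterations.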


In [None]:
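# Same loss and accuracy definitions as before, now applied to the convolutional output
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
# Adam optimizer with a learning rate of 1e-4 instead of plain gradient descent
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Initialize every variable in the graph, including the new layers
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    # Report accuracy on the current batch with dropout disabled
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  # Train with dropout: each unit is kept with probability 0.5
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

# Final evaluation on the test set with dropout disabled
print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))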

step 0, training accuracy 0.12
step 100, training accuracy 0.84
step 200, training accuracy 0.84
step 300, training accuracy 0.9
step 400, training accuracy 0.98
step 500, training accuracy 0.98
step 600, training accuracy 0.96
step 700, training accuracy 0.88
step 800, training accuracy 0.98
step 900, training accuracy 0.96
step 1000, training accuracy 0.96
step 1100, training accuracy 1
step 1200, training accuracy 0.94
step 1300, training accuracy 0.96
step 1400, training accuracy 1
step 1500, training accuracy 0.96
step 1600, training accuracy 0.98
step 1700, training accuracy 0.98
step 1800, training accuracy 0.98
step 1900, training accuracy 0.98
step 2000, training accuracy 0.96
step 2100, training accuracy 1
step 2200, training accuracy 0.98
step 2300, training accuracy 1
step 2400, training accuracy 1
step 2500, training accuracy 0.98
step 2600, training accuracy 1
step 2700, training accuracy 1
step 2800, training accuracy 0.98
step 2900, training accuracy 0.98
step 3000, tra