In this exercise you will explore deeper models, still using the MNIST data set. Again, the model follows the lecture closely.

This notebook contains code to train a fully connected deep neural network on MNIST. The principal changes from the previous notebook are:

* We have switched from a shallow model to a deep neural network.

* We are using the AdamOptimizer instead of the vanilla GradientDescentOptimizer.

* We are using a much smaller learning rate and running more steps.
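To see what the optimizer swap changes, here is a minimal sketch of the textbook Adam update next to vanilla gradient descent. It works on a single scalar weight and a toy loss `(w - 3)^2`. The function names, the toy loss, and the learning rate of 0.1 are illustrative assumptions. The betas (0.9, 0.999) and epsilon (1e-8) match the defaults of `tf.train.AdamOptimizer`, but the real optimizer applies the update element-wise to every trainable variable and may differ in small numerical details.

```python
import math

def gradient_descent(grad, w, lr=0.1, steps=1000):
    """Vanilla gradient descent: step against the raw gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(grad, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Textbook Adam on one scalar parameter."""
    m = 0.0  # moving average of gradients (first moment)
    v = 0.0  # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction, since m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return w

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
grad = lambda w: 2.0 * (w - 3.0)
w_gd = gradient_descent(grad, 0.0)
w_adam = adam(grad, 0.0)
print(w_gd, w_adam)  # both close to 3.0
```

The key difference is that Adam divides each step by a running estimate of the gradient's magnitude. This makes its step size far less sensitive to how the loss is scaled than plain gradient descent.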

An important takeaway: the code that calculates the loss and trains the model is identical to the previous notebook's, even though the model is more complex.

Experiment with this notebook by modifying the cells that contain parameters and then rerunning them.

Although this is a simple model, it can reach more than 97% accuracy on MNIST, which is impressive.

In [6]:
# cell 1
# notebook version 1.2

import math
import os
import numpy as np

%pylab inline

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

print ('cell finished')

Populating the interactive namespace from numpy and matplotlib
cell finished


The next cell resets the default graph, in case anything is left over from a previous run, and then creates a session for this run.

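The next few lines define variables used throughout the rest of the notebook. You can modify EPOCHS, LEARNING_RATE, and BATCH_SIZE to trade off training time against accuracy, but don't change the number of pixels or classes, or your model will fail. Note that EPOCHS here counts training steps (one mini-batch of BATCH_SIZE images per step), not full passes over the training set.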

In [7]:
# cell 2

tf.reset_default_graph()
sess = tf.Session()

NUM_PIXELS = 28 * 28
NUM_CLASSES = 10
# Don't change number of pixels or classes

EPOCHS = 2000
LEARNING_RATE = .001
BATCH_SIZE = 100

print ('cell finished')

cell finished
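Each training step in cell 9 consumes one mini-batch, so `EPOCHS * BATCH_SIZE` is the number of training examples the model sees, not a count of passes over the data. The arithmetic below assumes `mnist.train` holds 55,000 images, which is what `read_data_sets` typically returns after setting 5,000 aside for validation. Check `mnist.train.num_examples` in your own run.

```python
# Assumed size of mnist.train; verify with mnist.train.num_examples
NUM_TRAIN_EXAMPLES = 55000

EPOCHS = 2000      # really the number of training steps
BATCH_SIZE = 100   # images per step

examples_seen = EPOCHS * BATCH_SIZE
passes_over_data = examples_seen / NUM_TRAIN_EXAMPLES
print(examples_seen, round(passes_over_data, 2))  # 200000 3.64
```

With the default settings, the network therefore sees each training image only about three to four times.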


The next cell reads the MNIST data into a variable named, appropriately, `mnist`. The training data ends up in `mnist.train` and the test data in `mnist.test`.

In [8]:
# cell 3

mnist = input_data.read_data_sets('/tmp/data', one_hot=True)
# reads MNIST data into a variable called "mnist"

print ('cell finished')

Extracting /tmp/data\train-images-idx3-ubyte.gz
Extracting /tmp/data\train-labels-idx1-ubyte.gz
Extracting /tmp/data\t10k-images-idx3-ubyte.gz
Extracting /tmp/data\t10k-labels-idx1-ubyte.gz
cell finished
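With `one_hot=True`, each label is a length-10 vector containing a single 1.0, and each image is a flat vector of 784 pixel values. This plain-Python sketch shows both encodings and how `np.argmax` in cell 8 recovers the digit from a one-hot label. The helper names are hypothetical. Row-major flattening is an assumption, consistent with the `reshape((28,28))` call the notebook uses to display images.

```python
def one_hot(label, num_classes=10):
    """Encode a digit as a vector with 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def argmax(vec):
    """Index of the largest entry, which decodes a one-hot vector back to its digit."""
    return max(range(len(vec)), key=lambda i: vec[i])

def flatten(image):
    """Row-major flattening: pixel (row, col) of a 28x28 image lands at row * 28 + col."""
    return [pixel for row in image for pixel in row]

image = [[0.0] * 28 for _ in range(28)]
image[2][5] = 1.0                     # light up a single pixel
flat = flatten(image)
print(len(flat), flat.index(1.0))     # 784 61
print(one_hot(7), argmax(one_hot(7)))
```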


The next cell creates two placeholders: `x` for the flattened images and `y_` for the expected labels. We feed them training data while training, and test data once the model is trained. A deployed model needs only `x`. The expected-label input `y_` is used only to compute the loss and the accuracy.

In [9]:
# cell 4

# Define input placeholders as in last model. As you will see not much changes around the model

x = tf.placeholder(tf.float32, [None, NUM_PIXELS])
y_ = tf.placeholder(tf.float32, [None, NUM_CLASSES])
# y_ is expected label input

print ('cell finished')


cell finished
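The `None` in `[None, NUM_PIXELS]` leaves the batch size open. The same placeholder therefore accepts a batch of 100 images during training and all 10,000 test images at evaluation time. What must match is the inner dimension of every matrix multiplication. The hypothetical helper below mimics that static shape check for 2-D shapes. It is a simplification for illustration, not TensorFlow's shape-inference code.

```python
def matmul_shape(a_shape, b_shape):
    """Static result shape of matmul(a, b) for 2-D shapes; None means 'any size'."""
    (rows, inner_a), (inner_b, cols) = a_shape, b_shape
    if inner_a is not None and inner_b is not None and inner_a != inner_b:
        raise ValueError("Dimensions must be equal, but are %s and %s" % (inner_a, inner_b))
    return [rows, cols]

NUM_PIXELS, NUM_CLASSES = 28 * 28, 10
hidden = matmul_shape([None, NUM_PIXELS], [NUM_PIXELS, 500])  # x times l1w
logits = matmul_shape(hidden, [500, NUM_CLASSES])             # l1actv times l2w
print(hidden, logits)  # [None, 500] [None, 10]
```

Multiplying a `[None, 10]` activation by a `[784, 500]` weight fails this check because 10 and 784 differ. That is the kind of mismatch to look for when TensorFlow reports "Dimensions must be equal".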


The next cell does all the work to create the deep network. Variable `l1w` is the weight tensor for the first (hidden) layer, and `l1b` holds its biases. The input placeholder `x` is matrix-multiplied by the weight tensor, the bias is added, and the result is passed through a ReLU activation function, giving the hidden-layer activation `l1actv`.

Activation `l1actv` is the input to the output layer, which is specified by weight `l2w` and bias `l2b`. Again, we matrix-multiply the previous layer's activation by the layer weights and add the bias. No activation is applied at this layer, because the softmax is applied later, inside the loss function.

Output `y` again holds the model's unnormalized scores (logits) for each of the 10 classes. The index of the largest score is the inferred digit for the input image.
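The two layers can be traced by hand on a toy input. This plain-Python sketch uses made-up sizes (4 pixels, 3 hidden units, 2 classes) instead of 784, 500, and 10, and all weights are hypothetical. The computation is the one described above: `relu(x · l1w + l1b)`, followed by `l1actv · l2w + l2b` with no activation on the output.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows: [n x k] times [k x m] gives [n x m]."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions differ: %d vs %d" % (len(a[0]), len(b)))
    return [[sum(row[p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for row in a]

def add_bias(a, bias):
    """Add bias[j] to column j of every row (broadcast over the batch)."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in a]

def relu(a):
    return [[max(0.0, v) for v in row] for row in a]

def forward(x, l1w, l1b, l2w, l2b):
    l1actv = relu(add_bias(matmul(x, l1w), l1b))  # hidden layer
    return add_bias(matmul(l1actv, l2w), l2b)    # output logits, no activation

# Toy sizes (hypothetical): 4 pixels -> 3 hidden units -> 2 classes
x = [[1.0, 0.0, 2.0, -1.0]]  # a batch containing one "image"
l1w = [[1, 0, -1], [0, 1, 0], [1, 0, 1], [0, 2, 0]]
l1b = [0.5, 0.5, -2.0]
l2w = [[1, -1], [2, 0], [0, 3]]
l2b = [0.5, 0.5]

print(forward(x, l1w, l1b, l2w, l2b))  # [[4.0, -3.0]]
```

Here the hidden pre-activation is `[3.5, -1.5, -1.0]`, so the ReLU zeroes two of the three hidden units before the output layer sees them.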



In [12]:
# cell 5

# Define the model as in the lecture. There will be a few more variables since we have multiple layers and 
# intermediate activation outputs. Use tf.nn.relu for the activation functions

l1w = tf.Variable(tf.truncated_normal([NUM_PIXELS,500], stddev=0.1))
l1b = tf.Variable(tf.constant(0.1,shape=[500]))
l1actv = tf.nn.relu(tf.matmul(x,l1w)+ l1b)

l2w = tf.Variable(tf.truncated_normal([500,NUM_CLASSES], stddev=0.1))
l2b = tf.Variable(tf.constant(0.1,shape=[NUM_CLASSES]))
# Output layer: a linear layer producing one raw score (logit) per class.
# No ReLU here; softmax is applied inside the loss in the next cell.
y = tf.matmul(l1actv, l2w) + l2b

print ('cell finished')

These lines look very similar to the shallow model in the previous exercise, but this time we use the AdamOptimizer, a lower learning rate, and many more training iterations.

In [None]:
# cell 6

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_,logits=y))

train_step = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cross_entropy)

print ('cell finished')
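`softmax_cross_entropy_with_logits` applies a softmax to each row of the logits `y` and takes the cross-entropy against the one-hot label `y_`. `reduce_mean` then averages the result over the batch. The sketch below spells this out in plain Python, using the log-sum-exp form for numerical stability. The helper names are hypothetical. The sketch also gives a reference point for the loss printed during training: a network whose 10 logits are all equal scores `ln 10`, about 2.303.

```python
import math

def softmax(logits):
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(label, logits):
    """Cross-entropy of one one-hot label against softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log(softmax(logits)[k]) == logits[k] - log_sum_exp
    return -sum(t * (z - log_sum_exp) for t, z in zip(label, logits))

def mean_cross_entropy(labels, logits_batch):
    """Average over the batch, the role tf.reduce_mean plays above."""
    losses = [softmax_cross_entropy(t, z) for t, z in zip(labels, logits_batch)]
    return sum(losses) / len(losses)

label_3 = [0.0] * 10
label_3[3] = 1.0
print(mean_cross_entropy([label_3], [[0.0] * 10]))  # about 2.3026, i.e. ln 10
```

This is also why `y` must be raw logits. Applying a softmax (or a ReLU) before this loss would distort the scores it expects.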


Initialize the variables.

In [None]:
# cell 7

sess.run(tf.global_variables_initializer())
# Initializes variables

print ('cell finished')
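Running the initializer is what draws the starting values for `l1w` and `l2w` from `tf.truncated_normal(..., stddev=0.1)`. That function discards and re-draws any sample lying more than two standard deviations from the mean. The rejection-sampling sketch below shows the same idea with Python's `random` module. It illustrates the distribution and is not TensorFlow's implementation. The function name and seed are hypothetical.

```python
import random

def truncated_normal(n, mean=0.0, stddev=0.1, seed=None):
    """Draw n samples from N(mean, stddev), re-drawing any outside mean +/- 2 * stddev."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:  # keep only values within two standard deviations
            samples.append(v)
    return samples

weights = truncated_normal(500, stddev=0.1, seed=42)
print(min(weights), max(weights))  # both inside [-0.2, 0.2]
```

Keeping the initial weights small and free of outliers helps ensure no single unit starts out dominating the hidden layer.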


Let's again try the model before we train it, to see what happens. Modify `image_index` below and try a few values. Allowed values range from 0 to 9999, since the test set holds 10,000 images. Press Shift+Enter to run the cell again and recompute the prediction.

You will re-run this cell after the model is trained and see the difference.

In [None]:
# cell 8

image_index = 50
exp_label = np.argmax(mnist.test.labels[image_index], 0)
x_image = np.reshape(mnist.test.images[image_index], [-1,784])

outval = sess.run(y, feed_dict={x:x_image})
label= np.argmax(outval[0],0)

print ("calculated label = {} expected label = {}".format(label, exp_label))
pylab.imshow(mnist.test.images[image_index].reshape((28,28)), cmap=pylab.cm.gray_r)   
pylab.title('Label: %d' % np.argmax(mnist.test.labels[image_index])) 

The training loop is exactly the same as for the shallow model. That is pretty impressive, given that the model is much more complex this time.

In [None]:
# cell 9

for t in range(EPOCHS):
  batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE)
  loss, _ = sess.run([cross_entropy,train_step], feed_dict={x: batch_xs, y_: batch_ys})
  if t%100 == 0:
    print('train_step = {} loss = {}'.format(t,loss))

print ('cell finished')
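`mnist.train.next_batch(BATCH_SIZE)` hands out consecutive slices of a shuffled training set and reshuffles when it reaches the end. The generator below is a hedged sketch of that behavior, and the name `minibatches` is hypothetical. Unlike `next_batch`, it simply drops leftover examples that do not fill a whole batch at the end of each pass.

```python
import random
from itertools import islice

def minibatches(xs, ys, batch_size, seed=0):
    """Yield (batch_xs, batch_ys) forever, reshuffling at the start of every pass.

    Leftover examples that do not fill a whole batch are skipped for that pass.
    """
    rng = random.Random(seed)
    order = list(range(len(xs)))
    while True:
        rng.shuffle(order)
        for start in range(0, len(order) - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield [xs[i] for i in idx], [ys[i] for i in idx]

# Toy data: 10 "images" whose label is simply 10 times the value
xs = list(range(10))
ys = [10 * v for v in xs]
for t, (batch_xs, batch_ys) in enumerate(islice(minibatches(xs, ys, 4), 5)):
    print(t, batch_xs, batch_ys)
```

Shuffling the indices once and slicing them keeps each image paired with its label, which is the property the training loop relies on.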


The code to test the accuracy is also exactly the same as before. 

In [None]:
# cell 10

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# accuracy on the training set
print(sess.run(accuracy, feed_dict={x:mnist.train.images, y_:mnist.train.labels}))

# accuracy on the test set
print(sess.run(accuracy, feed_dict={x:mnist.test.images, y_:mnist.test.labels}))
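The accuracy computation compares the predicted and expected class for each image and averages the matches. Here is the same logic in plain Python, with hypothetical helper names and toy logits standing in for `y`.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits_batch, one_hot_labels):
    # Per-example comparison, like tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    correct = [argmax(z) == argmax(t) for z, t in zip(logits_batch, one_hot_labels)]
    # Fraction of True values, like tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return sum(correct) / len(correct)

logits = [[0.1, 2.0, 0.3], [5.0, 1.0, 0.0], [0.0, 0.2, 1.0], [1.0, 2.0, 3.0]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(accuracy(logits, labels))  # 0.75
```

A training accuracy well above the test accuracy is a sign of overfitting, so it is worth comparing the two numbers printed above.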
 

Experiment with the LEARNING_RATE, EPOCHS, and BATCH_SIZE to see if changing these gets better results. 

Go back to cell 8 and modify `image_index` once more to test the trained model. It should now classify most test images correctly.

Now just for fun add a third layer. Are the results any better?
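Before adding a layer, it helps to know how much the model grows. A fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The hypothetical helper below counts them. The 300-unit extra hidden layer is an arbitrary choice for illustration, not a recommendation.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected stack such as [784, 500, 10]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

two_layer = count_parameters([784, 500, 10])         # the model built in cell 5
three_layer = count_parameters([784, 500, 300, 10])  # hypothetical extra 300-unit hidden layer
print(two_layer, three_layer)  # 397510 545810
```

When you add the layer, its weight matrix must have as many rows as the previous layer has units, and the final layer must still output NUM_CLASSES logits.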


**IMPORTANT: When you are finished, make sure you go to the Jupyter notebook “File” menu above and select “Close and halt”. This will shut down this notebook and take you back to the Jupyter Notebook Home tab.**