# Learning to classify digits with a single-layer neural network

The MNIST data is hosted on Yann LeCun's website. If you are copying and pasting the code from this tutorial, start with these two lines, which download and read in the data automatically:

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


The MNIST data is split into three parts: 55,000 data points of training data (mnist.train), 10,000 points of test data (mnist.test), and 5,000 points of validation data (mnist.validation). This split is very important: it's essential in machine learning that we have separate data which we don't learn from so that we can make sure that what we've learned actually generalizes!

As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images "x" and the labels "y". Both the training set and test set contain images and their corresponding labels; for example the training images are mnist.train.images and the training labels are mnist.train.labels.
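
Because the data was loaded with `one_hot=True`, each label is a vector of 10 numbers rather than a single digit. The snippet below is a minimal sketch of that encoding in plain Python; the helper name `one_hot` is made up for illustration and is not part of TensorFlow.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# Encode the digit 3, then recover it from the position of the 1.0.
encoded = one_hot(3)
decoded = encoded.index(1.0)
print(encoded, decoded)
```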

Each image is 28 pixels by 28 pixels, which we can interpret as an array of 28 × 28 = 784 numbers.
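
The images in `mnist.train.images` are stored as flat vectors of 784 values, which is why the input placeholder used later has 784 columns. As a rough sketch of what flattening means, the snippet below builds a synthetic 28 × 28 grid (made-up values, not real MNIST pixels) and flattens it row by row.

```python
rows, cols = 28, 28

# Synthetic "image": made-up intensities in [0, 1], not real MNIST data.
image = [[((r * cols + c) % 256) / 255.0 for c in range(cols)]
         for r in range(rows)]

# Row-major flattening: row 0 first, then row 1, and so on.
flat = [pixel for row in image for pixel in row]

# The pixel at (r, c) ends up at index r * cols + c.
r, c = 5, 7
print(len(flat), flat[r * cols + c] == image[r][c])
```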

Let's start by importing TensorFlow.

In [2]:
import tensorflow as tf

First of all, we need a placeholder through which we will feed in the data. We will also need two variables, W and b, for the weights and biases that we will be learning.

In [3]:
x = tf.placeholder(tf.float32, [None, 784], name="x")
W = tf.Variable(tf.zeros([784, 10]), name="W")
b = tf.Variable(tf.zeros([10]), name="b")


Since we are building the computation graph of a single-layer neural network, we can use TensorFlow's built-in functions for the matrix multiplication and addition, then pass the result through a softmax to get the probability that each input image belongs to each class.

We also create another placeholder, `y_`, which holds the actual classes (as one-hot vectors) of the images that we feed in.

In [4]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10], name='y_')
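
To see what the softmax does to the raw scores `tf.matmul(x, W) + b`, here is a small plain-Python sketch. Subtracting the maximum before exponentiating is a common stability trick that I am adding here as an assumption; it does not change the result mathematically.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # approximately [0.659, 0.242, 0.099]
print(sum(probs))  # the probabilities sum to 1 (up to rounding)
```

The largest score gets the largest probability, but every class keeps a non-zero share.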

Finally, let us use the cross-entropy loss to define how we are going to calculate the error.

In [5]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
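
The loss above sums `-y_ * log(y)` over the 10 classes for each example and then averages over the batch. A minimal plain-Python sketch of the same arithmetic follows. The `eps` clipping is an assumption of mine to avoid `log(0)`; the TensorFlow line above has no such guard.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one example: -sum(true * log(predicted))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def mean_cross_entropy(batch_true, batch_pred):
    """Average the per-example losses over a batch."""
    losses = [cross_entropy(t, p) for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)

labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
preds = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]
print(mean_cross_entropy(labels, preds))
```

With one-hot labels only the term for the true class survives, so each example contributes `-log` of the probability assigned to its correct class.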

Now we define the training op for our graph, which is responsible for optimizing the neural network. Here it is plain gradient descent with a learning rate of 0.5.

In [6]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
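
TensorFlow derives the gradients for us, but it can help to see one update written out by hand. The sketch below assumes a tiny softmax model (2 inputs, 3 classes) and a single training example. It uses the standard result that the gradient of the cross-entropy with respect to the logits is `probs - y_true`. The function name `sgd_step` and all the numbers are made up for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(W, b, x, y_true, lr=0.5):
    """One gradient-descent update on a single example; returns the loss before the update."""
    n_in, n_out = len(x), len(b)
    logits = [sum(x[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    probs = softmax(logits)
    loss = -sum(t * math.log(p) for t, p in zip(y_true, probs))
    grad_logits = [p - t for p, t in zip(probs, y_true)]  # d(loss)/d(logits)
    for i in range(n_in):
        for j in range(n_out):
            W[i][j] -= lr * x[i] * grad_logits[j]
    for j in range(n_out):
        b[j] -= lr * grad_logits[j]
    return loss

# Zero-initialised parameters, as in the tutorial's W and b.
W = [[0.0, 0.0, 0.0] for _ in range(2)]
b = [0.0, 0.0, 0.0]
x, y_true = [1.0, 0.5], [0.0, 1.0, 0.0]

losses = [sgd_step(W, b, x, y_true) for _ in range(5)]
print(losses)  # the loss shrinks with every step on this example
```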


We are now done implementing the computation graph. Let's create a TensorFlow session that we will use to run it.

In [7]:
sess = tf.InteractiveSession()


The next step is to initialize the variables defined in the computation graph.

In [8]:
tf.global_variables_initializer().run()


Create a `FileWriter` that writes the graph to `./graphs` so that it can be viewed in TensorBoard. More on this later.

In [9]:
writer = tf.summary.FileWriter('./graphs', sess.graph)


Let us train for 1000 iterations, feeding a batch of 100 training examples at each step.

In [10]:
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
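
At 100 examples per step, 1000 steps touch 100,000 examples, which is a little under two passes over the 55,000 training points. The sketch below shows one simple way to draw mini-batches by shuffling indices. It is a simplified stand-in with a made-up name (`iterate_minibatches`), not the implementation behind `mnist.train.next_batch`, and it simply drops any leftover examples at the end of a pass.

```python
import random

def iterate_minibatches(num_examples, batch_size, num_steps, seed=0):
    """Yield `num_steps` lists of example indices, reshuffling after each pass."""
    rng = random.Random(seed)
    order = list(range(num_examples))
    rng.shuffle(order)
    pos = 0
    for _ in range(num_steps):
        if pos + batch_size > num_examples:
            # Not enough examples left for a full batch: start a new pass.
            rng.shuffle(order)
            pos = 0
        yield order[pos:pos + batch_size]
        pos += batch_size

batches = list(iterate_minibatches(num_examples=10, batch_size=4, num_steps=5))
for batch in batches:
    print(batch)
```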

In [11]:
writer.close()

Finally, let's calculate the accuracy on the test set, as a percentage.

In [12]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}) * 100)

91.6599988937
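
The accuracy computation compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. Here is a minimal plain-Python sketch of that logic on three made-up predictions; the helper names are illustrative only.

```python
def argmax(values):
    """Index of the largest entry (first one wins on ties in this sketch)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(pred_batch, label_batch):
    """Fraction of examples whose predicted class matches the true class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(pred_batch, label_batch)]
    return sum(correct) / len(correct)

preds = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
labels = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
acc = accuracy(preds, labels)
print(acc * 100)  # two of the three predictions are correct
```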


# RNN

In [13]:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

from tensorflow.contrib import rnn
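
With `n_steps = 28` and `n_input = 28`, the RNN reads each image as a sequence of 28 rows, one row of 28 pixels per time step. The snippet below sketches that reshaping on a synthetic flat vector (made-up values, not MNIST pixels) using plain Python lists.

```python
n_steps, n_input = 28, 28

# Synthetic flat "image" of 784 values; real data would come from mnist.train.images.
flat = [float(i) for i in range(n_steps * n_input)]

# Cut the flat vector into n_steps consecutive chunks of n_input values each.
sequence = [flat[t * n_input:(t + 1) * n_input] for t in range(n_steps)]

# Time step t holds what was row t of the original image.
print(len(sequence), len(sequence[0]), sequence[3][5] == flat[3 * n_input + 5])
```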

In [42]:
tf.reset_default_graph()

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def RNN(x, weights, biases):

    # `tf.nn.dynamic_rnn` consumes the batch-major input directly:
    # x has shape (batch_size, n_steps, n_input), so no unstacking is needed.

    # Define a GRU cell with n_hidden units
    cell = tf.contrib.rnn.GRUCell(n_hidden)

    # Roll over data. Note that seq length isn't required here because
    # every MNIST sequence has the same length (n_steps)
    outputs, last_state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

    # We won't have to do this for the HW, but since we want ONLY the last output,
    # transpose to (n_steps, batch_size, n_hidden) and tf.gather the final time step
    outputs = tf.transpose(outputs, [1, 0, 2])
    last_output = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(last_output, weights['out']) + biases['out']

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
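
The `RNN` function keeps only the output at the final time step by transposing the outputs to time-major order and gathering the last entry. The sketch below mimics that on a tiny nested list (batch of 2, 3 time steps, 2 features, all made-up numbers) and checks that it matches simply taking the last step of each sequence. Both helper names are hypothetical.

```python
def last_time_step(outputs):
    """outputs[batch][time][feature] -> the final time step of each sequence."""
    return [sequence[-1] for sequence in outputs]

def transpose_then_gather(outputs):
    """Same result via the transpose-and-gather route used in the graph code."""
    time_major = [list(step) for step in zip(*outputs)]  # [time][batch][feature]
    return time_major[len(time_major) - 1]

outputs = [
    [[1, 2], [3, 4], [5, 6]],      # sequence 0
    [[7, 8], [9, 10], [11, 12]],   # sequence 1
]
print(last_time_step(outputs))
print(transpose_then_gather(outputs) == last_time_step(outputs))
```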

In [43]:
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        batch_seq_lens = [n_steps]*batch_size
        
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter {}, Minibatch Loss= {:.6f}, Training Accuracy= {:.5f}".format(
                step * batch_size, loss, acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for mnist test images
    test_data = mnist.test.images.reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))

Iter 1280, Minibatch Loss= 1.866238, Training Accuracy= 0.32812
Iter 2560, Minibatch Loss= 1.456627, Training Accuracy= 0.55469
Iter 3840, Minibatch Loss= 1.342952, Training Accuracy= 0.60938
Iter 5120, Minibatch Loss= 1.096903, Training Accuracy= 0.60938
Iter 6400, Minibatch Loss= 0.994761, Training Accuracy= 0.68750
Iter 7680, Minibatch Loss= 0.832612, Training Accuracy= 0.78906
Iter 8960, Minibatch Loss= 0.678093, Training Accuracy= 0.77344
Iter 10240, Minibatch Loss= 0.560492, Training Accuracy= 0.82031
Iter 11520, Minibatch Loss= 0.757103, Training Accuracy= 0.72656
Iter 12800, Minibatch Loss= 0.386088, Training Accuracy= 0.88281
Iter 14080, Minibatch Loss= 0.481685, Training Accuracy= 0.82812
Iter 15360, Minibatch Loss= 0.335150, Training Accuracy= 0.87500
Iter 16640, Minibatch Loss= 0.200314, Training Accuracy= 0.94531
Iter 17920, Minibatch Loss= 0.474601, Training Accuracy= 0.85156
Iter 19200, Minibatch Loss= 0.323833, Training Accuracy= 0.89062
Iter 20480, Minibatch Loss= 0.19