Deep Learning
=============

Assignment 3
------------

Previously in `2_fullyconnected.ipynb`, you trained a logistic regression and a neural network model.

The goal of this assignment is to explore regularization techniques.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
import cPickle as pickle
import numpy as np
import tensorflow as tf

First reload the data we generated in `1_notmnist.ipynb`.

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print 'Training set', train_dataset.shape, train_labels.shape
    print 'Validation set', valid_dataset.shape, valid_labels.shape
    print 'Test set', test_dataset.shape, test_labels.shape

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (18724, 28, 28) (18724,)


Reformat into a shape that's more adapted to the models we're going to train:
- data as a flat matrix,
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print 'Training set', train_dataset.shape, train_labels.shape
print 'Validation set', valid_dataset.shape, valid_labels.shape
print 'Test set', test_dataset.shape, test_labels.shape

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (18724, 784) (18724, 10)
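
The comparison inside `reformat` relies on NumPy broadcasting: `labels[:, None]` has shape `(N, 1)`, `np.arange(num_labels)` has shape `(num_labels,)`, and comparing the two yields an `(N, num_labels)` boolean matrix with exactly one `True` per row. A minimal sketch on a toy label vector follows; the names `toy_labels` and `one_hot` are illustrative and not part of the notebook.

```python
import numpy as np

num_labels = 10
# Hypothetical toy labels; the notebook uses the notMNIST label arrays instead.
toy_labels = np.array([2, 0, 9])

# A (3, 1) column compared against a (10,) row broadcasts to a (3, 10) matrix.
one_hot = (np.arange(num_labels) == toy_labels[:, None]).astype(np.float32)

print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # recovers the original integer labels
```

Taking `argmax` along axis 1 inverts the encoding, which is what the accuracy computation in this notebook depends on.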


In [4]:
def accuracy(predictions, labels):
    # Percentage of rows whose highest-scoring class matches the 1-hot label.
    return (100.*np.sum(np.argmax(predictions,1) == np.argmax(labels, 1))/predictions.shape[0])

---
Problem 1
---------

Introduce and tune L2 regularization for both logistic and neural network models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `nn.l2_loss(t)`. The right amount of regularization should improve your validation / test accuracy.

---
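`tf.nn.l2_loss(t)` returns half the sum of squared entries, `sum(t ** 2) / 2`, with no square root. The regularized objective is therefore the data loss plus `beta` times that quantity for each weight matrix. Here is a plain-Python sketch of the arithmetic; `data_loss`, `weights`, and `beta` are made-up values chosen for readability, not results from this notebook.

```python
def l2_loss(matrix):
    """Half the sum of squares, mirroring what tf.nn.l2_loss computes."""
    return sum(w * w for row in matrix for w in row) / 2.0


def regularized_loss(data_loss, weight_matrices, beta):
    """Data loss plus beta times the L2 penalty of every weight matrix."""
    return data_loss + beta * sum(l2_loss(m) for m in weight_matrices)


# Hypothetical values chosen only to make the arithmetic easy to follow.
weights = [[1.0, -2.0], [3.0, 0.0]]  # sum of squares = 14, so l2_loss = 7
data_loss = 0.5
beta = 0.01

total = regularized_loss(data_loss, [weights], beta)
print(total)  # 0.5 + 0.01 * 7
```

Because the penalty grows with the number and size of the weights, large randomly initialized layers can dominate the reported loss early in training.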

### For logistic model

In [5]:
batch_size = 128
reg_beta = 0.01

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be fed
    # at run time with a training minibatch.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    weights = tf.Variable(tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) \
           + reg_beta * tf.nn.l2_loss(weights)

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
    test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

In [6]:
num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print "Initialized"
    for step in xrange(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print "Minibatch loss at step", step, ":", l
            print "Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels)
            print "Validation accuracy: %.1f%%" % accuracy(
            valid_prediction.eval(), valid_labels)
    print "Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels)

Initialized
Minibatch loss at step 0 : 45.8648
Minibatch accuracy: 13.3%
Validation accuracy: 17.2%
Minibatch loss at step 500 : 0.936209
Minibatch accuracy: 82.8%
Validation accuracy: 79.7%
Minibatch loss at step 1000 : 0.93504
Minibatch accuracy: 78.1%
Validation accuracy: 81.8%
Minibatch loss at step 1500 : 0.810273
Minibatch accuracy: 82.0%
Validation accuracy: 81.5%
Minibatch loss at step 2000 : 0.913523
Minibatch accuracy: 77.3%
Validation accuracy: 81.1%
Minibatch loss at step 2500 : 0.879309
Minibatch accuracy: 75.0%
Validation accuracy: 81.2%
Minibatch loss at step 3000 : 0.794345
Minibatch accuracy: 78.9%
Validation accuracy: 81.2%
Test accuracy: 88.2%


#### Results

Test accuracy increased from 85.9% to 88.2% using a regularization parameter of 0.01.

### For Single-Layer Neural Network

In [33]:
def get_nn_performance(num_steps= 3001,
                       batch_size=128,
                       n_hidden = 1024,
                       reg_beta = 0.01,
                       print_freq = 500,
                       train_data_size = None,
                       dropout=True):
    if train_data_size is None:
        train_data_size = train_labels.shape[0]
    graph = tf.Graph()
    with graph.as_default():
        # Input data. For the training data, we use a placeholder that will be fed
        # at run time with a training minibatch.

        # tf Graph input
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        # Variables.
        weights = {'h1': tf.Variable(tf.truncated_normal([image_size*image_size, n_hidden])),
                   'out': tf.Variable(tf.truncated_normal([n_hidden, num_labels]))}
        biases = {'b1': tf.Variable(tf.zeros([n_hidden])),
                  'out': tf.Variable(tf.zeros([num_labels]))}

        # Training computation.
        layer_1 = tf.nn.relu(tf.add(tf.matmul(tf_train_dataset, weights['h1']), biases['b1'])) 
        #keep_prob = tf.placeholder(tf.float32, shape=[image_size*image_size, n_hidden])
        keep_prob = tf.placeholder(tf.float32)
        layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
        logits = tf.matmul(layer_1_drop, weights['out']) + biases['out']
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) \
               + reg_beta * (tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['out']))

        # Optimizer.
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        valid_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.add(tf.matmul(tf_valid_dataset, weights['h1']), biases['b1'])), weights['out']) + biases['out'])
        test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.add(tf.matmul(tf_test_dataset, weights['h1']), biases['b1'])), weights['out']) + biases['out'])
    
    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        print "Initialized"
        for step in xrange(num_steps):
            # Pick an offset within the training data, which has been randomized.
            # Note: we could use better randomization across epochs.
            offset = (step * batch_size) % (train_data_size - batch_size)
            # Generate a minibatch.
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            # Prepare a dictionary telling the session where to feed the minibatch.
            # The key of the dictionary is the placeholder node of the graph to be fed,
            # and the value is the numpy array to feed to it.
            if dropout:
                feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels, keep_prob: 0.5}
            else:
                feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels, keep_prob: 1.}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % print_freq == 0):
                print "Minibatch loss at step", step, ":", l
                print "Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels)
                print "Validation accuracy: %.1f%%" % accuracy(
                    valid_prediction.eval(feed_dict = {keep_prob: 1.}), valid_labels)
        print "Test accuracy: %.1f%%" % accuracy(
            test_prediction.eval(feed_dict = {keep_prob: 1.}), test_labels)
    return graph

#### Results

In [7]:
get_nn_performance(reg_beta=0., dropout=False)

Initialized
Minibatch loss at step 0 : 342.41
Minibatch accuracy: 13.3%
Validation accuracy: 40.5%
Minibatch loss at step 500 : 9.13228
Minibatch accuracy: 80.5%
Validation accuracy: 80.9%
Minibatch loss at step 1000 : 13.0663
Minibatch accuracy: 82.0%
Validation accuracy: 80.9%
Minibatch loss at step 1500 : 20.4485
Minibatch accuracy: 75.8%
Validation accuracy: 78.9%
Minibatch loss at step 2000 : 8.90561
Minibatch accuracy: 78.9%
Validation accuracy: 82.2%
Minibatch loss at step 2500 : 5.10572
Minibatch accuracy: 77.3%
Validation accuracy: 82.0%
Minibatch loss at step 3000 : 2.84721
Minibatch accuracy: 83.6%
Validation accuracy: 82.1%
Test accuracy: 88.8%


<tensorflow.python.framework.ops.Graph at 0x7fb3dc8474d0>

In [38]:
get_nn_performance(dropout=False)

Initialized
Minibatch loss at step 0 : 3552.45
Minibatch accuracy: 8.6%
Validation accuracy: 41.0%
Minibatch loss at step 500 : 21.2522
Minibatch accuracy: 84.4%
Validation accuracy: 83.5%
Minibatch loss at step 1000 : 0.966424
Minibatch accuracy: 80.5%
Validation accuracy: 84.2%
Minibatch loss at step 1500 : 0.818788
Minibatch accuracy: 83.6%
Validation accuracy: 84.2%
Minibatch loss at step 2000 : 0.874688
Minibatch accuracy: 78.9%
Validation accuracy: 84.0%
Minibatch loss at step 2500 : 0.865266
Minibatch accuracy: 79.7%
Validation accuracy: 83.8%
Minibatch loss at step 3000 : 0.778686
Minibatch accuracy: 82.8%
Validation accuracy: 84.4%
Test accuracy: 90.3%


<tensorflow.python.framework.ops.Graph at 0x7fb37930ba10>

Test accuracy increased from 88.8% to 90.3% using a regularization parameter of 0.01.

---
Problem 2
---------
Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

---

In [36]:
get_nn_performance(train_data_size=200, dropout=False)

Initialized
Minibatch loss at step 0 : 3526.05
Minibatch accuracy: 10.2%
Validation accuracy: 36.1%
Minibatch loss at step 500 : 21.0261
Minibatch accuracy: 100.0%
Validation accuracy: 73.2%
Minibatch loss at step 1000 : 0.380887
Minibatch accuracy: 100.0%
Validation accuracy: 74.1%
Minibatch loss at step 1500 : 0.220484
Minibatch accuracy: 100.0%
Validation accuracy: 73.7%
Minibatch loss at step 2000 : 0.210154
Minibatch accuracy: 100.0%
Validation accuracy: 73.8%
Minibatch loss at step 2500 : 0.202831
Minibatch accuracy: 100.0%
Validation accuracy: 73.8%
Minibatch loss at step 3000 : 0.201045
Minibatch accuracy: 100.0%
Validation accuracy: 73.7%
Test accuracy: 80.8%


<tensorflow.python.framework.ops.Graph at 0x7fb3cafa6b90>

#### Results

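Minibatch accuracy reaches 100% while validation accuracy plateaus around 74% and test accuracy falls to 80.8% (versus 90.3% with the full training set). The network memorizes the few examples it sees and generalizes poorly.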
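
With `train_data_size=200` and `batch_size=128`, the offset formula `(step * batch_size) % (train_data_size - batch_size)` can only produce a handful of values, so the network is shown the same rows repeatedly. A small sketch of that arithmetic follows; the helper name `distinct_offsets` is hypothetical.

```python
def distinct_offsets(train_data_size, batch_size, num_steps):
    """Collect every minibatch start offset produced by the training loop's formula."""
    return sorted({(step * batch_size) % (train_data_size - batch_size)
                   for step in range(num_steps)})


offsets = distinct_offsets(train_data_size=200, batch_size=128, num_steps=3001)
examples_seen = max(offsets) + 128  # highest row index touched, plus one

print(offsets)        # [0, 8, 16, 24, 32, 40, 48, 56, 64]
print(examples_seen)  # 192
```

Nine heavily overlapping windows over 192 rows is all the model trains on in this setting.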

---
Problem 3
---------
Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides `nn.dropout()` for that, but you have to make sure it's only inserted during training.

What happens to our extreme overfitting case?

---
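`tf.nn.dropout(x, keep_prob)` keeps each activation with probability `keep_prob` and scales the survivors by `1 / keep_prob`, so the expected activation is unchanged. Feeding `keep_prob = 1.0` at evaluation time turns it into the identity. The pure-Python sketch below mimics that behaviour on a list of activations; the function `dropout` and its `rng` argument are illustrative and not TensorFlow API.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(layer, keep_prob=0.5, rng=rng)  # stochastic, survivors doubled
eval_out = dropout(layer, keep_prob=1.0, rng=rng)   # deterministic identity

print(train_out)
print(eval_out)  # [1.0, 2.0, 3.0, 4.0]
```

This is why a single `keep_prob` placeholder is enough: the same graph serves training and evaluation, and only the fed value changes.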

In [39]:
get_nn_performance(dropout=True)

Initialized
Minibatch loss at step 0 : 3590.91
Minibatch accuracy: 13.3%
Validation accuracy: 14.9%
Minibatch loss at step 500 : 21.486
Minibatch accuracy: 82.0%
Validation accuracy: 83.0%
Minibatch loss at step 1000 : 1.03931
Minibatch accuracy: 79.7%
Validation accuracy: 83.7%
Minibatch loss at step 1500 : 0.880907
Minibatch accuracy: 81.2%
Validation accuracy: 83.9%
Minibatch loss at step 2000 : 0.909137
Minibatch accuracy: 79.7%
Validation accuracy: 83.6%
Minibatch loss at step 2500 : 0.945273
Minibatch accuracy: 78.1%
Validation accuracy: 83.4%
Minibatch loss at step 3000 : 0.837463
Minibatch accuracy: 80.5%
Validation accuracy: 83.7%
Test accuracy: 89.9%


<tensorflow.python.framework.ops.Graph at 0x7fb3dc1001d0>

In [37]:
get_nn_performance(train_data_size=200, dropout=True)

Initialized
Minibatch loss at step 0 : 3624.46
Minibatch accuracy: 15.6%
Validation accuracy: 26.6%
Minibatch loss at step 500 : 21.0762
Minibatch accuracy: 100.0%
Validation accuracy: 74.0%
Minibatch loss at step 1000 : 0.390473
Minibatch accuracy: 100.0%
Validation accuracy: 74.4%
Minibatch loss at step 1500 : 0.244134
Minibatch accuracy: 100.0%
Validation accuracy: 74.6%
Minibatch loss at step 2000 : 0.224882
Minibatch accuracy: 100.0%
Validation accuracy: 74.4%
Minibatch loss at step 2500 : 0.220285
Minibatch accuracy: 100.0%
Validation accuracy: 74.6%
Minibatch loss at step 3000 : 0.215421
Minibatch accuracy: 100.0%
Validation accuracy: 74.4%
Test accuracy: 81.5%


<tensorflow.python.framework.ops.Graph at 0x7fb3caf78c50>

---
Problem 4
---------

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is [97.1%](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html?showComment=1391023266211#c8758720086795711595).

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
 
 ---
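The decay schedule mentioned above follows the documented formula `learning_rate * decay_rate ** (global_step / decay_steps)`, with integer division when `staircase=True`. The sketch below reimplements that formula in plain Python, assuming `global_step` advances by one per training step. The function is an illustration rather than the TensorFlow implementation, and the 1000-step interval is a hypothetical value.

```python
def exponential_decay(initial_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Learning rate after global_step updates, following the documented formula."""
    if staircase:
        exponent = global_step // decay_steps  # whole intervals only
    else:
        exponent = global_step / float(decay_steps)
    return initial_rate * decay_rate ** exponent


# Hypothetical schedule: shrink the rate by 4% every 1000 steps.
for step in (0, 999, 1000, 5000):
    rate = exponential_decay(0.5, step, decay_steps=1000, decay_rate=0.96, staircase=True)
    print("step %d -> %.4f" % (step, rate))

# With an interval much longer than the run, a staircase schedule never moves.
print(exponential_decay(0.5, 10000, decay_steps=100000, decay_rate=0.96, staircase=True))  # 0.5
```

The last line uses the same `decay_steps=100000` and `decay_rate=0.96` that the training functions later in this notebook pass. With `staircase=True` and runs of at most about 10,000 steps, those settings leave the learning rate at its starting value for the whole run.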


In [45]:
def get_nn_performance_two_hidden(num_steps=3001, batch_size=128, n_hidden=1024, n_hidden_2=128, reg_beta=0.01,
        print_freq=500, train_data_size=None, starter_learning_rate=0.1, dropout_rate=0.5, exp_learning_rate=True):
    if train_data_size is None:
        train_data_size = train_labels.shape[0]
    graph = tf.Graph()
    with graph.as_default():
        # Input data. For the training data, we use a placeholder that will be fed
        # at run time with a training minibatch.

        # tf Graph input
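        # Step counter; exponential_decay reads it to schedule the learning rate.
        global_step = tf.Variable(0, trainable=False)
        if exp_learning_rate:
            learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                                       100000, 0.96, staircase=True)
        else:
            # Constant learning rate when decay is disabled.
            learning_rate = starter_learning_rate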
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        # Variables.
        weights = {'h1': tf.Variable(tf.truncated_normal([image_size*image_size, n_hidden])),
                   'h2': tf.Variable(tf.truncated_normal([n_hidden, n_hidden_2])),
                   'out': tf.Variable(tf.truncated_normal([n_hidden_2, num_labels]))}
        biases = {'b1': tf.Variable(tf.zeros([n_hidden])),
                  'b2': tf.Variable(tf.zeros([n_hidden_2])),
                  'out': tf.Variable(tf.zeros([num_labels]))}

        # Training computation.
        layer_1 = tf.nn.relu(tf.add(tf.matmul(tf_train_dataset, weights['h1']), biases['b1']))
        keep_prob = tf.placeholder(tf.float32)
        layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
        layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['b2']))
        #later_2_drop = tf.nn.dropout(layer_2, keep_prob)
        #logits = tf.matmul(layer_1_drop, weights['out']) + biases['out']
        logits = tf.matmul(layer_2, weights['out']) + biases['out']
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) \
               + reg_beta * (tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['h2']) + tf.nn.l2_loss(weights['out']))

        # Optimizer.
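        # Passing global_step makes minimize() increment it, which drives the decay schedule.
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)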
        #optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        
        #valid_layer_1 = tf.nn.relu(tf.add(tf.matmul(valid_dataset, weights['h1']), biases['b1']))
        #valid_layer_2 = tf.nn.relu(tf.add(tf.matmul(tf.nn.relu(tf.add(tf.matmul(valid_dataset, weights['h1']), biases['b1'])), weights['h2']), biases['b2']))
        valid_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.add(tf.matmul(
                            tf.nn.relu(tf.add(tf.matmul(tf_valid_dataset, weights['h1']), biases['b1'])), weights['h2']), biases['b2'])),
                                                   weights['out']) + biases['out'])
        
        test_prediction = tf.nn.softmax(tf.matmul(tf.nn.relu(tf.add(tf.matmul(
                            tf.nn.relu(tf.add(tf.matmul(tf_test_dataset, weights['h1']), biases['b1'])), weights['h2']), biases['b2'])),
                                                   weights['out']) + biases['out'])
    
    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        print "Initialized"
        for step in xrange(num_steps):
            # Pick an offset within the training data, which has been randomized.
            # Note: we could use better randomization across epochs.
            offset = (step * batch_size) % (train_data_size - batch_size)
            # Generate a minibatch.
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            # Prepare a dictionary telling the session where to feed the minibatch.
            # The key of the dictionary is the placeholder node of the graph to be fed,
            # and the value is the numpy array to feed to it.
            # Note: despite its name, dropout_rate is fed as keep_prob (the probability of keeping a unit).
            feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels, keep_prob: dropout_rate}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % print_freq == 0):
                print "Minibatch loss at step", step, ":", l
                print "Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels)
                print "Validation accuracy: %.1f%%" % accuracy(
                    valid_prediction.eval(feed_dict = {keep_prob: 1.}), valid_labels)
        print "Test accuracy: %.1f%%" % accuracy(
            test_prediction.eval(feed_dict = {keep_prob: 1.}), test_labels)
    return graph

In [81]:
help(get_nn_performance_two_hidden)

Help on function get_nn_performance_two_hidden in module __main__:

get_nn_performance_two_hidden(num_steps=3001, batch_size=128, n_hidden=1024, n_hidden_2=128, reg_beta=0.01, print_freq=500, train_data_size=None, starter_learning_rate=0.1, dropout_rate=0.5, exp_learning_rate=True)



Using Gradient Descent Optimizer

In [84]:
get_nn_performance_two_hidden(num_steps=7000, batch_size=128,
                              print_freq=250, starter_learning_rate=0.05,
                              dropout_rate=0.8, exp_learning_rate=True)

Initialized
Minibatch loss at step 0 : 5817.33
Minibatch accuracy: 7.8%
Validation accuracy: 25.5%
Minibatch loss at step 250 : 2822.95
Minibatch accuracy: 14.1%
Validation accuracy: 15.5%
Minibatch loss at step 500 : 2198.04
Minibatch accuracy: 18.0%
Validation accuracy: 15.4%
Minibatch loss at step 750 : 1712.41
Minibatch accuracy: 21.1%
Validation accuracy: 20.7%
Minibatch loss at step 1000 : 1333.81
Minibatch accuracy: 25.8%
Validation accuracy: 23.4%
Minibatch loss at step 1250 : 1038.74
Minibatch accuracy: 37.5%
Validation accuracy: 34.5%
Minibatch loss at step 1500 : 809.177
Minibatch accuracy: 37.5%
Validation accuracy: 41.0%
Minibatch loss at step 1750 : 630.489
Minibatch accuracy: 39.1%
Validation accuracy: 45.2%
Minibatch loss at step 2000 : 491.145
Minibatch accuracy: 50.0%
Validation accuracy: 50.6%
Minibatch loss at step 2250 : 382.706
Minibatch accuracy: 53.9%
Validation accuracy: 55.3%
Minibatch loss at step 2500 : 298.134
Minibatch accuracy: 53.1%
Validation accuracy: 

<tensorflow.python.framework.ops.Graph at 0x7fb3581ced90>

In [6]:
get_nn_performance_two_hidden(num_steps=7000, batch_size=256,
                              print_freq=250, starter_learning_rate=0.05,
                              dropout_rate=0.8, exp_learning_rate=True)

Initialized
Minibatch loss at step 0 : 6154.15
Minibatch accuracy: 12.9%
Validation accuracy: 18.6%
Minibatch loss at step 250 : 2872.94
Minibatch accuracy: 11.7%
Validation accuracy: 11.3%
Minibatch loss at step 500 : 2237.67
Minibatch accuracy: 15.2%
Validation accuracy: 12.8%
Minibatch loss at step 750 : 1742.9
Minibatch accuracy: 18.8%
Validation accuracy: 18.4%
Minibatch loss at step 1000 : 1357.58
Minibatch accuracy: 23.0%
Validation accuracy: 24.2%
Minibatch loss at step 1250 : 1057.4
Minibatch accuracy: 26.2%
Validation accuracy: 28.7%
Minibatch loss at step 1500 : 823.671
Minibatch accuracy: 35.5%
Validation accuracy: 37.3%
Minibatch loss at step 1750 : 641.573
Minibatch accuracy: 43.0%
Validation accuracy: 41.6%
Minibatch loss at step 2000 : 500.082
Minibatch accuracy: 41.4%
Validation accuracy: 45.3%
Minibatch loss at step 2250 : 389.496
Minibatch accuracy: 54.3%
Validation accuracy: 50.3%
Minibatch loss at step 2500 : 303.504
Minibatch accuracy: 57.0%
Validation accuracy: 5

<tensorflow.python.framework.ops.Graph at 0x7fc3d28da110>

In [49]:
get_nn_performance_two_hidden(num_steps=7001, batch_size=512,
                              print_freq=500, starter_learning_rate=0.05,
                              dropout_rate=0.8, exp_learning_rate=True,
                              reg_beta=0.01)

Initialized
Minibatch loss at step 0 : 6224.33
Minibatch accuracy: 12.9%
Validation accuracy: 29.6%
Minibatch loss at step 500 : 2206.45
Minibatch accuracy: 14.3%
Validation accuracy: 14.9%
Minibatch loss at step 1000 : 1338.47
Minibatch accuracy: 31.1%
Validation accuracy: 29.9%
Minibatch loss at step 1500 : 811.761
Minibatch accuracy: 52.5%
Validation accuracy: 56.7%
Minibatch loss at step 2000 : 492.366
Minibatch accuracy: 70.7%
Validation accuracy: 70.3%
Minibatch loss at step 2500 : 298.866
Minibatch accuracy: 72.5%
Validation accuracy: 76.9%
Minibatch loss at step 3000 : 181.452
Minibatch accuracy: 77.9%
Validation accuracy: 80.0%
Minibatch loss at step 3500 : 110.197
Minibatch accuracy: 81.1%
Validation accuracy: 81.9%
Minibatch loss at step 4000 : 67.1049
Minibatch accuracy: 79.7%
Validation accuracy: 83.0%
Minibatch loss at step 4500 : 40.88
Minibatch accuracy: 83.8%
Validation accuracy: 83.7%
Minibatch loss at step 5000 : 25.0898
Minibatch accuracy: 81.8%
Validation accuracy:

<tensorflow.python.framework.ops.Graph at 0x7fb3e69f2f50>

In [51]:
get_nn_performance_two_hidden(num_steps=10001, batch_size=1024,
                              print_freq=500, starter_learning_rate=0.05,
                              dropout_rate=0.8, exp_learning_rate=True,
                              reg_beta=0.01, n_hidden_2=256)

Initialized
Minibatch loss at step 0 : 7063.9
Minibatch accuracy: 13.7%
Validation accuracy: 25.9%
Minibatch loss at step 500 : 2549.72
Minibatch accuracy: 24.8%
Validation accuracy: 22.3%
Minibatch loss at step 1000 : 1546.4
Minibatch accuracy: 38.3%
Validation accuracy: 38.9%
Minibatch loss at step 1500 : 937.861
Minibatch accuracy: 56.6%
Validation accuracy: 56.7%
Minibatch loss at step 2000 : 568.936
Minibatch accuracy: 67.0%
Validation accuracy: 69.6%
Minibatch loss at step 2500 : 345.204
Minibatch accuracy: 75.0%
Validation accuracy: 77.1%
Minibatch loss at step 3000 : 209.553
Minibatch accuracy: 77.4%
Validation accuracy: 80.1%
Minibatch loss at step 3500 : 127.281
Minibatch accuracy: 81.1%
Validation accuracy: 81.7%
Minibatch loss at step 4000 : 77.486
Minibatch accuracy: 79.9%
Validation accuracy: 82.9%
Minibatch loss at step 4500 : 47.1753
Minibatch accuracy: 83.7%
Validation accuracy: 83.7%
Minibatch loss at step 5000 : 28.7924
Minibatch accuracy: 86.1%
Validation accuracy: 

<tensorflow.python.framework.ops.Graph at 0x7fb3e24d1810>

In [21]:
get_nn_performance_two_hidden(num_steps=10000, batch_size=256,
                              print_freq=500, starter_learning_rate=0.05,
                              dropout_rate=0.5, exp_learning_rate=True,
                              reg_beta=0.005, n_hidden_2=256, n_hidden=256)

Initialized
Minibatch loss at step 0 : 3199.26
Minibatch accuracy: 7.8%
Validation accuracy: 19.6%
Minibatch loss at step 500 : 397.628
Minibatch accuracy: 57.0%
Validation accuracy: 63.6%
Minibatch loss at step 1000 : 305.804
Minibatch accuracy: 60.5%
Validation accuracy: 66.3%
Minibatch loss at step 1500 : 238.092
Minibatch accuracy: 64.8%
Validation accuracy: 70.2%
Minibatch loss at step 2000 : 185.65
Minibatch accuracy: 67.2%
Validation accuracy: 73.2%
Minibatch loss at step 2500 : 144.419
Minibatch accuracy: 70.7%
Validation accuracy: 75.4%
Minibatch loss at step 3000 : 112.795
Minibatch accuracy: 70.3%
Validation accuracy: 76.9%
Minibatch loss at step 3500 : 87.7985
Minibatch accuracy: 76.6%
Validation accuracy: 78.2%
Minibatch loss at step 4000 : 68.5501
Minibatch accuracy: 76.6%
Validation accuracy: 79.4%
Minibatch loss at step 4500 : 53.5187
Minibatch accuracy: 78.5%
Validation accuracy: 80.0%
Minibatch loss at step 5000 : 41.8438
Minibatch accuracy: 78.9%
Validation accuracy:

<tensorflow.python.framework.ops.Graph at 0x7fc38e31fe90>

In [35]:
get_nn_performance_two_hidden(num_steps=10001, batch_size=256,
                              print_freq=500, starter_learning_rate=0.1,
                              dropout_rate=0.5, exp_learning_rate=True,
                              reg_beta=0.01, n_hidden_2=256, n_hidden=256)

Initialized
Minibatch loss at step 0 : 3199.4
Minibatch accuracy: 11.7%
Validation accuracy: 14.6%
Minibatch loss at step 500 : 422.14
Minibatch accuracy: 22.3%
Validation accuracy: 17.1%
Minibatch loss at step 1000 : 155.566
Minibatch accuracy: 59.0%
Validation accuracy: 57.5%
Minibatch loss at step 1500 : 57.6797
Minibatch accuracy: 67.2%
Validation accuracy: 75.5%
Minibatch loss at step 2000 : 21.8562
Minibatch accuracy: 72.7%
Validation accuracy: 80.8%
Minibatch loss at step 2500 : 8.38463
Minibatch accuracy: 82.8%
Validation accuracy: 82.5%
Minibatch loss at step 3000 : 3.67435
Minibatch accuracy: 78.9%
Validation accuracy: 83.2%
Minibatch loss at step 3500 : 1.87248
Minibatch accuracy: 79.3%
Validation accuracy: 83.5%
Minibatch loss at step 4000 : 1.2017
Minibatch accuracy: 81.6%
Validation accuracy: 84.0%
Minibatch loss at step 4500 : 0.983122
Minibatch accuracy: 81.6%
Validation accuracy: 84.2%
Minibatch loss at step 5000 : 0.863472
Minibatch accuracy: 82.0%
Validation accuracy

<tensorflow.python.framework.ops.Graph at 0x7fc38d8c9b90>

In [36]:
get_nn_performance_two_hidden(num_steps=10001, batch_size=256,
                              print_freq=500, starter_learning_rate=0.1,
                              dropout_rate=0.5, exp_learning_rate=True,
                              reg_beta=0.01, n_hidden_2=512, n_hidden=512)

Initialized
Minibatch loss at step 0 : 6972.53
Minibatch accuracy: 11.3%
Validation accuracy: 10.2%
Minibatch loss at step 500 : 1756.74
Minibatch accuracy: 11.3%
Validation accuracy: 10.7%
Minibatch loss at step 1000 : 647.301
Minibatch accuracy: 12.9%
Validation accuracy: 14.1%
Minibatch loss at step 1500 : 239.123
Minibatch accuracy: 29.3%
Validation accuracy: 27.7%
Minibatch loss at step 2000 : 88.6972
Minibatch accuracy: 43.8%
Validation accuracy: 49.1%
Minibatch loss at step 2500 : 32.8831
Minibatch accuracy: 75.8%
Validation accuracy: 77.2%
Minibatch loss at step 3000 : 12.697
Minibatch accuracy: 77.3%
Validation accuracy: 82.2%
Minibatch loss at step 3500 : 5.20093
Minibatch accuracy: 79.7%
Validation accuracy: 82.8%
Minibatch loss at step 4000 : 2.44537
Minibatch accuracy: 80.9%
Validation accuracy: 83.5%
Minibatch loss at step 4500 : 1.37343
Minibatch accuracy: 80.1%
Validation accuracy: 83.8%
Minibatch loss at step 5000 : 1.07574
Minibatch accuracy: 81.6%
Validation accuracy

<tensorflow.python.framework.ops.Graph at 0x7fc38f237750>

In [30]:
def get_nn_performance_three_hidden(num_steps=3001, batch_size=128, n_hidden=128, n_hidden_2=128, n_hidden_3=128, 
                                    reg_beta=0.01, print_freq=500, train_data_size=None, starter_learning_rate=0.1,
                                    dropout_rate=0.5, exp_learning_rate=True):
    if train_data_size is None:
        train_data_size = train_labels.shape[0]
    graph = tf.Graph()
    with graph.as_default():
        # Input data. For the training data, we use a placeholder that will be fed
        # at run time with a training minibatch.

        # tf Graph input
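        # Step counter; exponential_decay reads it to schedule the learning rate.
        global_step = tf.Variable(0, trainable=False)
        if exp_learning_rate:
            learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                                       100000, 0.96, staircase=True)
        else:
            # Constant learning rate when decay is disabled.
            learning_rate = starter_learning_rate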
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        # Variables.
        weights = {'h1': tf.Variable(tf.truncated_normal([image_size*image_size, n_hidden])),
                   'h2': tf.Variable(tf.truncated_normal([n_hidden, n_hidden_2])),
                   'h3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3])),
                   'out': tf.Variable(tf.truncated_normal([n_hidden_3, num_labels]))}
        biases = {'b1': tf.Variable(tf.zeros([n_hidden])),
                  'b2': tf.Variable(tf.zeros([n_hidden_2])),
                  'b3': tf.Variable(tf.zeros([n_hidden_3])),
                  'out': tf.Variable(tf.zeros([num_labels]))}

        # Training computation.
        layer_1 = tf.nn.relu(tf.add(tf.matmul(tf_train_dataset, weights['h1']), biases['b1']))
        keep_prob = tf.placeholder(tf.float32)
        layer_1_drop = tf.nn.dropout(layer_1, keep_prob)
        layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1_drop, weights['h2']), biases['b2']))
        layer_3 = tf.nn.relu(tf.add(tf.matmul(layer_2, weights['h3']), biases['b3']))
        #layer_2_drop = tf.nn.dropout(layer_2, keep_prob)
        #logits = tf.matmul(layer_1_drop, weights['out']) + biases['out']
        logits = tf.matmul(layer_3, weights['out']) + biases['out']
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels)) \
               + reg_beta * (tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['h2']) + tf.nn.l2_loss(weights['h3']) 
                             + tf.nn.l2_loss(weights['out']))

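        # Optimizer.
        #optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
        # Pass global_step so each update increments it and the decay schedule can advance.
        optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)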

        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        
        def eval_logits(data):
            # Same forward pass as training, but without dropout. Returns logits.
            hidden_1 = tf.nn.relu(tf.add(tf.matmul(data, weights['h1']), biases['b1']))
            hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, weights['h2']), biases['b2']))
            hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, weights['h3']), biases['b3']))
            return tf.matmul(hidden_3, weights['out']) + biases['out']

        valid_prediction = eval_logits(tf_valid_dataset)
        test_prediction = eval_logits(tf_test_dataset)
    
    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        print "Initialized"
        for step in xrange(num_steps):
            # Pick an offset within the training data, which has been randomized.
            # Note: we could use better randomization across epochs.
            offset = (step * batch_size) % (train_data_size - batch_size)
            # Generate a minibatch.
            batch_data = train_dataset[offset:(offset + batch_size), :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            # Prepare a dictionary telling the session where to feed the minibatch.
            # The key of the dictionary is the placeholder node of the graph to be fed,
            # and the value is the numpy array to feed to it.
            # Note: despite its name, dropout_rate is fed as the keep probability of tf.nn.dropout.
            feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels, keep_prob: dropout_rate}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % print_freq == 0):
                print "Minibatch loss at step", step, ":", l
                print "Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels)
                print "Validation accuracy: %.1f%%" % accuracy(
                    valid_prediction.eval(feed_dict = {keep_prob: 1.}), valid_labels)
        print "Test accuracy: %.1f%%" % accuracy(
            test_prediction.eval(feed_dict = {keep_prob: 1.}), test_labels)
    return graph
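
The schedule requested from `tf.train.exponential_decay` above uses `decay_steps=100000`, `decay_rate=0.96` and `staircase=True`. A minimal sketch of the documented formula, `rate * decay_rate ** (global_step / decay_steps)` with the exponent floored in staircase mode, shows what that means for short runs. The helper name `decayed_rate` is hypothetical, and the sketch assumes `global_step` advances by one per optimizer update.

```python
def decayed_rate(starter_learning_rate, global_step, decay_steps, decay_rate,
                 staircase=False):
    """Plain-Python version of the exponential decay formula."""
    if staircase:
        # Integer division: the rate only drops at multiples of decay_steps.
        exponent = float(global_step // decay_steps)
    else:
        exponent = global_step / float(decay_steps)
    return starter_learning_rate * decay_rate ** exponent

# Learning rate at a few step counts, using the values from this notebook.
for step in (0, 1200, 99999, 100000, 200000):
    rate = decayed_rate(0.035, step, 100000, 0.96, staircase=True)
    print("step %6d -> learning rate %.6f" % (step, rate))
```

With these settings the rate stays at its starting value until step 100000, so a run of a few thousand steps effectively trains with a constant learning rate; a smaller `decay_steps` would be needed for the decay to have any effect.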

In [38]:
get_nn_performance_three_hidden(num_steps=1201, batch_size=4096,
                              print_freq=100, starter_learning_rate=0.035,
                              dropout_rate=0.5, exp_learning_rate=True,
                              reg_beta=0.02)

Initialized
Minibatch loss at step 0 : 8856.51
Minibatch accuracy: 10.1%
Validation accuracy: 21.6%
Minibatch loss at step 100 : 670.429
Minibatch accuracy: 33.9%
Validation accuracy: 34.9%
Minibatch loss at step 200 : 344.73
Minibatch accuracy: 25.2%
Validation accuracy: 21.4%
Minibatch loss at step 300 : 186.1
Minibatch accuracy: 40.4%
Validation accuracy: 44.8%
Minibatch loss at step 400 : 110.403
Minibatch accuracy: 55.5%
Validation accuracy: 59.7%
Minibatch loss at step 500 : 71.0393
Minibatch accuracy: 64.3%
Validation accuracy: 65.8%
Minibatch loss at step 600 : 48.4381
Minibatch accuracy: 72.8%
Validation accuracy: 78.1%
Minibatch loss at step 700 : 34.4451
Minibatch accuracy: 77.4%
Validation accuracy: 81.9%
Minibatch loss at step 800 : 25.186
Minibatch accuracy: 81.5%
Validation accuracy: 83.3%
Minibatch loss at step 900 : 18.9482
Minibatch accuracy: 80.7%
Validation accuracy: 83.9%
Minibatch loss at step 1000 : 14.5037
Minibatch accuracy: 81.8%
Validation accuracy: 84.0%
Min

<tensorflow.python.framework.ops.Graph at 0x7fb3e3afdfd0>
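
The first logged loss (8856.51) is large partly because the weights come from `tf.truncated_normal` with its default standard deviation of 1, so the L2 term starts out big. As a rough estimate only, the sketch below computes the expected initial penalty for the layer shapes and `reg_beta=0.02` used in this run. It assumes weights are standard normal samples truncated at two standard deviations and that `tf.nn.l2_loss(t)` is `sum(t ** 2) / 2`; the helper names are hypothetical.

```python
import math

def truncated_normal_variance(bound=2.0):
    """Variance of a standard normal truncated to [-bound, bound]."""
    pdf_at_bound = math.exp(-0.5 * bound * bound) / math.sqrt(2.0 * math.pi)
    mass_inside = math.erf(bound / math.sqrt(2.0))  # P(|z| <= bound)
    return 1.0 - 2.0 * bound * pdf_at_bound / mass_inside

def expected_l2_penalty(shapes, reg_beta):
    """Expected value of reg_beta * sum of l2_loss over all weight matrices."""
    n_weights = sum(rows * cols for rows, cols in shapes)
    # Each weight contributes E[w ** 2] / 2 to the l2_loss sum.
    return reg_beta * n_weights * truncated_normal_variance() / 2.0

# Weight shapes for 784 inputs, three hidden layers of 128 units, 10 outputs.
shapes = [(28 * 28, 128), (128, 128), (128, 128), (128, 10)]
penalty = expected_l2_penalty(shapes, reg_beta=0.02)
print("expected initial L2 penalty: %.0f" % penalty)
```

Under these assumptions the penalty is on the order of a thousand, which suggests most of the initial loss comes from the cross-entropy of the unscaled initial logits rather than from regularization. Passing a smaller `stddev` to `tf.truncated_normal` would shrink both terms.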