Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = '../../notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by number of channels)
- labels must be float one-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
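
The one-hot step in `reformat` relies on NumPy broadcasting, which can be easier to follow on a tiny input. The snippet below is a minimal sketch; the three label values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

num_labels = 10  # same value as in the notebook
labels = np.array([0, 3, 9])  # three hypothetical class indices

# labels[:, None] has shape (3, 1); np.arange(num_labels) has shape (10,).
# Broadcasting the comparison yields a (3, 10) boolean matrix with exactly
# one True per row, at the column equal to the label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)
print(one_hot[1])
```

Taking `np.argmax(one_hot, 1)` recovers the original integer labels, which is what the `accuracy` function relies on.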


In [4]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) \
            / predictions.shape[0])
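
As a quick sanity check, here is the same `accuracy` definition applied to a small hand-written example. The prediction and label arrays are hypothetical values chosen for illustration only.

```python
import numpy as np

def accuracy(predictions, labels):
    # Same definition as in the notebook: percentage of rows whose
    # arg-max prediction matches the arg-max of the one-hot label.
    correct = np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
    return 100.0 * correct / predictions.shape[0]

# Hypothetical softmax outputs for 4 samples and 3 classes.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.float32)

print(accuracy(predictions, labels))  # 3 of 4 rows match
```

With a minibatch of 16 examples, this measure can only move in steps of 6.25 percentage points, which is why the minibatch accuracies printed during training look coarse.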

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit the network's depth and the number of fully connected nodes.
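
To see how large the input to the fully connected layer is, it helps to trace the spatial size through the two stride-2 convolutions. The sketch below assumes `padding='SAME'`, where the output size is the input size divided by the stride, rounded up; `same_output_size` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
import math

image_size = 28
depth = 16
stride = 2

def same_output_size(input_size, stride):
    # With padding='SAME', the output spatial size is ceil(input / stride),
    # independent of the filter size.
    return int(math.ceil(input_size / float(stride)))

after_conv1 = same_output_size(image_size, stride)   # 28 -> 14
after_conv2 = same_output_size(after_conv1, stride)  # 14 -> 7
flattened = after_conv2 * after_conv2 * depth        # 7 * 7 * 16

print(after_conv1, after_conv2, flattened)
```

The flattened size is the number of rows the first fully connected weight matrix needs.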

In [17]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 1024

graph = tf.Graph()

with graph.as_default():

    # Input data.
    train = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    
    valid = tf.constant(valid_dataset)
    test = tf.constant(test_dataset)

    # Variables.
    w1 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    b1 = tf.Variable(tf.zeros([depth]))
    
    w2 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[depth]))
    
    w3 = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    
    w4 = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape=[num_labels]))

    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, w1, [1, 2, 2, 1], padding='SAME') + b1
        hidden = tf.nn.relu(conv)
        
        conv = tf.nn.conv2d(hidden, w2, [1, 2, 2, 1], padding='SAME') + b2
        hidden = tf.nn.relu(conv)
        
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        
        hidden = tf.nn.xw_plus_b(reshape, w3, b3)
        hidden = tf.nn.relu(hidden)
        
        return tf.nn.xw_plus_b(hidden, w4, b4)

    # Training computation.
    logits = model(train)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(valid))
    test_prediction = tf.nn.softmax(model(test))

In [18]:
nb_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(nb_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {train : batch_data, labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
    print('Validation accuracy: %.1f%%' % accuracy(
            valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 5.044826
Minibatch accuracy: 12.5%
Minibatch loss at step 50: 2.257859
Minibatch accuracy: 31.2%
Minibatch loss at step 100: 2.061034
Minibatch accuracy: 18.8%
Minibatch loss at step 150: 1.267842
Minibatch accuracy: 62.5%
Minibatch loss at step 200: 0.890392
Minibatch accuracy: 68.8%
Minibatch loss at step 250: 0.719255
Minibatch accuracy: 81.2%
Minibatch loss at step 300: 0.549546
Minibatch accuracy: 87.5%
Minibatch loss at step 350: 0.719501
Minibatch accuracy: 75.0%
Minibatch loss at step 400: 0.470098
Minibatch accuracy: 87.5%
Minibatch loss at step 450: 0.728643
Minibatch accuracy: 81.2%
Minibatch loss at step 500: 0.583343
Minibatch accuracy: 75.0%
Minibatch loss at step 550: 0.446994
Minibatch accuracy: 81.2%
Minibatch loss at step 600: 0.543558
Minibatch accuracy: 81.2%
Minibatch loss at step 650: 0.702579
Minibatch accuracy: 75.0%
Minibatch loss at step 700: 0.950412
Minibatch accuracy: 68.8%
Minibatch loss at step 750: 0.737710
...
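
The offset arithmetic in the training loop can be checked in isolation. This is a small sketch using the training set size printed earlier in the notebook; `batch_offset` is a hypothetical helper that wraps the same expression.

```python
batch_size = 16
num_train = 200000  # size of the training set in this notebook

def batch_offset(step):
    # The modulus keeps offset + batch_size within the training set,
    # so every slice contains exactly batch_size examples.
    return (step * batch_size) % (num_train - batch_size)

print(batch_offset(0), batch_offset(1), batch_offset(12499))
```

The offset first wraps back to 0 at step 12499, so a run of 1001 steps never revisits an example.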

---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides with a max pooling operation (`tf.nn.max_pool()`) of stride 2 and kernel size 2.

---
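
Before wiring pooling into the graph, it may help to see what a 2x2 max pool with stride 2 does to a small array. The NumPy function below is only an illustrative sketch of the operation for inputs with even height and width; it is not how TensorFlow implements it, and `max_pool_2x2` is a hypothetical name.

```python
import numpy as np

def max_pool_2x2(x):
    # x has shape (batch, height, width, channels) with even height and width.
    # Split each spatial axis into (blocks, 2) and take the max over each
    # non-overlapping 2x2 block.
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

# A single 4x4 one-channel "image" holding the values 0..15 row by row.
x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled = max_pool_2x2(x)

print(pooled.shape)
print(pooled[0, :, :, 0])
```

Each spatial dimension is halved, just as with a stride-2 convolution, so two pooling layers again take a 28x28 input down to 7x7.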

In [19]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    train = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    
    valid = tf.constant(valid_dataset)
    test = tf.constant(test_dataset)

    # Variables.
    w1 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    b1 = tf.Variable(tf.zeros([depth]))
    
    w2 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[depth]))
    
    w3 = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    
    w4 = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape=[num_labels]))

    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, w1, [1, 1, 1, 1], padding='SAME') + b1
        hidden = tf.nn.relu(conv)
        maxpool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        conv = tf.nn.conv2d(maxpool, w2, [1, 1, 1, 1], padding='SAME') + b2
        hidden = tf.nn.relu(conv)
        maxpool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        shape = maxpool.get_shape().as_list()
        reshape = tf.reshape(maxpool, [shape[0], shape[1] * shape[2] * shape[3]])
        
        hidden = tf.nn.xw_plus_b(reshape, w3, b3)
        hidden = tf.nn.relu(hidden)
        
        return tf.nn.xw_plus_b(hidden, w4, b4)

    # Training computation.
    logits = model(train)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(valid))
    test_prediction = tf.nn.softmax(model(test))

In [20]:
nb_steps = 1001
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(nb_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {train : batch_data, labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
    print('Validation accuracy: %.1f%%' % accuracy(
            valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.178649
Minibatch accuracy: 6.2%
Minibatch loss at step 50: 2.189816
Minibatch accuracy: 6.2%
Minibatch loss at step 100: 1.224128
Minibatch accuracy: 56.2%
Minibatch loss at step 150: 1.375121
Minibatch accuracy: 62.5%
Minibatch loss at step 200: 0.750853
Minibatch accuracy: 75.0%
Minibatch loss at step 250: 0.603216
Minibatch accuracy: 81.2%
Minibatch loss at step 300: 0.711165
Minibatch accuracy: 81.2%
Minibatch loss at step 350: 0.956233
Minibatch accuracy: 75.0%
Minibatch loss at step 400: 0.539745
Minibatch accuracy: 87.5%
Minibatch loss at step 450: 0.989764
Minibatch accuracy: 68.8%
Minibatch loss at step 500: 0.467473
Minibatch accuracy: 81.2%
Minibatch loss at step 550: 0.420324
Minibatch accuracy: 87.5%
Minibatch loss at step 600: 0.609977
Minibatch accuracy: 81.2%
Minibatch loss at step 650: 0.727440
Minibatch accuracy: 81.2%
Minibatch loss at step 700: 0.924767
Minibatch accuracy: 68.8%
Minibatch loss at step 750: 0.714200
...

---
Problem 2
---------

Try to get the best performance you can using a convolutional net. For example, look at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, or try adding dropout and/or learning rate decay.

---
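
Dropout is one of the options mentioned above. As a rough NumPy sketch of the idea (not TensorFlow's implementation), each unit is kept with probability `keep_prob` and the kept units are scaled by `1 / keep_prob`, so the expected activation stays the same. The `dropout` function, the fixed seed, and the all-ones input are assumptions made for this illustration.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    # Keep each unit with probability keep_prob and scale the survivors by
    # 1 / keep_prob so the expected activation is unchanged.
    mask = rng.uniform(size=x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.RandomState(0)  # fixed seed, only for reproducibility
hidden = np.ones((4, 8), dtype=np.float32)

train_out = dropout(hidden, 0.5, rng)  # entries are either 0.0 or 2.0
eval_out = dropout(hidden, 1.0, rng)   # keep_prob = 1.0 leaves the input unchanged

print(train_out)
print(eval_out)
```

This is why a keep probability of 1.0 is the natural choice at evaluation time: the layer then passes its input through unchanged.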

In [None]:
batch_size = 16
patch_size = 5
depth = 32
num_hidden = 1024

graph = tf.Graph()

with graph.as_default():

    # Input data.
    train = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    
    valid = tf.constant(valid_dataset)
    test = tf.constant(test_dataset)

    # Variables.
    w1 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
    b1 = tf.Variable(tf.zeros([depth]))
    
    w2 = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth * 2], stddev=0.1))
    b2 = tf.Variable(tf.constant(1.0, shape=[depth * 2]))
    
    w3 = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth * 2, num_hidden], stddev=0.1))
    b3 = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    
    w4 = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
    b4 = tf.Variable(tf.constant(1.0, shape=[num_labels]))
    
    # Dropout keep probability: fed as 0.5 during training and 1.0 for evaluation.
    keep_prob = tf.placeholder(tf.float32, shape=[])

    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, w1, [1, 1, 1, 1], padding='SAME') + b1
        hidden = tf.nn.relu(conv)
        maxpool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        conv = tf.nn.conv2d(maxpool, w2, [1, 1, 1, 1], padding='SAME') + b2
        hidden = tf.nn.relu(conv)
        maxpool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        shape = maxpool.get_shape().as_list()
        reshape = tf.reshape(maxpool, [shape[0], shape[1] * shape[2] * shape[3]])
        
        hidden = tf.nn.xw_plus_b(reshape, w3, b3)
        hidden = tf.nn.relu(hidden)
        dropout = tf.nn.dropout(hidden, keep_prob)
        
        hidden = tf.nn.xw_plus_b(dropout, w4, b4)
        
        return hidden
    

    # Training computation.
    logits = model(train)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # Optimizer.
    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(1e-4, global_step, 500, 0.96)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(valid))
    test_prediction = tf.nn.softmax(model(test))
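
To get a feel for the learning rate schedule used in this graph, the decay can be reproduced in plain Python. The sketch assumes the non-staircase form, in which the rate is multiplied by `decay_rate` raised to `step / decay_steps`; the local `exponential_decay` function is a stand-in written for illustration, not the TensorFlow op itself.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate):
    # Continuous (non-staircase) decay:
    # initial_rate * decay_rate ** (step / decay_steps).
    return initial_rate * decay_rate ** (float(step) / decay_steps)

# Same hyperparameters as in the graph above.
for step in (0, 500, 10000, 20000):
    print(step, exponential_decay(1e-4, step, 500, 0.96))
```

With these settings the rate shrinks by 4% every 500 steps, so it decays slowly over a 20001-step run.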

In [None]:
nb_steps = 20001
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(nb_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {train : batch_data, labels : batch_labels, keep_prob: 0.5}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
    print('Validation accuracy: %.1f%%' % accuracy(
            valid_prediction.eval({keep_prob: 1.0}), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval({keep_prob: 1.0}), test_labels))

Initialized
Minibatch loss at step 0: 24.463163
Minibatch accuracy: 12.5%
Minibatch loss at step 50: 13.132477
Minibatch accuracy: 18.8%
Minibatch loss at step 100: 6.159635
Minibatch accuracy: 18.8%
Minibatch loss at step 150: 6.755270
Minibatch accuracy: 31.2%
Minibatch loss at step 200: 2.322510
Minibatch accuracy: 50.0%
Minibatch loss at step 250: 1.775133
Minibatch accuracy: 56.2%
Minibatch loss at step 300: 1.186511
Minibatch accuracy: 50.0%
Minibatch loss at step 350: 1.340028
Minibatch accuracy: 68.8%
Minibatch loss at step 400: 0.745281
Minibatch accuracy: 81.2%
Minibatch loss at step 450: 1.324034
Minibatch accuracy: 56.2%
Minibatch loss at step 500: 0.855545
Minibatch accuracy: 68.8%
Minibatch loss at step 550: 1.006528
Minibatch accuracy: 81.2%
Minibatch loss at step 600: 0.957653
Minibatch accuracy: 81.2%
Minibatch loss at step 650: 0.968413
Minibatch accuracy: 68.8%
Minibatch loss at step 700: 1.202350
Minibatch accuracy: 62.5%
Minibatch loss at step 750: 1.000320
...

Minibatch loss at step 6450: 0.551584
Minibatch accuracy: 81.2%
Minibatch loss at step 6500: 1.154112
Minibatch accuracy: 62.5%
Minibatch loss at step 6550: 0.769454
Minibatch accuracy: 75.0%
Minibatch loss at step 6600: 0.489729
Minibatch accuracy: 81.2%
Minibatch loss at step 6650: 0.677721
Minibatch accuracy: 87.5%
Minibatch loss at step 6700: 0.333930
Minibatch accuracy: 87.5%
Minibatch loss at step 6750: 1.487664
Minibatch accuracy: 75.0%
Minibatch loss at step 6800: 0.245672
Minibatch accuracy: 93.8%
Minibatch loss at step 6850: 1.171322
Minibatch accuracy: 81.2%
Minibatch loss at step 6900: 0.783109
Minibatch accuracy: 68.8%
Minibatch loss at step 6950: 0.254762
Minibatch accuracy: 93.8%
Minibatch loss at step 7000: 0.383523
Minibatch accuracy: 93.8%
Minibatch loss at step 7050: 0.832189
Minibatch accuracy: 62.5%
Minibatch loss at step 7100: 0.775929
Minibatch accuracy: 75.0%
Minibatch loss at step 7150: 0.672980
Minibatch accuracy: 81.2%
Minibatch loss at step 7200: 0.446890
...

Minibatch loss at step 12850: 0.744735
Minibatch accuracy: 75.0%
Minibatch loss at step 12900: 0.687839
Minibatch accuracy: 68.8%
Minibatch loss at step 12950: 0.903536
Minibatch accuracy: 56.2%
Minibatch loss at step 13000: 0.523481
Minibatch accuracy: 81.2%
Minibatch loss at step 13050: 0.256481
Minibatch accuracy: 93.8%
Minibatch loss at step 13100: 0.758519
Minibatch accuracy: 75.0%
Minibatch loss at step 13150: 0.568899
Minibatch accuracy: 93.8%
Minibatch loss at step 13200: 0.042032
Minibatch accuracy: 100.0%
Minibatch loss at step 13250: 1.497740
Minibatch accuracy: 68.8%
Minibatch loss at step 13300: 0.204004
Minibatch accuracy: 93.8%
Minibatch loss at step 13350: 0.914136
Minibatch accuracy: 68.8%
Minibatch loss at step 13400: 0.604637
Minibatch accuracy: 81.2%
Minibatch loss at step 13450: 0.211322
Minibatch accuracy: 93.8%
Minibatch loss at step 13500: 0.715142
Minibatch accuracy: 81.2%
Minibatch loss at step 13550: 0.439647
Minibatch accuracy: 81.2%
...

Minibatch loss at step 19150: 0.710142
Minibatch accuracy: 81.2%
Minibatch loss at step 19200: 0.755698
Minibatch accuracy: 75.0%
Minibatch loss at step 19250: 0.053061
Minibatch accuracy: 100.0%
Minibatch loss at step 19300: 0.296963
Minibatch accuracy: 93.8%
Minibatch loss at step 19350: 0.123619
Minibatch accuracy: 100.0%
Minibatch loss at step 19400: 0.234892
Minibatch accuracy: 87.5%
Minibatch loss at step 19450: 1.022528
Minibatch accuracy: 68.8%
Minibatch loss at step 19500: 0.327940
Minibatch accuracy: 81.2%
Minibatch loss at step 19550: 0.332659
Minibatch accuracy: 93.8%
Minibatch loss at step 19600: 0.275607
Minibatch accuracy: 93.8%
Minibatch loss at step 19650: 0.300744
Minibatch accuracy: 93.8%
Minibatch loss at step 19700: 0.369276
Minibatch accuracy: 87.5%
Minibatch loss at step 19750: 1.242403
Minibatch accuracy: 68.8%
Minibatch loss at step 19800: 0.795265
Minibatch accuracy: 87.5%
Minibatch loss at step 19850: 0.257094
Minibatch accuracy: 93.8%
...