Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST_sanitized.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (195013, 28, 28) (195013,)
Validation set (9847, 28, 28) (9847,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (195013, 28, 28, 1) (195013, 10)
Validation set (9847, 28, 28, 1) (9847, 10)
Test set (10000, 28, 28, 1) (10000, 10)
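
The one-hot step in `reformat` relies on NumPy broadcasting: comparing a row of class indices against a column of labels gives a boolean matrix with exactly one `True` per row. A minimal sketch on three made-up labels (the small `labels` array and `num_labels = 4` are illustrative and not taken from the dataset):

```python
import numpy as np

num_labels = 4                 # illustrative; the notebook uses 10
labels = np.array([2, 0, 3])   # made-up class indices

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (4,),
# so the comparison broadcasts to a (3, 4) boolean matrix.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot)
# argmax along each row recovers the original class index.
print(one_hot.argmax(axis=1))
```

This is also why `accuracy` can compare `np.argmax(predictions, 1)` with `np.argmax(labels, 1)`: the argmax of a one-hot row is the class index.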


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are computationally more expensive, so we'll limit the network's depth and its number of fully connected nodes.
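
The cell that follows uses stride-2 convolutions with `padding='SAME'`, for which the output height and width are the input size divided by the stride, rounded up. The shape arithmetic can be sketched in plain Python (the helper name `same_padding_output_size` is made up for this illustration):

```python
from math import ceil

def same_padding_output_size(input_size, stride):
    """Spatial output size of a conv or pool layer with padding='SAME'."""
    return int(ceil(input_size / float(stride)))

image_size = 28
depth = 16

size = image_size
for layer in range(2):  # two stride-2 convolutional layers
    size = same_padding_output_size(size, stride=2)
    print('after conv layer %d: %dx%dx%d' % (layer + 1, size, size, depth))

# Number of inputs to the fully connected layer after flattening.
flattened = size * size * depth
print('flattened features:', flattened)
```

For a 28x28 input this gives 14x14 and then 7x7 feature maps, so the fully connected layer sees 7 * 7 * 16 = 784 inputs, which matches the first dimension of `layer3_weights`.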

In [5]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.



In [6]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.749047
Minibatch accuracy: 18.8%
Validation accuracy: 12.1%
Minibatch loss at step 50: 1.759269
Minibatch accuracy: 37.5%
Validation accuracy: 61.4%
Minibatch loss at step 100: 0.563866
Minibatch accuracy: 87.5%
Validation accuracy: 68.1%
Minibatch loss at step 150: 0.461862
Minibatch accuracy: 81.2%
Validation accuracy: 76.3%
Minibatch loss at step 200: 0.974269
Minibatch accuracy: 62.5%
Validation accuracy: 74.4%
Minibatch loss at step 250: 0.452512
Minibatch accuracy: 93.8%
Validation accuracy: 78.8%
Minibatch loss at step 300: 0.497900
Minibatch accuracy: 87.5%
Validation accuracy: 77.6%
Minibatch loss at step 350: 0.708205
Minibatch accuracy: 75.0%
Validation accuracy: 79.2%
Minibatch loss at step 400: 0.434845
Minibatch accuracy: 87.5%
Validation accuracy: 79.8%
Minibatch loss at step 450: 1.419022
Minibatch accuracy: 68.8%
Validation accuracy: 77.5%
Minibatch loss at step 500: 0.527793
Minibatch accuracy: 87.5%
Validation accuracy: 80.7%
M
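
The training loop picks each minibatch with `offset = (step * batch_size) % (train_labels.shape[0] - batch_size)`, which walks through the training set and wraps around before a slice could run past the end. A small sketch with toy sizes (the sizes and the helper name `minibatch_offsets` are illustrative, not the notebook's):

```python
def minibatch_offsets(num_examples, batch_size, num_steps):
    """Yield the start index of each minibatch, as in the training loop."""
    for step in range(num_steps):
        yield (step * batch_size) % (num_examples - batch_size)

# Toy sizes chosen so that the wrap-around is visible.
offsets = list(minibatch_offsets(num_examples=10, batch_size=4, num_steps=5))
print(offsets)  # [0, 4, 2, 0, 4]

# Every slice [offset, offset + batch_size) stays inside the data.
print(all(offset + 4 <= 10 for offset in offsets))  # True
```

Because the offset is always smaller than `num_examples - batch_size`, every batch is full-sized, which the fixed-shape placeholders require. A side effect of this scheme is that the very last training example is never selected.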

---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`tf.nn.max_pool()`) of stride 2 and kernel size 2.

---

In [7]:
graph2 = tf.Graph()

with graph2.as_default():
    
    #Input
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape =(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    #Variables
    weights1 = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
    biases1 = tf.Variable(tf.zeros([depth]))
    weights2 = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
    biases2 = tf.Variable(tf.constant(1.0, shape=[depth]))
    weights3 = tf.Variable(tf.truncated_normal([(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
    biases3 = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    weights4 = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    biases4 = tf.Variable(tf.constant(1.0, shape=[num_labels]))
    
    #Model
    def model(data):
        conv = tf.nn.conv2d(data, weights1, [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases1)
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        conv = tf.nn.conv2d(pool, weights2, [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases2)
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        shape = pool.get_shape().as_list()
        reshape = tf.reshape(pool, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights3) + biases3)
        return tf.matmul(hidden, weights4) + biases4
    
    #Training computations
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))
    
    #Optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    
    #Predictions for the training, validation, and test data
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [8]:
num_steps = 1001

with tf.Session(graph=graph2) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 8.948732
Minibatch accuracy: 12.5%
Validation accuracy: 10.1%
Minibatch loss at step 50: 0.920574
Minibatch accuracy: 77.3%
Validation accuracy: 73.6%
Minibatch loss at step 100: 0.736072
Minibatch accuracy: 81.2%
Validation accuracy: 79.8%
Minibatch loss at step 150: 0.700567
Minibatch accuracy: 82.0%
Validation accuracy: 79.9%
Minibatch loss at step 200: 0.805390
Minibatch accuracy: 79.3%
Validation accuracy: 81.2%
Minibatch loss at step 250: 0.559883
Minibatch accuracy: 83.6%
Validation accuracy: 81.4%
Minibatch loss at step 300: 0.674942
Minibatch accuracy: 79.7%
Validation accuracy: 81.9%
Minibatch loss at step 350: 0.527532
Minibatch accuracy: 84.8%
Validation accuracy: 82.3%
Minibatch loss at step 400: 0.530665
Minibatch accuracy: 85.9%
Validation accuracy: 82.6%
Minibatch loss at step 450: 0.440252
Minibatch accuracy: 88.3%
Validation accuracy: 83.0%
Minibatch loss at step 500: 0.446531
Minibatch accuracy: 87.9%
Validation accuracy: 83.4%
M

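Replacing the stride-2 convolutions with stride-1 convolutions followed by 2x2 max pooling increased the accuracy: in the logs above, validation accuracy at step 500 is 83.4% with pooling versus 80.7% with strided convolutions.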
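
To make the pooling operation concrete, here is a plain-Python sketch of 2x2 max pooling with stride 2 on a single-channel feature map. It assumes even height and width, so padding does not matter. The function name `max_pool_2x2` and the sample values are made up for illustration, and this is not how TensorFlow implements `tf.nn.max_pool`.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    height, width = len(image), len(image[0])
    return [
        [max(image[r][c], image[r][c + 1],
             image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Each output value is the largest activation in its 2x2 window, so the spatial size halves just as with a stride-2 convolution, while the convolution itself is still evaluated at every position.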

---
Problem 2
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---

We train a simple model with two convolutional layers followed by three fully connected hidden layers and an output layer. Each convolutional layer is followed by a max pooling layer, and each fully connected hidden layer is followed by a dropout layer (keep probability 0.5, applied during training only) to reduce overfitting. We use the Adam optimizer with its default parameters.
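
Dropout as provided by `tf.nn.dropout` keeps each activation with probability `keep_prob` and scales the survivors by `1 / keep_prob`, which is why the evaluation path can simply skip the dropout step without any rescaling. A plain-Python sketch of that behaviour (the `dropout` function below is illustrative and not TensorFlow's implementation):

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and rescale the survivors."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.5, rng=rng)

print(sorted(set(dropped)))         # [0.0, 2.0]
print(sum(dropped) / len(dropped))  # close to 1.0, the original mean
```

With `keep_prob=0.5`, roughly half of the activations are zeroed and the rest are doubled, so the expected sum fed into the next layer is unchanged.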

In [9]:
from math import ceil

image_size = 28
batch_size = 128
patch_size = 5
depth1 = 32
depth2 = 64
num_hidden1 = 512
num_hidden2 = 256
num_hidden3 = 64
num_labels = 10
num_channels = 1 #grayscale

graph3 = tf.Graph()
with graph3.as_default():
    
    #Input
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape =(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    #Variables
    weights1 = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth1], stddev=0.01))
    biases1 = tf.Variable(tf.constant(0.0, shape=[depth1]))
    weights2 = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth1, depth2], stddev=0.01))
    biases2 = tf.Variable(tf.constant(0.0, shape=[depth2]))
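    # Spatial size after two 2x2 max pools with SAME padding (int() keeps this valid on Python 2 as well).
    max_pooling_size = int(ceil(ceil(image_size / 2.0) / 2.0))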
    weights3 = tf.Variable(tf.truncated_normal([max_pooling_size * max_pooling_size * depth2, num_hidden1], stddev=0.01))
    biases3 = tf.Variable(tf.constant(0.0, shape=[num_hidden1]))
    weights4 = tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=0.01))
    biases4 = tf.Variable(tf.constant(0.0, shape=[num_hidden2]))
    weights5 = tf.Variable(tf.truncated_normal([num_hidden2, num_hidden3], stddev=0.01))
    biases5 = tf.Variable(tf.constant(0.0, shape=[num_hidden3]))
    weights6 = tf.Variable(tf.truncated_normal([num_hidden3, num_labels], stddev=0.01))
    biases6 = tf.Variable(tf.constant(0.0, shape=[num_labels]))
    
    #Model
    def model(data, mode):
        if mode not in ('train', 'test'):
            raise ValueError('"mode" argument should be either "train" or "test".')
        # Dropout is applied in 'train' mode only; 'test' mode uses all units.
        training = (mode == 'train')
        conv = tf.nn.conv2d(data, weights1, [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases1)
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        conv = tf.nn.conv2d(pool, weights2, [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases2)
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        shape = pool.get_shape().as_list()
        hidden = tf.reshape(pool, [shape[0], shape[1] * shape[2] * shape[3]])
        # Three fully connected hidden layers, each followed by dropout when training.
        for weights, biases in ((weights3, biases3), (weights4, biases4), (weights5, biases5)):
            hidden = tf.nn.relu(tf.matmul(hidden, weights) + biases)
            if training:
                hidden = tf.nn.dropout(hidden, 0.5)
        return tf.matmul(hidden, weights6) + biases6
    
    #Training Computations
    logits = model(tf_train_dataset, 'train')
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=tf_train_labels))
    
    #Optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08).minimize(loss)
    
    #Predictions
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset, 'test'))
    test_prediction = tf.nn.softmax(model(tf_test_dataset, 'test'))

In [10]:
num_steps = 6001

with tf.Session(graph=graph3) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 500 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.302585
Minibatch accuracy: 13.3%
Validation accuracy: 9.2%
Minibatch loss at step 500: 0.468023
Minibatch accuracy: 87.5%
Validation accuracy: 85.4%
Minibatch loss at step 1000: 0.503305
Minibatch accuracy: 89.8%
Validation accuracy: 87.6%
Minibatch loss at step 1500: 0.397710
Minibatch accuracy: 86.7%
Validation accuracy: 89.0%
Minibatch loss at step 2000: 0.384518
Minibatch accuracy: 86.7%
Validation accuracy: 89.7%
Minibatch loss at step 2500: 0.346290
Minibatch accuracy: 91.4%
Validation accuracy: 90.3%
Minibatch loss at step 3000: 0.333045
Minibatch accuracy: 89.1%
Validation accuracy: 90.4%
Minibatch loss at step 3500: 0.262830
Minibatch accuracy: 88.3%
Validation accuracy: 90.4%
Minibatch loss at step 4000: 0.564753
Minibatch accuracy: 82.8%
Validation accuracy: 90.9%
Minibatch loss at step 4500: 0.340270
Minibatch accuracy: 89.8%
Validation accuracy: 91.2%
Minibatch loss at step 5000: 0.353010
Minibatch accuracy: 89.8%
Validation accuracy
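
The step-0 loss of 2.302585 in the log above is a useful sanity check. With weights drawn with `stddev=0.01` and zero biases, the initial logits are close to zero, the softmax is close to uniform over the 10 classes, and the cross-entropy is then ln(10). A plain-Python sketch of that arithmetic (the helper `softmax_cross_entropy` is written for this illustration and is not the TensorFlow op):

```python
import math

def softmax_cross_entropy(logits, label_index):
    """Cross-entropy of softmax(logits) against a one-hot label."""
    largest = max(logits)
    shifted = [z - largest for z in logits]  # shift for numerical stability
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[label_index]

num_labels = 10
zero_logits = [0.0] * num_labels  # roughly what near-zero weights produce
loss = softmax_cross_entropy(zero_logits, label_index=3)
print('%.6f' % loss)  # 2.302585, i.e. ln(10)
```

An initial loss far from ln(10) on a 10-class problem would suggest that the initialization gives some classes a strong head start, as in the earlier runs where the biases start at 1.0 and the weights use `stddev=0.1`.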

The test set accuracy is __96.6__ percent, which is slightly higher than that of the model from the previous assignment, which used fully connected layers only. The performance could likely be improved further by tuning the hyperparameters or adding layers to make the network deeper.
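
One tuning option named in the problem statement but not tried in the run above is learning rate decay. The sketch below shows the exponential schedule that `tf.train.exponential_decay` computes, written in plain Python. The starting rate, `decay_steps`, and `decay_rate` values are hypothetical and have not been tuned for this model.

```python
from __future__ import division

def exponential_decay(base_rate, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates under exponential decay."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return base_rate * decay_rate ** exponent

# Hypothetical schedule: start at 0.001 and multiply by 0.9 every 1000 steps.
for step in (0, 1000, 3000, 6000):
    print(step, exponential_decay(0.001, step, decay_steps=1000, decay_rate=0.9))
```

With `staircase=True` the rate drops in discrete jumps every `decay_steps` updates instead of shrinking smoothly.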