Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [4]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [5]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [6]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
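
The one-hot step in `reformat` relies on NumPy broadcasting, which can be hard to read at first glance. The sketch below replays the same two transformations on toy data (the three labels and the all-zero images are made up for illustration), so the resulting shapes can be checked by hand.

```python
import numpy as np

image_size = 28
num_labels = 10
num_channels = 1

# Toy stand-ins for the real data: three blank images and three class indices.
toy_images = np.zeros((3, image_size, image_size))
toy_labels = np.array([0, 3, 9])

# Add the trailing channel axis that the convolution expects.
cube = toy_images.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)

# toy_labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Broadcasting the comparison gives a (3, 10) boolean matrix with exactly one
# True per row, in the column equal to that row's class index.
one_hot = (np.arange(num_labels) == toy_labels[:, None]).astype(np.float32)

print(cube.shape)              # (3, 28, 28, 1)
print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # [0 3 9]
```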


In [7]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
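
As a quick sanity check of `accuracy`, the sketch below applies the same definition to a hand-made batch of four predictions over three classes. The arrays are invented for illustration; one of the four rows is deliberately misclassified.

```python
import numpy as np

def accuracy(predictions, labels):
    # Same definition as in the cell above: percentage of rows whose
    # highest-scoring class matches the one-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

toy_predictions = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.3, 0.4, 0.3],   # argmax is class 1 ...
                            [0.2, 0.2, 0.6]])
toy_labels = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [1, 0, 0],              # ... but the label is class 0
                       [0, 0, 1]], dtype=np.float32)

print(accuracy(toy_predictions, toy_labels))  # 75.0
```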

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.
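
Before reading the model code, it may help to trace how the image shrinks through the two convolutions. The sketch below assumes the settings used in the next cell (stride 2, `padding='SAME'`, depth 16) and uses the rule that, with `SAME` padding, the output size is the input size divided by the stride, rounded up. The helper name `same_padding_output_size` is made up for this illustration.

```python
import math

def same_padding_output_size(input_size, stride):
    # With padding='SAME' the output size is ceil(input_size / stride),
    # regardless of the patch size.
    return int(math.ceil(input_size / float(stride)))

image_size = 28
depth = 16

after_conv1 = same_padding_output_size(image_size, 2)   # 28 -> 14
after_conv2 = same_padding_output_size(after_conv1, 2)  # 14 -> 7
flattened = after_conv2 * after_conv2 * depth           # inputs to the fully connected layer

print(after_conv1, after_conv2, flattened)  # 14 7 784
```

The flattened size is the number of rows the first fully connected weight matrix needs.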

In [1]:
'''
Notes to help others. :) Good luck!

The input image is 28 x 28 x 1, and each stochastic gradient descent batch contains 16 images (the batch size).

1) Convolution 1:

input: 16 x 28 x 28 x 1
  The convolution is applied to the image using a 5x5 patch moved with a 2x2 stride, so the image size decreases by a factor of 2:
  28 x 28 to 14 x 14
  Additionally, the depth (number of filters) is increased from 1 to 16, so the output is
  16 x 14 x 14 x 16 = (#images x image_size x image_size x #filters)

2) Convolution 2:

input: 16 x 14 x 14 x 16
  The convolution is applied using a 5x5 patch moved with a 2x2 stride, so the image size is reduced by a further factor of 2:
  14 x 14 to 7 x 7
  The depth is kept the same, so the number of filters remains 16, and the output is
  16 x 7 x 7 x 16

3) Hidden layer:

input: 16 x (7 x 7 x 16) = 16 x (28/4 x 28/4 x 16) = 16 x 784
   This has the same shape as the flattened 784-pixel input from assignment 2.
   X*W + B = (16 x 784) * (784 x #neural_nodes) + (#neural_nodes, broadcast over the 16 rows) = (16 x #neural_nodes)
   In the example below: #neural_nodes = num_hidden = 64

4) Output layer:

input: 16 x 64
    X*W + B = (16 x 64) * (64 x 10) + (10, broadcast over the 16 rows) = (16 x 10)
    The output is a 10-vector of logits (one score per class, laid out like the one-hot labels) for each of the 16 data points.
'''

def convDeepModel(batch_size,num_steps,L2_weight):
    
    image_size = 28
    num_channels = 1 # grayscale
    num_labels = 10  # classification
    # batch_size (number of images per stochastic gradient descent step) is taken from the function argument.
    patch_size = 5   # 5x5 patch
    depth = 16       # Number of filters or depth for convolution step
    num_hidden = 64  # Hidden network for the last step
    
    # Model.
    def model(data, weights, biases):
        
        # Convolution layer 1
        conv = tf.nn.conv2d(data, weights['conv1'],[1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['conv1'])
        
        # Convolution layer 2
        conv = tf.nn.conv2d(hidden, weights['conv2'],[1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['conv2'])
        
        # Reshape output of Conv2 - to prepare as input of hidden layer 3
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        ## Hidden layer 3
        hidden = tf.nn.relu(tf.matmul(reshape, weights['hidd3']) + biases['hidd3'])
        
        return tf.matmul(hidden, weights['out']) + biases['out']   

    graph = tf.Graph()
    with graph.as_default():
        
        # Input data. # shape = (16 x 28 x 28 x 1)
        tf_train_dataset = tf.placeholder(tf.float32,shape=(batch_size, image_size, image_size, num_channels))
        tf_train_labels = tf.placeholder(tf.float32,shape=(batch_size, num_labels)) # shape = (16 x 10)
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        
        # Store layers weight & bias
        weights = {
            'conv1': tf.Variable(tf.truncated_normal( 
                    [patch_size, patch_size, num_channels, depth], 
                    stddev=0.1)
                                ),          
            'conv2': tf.Variable(tf.truncated_normal(
                    [patch_size, patch_size, depth, depth], 
                    stddev=0.1)
                                ),         
            'hidd3': tf.Variable(tf.truncated_normal(
                    [(image_size // 4) * (image_size // 4) * depth, num_hidden], 
                    stddev=0.1)
                                ),
            'out': tf.Variable(tf.truncated_normal(
                    [num_hidden, num_labels], 
                    stddev=0.1)
                              )
        }
        biases = {
            'conv1': tf.Variable(tf.zeros([depth])),
            'conv2': tf.Variable(tf.constant(1.0, shape=[depth])),
            'hidd3': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
            'out': tf.Variable(tf.constant(1.0, shape=[num_labels]))
        }
        
        # Training computation.
        logits = model(tf_train_dataset, weights, biases)
        loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

        # Optimizer.
        optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        valid_prediction = tf.nn.softmax(model(tf_valid_dataset, weights, biases))
        test_prediction = tf.nn.softmax(model(tf_test_dataset, weights, biases))

    # num_steps is taken from the function argument.
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)

            if (step % 200 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))

        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

In [8]:
convDeepModel(batch_size=16,num_steps=1001,L2_weight=0.0)

Initialized
Minibatch loss at step 0: 3.460344
Minibatch accuracy: 12.5%
Validation accuracy: 12.6%
Minibatch loss at step 200: 0.510356
Minibatch accuracy: 81.2%
Validation accuracy: 77.1%
Minibatch loss at step 400: 0.727496
Minibatch accuracy: 75.0%
Validation accuracy: 79.9%
Minibatch loss at step 600: 0.617148
Minibatch accuracy: 87.5%
Validation accuracy: 81.5%
Minibatch loss at step 800: 0.328495
Minibatch accuracy: 93.8%
Validation accuracy: 81.8%
Minibatch loss at step 1000: 0.794244
Minibatch accuracy: 75.0%
Validation accuracy: 82.4%
Test accuracy: 89.4%


In [13]:
'''
Notes to help others. :) Good luck!

Added a 3rd convolution layer.
I kept the depth the same (16 = number of filters) and used a stride of 1x1, so the 7x7 images (from the previous layer)
do not get reduced any further. Well... since 7 is a prime number, no stride larger than 1 divides it evenly, so you can't
really do any better than that! I do not think adding another layer really helps at all, since I can't reduce the image size any further.
'''

def convDeepModel_2(batch_size,num_steps,L2_weight):
    
    image_size = 28
    num_channels = 1 # grayscale
    num_labels = 10  # classification
    # batch_size (number of images per stochastic gradient descent step) is taken from the function argument.
    patch_size = 5   # 5x5 patch
    depth = 16       # Number of filters or depth for convolution step
    num_hidden = 64  # Hidden network for the last step
    
    # Model.
    def model(data, weights, biases):
        
        # Convolution layer 1
        conv = tf.nn.conv2d(data, weights['conv1'],[1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['conv1'])
        
        # Convolution layer 2
        conv = tf.nn.conv2d(hidden, weights['conv2'],[1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['conv2'])
        
        # Convolution layer 3
        conv = tf.nn.conv2d(hidden, weights['conv3'],[1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['conv3'])
        
        # Reshape output of Conv3 - to prepare as input of hidden layer 3
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        ## Hidden layer 3
        hidden = tf.nn.relu(tf.matmul(reshape, weights['hidd3']) + biases['hidd3'])
        
        return tf.matmul(hidden, weights['out']) + biases['out']   

    graph = tf.Graph()
    with graph.as_default():
        
        # Input data. # shape = (16 x 28 x 28 x 1)
        tf_train_dataset = tf.placeholder(tf.float32,shape=(batch_size, image_size, image_size, num_channels))
        tf_train_labels = tf.placeholder(tf.float32,shape=(batch_size, num_labels)) # shape = (16 x 10)
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        
        # Store layers weight & bias
        weights = {
            'conv1': tf.Variable(tf.truncated_normal( 
                    [patch_size, patch_size, num_channels, depth], 
                    stddev=0.1)
                                ),          
            'conv2': tf.Variable(tf.truncated_normal(
                    [patch_size, patch_size, depth, depth], 
                    stddev=0.1)
                                ),  
            'conv3': tf.Variable(tf.truncated_normal(
                    [patch_size, patch_size, depth, depth], 
                    stddev=0.1)
                                ),  
            'hidd3': tf.Variable(tf.truncated_normal(
                    [(image_size // 4) * (image_size // 4) * depth, num_hidden], 
                    stddev=0.1)
                                ),
            'out': tf.Variable(tf.truncated_normal(
                    [num_hidden, num_labels], 
                    stddev=0.1)
                              )
        }
        biases = {
            'conv1': tf.Variable(tf.zeros([depth])),
            'conv2': tf.Variable(tf.constant(1.0, shape=[depth])),
            'conv3': tf.Variable(tf.constant(1.0, shape=[depth])),
            'hidd3': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
            'out': tf.Variable(tf.constant(1.0, shape=[num_labels]))
        }
        
        # Training computation.
        logits = model(tf_train_dataset, weights, biases)
        loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

        # Optimizer.
        optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        valid_prediction = tf.nn.softmax(model(tf_valid_dataset, weights, biases))
        test_prediction = tf.nn.softmax(model(tf_test_dataset, weights, biases))

    # num_steps is taken from the function argument.
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)

            if (step % 200 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))

        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
        
convDeepModel_2(batch_size=16,num_steps=1001,L2_weight=0.0)

Initialized
Minibatch loss at step 0: 2.935316
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 200: 0.597066
Minibatch accuracy: 87.5%
Validation accuracy: 77.3%
Minibatch loss at step 400: 0.791855
Minibatch accuracy: 75.0%
Validation accuracy: 80.6%
Minibatch loss at step 600: 0.446352
Minibatch accuracy: 81.2%
Validation accuracy: 81.7%
Minibatch loss at step 800: 0.202790
Minibatch accuracy: 100.0%
Validation accuracy: 82.2%
Minibatch loss at step 1000: 0.751617
Minibatch accuracy: 68.8%
Validation accuracy: 83.1%
Test accuracy: 89.7%


-----------------------------------------
One can probably do better by adding methods such as dropout or L2 regularization, or by applying PCA to the images before proceeding with convolution. Also, one can add more hidden layers if one wants to.
-----------------------------------------
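
The models above accept an `L2_weight` argument but never use it. As a rough NumPy sketch of what the two regularizers mentioned here compute (not the notebook's implementation): the L2 penalty follows the `sum(w ** 2) / 2` convention of `tf.nn.l2_loss`, and the dropout helper uses the "inverted" form that rescales kept activations by `1 / keep_prob`, as `tf.nn.dropout` does. The helper names, toy weights, and `data_loss` value are hypothetical.

```python
import numpy as np

def l2_penalty(weight_arrays, L2_weight):
    # L2_weight times the sum over all weight arrays of sum(w ** 2) / 2.
    return L2_weight * sum(0.5 * np.sum(np.square(w)) for w in weight_arrays)

def dropout(activations, keep_prob, rng):
    # Keep each activation with probability keep_prob and rescale the
    # survivors so the expected value of the output is unchanged.
    mask = rng.uniform(size=activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)

# Toy weights shaped like the 'hidd3' and 'out' matrices of the models above.
toy_weights = [rng.normal(scale=0.1, size=(784, 64)),
               rng.normal(scale=0.1, size=(64, 10))]

data_loss = 0.75  # hypothetical cross-entropy value, for illustration only
total_loss = data_loss + l2_penalty(toy_weights, L2_weight=1e-3)

hidden = np.ones((16, 64))
dropped = dropout(hidden, keep_prob=0.5, rng=rng)  # entries are 0.0 or 2.0
```

Dropout would be applied only while training; validation and test predictions use the activations unchanged.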

---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---
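
To make the operation concrete, here is a minimal NumPy sketch of max pooling with kernel size 2 and stride 2 on a single 2-D image. It assumes an even height and width and is only an illustration of what the pooling computes, not a replacement for `nn.max_pool()`; the function name `max_pool_2x2` and the toy array are made up.

```python
import numpy as np

def max_pool_2x2(image):
    # image: 2-D array with even height and width.
    h, w = image.shape
    # Split the image into non-overlapping 2x2 blocks, then take each block's maximum.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

toy = np.array([[1, 2, 5, 6],
                [3, 4, 7, 8],
                [9, 8, 1, 0],
                [7, 6, 3, 2]])

print(max_pool_2x2(toy))
# [[4 8]
#  [9 3]]
```

Each pooling step halves the height and width, so two of them take a 28 x 28 image to 7 x 7, the same reduction the stride-2 convolutions gave.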

In [44]:
def convDeepModel_3(batch_size,num_steps,L2_weight):
    
    image_size = 28
    num_channels = 1 # grayscale
    num_labels = 10  # classification
    patch_size = 5   # 5x5 patch
    depth = 16       # Number of filters or depth for convolution step
    num_hidden = 64  # Hidden network for the last step

    def conv2d(x, W, b, strides=1):
        x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
        x = tf.nn.bias_add(x, b)
        return tf.nn.relu(x)

    def maxpool2d(x, k=2):
        # MaxPool2D wrapper
        return tf.nn.max_pool(x, ksize=[1, k, k, 1], 
                              strides=[1, k, k, 1],
                              padding='SAME')
    
    # Model.
    def model(data, weights, biases):
        
        # Convolution layer 1
        conv = conv2d(data, weights['conv1'],biases['conv1'],strides=1)
        pool = maxpool2d(conv, k=2)
        
        # Convolution layer 2
        conv = conv2d(pool, weights['conv2'],biases['conv2'],strides=1)
        pool = maxpool2d(conv, k=2)
        
        # Reshape output of Conv2 - to prepare as input of hidden layer 3
        shape = pool.get_shape().as_list()
        reshape = tf.reshape(pool, [shape[0], shape[1] * shape[2] * shape[3]])
        ## Hidden layer 3
        hidden = tf.nn.relu(tf.matmul(reshape, weights['hidd3']) + biases['hidd3'])
        
        return tf.matmul(hidden, weights['out']) + biases['out']   

    graph = tf.Graph()
    with graph.as_default():
        
        # Input data. # shape = (16 x 28 x 28 x 1)
        tf_train_dataset = tf.placeholder(tf.float32,shape=(batch_size, image_size, image_size, num_channels))
        tf_train_labels = tf.placeholder(tf.float32,shape=(batch_size, num_labels)) # shape = (16 x 10)
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        
        # Store layers weight & bias
        weights = {
            'conv1': tf.Variable(tf.truncated_normal( 
                    [patch_size, patch_size, num_channels, depth], 
                    stddev=0.1)
                                ),          
            'conv2': tf.Variable(tf.truncated_normal(
                    [patch_size, patch_size, depth, depth], 
                    stddev=0.1)
                                ),         
            'hidd3': tf.Variable(tf.truncated_normal(
                    [(image_size // 4) * (image_size // 4) * depth, num_hidden],  # two 2x2 pools: 28 -> 14 -> 7
                    stddev=0.1)
                                ),
            'out': tf.Variable(tf.truncated_normal(
                    [num_hidden, num_labels], 
                    stddev=0.1)
                              )
        }
        biases = {
            'conv1': tf.Variable(tf.zeros([depth])),
            'conv2': tf.Variable(tf.constant(1.0, shape=[depth])),
            'hidd3': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
            'out': tf.Variable(tf.constant(1.0, shape=[num_labels]))
        }
        
        # Training computation.
        logits = model(tf_train_dataset, weights, biases)
        loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

        # Optimizer.
        optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

        # Predictions for the training, validation, and test data.
        train_prediction = tf.nn.softmax(logits)
        valid_prediction = tf.nn.softmax(model(tf_valid_dataset, weights, biases))
        test_prediction = tf.nn.softmax(model(tf_test_dataset, weights, biases))

    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)

            if (step % 200 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))

        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

In [46]:
convDeepModel_3(batch_size=16,num_steps=2001,L2_weight=0.0)
convDeepModel_3(batch_size=16,num_steps=1001,L2_weight=0.0)

Initialized
Minibatch loss at step 0: 3.576232
Minibatch accuracy: 12.5%
Validation accuracy: 10.0%
Minibatch loss at step 200: 0.463078
Minibatch accuracy: 81.2%
Validation accuracy: 78.1%
Minibatch loss at step 400: 0.709035
Minibatch accuracy: 75.0%
Validation accuracy: 80.4%
Minibatch loss at step 600: 0.532466
Minibatch accuracy: 87.5%
Validation accuracy: 82.5%
Minibatch loss at step 800: 0.390249
Minibatch accuracy: 93.8%
Validation accuracy: 83.9%
Minibatch loss at step 1000: 0.718362
Minibatch accuracy: 75.0%
Validation accuracy: 84.3%
Minibatch loss at step 1200: 0.086665
Minibatch accuracy: 100.0%
Validation accuracy: 85.2%
Minibatch loss at step 1400: 0.576786
Minibatch accuracy: 81.2%
Validation accuracy: 85.5%
Minibatch loss at step 1600: 0.866188
Minibatch accuracy: 68.8%
Validation accuracy: 85.5%
Minibatch loss at step 1800: 0.484554
Minibatch accuracy: 81.2%
Validation accuracy: 86.2%
Minibatch loss at step 2000: 0.716943
Minibatch accuracy: 81.2%
Validation accuracy:

---
Problem 2
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---
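
For the learning rate decay suggestion, the schedule can be previewed in plain Python before it is wired into a model. The sketch below follows the formula documented for `tf.train.exponential_decay`, `learning_rate * decay_rate ** (global_step / decay_steps)`. The starting rate of 0.05 matches the optimizer used above; `decay_steps=1000` and `decay_rate=0.5` are arbitrary values chosen for illustration.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    # staircase=True uses integer division, so the rate drops in discrete
    # jumps every decay_steps instead of decaying smoothly.
    if staircase:
        exponent = step // decay_steps
    else:
        exponent = step / float(decay_steps)
    return initial_rate * decay_rate ** exponent

# With decay_rate=0.5 the learning rate halves every 1000 steps.
for step in (0, 500, 1000, 2000):
    smooth = exponential_decay(0.05, step, decay_steps=1000, decay_rate=0.5)
    stepped = exponential_decay(0.05, step, decay_steps=1000, decay_rate=0.5,
                                staircase=True)
    print(step, smooth, stepped)
```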