Deep Learning with Tensorflow
=============

Assignment II
------------

During one of the lectures in [Lab 1](https://deep-learning-su.github.io/labs/lab-1/) we trained fully connected network to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters. 

The goal of this assignment is make the neural network convolutional.

For this exercise, you would need the `notMNIST.pickle` created in `Lab 1`. You can obtain it by rerunning the given paragraphs without having to solve the problems (although it is highly recommended to do it if you haven't already).

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range
import os

In [2]:
data_dir = 'data/'
pickle_file = os.path.join(data_dir, 'notMNIST.pickle')

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

## Problem 1
Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

Edit the snippet bellow by changing the `model` function.

### 1.1 - Define the model
Implement the `model` function bellow. Take a look at the following TF functions:
- **tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = 'SAME'):** given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters on X. The third input ([1,f,f,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation [here](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
- **tf.nn.relu(Z1):** computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/relu)

### 1.2 - Compute loss

Implement the `compute_loss` function below. You might find these two functions helpful: 

- **tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y):** computes the softmax entropy loss. This function both computes the softmax activation function as well as the resulting loss. You can check the full documentation  [here.](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
- **tf.reduce_mean:** computes the mean of elements across dimensions of a tensor. Use this to sum the losses over all the examples to get the overall cost. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/reduce_mean)


In [5]:
def weight_variable(shape):
  # uses default std. deviation
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  # uses default bias
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

In [6]:
def compute_loss(labels, logits):
    entropy_loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
    return tf.reduce_mean(entropy_loss)

In [7]:
batch_size = 16
patch_size = 5
depth = 16 # Number of filters?
num_hidden = 64 # Size of the fully connected layer?

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Convolutional Layer 1
  W_conv_1 = weight_variable([patch_size, patch_size, 1, depth])
  W_bias_1 = bias_variable([depth])

  # Convolutional Layer 2
  W_conv_2 = weight_variable([patch_size, patch_size, depth, depth])
  W_bias_2 = bias_variable([depth])
  
  # Dense Layer 1
  W_dense_1 = weight_variable([7 * 7 * 16, num_hidden])
  W_dense_1_bias = bias_variable([num_hidden])

  # Dense Layer 2
  W_dense_2 = weight_variable([num_hidden, num_labels])
  W_dense_2_bias = bias_variable([num_labels])
  
  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)

    conv_1_out = tf.nn.relu(tf.nn.conv2d(data, W_conv_1, [1, 2, 2, 1], 'SAME') + W_bias_1)
    conv_2_out = tf.nn.relu(tf.nn.conv2d(conv_1_out, W_conv_2, [1, 2, 2, 1], 'SAME') + W_bias_2)

    # Flatten
    flat = tf.reshape(conv_2_out, [-1, 7 * 7 * 16])

    dense_1_out = tf.nn.relu(tf.matmul(flat, W_dense_1) + W_dense_1_bias)   
    logits = tf.matmul(dense_1_out, W_dense_2) + W_dense_2_bias

    return logits

  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.



### 1.3 - Measure the accuracy and tune your model

Run the snippet bellow to measure the accuracy of your model. Try to achieve a test accuracy of around 80%. Iterate on the filters size.

In [8]:
num_steps = 1001

def train(graph, optimizer, train_prediction, valid_prediction, test_prediction):
  with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()

    print('Initialized')
    
    for step in range(num_steps):
      offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
      batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
      batch_labels = train_labels[offset:(offset + batch_size), :]
      feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
      _, l, predictions = session.run(
        [optimizer, loss, train_prediction], feed_dict=feed_dict)
      if (step % 50 == 0):
        print(f'Minibatch loss at step {step}: {l}')
        print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
        print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

train(graph, optimizer, train_prediction, valid_prediction, test_prediction)

Initialized
Minibatch loss at step 0: 2.455214262008667
Minibatch accuracy: 6.2%
Validation accuracy: 11.4%
Minibatch loss at step 50: 1.0963480472564697
Minibatch accuracy: 56.2%
Validation accuracy: 55.8%
Minibatch loss at step 100: 0.524612545967102
Minibatch accuracy: 93.8%
Validation accuracy: 75.1%
Minibatch loss at step 150: 1.0666847229003906
Minibatch accuracy: 68.8%
Validation accuracy: 77.2%
Minibatch loss at step 200: 0.4244617223739624
Minibatch accuracy: 87.5%
Validation accuracy: 78.1%
Minibatch loss at step 250: 1.2042909860610962
Minibatch accuracy: 68.8%
Validation accuracy: 78.4%
Minibatch loss at step 300: 0.722869873046875
Minibatch accuracy: 81.2%
Validation accuracy: 80.3%
Minibatch loss at step 350: 0.6444584131240845
Minibatch accuracy: 81.2%
Validation accuracy: 79.1%
Minibatch loss at step 400: 0.40057799220085144
Minibatch accuracy: 87.5%
Validation accuracy: 80.5%
Minibatch loss at step 450: 0.37450891733169556
Minibatch accuracy: 87.5%
Validation accuracy:

---
Problem 2
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---

In [9]:
batch_size = 16
patch_size = 5
depth = 16 # Number of filters?
num_hidden = 64 # Size of the fully connected layer?

graph_pool = tf.Graph()

with graph_pool.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Convolutional Layer 1
  W_conv_1 = weight_variable([patch_size, patch_size, 1, depth])
  W_bias_1 = bias_variable([depth])

  # Convolutional Layer 2
  W_conv_2 = weight_variable([patch_size, patch_size, depth, depth])
  W_bias_2 = bias_variable([depth])
  
  # Dense Layer 1
  W_dense_1 = weight_variable([7 * 7 * 16, num_hidden])
  W_dense_1_bias = bias_variable([num_hidden])

  # Dense Layer 2
  W_dense_2 = weight_variable([num_hidden, num_labels])
  W_dense_2_bias = bias_variable([num_labels])
  
  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)

    # print(data.shape)

    conv_1_out = tf.nn.relu(tf.nn.conv2d(data, W_conv_1, [1, 1, 1, 1], 'SAME') + W_bias_1)
    pool_1_out = tf.nn.max_pool(conv_1_out, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
    
    conv_2_out = tf.nn.relu(tf.nn.conv2d(pool_1_out, W_conv_2, [1, 1, 1, 1], 'SAME') + W_bias_2)
    pool_2_out = tf.nn.max_pool(conv_2_out, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')    

    # Flatten
    flat = tf.reshape(pool_2_out, [-1, 7 * 7 * 16])

    dense_1_out = tf.nn.relu(tf.matmul(flat, W_dense_1) + W_dense_1_bias)   
    logits = tf.matmul(dense_1_out, W_dense_2) + W_dense_2_bias

    return logits
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.005).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [10]:
train(graph_pool, optimizer, train_prediction, valid_prediction, test_prediction)

Initialized
Minibatch loss at step 0: 2.5468897819519043
Minibatch accuracy: 12.5%
Validation accuracy: 10.0%
Minibatch loss at step 50: 2.2210307121276855
Minibatch accuracy: 18.8%
Validation accuracy: 20.6%
Minibatch loss at step 100: 2.179996967315674
Minibatch accuracy: 18.8%
Validation accuracy: 31.3%
Minibatch loss at step 150: 2.063338279724121
Minibatch accuracy: 37.5%
Validation accuracy: 41.2%
Minibatch loss at step 200: 1.701377272605896
Minibatch accuracy: 68.8%
Validation accuracy: 48.9%
Minibatch loss at step 250: 1.7243231534957886
Minibatch accuracy: 43.8%
Validation accuracy: 54.4%
Minibatch loss at step 300: 1.4650031328201294
Minibatch accuracy: 62.5%
Validation accuracy: 62.5%
Minibatch loss at step 350: 1.0505726337432861
Minibatch accuracy: 68.8%
Validation accuracy: 65.7%
Minibatch loss at step 400: 0.9026930928230286
Minibatch accuracy: 75.0%
Validation accuracy: 70.5%
Minibatch loss at step 450: 0.8984693288803101
Minibatch accuracy: 75.0%
Validation accuracy: 

---
Problem 3
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---

In [11]:
# Convert images from size of 28x28 to 32x32 as per LeNet specification.
train_dataset = np.pad(train_dataset, ((0,0),(2,2),(2,2),(0,0)), 'constant')
valid_dataset = np.pad(valid_dataset, ((0,0),(2,2),(2,2),(0,0)), 'constant')
test_dataset = np.pad(test_dataset, ((0,0),(2,2),(2,2),(0,0)), 'constant')
print(train_dataset.shape)
print(valid_dataset.shape)
print(test_dataset.shape)

(200000, 32, 32, 1)
(10000, 32, 32, 1)
(10000, 32, 32, 1)


In [12]:
batch_size = 256
image_size = 32

conv_1_filter_size = 5
conv_1_filter_count = 20
conv_1_stride = 1

conv_2_filter_size = 5
conv_2_filter_count = 50
conv_2_stride = 1

flattened_size = 5 * 5 * 50
dense_1_count = 500
dense_2_count = 84

lenet_5 = tf.Graph()

with lenet_5.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Convolutional layer 1
  w_conv_1 = weight_variable([conv_1_filter_size, conv_1_filter_size, 1, conv_1_filter_count])
  bias_conv_1 = bias_variable([conv_1_filter_count])

  # Convolutional layer 2
  w_conv_2 = weight_variable([conv_2_filter_size, conv_2_filter_size, conv_1_filter_count, conv_2_filter_count])
  bias_conv_2 = bias_variable([conv_2_filter_count])

  # Dense layer 1
  w_dense_1 = weight_variable([flattened_size, dense_1_count])
  bias_dense_1 = bias_variable([dense_1_count])

  # Dense layer 2
  w_dense_2 = weight_variable([dense_1_count, dense_2_count])
  bias_dense_2 = bias_variable([dense_2_count])

  # Output layer
  w_output = weight_variable([dense_2_count, num_labels])
  bias_output = bias_variable([num_labels])

  # Model.
  def model(data):
    conv_1_out = tf.nn.relu(tf.nn.conv2d(data, w_conv_1, strides=[1, 1, 1, 1], padding='VALID') + bias_conv_1)
    # Tried also with avg_pool but max_pool performance tends to be better
    pool_1_out = tf.nn.max_pool(conv_1_out, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
    conv_2_out = tf.nn.relu(tf.nn.conv2d(pool_1_out, w_conv_2, strides=[1, 1, 1, 1], padding='VALID') + bias_conv_2)
    pool_2_out = tf.nn.max_pool(conv_2_out, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
    flat = tf.reshape(pool_2_out, [-1, flattened_size])
    dense_1_out = tf.nn.relu(tf.matmul(flat, w_dense_1) + bias_dense_1)
    dropout_1 = tf.nn.dropout(dense_1_out, rate=0.5)
    dense_2_out = tf.nn.relu(tf.matmul(dropout_1, w_dense_2) + bias_dense_2)    
    dropout_2 = tf.nn.dropout(dense_2_out, rate=0.5)
    logits = tf.matmul(dropout_2, w_output) + bias_output
    return logits

  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)

  # Optimizer.
  optimizer = tf.train.AdamOptimizer(learning_rate=0.0008).minimize(loss)

  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [13]:
train(lenet_5, optimizer, train_prediction, valid_prediction, test_prediction)

Initialized
Minibatch loss at step 0: 4.662407875061035
Minibatch accuracy: 10.2%
Validation accuracy: 11.3%
Minibatch loss at step 50: 1.237686038017273
Minibatch accuracy: 59.0%
Validation accuracy: 58.8%
Minibatch loss at step 100: 1.0148578882217407
Minibatch accuracy: 70.7%
Validation accuracy: 74.6%
Minibatch loss at step 150: 0.7051034569740295
Minibatch accuracy: 80.9%
Validation accuracy: 79.3%
Minibatch loss at step 200: 0.706531286239624
Minibatch accuracy: 77.7%
Validation accuracy: 80.7%
Minibatch loss at step 250: 0.5412888526916504
Minibatch accuracy: 84.4%
Validation accuracy: 81.7%
Minibatch loss at step 300: 0.5233900547027588
Minibatch accuracy: 82.4%
Validation accuracy: 83.3%
Minibatch loss at step 350: 0.5371547937393188
Minibatch accuracy: 83.6%
Validation accuracy: 83.8%
Minibatch loss at step 400: 0.5158592462539673
Minibatch accuracy: 83.6%
Validation accuracy: 84.5%
Minibatch loss at step 450: 0.6056300401687622
Minibatch accuracy: 82.8%
Validation accuracy: 