Deep Learning with Tensorflow
=============

Assignment II - Convolutional Neural Networks
------------

Previously in `20210322-lab-1-notmnist.ipynb`, we created a pickle with formatted datasets for training, development and testing on the [notMNIST dataset](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html).

The goal of this assignment is make the neural network convolutional.

First reload the data we generated in `20210322-lab-1-notmnist.ipynb`. If you already have the data downloaded, you can just load the saved pickel files. This would work if you are running the notebook locally in jupyter. Then you can just load the data with the following snippet:

```
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print 'Training set', train_dataset.shape, train_labels.shape
  print 'Validation set', valid_dataset.shape, valid_labels.shape
  print 'Test set', test_dataset.shape, test_labels.shape
```

If you are running in colab, then just continue with the next two paragraphs. We will redownload the files using the same python methods as in the first lab excercise.

In [1]:
!wget https://gist.githubusercontent.com/nktaushanov/7aa762a4e1370b5ad287e87595c6499e/raw/4e6ee948d963d4efe16a9452036c6e380d0b30db/download_notmnist.py

--2022-05-18 19:02:40--  https://gist.githubusercontent.com/nktaushanov/7aa762a4e1370b5ad287e87595c6499e/raw/4e6ee948d963d4efe16a9452036c6e380d0b30db/download_notmnist.py
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7285 (7.1K) [text/plain]
Saving to: ‘download_notmnist.py.3’


2022-05-18 19:02:40 (55.5 MB/s) - ‘download_notmnist.py.3’ saved [7285/7285]



In [2]:
import download_notmnist
train_dataset, train_labels, test_dataset, test_labels, valid_dataset, valid_labels = download_notmnist.run()
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Found and verified ./notMNIST_large.tar.gz
Found and verified ./notMNIST_small.tar.gz
./notMNIST_large already present - Skipping extraction of ./notMNIST_large.tar.gz.
['./notMNIST_large/A', './notMNIST_large/B', './notMNIST_large/C', './notMNIST_large/D', './notMNIST_large/E', './notMNIST_large/F', './notMNIST_large/G', './notMNIST_large/H', './notMNIST_large/I', './notMNIST_large/J']
./notMNIST_small already present - Skipping extraction of ./notMNIST_small.tar.gz.
['./notMNIST_small/A', './notMNIST_small/B', './notMNIST_small/C', './notMNIST_small/D', './notMNIST_small/E', './notMNIST_small/F', './notMNIST_small/G', './notMNIST_small/H', './notMNIST_small/I', './notMNIST_small/J']
./notMNIST_large/A.pickle already present - Skipping pickling.
./notMNIST_large/B.pickle already present - Skipping pickling.
./notMNIST_large/C.pickle already present - Skipping pickling.
./notMNIST_large/D.pickle already present - Skipping pickling.
./notMNIST_large/E.pickle already present - Skipping p

In [3]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [4]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [5]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

In [6]:
batch_size = 16
train_tf_dataset = tf.data.Dataset.from_tensor_slices(
    (train_dataset, train_labels)).shuffle(1000).batch(batch_size)

## Problem 1
Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

Edit the snippet bellow by changing the `model` function.

### 1.1 - Define the model
Implement the `model` function bellow. Take a look at the following TF functions:
- **tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = 'SAME'):** given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters on X. The third input ([1,f,f,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation [here](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
- **tf.nn.relu(Z1):** computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/relu)


In [7]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

layer1_weights = tf.Variable(tf.random.truncated_normal(
    [patch_size, patch_size, num_channels, depth], stddev=0.1))
layer1_biases = tf.Variable(tf.zeros([depth]))
layer2_weights = tf.Variable(tf.random.truncated_normal(
    [patch_size, patch_size, depth, depth], stddev=0.1))
layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
layer3_weights = tf.Variable(tf.random.truncated_normal(
    [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
layer4_weights = tf.Variable(tf.random.truncated_normal(
    [num_hidden, num_labels], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

def model(data):
  conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(conv + layer1_biases)
  conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(conv + layer2_biases)
  shape = hidden.get_shape().as_list()
  reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
  hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
  return tf.matmul(hidden, layer4_weights) + layer4_biases


### 1.2 - Compute loss

Implement the `compute_loss` function below. You might find these two functions helpful: 

- **tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y):** computes the softmax entropy loss. This function both computes the softmax activation function as well as the resulting loss. You can check the full documentation  [here.](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
- **tf.reduce_mean:** computes the mean of elements across dimensions of a tensor. Use this to sum the losses over all the examples to get the overall cost. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/reduce_mean)

In [8]:
def compute_loss(labels, logits):
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
  return loss

### 1.3 - Improve your model
Try to achieve a test accuracy of around 80%. Iterate on the filters size.

In [9]:
num_steps = 1000
display_step = 100
learning_rate = 0.05

optimizer = tf.keras.optimizers.SGD(learning_rate)

# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_tf_dataset.take(num_steps), 1):  
  # Training computation.
  with tf.GradientTape() as g:
    logits = model(batch_x)
    loss = compute_loss(batch_y, logits)
    
  # Optimizer.
  optimizer.minimize(loss, g.watched_variables(), tape=g)
  
  # Predictions for the training, validation, and test data.
  if step % display_step == 0:
    train_prediction = tf.nn.softmax(logits)
    train_acc = accuracy(train_prediction, batch_y)
    
    valid_prediction = tf.nn.softmax(model(valid_dataset))
    valid_acc = accuracy(valid_prediction, valid_labels)
    print("step: %i, loss: %f, train acc: %f, validation acc: %f" % (step, loss, train_acc, valid_acc))

test_prediction = tf.nn.softmax(model(test_dataset))
test_acc = accuracy(test_prediction, test_labels)
print("test acc: %f" % (test_acc))

step: 100, loss: 2.015269, train acc: 43.750000, validation acc: 54.930000
step: 200, loss: 0.746942, train acc: 68.750000, validation acc: 77.470000
step: 300, loss: 0.995147, train acc: 75.000000, validation acc: 79.000000
step: 400, loss: 0.592980, train acc: 81.250000, validation acc: 80.410000
step: 500, loss: 0.404649, train acc: 93.750000, validation acc: 81.440000
step: 600, loss: 0.814277, train acc: 81.250000, validation acc: 81.810000
step: 700, loss: 0.430593, train acc: 87.500000, validation acc: 81.680000
step: 800, loss: 0.676272, train acc: 87.500000, validation acc: 82.610000
step: 900, loss: 0.469291, train acc: 87.500000, validation acc: 82.720000
step: 1000, loss: 0.611028, train acc: 81.250000, validation acc: 83.050000
test acc: 89.680000


---
Problem 2
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---

In [10]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

layer1_weights = tf.Variable(tf.random.truncated_normal(
    [patch_size, patch_size, num_channels, depth], stddev=0.1))
layer1_biases = tf.Variable(tf.zeros([depth]))
layer2_weights = tf.Variable(tf.random.truncated_normal(
    [patch_size, patch_size, depth, depth], stddev=0.1))
layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
layer3_weights = tf.Variable(tf.random.truncated_normal(
    [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
layer4_weights = tf.Variable(tf.random.truncated_normal(
    [num_hidden, num_labels], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

def model_with_max_pool(data):
  conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
  pool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(pool + layer1_biases)
  conv = tf.nn.conv2d(hidden, layer2_weights, [1, 1, 1, 1], padding='SAME')
  pool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(pool + layer2_biases)
  shape = hidden.get_shape().as_list()
  reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
  hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
  return tf.matmul(hidden, layer4_weights) + layer4_biases

In [11]:
num_steps = 1000
display_step = 100
learning_rate = 0.05

optimizer = tf.keras.optimizers.SGD(learning_rate)

# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_tf_dataset.take(num_steps), 1):  
  # Training computation.
  with tf.GradientTape() as g:
    logits = model_with_max_pool(batch_x)
    loss = compute_loss(batch_y, logits)
    
  # Optimizer.
  optimizer.minimize(loss, g.watched_variables(), tape=g)
  
  # Predictions for the training, validation, and test data.
  if step % display_step == 0:
    train_prediction = tf.nn.softmax(logits)
    train_acc = accuracy(train_prediction, batch_y)
    
    valid_prediction = tf.nn.softmax(model(valid_dataset))
    valid_acc = accuracy(valid_prediction, valid_labels)
    print("step: %i, loss: %f, train acc: %f, validation acc: %f" % (step, loss, train_acc, valid_acc))

test_prediction = tf.nn.softmax(model(test_dataset))
test_acc = accuracy(test_prediction, test_labels)
print("test acc: %f" % (test_acc))

step: 100, loss: 1.355374, train acc: 56.250000, validation acc: 51.470000
step: 200, loss: 0.475660, train acc: 87.500000, validation acc: 66.670000
step: 300, loss: 0.294201, train acc: 87.500000, validation acc: 64.450000
step: 400, loss: 1.022111, train acc: 75.000000, validation acc: 70.030000
step: 500, loss: 0.342548, train acc: 87.500000, validation acc: 68.840000
step: 600, loss: 0.567398, train acc: 81.250000, validation acc: 73.830000
step: 700, loss: 0.328666, train acc: 93.750000, validation acc: 70.500000
step: 800, loss: 0.197961, train acc: 100.000000, validation acc: 72.620000
step: 900, loss: 0.769935, train acc: 81.250000, validation acc: 73.190000
step: 1000, loss: 0.666586, train acc: 75.000000, validation acc: 71.360000
test acc: 77.700000


---
Problem 3
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---

In [12]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

layer1_weights = tf.Variable(tf.random.truncated_normal(
    [patch_size, patch_size, num_channels, depth], stddev=0.1))
layer1_biases = tf.Variable(tf.zeros([depth]))
layer2_weights = tf.Variable(tf.random.truncated_normal(
    [patch_size, patch_size, depth, depth], stddev=0.1))
layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
layer3_weights = tf.Variable(tf.random.truncated_normal(
    [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
layer4_weights = tf.Variable(tf.random.truncated_normal(
    [num_hidden, num_labels], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

def model_with_dropout(data):
  conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
  pool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(pool + layer1_biases)
  conv = tf.nn.conv2d(hidden, layer2_weights, [1, 1, 1, 1], padding='SAME')
  pool = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(pool + layer2_biases)
  shape = hidden.get_shape().as_list()
  reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
  hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
  drop = tf.nn.dropout(hidden, 0.5)
  return tf.matmul(drop, layer4_weights) + layer4_biases

In [13]:
num_steps = 1000
display_step = 100
learning_rate = 0.05

optimizer = tf.keras.optimizers.SGD(learning_rate)

# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_tf_dataset.take(num_steps), 1):  
  # Training computation.
  with tf.GradientTape() as g:
    logits = model_with_dropout(batch_x)
    loss = compute_loss(batch_y, logits)
    
  # Optimizer.
  optimizer.minimize(loss, g.watched_variables(), tape=g)
  
  # Predictions for the training, validation, and test data.
  if step % display_step == 0:
    train_prediction = tf.nn.softmax(logits)
    train_acc = accuracy(train_prediction, batch_y)
    
    valid_prediction = tf.nn.softmax(model(valid_dataset))
    valid_acc = accuracy(valid_prediction, valid_labels)
    print("step: %i, loss: %f, train acc: %f, validation acc: %f" % (step, loss, train_acc, valid_acc))

test_prediction = tf.nn.softmax(model(test_dataset))
test_acc = accuracy(test_prediction, test_labels)
print("test acc: %f" % (test_acc))

step: 100, loss: 1.126998, train acc: 68.750000, validation acc: 56.150000
step: 200, loss: 1.110299, train acc: 68.750000, validation acc: 67.970000
step: 300, loss: 0.970424, train acc: 56.250000, validation acc: 70.500000
step: 400, loss: 0.774720, train acc: 75.000000, validation acc: 73.660000
step: 500, loss: 1.406382, train acc: 62.500000, validation acc: 67.330000
step: 600, loss: 1.285267, train acc: 62.500000, validation acc: 74.640000
step: 700, loss: 0.890334, train acc: 75.000000, validation acc: 77.230000
step: 800, loss: 0.651588, train acc: 68.750000, validation acc: 75.060000
step: 900, loss: 1.094963, train acc: 62.500000, validation acc: 78.700000
step: 1000, loss: 0.645493, train acc: 81.250000, validation acc: 78.220000
test acc: 84.990000


---
Problem 4
---------

Migrate your best model (with highest accuracy) to graph execution with tf.function instead of running in eager mode. Use tf.config.run_functions_eagerly to test and debug your code.

---

In [17]:
num_steps = 1000
display_step = 100
learning_rate = 0.05

optimizer = tf.keras.optimizers.SGD(learning_rate)

graph_model = tf.function(model)

# Run training for the given number of steps.
for step, (batch_x, batch_y) in enumerate(train_tf_dataset.take(num_steps), 1):  
  # Training computation.
  with tf.GradientTape() as g:
    logits = graph_model(batch_x)
    loss = compute_loss(batch_y, logits)
    
  # Optimizer.
  optimizer.minimize(loss, g.watched_variables(), tape=g)
  
  # Predictions for the training, validation, and test data.
  if step % display_step == 0:
    train_prediction = tf.nn.softmax(logits)
    train_acc = accuracy(train_prediction, batch_y)
    
    valid_prediction = tf.nn.softmax(model(valid_dataset))
    valid_acc = accuracy(valid_prediction, valid_labels)
    print("step: %i, loss: %f, train acc: %f, validation acc: %f" % (step, loss, train_acc, valid_acc))

test_prediction = tf.nn.softmax(model(test_dataset))
test_acc = accuracy(test_prediction, test_labels)
print("test acc: %f" % (test_acc))

step: 100, loss: 0.251008, train acc: 87.500000, validation acc: 84.740000
step: 200, loss: 0.478967, train acc: 81.250000, validation acc: 85.260000
step: 300, loss: 0.390301, train acc: 81.250000, validation acc: 85.240000
step: 400, loss: 0.263747, train acc: 93.750000, validation acc: 85.580000
step: 500, loss: 0.782951, train acc: 81.250000, validation acc: 85.560000
step: 600, loss: 0.844131, train acc: 68.750000, validation acc: 85.800000
step: 700, loss: 0.369759, train acc: 87.500000, validation acc: 85.600000
step: 800, loss: 0.579013, train acc: 81.250000, validation acc: 85.040000
step: 900, loss: 0.528536, train acc: 81.250000, validation acc: 86.140000
step: 1000, loss: 0.104160, train acc: 100.000000, validation acc: 85.760000
test acc: 92.160000
