# Problem Set 4

## Solutions of Rex Wang
Designed by Ben Usman, Kun He, and Sarah Adel Bargal, with help from Kate Saenko and Brian Kulis.

This assignment will introduce you to:
1. Building and training a convolutional network
2. Saving snapshots of your trained model
3. Reloading weights from a saved model
4. Fine-tuning a pre-trained network
5. Visualizations using Tensorboard

This code has been tested and should for Python 3.5 and 2.7 with tensorflow 1.0.*. Since recently, you can update to recent tensorflow version just by doing `pip install tensorflow`,  or `pip install tensorflow-gpu` if you want to use GPU.

**Note:** This notebook contains problem descriptions and demo/starter code. However, you're welcome to implement and submit .py files directly, if that's easier for you. Starter .py files are provided in the same `pset4/` directory.

## Part 0: Tutorials

You will find these TensorFlow tutorials on CNNs useful:
 - [Deep MNIST for experts](https://www.tensorflow.org/get_started/mnist/pros)
 - [Convolutional Neural Networks](https://www.tensorflow.org/tutorials/deep_cnn)
 
Note that there are many ways to implement the same thing in TensorFlow, for example, both tf.nn and tf.layers provide convolutional layers but with slightly different interfaces. You will need to read the documentation of the functions provided below to understand how they work.

## Part 1: Building and Training a ConvNet on SVHN
(25 points)

First we provide demo code that trains a convolutional network on the [SVHN Dataset](http://ufldl.stanford.edu/housenumbers/).. 

You will need to download   __Format 2__ from the link above.
- Create a directory named `svhn_mat/` in the working directory. Or, you can create it anywhere you want, but change the path in `svhn_dataset_generator` to match it.
- Download `train_32x32.mat` and `test_32x32.mat` to this directory.
- `extra_32x32.mat` is NOT needed.
- You may find the `wget` command useful for downloading on linux. 



The following defines a generator for the SVHN Dataset, yielding the next batch every time next is invoked.

In [10]:
import copy
import os
import math
import numpy as np
import scipy
import scipy.io

from six.moves import range

import read_data

@read_data.restartable
def svhn_dataset_generator(dataset_name, batch_size):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset
    
    path = './svhn_mat/' # path to the SVHN dataset you will download in Q1.1
    file_name = '%s_32x32.mat' % dataset_name
    file_dict = scipy.io.loadmat(os.path.join(path, file_name))
    X_all = file_dict['X'].transpose((3, 0, 1, 2))
    y_all = file_dict['y']
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len
    
    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    y_all_padded[y_all_padded == 10] = 0
    
    for slice_i in range(int(math.ceil(data_len / batch_size))):
        idx = slice_i * batch_size
        X_batch = X_all_padded[idx:idx + batch_size]
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch, y_batch

The following defines the CovNet Model. It has two identical conv layers with 32 5x5 convlution filters, followed by a fully-connected layer to output the logits.

In [11]:
import tensorflow as tf

def cnn_map(x_):
    with tf.variable_scope('conv1'):
        conv1 = tf.layers.conv2d(
                inputs=x_,
                filters=32,  # number of filters
                kernel_size=[5, 5],
                padding="same",
                activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits


def apply_classification_loss(model_function):
    with tf.Graph().as_default() as g:
        with tf.device("/gpu:0"):  # use gpu:0 if on GPU
            x_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
            y_ = tf.placeholder(tf.int32, [None])
            y_logits = model_function(x_)
            
            y_dict = dict(labels=y_, logits=y_logits)
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict)
            cross_entropy_loss = tf.reduce_mean(losses)
            trainer = tf.train.AdamOptimizer()
            train_op = trainer.minimize(cross_entropy_loss)
            
            y_pred = tf.argmax(tf.nn.softmax(y_logits), dimension=1)
            correct_prediction = tf.equal(tf.cast(y_pred, tf.int32), y_)
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    model_dict = {'graph': g, 'inputs': [x_, y_], 'train_op': train_op,
                  'accuracy': accuracy, 'loss': cross_entropy_loss}
    
    return model_dict

### Q1.2 Training SVHN Net
Now we train a `cnn_map` net on Format 2 of the SVHN Dataset. We will call this "SVHN net". 

**Note:** training will take a while, so you might want to use GPU.

In [12]:
def train_model(model_dict, dataset_generators, epoch_n, print_every):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                sess.run(model_dict['train_op'], feed_dict=train_feed_dict)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('epoch {:d} iter {:d}, loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))

In [13]:
dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}
    
model_dict = apply_classification_loss(cnn_map)
train_model(model_dict, dataset_generators, epoch_n=50, print_every=200)

epoch 0 iter 0, loss: 145.154, accuracy: 0.196
epoch 0 iter 200, loss: 1.098, accuracy: 0.664
epoch 1 iter 0, loss: 1.009, accuracy: 0.696
epoch 1 iter 200, loss: 0.914, accuracy: 0.733
epoch 2 iter 0, loss: 0.908, accuracy: 0.735
epoch 2 iter 200, loss: 0.861, accuracy: 0.753
epoch 3 iter 0, loss: 0.903, accuracy: 0.737
epoch 3 iter 200, loss: 0.898, accuracy: 0.748
epoch 4 iter 0, loss: 0.885, accuracy: 0.748
epoch 4 iter 200, loss: 0.856, accuracy: 0.766
epoch 5 iter 0, loss: 0.814, accuracy: 0.773
epoch 5 iter 200, loss: 0.842, accuracy: 0.770
epoch 6 iter 0, loss: 0.811, accuracy: 0.778
epoch 6 iter 200, loss: 0.862, accuracy: 0.773
epoch 7 iter 0, loss: 0.813, accuracy: 0.782
epoch 7 iter 200, loss: 0.901, accuracy: 0.775
epoch 8 iter 0, loss: 0.820, accuracy: 0.783
epoch 8 iter 200, loss: 0.942, accuracy: 0.771
epoch 9 iter 0, loss: 0.816, accuracy: 0.785
epoch 9 iter 200, loss: 1.012, accuracy: 0.767
epoch 10 iter 0, loss: 0.858, accuracy: 0.779
epoch 10 iter 200, loss: 0.991, 

### Q1.3 SVHN Net Variations
Now we vary the structure of the network. To keep things simple, we still use  two identical conv layers, but vary their parameters. 

Report the final test accuracy on 3 different number of filters, and 3 different number of strides. Each time when you vary one parameter, keep the other fixed at the original value.

|Stride|Accuracy|
|--|-------------------------------|
| 4 | 0.833 |
| 8 | 0.750 |
| 16 | 0.570 |

|Filters|Accuracy|
|--|-------------------------------|
| 24 | 0.821 |
| 16 | 0.818 |
| 8 | 0.821 |

A template for one sample modification is given below. 

**Note:** you're welcome to decide how many training epochs to use, if that gets you the same results but faster.

In [5]:
def cnn_modification(x_):
#     raise NotImplemented("Add your code here!")
        
    ###################################
    
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=16)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=16)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits
    
    ###################################

modified_model_dict = apply_classification_loss(cnn_modification)
train_model(modified_model_dict, dataset_generators, epoch_n=50, print_every=200)

epoch 0 iter 0, loss: 8.938, accuracy: 0.077
epoch 0 iter 200, loss: 1.950, accuracy: 0.322
epoch 1 iter 0, loss: 1.748, accuracy: 0.406
epoch 1 iter 200, loss: 1.679, accuracy: 0.431
epoch 2 iter 0, loss: 1.663, accuracy: 0.439
epoch 2 iter 200, loss: 1.589, accuracy: 0.468
epoch 3 iter 0, loss: 1.518, accuracy: 0.491
epoch 3 iter 200, loss: 1.470, accuracy: 0.511
epoch 4 iter 0, loss: 1.488, accuracy: 0.508
epoch 4 iter 200, loss: 1.464, accuracy: 0.514
epoch 5 iter 0, loss: 1.472, accuracy: 0.515
epoch 5 iter 200, loss: 1.398, accuracy: 0.536
epoch 6 iter 0, loss: 1.491, accuracy: 0.509
epoch 6 iter 200, loss: 1.405, accuracy: 0.539
epoch 7 iter 0, loss: 1.518, accuracy: 0.504
epoch 7 iter 200, loss: 1.387, accuracy: 0.542
epoch 8 iter 0, loss: 1.495, accuracy: 0.512
epoch 8 iter 200, loss: 1.376, accuracy: 0.547
epoch 9 iter 0, loss: 1.464, accuracy: 0.520
epoch 9 iter 200, loss: 1.343, accuracy: 0.560
epoch 10 iter 0, loss: 1.444, accuracy: 0.529
epoch 10 iter 200, loss: 1.371, ac

## Part 2: Saving and Reloading Model Weights
(25 points)

In this section you learn to save the weights of a trained model, and to load the weights of a saved model. This is really useful when we would like to load an already trained model in order to continue training or to fine-tune it. **Often times we save “snapshots” of the trained model as training progresses in case the training is interrupted, or in case we would like to fall back to an earlier model, this is called *snapshot saving*.**

### Q2.1 Defining another network
Define a network with a slightly different structure in `def cnn_expanded(x_)` below. `cnn_expanded` is an expanded version of `cnn_model`. 
It should have: 
- a different size of kernel for the last convolutional layer, 
- followed by one additional convolutional layer, and 
- followed by one additional pooling layer.

The last fully-connected layer will stay the same.

In [6]:
# Define the new model (see cnn_map(x_) above for an example)
def cnn_expanded(x_):
#     raise NotImplemented("Add your code here!")
        
    ###################################
    with tf.variable_scope('conv1'):
        conv1 = tf.layers.conv2d(
                inputs=x_,
                filters=32,  # number of filters
                kernel_size=[5, 5],
                padding="same",
                activation=tf.nn.relu)
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[8, 8],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv3 = tf.layers.conv2d(
            inputs=pool2,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool3 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    pool_flat = tf.contrib.layers.flatten(pool3, scope='pool3flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits
    
    ###################################


### Q2.2 Saving and Loading Weights
`new_train_model()` below has two additional parameters `save_model=False, load_model=False` than `train_model` defined previously. Modify `new_train_model()` such that it would 
- save weights after the training is complete if `save_model` is `True`, and
- load weights on start-up before training if `load_model` is `True`.

*Hint:*  `tf.train.Saver()`.

Note: if you are unable to load weights into `cnn_expanded` network, use `cnn_map` in order to continue the assingment.

In [13]:
#### Modify this:
def new_train_model(model_dict, dataset_generators, epoch_n, print_every,
                    save_model=False, load_model=False):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        to_save_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv1')
        model_saver = tf.train.Saver(to_save_vars)
        sess.run(tf.global_variables_initializer())
        # Model Loading Logic
        # ----------------------------------
        if load_model:           
            # Create Model Loader
            model_saver.restore(sess, './model.ckpt')
            print("Model Loaded!!!")
        # ----------------------------------
            
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                sess.run(model_dict['train_op'], feed_dict=train_feed_dict)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('iteration {:d} {:d}\t loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))

        # Model Saving Logic
        # ----------------------------------
        if save_model:
            # Create Model Saver
            model_saver.save(sess, './model.ckpt')
            print("Model Saved!!!")
        # ----------------------------------

def test_saving():
    ### Hint: call the saver like this: tf.train.Saver(var_list)
    ### where var_list is a list of TF variables you want to save (from conv1 layer)
    model_dict = apply_classification_loss(cnn_map)
    new_train_model(model_dict, dataset_generators, epoch_n=50, print_every=200, save_model=True)
    ### Hint: call the saver like this: tf.train.Saver(var_list)
    ### where var_list is a list of TF variables you want to load from the checkpoint (for conv1 layer)
    cnn_expanded_dict = apply_classification_loss(cnn_expanded)
    new_train_model(cnn_expanded_dict, dataset_generators, epoch_n=20, print_every=200, load_model=True)

In [14]:
test_saving()

iteration 0 0	 loss: 86.894, accuracy: 0.111
iteration 0 200	 loss: 2.115, accuracy: 0.246
iteration 1 0	 loss: 2.064, accuracy: 0.252
iteration 1 200	 loss: 1.170, accuracy: 0.644
iteration 2 0	 loss: 1.066, accuracy: 0.676
iteration 2 200	 loss: 0.955, accuracy: 0.716
iteration 3 0	 loss: 0.922, accuracy: 0.727
iteration 3 200	 loss: 0.839, accuracy: 0.758
iteration 4 0	 loss: 0.830, accuracy: 0.760
iteration 4 200	 loss: 0.811, accuracy: 0.769
iteration 5 0	 loss: 0.829, accuracy: 0.769
iteration 5 200	 loss: 0.827, accuracy: 0.775
iteration 6 0	 loss: 0.823, accuracy: 0.776
iteration 6 200	 loss: 0.817, accuracy: 0.784
iteration 7 0	 loss: 0.795, accuracy: 0.792
iteration 7 200	 loss: 0.847, accuracy: 0.783
iteration 8 0	 loss: 0.788, accuracy: 0.805
iteration 8 200	 loss: 0.853, accuracy: 0.790
iteration 9 0	 loss: 0.761, accuracy: 0.811
iteration 9 200	 loss: 0.803, accuracy: 0.806
iteration 10 0	 loss: 0.765, accuracy: 0.824
iteration 10 200	 loss: 0.784, accuracy: 0.825
iterati

## Part 3: Fine-tuning a Pre-trained Network on CIFAR-10
(20 points)

[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) is another popular benchmark for image classification.
We provide you with modified version of the file cifar10.py from [https://github.com/Hvass-Labs/TensorFlow-Tutorials](https://github.com/Hvass-Labs/TensorFlow-Tutorials).


In [15]:
import read_cifar10 as cf10

We also provide a generator for the CIFAR-10 Dataset, yielding the next batch every time next is invoked.

In [16]:
@read_data.restartable
def cifar10_dataset_generator(dataset_name, batch_size, restrict_size=1000):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset
    
    X_all_unrestricted, y_all = (cf10.load_training_data() if dataset_name == 'train'
                                 else cf10.load_test_data())
    
    actual_restrict_size = restrict_size if dataset_name == 'train' else int(1e10)
    X_all = X_all_unrestricted[:actual_restrict_size]
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len
    
    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    
    for slice_i in range(math.ceil(data_len / batch_size)):
        idx = slice_i * batch_size
        #X_batch = X_all_padded[idx:idx + batch_size]
        X_batch = X_all_padded[idx:idx + batch_size]*255  # bugfix: thanks Zezhou Sun!
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch.astype(np.uint8), y_batch.astype(np.uint8)

cifar10_dataset_generators = {
    'train': cifar10_dataset_generator('train', 1000),
    'test': cifar10_dataset_generator('test', -1)
}

### Q3.1 Fine-tuning
Let's fine-tune SVHN net on **1000 examples** from CIFAR-10. 
Compare test accuracies of the following scenarios: 
  - Training `cnn_map` from scratch on the 1000 CIFAR-10 examples
  - Fine-tuning SVHN net (`cnn_map` trained on SVHN dataset) on 1000 exampes from CIFAR-10. Use `new_train_model()` defined above to load SVHN net weights, but train on the CIFAR-10 examples.
  
**Important:** please do not change the `restrict_size=1000` parameter.

In [18]:
cnn_expanded_dict = apply_classification_loss(cnn_expanded)

In [19]:
## train a model from scratch
new_train_model(cnn_expanded_dict, cifar10_dataset_generators, epoch_n=50, 
                print_every=10, save_model=False)

- Download progress: 100.0%
Download finished. Extracting files.
Done.
iteration 0 0	 loss: 76.687, accuracy: 0.093
iteration 1 0	 loss: 77.076, accuracy: 0.129
iteration 2 0	 loss: 44.266, accuracy: 0.102
iteration 3 0	 loss: 26.613, accuracy: 0.105
iteration 4 0	 loss: 11.161, accuracy: 0.111
iteration 5 0	 loss: 4.251, accuracy: 0.100
iteration 6 0	 loss: 2.634, accuracy: 0.100
iteration 7 0	 loss: 2.346, accuracy: 0.105
iteration 8 0	 loss: 2.307, accuracy: 0.131
iteration 9 0	 loss: 2.304, accuracy: 0.129
iteration 10 0	 loss: 2.304, accuracy: 0.118
iteration 11 0	 loss: 2.304, accuracy: 0.116
iteration 12 0	 loss: 2.303, accuracy: 0.114
iteration 13 0	 loss: 2.303, accuracy: 0.114
iteration 14 0	 loss: 2.302, accuracy: 0.115
iteration 15 0	 loss: 2.302, accuracy: 0.115
iteration 16 0	 loss: 2.301, accuracy: 0.116
iteration 17 0	 loss: 2.301, accuracy: 0.116
iteration 18 0	 loss: 2.300, accuracy: 0.115
iteration 19 0	 loss: 2.298, accuracy: 0.117
iteration 20 0	 loss: 2.297, accur

In [20]:
## fine-tuning SVHN Net using Cifar-10 weights saved in Q2
new_train_model(cnn_expanded_dict, cifar10_dataset_generators, epoch_n=50, 
                print_every=10, load_model=True)

Model Loaded!!!
iteration 0 0	 loss: 113.406, accuracy: 0.140
iteration 1 0	 loss: 89.570, accuracy: 0.119
iteration 2 0	 loss: 57.906, accuracy: 0.144
iteration 3 0	 loss: 32.672, accuracy: 0.135
iteration 4 0	 loss: 17.652, accuracy: 0.120
iteration 5 0	 loss: 9.062, accuracy: 0.112
iteration 6 0	 loss: 5.299, accuracy: 0.102
iteration 7 0	 loss: 3.591, accuracy: 0.093
iteration 8 0	 loss: 2.747, accuracy: 0.093
iteration 9 0	 loss: 2.425, accuracy: 0.109
iteration 10 0	 loss: 2.347, accuracy: 0.127
iteration 11 0	 loss: 2.325, accuracy: 0.130
iteration 12 0	 loss: 2.316, accuracy: 0.130
iteration 13 0	 loss: 2.311, accuracy: 0.111
iteration 14 0	 loss: 2.308, accuracy: 0.105
iteration 15 0	 loss: 2.306, accuracy: 0.105
iteration 16 0	 loss: 2.305, accuracy: 0.102
iteration 17 0	 loss: 2.304, accuracy: 0.101
iteration 18 0	 loss: 2.304, accuracy: 0.101
iteration 19 0	 loss: 2.304, accuracy: 0.101
iteration 20 0	 loss: 2.304, accuracy: 0.101
iteration 21 0	 loss: 2.303, accuracy: 0.10

## Part 4: TensorBoard
(30 points)

[TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) is a very helpful tool for visualization of neural networks. 

### Q4.1 Plotting
Present at least one visualization for each of the following:
  - Filters
  - Loss
  - Accuracy

Modify code you have wrote above to also have summary writers. To  run tensorboard, the command is `tensorboard --logdir=path/to/your/log/directory`.

#### For this part, I re-implemented the cnn_expanded model, using tf.nn.conv2d() completely so that I can extract all variables I wanted to visualize, for example the filters.

In [21]:
# Filter, loss, and accuracy visualizations
import copy
import os
import math
import numpy as np
import scipy
import scipy.io
import tensorflow as tf

from six.moves import range

import read_data

# SVNH Date Generator
@read_data.restartable
def svhn_dataset_generator(dataset_name, batch_size):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset

    path = './svhn_mat/'  # path to the SVHN dataset you will download in Q1.1
    file_name = '%s_32x32.mat' % dataset_name
    file_dict = scipy.io.loadmat(os.path.join(path, file_name))
    X_all = file_dict['X'].transpose((3, 0, 1, 2))
    y_all = file_dict['y']
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len

    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    y_all_padded[y_all_padded == 10] = 0

    for slice_i in range(int(math.ceil(data_len / batch_size))):
        idx = slice_i * batch_size
        X_batch = X_all_padded[idx:idx + batch_size]
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch, y_batch

# Helper function to Visualize conv. features as an image
# Refer to: https://gist.github.com/kukuruza/03731dc494603ceab0c5
def put_kernels_on_grid (kernel, grid_Y, grid_X, pad = 1):
    '''Visualize conv. features as an image (mostly for the 1st layer).
    Place kernel into a grid, with some paddings between adjacent filters.

    Args:
      kernel:            tensor of shape [Y, X, NumChannels, NumKernels]
      (grid_Y, grid_X):  shape of the grid. Require: NumKernels == grid_Y * grid_X
                           User is responsible of how to break into two multiples.
      pad:               number of black pixels around each filter (between them)

    Return:
      Tensor of shape [(Y+2*pad)*grid_Y, (X+2*pad)*grid_X, NumChannels, 1].
    '''
    x_min = tf.reduce_min(kernel)
    x_max = tf.reduce_max(kernel)

    kernel1 = (kernel - x_min) / (x_max - x_min)

    # pad X and Y
    x1 = tf.pad(kernel1, tf.constant( [[pad,pad],[pad, pad],[0,0],[0,0]] ), mode = 'CONSTANT')

    # X and Y dimensions, w.r.t. padding
    Y = kernel1.get_shape()[0] + 2 * pad
    X = kernel1.get_shape()[1] + 2 * pad

    channels = kernel1.get_shape()[2]

    # put NumKernels to the 1st dimension
    x2 = tf.transpose(x1, (3, 0, 1, 2))
    
    # organize grid on Y axis
    # x3 = tf.reshape(x2, tf.pack([grid_X, Y * grid_Y, X, channels])) #3
    # In Tensorflow v1.0, changed to:
    x3 = tf.reshape(x2, tf.stack([grid_X, Y * grid_Y, X, channels])) #3

    # switch X and Y axes
    x4 = tf.transpose(x3, (0, 2, 1, 3))
    
    # organize grid on X axis
    # x5 = tf.reshape(x4, tf.pack([1, X * grid_X, Y * grid_Y, channels])) #3
    # In Tensorflow v1.0, changed to:
    x5 = tf.reshape(x4, tf.stack([1, X * grid_X, Y * grid_Y, channels])) #3
    
    # back to normal order (not combining with the next step for clarity)
    x6 = tf.transpose(x5, (2, 1, 3, 0))

    # to tf.image_summary order [batch_size, height, width, channels],
    #   where in this case batch_size == 1
    x7 = tf.transpose(x6, (3, 0, 1, 2))

    # scale to [0, 255] and convert to uint8
    return tf.image.convert_image_dtype(x7, dtype = tf.uint8) 
        
# Helper function to create a Variable stored on CPU memory for convolution layer
def _variable_on_cpu(name, shape, initializer):
    """Helper function to create a Variable stored on CPU memory.
    Args:
      name: name of the variable
      shape: list of ints
      initializer: initializer for Variable
    Returns:
      Variable Tensor
    """
    with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, initializer=initializer)
    return var

# Helper function to create a Variable stored on GPU memory for convolution layer
def _variable_on_gpu(name, shape, initializer):
    """Helper to create a Variable stored on GPU memory.
    Args:
      name: name of the variable
      shape: list of ints
      initializer: initializer for Variable
    Returns:
      Variable Tensor
    """
    var = tf.get_variable(name, shape, initializer=initializer)
    return var

# Helper function to create an initialized Variable with weight decay for convolution layer
def _variable_with_weight_decay(name, shape, stddev, wd):
    """Helper to create an initialized Variable with weight decay.
    Note that the Variable is initialized with a truncated normal distribution.
    A weight decay is added only if one is specified.
    Args:
      name: name of the variable
      shape: list of ints
      stddev: standard deviation of a truncated Gaussian
      wd: add L2Loss weight decay multiplied by this float. If None, weight
          decay is not added for this Variable.
    Returns:
      Variable Tensor
    """
    var = _variable_on_gpu(name, shape,
                           tf.truncated_normal_initializer(stddev=stddev))
    if wd:
        weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return var

# Helper function to create summaries for activations
def _activation_summary(x):
    """Helper to create summaries for activations.
    Creates a summary that provides a histogram of activations.
    Creates a summary that measure the sparsity of activations.
    Args:
      x: Tensor
    Returns:
      nothing
    """
    tensor_name = x.op.name
    tf.summary.histogram(tensor_name + '/activations', x)
    tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x))

# Remade CNN model
def cnn_remake(x_):
    # Conv layer 1
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay(name='weights', shape = [5, 5, 3, 32], stddev=0.1, wd=0.0) # [width, height, dim input, filter output]
        conv = tf.nn.conv2d(input=x_, filter=kernel, strides=[1, 1, 1, 1], padding="SAME")
        biases = _variable_on_gpu(name='biases', shape=[32], initializer=tf.constant_initializer(0.0))
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope.name)
        
        # Tensorboard visualization
        _activation_summary(conv1)
        grid = put_kernels_on_grid (kernel, grid_X=4, grid_Y=8)
        tf.summary.image(scope.name + 'kernel', grid, max_outputs=1)
        
        
    # Pooling layer 1
    with tf.variable_scope('pool1') as scope:
        pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    
    # Conv layer 2
    with tf.variable_scope('conv2') as scope:
        kernel = _variable_with_weight_decay(name='weights', shape = [5, 5, 32, 32], stddev=0.1, wd=0.0) # [width, height, dim input, filter output]
        
        conv = tf.nn.conv2d(input=pool1, filter=kernel, strides=[1, 1, 1, 1], padding="SAME")
        biases = _variable_on_gpu(name='biases', shape=[32], initializer=tf.constant_initializer(0.1))
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope.name)
        
        # Tensorboard visualization
        _activation_summary(conv2)
        # grid = put_kernels_on_grid (kernel, grid_X=4, grid_Y=8)
        # tf.summary.image(scope.name + 'kernel', grid, max_outputs=1)
        
    # Pooling layer 2
    with tf.variable_scope('pool2') as scope:
        pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    
    # Conv layer 3
    with tf.variable_scope('conv3') as scope:
        kernel = _variable_with_weight_decay(name='weights', shape = [5, 5, 32, 32], stddev=0.1, wd=0.0) # [width, height, dim input, filter output]
        
        conv = tf.nn.conv2d(input=pool2, filter=kernel, strides=[1, 1, 1, 1], padding="SAME")
        biases = _variable_on_gpu(name='biases', shape=[32], initializer=tf.constant_initializer(0.1))
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope.name)
        
        # Tensorboard visualization
        _activation_summary(conv3)
        # grid = put_kernels_on_grid (kernel, grid_X=4, grid_Y=8)
        # tf.summary.image(scope.name + 'kernel', grid, max_outputs=1)
                
    # Pooling layer 3
    with tf.variable_scope('pool3') as scope:
        pool3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    
    # Flatten layer: Flattens the input while maintaining the batch_size.
    with tf.variable_scope('pool-flat') as scope:
        pool_flat = tf.contrib.layers.flatten(pool3, scope='pool3flat')
    
    # Fully-connected layer 1
    with tf.variable_scope('fc1') as scope:
        fc1 = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    
    # Fully-connected layer 2: output logits
    with tf.variable_scope('logits') as scope:
        logits = tf.layers.dense(inputs=fc1, units=10)
        
    return logits

def apply_classification_loss(model_function):
    with tf.Graph().as_default() as g:
#         with tf.device("/gpu:0"):  # use gpu:0 if on GPU
            x_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
            y_ = tf.placeholder(tf.int32, [None])
            y_logits = model_function(x_)
            
            y_dict = dict(labels=y_, logits=y_logits)
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict)
            cross_entropy_loss = tf.reduce_mean(losses)
            
            # TensorBoard operation
            tf.summary.scalar('cross_entropy_loss', cross_entropy_loss)
            
            trainer = tf.train.AdamOptimizer()
            train_op = trainer.minimize(cross_entropy_loss)
            
            y_pred = tf.argmax(tf.nn.softmax(y_logits), dimension=1)
            correct_prediction = tf.equal(tf.cast(y_pred, tf.int32), y_)
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
            
            # TensorBoard operation
            tf.summary.scalar('accuracy', accuracy)
            
            # Tensorboard merge all ops
            merged = tf.summary.merge_all()
            
    model_dict = {'graph': g, 'inputs': [x_, y_], 'train_op': train_op,
                  'accuracy': accuracy, 'loss': cross_entropy_loss, 'merged_summary':merged}
    
    return model_dict


def train_model(model_dict, dataset_generators, epoch_n, print_every):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        
        # TensorBoard
        train_writer = tf.summary.FileWriter('./logs/train', sess.graph)
        
        sess.run(tf.global_variables_initializer())
        
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                
                # Tensorboard Train
                train_summary, _ = sess.run([model_dict['merged_summary'], model_dict['train_op']], feed_dict=train_feed_dict)
                
                # Tensorboard train writer
                train_writer.add_summary(train_summary, iter_i)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('epoch {:d} iter {:d}, loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))
def run():
    dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
    }
    
    model_dict = apply_classification_loss(cnn_remake)
    train_model(model_dict, dataset_generators, epoch_n=50, print_every=200)

run()

epoch 0 iter 0, loss: 425.365, accuracy: 0.136
epoch 0 iter 200, loss: 2.204, accuracy: 0.229
epoch 1 iter 0, loss: 2.144, accuracy: 0.257
epoch 1 iter 200, loss: 1.893, accuracy: 0.372
epoch 2 iter 0, loss: 1.825, accuracy: 0.398
epoch 2 iter 200, loss: 1.491, accuracy: 0.535
epoch 3 iter 0, loss: 1.396, accuracy: 0.577
epoch 3 iter 200, loss: 1.257, accuracy: 0.625
epoch 4 iter 0, loss: 1.203, accuracy: 0.643
epoch 4 iter 200, loss: 1.107, accuracy: 0.672
epoch 5 iter 0, loss: 1.072, accuracy: 0.686
epoch 5 iter 200, loss: 1.079, accuracy: 0.687
epoch 6 iter 0, loss: 0.991, accuracy: 0.713
epoch 6 iter 200, loss: 0.941, accuracy: 0.731
epoch 7 iter 0, loss: 0.914, accuracy: 0.740
epoch 7 iter 200, loss: 0.879, accuracy: 0.754
epoch 8 iter 0, loss: 0.855, accuracy: 0.760
epoch 8 iter 200, loss: 0.823, accuracy: 0.770
epoch 9 iter 0, loss: 0.818, accuracy: 0.770
epoch 9 iter 200, loss: 0.823, accuracy: 0.767
epoch 10 iter 0, loss: 0.824, accuracy: 0.774
epoch 10 iter 200, loss: 0.799, 

#### Some visualization results

So now if we just use `tensorboard --logdir=./logs/` (on linux based system), we could get following results:

**Accuracy**

<img src="accur.png" style="height:50%; width:50%;">

**Loss**

<img src="loss.png" sytle="width:400px";>

**Filter (Convolutional layer1 as example)**

<img src="filter1.png" sytle="width:400px";>

## Part 5: Bonus
(20 points)

### Q5.1 SVHN Net ++
Improve the accuracy of SVHN Net beyond that of the provided demo: SVHN Net ++.

In [14]:
def cnn_bonus(x_):
    
    # 3 layers seems better 2 layers; not tried 4 layers yet
    # stride 1 seems better than 2 and larger
    # filter = 16 seems better than 32 and larger
    
    #
    # The first convolution layer produces 16 features with 5x5 convolution filters
    # The second convolution layer outputs 512 features with 7x7 filters
    # The classifier is a 2-layer non-linear classifier with 20 hidden units
    
    # We use stochastic gradient descent as our optimization method and shuffle our dataset after
    # each training iteration
    
    # pooling layers: p = 2, 4, 12 give the best performance
    
    with tf.variable_scope('conv1'):
        conv1 = tf.layers.conv2d(
                inputs=x_,
                filters=32,  # number of filters
                kernel_size=[7, 7],
                padding="same",
                activation=tf.nn.relu)

        pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                        pool_size=[2, 2], 
                                        strides=2)  # convolution stride
    
    conv2 = tf.layers.conv2d(
            name='conv2',
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool2 = tf.layers.max_pooling2d(name='pool2',
                                    inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv3 = tf.layers.conv2d(
            name='conv3',
            inputs=pool2,
            filters=64, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool3 = tf.layers.max_pooling2d(name='pool3',
                                    inputs=conv3, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
    
    conv4 = tf.layers.conv2d(
            name='conv4',
            inputs=pool3,
            filters=64, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu)
    
    pool4 = tf.layers.max_pooling2d(name='pool4',
                                    inputs=conv4, 
                                    pool_size=[2, 2], 
                                    strides=2)  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool4, scope='pool4flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits


def apply_classification_loss(model_function):
    with tf.Graph().as_default() as g:
        with tf.device("/gpu:0"):  # use gpu:0 if on GPU
            x_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
            y_ = tf.placeholder(tf.int32, [None])
            y_logits = model_function(x_)
            
            y_dict = dict(labels=y_, logits=y_logits)
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict)
            cross_entropy_loss = tf.reduce_mean(losses)
            trainer = tf.train.AdamOptimizer()
            train_op = trainer.minimize(cross_entropy_loss)
            
            y_pred = tf.argmax(tf.nn.softmax(y_logits), dimension=1)
            correct_prediction = tf.equal(tf.cast(y_pred, tf.int32), y_)
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    model_dict = {'graph': g, 'inputs': [x_, y_], 'train_op': train_op,
                  'accuracy': accuracy, 'loss': cross_entropy_loss}
    
    return model_dict

In [15]:
def SVHN_plusplus():
    model_dict = apply_classification_loss(cnn_bonus)
    train_model(model_dict, dataset_generators, epoch_n=50, print_every=200)
SVHN_plusplus()

epoch 0 iter 0, loss: 11.354, accuracy: 0.142
epoch 0 iter 200, loss: 0.758, accuracy: 0.772
epoch 1 iter 0, loss: 0.711, accuracy: 0.790
epoch 1 iter 200, loss: 0.558, accuracy: 0.838
epoch 2 iter 0, loss: 0.526, accuracy: 0.849
epoch 2 iter 200, loss: 0.478, accuracy: 0.868
epoch 3 iter 0, loss: 0.478, accuracy: 0.866
epoch 3 iter 200, loss: 0.495, accuracy: 0.863
epoch 4 iter 0, loss: 0.467, accuracy: 0.872
epoch 4 iter 200, loss: 0.456, accuracy: 0.876
epoch 5 iter 0, loss: 0.475, accuracy: 0.872
epoch 5 iter 200, loss: 0.460, accuracy: 0.883
epoch 6 iter 0, loss: 0.483, accuracy: 0.882
epoch 6 iter 200, loss: 0.496, accuracy: 0.883
epoch 7 iter 0, loss: 0.524, accuracy: 0.872
epoch 7 iter 200, loss: 0.509, accuracy: 0.877
epoch 8 iter 0, loss: 0.547, accuracy: 0.872
epoch 8 iter 200, loss: 0.525, accuracy: 0.880
epoch 9 iter 0, loss: 0.520, accuracy: 0.874
epoch 9 iter 200, loss: 0.531, accuracy: 0.884
epoch 10 iter 0, loss: 0.544, accuracy: 0.873
epoch 10 iter 200, loss: 0.551, a