In [6]:
# Problem Set 4
Designed by Ben Usman, Kun He, and Sarah Adel Bargal, with help from Kate Saenko and Brian Kulis.

This assignment will introduce you to:
1. Building and training a convolutional network
2. Saving snapshots of your trained model
3. Reloading weights from a saved model
4. Fine-tuning a pre-trained network
5. Visualizations using Tensorboard

This code has been tested and should for Python 3.5 and 2.7 with tensorflow 1.0.*. Since recently, you can update to recent tensorflow version just by doing `pip install tensorflow`,  or `pip install tensorflow-gpu` if you want to use GPU.

**Note:** This notebook contains problem descriptions and demo/starter code. However, you're welcome to implement and submit .py files directly, if that's easier for you. Starter .py files are provided in the same `pset4/` directory.

SyntaxError: invalid syntax (<ipython-input-6-9ec2e831c779>, line 2)

## Part 0: Tutorials

You will find these TensorFlow tutorials on CNNs useful:
 - [Deep MNIST for experts](https://www.tensorflow.org/get_started/mnist/pros)
 - [Convolutional Neural Networks](https://www.tensorflow.org/tutorials/deep_cnn)
 
Note that there are many ways to implement the same thing in TensorFlow, for example, both tf.nn and tf.layers provide convolutional layers but with slightly different interfaces. You will need to read the documentation of the functions provided below to understand how they work.

## Part 1: Building and Training a ConvNet on SVHN
(25 points)

First we provide demo code that trains a convolutional network on the [SVHN Dataset](http://ufldl.stanford.edu/housenumbers/).. 

You will need to download   __Format 2__ from the link above.
- Create a directory named `svhn_mat/` in the working directory. Or, you can create it anywhere you want, but change the path in `svhn_dataset_generator` to match it.
- Download `train_32x32.mat` and `test_32x32.mat` to this directory.
- `extra_32x32.mat` is NOT needed.
- You may find the `wget` command useful for downloading on linux. 



The following defines a generator for the SVHN Dataset, yielding the next batch every time next is invoked.

In [16]:
import copy
import os
import math
import numpy as np
import scipy
import scipy.io

from six.moves import range

import read_data

@read_data.restartable
def svhn_dataset_generator(dataset_name, batch_size):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset
    
    path = './svhn_mat/' # path to the SVHN dataset you will download in Q1.1
    file_name = '%s_32x32.mat' % dataset_name
    file_dict = scipy.io.loadmat(os.path.join(path, file_name))
    X_all = file_dict['X'].transpose((3, 0, 1, 2))
    #print(X_all.shape)
    y_all = file_dict['y']
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len
    
    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    y_all_padded[y_all_padded == 10] = 0
    
    for slice_i in range(int(math.ceil(data_len / batch_size))):
        idx = slice_i * batch_size
        X_batch = X_all_padded[idx:idx + batch_size]
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch, y_batch

The following defines the CovNet Model. It has two identical conv layers with 32 5x5 convlution filters, followed by a fully-connected layer to output the logits.

In [17]:
import tensorflow as tf

def cnn_map(x_):
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu, name = 'conv1')
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool1')  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu, name = 'conv2')
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool2')  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits


def apply_classification_loss(model_function):
    with tf.Graph().as_default() as g:
        with tf.device("/gpu:0"):  # use gpu:0 if on GPU
            x_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
            y_ = tf.placeholder(tf.int32, [None])
            y_logits = model_function(x_)
            
            y_dict = dict(labels=y_, logits=y_logits)
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict)
            cross_entropy_loss = tf.reduce_mean(losses)
            trainer = tf.train.AdamOptimizer()
            train_op = trainer.minimize(cross_entropy_loss)
            
            y_pred = tf.argmax(tf.nn.softmax(y_logits), dimension=1)
            correct_prediction = tf.equal(tf.cast(y_pred, tf.int32), y_)
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    model_dict = {'graph': g, 'inputs': [x_, y_], 'train_op': train_op,
                  'accuracy': accuracy, 'loss': cross_entropy_loss}
    
    return model_dict

### Q1.2 Training SVHN Net
Now we train a `cnn_map` net on Format 2 of the SVHN Dataset. We will call this "SVHN net". 

**Note:** training will take a while, so you might want to use GPU.

**Result for the cnn_map training on SVHN Dataset: **   
    Accuracy: 0.825 (averaged over 3 runs)  
    Number of epochs: 100

In [18]:
def train_model(model_dict, dataset_generators, epoch_n, print_every):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                sess.run(model_dict['train_op'], feed_dict=train_feed_dict)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        collect_arr.append(sess.run(to_compute, test_feed_dict))
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('epoch {:d} iter {:d}, loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))

In [19]:
dataset_generators = {
        'train': svhn_dataset_generator('train', 256),
        'test': svhn_dataset_generator('test', 256)
}

In [20]:
    
model_dict = apply_classification_loss(cnn_map)
train_model(model_dict, dataset_generators, epoch_n=50, print_every=20)

epoch 0 iter 0, loss: 27.950, accuracy: 0.184
epoch 0 iter 20, loss: 2.263, accuracy: 0.164
epoch 0 iter 40, loss: 2.240, accuracy: 0.194
epoch 0 iter 60, loss: 2.244, accuracy: 0.166
epoch 0 iter 80, loss: 2.231, accuracy: 0.199
epoch 0 iter 100, loss: 2.226, accuracy: 0.205
epoch 0 iter 120, loss: 2.224, accuracy: 0.201
epoch 0 iter 140, loss: 2.218, accuracy: 0.208
epoch 0 iter 160, loss: 2.176, accuracy: 0.225
epoch 0 iter 180, loss: 1.873, accuracy: 0.364
epoch 0 iter 200, loss: 1.560, accuracy: 0.494
epoch 0 iter 220, loss: 1.374, accuracy: 0.567
epoch 0 iter 240, loss: 1.335, accuracy: 0.578
epoch 0 iter 260, loss: 1.243, accuracy: 0.608
epoch 0 iter 280, loss: 1.208, accuracy: 0.624
epoch 1 iter 0, loss: 1.194, accuracy: 0.629
epoch 1 iter 20, loss: 1.152, accuracy: 0.643
epoch 1 iter 40, loss: 1.177, accuracy: 0.634
epoch 1 iter 60, loss: 1.144, accuracy: 0.652
epoch 1 iter 80, loss: 1.079, accuracy: 0.676
epoch 1 iter 100, loss: 1.082, accuracy: 0.669
epoch 1 iter 120, loss: 

### Q1.3 SVHN Net Variations
Now we vary the structure of the network. To keep things simple, we still use  two identical conv layers, but vary their parameters. 

Report the final test accuracy on 3 different number of filters, and 3 different number of strides. Each time when you vary one parameter, keep the other fixed at the original value.

|Stride|Accuracy|
|--|-------------------------------|
| 1 / 1 | /0.795 |
| 2 / 2| / 0.818|
| 3 / 3| /0.842 |
| 4 / 4| /0.831 |
| 2 / 3| /0.862 |

|Filters|Accuracy|
|--|-------------------------------|
| 32 / 32 | / 0.825|
| 64 / 64| / 0.834 |
| 128 / 128| / 0.829|
| 256 / 256| / 0.838|
| 32 / 64| / 0.851|


A template for one sample modification is given below. 

**Note:** you're welcome to decide how many training epochs to use, if that gets you the same results but faster.

The reslts for the the cnn_modification are shown above. I tried 3 different filter numbers(64,128, 256). For this I kept the same number of filters for both convolutional layers, while using a stride = 2. The results were averaged over 3 runs per filter number, however it doesn not seem to be a significant difference between the accuracies from the three different filter numbers. I also tried using different number of filters for the two convolutional layers (conv1 = 32 filters and conv2= 64) and the accuracy increased a bit.  
  
Also tried three diffren strides(1, 3, 4), with both pooling layers using the same stride. The results are shown above. It seams that a stride of 3 gives the best results. Also a stride of 2 in the first pooling layer and a stride of 3 in the second pooling layer gives the highest accuracy.  

  

In [24]:
def cnn_modification(x_):

    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu, name = 'conv1')
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool1')  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu, name = 'conv2')
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=3, name = 'pool2')  # convolution stride
        
    pool_flat = tf.contrib.layers.flatten(pool2, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits

modified_model_dict = apply_classification_loss(cnn_modification)
train_model(modified_model_dict, dataset_generators, epoch_n=50, print_every=10)

epoch 0 iter 0, loss: 42.441, accuracy: 0.122
epoch 0 iter 10, loss: 2.831, accuracy: 0.146
epoch 0 iter 20, loss: 2.299, accuracy: 0.130
epoch 0 iter 30, loss: 2.295, accuracy: 0.157
epoch 0 iter 40, loss: 2.282, accuracy: 0.169
epoch 0 iter 50, loss: 2.262, accuracy: 0.210
epoch 0 iter 60, loss: 2.235, accuracy: 0.231
epoch 0 iter 70, loss: 2.160, accuracy: 0.255
epoch 0 iter 80, loss: 2.125, accuracy: 0.281
epoch 0 iter 90, loss: 2.036, accuracy: 0.307
epoch 0 iter 100, loss: 1.934, accuracy: 0.356
epoch 0 iter 110, loss: 1.900, accuracy: 0.362
epoch 0 iter 120, loss: 1.795, accuracy: 0.415
epoch 0 iter 130, loss: 1.742, accuracy: 0.441
epoch 0 iter 140, loss: 1.717, accuracy: 0.436
epoch 0 iter 150, loss: 1.686, accuracy: 0.456
epoch 0 iter 160, loss: 1.654, accuracy: 0.473
epoch 0 iter 170, loss: 1.623, accuracy: 0.483
epoch 0 iter 180, loss: 1.566, accuracy: 0.497
epoch 0 iter 190, loss: 1.576, accuracy: 0.503
epoch 0 iter 200, loss: 1.576, accuracy: 0.504
epoch 0 iter 210, loss:

## Part 2: Saving and Reloading Model Weights
(25 points)

In this section you learn to save the weights of a trained model, and to load the weights of a saved model. This is really useful when we would like to load an already trained model in order to continue training or to fine-tune it. Often times we save “snapshots” of the trained model as training progresses in case the training is interrupted, or in case we would like to fall back to an earlier model, this is called snapshot saving.

### Q2.1 Defining another network
Define a network with a slightly different structure in `def cnn_expanded(x_)` below. `cnn_expanded` is an expanded version of `cnn_model`. 
It should have: 
- a different size of kernel for the last convolutional layer, 
- followed by one additional convolutional layer, and 
- followed by one additional pooling layer.

The last fully-connected layer will stay the same.

In [13]:
# Define the new model (see cnn_map(x_) above for an example)
def cnn_expanded(x_):
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=32,  # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu, name = 'conv1')
    
    pool1 = tf.layers.max_pooling2d(inputs=conv1, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool1')  # convolution stride
    
    conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=32, # number of filters
            kernel_size=[10, 10],
            padding="same",
            activation=tf.nn.relu, name = 'conv2')
    
    pool2 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool1')  # convolution stride
    
    conv3 = tf.layers.conv2d(
            inputs=pool2,
            filters=32, # number of filters
            kernel_size=[5, 5],
            padding="same",
            activation=tf.nn.relu, name = 'conv3')
    
    pool3 = tf.layers.max_pooling2d(inputs=conv3, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool3')  # convolution stride
   
    
    pool_flat = tf.contrib.layers.flatten(pool3, scope='pool2flat')
    dense = tf.layers.dense(inputs=pool_flat, units=500, activation=tf.nn.relu)
    logits = tf.layers.dense(inputs=dense, units=10)
    return logits
    

### Q2.2 Saving and Loading Weights
`new_train_model()` below has two additional parameters `save_model=False, load_model=False` than `train_model` defined previously. Modify `new_train_model()` such that it would 
- save weights after the training is complete if `save_model` is `True`, and
- load weights on start-up before training if `load_model` is `True`.

*Hint:*  `tf.train.Saver()`.

Note: if you are unable to load weights into `cnn_expanded` network, use `cnn_map` in order to continue the assingment.

**Results:**
  
In cnn_expanded I changed the kernel_size =  [10, 10] for the second convolutional layer (conv2), added one additional convolutional layer, and an additional pooling layer. The architecture is shown above.   
  
For the new_train_model, if save_model=True I save the weight of the firts convolutional layer (conv1) in my-model.meta file. If load_model=True I load the conv1 weights, while randomly initializing the other layer weights. 

As shwon below: 

If load_model=False:  
Accuracy(after 10 epochs): 0.809  
Accuracy(after 100 epochs): 0.844

If load_model=True:
Accuracy(after 10 epochs): 0.830

This shows that loading the saved weights makes the results converge faster.

In [14]:
#### Modify this:
def new_train_model(model_dict, dataset_generators, epoch_n, print_every,
                    save_model=False, load_model=False):
    
    if load_model == False:
        with model_dict['graph'].as_default(), tf.Session() as sess:
            sess.run(tf.global_variables_initializer())

            for epoch_i in range(epoch_n):
                for iter_i, data_batch in enumerate(dataset_generators['train']):
                    train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                    sess.run(model_dict['train_op'], feed_dict=train_feed_dict)

                    if iter_i % print_every == 0:
                        collect_arr = []
                        for test_batch in dataset_generators['test']:
                            test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                            to_compute = [model_dict['loss'], model_dict['accuracy']]
                            collect_arr.append(sess.run(to_compute, test_feed_dict))
                        averages = np.mean(collect_arr, axis=0)
                        fmt = (epoch_i, iter_i, ) + tuple(averages)
                        print('iteration {:d} {:d}\t loss: {:.3f}, '
                              'accuracy: {:.3f}'.format(*fmt))
            if save_model == True:
                saver = tf.train.Saver(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv1'))
                save_path = saver.save(sess, 'my-model')
                print("Model saved in file: %s" % save_path)
    else:
        print('True')
        with model_dict['graph'].as_default(), tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            new_saver = tf.train.import_meta_graph('my-model.meta')
            new_saver.restore(sess, tf.train.latest_checkpoint('./'))
            
            for epoch_i in range(epoch_n):
                for iter_i, data_batch in enumerate(dataset_generators['train']):
                    train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                    sess.run(model_dict['train_op'], feed_dict=train_feed_dict)

                    if iter_i % print_every == 0:
                        collect_arr = []
                        for test_batch in dataset_generators['test']:
                            test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                            to_compute = [model_dict['loss'], model_dict['accuracy']]
                            collect_arr.append(sess.run(to_compute, test_feed_dict))
                        averages = np.mean(collect_arr, axis=0)
                        fmt = (epoch_i, iter_i, ) + tuple(averages)
                        print('iteration {:d} {:d}\t loss: {:.3f}, '
                              'accuracy: {:.3f}'.format(*fmt))
        
 

    

def test_saving():
    model_dict = apply_classification_loss(cnn_map)
    new_train_model(model_dict, dataset_generators, epoch_n=100, print_every=10, save_model=True)
    cnn_expanded_dict = apply_classification_loss(cnn_expanded)
    new_train_model(cnn_expanded_dict, dataset_generators, epoch_n=10, print_every=10, load_model=True)

In [15]:
test_saving()

iteration 0 0	 loss: 95.395, accuracy: 0.196
iteration 0 10	 loss: 2.372, accuracy: 0.098
iteration 0 20	 loss: 2.267, accuracy: 0.195
iteration 0 30	 loss: 2.248, accuracy: 0.196
iteration 0 40	 loss: 2.244, accuracy: 0.196
iteration 0 50	 loss: 2.242, accuracy: 0.196
iteration 0 60	 loss: 2.250, accuracy: 0.195
iteration 0 70	 loss: 2.247, accuracy: 0.195
iteration 0 80	 loss: 2.241, accuracy: 0.196
iteration 0 90	 loss: 2.236, accuracy: 0.196
iteration 0 100	 loss: 2.232, accuracy: 0.196
iteration 0 110	 loss: 2.233, accuracy: 0.196
iteration 0 120	 loss: 2.230, accuracy: 0.196
iteration 0 130	 loss: 2.232, accuracy: 0.196
iteration 0 140	 loss: 2.230, accuracy: 0.196
iteration 0 150	 loss: 2.228, accuracy: 0.196
iteration 0 160	 loss: 2.226, accuracy: 0.196
iteration 0 170	 loss: 2.225, accuracy: 0.196
iteration 0 180	 loss: 2.225, accuracy: 0.196
iteration 0 190	 loss: 2.225, accuracy: 0.196
iteration 0 200	 loss: 2.224, accuracy: 0.196
iteration 0 210	 loss: 2.227, accuracy: 0.19

## Part 3: Fine-tuning a Pre-trained Network on CIFAR-10
(20 points)

[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) is another popular benchmark for image classification.
We provide you with modified verstion of the file cifar10.py from [https://github.com/Hvass-Labs/TensorFlow-Tutorials](https://github.com/Hvass-Labs/TensorFlow-Tutorials).


In [7]:
import read_cifar10 as cf10

We also provide a generator for the CIFAR-10 Dataset, yielding the next batch every time next is invoked.

In [8]:
@read_data.restartable
def cifar10_dataset_generator(dataset_name, batch_size, restrict_size=1000):
    assert dataset_name in ['train', 'test']
    assert batch_size > 0 or batch_size == -1  # -1 for entire dataset
    
    X_all_unrestricted, y_all = (cf10.load_training_data() if dataset_name == 'train'
                                 else cf10.load_test_data())
    
    actual_restrict_size = restrict_size if dataset_name == 'train' else int(1e10)
    X_all = X_all_unrestricted[:actual_restrict_size]
    data_len = X_all.shape[0]
    batch_size = batch_size if batch_size > 0 else data_len
    
    X_all_padded = np.concatenate([X_all, X_all[:batch_size]], axis=0)
    y_all_padded = np.concatenate([y_all, y_all[:batch_size]], axis=0)
    
    for slice_i in range(math.ceil(data_len / batch_size)):
        idx = slice_i * batch_size
        X_batch = X_all_padded[idx:idx + batch_size]*255
        y_batch = np.ravel(y_all_padded[idx:idx + batch_size])
        yield X_batch.astype(np.uint8), y_batch.astype(np.uint8)

cifar10_dataset_generators = {
    'train': cifar10_dataset_generator('train', 1000),
    'test': cifar10_dataset_generator('test', -1)
}


### Q3.1 Fine-tuning
Let's fine-tune SVHN net on **1000 examples** from CIFAR-10. 
Compare test accuracies of the following scenarios: 
  - Training `cnn_map` from scratch on the 1000 CIFAR-10 examples
  - Fine-tuning SVHN net (`cnn_map` trained on SVHN dataset) on 1000 exampes from CIFAR-10. Use `new_train_model()` defined above to load SVHN net weights, but train on the CIFAR-10 examples.
  
**Important:** please do not change the `restrict_size=1000` parameter.


**Results:**  
  
Run 1:
  
Accuracy for training cnn_map from scratch on the 1000 CIFAR examples: 0.352   
Accuracy for training cnn_map using loaded SVHN weights: 0.332 
(these results shown below)

Run 2:
  
Accuracy for training cnn_map from scratch on the 1000 CIFAR examples: 0.320 
Accuracy for training cnn_map using loaded SVHN weights: 0.343

My results show that I am getting similar accuracies with both methods.


In [12]:
cnn_expanded_dict = apply_classification_loss(cnn_expanded)

In [13]:
## train a model from scratch

new_train_model(cnn_expanded_dict, cifar10_dataset_generators, epoch_n=100, 
                print_every=10, save_model=False)

iteration 0 0	 loss: 44.126, accuracy: 0.110
iteration 1 0	 loss: 40.660, accuracy: 0.106
iteration 2 0	 loss: 45.088, accuracy: 0.100
iteration 3 0	 loss: 25.023, accuracy: 0.102
iteration 4 0	 loss: 18.011, accuracy: 0.103
iteration 5 0	 loss: 11.468, accuracy: 0.099
iteration 6 0	 loss: 7.698, accuracy: 0.100
iteration 7 0	 loss: 5.025, accuracy: 0.093
iteration 8 0	 loss: 3.763, accuracy: 0.097
iteration 9 0	 loss: 3.582, accuracy: 0.115
iteration 10 0	 loss: 3.169, accuracy: 0.112
iteration 11 0	 loss: 2.701, accuracy: 0.114
iteration 12 0	 loss: 2.440, accuracy: 0.114
iteration 13 0	 loss: 2.348, accuracy: 0.117
iteration 14 0	 loss: 2.323, accuracy: 0.106
iteration 15 0	 loss: 2.318, accuracy: 0.103
iteration 16 0	 loss: 2.318, accuracy: 0.099
iteration 17 0	 loss: 2.315, accuracy: 0.098
iteration 18 0	 loss: 2.310, accuracy: 0.099
iteration 19 0	 loss: 2.303, accuracy: 0.102
iteration 20 0	 loss: 2.298, accuracy: 0.106
iteration 21 0	 loss: 2.292, accuracy: 0.115
iteration 22 0

In [14]:
## fine-tuning SVHN Net using Cifar-10 weights saved in Q2
new_train_model(cnn_expanded_dict, cifar10_dataset_generators, epoch_n=100, 
                print_every=10, load_model=True)

True
iteration 0 0	 loss: 32.435, accuracy: 0.103
iteration 1 0	 loss: 46.614, accuracy: 0.103
iteration 2 0	 loss: 50.296, accuracy: 0.100
iteration 3 0	 loss: 26.691, accuracy: 0.130
iteration 4 0	 loss: 18.186, accuracy: 0.123
iteration 5 0	 loss: 12.374, accuracy: 0.102
iteration 6 0	 loss: 6.984, accuracy: 0.115
iteration 7 0	 loss: 4.781, accuracy: 0.111
iteration 8 0	 loss: 3.478, accuracy: 0.108
iteration 9 0	 loss: 2.761, accuracy: 0.126
iteration 10 0	 loss: 2.546, accuracy: 0.119
iteration 11 0	 loss: 2.465, accuracy: 0.127
iteration 12 0	 loss: 2.422, accuracy: 0.124
iteration 13 0	 loss: 2.388, accuracy: 0.123
iteration 14 0	 loss: 2.359, accuracy: 0.126
iteration 15 0	 loss: 2.338, accuracy: 0.128
iteration 16 0	 loss: 2.322, accuracy: 0.140
iteration 17 0	 loss: 2.310, accuracy: 0.146
iteration 18 0	 loss: 2.299, accuracy: 0.151
iteration 19 0	 loss: 2.290, accuracy: 0.155
iteration 20 0	 loss: 2.279, accuracy: 0.160
iteration 21 0	 loss: 2.268, accuracy: 0.164
iteration

## Part 4: TensorBoard
(30 points)

[TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) is a very helpful tool for visualization of neural networks. 

### Q4.1 Plotting
Present at least one visualization for each of the following:
  - Filters
  - Loss
  - Accuracy

Modify code you have wrote above to also have summary writers. To  run tensorboard, the command is `tensorboard --logdir=path/to/your/log/directory`.

In [25]:
# # Filter, loss, and accuracy visualizations

def train_model_tensorboard(model_dict, dataset_generators, epoch_n, print_every):
    with model_dict['graph'].as_default(), tf.Session() as sess:
        tf.summary.scalar('loss', model_dict['loss'])
        tf.summary.scalar('accuracy', model_dict['accuracy'])
        tf.summary.histogram('filters_conv1', tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv1')[0])
        tf.summary.histogram('filters_conv2', tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv2')[0])
        #tf.summary.histogram('filters_conv3', tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'conv3'))
        summary_op = tf.summary.merge_all()
        
        writer_train = tf.summary.FileWriter('./graphs' + '/train', sess.graph)
        writer_test = tf.summary.FileWriter('./graphs' + '/test')
        sess.run(tf.global_variables_initializer())
      
        
        for epoch_i in range(epoch_n):
            for iter_i, data_batch in enumerate(dataset_generators['train']):
                train_feed_dict = dict(zip(model_dict['inputs'], data_batch))
                _, summary_train = sess.run([model_dict['train_op'], summary_op], feed_dict=train_feed_dict)
                # write log
                writer_train.add_summary(summary_train, epoch_i)
                
                if iter_i % print_every == 0:
                    collect_arr = []
                    for test_batch in dataset_generators['test']:
                        test_feed_dict = dict(zip(model_dict['inputs'], test_batch))
                        to_compute = [model_dict['loss'], model_dict['accuracy']]
                        #collect_arr.append(sess.run(to_compute, test_feed_dict))
                        run_test, summary_test = sess.run([to_compute, summary_op], test_feed_dict)
                        writer_test.add_summary(summary_test, epoch_i)
                        collect_arr.append(run_test)
                        
                        #summary,_ = sess.run(merged, feed_dict = test_feed_dict)
                        #test_writer.add_summary(summary, iter_i)                    
                        
                    averages = np.mean(collect_arr, axis=0)
                    fmt = (epoch_i, iter_i, ) + tuple(averages)
                    print('epoch {:d} iter {:d}, loss: {:.3f}, '
                          'accuracy: {:.3f}'.format(*fmt))


In [None]:
model_dict = apply_classification_loss(cnn_map)
train_model_tensorboard(model_dict, dataset_generators, epoch_n=50, print_every=20)

epoch 0 iter 0, loss: 69.310, accuracy: 0.195
epoch 0 iter 20, loss: 2.270, accuracy: 0.181
epoch 0 iter 40, loss: 2.238, accuracy: 0.198
epoch 0 iter 60, loss: 2.217, accuracy: 0.207
epoch 0 iter 80, loss: 2.111, accuracy: 0.264
epoch 0 iter 100, loss: 1.914, accuracy: 0.340
epoch 0 iter 120, loss: 1.581, accuracy: 0.483
epoch 0 iter 140, loss: 1.453, accuracy: 0.536
epoch 0 iter 160, loss: 1.367, accuracy: 0.567
epoch 0 iter 180, loss: 1.288, accuracy: 0.595
epoch 0 iter 200, loss: 1.239, accuracy: 0.621
epoch 0 iter 220, loss: 1.199, accuracy: 0.630
epoch 0 iter 240, loss: 1.191, accuracy: 0.634
epoch 0 iter 260, loss: 1.142, accuracy: 0.647
epoch 0 iter 280, loss: 1.096, accuracy: 0.667
epoch 1 iter 0, loss: 1.110, accuracy: 0.661
epoch 1 iter 20, loss: 1.070, accuracy: 0.673
epoch 1 iter 40, loss: 1.085, accuracy: 0.668
epoch 1 iter 60, loss: 1.045, accuracy: 0.681
epoch 1 iter 80, loss: 1.007, accuracy: 0.691
epoch 1 iter 100, loss: 0.999, accuracy: 0.697
epoch 1 iter 120, loss: 

**Training and testing visualization results for the two convolutional layer network in part 1.2:**
      
1. Filters:

<img src="filters.png" style="weight:20px;"> 

2. Loss:

<img src="loss.png" style="weight:50px;"> 

3. Accuracy:

<img src="accuracy.png" style="weight:50px;"> 



The above plots show that the training accuracy(blue) is better then the testing accuracy(orange). Also the loss plot shown that we need regularization. The filters histograms show the weight values(x axis) vs. frequencies.

## Part 5: Bonus
(20 points)

### Q5.1 SVHN Net ++
Improve the accuracy of SVHN Net beyond that of the provided demo: SVHN Net ++.

For this part of the homework I implemented an architecture similar to VGG Net:

conv1: kernel_size = [3,3], filters = 64  
conv2: kernel_size = [3,3], filters = 64  
pool1: pool_size=[2, 2], strides=2  
conv3: kernel_size = [3,3], filters = 128  
conv4: kernel_size = [3,3], filters = 128  
pool2: pool_size=[2, 2], strides=2  
conv5: kernel_size = [3,3], filters = 256  
conv6: kernel_size = [3,3], filters = 256  
conv7: kernel_size = [3,3], filters = 256  
pool3: pool_size=[2, 2], strides=2  
conv8: kernel_size = [3,3], filters = 256  
conv9: kernel_size = [3,3], filters = 256  
conv10: kernel_size = [3,3], filters = 256  
pool4: pool_size=[2, 2], strides=2  
fc: units=1000  
fc: units=1000  
fc: units=500  
  
With this architecture I get an accuracy: 0.937  

I also tried AlexNet, however I got a lower accuracy compared with the above architecture implemetation. 


In [17]:
def apply_classification_loss_plus(model_function):
    with tf.Graph().as_default() as g:
        with tf.device("/gpu:0"):  # use gpu:0 if on GPU
            x_ = tf.placeholder(tf.float32, [None, 32, 32, 3])
            y_ = tf.placeholder(tf.int32, [None])
            y_logits = model_function(x_)
            
            y_dict = dict(labels=y_, logits=y_logits)
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(**y_dict)
            cross_entropy_loss = tf.reduce_mean(losses)
            trainer = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-03)
            train_op = trainer.minimize(cross_entropy_loss)
            
            y_pred = tf.argmax(tf.nn.softmax(y_logits), dimension=1)
            correct_prediction = tf.equal(tf.cast(y_pred, tf.int32), y_)
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    model_dict = {'graph': g, 'inputs': [x_, y_], 'train_op': train_op,
                  'accuracy': accuracy, 'loss': cross_entropy_loss}
    
    return model_dict

In [35]:
def cnn_vgg_net(x_):
    conv1 = tf.layers.conv2d(
            inputs=x_,
            filters=64,  # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv1')
    
    conv2 = tf.layers.conv2d(
            inputs=conv1,
            filters=64, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv2')
    
#     norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
#             name='norm2')
    
    pool1 = tf.layers.max_pooling2d(inputs=conv2, 
                                    pool_size=[2, 2], 
                                    strides=2, name = 'pool1')  # convolution stride
    
    conv3 = tf.layers.conv2d(
            inputs=pool1,
            filters=128, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv3')
    
#     norm3 = tf.nn.lrn(conv3, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
#             name='norm3')
    
#     pool3 = tf.layers.max_pooling2d(inputs=conv3, 
#                                     pool_size=[2, 2], 
#                                     strides=3, name = 'pool3')  # convolution stride
    
    conv4 = tf.layers.conv2d(
            inputs=conv3,
            filters=128, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv4')
    
    pool2 = tf.layers.max_pooling2d(inputs=conv4, 
                                    pool_size=[2, 2], 
                                    strides=1, name = 'pool2')  # convolution stride
    conv5 = tf.layers.conv2d(
            inputs=pool2,
            filters=256, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv5')
    
    conv6 = tf.layers.conv2d(
            inputs=conv5,
            filters=256, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv6')
    
    conv7 = tf.layers.conv2d(
            inputs=conv6,
            filters=256, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv7')      
    
    
    pool3 = tf.layers.max_pooling2d(inputs=conv7, 
                                    pool_size=[2, 2], 
                                    strides=1, name = 'pool3')  # convolution stride
    
    conv8 = tf.layers.conv2d(
            inputs=pool3,
            filters=256, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv8')
    
    conv9 = tf.layers.conv2d(
            inputs=conv8,
            filters=256, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv9')
    
    conv10 = tf.layers.conv2d(
            inputs=conv9,
            filters=256, # number of filters
            kernel_size=[3, 3],
            padding="same",
            activation=tf.nn.relu, name = 'conv10')      
    
    
    pool4 = tf.layers.max_pooling2d(inputs=conv10, 
                                    pool_size=[2, 2], 
                                    strides=1, name = 'pool4')  # convolution stride
    
  
    
    pool_flat = tf.contrib.layers.flatten(pool4, scope='pool2flat')
    dense1 = tf.layers.dense(inputs=pool_flat, units=1000, activation=tf.nn.relu)
    dense2 = tf.layers.dense(inputs=dense1, units=1000, activation=tf.nn.relu)
    dense3 = tf.layers.dense(inputs=dense2, units=500, activation=tf.nn.relu)
    
    logits = tf.layers.dense(inputs=dense3, units=10)
    return logits
    

In [None]:
cnn_plus_dict_alex = apply_classification_loss_plus(cnn_vgg_net)
new_train_model(cnn_plus_dict_alex, dataset_generators, epoch_n=100, print_every=20, load_model=True)

True
iteration 0 0	 loss: 6.004, accuracy: 0.111
iteration 0 20	 loss: 2.235, accuracy: 0.196
iteration 0 40	 loss: 2.237, accuracy: 0.196
iteration 0 60	 loss: 2.237, accuracy: 0.167
iteration 0 80	 loss: 2.228, accuracy: 0.196
iteration 0 100	 loss: 2.204, accuracy: 0.196
iteration 0 120	 loss: 2.218, accuracy: 0.210
iteration 0 140	 loss: 2.038, accuracy: 0.291
iteration 0 160	 loss: 1.846, accuracy: 0.383
iteration 0 180	 loss: 1.651, accuracy: 0.470
iteration 0 200	 loss: 1.260, accuracy: 0.609
iteration 0 220	 loss: 1.135, accuracy: 0.641
iteration 0 240	 loss: 0.977, accuracy: 0.702
iteration 0 260	 loss: 0.761, accuracy: 0.766
iteration 0 280	 loss: 0.672, accuracy: 0.803
iteration 1 0	 loss: 0.654, accuracy: 0.805
iteration 1 20	 loss: 0.625, accuracy: 0.810
iteration 1 40	 loss: 0.561, accuracy: 0.834
iteration 1 60	 loss: 0.549, accuracy: 0.835
iteration 1 80	 loss: 0.520, accuracy: 0.850
iteration 1 100	 loss: 0.481, accuracy: 0.861
iteration 1 120	 loss: 0.455, accuracy: 0

Training and testing visualization results for the above architecture:
      
1. Filters:

<img src="filters_5.png" style="weight:20px;"> 

2. Loss:

<img src="loss_5.png" style="weight:20px;"> 

3. Accuracy:

<img src="accuracy_5.png" style="weight:20px;"> 

