# Homework 6
In this homework we will build deeper networks with 20 or more layers.

Development notes: 

1) If you are doing your homework in a Jupyter/IPython notebook, you may need to 'Restart & Clear Output' after making a change and re-running a cell. TensorFlow will not allow you to create multiple variables with the same name, and that is exactly what happens when you re-run a cell that creates a variable.

2) Be careful with your calls to global_variables_initializer(). If you call it after training one network, it will re-initialize your variables, erasing your training. In general, double-check the outputs of your model after all training and before turning your model in. Ending a session will discard all your variable values.

## Part 0: Setup

In [None]:
import tensorflow as tf
import numpy as np
import util

# Load the data we are giving you
def load(filename, W=64, H=64):
    data = np.fromfile(filename, dtype=np.uint8).reshape((-1, W*H*3+1))
    images, labels = data[:, :-1].reshape((-1,H,W,3)), data[:, -1]
    return images, labels

image_data, label_data = load('tux_train.dat')

print('Input shape: ' + str(image_data.shape))
print('Labels shape: ' + str(label_data.shape))

num_classes = 6
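
The `load` function above assumes each record in the `.dat` files is `W*H*3` pixel bytes followed by a single label byte. The sketch below uses made-up bytes instead of the real file to show how the same reshape splits records into images and labels. `parse_records` is a hypothetical helper that mirrors `load` but reads from an in-memory buffer.

```python
import numpy as np

def parse_records(raw, W=64, H=64):
    # Same logic as load(), but reading from a bytes object instead of a file
    data = np.frombuffer(raw, dtype=np.uint8).reshape((-1, W * H * 3 + 1))
    # All but the last byte of a record are pixels (H x W x RGB); the last byte is the label
    images, labels = data[:, :-1].reshape((-1, H, W, 3)), data[:, -1]
    return images, labels

# Two fake 2x2 RGB records: 12 pixel bytes each, followed by one label byte
record0 = bytes(range(12)) + bytes([3])
record1 = bytes([255] * 12) + bytes([5])
images, labels = parse_records(record0 + record1, W=2, H=2)
print(images.shape, labels)  # (2, 2, 2, 3) [3 5]
```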

## Part 1: Define your convnet

In [None]:
# Let's clear the TensorFlow graph so that you don't have to restart the notebook every time you change the network
tf.reset_default_graph()

# Set up your input placeholder
inputs = tf.placeholder(tf.float32, (None,64,64,3))

# Set up a boolean flag that is True only during training (batch normalization needs it)
training = tf.placeholder_with_default(False, (), name='training')

# Step 1: Augment the training data (try the following; not all of them may improve performance)
#  * mirror the image
#  * color augmentations (keep the values to small ranges first then try to expand):
#    - brightness
#    - hue
#    - saturation
#    - contrast

def data_augmentation(I):
    # TODO: Put your data augmentation here (copy it from HW5; you can skip the color augmentation for this homework)
    return I

# map_fn applies data_augmentation independently to each image in the batch. Since we are not cropping,
# we apply the augmentation before whitening, which makes evaluation easier
aug_input = tf.map_fn(data_augmentation, inputs)

# During evaluation we don't want data augmentation, so evaluation code feeds images directly into eval_inputs
eval_inputs = tf.identity(aug_input, name='inputs')

# Whenever you deal with image data, it's important to whiten it first: subtract the mean and divide by the standard deviation
white_inputs = (eval_inputs - 100.) / 72.


# Set up your label placeholder (one integer class label per image)
labels = tf.placeholder(tf.int64, (None,), name='labels')

with tf.name_scope('model'), tf.variable_scope('model'):
    # Step 2: define the compute graph of your CNN here.
    #     Build the network out of 20 3x3 convolutions without striding and 5 pooling layers with stride=2.
    #     Hint: Use a for loop or two to define the model
    #     Hint: make sure your classification layer does not have a ReLU (use `activation_fn=None`)
    #   Train this model first.
    # Step 3: Add batch normalization
    #     Hint: you don't need to use scale or center if you apply BN before a convolution.
    #           You do need center if BN is between the conv and a ReLU
    #     Don't forget to give the batch_normalization layer a 'training=training' argument.
    #  Train your model (you should see it converge much faster now).
    # Step 4: Add residual connections
    #  For simplicity you do not need to add a residual connection to every layer, but add them to at least 
    #      half of your layers
    #  Train your model (you should see it converge even faster now).
    h = white_inputs
    
    # TODO: Put your code here
    
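    # The input here should be a None x 1 x 1 x 6 tensor. Note that five stride-2 poolings only reduce
    # 64x64 to 2x2, so collapse the remaining 2x2 first (e.g. with a sixth pooling or a global average)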
    h = tf.contrib.layers.flatten(h)

    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h, labels=labels))

output = tf.identity(h, name='output')

regularization_loss = tf.losses.get_regularization_loss()

# Let's weight the regularization loss down, otherwise it will hurt the model performance
# You can tune this weight if you wish
total_loss = loss + 1e-6 * regularization_loss

# create an optimizer
# NOTE: you might have to play with the learning rate as you try out 
# batch_normalization (0.001 might work well without BN, 0.1 with, 0.001 for resnets)
optimizer = tf.train.MomentumOptimizer(0.001, 0.9)

# use that optimizer on your loss function (control_dependencies makes sure the
# batch_norm moving mean and variance are updated at every training step)
with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
    opt = optimizer.minimize(total_loss)
correct = tf.equal(tf.argmax(output, 1), labels)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

print("Total number of trainable parameters used:", np.sum([v.get_shape().num_elements() for v in tf.trainable_variables()]), '/', 500000)
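
Step 1 and the whitening line can be illustrated outside the graph. The NumPy sketch below mimics a random left-right mirror and the fixed whitening `(x - 100) / 72` used above. Inside `data_augmentation` in the TF 1.x graph, `tf.image.random_flip_left_right` is the usual op for the mirror. The helper names `random_mirror` and `whiten` and the use of `np.random.default_rng` are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def random_mirror(image, rng):
    # Flip an H x W x C image left-right (along the width axis) with probability 0.5
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

def whiten(images, mean=100.0, std=72.0):
    # Subtract the fixed mean and divide by the fixed standard deviation
    return (images.astype(np.float32) - mean) / std

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
mirrored = random_mirror(image, rng)
# The result is either unchanged or reversed along the width axis
assert np.array_equal(mirrored, image) or np.array_equal(mirrored, image[:, ::-1, :])
print(whiten(np.array([100, 172, 28])))  # [ 0.  1. -1.]
```

Because the mirror runs before whitening, evaluation only has to skip augmentation and can reuse the same whitening.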


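Step 2 asks for 20 3x3 convolutions and 5 stride-2 poolings within the 500,000-parameter budget checked by the print statement in Part 1. A back-of-the-envelope check could look like the sketch below. It assumes 'SAME' padding (so 3x3 convolutions keep the spatial size), a hypothetical layout of 4 convolutions per stage with channel widths 16, 32, 48, 64, 80, and a hypothetical 1x1 classifier from 80 channels to 6 classes. It counts one bias per output channel.

```python
def conv_params(c_in, c_out, k=3):
    # k x k kernel weights plus one bias per output channel
    return k * k * c_in * c_out + c_out

def spatial_size(size, num_pools):
    # Each stride-2 pooling halves the height and width
    for _ in range(num_pools):
        size //= 2
    return size

widths = [16, 32, 48, 64, 80]  # hypothetical channels per stage
total, c_in = 0, 3             # RGB input
for width in widths:
    for _ in range(4):         # 4 convolutions per stage -> 20 in total
        total += conv_params(c_in, width)
        c_in = width
total += conv_params(c_in, 6, k=1)  # hypothetical 1x1 classifier to 6 classes

print(spatial_size(64, 5), total)
```

The first number shows that a 64x64 input is still 2x2 after five poolings, so some further reduction is needed before flattening to 6 logits. Widening the last stage is the quickest way to exceed the budget, because parameters grow with the product of input and output channels.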
## Part 2: Training

Training might take up to 20 minutes, depending on your architecture and on whether you have a GPU. A network without BN will train much more slowly, but try it first anyway.
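
To make concrete what Steps 3 and 4 add to each layer, here is a NumPy sketch of one residual block in training mode. It assumes NHWC tensors, a 'SAME'-padded 3x3 convolution, batch normalization using the current batch's statistics with epsilon 0.001 (the default of `tf.layers.batch_normalization`), and one possible ordering: conv, BN with a learned offset (`center`), ReLU, then add the input back. The function names are hypothetical; in the graph you would use TensorFlow layers instead.

```python
import numpy as np

def conv3x3_same(x, w):
    # x: (N, H, W, C_in), w: (3, 3, C_in, C_out); zero padding keeps H and W unchanged
    n, h, wd, _ = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)))
    out = np.zeros((n, h, wd, w.shape[3]))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd, :]
            out += np.einsum('nhwc,cd->nhwd', patch, w[dy, dx])
    return out

def batch_norm_train(x, gamma, beta, eps=1e-3):
    # Normalize each channel with the mean/variance of the current batch
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def residual_block(x, w, gamma, beta):
    # conv -> BN (offset beta plays the role of `center`) -> ReLU, then the skip connection
    h = np.maximum(batch_norm_train(conv3x3_same(x, w), gamma, beta), 0.0)
    return x + h  # requires C_in == C_out so the shapes match

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8, 16))
w = rng.normal(scale=0.1, size=(3, 3, 16, 16))
y = residual_block(x, w, gamma=np.ones(16), beta=np.zeros(16))
print(y.shape)  # (4, 8, 8, 16)
```

With all-zero convolution weights the block returns its input unchanged, so each block starts out close to an identity mapping that later layers can build on.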

In [None]:
image_val, label_val = load('tux_val.dat')
# Batch size
BS = 32

# Start a session
sess = tf.Session()

# Set up training
sess.run(tf.global_variables_initializer())

# An epoch is a single pass over the training data
for epoch in range(20):
    # Let's shuffle the data every epoch
    np.random.seed(epoch)
    np.random.shuffle(image_data)
    np.random.seed(epoch)
    np.random.shuffle(label_data)
    # Go through the entire dataset once
    accuracy_vals, loss_vals = [], []
    for i in range(0, image_data.shape[0]-BS+1, BS):
        # Train a single batch
        batch_images, batch_labels = image_data[i:i+BS], label_data[i:i+BS]
        accuracy_val, loss_val, _ = sess.run([accuracy, total_loss, opt], feed_dict={inputs: batch_images, labels: batch_labels, training:True})
        accuracy_vals.append(accuracy_val)
        loss_vals.append(loss_val)

    val_correct = []
    for i in range(0, image_val.shape[0], BS):
        batch_images, batch_labels = image_val[i:i+BS], label_val[i:i+BS]
        val_correct.extend( sess.run(correct, feed_dict={eval_inputs: batch_images, labels: batch_labels}) )
    print('[%3d] Accuracy: %0.3f  \t  Loss: %0.3f  \t  validation accuracy: %0.3f'%(epoch, np.mean(accuracy_vals), np.mean(loss_vals), np.mean(val_correct)))
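
The training loop re-seeds NumPy before shuffling each array, so `image_data` and `label_data` receive the same permutation and stay paired. It also iterates with `range(0, N-BS+1, BS)`, which visits only full batches. The sketch below illustrates both ideas with toy arrays. It uses an explicit shared permutation, which is an alternative to re-seeding and not what the cell above does. `shuffle_together` and `full_batch_starts` are hypothetical helpers.

```python
import numpy as np

def shuffle_together(images, labels, seed):
    # One permutation applied to both arrays keeps image/label pairs aligned
    perm = np.random.RandomState(seed).permutation(len(images))
    return images[perm], labels[perm]

def full_batch_starts(n, batch_size):
    # Same range as the training loop: a trailing partial batch is skipped
    return list(range(0, n - batch_size + 1, batch_size))

images = np.arange(10).reshape(10, 1) * 10  # toy "image" i holds the value 10*i
labels = np.arange(10)                      # label i belongs to image i
shuf_images, shuf_labels = shuffle_together(images, labels, seed=0)
assert np.array_equal(shuf_images[:, 0], shuf_labels * 10)

print(full_batch_starts(100, 32))  # [0, 32, 64]
```

With 100 examples and `BS = 32`, each epoch trains on 96 of them. Because the data is reshuffled every epoch, different examples are left out each time.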

## Part 3: Evaluation

### Compute the validation accuracy

In [None]:
image_val, label_val = load('tux_val.dat')

print('Input shape: ' + str(image_val.shape))
print('Labels shape: ' + str(label_val.shape))

val_correct = []
for i in range(0, image_val.shape[0], BS):
    batch_images, batch_labels = image_val[i:i+BS], label_val[i:i+BS]
    val_correct.extend( sess.run(correct, feed_dict={eval_inputs: batch_images, labels: batch_labels}) )
print("ConvNet Validation Accuracy: ", np.mean(val_correct))
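
The validation loop collects the per-example `correct` flags from every batch and averages them once at the end. This matters because the last batch can be smaller than `BS`: averaging per-batch accuracies instead would give that partial batch too much weight. A toy illustration with made-up flags (the two helper names are hypothetical):

```python
import numpy as np

def pooled_accuracy(batches):
    # Concatenate per-example flags, then average once (what the cell above does)
    flags = []
    for batch in batches:
        flags.extend(batch)
    return float(np.mean(flags))

def mean_of_batch_accuracies(batches):
    # Averaging per-batch accuracies over-weights a short final batch
    return float(np.mean([np.mean(batch) for batch in batches]))

# 5 validation examples with a batch size of 2: the last batch holds one example
batches = [[True, True], [True, True], [False]]
print(pooled_accuracy(batches))           # 0.8
print(mean_of_batch_accuracies(batches))  # about 0.667
```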

## Part 4: Save Model
Please note that we also want you to turn in your .ipynb notebook for this assignment. Zip up the .ipynb file along with the .tfg file for your submission.

In [None]:
util.save('assignment6.tfg', session=sess)

## Part 5 (optional): See your model

In [None]:
# Show the current graph
util.show_graph(tf.get_default_graph().as_graph_def())