# MI-MVI tutorial 2 #

<span style="font-size:larger;">In the previous two tutorials, we classified images with fully-connected neural networks. While these networks can achieve satisfying results on simple datasets, their ability to model complex images is significantly limited.</span>

## Part 1: Data Preparation

![animals](images/animals.png)

<span style="font-size:larger;">We created a small dataset of **animals**. From top to bottom: cat, deer, dog and horse. The task is to train a Neural Network to distinguish between the four classes.</span>

<span style="font-size:larger;">**Import** packages we will need.</span>

In [None]:
import os, pickle
import numpy as np
import tensorflow as tf

<span style="font-size:larger;">**Load** the pictures of animals. The pictures are already prepared for you as [numpy arrays](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html).</span>

In [None]:
dataset_path = "data/animals/dataset.pickle"
    
with open(dataset_path, "rb") as file:
    dataset = pickle.load(file)

<span style="font-size:larger;">We will work with 8000 training, 1000 validation and 1000 testing pictures. All pictures are in color and measure 32 x 32 pixels.</span>

In [None]:
print("train images:", dataset["train_data"].shape)
print("valid images:", dataset["valid_data"].shape)
print("test images:", dataset["test_data"].shape)

<span style="font-size:larger;">**Normalize** images so that they have zero mean and unit variance.</span>

In [None]:
mean = np.mean(dataset["train_data"], axis=0)
std = np.std(dataset["train_data"], axis=0)

dataset["train_data"] = (dataset["train_data"] - mean) / std
dataset["valid_data"] = (dataset["valid_data"] - mean) / std
dataset["test_data"] = (dataset["test_data"] - mean) / std

<span style="font-size:larger;">Notice that we compute the means and standard deviations over the training set only. As expected, the training set has zero mean and unit variance after normalization, while the means and variances of the validation and testing sets deviate slightly. Why do we use means and standard deviations computed over the training set to normalize the validation and testing sets? *Hint: generalization*.</span>

In [None]:
print("training mean:", np.round(np.mean(dataset["train_data"]), 5), ", variance:", 
                        np.round(np.var(dataset["train_data"]), 5))
print("validation mean:", np.round(np.mean(dataset["valid_data"]), 5), ", variance:", 
                          np.round(np.var(dataset["valid_data"]), 5))
print("testing mean:", np.round(np.mean(dataset["test_data"]), 5), ", variance:", 
                       np.round(np.var(dataset["test_data"]), 5))
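
<span style="font-size:larger;">The same normalization can be tried on a small synthetic example. This is only a sketch: the arrays `train` and `valid` are random stand-ins invented for illustration, not the animals dataset. Both splits are transformed with statistics taken from `train`, so only the training split ends up with exactly zero mean and unit variance.</span>

```python
import numpy as np

rng = np.random.RandomState(0)

# synthetic stand-ins for the real data: (num_images, height, width, channels)
train = rng.uniform(0, 255, size=(200, 4, 4, 3))
valid = rng.uniform(0, 255, size=(50, 4, 4, 3))

# per-pixel statistics, computed on the training set only
mean = np.mean(train, axis=0)   # shape (4, 4, 3)
std = np.std(train, axis=0)     # shape (4, 4, 3)

train_norm = (train - mean) / std
valid_norm = (valid - mean) / std   # same transformation as for the training set

print("train mean:", np.round(np.mean(train_norm), 5),
      ", variance:", np.round(np.var(train_norm), 5))
print("valid mean:", np.round(np.mean(valid_norm), 5),
      ", variance:", np.round(np.var(valid_norm), 5))
```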

<span style="font-size:larger;">**Shuffle** the pictures.</span>

In [None]:
# https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison
def unison_shuffle(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]
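
<span style="font-size:larger;">A quick way to check that `unison_shuffle` keeps images and labels aligned is to run it on toy data. In the sketch below, the arrays `images` and `labels` are made up for illustration: "image" number i is filled with the value i and carries the label i.</span>

```python
import numpy as np

def unison_shuffle(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

# toy data: "image" i is filled with the value i and has label i
images = np.arange(5).reshape(5, 1, 1, 1) * np.ones((5, 2, 2, 3))
labels = np.arange(5)

shuffled_images, shuffled_labels = unison_shuffle(images, labels)

# every image still sits next to its own label
print(shuffled_labels)
print(shuffled_images[:, 0, 0, 0])
```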

In [None]:
train_dataset, train_labels = unison_shuffle(dataset["train_data"], dataset["train_labels"])
valid_dataset, valid_labels = unison_shuffle(dataset["valid_data"], dataset["valid_labels"])
test_dataset, test_labels = unison_shuffle(dataset["test_data"], dataset["test_labels"])

## Part 2: Convolutional Neural Networks ##

<span style="font-size:larger;">Fully-connected neural networks are not well suited to modelling images because they are not invariant to translations and have too many weights. For these reasons, a different type of neural network was developed. Convolutional Neural Networks (ConvNets) use filters and max-pooling layers to keep the number of weights low and to learn to recognize objects regardless of their position in the image. Moreover, they are easy to implement in TensorFlow.</span>
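
<span style="font-size:larger;">To get a feeling for the difference in the number of weights, here is a back-of-the-envelope count for a single layer applied to a 32 x 32 color image. The dense layer size of 100 units is an arbitrary choice made for illustration; the convolutional layer uses 16 filters of size 3 x 3.</span>

```python
# input image: 32 x 32 pixels, 3 color channels
height, width, channels = 32, 32, 3

# fully-connected layer with 100 units (an illustrative size):
# every unit is connected to every input value, plus one bias per unit
dense_units = 100
dense_params = height * width * channels * dense_units + dense_units

# convolutional layer with 16 filters of size 3 x 3:
# each filter has 3 * 3 * channels weights shared across all positions, plus one bias
num_filters, filter_size = 16, 3
conv_params = filter_size * filter_size * channels * num_filters + num_filters

print("dense layer parameters:", dense_params)   # 307300
print("conv layer parameters:", conv_params)     # 448
```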

<span style="font-size:larger;">Implement a simple ConvNet with convolutional, max-pooling and dense layers.</span>

<span style="font-size:larger;">Useful links:</span>
* [lecture notes Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
def maybe_turn_to_one_hot(labels, num_labels=4):
  if len(labels.shape) == 1:
    one_hot = np.zeros((labels.shape[0], num_labels))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
  else:
    return labels

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)
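
<span style="font-size:larger;">The indexing trick used in `maybe_turn_to_one_hot` can be seen on a tiny example. The `labels` array below is made up for illustration. Each row gets a single 1 in the column given by its class index, and `np.argmax` recovers the original indices.</span>

```python
import numpy as np

labels = np.array([2, 0, 3, 1])   # made-up class indices
num_labels = 4

one_hot = np.zeros((labels.shape[0], num_labels))
one_hot[np.arange(len(labels)), labels] = 1   # set column `label` to 1 in each row

print(one_hot)
# argmax along the class axis recovers the original indices
print(np.argmax(one_hot, axis=1))
```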

In [None]:
import tensorflow as tf

# TF remembers everything you defined, this will keep the computation graph clean
tf.reset_default_graph()   

learning_rate = 0.05

# placeholders for data, we will fill these using the feed dictionary during training
input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1], train_dataset.shape[2], 3))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))

# define convolutional and pooling layers
conv1 = tf.layers.conv2d(input_data, 16, (3, 3), (1, 1), activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2))

conv2 = tf.layers.conv2d(pool1, 32, (3, 3), (1, 1), activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, (2, 2), (2, 2))


flattened = tf.contrib.layers.flatten(pool2)

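# define fully-connected (or dense) layers
# the ReLU keeps the two dense layers from collapsing into a single linear map
dense1 = tf.layers.dense(flattened, 50, activation=tf.nn.relu)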
logits = tf.layers.dense(dense1, 4)

# define loss and training operation
batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# calculate average accuracy over a batch of images
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))
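
<span style="font-size:larger;">It can help to trace how the image size changes through the network. The sketch below assumes that the layers use their default `padding="valid"`, as in the cell above where no padding argument is passed. The helper `valid_output_size` is a hypothetical name introduced here.</span>

```python
def valid_output_size(size, window, stride):
    # output size along one dimension when no padding is added
    return (size - window) // stride + 1

size = 32                                   # input images are 32 x 32
size = valid_output_size(size, 3, 1)        # conv1: 3 x 3 filters, stride 1 -> 30
size = valid_output_size(size, 2, 2)        # pool1: 2 x 2 window, stride 2 -> 15
size = valid_output_size(size, 3, 1)        # conv2: 3 x 3 filters, stride 1 -> 13
size = valid_output_size(size, 2, 2)        # pool2: 2 x 2 window, stride 2 -> 6

flattened_size = size * size * 32           # 32 feature maps after conv2
print("spatial size after pool2:", size)
print("flattened vector length:", flattened_size)
```

<span style="font-size:larger;">Under these assumptions the flattened vector that enters the first dense layer has 6 x 6 x 32 = 1152 values.</span>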

In [None]:
# settings
num_steps = 1000
mini_batch_size = 64
log_frequency = 100

# how many steps are in one epoch (epoch = one pass through the dataset)
# e.g. number of training samples = 50, mini batch size = 10 => steps per epoch = 5
steps_per_epoch = train_dataset.shape[0] // mini_batch_size

with tf.Session() as session:
    
  # initialize all parameters of the neural network
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
        
    # step number relative to the current epoch (epoch = one pass through the dataset)
    # e.g. steps per epoch = 5, step = 12 => epoch step = 12 % 5 = 2
    epoch_step = step % steps_per_epoch

    # start and end index for the current minibatch
    # e.g. mini batch size = 64, epoch step = 2 => start = 128, end = 192
    # => take the images with indices 128 to 191
    start = epoch_step * mini_batch_size
    end = (epoch_step + 1) * mini_batch_size
    
    # if this is the first step in the current epoch, shuffle the training set
    # we do this so that the model does not overfit on individual minibatches
    if epoch_step == 0:
        print("epoch", step // steps_per_epoch)
        train_dataset, train_labels = unison_shuffle(train_dataset, train_labels)
    
    # run one step of mini-batch gradient descent
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap")
    })
    
    # sometimes print the current loss
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  
  # evaluate the model on the validation set  
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
    
  print('Validation accuracy', validation_accuracy, '.')

# we do not save the model so the parameters are forgotten right after the training finishes
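
<span style="font-size:larger;">The mini-batch indexing from the training loop can be tried in isolation. The sketch below uses ten made-up "samples" and a mini-batch size of 4, both chosen only for illustration. It also shows what `mode="wrap"` does when the requested indices run past the end of the array.</span>

```python
import numpy as np

data = np.arange(10)          # ten made-up "samples"
mini_batch_size = 4
steps_per_epoch = data.shape[0] // mini_batch_size   # 2

for step in range(4):
    epoch_step = step % steps_per_epoch
    start = epoch_step * mini_batch_size
    end = (epoch_step + 1) * mini_batch_size
    batch = data.take(range(start, end), axis=0, mode="wrap")
    print("step", step, "epoch step", epoch_step, "batch", batch)

# indices past the end wrap around to the beginning instead of raising an error
wrapped = data.take(range(8, 12), axis=0, mode="wrap")
print(wrapped)
```

<span style="font-size:larger;">Note that when the number of samples is not divisible by the mini-batch size, the last few samples of each pass are skipped (samples 8 and 9 in this toy example). Reshuffling at the start of every epoch means that different samples are left out each time.</span>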

## Part 3: Regularizing Convolutional Networks ##

<span style="font-size:larger;">All neural networks are prone to overfitting if the training dataset is too small. Implement dropout for your ConvNet. See the reference notebook for a solution.</span>

In [None]:
import tensorflow as tf
tf.reset_default_graph()   # TF remembers everything you defined, this will keep the computation graph clean

learning_rate = 0.05
dropout_prob = 0.5

input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1], 
                                         train_dataset.shape[2], train_dataset.shape[3]))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))
is_training = tf.placeholder(tf.bool)

conv1 = tf.layers.conv2d(input_data, 16, (3, 3), (1, 1), activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2))

conv2 = tf.layers.conv2d(pool1, 32, (3, 3), (1, 1), activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, (2, 2), (2, 2))

flattened = tf.contrib.layers.flatten(pool2)

dense1 = tf.layers.dense(flattened, 100, activation=tf.nn.relu)
dropout = tf.layers.dropout(dense1, rate=dropout_prob, training=is_training)

logits = tf.layers.dense(dropout, 4)

batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))
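
<span style="font-size:larger;">To see what the dropout layer does to its input, here is a NumPy sketch of the idea. It is not TensorFlow's implementation. The function `dropout` and its `rng` argument are hypothetical names introduced for illustration. During training each unit is zeroed with probability `rate` and the surviving units are scaled by 1 / (1 - rate); at test time the input passes through unchanged.</span>

```python
import numpy as np

def dropout(activations, rate, training, rng):
    # at test time the layer is the identity
    if not training:
        return activations
    # keep each unit with probability 1 - rate ...
    keep_mask = rng.uniform(size=activations.shape) >= rate
    # ... and scale the survivors so the expected activation stays the same
    return activations * keep_mask / (1.0 - rate)

rng = np.random.RandomState(0)
activations = np.ones((1000, 100))

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("fraction of zeroed units:", np.mean(train_out == 0))
print("mean activation during training:", np.mean(train_out))
print("mean activation at test time:", np.mean(test_out))
```

<span style="font-size:larger;">This is why the `is_training` placeholder has to be fed with `True` during training and `False` during evaluation.</span>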

In [None]:
# settings
num_steps = 1000
mini_batch_size = 64
log_frequency = 100

# how many steps are in one epoch (epoch = one pass through the dataset)
# e.g. number of training samples = 50, mini batch size = 10 => steps per epoch = 5
steps_per_epoch = train_dataset.shape[0] // mini_batch_size

with tf.Session() as session:
    
  # initialize all parameters of the neural network
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
        
    # step number relative to the current epoch (epoch = one pass through the dataset)
    # e.g. steps per epoch = 5, step = 12 => epoch step = 12 % 5 = 2
    epoch_step = step % steps_per_epoch

    # start and end index for the current minibatch
    # e.g. mini batch size = 64, epoch step = 2 => start = 128, end = 192
    # => take the images with indices 128 to 191
    start = epoch_step * mini_batch_size
    end = (epoch_step + 1) * mini_batch_size
    
    # if this is the first step in the current epoch, shuffle the training set
    # we do this so that the model does not overfit on individual minibatches
    if epoch_step == 0:
        print("epoch", step // steps_per_epoch)
        train_dataset, train_labels = unison_shuffle(train_dataset, train_labels)
    
    # run one step of mini-batch gradient descent
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap"),
      is_training: True
    })
    
    # sometimes print the current loss
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  
  # evaluate the model on the validation set  
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels,
    is_training: False
  })
    
  print('Validation accuracy', validation_accuracy, '.')

# we do not save the model so the parameters are forgotten right after the training finishes

## Additional Resources   ##

**Saving and restoring models in TensorFlow**
* [tutorial](https://www.tensorflow.org/programmers_guide/saved_model)

**Visualizing learning using TensorBoard**
* [tutorial](https://www.tensorflow.org/get_started/summaries_and_tensorboard)

**Convolutional Networks**
* [lecture notes Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [lecture video from the University of Oxford](https://www.youtube.com/watch?v=bEUX_56Lojc)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

**Dropout**
* [documentation](https://www.tensorflow.org/api_docs/python/tf/layers/dropout)
* [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

**Advanced data loading features in TensorFlow**
* [tutorial](https://www.tensorflow.org/programmers_guide/datasets)


## Try out different datasets ##

* [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats)
* [CIFAR-10](https://www.kaggle.com/c/cifar-10)