# MI-MVI tutorial 2 #

In the previous two tutorials, we classified images with fully-connected neural networks. While these networks can achieve satisfactory results on simple image datasets, their ability to model complex natural images is significantly limited.

Consider the following four pictures of cats from our dataset:

![cats](images/cats.png)

The cats occupy different parts of the images. This is problematic for a fully-connected neural network because each of its neurons has an individual weight for each pixel of the image (more precisely, 3 weights per pixel for the red, green and blue values). The set of weights that detects a cat in the center of the last image is independent of the weights that find the cat in the top-left corner of the third image. Therefore, the fully-connected network needs to learn the same pattern several times, once for each location.

We would like our neural network to detect an object regardless of its location in the image. This property is called **location invariance**.

**Convolutional Neural Networks** were developed specifically for image classification. They possess several desirable properties, including location invariance.
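
The difference can be illustrated with a minimal NumPy sketch. Everything in it is made up for illustration: a hypothetical 1-D "image", a toy pattern, and hand-set weights instead of learned ones. A per-pixel weight vector (as in a fully-connected layer) responds to the pattern only at the position it was tuned for, whereas one small filter applied at every position, followed by a maximum, gives the same score wherever the pattern appears.

```python
import numpy as np

pattern = np.array([1.0, 2.0, 1.0])        # toy "object" we want to detect

def place(pattern, position, length=10):
    """Return a 1-D toy image with the pattern placed at the given position."""
    image = np.zeros(length)
    image[position:position + len(pattern)] = pattern
    return image

image_left = place(pattern, 0)
image_right = place(pattern, 6)

# fully-connected style: one weight per pixel, tuned to the pattern on the left
dense_weights = place(pattern, 0)
dense_left = float(np.dot(dense_weights, image_left))
dense_right = float(np.dot(dense_weights, image_right))

# convolutional style: the same small filter applied at every location, then max
def conv_max_score(image, kernel):
    responses = [np.dot(image[i:i + len(kernel)], kernel)
                 for i in range(len(image) - len(kernel) + 1)]
    return float(max(responses))

conv_left = conv_max_score(image_left, pattern)
conv_right = conv_max_score(image_right, pattern)

print("dense:", dense_left, dense_right)   # 6.0 0.0 -> responds only on the left
print("conv :", conv_left, conv_right)     # 6.0 6.0 -> same response at both positions
```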

## Part 1: Data Preparation ##

![animals](images/animals.png)

We created a small dataset of four **animals**: cats, deer, dogs and horses. The task in this tutorial is to distinguish between the four classes. We will tackle the classification problem with Convolutional Neural Networks.

All pictures are 32 x 32 pixels in size and are encoded in the RGB format.

The dataset is a subset of a larger dataset called [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html). We encourage you to experiment with the whole dataset after you complete this tutorial. The *cv3-animals-prepare-data* notebook will help you prepare the dataset (although you will need to change some code to prepare all 10 classes).

**Download** the dataset from this [link](https://drive.google.com/file/d/1_jWafqplCoV4OAC0hoQv5Ibxe69N7UjN/view?usp=sharing) and place it in the **data/animals** folder.

**Import** packages we need.

In [None]:
import os, pickle
import numpy as np
import tensorflow as tf

**Load** pictures of the animals. The pictures are already prepared for you in a Python [pickle](https://docs.python.org/3/library/pickle.html).

In [None]:
dataset_path = "data/animals/dataset.pickle"
    
with open(dataset_path, "rb") as file:
    dataset = pickle.load(file)
    
print("The following items were loaded:")
print(list(dataset.keys()))

For each class, we include 4500 training, 250 validation and 250 testing images. In total, there should be 18000 training, 1000 validation and 1000 testing images.

In [None]:
print("train data:", dataset["train_data"].shape)
print("validation data:", dataset["valid_data"].shape)
print("testing data:", dataset["test_data"].shape)

The individual pixels are encoded as RGB values with integers in the range [0, 255]. To help the classifier converge, we **normalize** the images so that they have zero mean and unit variance.

**(Optional) Task**: Try skipping the normalization step or only subtracting means and see how it impacts the final performance (if the model trains at all).

In [None]:
mean = np.mean(dataset["train_data"], axis=0)
std = np.std(dataset["train_data"], axis=0)

dataset["train_data"] = (dataset["train_data"] - mean) / std
dataset["valid_data"] = (dataset["valid_data"] - mean) / std
dataset["test_data"] = (dataset["test_data"] - mean) / std

Notice that we compute the means and standard deviations over the training set only. As expected, the training set has zero mean and unit variance after normalization. However, the means and variances of the validation and testing sets deviate slightly from these values.

**Question**: Why do we use means and standard deviations computed over the training set to normalize the validation and testing sets?

*Hint: generalization*.

In [None]:
print("dataset statistics")
print("training mean:", np.round(np.mean(dataset["train_data"]), 5), ", variance:", 
                        np.round(np.var(dataset["train_data"]), 5))
print("validation mean:", np.round(np.mean(dataset["valid_data"]), 5), ", variance:", 
                          np.round(np.var(dataset["valid_data"]), 5))
print("testing mean:", np.round(np.mean(dataset["test_data"]), 5), ", variance:", 
                       np.round(np.var(dataset["test_data"]), 5))

**Shuffle** the pictures to make sure that each mini-batch is an **unbiased sample** of the training data.

In [None]:
# https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison
def unison_shuffle(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

In [None]:
train_dataset, train_labels = unison_shuffle(dataset["train_data"], dataset["train_labels"])
valid_dataset, valid_labels = unison_shuffle(dataset["valid_data"], dataset["valid_labels"])
test_dataset, test_labels = unison_shuffle(dataset["test_data"], dataset["test_labels"])

Finally, the labels of the images are encoded as integers (e.g. 0 = cat, 1 = deer, etc.). However, our neural network will output a vector of probabilities, one per class. These vectors will be compared with the ground-truth classes to compute the loss. For convenience, we turn the labels into **one-hot vectors**.

In [None]:
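def maybe_turn_to_one_hot(labels, num_labels=4):
    # convert integer labels of shape (N,) into one-hot vectors of shape (N, num_labels)
    if len(labels.shape) == 1:
        one_hot = np.zeros((labels.shape[0], num_labels))
        one_hot[np.arange(len(labels)), labels] = 1
        return one_hot
    else:
        # labels are already one-hot encoded
        return labels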

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)

## Part 2: Convolutional Neural Networks ##

Convolutional Neural Networks (ConvNets) extend fully-connected networks with convolutional and pooling layers.

![convolutional network](images/convolutional_network.png)

[source of image](http://deeplearning.net/tutorial/lenet.html)

A common ConvNet contains several sets of convolutional layers interleaved with pooling layers, followed by one or two fully-connected layers. The convolutional and pooling layers take care of detecting descriptive features in the image, whereas the fully-connected layers carry out the final classification.


The following section describes convolutional and pooling layers and explains how we can implement them in Tensorflow.

### Convolutional Layer ###

A convolutional layer consists of several filters that are convolved over the image. Each filter is a small Tensor of weights. At each location, the convolution multiplies the filter weights element-wise with the underlying image patch, sums the values of the resulting Tensor and stores the sum in the output Tensor.

![convolutional layer](images/convolutional_layer.jpeg)

[source of image](http://cs231n.github.io/convolutional-networks/)
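
The operation described above can be sketched in plain NumPy for a single filter. This is an illustrative, unoptimized version that assumes stride 1, no padding and no bias term; the function name `convolve_single_filter` and the toy inputs are made up for this example.

```python
import numpy as np

def convolve_single_filter(image, kernel):
    """Slide one filter over an (H, W, C) image with stride 1 and no padding."""
    height, width, _ = image.shape
    k_h, k_w, _ = kernel.shape
    output = np.zeros((height - k_h + 1, width - k_w + 1))
    for row in range(output.shape[0]):
        for col in range(output.shape[1]):
            patch = image[row:row + k_h, col:col + k_w, :]
            # multiply the weights with the patch element-wise and sum everything
            output[row, col] = np.sum(patch * kernel)
    return output

toy_image = np.ones((5, 5, 3))             # hypothetical 5 x 5 RGB image of ones
toy_kernel = np.ones((3, 3, 3))            # hypothetical 3 x 3 filter of ones
feature_map = convolve_single_filter(toy_image, toy_kernel)
print(feature_map.shape)                   # (3, 3)
print(feature_map[0, 0])                   # 27.0 = 3 * 3 * 3 ones summed
```

A layer with 16 filters repeats this with 16 different kernels and stacks the resulting feature maps along the last axis.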

A convolutional layer can be added to the Tensorflow computational graph with [tf.layers.conv2d](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d).

**Sample code**

The following snippet creates a convolutional layer with 16 filters of size 3 x 3 that are applied with a stride of 1. A [ReLU](https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning) activation function is applied after the layer, so that the model can learn non-linear mappings.

```python
conv1 = tf.layers.conv2d(input_data, 16, (3, 3), (1, 1), activation=tf.nn.relu)
```

### Pooling Layer ###

A pooling layer applies a pooling function (often maximum or average) to small patches of each feature map of the input. It is usually used to downsample the input: **max-pooling**, for example, keeps only the maximum value of each patch.

![pooling layer](images/pooling_layer.jpeg)

[source of image](http://cs231n.github.io/convolutional-networks/)
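
For a single feature map, max-pooling with a 2 x 2 window and stride 2 can be sketched in NumPy as follows. The function `max_pool_2x2` and the toy feature map are made up for this illustration, and the sketch assumes no padding, so a trailing row or column is dropped when the size is odd.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max-pool an (H, W) feature map with a 2 x 2 window and stride 2."""
    height, width = feature_map.shape
    # drop a trailing row/column if the size is odd (no padding)
    trimmed = feature_map[:height // 2 * 2, :width // 2 * 2]
    # split the map into non-overlapping 2 x 2 blocks and take the max of each
    blocks = trimmed.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

toy_map = np.array([[1, 2, 5, 6],
                    [3, 4, 7, 8],
                    [9, 8, 1, 0],
                    [7, 6, 2, 3]])
pooled = max_pool_2x2(toy_map)
print(pooled)
# [[4 8]
#  [9 3]]
```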

Add a max-pooling layer to the Tensorflow graph with [tf.layers.max_pooling2d](https://www.tensorflow.org/api_docs/python/tf/layers/MaxPooling2D).

**Sample code**

The following snippet defines a Max-Pooling Layer that downsamples the input by a factor of 2.

```python
pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2))
```

### Task 1 (bonus points) ###

Implement and train a Convolutional Neural Network on the animals dataset. There are many degrees of freedom when we create a ConvNet. We can choose the number of convolutions, where to put max-pooling layers, how many filters each convolutional layer has and so on. You can either come up with your own ConvNet or follow our recipe:

* Conv1 : 16 3x3 filters with stride 1
* Pool1 : downsample the input by a factor of 2
* Conv2 : 32 3x3 filters with stride 1
* Pool2 : downsample the input by a factor of 2
* Flatten : vectorize the input so that we can pass it into a fully-connected layer
* Dense1 : a fully-connected layer with 50 neurons
* Logits : a fully-connected layer with 4 neurons: each outputs the score for one class in the dataset
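
As a rough check of the recipe above, the following sketch traces the spatial size of the data through the layers and counts the trainable parameters. It assumes that neither the convolutions nor the pooling layers pad their input ("valid" padding, which we believe is the default for `tf.layers.conv2d` and `tf.layers.max_pooling2d`); with "same" padding the numbers would differ. The helper functions are made up for this illustration.

```python
def conv_output_size(size, kernel=3, stride=1):
    """Spatial size after a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_output_size(size, window=2, stride=2):
    """Spatial size after pooling without padding."""
    return (size - window) // stride + 1

size = 32                                          # input images are 32 x 32 x 3
after_conv1 = conv_output_size(size)               # 30
after_pool1 = pool_output_size(after_conv1)        # 15
after_conv2 = conv_output_size(after_pool1)        # 13
after_pool2 = pool_output_size(after_conv2)        # 6
flattened_size = after_pool2 * after_pool2 * 32    # 6 * 6 * 32 = 1152

# trainable parameters: weights plus one bias per output unit
params = {
    "conv1": 3 * 3 * 3 * 16 + 16,
    "conv2": 3 * 3 * 16 * 32 + 32,
    "dense1": flattened_size * 50 + 50,
    "logits": 50 * 4 + 4,
}
print(after_conv1, after_pool1, after_conv2, after_pool2, flattened_size)
print(params, "total:", sum(params.values()))
```

Under these assumptions, most of the parameters sit in the first fully-connected layer, not in the convolutional layers.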

Useful links:
* [lecture notes from Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [Tensorflow tutorial on Convolutional Networks](https://www.tensorflow.org/tutorials/layers)

See the reference notebook for a solution.

In [None]:
import tensorflow as tf

# TF remembers everything you defined, this will keep the computation graph clean
tf.reset_default_graph()   

learning_rate = 0.05

# placeholders for data, we will fill these using the feed dictionary during training
input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1], train_dataset.shape[2], 3))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))

# define convolutional and pooling layers
conv1 = tf.layers.conv2d(input_data, 16, (3, 3), (1, 1), activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2))

conv2 = tf.layers.conv2d(pool1, 32, (3, 3), (1, 1), activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, (2, 2), (2, 2))

flattened = tf.contrib.layers.flatten(pool2)

# define fully-connected (or dense) layers
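# ReLU keeps the two dense layers from collapsing into a single linear mapping
dense1 = tf.layers.dense(flattened, 50, activation=tf.nn.relu)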
logits = tf.layers.dense(dense1, 4)

# define loss and training operation
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# calculate average accuracy over a batch of images
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))

In [None]:
# settings
num_steps = 1000
mini_batch_size = 64
log_frequency = 100

# how many steps are in one epoch (epoch = one pass through the dataset)
# e.g. number of training samples = 50, mini batch size = 10 => steps per epoch = 5
steps_per_epoch = train_dataset.shape[0] // mini_batch_size

with tf.Session() as session:
    
  # initialize all parameters of the neural network
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
        
    # step number relative to the current epoch (epoch = one pass through the dataset)
    # e.g. steps per epoch = 50, step = 60 => epoch step = 60 % 50 = 10
    epoch_step = step % steps_per_epoch
        
    # start and end index for the current minibatch
    # e.g. mini batch size = 64, epoch step = 10 => start = 640, end = 704
    # => take the images with indices 640 to 703
    start = epoch_step * mini_batch_size
    end = (epoch_step +  1) * mini_batch_size
    
    # if this is the first step in the current epoch, shuffle the training set
    # we do this so that the model does not overfit on individual minibatches
    if epoch_step == 0:
        print("epoch", step // steps_per_epoch)
        train_dataset, train_labels = unison_shuffle(train_dataset, train_labels)
    
    # run one step of mini-batch gradient descent
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap")
    })
    
    # sometimes print the current loss
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  
  # evaluate the model on the validation set  
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
    
  print('Validation accuracy', validation_accuracy, '.')

# we do not save the model so the parameters are forgotten right after the training finishes
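
The mini-batch indexing used in the training loop can be tried out in isolation. The sketch below uses a made-up array of 10 "samples" and a batch size of 4; all `toy_*` names are hypothetical and chosen so that they do not overwrite the variables of the notebook.

```python
import numpy as np

toy_data = np.arange(10)                    # stand-in for 10 training samples
toy_batch_size = 4
toy_steps_per_epoch = len(toy_data) // toy_batch_size   # 2 full batches per epoch

toy_batches = []
for toy_step in range(4):
    toy_epoch_step = toy_step % toy_steps_per_epoch
    toy_start = toy_epoch_step * toy_batch_size
    toy_end = (toy_epoch_step + 1) * toy_batch_size
    batch = toy_data.take(range(toy_start, toy_end), axis=0, mode="wrap")
    toy_batches.append(batch.tolist())

print(toy_batches)   # [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 2, 3], [4, 5, 6, 7]]

# with mode="wrap", indices past the end of the array wrap around to the start
toy_wrapped = toy_data.take(range(8, 12), mode="wrap").tolist()
print(toy_wrapped)   # [8, 9, 0, 1]
```

Because the number of steps per epoch is rounded down, the end index never exceeds the dataset size in the loop, so `mode="wrap"` acts only as a safeguard. The samples left over after the last full batch are skipped in that epoch, and the reshuffle at the start of each epoch changes which samples those are.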

## Part 3: Regularizing Convolutional Networks ##

All neural networks are prone to overfitting if the training dataset is too small or the number of parameters too high.

The dropout layer is a simple but effective regularization technique. During training, a certain fraction (dropout_prob) of the inputs is set to 0 and the remaining inputs are scaled by 1 / (1 - dropout_prob), so that the expected sum of the inputs stays unchanged. During testing, the layer does nothing.

Include dropout with [tf.layers.dropout](https://www.tensorflow.org/api_docs/python/tf/layers/dropout).
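
The behaviour of a dropout layer can be sketched in NumPy as follows. This is an illustrative version of so-called inverted dropout, not the Tensorflow implementation; the function `dropout`, its arguments and the seed are made up for this example. Each kept value is scaled by 1 / (1 - rate), so the mean of the activations stays roughly the same during training.

```python
import numpy as np

def dropout(inputs, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of the inputs, rescale the rest."""
    if not training:
        return inputs                        # the layer does nothing at test time
    keep_mask = rng.random(inputs.shape) >= rate
    return inputs * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)               # fixed seed, for reproducibility only
activations = np.ones(10000)

dropped = dropout(activations, rate=0.2, training=True, rng=rng)
unchanged = dropout(activations, rate=0.2, training=False, rng=rng)

print("kept values are scaled to:", dropped.max())    # 1 / (1 - 0.2) = 1.25
print("fraction set to zero:", np.mean(dropped == 0)) # roughly 0.2
print("mean during training:", dropped.mean())        # roughly 1.0
print("mean during testing:", unchanged.mean())       # exactly 1.0
```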



### Task 2 (bonus points) ###

Implement dropout for your ConvNet. See the reference notebook for a solution.

**Sample code**

The following snippet defines a dropout layer that drops 50% of the input values. The is_training placeholder decides whether we are in training or testing mode and should be specified using the feed dictionary.

```python
is_training = tf.placeholder(tf.bool)
dropout = tf.layers.dropout(dense1, rate=0.5, training=is_training)
```

### (Optional) Task 3 ###

So far, we have only evaluated our models on the validation set. Once you are confident in your choice of ConvNet architecture and hyperparameters, you should evaluate the model on the testing set to get a more reliable estimate of its accuracy. Because you chose the hyperparameters by looking at the validation accuracy, they might work particularly well for the images in the validation set but fail when classifying the test set.

Evaluate your model on the testing set and compare the accuracy to the one computed over the validation set.

## Additional Resources ##

**Convolutional Networks**
* [lecture notes from Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [lecture video from the University of Oxford](https://www.youtube.com/watch?v=bEUX_56Lojc)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

**Dropout**
* [documentation](https://www.tensorflow.org/api_docs/python/tf/layers/dropout)
* [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

**Advanced data loading features in Tensorflow**
* [tutorial](https://www.tensorflow.org/programmers_guide/datasets)


## Other datasets to try ##

* [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats)
* [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
* [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html)