# MI-MVI tutorial 2 #

In the first tutorial, we introduced you to **TensorFlow**, a Deep Learning framework. You learned how to **define a computation graph**, and **create and initialize a Session**. Finally, you trained a simple classification model on a dataset of digits called **MNIST**.

In this tutorial, we will experiment with several methods for training neural networks. Furthermore, we will show you how to use some advanced TensorFlow features such as saving and loading models and visualizing the training. Lastly, we will learn how to regularize a neural network.

We thank the authors of this [Udacity tutorial](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/udacity) which was the main inspiration for this tutorial. We have reused some of their code snippets.

## Part 1: the notMNIST dataset ##

![notMNIST example](images/notMNIST.png)

The [notMNIST dataset](http://yaroslavvb.blogspot.cz/2011/09/notmnist-dataset.html) created by [Yaroslav Bulatov](https://www.blogger.com/profile/06139256691290554110) contains pictures of **letters from A to J** gathered from a multitude of publicly available fonts. The letters are in the same format as MNIST: 28x28 grayscale pictures. The dataset is harder than MNIST but small enough to be used on a laptop, which makes it a good fit for small-scale experiments.

We will work only with letters A - D in order for our experiments to run faster. You can use *cv2-notmnist-prepare-data* to load all letters (you will need to make some changes) if you want to experiment with the whole dataset.

Download the preprocessed dataset from [this](https://drive.google.com/file/d/1FE7thwYRktH-D8wi_4dYXsGPNG_Qno10/view?usp=sharing) link and place it in the **data/notMNIST** directory. Alternatively, you can download and preprocess the dataset using the *cv2-notmnist-prepare-data* notebook.

**Import** Python modules we will use.

In [None]:
import os, pickle
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

You should have the preprocessed subset of notMNIST saved as a pickle file. You can use the code snippet below to **load** it any time.

In [None]:
# load a subset of the notMNIST dataset
data_root = 'data/notMNIST'
pickle_file = os.path.join(data_root, 'notMNIST.pickle')

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  labels = save['labels']

  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

**Display** a couple of images so that we know what we are working with. If possible, always visualize your data preprocessing pipeline in order to spot bugs.

In [None]:
print("run again for different images")

from mpl_toolkits.axes_grid1 import ImageGrid

fig = plt.figure(1, figsize=(10, 10))
grid = ImageGrid(fig, 111, nrows_ncols=(1, 4), axes_pad=0.2)

for i in range(4):
    index = np.random.randint(0, train_dataset.shape[0])
    
    grid[i].imshow(train_dataset[index] / 255, interpolation="bilinear", cmap="gray")
    # hide ticks and tick labels; recent matplotlib versions expect booleans here, not 'off'
    grid[i].tick_params(axis='both', which='both', bottom=False, top=False,
                        labelbottom=False, right=False, left=False, labelleft=False)

    for key, value in labels.items():
        if value == train_labels[index]:
            print(key)
    
plt.show()

The individual pixels are encoded as grayscale values stored as integers in the range [0, 255]. To help the classifier converge, we **normalize** the images so that they have zero mean and unit variance.

**(Optional) Task**: Try skipping the normalization step or only subtracting means and see how it impacts the final performance (if the model trains at all).

In [None]:
mean = np.mean(train_dataset, axis=0)
std = np.std(train_dataset, axis=0)

train_dataset = (train_dataset - mean) / std
valid_dataset = (valid_dataset - mean) / std
test_dataset = (test_dataset - mean) / std

Notice that we compute the means and standard deviations over the training set only. As expected, the training set has zero mean and unit variance after normalization. However, the means and variances of the validation and testing sets deviate slightly from these values.

**Question**: Why do we use means and standard deviations computed over the training set to normalize the validation and testing sets?

*Hint: generalization*.

In [None]:
print("statistics after normalization")
print("training mean:", np.round(np.mean(train_dataset), 5), ", variance:", 
                        np.round(np.var(train_dataset), 5))
print("validation mean:", np.round(np.mean(valid_dataset), 5), ", variance:", 
                          np.round(np.var(valid_dataset), 5))
print("testing mean:", np.round(np.mean(test_dataset), 5), ", variance:", 
                       np.round(np.var(test_dataset), 5))

**Shuffle** the pictures to make sure that each mini-batch is an **unbiased sample** of the training data.

In [None]:
# https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison
def unison_shuffle(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

In [None]:
train_dataset, train_labels = unison_shuffle(train_dataset, train_labels)
valid_dataset, valid_labels = unison_shuffle(valid_dataset, valid_labels)
test_dataset, test_labels = unison_shuffle(test_dataset, test_labels)
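
When you shuffle, each image must stay paired with its own label. The following is a minimal sketch of how you could check this on toy data. The helper `shuffle_in_unison`, the seeded `rng` and all `toy_*` names are our own illustration and are not part of the tutorial code.

```python
import numpy as np

def shuffle_in_unison(a, b, rng):
    """Apply the same random permutation to both arrays (hypothetical helper)."""
    assert len(a) == len(b)
    p = rng.permutation(len(a))
    return a[p], b[p]

rng = np.random.default_rng(0)  # fixed seed, used only to make the sketch reproducible

# toy data: image i is filled with the value i and carries the label i
toy_images = np.arange(6, dtype=np.float32).reshape(6, 1, 1) * np.ones((6, 2, 2), dtype=np.float32)
toy_labels = np.arange(6)

shuffled_images, shuffled_labels = shuffle_in_unison(toy_images, toy_labels, rng)

# every image should still sit next to its own label
pairs_intact = bool(np.all(shuffled_images[:, 0, 0] == shuffled_labels))
print(shuffled_labels, pairs_intact)
```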

## Part 2: Building a classification model ##

In this section, we will build a **neural network** for letter classification and train it with **batch gradient descent**, **stochastic gradient descent** and **mini-batch gradient descent**.

The dataset stores each image as a Tensor of rank two (width x height). However, a fully-connected (standard) neural network only accepts vectors (Tensors of rank 1). Therefore, the following cell **vectorizes each image in the dataset**.

![flatten image](images/flatten_image.png)

In [None]:
# vectorize each image
def maybe_vectorize(dataset):
  if len(dataset.shape) == 3:
    return np.reshape(dataset, (dataset.shape[0], dataset.shape[1] * dataset.shape[2]))
  else:
    return dataset

train_dataset = maybe_vectorize(train_dataset)
valid_dataset = maybe_vectorize(valid_dataset)
test_dataset = maybe_vectorize(test_dataset)

print('Training dataset shape:', train_dataset.shape)
print('Validation dataset shape:', valid_dataset.shape)
print('Test dataset shape:', test_dataset.shape)
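
If you want to convince yourself that vectorization loses no information, here is a small sketch on toy data (the `toy_*` names are ours). It assumes NumPy's default row-major layout, in which the pixel at row `r` and column `c` ends up at index `r * 28 + c` of the vector.

```python
import numpy as np

# two fake 28x28 "images" whose pixels are simply numbered
toy_images = np.arange(2 * 28 * 28).reshape(2, 28, 28)

# -1 lets NumPy infer the vector length (28 * 28 = 784)
toy_vectors = toy_images.reshape(toy_images.shape[0], -1)

# the pixel at (row r, column c) lands at position r * 28 + c
r, c = 3, 5
same_pixel = bool(toy_vectors[1, r * 28 + c] == toy_images[1, r, c])

# reshaping back restores the original images
restored = toy_vectors.reshape(-1, 28, 28)
print(toy_vectors.shape, same_pixel)
```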

Furthermore, the dataset labels (which record what letter is depicted in each image) are stored as integers where 0 represents letter A and 3 represents D. A neural network usually outputs a vector in which each element represents the probability that an input image belongs to a certain label. In order to train the neural network, you will need to compare the predicted probabilities with the correct label. To do this, it's convenient to turn each label into a vector with a one in the position that corresponds to its index and the rest set to zero. This is called **one-hot encoding**.

![one-hot encoding](images/one_hot_encoding.jpg)

In [None]:
# one-hot encode each label
def maybe_turn_to_one_hot(labels, num_labels=4):
  if len(labels.shape) == 1:
    one_hot = np.zeros((labels.shape[0], num_labels))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
  else:
    return labels

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)
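
To see why one-hot labels are convenient, here is a minimal NumPy sketch that compares predicted probabilities with one-hot labels. It assumes the probabilities come from a softmax and that the comparison is done with cross-entropy, which is a common choice but not the only one. The functions and all `toy_*` values are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # subtract the row-wise maximum for numerical stability
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def cross_entropy(probabilities, one_hot_labels, eps=1e-12):
    # the one-hot vector selects the probability assigned to the correct class
    return -np.mean(np.sum(one_hot_labels * np.log(probabilities + eps), axis=1))

toy_logits = np.array([[4.0, 0.0, 0.0, 0.0],    # the model is confident in "A"
                       [0.0, 0.0, 0.0, 0.0]])   # the model is undecided
toy_one_hot = np.array([[1.0, 0.0, 0.0, 0.0],   # correct label: A
                        [0.0, 0.0, 0.0, 1.0]])  # correct label: D

toy_probabilities = softmax(toy_logits)
toy_loss = cross_entropy(toy_probabilities, toy_one_hot)

# argmax turns both predictions and one-hot labels back into class indices
toy_accuracy = np.mean(np.argmax(toy_probabilities, axis=1) == np.argmax(toy_one_hot, axis=1))
print(toy_loss, toy_accuracy)
```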

### Fully-connected Neural Network ###

In the following series of tasks, you will implement a classification model from scratch. We recommend using the Jupyter notebook from the first tutorial as a reference and implementing your model based on it. Alternatively, you can check the reference notebook, which contains all the solutions (except for Part 4, which contains a bonus-point task), but you won't learn much by copying them. The tasks aren't graded.

### Task 1A ###

* define a computation graph for a **fully-connected neural network**
* the network should have **2 hidden layers** with **200 neurons in the first** and **100 neurons in the second** hidden layer
* **input**: vectorized images in the shape (num_images, 784)
* **output**: predictions in the shape (num_images, 4)

See the reference notebook for a solution.
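
If you are unsure about the shapes involved, the following framework-independent NumPy sketch traces a forward pass through such a network. It is not the reference solution and does not build a TensorFlow graph. The ReLU activation, the weight initialization and all names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(num_inputs, num_outputs):
    # small random weights and zero biases (an assumed initialization)
    weights = rng.normal(0.0, 0.01, size=(num_inputs, num_outputs))
    biases = np.zeros(num_outputs)
    return weights, biases

def relu(x):
    return np.maximum(x, 0.0)

# input (784) -> hidden (200) -> hidden (100) -> output (4)
layer_sizes = [784, 200, 100, 4]
toy_params = [init_layer(n_in, n_out)
              for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(images, params):
    activations = images
    # hidden layers: affine transformation followed by a non-linearity
    for weights, biases in params[:-1]:
        activations = relu(activations @ weights + biases)
    # output layer: one unnormalized score (logit) per letter
    weights, biases = params[-1]
    return activations @ weights + biases

toy_batch = rng.normal(size=(5, 784))   # five fake vectorized images
toy_logits = forward(toy_batch, toy_params)
print(toy_logits.shape)
```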

### Task 1B ###

* train the neural network you defined above using **Batch Gradient Descent**
* during each learning step, the network makes predictions for all 18000 training images and learns from its mistakes
* evaluate the network on the notMNIST evaluation set


* recommended learning rate: 0.1
* recommended number of training steps: 60

See the reference notebook for a solution.

### Task 1C ###

* train the neural network you defined above using **Stochastic Gradient Descent**
* during each learning step, the network makes predictions for a single image and learns from its mistakes
* evaluate the network on the notMNIST evaluation set


* recommended learning rate: 0.01
* recommended number of training steps: 5000

See the reference notebook for a solution.

### Task 1D ###

* train the neural network you defined above using **Mini-batch Gradient Descent**
* during each learning step, the network makes predictions for a small batch of images and learns from its mistakes
* evaluate the network on the notMNIST evaluation set


* recommended mini-batch size: 64
* recommended learning rate: 0.05
* recommended number of training steps: 2000

See the reference notebook for a solution.
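
The three training methods from Tasks 1B to 1D differ only in how many images are fed to the network in each learning step. The sketch below illustrates this with a hypothetical helper `sample_batch` that slides a window over the (already shuffled) training data. The helper, the toy arrays and the way the window wraps around are our assumptions, not the reference solution.

```python
import numpy as np

def sample_batch(data, labels, batch_size, step):
    """Return the images and labels used at a given step (hypothetical helper)."""
    num_examples = data.shape[0]
    if batch_size >= num_examples:
        # batch gradient descent: use the whole training set
        return data, labels
    # slide a window over the data and wrap around at the end
    offset = (step * batch_size) % (num_examples - batch_size + 1)
    return data[offset:offset + batch_size], labels[offset:offset + batch_size]

# a small stand-in for the training set (the real one has 18000 images)
toy_data = np.zeros((1000, 784), dtype=np.float32)
toy_targets = np.zeros((1000, 4), dtype=np.float32)

batch_shapes = {}
for name, batch_size in [("batch", 1000), ("stochastic", 1), ("mini-batch", 64)]:
    batch_data, batch_targets = sample_batch(toy_data, toy_targets, batch_size, step=3)
    batch_shapes[name] = batch_data.shape
    print(name, batch_data.shape, batch_targets.shape)

# images processed in total with the recommended settings (steps * batch size)
examples_seen = {"batch": 60 * 18000, "stochastic": 5000 * 1, "mini-batch": 2000 * 64}
print(examples_seen)
```

In a TensorFlow training loop, you would pass the returned arrays to the session through `feed_dict` at every step.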

## Part 3: Saving models and visualizing learning in Tensorflow ##

It is not very convenient to train your model each time you want to use it. On top of that, complex Computer Vision models can take weeks to train.

In this section, you will learn how to save and load a TensorFlow model. In addition, you will use TensorBoard to visualize how the loss and accuracy change during the training of your neural network.

### Task 2A ###

Copy the definition of your neural network and add the following lines to the end of the code snippet. You might need to change the names of the variables or delete the second line if you haven't defined an operation that measures the accuracy of your model.

```
tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
summaries = tf.summary.merge_all()
```

The first two lines create summary operations for your model's loss and accuracy which will be recorded during each training step. The third line groups the two summaries together so that you have a single operation that is easy to work with.

See the reference notebook for a solution.

### Task 2B ###

Add these lines to the beginning of your training script.

```
saver = tf.train.Saver()
summary_writer = tf.summary.FileWriter('data', graph=tf.get_default_graph())
```

You can save a summary using `summary_writer.add_summary(summary, global_step=step)` and the model using `saver.save(session, os.path.join('data', 'notMNIST-model-1'), global_step=step)`. Here, `summary` is the value you get by running the `summaries` operation in the session.

See the reference notebook for a solution.

You can **load** and evaluate your model using the following script. When saving a model, Tensorflow generates three different files (data, meta and index). To load a model, simply specify its path without the extensions.

In [None]:
path_to_your_model = None

if path_to_your_model is None:
  print("Please specify the path to your saved model.")
else:
  saver = tf.train.Saver()

  with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    saver.restore(session, path_to_your_model)

    validation_accuracy = session.run(accuracy, feed_dict={
      input_data: valid_dataset,
      input_labels: valid_labels
    })
    print('Validation accuracy:', validation_accuracy)
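
If you are unsure which path to pass to `saver.restore`, the following plain-Python sketch derives it from the names of the saved files. It assumes that saving with `global_step=step` appends `-<step>` to the path and that the data file ends with a shard suffix such as `.data-00000-of-00001`. The helper `checkpoint_prefix` and the file names are hypothetical.

```python
import re

# suffixes of the three files mentioned above; the shard numbering of the
# data file is an assumption and is matched with a pattern
_CHECKPOINT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")

def checkpoint_prefix(file_path):
    """Strip the .index / .meta / .data-* suffix from a file name (hypothetical helper)."""
    return _CHECKPOINT_SUFFIX.sub("", file_path)

# example file names for a model saved at step 2000 (made up for illustration)
toy_files = [
    "data/notMNIST-model-1-2000.index",
    "data/notMNIST-model-1-2000.meta",
    "data/notMNIST-model-1-2000.data-00000-of-00001",
]

# all three files share a single prefix, which is the path to restore from
toy_prefixes = {checkpoint_prefix(name) for name in toy_files}
print(toy_prefixes)
```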

To visualize the training, call the following command.

```
tensorboard --logdir data
```

You must activate the virtual environment that contains your installation of TensorFlow before you can call this command.

## Part 4: Regularization ##

As you might have noticed, some of the models you have trained earlier report training accuracy that is much higher than the validation or testing accuracy. This is due to the model **overfitting** on the training data. Overfitting can be mitigated using **regularization** techniques.

**Dropout** is a popular technique that prevents overfitting by dropping some of the activations of a particular layer. Take a look at [dropout in Tensorflow](https://www.tensorflow.org/api_docs/python/tf/layers/dropout). You can add it into your computation graph as a layer similarly to the Dense layer.

However, there is one more thing you need to do before you can use it. Dropout should drop activations only during training because we don't want to lose any information at random when we make predictions with the model. You will need to define a boolean Tensor that will tell the dropout layer if it's in the training or testing mode.
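
To build intuition for the training and testing modes, here is a minimal NumPy sketch of dropout. It is not the TensorFlow layer and not a solution to the task below. It assumes the common "inverted" variant, in which the kept activations are scaled by `1 / (1 - rate)` during training so that their expected value stays the same. The function `inverted_dropout` and the toy data are our own.

```python
import numpy as np

def inverted_dropout(activations, rate, training, rng):
    """Drop each activation with probability `rate` during training only."""
    if not training or rate == 0.0:
        # testing mode: pass the activations through unchanged
        return activations
    # True for the activations we keep, False for the dropped ones
    keep_mask = rng.random(activations.shape) >= rate
    # rescale the kept activations so that the expected value is preserved
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
toy_activations = np.ones((1000, 100))

train_output = inverted_dropout(toy_activations, rate=0.5, training=True, rng=rng)
test_output = inverted_dropout(toy_activations, rate=0.5, training=False, rng=rng)

print("training mode, mean activation:", train_output.mean())
print("testing mode, mean activation:", test_output.mean())
```

In training mode roughly half of the activations are zeroed and the rest are doubled, while in testing mode the input passes through untouched.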

### Task 3 (bonus points) ###

Add a dropout layer **before** the last Dense layer and make sure that activations are dropped only during training. Try changing the drop probability in order to obtain the best validation accuracy. Use the neural network you have defined above and train it with mini-batch gradient descent. Note that a model with a stronger regularization might need more training steps to converge properly.

The solution will be added to the reference notebook after the end of this tutorial.

## Additional Resources ##

**Saving and restoring models in TensorFlow**
* [tutorial](https://www.tensorflow.org/programmers_guide/saved_model)

**Visualizing learning using TensorBoard**
* [tutorial](https://www.tensorflow.org/get_started/summaries_and_tensorboard)

**Dropout**
* [documentation](https://www.tensorflow.org/api_docs/python/tf/layers/dropout)
* [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)