# MI-MVI tutorial 2 #

<span style="font-size:larger;">In the first tutorial, we introduced you to **Tensorflow**, a Deep Learning framework. You learned how to **define a computation graph**, and **create and initialize a Session**. Finally, you trained a simple classification model on a dataset of digits called **MNIST**.</span>

<span style="font-size:larger;">In this tutorial, you will download and preprocess a new dataset from scratch. Furthermore, we will show you how to use some advanced Tensorflow features like saving and loading models as well as visualizing the training. Lastly, you will experiment with various neural network architectures to obtain a good classifier.</span>

<span style="font-size:larger;">We thank the authors of this [Udacity tutorial](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/udacity) which was the main inspiration for this tutorial. We have reused some of their code snippets.</span>

## Part 1: the notMNIST dataset

![notMNIST example](images/notMNIST.png)

<span style="font-size:larger;">The [notMNIST](http://yaroslavvb.blogspot.cz/2011/09/notmnist-dataset.html) dataset, created by [Yaroslav Bulatov](https://www.blogger.com/profile/06139256691290554110), contains pictures of **letters from A to J** rendered in a multitude of publicly available fonts. The letters are in the same format as MNIST: 28x28 grayscale pictures. The dataset is harder than MNIST but small enough to be used on a laptop, which makes it well suited to small-scale experiments.</span>

<span style="font-size:larger;">Let's start by downloading and preprocessing the dataset. By the end of this section, you should have a file containing 110000 preprocessed pictures of letters.</span>

<span style="font-size:larger;">**Import** all packages that will be used.</span>

In [None]:
import os, sys, tarfile
from six.moves.urllib.request import urlretrieve
import numpy as np
from scipy import ndimage
from six.moves import cPickle as pickle
from IPython.display import display, Image

<span style="font-size:larger;">**Download** the dataset.</span>

In [None]:
url = 'https://commondatastorage.googleapis.com/books1000/'
last_percent_reported = None
data_root = 'data/notMNIST'

# make sure the dataset directory exists
if not os.path.isdir(data_root):
  os.makedirs(data_root)

def download_progress_hook(count, blockSize, totalSize):
  """A hook to report the progress of a download. This is mostly intended for users with
  slow internet connections. Reports every 5% change in download progress.
  """
  global last_percent_reported
  percent = int(count * blockSize * 100 / totalSize)

  if last_percent_reported != percent:
    if percent % 5 == 0:
      sys.stdout.write("%s%%" % percent)
      sys.stdout.flush()
    else:
      sys.stdout.write(".")
      sys.stdout.flush()
      
    last_percent_reported = percent
        
def maybe_download(filename, expected_bytes, force=False):
  """Download a file if not present, and make sure it's the right size."""
  dest_filename = os.path.join(data_root, filename)
  if force or not os.path.exists(dest_filename):
    print('Attempting to download:', filename) 
    filename, _ = urlretrieve(url + filename, dest_filename, reporthook=download_progress_hook)
    print('\nDownload Complete!')
  statinfo = os.stat(dest_filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', dest_filename)
  else:
    raise Exception(
      'Failed to verify ' + dest_filename + '. Can you get to it with a browser?')
  return dest_filename

train_filename = maybe_download('notMNIST_large.tar.gz', 247336696)
test_filename = maybe_download('notMNIST_small.tar.gz', 8458043)

<span style="font-size:larger;">The dataset was downloaded as two tarballs. **Extract** both of them.</span>

In [None]:
num_classes = 10
np.random.seed(133)

def maybe_extract(filename, force=False):
  root = os.path.splitext(os.path.splitext(filename)[0])[0]  # remove .tar.gz
  if os.path.isdir(root) and not force:
    # You may override by setting force=True.
    print('%s already present - Skipping extraction of %s.' % (root, filename))
  else:
    print('Extracting data for %s. This may take a while. Please wait.' % root)
    tar = tarfile.open(filename)
    sys.stdout.flush()
    tar.extractall(data_root)
    tar.close()
  data_folders = [
    os.path.join(root, d) for d in sorted(os.listdir(root))
    if os.path.isdir(os.path.join(root, d))]
  if len(data_folders) != num_classes:
    raise Exception(
      'Expected %d folders, one per class. Found %d instead.' % (
        num_classes, len(data_folders)))
  print(data_folders)
  return data_folders
  
train_folders = maybe_extract(train_filename)
test_folders = maybe_extract(test_filename)

<span style="font-size:larger;">**Load all images** and create a single Tensor for each letter. For example, there are about 53000 pictures of letter A in the dataset from which we will choose 10000 - the script will create a single Tensor of dimensions (10000, 28, 28), where 28 is both the width and height of each image. Due to memory constraints, we will save each Tensor into a [pickle](https://docs.python.org/3/library/pickle.html).</span>

In [None]:
image_size = 28                             # pixel width and height
pixel_depth = 255.0                         # number of levels per pixel

max_training_images_per_class = 10000       # maximum number of training images to load per class
min_training_images_per_class = 10000       # minimum number of training images to load per class

max_testing_images_per_class = 2000         # maximum number of testing images to load per class
min_testing_images_per_class = 1800         # minimum number of testing images to load per class

def load_letter(folder, max_images, min_images):
  """Load the data for a single letter label."""
  image_files = os.listdir(folder)
  max_images = min(max_images, len(image_files))
  dataset = np.ndarray(shape=(max_images, image_size, image_size),
                         dtype=np.float32)

  num_images = 0
  for image in image_files:
    
    if num_images >= max_images:
      break
    
    image_file = os.path.join(folder, image)
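    try:
      # Note: ndimage.imread was removed in SciPy 1.2; on newer versions use a
      # replacement reader such as imageio.imread instead.
      image_data = (ndimage.imread(image_file).astype(float) -
                    pixel_depth / 2) / pixel_depth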
      if image_data.shape != (image_size, image_size):
        raise Exception('Unexpected image shape: %s' % str(image_data.shape))
      dataset[num_images, :, :] = image_data
      num_images = num_images + 1
    except IOError as e:
      pass
      # print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')
    
  dataset = dataset[0:num_images, :, :]
  if num_images < min_images:
    raise Exception('Many fewer images than expected: %d < %d' %
                    (num_images, min_images))
    
  print('Full dataset tensor:', dataset.shape)
  print()
  # print('Mean:', np.mean(dataset))
  # print('Standard deviation:', np.std(dataset))
  return dataset
        
def maybe_pickle(data_folders, max_images, min_images, force=False):
  dataset_names = []
  for folder in data_folders:
    set_filename = folder + '.pickle'
    dataset_names.append(set_filename)
    if os.path.exists(set_filename) and not force:
      # You may override by setting force=True.
      print('%s already present - Skipping pickling.' % set_filename)
    else:
      print('Pickling %s.' % set_filename)
      dataset = load_letter(folder, max_images, min_images)
      try:
        with open(set_filename, 'wb') as f:
          pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
      except Exception as e:
        print('Unable to save data to', set_filename, ':', e)
  
  return dataset_names

train_datasets = maybe_pickle(train_folders, max_training_images_per_class, min_training_images_per_class)
test_datasets = maybe_pickle(test_folders, max_testing_images_per_class, min_testing_images_per_class)
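
<span style="font-size:larger;">To see what the normalization inside `load_letter` does to the pixel values, here is a minimal sketch on a made-up 2x2 image. The array `raw` is hypothetical; only `pixel_depth` and the formula come from the cell above.</span>

```python
import numpy as np

pixel_depth = 255.0  # same constant as in the loading cell

# Hypothetical 2x2 grayscale image with raw values in [0, 255].
raw = np.array([[0, 255], [51, 204]], dtype=np.uint8)

# Shift by half the depth and divide by the depth, as load_letter does.
normalized = (raw.astype(float) - pixel_depth / 2) / pixel_depth

print(normalized)                          # values lie in [-0.5, 0.5]
print(normalized.min(), normalized.max())  # -0.5 0.5
```

<span style="font-size:larger;">Centering the values around zero in this way usually makes gradient-based training better behaved than feeding raw values between 0 and 255.</span>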

<span style="font-size:larger;">Finally, **create a subset** for training, validation and testing, and **save** the preprocessed dataset.</span>

In [None]:
# create training, validation and testing splits
def make_arrays(nb_rows, img_size):
  if nb_rows:
    dataset = np.ndarray((nb_rows, img_size, img_size), dtype=np.float32)
    labels = np.ndarray(nb_rows, dtype=np.int32)
  else:
    dataset, labels = None, None
  return dataset, labels

def merge_datasets(pickle_files, train_size, valid_size=0):
  num_classes = len(pickle_files)
  valid_dataset, valid_labels = make_arrays(valid_size, image_size)
  train_dataset, train_labels = make_arrays(train_size, image_size)
  vsize_per_class = valid_size // num_classes
  tsize_per_class = train_size // num_classes
    
  start_v, start_t = 0, 0
  end_v, end_t = vsize_per_class, tsize_per_class
  end_l = vsize_per_class+tsize_per_class
  for label, pickle_file in enumerate(pickle_files):       
    try:
      with open(pickle_file, 'rb') as f:
        letter_set = pickle.load(f)
        # let's shuffle the letters to have random validation and training set
        np.random.shuffle(letter_set)
        if valid_dataset is not None:
          valid_letter = letter_set[:vsize_per_class, :, :]
          valid_dataset[start_v:end_v, :, :] = valid_letter
          valid_labels[start_v:end_v] = label
          start_v += vsize_per_class
          end_v += vsize_per_class
                    
        train_letter = letter_set[vsize_per_class:end_l, :, :]
        train_dataset[start_t:end_t, :, :] = train_letter
        train_labels[start_t:end_t] = label
        start_t += tsize_per_class
        end_t += tsize_per_class
    except Exception as e:
      print('Unable to process data from', pickle_file, ':', e)
      raise
    
  return valid_dataset, valid_labels, train_dataset, train_labels
            
            
train_size = 90000
valid_size = 10000
test_size = 10000

valid_dataset, valid_labels, train_dataset, train_labels = merge_datasets(
  train_datasets, train_size, valid_size)
_, _, test_dataset, test_labels = merge_datasets(test_datasets, test_size)

print('Training:', train_dataset.shape, train_labels.shape)
print('Validation:', valid_dataset.shape, valid_labels.shape)
print('Testing:', test_dataset.shape, test_labels.shape)

In [None]:
# shuffle the dataset
def randomize(dataset, labels):
  permutation = np.random.permutation(labels.shape[0])
  shuffled_dataset = dataset[permutation,:,:]
  shuffled_labels = labels[permutation]
  return shuffled_dataset, shuffled_labels
train_dataset, train_labels = randomize(train_dataset, train_labels)
test_dataset, test_labels = randomize(test_dataset, test_labels)
valid_dataset, valid_labels = randomize(valid_dataset, valid_labels)
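
<span style="font-size:larger;">The important detail in `randomize` is that the same permutation indexes both the images and the labels, so each image keeps its label. A small sketch on toy data (the arrays and the fixed seed below are made up for illustration):</span>

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, only for this illustration

# Toy data: image i is filled with the value i, and its label is also i.
labels = np.arange(5)
dataset = np.stack([np.full((2, 2), i, dtype=np.float32) for i in labels])

permutation = rng.permutation(labels.shape[0])
shuffled_dataset = dataset[permutation, :, :]
shuffled_labels = labels[permutation]

# Every image still sits next to its own label.
print(all(shuffled_dataset[i, 0, 0] == shuffled_labels[i] for i in range(5)))
```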

In [None]:
# save the dataset as a pickle
pickle_file = os.path.join(data_root, 'notMNIST.pickle')

try:
  save = {
    'train_dataset': train_dataset,
    'train_labels': train_labels,
    'valid_dataset': valid_dataset,
    'valid_labels': valid_labels,
    'test_dataset': test_dataset,
    'test_labels': test_labels,
    }
  # the with-statement closes the file even if pickling fails
  with open(pickle_file, 'wb') as f:
    pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
except Exception as e:
  print('Unable to save data to', pickle_file, ':', e)
  raise

statinfo = os.stat(pickle_file)
print('Pickle size in bytes:', statinfo.st_size)

<span style="font-size:larger;">Alternatively, you can download the preprocessed dataset from [this](https://drive.google.com/file/d/0BwaX_s62pOTnUkxlbHh3Zzl4UlU/view?usp=sharing) link and place it in the **data/notMNIST** directory.</span>

## Part 2: Building a classification model ##

<span style="font-size:larger;">In this section, you will build a **neural network** for letter classification and train it with **batch gradient descent**, **stochastic gradient descent** and **mini-batch gradient descent**.</span>

<span style="font-size:larger;">You should have the preprocessed subset of notMNIST saved as a [pickle](https://docs.python.org/3/library/pickle.html). You can use the code snippet below to **load** it at any time.</span>

In [None]:
# load a subset of the notMNIST dataset
data_root = 'data/notMNIST'
pickle_file = os.path.join(data_root, 'notMNIST.pickle')

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

<span style="font-size:larger;">The dataset stores each image as a Tensor of rank two. However, a fully-connected (standard) neural network only accepts vectors (Tensors of rank 1). Therefore, the following cell **vectorizes each image in the dataset**.</span>

![flatten image](images/flatten_image.png)

In [None]:
# vectorize each image
def maybe_vectorize(dataset):
  if len(dataset.shape) == 3:
    return np.reshape(dataset, (dataset.shape[0], dataset.shape[1] * dataset.shape[2]))
  else:
    return dataset

train_dataset = maybe_vectorize(train_dataset)
valid_dataset = maybe_vectorize(valid_dataset)
test_dataset = maybe_vectorize(test_dataset)

print('Training dataset shape:', train_dataset.shape)
print('Validation dataset shape:', valid_dataset.shape)
print('Test dataset shape:', test_dataset.shape)
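
<span style="font-size:larger;">The same reshape on a tiny, made-up batch makes it easier to see what happens to the pixels. The 3x3 "images" below are hypothetical stand-ins for the 28x28 letters.</span>

```python
import numpy as np

# Hypothetical batch of two 3x3 "images".
images = np.arange(18, dtype=np.float32).reshape(2, 3, 3)

flat = np.reshape(images, (images.shape[0], images.shape[1] * images.shape[2]))

print(flat.shape)  # (2, 9)
print(flat[0])     # the rows of the first image, laid end to end
```

<span style="font-size:larger;">For notMNIST the same operation turns each 28x28 image into a vector of 784 values.</span>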

<span style="font-size:larger;">Furthermore, the dataset labels (which record which letter is depicted in each image) are stored as integers, where 0 represents the letter A and 9 represents J. A neural network usually outputs a vector in which each element represents the probability that an input image belongs to a certain label. In order to train the neural network, you will need to compare the predicted probabilities with the correct label. To do this, it's convenient to turn each label into a vector with a one in the position that corresponds to its index and zeros everywhere else. This is called **one-hot encoding**.</span>

![one-hot encoding](images/one_hot_encoding.jpg)

In [None]:
# one-hot encode each label
def maybe_turn_to_one_hot(labels, num_labels=10):
  if len(labels.shape) == 1:
    one_hot = np.zeros((labels.shape[0], num_labels))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
  else:
    return labels

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)
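
<span style="font-size:larger;">A quick check on three made-up labels shows the encoding and how to get the integer labels back with `np.argmax`. The `labels` array here is hypothetical.</span>

```python
import numpy as np

labels = np.array([0, 2, 9])           # A, C and J
one_hot = np.zeros((labels.shape[0], 10))
one_hot[np.arange(len(labels)), labels] = 1

print(one_hot[1])                      # a single 1 at index 2
print(np.argmax(one_hot, axis=1))      # recovers [0 2 9]
```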

### Fully-connected Neural Network ###

<span style="font-size:larger;">In the following series of tasks, you will implement a classification model from scratch. We recommend that you use the Jupyter notebook from the first tutorial as a reference and try to implement your model based on that. Alternatively, you can check the reference notebook, which has all the solutions (except for Part 4, which contains a bonus-point task), but you won't learn much by copying them. The tasks aren't graded.</span>

### Task 1A ###

* <span style="font-size:larger;">define a computation graph for a **fully-connected neural network**</span>
* <span style="font-size:larger;">the network should have **2 hidden layers** with **200 neurons in the first** and **100 neurons in the second** hidden layer</span>
* <span style="font-size:larger;">**input**: vectorized images in the shape (num_images, 784)</span>
* <span style="font-size:larger;">**output**: predictions in the shape (num_images, 10)</span>
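
<span style="font-size:larger;">Before writing the Tensorflow graph, it can help to check the shapes with plain NumPy. The sketch below is not the Tensorflow solution: it only runs a forward pass with random weights, and the ReLU activations, the softmax output and the helper names `dense` and `softmax` are assumptions made for illustration.</span>

```python
import numpy as np

rng = np.random.RandomState(0)

def dense(x, n_out, activation=None):
    """One fully-connected layer with freshly drawn random weights."""
    w = rng.normal(scale=0.1, size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    z = x @ w + b
    return np.maximum(z, 0) if activation == 'relu' else z

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Stand-in for 4 vectorized images.
images = rng.uniform(-0.5, 0.5, size=(4, 784))

hidden_1 = dense(images, 200, activation='relu')
hidden_2 = dense(hidden_1, 100, activation='relu')
predictions = softmax(dense(hidden_2, 10))

print(hidden_1.shape, hidden_2.shape, predictions.shape)  # (4, 200) (4, 100) (4, 10)
print(predictions.sum(axis=1))                            # each row sums to 1
```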

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# TODO

### Task 1B ###

* <span style="font-size:larger;">train the neural network you defined above using **Batch Gradient Descent**</span>
* <span style="font-size:larger;">during each learning step, the network makes predictions for all 90000 training images and learns from its mistakes</span>
* <span style="font-size:larger;">evaluate the network on the notMNIST evaluation set</span>


* <span style="font-size:larger;">recommended learning rate: 0.1</span>
* <span style="font-size:larger;">recommended number of training steps: 60</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# TODO

### Task 1C ###

* <span style="font-size:larger;">train the neural network you defined above using **Stochastic Gradient Descent**</span>
* <span style="font-size:larger;">during each learning step, the network makes predictions for a single image and learns from its mistakes</span>
* <span style="font-size:larger;">evaluate the network on the notMNIST evaluation set</span>


* <span style="font-size:larger;">recommended learning rate: 0.01</span>
* <span style="font-size:larger;">recommended number of training steps: 20000</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# TODO

### Task 1D ###

* <span style="font-size:larger;">train the neural network you defined above using **Mini-batch Gradient Descent**</span>
* <span style="font-size:larger;">during each learning step, the network makes predictions for a small batch of images and learns from its mistakes</span>
* <span style="font-size:larger;">evaluate the network on the notMNIST evaluation set</span>


* <span style="font-size:larger;">recommended mini-batch size: 64</span>
* <span style="font-size:larger;">recommended learning rate: 0.05</span>
* <span style="font-size:larger;">recommended number of training steps: 5000</span>
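
<span style="font-size:larger;">Tasks 1B, 1C and 1D differ only in how many training examples feed each learning step. A minimal sketch of that difference, using random stand-in data and a hypothetical helper `sample_batch` (the training step itself is left out):</span>

```python
import numpy as np

rng = np.random.RandomState(0)

def sample_batch(dataset, labels, batch_size):
    """Return a random batch; batch_size selects the flavour of gradient descent."""
    indices = rng.choice(dataset.shape[0], size=batch_size, replace=False)
    return dataset[indices], labels[indices]

# Stand-in data with the same layout as the vectorized notMNIST arrays.
dataset = rng.uniform(-0.5, 0.5, size=(1000, 784))
labels = np.eye(10)[rng.randint(0, 10, size=1000)]

full_x, full_y = sample_batch(dataset, labels, dataset.shape[0])  # batch GD
single_x, single_y = sample_batch(dataset, labels, 1)             # stochastic GD
mini_x, mini_y = sample_batch(dataset, labels, 64)                # mini-batch GD

print(full_x.shape, single_x.shape, mini_x.shape)
```

<span style="font-size:larger;">Larger batches give a less noisy gradient estimate but make each step more expensive, which is why the recommended number of steps grows as the batch shrinks.</span>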

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# TODO

## Part 3: Saving models and visualizing learning in Tensorflow ##

<span style="font-size:larger;">It isn't very convenient to train your model each time you want to use it. On top of that, more complex Computer Vision models take weeks to train.</span>

<span style="font-size:larger;">In this section, you will learn how to save and load a Tensorflow model. In addition, you will use Tensorboard to visualize how the loss and accuracy change while your neural network trains.</span>

### Task 2A ###

<span style="font-size:larger;">Copy the definition of your neural network and add the following lines to the end of the code snippet. You might need to change the names of the variables or delete the second line if you haven't defined an operation that measures the accuracy of your model.</span>

```
tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
summaries = tf.summary.merge_all()
```

<span style="font-size:larger;">The first two lines create summary operations for your model's loss and accuracy which will be recorded during each training step. The third line groups the two summaries together so that you have a single operation that is easy to work with.</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# TODO

### Task 2B ###

<span style="font-size:larger;">Add these lines to the beginning of your training script.</span>

```
saver = tf.train.Saver()
summary_writer = tf.summary.FileWriter('data', graph=tf.get_default_graph())
```

<span style="font-size:larger;">You can save a summary using `summary_writer.add_summary(summary, global_step=step)` and the model using `saver.save(session, os.path.join('data', 'notMNIST-model-1'), global_step=step)`.</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# TODO

<span style="font-size:larger;">You can **load** and evaluate your model using the following script. When saving a model, Tensorflow generates three different files (data, meta and index). To load a model, simply specify its path without the extension.</span>

In [None]:
path_to_your_model = None

if path_to_your_model is None:
  print("Please specify the path to your saved model.")
else:
  saver = tf.train.Saver()

  with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    saver.restore(session, path_to_your_model)

    validation_accuracy = session.run(accuracy, feed_dict={
      input_data: valid_dataset,
      input_labels: valid_labels
    })
    print('Validation accuracy:', validation_accuracy)
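
<span style="font-size:larger;">If you only have the name of one of the saved files at hand, the path to pass to `saver.restore` can be derived by stripping the extension. The sketch below is an illustration, not part of Tensorflow: the helper `checkpoint_prefix` is hypothetical, and the example file names assume a model saved at step 5000 with the usual `.index`, `.meta` and `.data-00000-of-00001` suffixes.</span>

```python
import os

def checkpoint_prefix(file_path):
    """Strip the .index / .meta / .data-... suffix from a checkpoint file name."""
    directory, name = os.path.split(file_path)
    for marker in ('.index', '.meta', '.data-'):
        position = name.rfind(marker)
        if position != -1:
            name = name[:position]
            break
    return os.path.join(directory, name)

# Assumed file names for a model saved with global_step=5000.
print(checkpoint_prefix('data/notMNIST-model-1-5000.index'))
print(checkpoint_prefix('data/notMNIST-model-1-5000.data-00000-of-00001'))
```

<span style="font-size:larger;">Both calls return the same prefix, which is the value you would assign to `path_to_your_model`.</span>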

<span style="font-size:larger;">To visualize the training, call the following command.</span>

```
tensorboard --logdir data
```

<span style="font-size:larger;">You must activate the virtual environment that contains your installation of Tensorflow before you can call this command.</span>

## Part 4: Regularization ##

<span style="font-size:larger;">As you might have noticed, some of the models you have trained earlier report training accuracy that is much higher than the validation or testing accuracy. This is due to the model **overfitting** on the training data. Overfitting can be mitigated using **regularization** techniques.</span>

<span style="font-size:larger;">**Dropout** is a popular technique that reduces overfitting by randomly dropping some of the activations of a particular layer. Take a look at [dropout in Tensorflow](https://www.tensorflow.org/api_docs/python/tf/layers/dropout). You can add it to your computation graph as a layer, in the same way as a Dense layer.</span>

<span style="font-size:larger;">However, there is one more thing you need to do before you can use it. Dropout should drop the activations only during training because we don't want to lose any information when we use the model. You will need to define a boolean Tensor that tells the dropout layer whether it is in training or testing mode.</span>
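
<span style="font-size:larger;">The role of the training flag can be sketched in plain NumPy. This is an illustration of the idea rather than the Tensorflow layer itself: the function `dropout` below is hypothetical, and it uses the common "inverted" variant that rescales the kept activations so their expected value stays the same.</span>

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(activations, rate, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return activations              # testing mode: pass everything through
    keep_mask = rng.uniform(size=activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

activations = np.ones((2, 1000))

train_out = dropout(activations, rate=0.5, training=True)
test_out = dropout(activations, rate=0.5, training=False)

print((train_out == 0).mean())   # roughly half of the units are dropped
print(train_out.mean())          # close to 1 thanks to the rescaling
print(np.array_equal(test_out, activations))
```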

### Task 3 (bonus points) ###

<span style="font-size:larger;">Add a dropout layer **before** the last Dense layer and make sure that activations are dropped only during training. Try changing the drop probability in order to obtain the best validation accuracy. Use the neural network you have defined above and train it with mini-batch gradient descent.</span>

<span style="font-size:larger;">The solution will be added to the reference notebook after the end of this tutorial.</span>

In [None]:
# TODO: computation graph definition

In [None]:
# TODO: training

## (Optional) Part 5: Convolutional Neural Networks ##

<span style="font-size:larger;">Fully-connected neural networks are not well suited to modelling images because they aren't invariant to translations and have too many weights. For these reasons, a different type of neural network was developed. Convolutional Neural Networks (ConvNets) use filters and max-pooling layers to keep the number of weights low and to learn to recognize objects regardless of their position in the image. Moreover, they are easy to implement in Tensorflow.</span>
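
<span style="font-size:larger;">Both claims can be checked with a few lines of NumPy. The sketch below compares weight counts (biases ignored) for a dense layer and for a bank of small filters, then shows that 2x2 max-pooling gives the same output when a pixel moves by one position within a pooling window. The filter size, the number of filters and the helper `max_pool_2x2` are assumptions chosen for illustration.</span>

```python
import numpy as np

# Weight counts: a dense layer on a flattened image vs. a bank of small filters.
dense_weights = 28 * 28 * 200        # 784 inputs fully connected to 200 neurons
conv_weights = 5 * 5 * 1 * 16        # sixteen 5x5 filters on one input channel
print(dense_weights, conv_weights)   # 156800 400

def max_pool_2x2(image):
    """2x2 max-pooling with stride 2 on a single-channel image of even size."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.zeros((4, 4))
image[0, 1] = 1.0                    # a single bright pixel
shifted = np.roll(image, 1, axis=0)  # the same pixel moved down by one row

print(max_pool_2x2(image))
print(np.array_equal(max_pool_2x2(image), max_pool_2x2(shifted)))  # True
```

<span style="font-size:larger;">Pooling only absorbs small shifts like this one; tolerance to larger translations comes from applying the same filters at every position of the image.</span>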

<span style="font-size:larger;">Implement a simple ConvNet with convolutional, max-pooling and dense layers.</span>

<span style="font-size:larger;">Useful links:</span>
* [lecture notes Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
# load a subset of the notMNIST dataset
pickle_file = os.path.join(data_root, 'notMNIST.pickle')

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

In [None]:
def maybe_add_channels_dimension(dataset):
  if len(dataset.shape) == 3:
    return np.expand_dims(dataset, axis=-1)
  else:
    return dataset

train_dataset = maybe_add_channels_dimension(train_dataset)
valid_dataset = maybe_add_channels_dimension(valid_dataset)
test_dataset = maybe_add_channels_dimension(test_dataset)

print('Training dataset shape:', train_dataset.shape)
print('Validation dataset shape:', valid_dataset.shape)
print('Test dataset shape:', test_dataset.shape)

In [None]:
def maybe_turn_to_one_hot(labels, num_labels=10):
  if len(labels.shape) == 1:
    one_hot = np.zeros((labels.shape[0], num_labels))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
  else:
    return labels

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)

In [None]:
# TODO: computation graph definition

In [None]:
# TODO: training

## (Optional) Part 6: Regularizing ConvNets ##

<span style="font-size:larger;">All neural networks are prone to overfitting if the training dataset is too small. Implement dropout for your ConvNet. See the reference notebook for a solution.</span>

In [None]:
# TODO: computation graph definition

In [None]:
# TODO: training

## Additional Resources ##

**Saving and restoring models in Tensorflow**
* [tutorial](https://www.tensorflow.org/programmers_guide/saved_model)

**Visualizing learning using Tensorboard**
* [tutorial](https://www.tensorflow.org/get_started/summaries_and_tensorboard)

**Convolutional Networks**
* [lecture notes Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [lecture video from the University of Oxford](https://www.youtube.com/watch?v=bEUX_56Lojc)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

**Dropout**
* [documentation](https://www.tensorflow.org/api_docs/python/tf/layers/dropout)
* [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

**Advanced data loading features in Tensorflow**
* [tutorial](https://www.tensorflow.org/programmers_guide/datasets)


## Try out different datasets ##

* [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats)
* [CIFAR-10](https://www.kaggle.com/c/cifar-10)