# MI-MVI tutorial 2 #

<span style="font-size:larger;">In the first tutorial, we introduced you to **Tensorflow**, a Deep Learning framework. You learned how to **define a computation graph**, and **create and initialize Sessions**. Finally, you trained a simple classification model on a dataset of hand-written digits called **MNIST**.</span>

<span style="font-size:larger;">In this tutorial, you will download and preprocess a new dataset from scratch. Furthermore, we will show you how to use some advanced Tensorflow features like saving and loading models as well as visualizing the training. Lastly, you will experiment with various neural network architectures to obtain a good classifier.</span>

## Part 1: Data Preparation

![animals](images/animals.png)

<span style="font-size:larger;">We create a small dataset of **animals**. From top to bottom: cat, deer, dog and horse. The task is to train a Neural Network to distinguish between the four classes.</span>

<span style="font-size:larger;">**Import** packages we will need.</span>

In [2]:
import os, pickle
import numpy as np
import tensorflow as tf

<span style="font-size:larger;">**Load** pictures of the animals. The pictures are already prepared for you as [numpy arrays](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html).</span>

In [75]:
dataset_path = "data/animals/dataset.pickle"
    
with open(dataset_path, "rb") as file:
    dataset = pickle.load(file)

<span style="font-size:larger;">We will work with 8000 training, 1000 validation and 1000 testing pictures. All pictures are in color and measure 32 x 32 pixels.</span>

In [76]:
print("train images:", dataset["train_data"].shape)
print("valid images:", dataset["valid_data"].shape)
print("test images:", dataset["test_data"].shape)

train images: (8000, 32, 32, 3)
valid images: (1000, 32, 32, 3)
test images: (1000, 32, 32, 3)


<span style="font-size:larger;">**Shuffle** the pictures.</span>

In [77]:
# https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison
def unison_shuffle(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

In [78]:
train_dataset, train_labels = unison_shuffle(dataset["train_data"], dataset["train_labels"])
valid_dataset, valid_labels = unison_shuffle(dataset["valid_data"], dataset["valid_labels"])
test_dataset, test_labels = unison_shuffle(dataset["test_data"], dataset["test_labels"])
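
<span style="font-size:larger;">The point of shuffling "in unison" is that every picture keeps its own label. The following is a minimal sketch on made-up toy arrays (`toy_images` and `toy_labels` are illustrative names, not part of the dataset) that checks this property.</span>

```python
import numpy as np

def unison_shuffle(a, b):
    # apply the same random permutation to both arrays
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]

# toy data: "image" i is filled with the value i and carries the label i
toy_images = np.arange(5).reshape(5, 1, 1) * np.ones((5, 2, 2))
toy_labels = np.arange(5)

shuffled_images, shuffled_labels = unison_shuffle(toy_images, toy_labels)

# every image must still sit next to its own label
pairs_intact = all(
    np.all(image == label)
    for image, label in zip(shuffled_images, shuffled_labels)
)
print("pairs intact:", pairs_intact)
```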

## Part 2: Building a classification model ##

<span style="font-size:larger;">In this section, you will build a **neural network** for animal classification and train it with **batch gradient descent**, **stochastic gradient descent** and **mini-batch gradient descent**.</span>

<span style="font-size:larger;">The dataset stores each image as a Tensor of rank three (height, width and color channels). However, a fully-connected (standard) neural network only accepts vectors (Tensors of rank 1). Therefore, the following cells first **convert each image to grayscale** by averaging its color channels and then **vectorize each image in the dataset**.</span>

![flatten image](images/flatten_image.png)

In [46]:
train_dataset = np.mean(train_dataset, axis=-1)
valid_dataset = np.mean(valid_dataset, axis=-1)
test_dataset = np.mean(test_dataset, axis=-1)

In [47]:
# vectorize each image
def maybe_vectorize(dataset):
  if len(dataset.shape) == 3:
    return np.reshape(dataset, (dataset.shape[0], dataset.shape[1] * dataset.shape[2]))
  else:
    return dataset

train_dataset = maybe_vectorize(train_dataset)
valid_dataset = maybe_vectorize(valid_dataset)
test_dataset = maybe_vectorize(test_dataset)

print('Training dataset shape:', train_dataset.shape)
print('Validation dataset shape:', valid_dataset.shape)
print('Test dataset shape:', test_dataset.shape)

Training dataset shape: (8000, 1024)
Validation dataset shape: (1000, 1024)
Test dataset shape: (1000, 1024)
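
<span style="font-size:larger;">To see what the two preprocessing steps do to the array shapes, here is a small sketch on a made-up array (`toy_images` is a hypothetical stand-in with two 4 x 4 color images). It uses `reshape(..., -1)`, which should be equivalent to multiplying the height and width by hand.</span>

```python
import numpy as np

# two hypothetical 4 x 4 color images instead of 8000 pictures of 32 x 32 pixels
toy_images = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)

# average over the last axis (the color channels) to get grayscale images
gray = np.mean(toy_images, axis=-1)

# flatten each grayscale image into a single vector
flat = gray.reshape(gray.shape[0], -1)

print(toy_images.shape, "->", gray.shape, "->", flat.shape)
```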


<span style="font-size:larger;">Furthermore, the dataset labels (which record what animal is depicted in each image) are stored as integers from 0 to 3, one for each of the four classes. A neural network usually outputs a vector in which each element represents the probability that an input image belongs to a certain label. In order to train the neural network, you will need to compare the predicted probabilities with the correct label. To do this, it's convenient to turn each label into a vector with a one in the position that corresponds to its index and the rest set to zero. This is called **one-hot encoding**.</span>

![one-hot encoding](images/one_hot_encoding.jpg)

In [49]:
# one-hot encode each label
def maybe_turn_to_one_hot(labels, num_labels=4):
  if len(labels.shape) == 1:
    one_hot = np.zeros((labels.shape[0], num_labels))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
  else:
    return labels

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)

Training labels shape: (8000, 4)
Validation labels shape: (1000, 4)
Test labels shape: (1000, 4)
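
<span style="font-size:larger;">An alternative way to build the same encoding is to index the rows of an identity matrix. The sketch below uses made-up labels and also shows that `np.argmax` recovers the integer labels, which is what the accuracy computation later relies on.</span>

```python
import numpy as np

labels = np.array([0, 3, 1, 2])   # hypothetical integer labels for four classes
num_labels = 4

# row i of the identity matrix is the one-hot vector for label i
one_hot = np.eye(num_labels)[labels]
print(one_hot)

# argmax along each row recovers the original integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```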


### Fully-connected Neural Network ###

<span style="font-size:larger;">In the following series of tasks, you will implement a classification model from scratch. We recommend that you use the Jupyter notebook from the first tutorial as a reference and try to implement your model based on that. Alternatively, you can check the reference notebook, which has all the solutions (except for Part 4, which contains a bonus-point task), but you won't learn much by copying them. The tasks aren't graded.</span>

### Task 1A ###

* <span style="font-size:larger;">define a computation graph for a **fully-connected neural network**</span>
* <span style="font-size:larger;">the network should have **2 hidden layers** with **200 neurons in the first** and **100 neurons in the second** hidden layer</span>
* <span style="font-size:larger;">**input**: vectorized images in the shape (num_images, 1024)</span>
* <span style="font-size:larger;">**output**: predictions in the shape (num_images, 4)</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [56]:
import tensorflow as tf
tf.reset_default_graph()   # TF remembers everything you defined, this will keep the computation graph clean

#learning_rate = 0.1              # Batch Gradient Descent
#learning_rate = 0.01              # Stochastic Gradient Descent
learning_rate = 0.05             # Mini-batch Gradient Descent

input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1]))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))

layer1 = tf.layers.dense(input_data / 255, 200, activation=tf.nn.relu)
layer2 = tf.layers.dense(layer1, 100, activation=tf.nn.relu)
logits = tf.layers.dense(layer2, 4)

batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))
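
<span style="font-size:larger;">To make the loss and accuracy operations less of a black box, here is a NumPy sketch of what they compute. The function names and the toy logits are illustrative assumptions; this is not TensorFlow's implementation, only the same formula written out.</span>

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    # subtract the row-wise maximum for numerical stability
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # one loss value per example
    return -np.sum(labels * log_probs, axis=1)

def accuracy_score(labels, logits):
    # fraction of examples whose highest logit matches the one-hot label
    return np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1))

# hypothetical logits for three images and four classes
toy_logits = np.array([[2.0, 0.5, 0.1, 0.1],
                       [0.0, 0.0, 0.0, 0.0],
                       [0.1, 0.2, 3.0, 0.3]])
toy_labels = np.eye(4)[[0, 1, 3]]

per_example = softmax_cross_entropy(toy_labels, toy_logits)
print("loss:", per_example.mean())
print("accuracy:", accuracy_score(toy_labels, toy_logits))
```

<span style="font-size:larger;">The second toy image gets identical logits for all four classes, so its loss is ln 4 (about 1.386). A network that has learned nothing yet should therefore report a loss close to this value.</span>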

### Task 1B ###

* <span style="font-size:larger;">train the neural network you defined above using **Batch Gradient Descent**</span>
* <span style="font-size:larger;">during each learning step, the network makes predictions for all 8000 training images and learns from its mistakes</span>
* <span style="font-size:larger;">evaluate the network on the validation set</span>


* <span style="font-size:larger;">recommended learning rate: 0.1</span>
* <span style="font-size:larger;">recommended number of training steps: 60</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [53]:
num_steps = 60

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset,
      input_labels: train_labels
    })
    print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
  print('Validation accuracy', validation_accuracy, '.')

step: 0 , loss: 1.502508 , training accuracy: 0.2485
step: 1 , loss: 1.5470589 , training accuracy: 0.237875
step: 2 , loss: 1.3833781 , training accuracy: 0.282375
step: 3 , loss: 1.373744 , training accuracy: 0.327875
step: 4 , loss: 1.3673692 , training accuracy: 0.314375
step: 5 , loss: 1.3626182 , training accuracy: 0.3485
step: 6 , loss: 1.3581783 , training accuracy: 0.319
step: 7 , loss: 1.3562946 , training accuracy: 0.357
step: 8 , loss: 1.351899 , training accuracy: 0.31975
step: 9 , loss: 1.3519707 , training accuracy: 0.360125
step: 10 , loss: 1.3451397 , training accuracy: 0.322875
step: 11 , loss: 1.3433777 , training accuracy: 0.37825
step: 12 , loss: 1.3372444 , training accuracy: 0.329375
step: 13 , loss: 1.3349733 , training accuracy: 0.39175
step: 14 , loss: 1.3312559 , training accuracy: 0.33575
step: 15 , loss: 1.3307893 , training accuracy: 0.391875
step: 16 , loss: 1.3299668 , training accuracy: 0.34
step: 17 , loss: 1.3332673 , training accuracy: 0.375875
step:

### Task 1C ###

* <span style="font-size:larger;">train the neural network you defined above using **Stochastic Gradient Descent**</span>
* <span style="font-size:larger;">during each learning step, the network makes predictions for a single image and learns from its mistakes</span>
* <span style="font-size:larger;">evaluate the network on the validation set</span>


* <span style="font-size:larger;">recommended learning rate: 0.01</span>
* <span style="font-size:larger;">recommended number of training steps: 20000</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [55]:
num_steps = 20000
log_frequency = 1000

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
    index = step % train_dataset.shape[0] 
    
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset[index : index + 1],
      input_labels: train_labels[index : index + 1]
    })
    
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
  print('Validation accuracy', validation_accuracy, '.')

step: 0 , loss: 1.6357461 , training accuracy: 0.0
step: 1000 , loss: 1.5027313 , training accuracy: 0.0
step: 2000 , loss: 1.3430177 , training accuracy: 0.0
step: 3000 , loss: 1.4384973 , training accuracy: 0.0
step: 4000 , loss: 1.4315618 , training accuracy: 0.0
step: 5000 , loss: 1.4247622 , training accuracy: 0.0


KeyboardInterrupt: 

### Task 1D ###

* <span style="font-size:larger;">train the neural network you defined above using **Mini-batch Gradient Descent**</span>
* <span style="font-size:larger;">during each learning step, the network makes predictions for a small batch of images and learns from its mistakes</span>
* <span style="font-size:larger;">evaluate the network on the validation set</span>


* <span style="font-size:larger;">recommended mini-batch size: 64</span>
* <span style="font-size:larger;">recommended learning rate: 0.05</span>
* <span style="font-size:larger;">recommended number of training steps: 5000</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [57]:
num_steps = 5000
mini_batch_size = 64
log_frequency = 500

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
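    # sliding window: the batch start moves by one image per step,
    # so consecutive mini-batches overlap in all but one image
    start = step
    end = step + mini_batch_size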
    
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap")
    })
    
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
  print('Validation accuracy', validation_accuracy, '.')

step: 0 , loss: 1.5325315 , training accuracy: 0.25
step: 500 , loss: 0.78410023 , training accuracy: 0.75
step: 1000 , loss: 0.39410794 , training accuracy: 0.890625
step: 1500 , loss: 0.7149308 , training accuracy: 0.703125
step: 2000 , loss: 0.6830551 , training accuracy: 0.734375
step: 2500 , loss: 0.42496115 , training accuracy: 0.84375
step: 3000 , loss: 0.3554579 , training accuracy: 0.921875
step: 3500 , loss: 0.4985964 , training accuracy: 0.84375
step: 4000 , loss: 0.34191248 , training accuracy: 0.859375
step: 4500 , loss: 0.65498257 , training accuracy: 0.78125
Training finished after 5000 steps.
Validation accuracy 0.405 .
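
<span style="font-size:larger;">The training loop above moves the start of the batch by one image per step, so consecutive mini-batches share 63 of their 64 images. A common alternative, sketched here under the assumption that you want each image visited once per pass over the data, is to shuffle the indices and cut them into non-overlapping batches. The helper `iterate_minibatches` is hypothetical and not part of the reference notebook.</span>

```python
import numpy as np

def iterate_minibatches(data, labels, batch_size, rng):
    # hypothetical helper: visit every example once per pass, in random order
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        batch = order[start:start + batch_size]
        yield data[batch], labels[batch]

rng = np.random.RandomState(0)
toy_data = np.arange(10).reshape(10, 1)
toy_labels = np.arange(10)

batch_sizes = []
seen = []
for batch_data, batch_labels in iterate_minibatches(toy_data, toy_labels, 4, rng):
    batch_sizes.append(len(batch_data))
    seen.extend(batch_labels.tolist())

# the last batch is smaller when the batch size does not divide the dataset size
print(batch_sizes)
```

<span style="font-size:larger;">With this helper, a batch size equal to the dataset size corresponds to batch gradient descent and a batch size of 1 corresponds to stochastic gradient descent.</span>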


## Part 3: Saving models and visualizing learning in Tensorflow ##

<span style="font-size:larger;">It isn't very convenient to train your model each time you want to use it. On top of that, more complex Computer Vision models take weeks to train.</span>

<span style="font-size:larger;">In this section, you will learn how to save and load a Tensorflow model. In addition, you will use Tensorboard to visualize how the loss and accuracy change during the training of your neural network.</span>

### Task 2A ###

<span style="font-size:larger;">Copy the definition of your neural network and add the following lines to the end of the code snippet. You might need to change the names of the variables or delete the second line if you haven't defined an operation that measures the accuracy of your model.</span>

```
tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
summaries = tf.summary.merge_all()
```

<span style="font-size:larger;">The first two lines create summary operations for your model's loss and accuracy which will be recorded during each training step. The third line groups the two summaries together so that you have a single operation that is easy to work with.</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
import tensorflow as tf
tf.reset_default_graph()   # TF remembers everything you defined, this will keep the computation graph clean

learning_rate = 0.1

input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1]))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))

layer1 = tf.layers.dense(input_data / 255, 200, activation=tf.nn.relu)
layer2 = tf.layers.dense(layer1, 100, activation=tf.nn.relu)
logits = tf.layers.dense(layer2, 4)

batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))

tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
summaries = tf.summary.merge_all()

### Task 2B ###

<span style="font-size:larger;">Add these lines to the beginning of your training script.</span>

```
saver = tf.train.Saver()
summary_writer = tf.summary.FileWriter('data', graph=tf.get_default_graph())
```

<span style="font-size:larger;">You can save a summary using `summary_writer.add_summary(summary, global_step=step)` and the model using `saver.save(session, os.path.join('data/notMNIST-model-1'), global_step=step)`.</span>

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [None]:
num_steps = 10000
mini_batch_size = 64
log_frequency = 500

saver = tf.train.Saver()
summary_writer = tf.summary.FileWriter('data', graph=tf.get_default_graph())

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
    start = step
    end = step + mini_batch_size
    
    batch_loss, batch_accuracy, summary, _ = session.run([loss, accuracy, summaries, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap")
    })
    summary_writer.add_summary(summary, global_step=step)
    
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
  print('Validation accuracy', validation_accuracy, '.')

  saver.save(session, os.path.join('data/notMNIST-model-1'), global_step=step)

<span style="font-size:larger;">You can **load** and evaluate your model using the following script. When saving a model, Tensorflow generates three different files (data, meta and index). To load a model, simply specify its path without the extension.</span>

In [None]:
path_to_your_model = None

if path_to_your_model is None:
  print("Please specify the path to your saved model.")
else:
  saver = tf.train.Saver()

  with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    saver.restore(session, path_to_your_model)

    validation_accuracy = session.run(accuracy, feed_dict={
      input_data: valid_dataset,
      input_labels: valid_labels
    })
    print('Validation accuracy:', validation_accuracy)
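
<span style="font-size:larger;">The path you pass to `saver.restore` is the common prefix of the checkpoint files, not any single file. The sketch below shows how that prefix relates to the file names. It assumes the training cell above ran to completion, so that the step number 9999 was appended to the model name; the helper `checkpoint_prefix` and the listed file names are illustrative, so check your own `data` directory for the actual names.</span>

```python
import os

def checkpoint_prefix(filename):
    # hypothetical helper: strip the extension Tensorflow appends to a checkpoint file
    for marker in (".index", ".meta", ".data-"):
        position = filename.rfind(marker)
        if position != -1:
            return filename[:position]
    return filename

# file names as they might appear in the 'data' directory after training
example_files = [
    "notMNIST-model-1-9999.index",
    "notMNIST-model-1-9999.meta",
    "notMNIST-model-1-9999.data-00000-of-00001",
]

# all three files map to the same prefix, which is the path to restore from
prefixes = {os.path.join("data", checkpoint_prefix(name)) for name in example_files}
print(prefixes)
```

<span style="font-size:larger;">Alternatively, `tf.train.latest_checkpoint('data')` should return the prefix of the most recently saved checkpoint in that directory.</span>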

<span style="font-size:larger;">To visualize the training, call the following command.</span>

```
tensorboard --logdir data
```

<span style="font-size:larger;">You must activate the virtual environment that contains your installation of Tensorflow before you can call this command.</span>

## Part 4: Regularization ##

<span style="font-size:larger;">As you might have noticed, some of the models you have trained earlier report training accuracy that is much higher than the validation or testing accuracy. This is due to the model **overfitting** on the training data. Overfitting can be mitigated using **regularization** techniques.</span>

<span style="font-size:larger;">**Dropout** is a popular technique that prevents overfitting by randomly dropping some of the activations of a particular layer. Take a look at [dropout in Tensorflow](https://www.tensorflow.org/api_docs/python/tf/layers/dropout). You can add it to your computation graph as a layer, similarly to the Dense layer.</span>

<span style="font-size:larger;">However, there is one more thing you need to do before you can use it. Dropout should drop the activations only during training because we don't want to lose any information when we use the model. You will need to define a boolean Tensor that tells the dropout layer whether it's in training or testing mode.</span>
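
<span style="font-size:larger;">The difference between the two modes can be sketched in NumPy. This is an illustration of the idea, not Tensorflow's code: the function below is a hypothetical stand-in that zeroes a fraction `rate` of the activations during training and rescales the rest by 1 / (1 - rate), which is the scaling `tf.layers.dropout` is documented to apply.</span>

```python
import numpy as np

def dropout(activations, rate, training, rng):
    # at test time the activations pass through unchanged
    if not training:
        return activations
    # keep each activation with probability 1 - rate
    keep_mask = rng.uniform(size=activations.shape) >= rate
    # rescale the survivors so the expected activation stays the same
    return activations * keep_mask / (1.0 - rate)

rng = np.random.RandomState(0)
activations = np.ones((1000, 200))

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("fraction dropped during training:", np.mean(train_out == 0))
print("mean activation during training:", train_out.mean())
print("mean activation at test time:", test_out.mean())
```

<span style="font-size:larger;">Because of the rescaling, the average activation is roughly the same in both modes, so the layers after the dropout see inputs of a similar magnitude during training and testing.</span>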

### Task 3 (bonus points) ###

<span style="font-size:larger;">Add a dropout layer **before** the last Dense layer and make sure that activations are dropped only during training. Try changing the drop probability in order to obtain the best validation accuracy. Use the neural network you have defined above and train it with mini-batch gradient descent.</span>

<span style="font-size:larger;">The solution will be added to the reference notebook after the end of this tutorial.</span>

In [65]:
import tensorflow as tf
tf.reset_default_graph()   # TF remembers everything you defined, this will keep the computation graph clean

learning_rate = 0.05
rate = 0.55

is_training = tf.placeholder(tf.bool)

input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1]))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))

layer1 = tf.layers.dense(input_data / 255, 200, activation=tf.nn.relu)

dropout1 = tf.layers.dropout(layer1, rate=rate, training=is_training)

layer2 = tf.layers.dense(dropout1, 200, activation=tf.nn.relu)

dropout2 = tf.layers.dropout(layer2, rate=rate, training=is_training)

logits = tf.layers.dense(dropout2, 4)

batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))

In [66]:
num_steps = 5000
mini_batch_size = 64
log_frequency = 500

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
    start = step
    end = step + mini_batch_size
    
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap"),
      is_training: True
    })
    
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels,
    is_training: False
  })
  print('Validation accuracy', validation_accuracy, '.')

step: 0 , loss: 1.6472831 , training accuracy: 0.28125
step: 500 , loss: 1.0496225 , training accuracy: 0.515625
step: 1000 , loss: 0.9597064 , training accuracy: 0.59375
step: 1500 , loss: 1.2615576 , training accuracy: 0.421875
step: 2000 , loss: 0.9666988 , training accuracy: 0.578125
step: 2500 , loss: 1.1226506 , training accuracy: 0.5
step: 3000 , loss: 1.0655395 , training accuracy: 0.453125
step: 3500 , loss: 1.0706731 , training accuracy: 0.46875
step: 4000 , loss: 1.0345829 , training accuracy: 0.515625
step: 4500 , loss: 0.9784797 , training accuracy: 0.515625
Training finished after 5000 steps.
Validation accuracy 0.379 .


## (Optional) Part 5: Convolutional Neural Networks ##

<span style="font-size:larger;">Fully-connected neural networks are not appropriate for modelling images because they aren't invariant to translations and have too many weights. For these reasons, a different type of neural network was developed. Convolutional Neural Networks (ConvNets) use filters and max-pooling layers to keep the number of weights low and to learn to recognize objects regardless of their position in the image. Moreover, they are easy to implement in Tensorflow.</span>

<span style="font-size:larger;">Implement a simple ConvNet with convolutional, max-pooling and dense layers.</span>

<span style="font-size:larger;">Useful links:</span>
* [lecture notes Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

<span style="font-size:larger;">See the reference notebook for a solution.</span>

In [71]:
# Part 2 turned the images into grayscale vectors, so restore the
# original color images and integer labels for the ConvNet
train_dataset, train_labels = unison_shuffle(dataset["train_data"], dataset["train_labels"])
valid_dataset, valid_labels = unison_shuffle(dataset["valid_data"], dataset["valid_labels"])
test_dataset, test_labels = unison_shuffle(dataset["test_data"], dataset["test_labels"])

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

In [79]:
def maybe_add_channels_dimension(dataset):
  if len(dataset.shape) == 3:
    return np.expand_dims(dataset, axis=-1)
  else:
    return dataset

train_dataset = maybe_add_channels_dimension(train_dataset)
valid_dataset = maybe_add_channels_dimension(valid_dataset)
test_dataset = maybe_add_channels_dimension(test_dataset)

print('Training dataset shape:', train_dataset.shape)
print('Validation dataset shape:', valid_dataset.shape)
print('Test dataset shape:', test_dataset.shape)

Training dataset shape: (8000, 32, 32, 3)
Validation dataset shape: (1000, 32, 32, 3)
Test dataset shape: (1000, 32, 32, 3)


In [80]:
def maybe_turn_to_one_hot(labels, num_labels=4):
  if len(labels.shape) == 1:
    one_hot = np.zeros((labels.shape[0], num_labels))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
  else:
    return labels

train_labels = maybe_turn_to_one_hot(train_labels)
valid_labels = maybe_turn_to_one_hot(valid_labels)
test_labels = maybe_turn_to_one_hot(test_labels)

print('Training labels shape:', train_labels.shape)
print('Validation labels shape:', valid_labels.shape)
print('Test labels shape:', test_labels.shape)

Training labels shape: (8000, 4)
Validation labels shape: (1000, 4)
Test labels shape: (1000, 4)


In [87]:
import tensorflow as tf
tf.reset_default_graph()   # TF remembers everything you defined, this will keep the computation graph clean

learning_rate = 0.05

input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1], train_dataset.shape[2], 3))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))

conv1 = tf.layers.conv2d(input_data / 255, 16, (3, 3), (1, 1), activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2))

conv2 = tf.layers.conv2d(pool1, 32, (3, 3), (1, 1), activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, (2, 2), (2, 2))

flattened = tf.contrib.layers.flatten(pool2)

dense1 = tf.layers.dense(flattened, 50)
logits = tf.layers.dense(dense1, 4)

batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))
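
<span style="font-size:larger;">To check the claim that ConvNets keep the number of weights low, the following sketch counts the trainable parameters of the ConvNet above and of the fully-connected network from Part 2. It assumes that the convolution and pooling layers use no padding (the default `'valid'` setting); the helper `conv_output_size` is illustrative.</span>

```python
def conv_output_size(size, kernel, stride):
    # width (or height) after a convolution or pooling layer without padding
    return (size - kernel) // stride + 1

size = 32
size = conv_output_size(size, kernel=3, stride=1)   # conv1
size = conv_output_size(size, kernel=2, stride=2)   # pool1
size = conv_output_size(size, kernel=3, stride=1)   # conv2
size = conv_output_size(size, kernel=2, stride=2)   # pool2
flattened = size * size * 32                        # 32 filters in conv2

# weights plus biases of every layer in the ConvNet (color input, 3 channels)
conv_params = {
    "conv1": 3 * 3 * 3 * 16 + 16,
    "conv2": 3 * 3 * 16 * 32 + 32,
    "dense1": flattened * 50 + 50,
    "logits": 50 * 4 + 4,
}

# weights plus biases of the fully-connected network (grayscale input, 1024 values)
dense_params = {
    "layer1": 1024 * 200 + 200,
    "layer2": 200 * 100 + 100,
    "logits": 100 * 4 + 4,
}

convnet_total = sum(conv_params.values())
dense_total = sum(dense_params.values())
print("flattened features:", flattened)
print("ConvNet parameters:", convnet_total)
print("fully-connected parameters:", dense_total)
```

<span style="font-size:larger;">Under these assumptions, the ConvNet has fewer parameters than the fully-connected network even though it works with color images, and most of them sit in its first dense layer.</span>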

In [88]:
num_steps = 1000
mini_batch_size = 64
log_frequency = 100

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
    start = step
    end = step + mini_batch_size
    
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap")
    })
    
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels
  })
  print('Validation accuracy', validation_accuracy, '.')

step: 0 , loss: 1.3556564 , training accuracy: 0.328125
step: 100 , loss: 1.0885996 , training accuracy: 0.5
step: 200 , loss: 0.6834685 , training accuracy: 0.828125
step: 300 , loss: 0.5894709 , training accuracy: 0.8125
step: 400 , loss: 0.15445615 , training accuracy: 0.96875
step: 500 , loss: 0.896245 , training accuracy: 0.65625
step: 600 , loss: 0.62140757 , training accuracy: 0.78125
step: 700 , loss: 0.30756885 , training accuracy: 0.90625
step: 800 , loss: 0.3614478 , training accuracy: 0.921875
step: 900 , loss: 0.40963924 , training accuracy: 0.859375
Training finished after 1000 steps.
Validation accuracy 0.432 .


## (Optional) Part 6: Regularizing ConvNets ##

<span style="font-size:larger;">All neural networks are prone to overfitting if the training dataset is too small. Implement dropout for your ConvNet. See the reference notebook for a solution.</span>

In [91]:
import tensorflow as tf
tf.reset_default_graph()   # TF remembers everything you defined, this will keep the computation graph clean

learning_rate = 0.05
dropout_prob = 0.95   # passed as `rate` below, i.e. the fraction of activations that is dropped

input_data = tf.placeholder(tf.float32, (None, train_dataset.shape[1], train_dataset.shape[2], 3))
input_labels = tf.placeholder(tf.int32, (None, train_labels.shape[1]))
is_training = tf.placeholder(tf.bool)

conv1 = tf.layers.conv2d(input_data / 255, 16, (3, 3), (1, 1), activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, (2, 2), (2, 2))

conv2 = tf.layers.conv2d(pool1, 32, (3, 3), (1, 1), activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, (2, 2), (2, 2))

flattened = tf.contrib.layers.flatten(pool2)

dense1 = tf.layers.dense(flattened, 100)
dropout = tf.layers.dropout(dense1, rate=dropout_prob, training=is_training)

logits = tf.layers.dense(dropout, 4)

batch_loss = tf.nn.softmax_cross_entropy_with_logits(labels=input_labels, logits=logits)
loss = tf.reduce_mean(batch_loss)
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(input_labels, 1)), tf.float32))

In [92]:
num_steps = 1500
mini_batch_size = 64
log_frequency = 100

with tf.Session() as session:
  session.run(tf.global_variables_initializer())

  for step in range(num_steps):
    start = step
    end = step + mini_batch_size
    
    batch_loss, batch_accuracy, _ = session.run([loss, accuracy, train_op], feed_dict={
      input_data: train_dataset.take(range(start, end), axis=0, mode="wrap"),
      input_labels: train_labels.take(range(start, end), axis=0, mode="wrap"),
      is_training: True
    })
    
    if step % log_frequency == 0:
      print('step:', step, ', loss:', batch_loss, ', training accuracy:', batch_accuracy)
    
  print('Training finished after', num_steps, 'steps.')
  validation_accuracy = session.run(accuracy, feed_dict={
    input_data: valid_dataset,
    input_labels: valid_labels,
    is_training: False
  })
  print('Validation accuracy', validation_accuracy, '.')

step: 0 , loss: 1.796901 , training accuracy: 0.234375
step: 100 , loss: 1.3522882 , training accuracy: 0.296875
step: 200 , loss: 1.2479856 , training accuracy: 0.390625
step: 300 , loss: 1.2983861 , training accuracy: 0.40625
step: 400 , loss: 1.2403922 , training accuracy: 0.46875
step: 500 , loss: 1.2443997 , training accuracy: 0.421875
step: 600 , loss: 1.1499411 , training accuracy: 0.5
step: 700 , loss: 1.1049087 , training accuracy: 0.578125
step: 800 , loss: 1.1692582 , training accuracy: 0.46875
step: 900 , loss: 0.91896075 , training accuracy: 0.640625
step: 1000 , loss: 1.0065439 , training accuracy: 0.5625
step: 1100 , loss: 1.0984967 , training accuracy: 0.53125
step: 1200 , loss: 0.94727653 , training accuracy: 0.640625
step: 1300 , loss: 1.0428303 , training accuracy: 0.65625
step: 1400 , loss: 1.0510213 , training accuracy: 0.625
Training finished after 1500 steps.
Validation accuracy 0.507 .


## Additional Resources   ##

**Saving and restoring models in Tensorflow**
* [tutorial](https://www.tensorflow.org/programmers_guide/saved_model)

**Visualizing learning using Tensorboard**
* [tutorial](https://www.tensorflow.org/get_started/summaries_and_tensorboard)

**Convolutional Networks**
* [lecture notes Stanford University **(recommended)**](http://cs231n.github.io/convolutional-networks/)
* [lecture video from the University of Oxford](https://www.youtube.com/watch?v=bEUX_56Lojc)
* [Tensorflow tutorial](https://www.tensorflow.org/tutorials/layers)

**Dropout**
* [documentation](https://www.tensorflow.org/api_docs/python/tf/layers/dropout)
* [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

**Advanced data loading features in Tensorflow**
* [tutorial](https://www.tensorflow.org/programmers_guide/datasets)


## Try out a different dataset ##

* [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats)
* [CIFAR-10](https://www.kaggle.com/c/cifar-10)