# Exercises

### 1. The *width* (the hidden layer size) of the algorithm. Try a hidden layer size of 200. How does the validation accuracy of the model change? What about the time it took the algorithm to train? Can you find a hidden layer size that does better?

**Solution**

Find the variable: "hidden_layer_size" and change it to 200.

The validation accuracy is significantly higher, because a network with 50 hidden units is too simple a model for this problem.

Naturally, it takes the algorithm much longer to train (unless early stopping is triggered too soon).

A hidden layer size of 500 works even better, and it is not the only larger size that does.
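One way to see why wider layers take longer to train is to count the trainable parameters. The sketch below assumes the architecture used in this notebook (784 inputs, two hidden layers of equal width, 10 outputs); the helper name `count_parameters` is hypothetical and not part of the notebook.

```python
def count_parameters(input_size, hidden_layer_size, output_size):
    """Count weights and biases for input -> hidden -> hidden -> output."""
    layer_shapes = [
        (input_size, hidden_layer_size),
        (hidden_layer_size, hidden_layer_size),
        (hidden_layer_size, output_size),
    ]
    # Each layer has a weight matrix (rows * cols) plus one bias per column
    return sum(rows * cols + cols for rows, cols in layer_shapes)


for size in (50, 200, 500):
    print(size, count_parameters(784, size, 10))
# 50 42310
# 200 199210
# 500 648010
```

Going from 50 to 200 hidden units multiplies the parameter count by roughly 4.7, which is consistent with each epoch taking noticeably longer.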

## Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is often called the "Hello World" of machine learning because for most students it is their first example. The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional networks. The dataset provides 28x28 images of handwritten digits (one digit per image), and the goal is to write an algorithm that detects which digit is written. Since there are only 10 digits, this is a classification problem with 10 classes. To illustrate what we've covered in this section, we will build a network with 2 hidden layers between the inputs and the outputs.
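To make the input and target shapes concrete, here is a minimal sketch of how a 28x28 image becomes a vector of 784 inputs and how a digit label becomes a one-hot target with 10 entries. The helpers `flatten_image` and `one_hot` are hypothetical illustrations; in the notebook the data loader does this work.

```python
def flatten_image(image):
    """Turn a 2D grid of pixels into a single flat list, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(digit, num_classes=10):
    """Encode a digit as a list with 1.0 at its index and 0.0 elsewhere."""
    encoding = [0.0] * num_classes
    encoding[digit] = 1.0
    return encoding


# A blank 28x28 image stands in for a real MNIST digit here
blank_image = [[0.0] * 28 for _ in range(28)]
flat = flatten_image(blank_image)

print(len(flat))   # 784
print(one_hot(3))  # 1.0 at index 3, 0.0 everywhere else
```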

## Import the relevant packages

In [3]:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


## Outline the model

The whole code is in one cell, so you can simply rerun this cell (instead of the whole notebook) to train a new model.
The tf.compat.v1.reset_default_graph() call takes care of clearing the old parameters. After that, training starts from scratch.
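The loss in this cell is the softmax cross-entropy between the network's raw outputs (logits) and the one-hot targets. As a rough, framework-free sketch of what that quantity is for a single example (the function names are hypothetical, and this is not TensorFlow's implementation):

```python
import math


def log_softmax(logits):
    """Log of the softmax probabilities, computed in a numerically stable way."""
    largest = max(logits)
    log_total = math.log(sum(math.exp(z - largest) for z in logits))
    return [z - largest - log_total for z in logits]


def softmax_cross_entropy(logits, one_hot_target):
    """Negative log-probability assigned to the true class."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot_target, log_probs))


target = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # the digit 3

# With identical logits every class gets probability 1/10, so the loss is log(10)
uniform_loss = softmax_cross_entropy([0.0] * 10, target)

# Raising the logit of the true class lowers the loss
confident_logits = [0.0] * 10
confident_logits[3] = 5.0
confident_loss = softmax_cross_entropy(confident_logits, target)

print(uniform_loss > confident_loss)  # True
```

An untrained 10-class network that guesses uniformly has a loss of about 2.3, which is a useful reference point when reading the per-epoch losses.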

In [4]:
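# Each 28x28 image is flattened into 784 inputs; there are 10 digit classes
input_size = 784
output_size = 10

# Width shared by both hidden layers
hidden_layer_size = 200

# Clear the graph left over from any previous run of this cell
tf.compat.v1.reset_default_graph()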


inputs = tf.compat.v1.placeholder(tf.float32, [None, input_size])
targets = tf.compat.v1.placeholder(tf.float32, [None, output_size])


weights_1 = tf.compat.v1.get_variable("weights_1", [input_size, hidden_layer_size])
biases_1 = tf.compat.v1.get_variable("biases_1", [hidden_layer_size])


outputs_1 = tf.nn.relu(tf.matmul(inputs, weights_1) + biases_1)


weights_2 = tf.compat.v1.get_variable("weights_2", [hidden_layer_size, hidden_layer_size])
biases_2 = tf.compat.v1.get_variable("biases_2", [hidden_layer_size])


outputs_2 = tf.nn.relu(tf.matmul(outputs_1, weights_2) + biases_2)


weights_3 = tf.compat.v1.get_variable("weights_3", [hidden_layer_size, output_size])
biases_3 = tf.compat.v1.get_variable("biases_3", [output_size])


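# Raw class scores (logits); the softmax is applied inside the loss below
outputs = tf.matmul(outputs_2, weights_3) + biases_3

# Per-example softmax cross-entropy between the logits and the one-hot targets
loss = tf.nn.softmax_cross_entropy_with_logits(logits=outputs, labels=targets)

# Average the loss over the batch
mean_loss = tf.reduce_mean(loss)

# Minimize the mean loss with Adam at a fixed learning rate
optimize = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001).minimize(mean_loss)

# A prediction is correct when the highest-scoring class matches the target class
out_equals_target = tf.equal(tf.argmax(outputs, 1), tf.argmax(targets, 1))

# Accuracy is the fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(out_equals_target, tf.float32))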


sess = tf.compat.v1.InteractiveSession()


initializer = tf.compat.v1.global_variables_initializer()
sess.run(initializer)


batch_size = 100

batches_number = mnist.train._num_examples // batch_size


max_epochs = 15


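# Start from a very large value so the first epoch can never trigger early stopping
prev_validation_loss = 9999999.

# Record the start time so the training duration can be reported
import time
start_time = time.time()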


for epoch_counter in range(max_epochs):
    
    
    curr_epoch_loss = 0.
    
    
    for batch_counter in range(batches_number):
        
        
        input_batch, target_batch = mnist.train.next_batch(batch_size)
        
        
        _, batch_loss = sess.run([optimize, mean_loss], 
            feed_dict={inputs: input_batch, targets: target_batch})
        
        
        curr_epoch_loss += batch_loss
        
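    # Average the batch losses to get the mean training loss for this epoch
    curr_epoch_loss /= batches_number

    # Fetch the entire validation set as a single batch
    input_batch, target_batch = mnist.validation.next_batch(mnist.validation._num_examples)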
    
    
    validation_loss, validation_accuracy = sess.run([mean_loss, accuracy], 
        feed_dict={inputs: input_batch, targets: target_batch})
    
    
    print('Epoch '+str(epoch_counter+1)+
          '. Mean loss: '+'{0:.3f}'.format(curr_epoch_loss)+
          '. Validation loss: '+'{0:.3f}'.format(validation_loss)+
          '. Validation accuracy: '+'{0:.2f}'.format(validation_accuracy * 100.)+'%')
    
    
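    # Early stopping: quit as soon as the validation loss rises
    if validation_loss > prev_validation_loss:
        break

    # Remember this epoch's validation loss for the next comparison
    prev_validation_loss = validation_loss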

    
print('End of training.')


print("Training time: %s seconds" % (time.time() - start_time))

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Epoch 1. Mean loss: 0.276. Validation loss: 0.127. Validation accuracy: 96.44%
Epoch 2. Mean loss: 0.103. Validation loss: 0.102. Validation accuracy: 97.00%
Epoch 3. Mean loss: 0.070. Validation loss: 0.072. Validation accuracy: 97.90%
Epoch 4. Mean loss: 0.050. Validation loss: 0.077. Validation accuracy: 97.58%
End of training.
Training time: 4.320937156677246 seconds
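The log above stops at epoch 4 because of the early-stopping check in the training loop: training ends the first time the validation loss is higher than it was in the previous epoch. A minimal sketch of that rule in isolation, replayed on the validation losses printed above (the function name `epochs_until_stop` is hypothetical):

```python
def epochs_until_stop(validation_losses):
    """Return the 1-based epoch at which training stops under the rule
    'stop as soon as the validation loss increases'."""
    prev_validation_loss = float("inf")
    for epoch, validation_loss in enumerate(validation_losses, start=1):
        if validation_loss > prev_validation_loss:
            return epoch
        prev_validation_loss = validation_loss
    # The loss never increased, so every epoch was run
    return len(validation_losses)


logged_losses = [0.127, 0.102, 0.072, 0.077]
print(epochs_until_stop(logged_losses))  # 4
```

Note that the loop in the notebook keeps the weights from the epoch at which it stops; it does not roll back to the epoch with the lowest validation loss.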


## Test the model

As we discussed in the lectures, after training on the training and validation sets, we test the final prediction power of our model by running it on the test dataset that the algorithm has not seen before.

It is very important to realize that tuning the hyperparameters against the validation dataset gradually overfits it. The test is the final step. You should not test until you are completely done adjusting your model.

In [5]:
input_batch, target_batch = mnist.test.next_batch(mnist.test._num_examples)
test_accuracy = sess.run([accuracy], 
    feed_dict={inputs: input_batch, targets: target_batch})


# print (test_accuracy)
test_accuracy_percent = test_accuracy[0] * 100.


print('Test accuracy: '+'{0:.2f}'.format(test_accuracy_percent)+'%')

Test accuracy: 97.38%
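The accuracy reported here is the fraction of examples for which the highest-scoring output matches the position of the 1 in the one-hot target. A small framework-free sketch of that calculation on made-up scores (the helper names and toy values are illustrative only, not data from MNIST):

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(outputs, targets):
    """Fraction of rows where the predicted class equals the target class."""
    matches = [argmax(o) == argmax(t) for o, t in zip(outputs, targets)]
    return sum(matches) / len(matches)


# Four toy examples with three classes each
toy_outputs = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 0.2, 1.5],   # predicts class 2
    [0.9, 1.1, 0.0],   # predicts class 1
    [1.0, 3.0, 2.0],   # predicts class 1
]
toy_targets = [
    [1, 0, 0],  # class 0 (correct)
    [0, 0, 1],  # class 2 (correct)
    [1, 0, 0],  # class 0 (wrong)
    [0, 1, 0],  # class 1 (correct)
]

toy_accuracy = accuracy(toy_outputs, toy_targets)
print('Toy accuracy: ' + '{0:.2f}'.format(toy_accuracy * 100.) + '%')  # Toy accuracy: 75.00%
```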


Using the initial model and hyperparameters given in this notebook, the final test accuracy should be roughly between 97% and 98%. Each time the code is rerun, we get a different accuracy because the batches are shuffled, the weights are initialized differently, and so on.
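The run-to-run variation comes from random number generators that start in a different state each time. The sketch below uses Python's standard `random` module as a stand-in to show the general idea of fixing a seed so that a shuffle becomes repeatable; the function `shuffled_batch_order` is hypothetical. In the TensorFlow 1.x API, `tf.compat.v1.set_random_seed` sets a graph-level seed, but whether that alone makes this notebook fully reproducible is an assumption not verified here.

```python
import random


def shuffled_batch_order(num_batches, seed):
    """Return batch indices in a shuffled order determined only by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    order = list(range(num_batches))
    rng.shuffle(order)
    return order


first_run = shuffled_batch_order(10, seed=42)
second_run = shuffled_batch_order(10, seed=42)

# The same seed gives the same order on every run
print(first_run == second_run)  # True
```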

Finally, we have intentionally stopped at a suboptimal solution, so that you have room to improve on it.