# Exercises

### 8. Adjust the learning rate. Try a value of 0.0001. Does it make a difference?

**Solution**

First, we have to define a custom optimizer (as we did in the TensorFlow intro).

We create the custom optimizer with:

    custom_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

Then we change the respective argument in model.compile to reflect this: 

    model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    

Since the learning rate is ten times lower than Adam's default (0.001), we may need to increase the number of epochs (NUM_EPOCHS, to, say, 50).

The result is basically the same, but we reach it much more slowly.

Adam adapts its step sizes during training, but if the initial learning rate is off by orders of magnitude, it may not have enough epochs to compensate.
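To see why a smaller learning rate mostly changes the speed rather than the end result, here is a toy sketch (not the MNIST model): a hand-written Adam update minimising the one-dimensional function f(x) = x², using the defaults from the original Adam paper (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8). The function name `adam_steps_to_converge` and the stopping tolerance are illustrative assumptions.

```python
import math


def adam_steps_to_converge(lr, x0=1.0, tol=0.1, beta1=0.9, beta2=0.999,
                           eps=1e-8, max_steps=100_000):
    """Minimise f(x) = x**2 with Adam and return how many steps it takes
    until |x| < tol (or max_steps if that never happens)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, max_steps + 1):
        g = 2.0 * x                          # gradient of x**2
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        if abs(x) < tol:
            return t
    return max_steps


fast = adam_steps_to_converge(1e-3)  # Keras' default Adam learning rate
slow = adam_steps_to_converge(1e-4)  # the learning rate from this exercise
print(fast, slow)
```

Both runs reach the same target; the 1e-4 run simply needs several times as many steps, which is why more epochs are needed with the lower learning rate.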

# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and refers to handwritten digit recognition. You can find out more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs). 

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 
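As a quick sanity check on the numbers used below: TensorFlow Datasets ships MNIST as 60,000 training and 10,000 test images. Holding out 10% of the training images for validation and batching the rest in groups of 100 gives the 540 steps per epoch that appear in the training log. The variable names mirror those in the data cell; the arithmetic itself is plain Python.

```python
# Split sizes for MNIST as provided by TensorFlow Datasets
train_examples = 60_000
test_examples = 10_000

num_validation_samples = int(0.1 * train_examples)           # 10% held out
num_train_samples = train_examples - num_validation_samples  # the rest is used for training

BATCH_SIZE = 100
steps_per_epoch = -(-num_train_samples // BATCH_SIZE)  # ceiling division

print(train_examples + test_examples)                           # 70000
print(num_validation_samples, num_train_samples, steps_per_epoch)  # 6000 54000 540
```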

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
```

## Data

That's where we load and preprocess our data.

In [2]:


```python
# Load MNIST as (image, label) pairs along with the dataset metadata
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# Hold out 10% of the training set for validation; tf.data expects int64 counts
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

# Scale pixel values from the integer range [0, 255] to floats in [0, 1]
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

# let's also shuffle the data
# reshuffle_each_iteration=False keeps one fixed order, so take() and skip() below
# always split the same permutation and validation samples never leak into training
BUFFER_SIZE = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# Mini-batches of 100 for training; validation and test each form a single batch
BATCH_SIZE = 100
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
```
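The split relies on `take()` and `skip()` seeing the same shuffled order. By default, `shuffle()` re-draws its order every time the dataset is iterated (`reshuffle_each_iteration` defaults to true), so the validation batch pulled once with `next(iter(...))` and the training data of a later epoch come from different permutations and overlap. Passing `reshuffle_each_iteration=False` keeps one fixed order. The plain-Python simulation below illustrates both cases on a made-up dataset of 1,000 indices; the helpers `split_once` and `shuffled` are hypothetical and no TensorFlow is involved.

```python
import random


def split_once(order, n_val):
    """Mimic take(n_val) / skip(n_val) on an already-shuffled order."""
    return set(order[:n_val]), set(order[n_val:])


def shuffled(n, rng):
    """Return a fresh random permutation of 0..n-1."""
    order = list(range(n))
    rng.shuffle(order)
    return order


n, n_val = 1_000, 100
rng = random.Random(0)

# Reshuffled on every iteration: validation comes from one permutation,
# the next epoch's training data from another.
val_set, _ = split_once(shuffled(n, rng), n_val)
_, train_next_epoch = split_once(shuffled(n, rng), n_val)
leaked = val_set & train_next_epoch

# Fixed order across iterations: both splits come from the same permutation.
fixed_order = shuffled(n, rng)
val_fixed, _ = split_once(fixed_order, n_val)
_, train_fixed = split_once(fixed_order, n_val)

print(len(leaked))                   # nonzero: validation items reappear in training
print(len(val_fixed & train_fixed))  # 0
```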

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

In [3]:
```python
input_size = 784   # 28 * 28 pixels once flattened
output_size = 10   # one output per digit class
hidden_layer_size = 50

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),             # input layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 2nd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax')      # output layer
])
```
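To get a feel for the model's size, the trainable parameters can be counted by hand: each `Dense` layer has one weight per input-output pair plus one bias per output, and `Flatten` has none. The helper `dense_params` is a hypothetical name used only for this arithmetic; calling `model.summary()` in Keras reports the same per-layer counts.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


input_size = 28 * 28      # Flatten turns each 28x28x1 image into 784 values
hidden_layer_size = 50
output_size = 10

layers = [
    dense_params(input_size, hidden_layer_size),         # 1st hidden layer
    dense_params(hidden_layer_size, hidden_layer_size),  # 2nd hidden layer
    dense_params(hidden_layer_size, output_size),        # output layer
]
print(layers, sum(layers))  # [39250, 2550, 510] 42310
```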

### Choose the optimizer and the loss function

In [4]:


```python
custom_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
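`sparse_categorical_crossentropy` is the form of categorical cross-entropy that takes integer class labels (0 to 9 here) instead of one-hot vectors. For a single example it is the negative log of the probability that the softmax output assigns to the true class, and the predicted class is the index with the highest probability. A minimal NumPy sketch on a made-up 3-class example; the function names are illustrative, not Keras internals.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted for numerical stability)."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: -log of the probability assigned to the true class index."""
    return -np.log(probs[label])


logits = np.array([2.0, 1.0, 0.1])  # made-up scores for a 3-class toy problem
probs = softmax(logits)
print(probs.round(3))                             # approx. [0.659 0.242 0.099]
print(sparse_categorical_crossentropy(0, probs))  # approx. 0.417
print(int(np.argmax(probs)))                      # 0, the predicted class
```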

### Training
That's where we train the model we have built.

In [5]:


```python
NUM_EPOCHS = 50

# verbose=2 prints one summary line per epoch
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)
```

```
Epoch 1/50
540/540 - 7s - loss: 1.2689 - accuracy: 0.6816 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/50
540/540 - 6s - loss: 0.4529 - accuracy: 0.8850 - val_loss: 0.3684 - val_accuracy: 0.9018
Epoch 3/50
540/540 - 6s - loss: 0.3437 - accuracy: 0.9059 - val_loss: 0.3054 - val_accuracy: 0.9178
Epoch 4/50
540/540 - 7s - loss: 0.2977 - accuracy: 0.9164 - val_loss: 0.2745 - val_accuracy: 0.9280
Epoch 5/50
540/540 - 6s - loss: 0.2705 - accuracy: 0.9237 - val_loss: 0.2533 - val_accuracy: 0.9325
Epoch 6/50
540/540 - 7s - loss: 0.2513 - accuracy: 0.9285 - val_loss: 0.2353 - val_accuracy: 0.9367
Epoch 7/50
540/540 - 7s - loss: 0.2345 - accuracy: 0.9331 - val_loss: 0.2217 - val_accuracy: 0.9410
Epoch 8/50
540/540 - 7s - loss: 0.2191 - accuracy: 0.9379 - val_loss: 0.2092 - val_accuracy: 0.9435
Epoch 9/50
540/540 - 8s - loss: 0.2054 - accuracy: 0.9420 - val_loss: 0.1994 - val_accuracy: 0.9460
Epoch 10/50
540/540 - 7s - loss: 0.1939 - accuracy: 0.9444 - val_loss: 0.1891 - val_accuracy
```


## Test the model

As we discussed in the lectures, after training on the training data and validating on the validation data, we test the final prediction power of our model by running it on the test dataset that the algorithm has NEVER seen before.

It is very important to realize that repeatedly tuning the hyperparameters against the validation set gradually overfits the validation dataset. 

The test is the absolute final step. You should not test before you are completely done adjusting your model.

If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [6]:
```python
test_loss, test_accuracy = model.evaluate(test_data)
```

```
1/Unknown - 4s 4s/step - loss: 0.1049 - accuracy: 0.9675
```

In [7]:

```python
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))
```

```
Test loss: 0.10. Test accuracy: 96.75%
```
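Since the MNIST test split has 10,000 images, the reported accuracy can also be read as a count of misclassified digits. A short calculation using the rounded accuracy printed above (so the counts are approximate):

```python
test_examples = 10_000
test_accuracy = 0.9675  # rounded value printed by model.evaluate above

correct = round(test_accuracy * test_examples)
print(correct, test_examples - correct)  # 9675 325
```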
