# Exercises

There are several main adjustments you may try.

Please pay attention to the time it takes for each epoch to conclude.

Using the code from the lecture as the basis, fiddle with the hyperparameters of the algorithm.

1. The *width* (the hidden layer size) of the algorithm. Try a hidden layer size of 200. How does the validation accuracy of the model change? What about the time it took the algorithm to train? Can you find a hidden layer size that does better?

2. The *depth* of the algorithm. Add another hidden layer to the algorithm. This is an extremely important exercise! How does the validation accuracy change? What about the time it took the algorithm to train? Hint: Be careful with the shapes of the weights and the biases.

3. The *width and depth* of the algorithm. Add as many additional layers as you need to reach 5 hidden layers. Moreover, adjust the width of the algorithm as you find suitable. How does the validation accuracy change? What about the time it took the algorithm to train?

4. Fiddle with the activation functions. Try applying sigmoid transformation to both layers. The sigmoid activation is given by the string 'sigmoid'.

5. Fiddle with the activation functions. Try applying a ReLu to the first hidden layer and tanh to the second one. The tanh activation is given by the string 'tanh'.

6. Adjust the batch size. Try a batch size of 10000. How does the required time change? What about the accuracy?

7. Adjust the batch size. Try a batch size of 1. That's the SGD. How do the time and accuracy change? Is the result coherent with the theory?

8. Adjust the learning rate. Try a value of 0.0001. Does it make a difference?

9. Adjust the learning rate. Try a value of 0.02. Does it make a difference?

10. Combine all the methods above and try to reach a validation accuracy of 98.5+ percent.

Good luck!

# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and refers to handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as covolutional neural networks (CNNs). 

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal would be to build a neural network with 2 hidden layers.

## Import the relevant packages

In [8]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

## Data

That's where we load and preprocess our data.

In [40]:
## I am changing dataDir of models
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True, data_dir="F:\Reference\AI\completeDsUdemy\dataDir")
# with_info=True will also provide us with a tuple containing information about the version, features, number of samples
# as_supervised=True -> (input, target) 

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)



def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label


scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)


# let's also shuffle the data

BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# BUFFER_SIZE in between - a computational optimization to approximate uniform shuffling

# luckily for us, there is a shuffle method readily available and we just need to specify the buffer size
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# determine the batch size
BATCH_SIZE = 100

# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
validation_inputs, validation_targets = next(iter(validation_data))

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

In [17]:
input_size = 784
output_size = 10
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50
    
# define how the model will look like
model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # since we don't know CNNs yet, we don't know how to feed such input into our net, so we must flatten the images
    # there is a convenient method 'Flatten' that simply takes our 28x28x1 tensor and orders it into a (None,) 
    # or (28x28x1,) = (784,) vector
    # this allows us to actually create a feed forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])

### Choose the optimizer and the loss function

In [18]:

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### Training
That's where we train the model we have built.

In [20]:
import time
def train():
    start_time = time.time()
    NUM_EPOCHS = 5
    model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose =2)
    print(f"--- %s seconds --- for {hidden_layer_size} size" % (time.time() - start_time))

train()

Epoch 1/5
540/540 - 1s - loss: 0.4089 - accuracy: 0.8838 - val_loss: 0.2055 - val_accuracy: 0.9388
Epoch 2/5
540/540 - 1s - loss: 0.1805 - accuracy: 0.9474 - val_loss: 0.1417 - val_accuracy: 0.9563
Epoch 3/5
540/540 - 1s - loss: 0.1385 - accuracy: 0.9586 - val_loss: 0.1127 - val_accuracy: 0.9652
Epoch 4/5
540/540 - 1s - loss: 0.1122 - accuracy: 0.9660 - val_loss: 0.0945 - val_accuracy: 0.9713
Epoch 5/5
540/540 - 1s - loss: 0.0931 - accuracy: 0.9719 - val_loss: 0.0907 - val_accuracy: 0.9708
--- 5.476787090301514 seconds --- for 50 size


### Try a hidden layer size of 200. 
 How does the validation accuracy of the model change? What about the time it took the algorithm to train? Can you find a hidden layer size that does better?

In [41]:
input_size = 784
output_size = 10
hidden_layer_size = 200
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train()

Epoch 1/5
540/540 - 6s - loss: 0.2769 - accuracy: 0.9196 - val_loss: 0.1303 - val_accuracy: 0.9603
Epoch 2/5
540/540 - 4s - loss: 0.1033 - accuracy: 0.9684 - val_loss: 0.0871 - val_accuracy: 0.9727
Epoch 3/5
540/540 - 4s - loss: 0.0703 - accuracy: 0.9786 - val_loss: 0.0648 - val_accuracy: 0.9813
Epoch 4/5
540/540 - 4s - loss: 0.0517 - accuracy: 0.9839 - val_loss: 0.0513 - val_accuracy: 0.9843
Epoch 5/5
540/540 - 4s - loss: 0.0399 - accuracy: 0.9873 - val_loss: 0.0433 - val_accuracy: 0.9862
--- 23.7259578704834 seconds --- for 200 size


### The depth of the algorithm. Add another hidden layer to the algorithm.
This is an extremely important exercise! How does the validation accuracy change? What about the time it took the algorithm to train? Hint: Be careful with the shapes of the weights and the biases.

In [25]:
input_size = 784
output_size = 10
hidden_layer_size = 200
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='elu'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train()
print("with extra elu layer")

Epoch 1/5
540/540 - 2s - loss: 0.2512 - accuracy: 0.9257 - val_loss: 0.1185 - val_accuracy: 0.9665
Epoch 2/5
540/540 - 2s - loss: 0.0967 - accuracy: 0.9700 - val_loss: 0.0827 - val_accuracy: 0.9745
Epoch 3/5
540/540 - 2s - loss: 0.0677 - accuracy: 0.9792 - val_loss: 0.0599 - val_accuracy: 0.9812
Epoch 4/5
540/540 - 2s - loss: 0.0497 - accuracy: 0.9840 - val_loss: 0.0518 - val_accuracy: 0.9852
Epoch 5/5
540/540 - 2s - loss: 0.0398 - accuracy: 0.9871 - val_loss: 0.0399 - val_accuracy: 0.9883
--- 8.005591630935669 seconds --- for 200 size
with extra elu layer


### 5 hidden layers and Fiddle with the activation functions.

In [42]:

input_size = 784
output_size = 10
hidden_layer_size = 200
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='elu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='PReLU'),
    tf.keras.layers.Dense(hidden_layer_size, activation='sigmoid'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train()
print("5 hidden layers")

Epoch 1/5
540/540 - 8s - loss: 0.3155 - accuracy: 0.9057 - val_loss: 0.1343 - val_accuracy: 0.9577
Epoch 2/5
540/540 - 6s - loss: 0.1162 - accuracy: 0.9644 - val_loss: 0.1246 - val_accuracy: 0.9617
Epoch 3/5
540/540 - 6s - loss: 0.0824 - accuracy: 0.9748 - val_loss: 0.0800 - val_accuracy: 0.9773
Epoch 4/5
540/540 - 6s - loss: 0.0639 - accuracy: 0.9805 - val_loss: 0.0654 - val_accuracy: 0.9808
Epoch 5/5
540/540 - 6s - loss: 0.0498 - accuracy: 0.9843 - val_loss: 0.0589 - val_accuracy: 0.9832
--- 32.08390545845032 seconds --- for 200 size
5 hidden layers


In [43]:
test_loss, test_accuracy = model.evaluate(test_data)



### Try applying a ReLu to the first hidden layer and tanh to the second one. The tanh activation is given by the string 'tanh'.

In [28]:

input_size = 784
output_size = 10
hidden_layer_size = 50
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train()
print("relu tanh as hidden layers")

Epoch 1/5
540/540 - 6s - loss: 0.4017 - accuracy: 0.8917 - val_loss: 0.1897 - val_accuracy: 0.9457
Epoch 2/5
540/540 - 3s - loss: 0.1687 - accuracy: 0.9515 - val_loss: 0.1279 - val_accuracy: 0.9622
Epoch 3/5
540/540 - 3s - loss: 0.1217 - accuracy: 0.9645 - val_loss: 0.0985 - val_accuracy: 0.9725
Epoch 4/5
540/540 - 3s - loss: 0.0974 - accuracy: 0.9715 - val_loss: 0.0835 - val_accuracy: 0.9762
Epoch 5/5
540/540 - 3s - loss: 0.0802 - accuracy: 0.9759 - val_loss: 0.0770 - val_accuracy: 0.9777
--- 17.147287607192993 seconds --- for 50 size
relu tanh as hidden layers


### Adjust the batch size. 
Try a batch size of 10000. How does the required time change? What about the accuracy?

In [30]:
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# determine the batch size
BATCH_SIZE = 10000

# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
validation_inputs, validation_targets = next(iter(validation_data))
input_size = 784
output_size = 10
hidden_layer_size = 50
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train() 
print("Try a batch size of 10000 ")

Epoch 1/5
6/6 - 5s - loss: 2.1650 - accuracy: 0.2618 - val_loss: 1.8955 - val_accuracy: 0.5427
Epoch 2/5
6/6 - 2s - loss: 1.7520 - accuracy: 0.5952 - val_loss: 1.5164 - val_accuracy: 0.6685
Epoch 3/5
6/6 - 2s - loss: 1.3940 - accuracy: 0.6937 - val_loss: 1.2035 - val_accuracy: 0.7363
Epoch 4/5
6/6 - 2s - loss: 1.1106 - accuracy: 0.7545 - val_loss: 0.9681 - val_accuracy: 0.7837
Epoch 5/5
6/6 - 2s - loss: 0.8998 - accuracy: 0.7950 - val_loss: 0.7958 - val_accuracy: 0.8152
--- 12.377770900726318 seconds --- for 50 size
Try a batch size of 10000 


### Adjust the batch size. Try a batch size of 1. 

In [31]:
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# determine the batch size
BATCH_SIZE = 1

# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
validation_inputs, validation_targets = next(iter(validation_data))
input_size = 784
output_size = 10
hidden_layer_size = 50
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train() 
print("Try a batch size of 1 ")

Epoch 1/5
54000/54000 - 79s - loss: 0.2350 - accuracy: 0.9285 - val_loss: 0.1338 - val_accuracy: 0.9618
Epoch 2/5
54000/54000 - 78s - loss: 0.1393 - accuracy: 0.9587 - val_loss: 0.1116 - val_accuracy: 0.9690
Epoch 3/5
54000/54000 - 77s - loss: 0.1136 - accuracy: 0.9663 - val_loss: 0.1137 - val_accuracy: 0.9670
Epoch 4/5
54000/54000 - 77s - loss: 0.1048 - accuracy: 0.9699 - val_loss: 0.1020 - val_accuracy: 0.9738
Epoch 5/5
54000/54000 - 77s - loss: 0.0947 - accuracy: 0.9722 - val_loss: 0.0965 - val_accuracy: 0.9730
--- 387.3800530433655 seconds --- for 50 size
Try a batch size of 1 


### Adjust the learning rate. 
Try a value of 0.0001. Does it make a difference?

Adjust the learning rate. Try a value of 0.02. Does it make a difference?


In [36]:
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# determine the batch size
BATCH_SIZE = 100

# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
validation_inputs, validation_targets = next(iter(validation_data))
#-----------------------------------
input_size = 784
output_size = 10
hidden_layer_size = 200
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(output_size, activation='softmax') 
])
print("--------------------Try Adam with LR 0.02----------------")
opt = tf.keras.optimizers.Adam(learning_rate=0.02)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train()
print("--------------------Try Adam with LR 0.0001----------------")
opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
train()


--------------------Try Adam with LR 0.02----------------
Epoch 1/5
540/540 - 5s - loss: 0.3109 - accuracy: 0.9108 - val_loss: 0.2087 - val_accuracy: 0.9422
Epoch 2/5
540/540 - 4s - loss: 0.1966 - accuracy: 0.9468 - val_loss: 0.1758 - val_accuracy: 0.9508
Epoch 3/5
540/540 - 3s - loss: 0.1937 - accuracy: 0.9485 - val_loss: 0.1841 - val_accuracy: 0.9487
Epoch 4/5
540/540 - 4s - loss: 0.1696 - accuracy: 0.9558 - val_loss: 0.1501 - val_accuracy: 0.9623
Epoch 5/5
540/540 - 4s - loss: 0.1531 - accuracy: 0.9606 - val_loss: 0.1393 - val_accuracy: 0.9645
--- 18.88952875137329 seconds --- for 200 size
--------------------Try Adam with LR 0.0001----------------
Epoch 1/5
540/540 - 4s - loss: 0.1123 - accuracy: 0.9698 - val_loss: 0.1124 - val_accuracy: 0.9705
Epoch 2/5
540/540 - 3s - loss: 0.0960 - accuracy: 0.9741 - val_loss: 0.1045 - val_accuracy: 0.9732
Epoch 3/5
540/540 - 3s - loss: 0.0893 - accuracy: 0.9750 - val_loss: 0.0997 - val_accuracy: 0.9733
Epoch 4/5
540/540 - 3s - loss: 0.0845 - acc

## Test the model - below is for original 2 layers and 50 of them



In [6]:
test_loss, test_accuracy = model.evaluate(test_data)



In [7]:

print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 96.81%


-