# Exercises

There are several main adjustments you may try.

Please pay attention to the time it takes for each epoch to conclude.

Using the code from the lecture as the basis, fiddle with the hyperparameters of the algorithm.

1. The *width* (the hidden layer size) of the algorithm. Try a hidden layer size of 200. How does the validation accuracy of the model change? What about the time it took the algorithm to train? Can you find a hidden layer size that does better?

2. The *depth* of the algorithm. Add another hidden layer to the algorithm. This is an extremely important exercise! How does the validation accuracy change? What about the time it took the algorithm to train? Hint: Be careful with the shapes of the weights and the biases.

3. The *width and depth* of the algorithm. Add as many additional layers as you need to reach 5 hidden layers. Moreover, adjust the width of the algorithm as you find suitable. How does the validation accuracy change? What about the time it took the algorithm to train?

4. Fiddle with the activation functions. Try applying sigmoid transformation to both layers. The sigmoid activation is given by the string 'sigmoid'.

5. Fiddle with the activation functions. Try applying a ReLu to the first hidden layer and tanh to the second one. The tanh activation is given by the string 'tanh'.

6. Adjust the batch size. Try a batch size of 10000. How does the required time change? What about the accuracy?

7. Adjust the batch size. Try a batch size of 1. That's the SGD. How do the time and accuracy change? Is the result coherent with the theory?

8. Adjust the learning rate. Try a value of 0.0001. Does it make a difference?

9. Adjust the learning rate. Try a value of 0.02. Does it make a difference?

10. Combine all the methods above and try to reach a validation accuracy of 98.5+ percent.

Good luck!

# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and refers to handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as covolutional neural networks (CNNs). 

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal would be to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

# TensorFLow includes a data provider for MNIST that we'll use.
# It comes with the tensorflow-datasets module, therefore, if you haven't please install the package using
# pip install tensorflow-datasets 
# or
# conda install tensorflow-datasets

import tensorflow_datasets as tfds

# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder 
# every other time, it is automatically loading the copy on your computer 

## Data

That's where we load and preprocess our data.

In [2]:
# remember the comment from above
# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder 
# every other time, it is automatically loading the copy on your computer 

# tfds.load actually loads a dataset (or downloads and then loads if that's the first time you use it) 
# in our case, we are interesteed in the MNIST; the name of the dataset is the only mandatory argument
# there are other arguments we can specify, which we can find useful
# mnist_dataset = tfds.load(name='mnist', as_supervised=True)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# with_info=True will also provide us with a tuple containing information about the version, features, number of samples
# we will use this information a bit below and we will store it in mnist_info

# as_supervised=True will load the dataset in a 2-tuple structure (input, target) 
# alternatively, as_supervised=False, would return a dictionary
# obviously we prefer to have our inputs and targets separated 

# once we have loaded the dataset, we can easily extract the training and testing dataset with the built references
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# by default, TF has training and testing datasets, but no validation sets
# thus we must split it on our own

# we start by defining the number of validation samples as a % of the train samples
# this is also where we make use of mnist_info (we don't have to count the observations)
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
# let's cast this number to an integer, as a float may cause an error along the way
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# let's also store the number of test samples in a dedicated variable (instead of using the mnist_info one)
num_test_samples = mnist_info.splits['test'].num_examples
# once more, we'd prefer an integer (rather than the default float)
num_test_samples = tf.cast(num_test_samples, tf.int64)


# normally, we would like to scale our data in some way to make the result more numerically stable
# in this case we will simply prefer to have inputs between 0 and 1
# let's define a function called: scale, that will take an MNIST image and its label
def scale(image, label):
    # we make sure the value is a float
    image = tf.cast(image, tf.float32)
    # since the possible values for the inputs are 0 to 255 (256 different shades of grey)
    # if we divide each element by 255, we would get the desired result -> all elements will be between 0 and 1 
    image /= 255.

    return image, label


# the method .map() allows us to apply a custom transformation to a given dataset
# we have already decided that we will get the validation data from mnist_train, so 
scaled_train_and_validation_data = mnist_train.map(scale)

# finally, we scale and batch the test data
# we scale it so it has the same magnitude as the train and validation
# there is no need to shuffle it, because we won't be training on the test data
# there would be a single batch, equal to the size of the test data
test_data = mnist_test.map(scale)


# let's also shuffle the data

BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# then we can't shuffle the whole dataset in one go because we can't fit it all in memory
# so instead TF only stores BUFFER_SIZE samples in memory at a time and shuffles them
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
# BUFFER_SIZE in between - a computational optimization to approximate uniform shuffling

# luckily for us, there is a shuffle method readily available and we just need to specify the buffer size
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

# once we have scaled and shuffled the data, we can proceed to actually extracting the train and validation
# our validation data would be equal to 10% of the training set, which we've already calculated
# we use the .take() method to take that many samples
# finally, we create a batch with a batch size equal to the total number of validation samples
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

# similarly, the train_data is everything else, so we skip as many samples as there are in the validation dataset
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# determine the batch size
BATCH_SIZE = 100

# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)

# batch the test data
test_data = test_data.batch(num_test_samples)


# takes next batch (it is the only batch)
# because as_supervized=True, we've got a 2-tuple structure
validation_inputs, validation_targets = next(iter(validation_data))

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

In [6]:
input_size = 784
output_size = 10
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50
    
# define how the model will look like
model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # since we don't know CNNs yet, we don't know how to feed such input into our net, so we must flatten the images
    # there is a convenient method 'Flatten' that simply takes our 28x28x1 tensor and orders it into a (None,) 
    # or (28x28x1,) = (784,) vector
    # this allows us to actually create a feed forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])

### Choose the optimizer and the loss function

In [7]:
# we define the optimizer we'd like to use, 
# the loss function, 
# and the metrics we are interested in obtaining at each iteration
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### Training
That's where we train the model we have built.

In [8]:
# determine the maximum number of epochs
NUM_EPOCHS = 5

# we fit the model, specifying the
# training data
# the total number of epochs
# and the validation data we just created ourselves in the format: (inputs,targets)
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose =2)

Epoch 1/5
540/540 - 6s - loss: 0.3206 - accuracy: 0.9105 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/5
540/540 - 5s - loss: 0.1344 - accuracy: 0.9604 - val_loss: 0.1131 - val_accuracy: 0.9680
Epoch 3/5
540/540 - 5s - loss: 0.0969 - accuracy: 0.9710 - val_loss: 0.0972 - val_accuracy: 0.9745
Epoch 4/5
540/540 - 5s - loss: 0.0752 - accuracy: 0.9776 - val_loss: 0.0841 - val_accuracy: 0.9745
Epoch 5/5
540/540 - 5s - loss: 0.0609 - accuracy: 0.9812 - val_loss: 0.0681 - val_accuracy: 0.9790


<tensorflow.python.keras.callbacks.History at 0x229f22494a8>

## Test the model

As we discussed in the lectures, after training on the training data and validating on the validation data, we test the final prediction power of our model by running it on the test dataset that the algorithm has NEVER seen before.

It is very important to realize that fiddling with the hyperparameters overfits the validation dataset. 

The test is the absolute final instance. You should not test before you are completely done with adjusting your model.

If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [9]:
test_loss, test_accuracy = model.evaluate(test_data)

      1/Unknown - 1s 1s/step - loss: 0.0897 - accuracy: 0.97 - 1s 1s/step - loss: 0.0897 - accuracy: 0.9728

In [10]:
# We can apply some nice formatting if we want to
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.09. Test accuracy: 97.28%


Using the initial model and hyperparameters given in this notebook, the final test accuracy should be roughly around 97%.

Each time the code is rerun, we get a different accuracy as the batches are shuffled, the weights are initialized in a different way, etc.

Finally, we have intentionally reached a suboptimal solution, so you can have space to build on it.

# Q1
The width (the hidden layer size) of the algorithm. Try a hidden layer size of 200. How does the validation accuracy of the model change? What about the time it took the algorithm to train? Can you find a hidden layer size that does better?

In [1]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [16]:
mnist_dataset,mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples,tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    image=tf.cast(image,tf.float32)
    image/=225.
    return image,label

scaled_train_and_validation_data=mnist_train.map(scale)

test_data=mnist_test.map(scale)

In [17]:
buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)

batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs,validation_targets=next(iter(validation_data))

In [18]:
input_size=28*28
output_size=10
hidden_layer_size=500

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets),verbose=2)

Epoch 1/5
540/540 - 14s - loss: 0.2177 - accuracy: 0.9344 - val_loss: 0.1262 - val_accuracy: 0.9618
Epoch 2/5
540/540 - 12s - loss: 0.0827 - accuracy: 0.9745 - val_loss: 0.0707 - val_accuracy: 0.9768
Epoch 3/5
540/540 - 14s - loss: 0.0519 - accuracy: 0.9835 - val_loss: 0.0430 - val_accuracy: 0.9873
Epoch 4/5
540/540 - 12s - loss: 0.0413 - accuracy: 0.9871 - val_loss: 0.0388 - val_accuracy: 0.9878
Epoch 5/5
540/540 - 12s - loss: 0.0281 - accuracy: 0.9912 - val_loss: 0.0431 - val_accuracy: 0.9872


<tensorflow.python.keras.callbacks.History at 0x63cbb6790>

In [19]:
test_loss,test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy))

Test loss: 0.08. Test accuracy: 0.98%.


Accuracy can be a little bit increased.

# Q2 
The depth of the algorithm. Add another hidden layer to the algorithm. This is an extremely important exercise! How does the validation accuracy change? What about the time it took the algorithm to train? Hint: Be careful with the shapes of the weights and the biases.

In [21]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [25]:
mnist_dataset,mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples,tf.int64)

def scale (image,label):
    image=tf.cast(image,tf.float32)
    image/=225.
    return image,label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)

batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs,validation_targets=next(iter(validation_data))

In [28]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs,validation_targets),verbose=2)

Epoch 1/5
540/540 - 9s - loss: 0.3998 - accuracy: 0.8829 - val_loss: 0.1757 - val_accuracy: 0.9463
Epoch 2/5
540/540 - 6s - loss: 0.1639 - accuracy: 0.9508 - val_loss: 0.1395 - val_accuracy: 0.9565
Epoch 3/5
540/540 - 6s - loss: 0.1263 - accuracy: 0.9620 - val_loss: 0.1170 - val_accuracy: 0.9620
Epoch 4/5
540/540 - 5s - loss: 0.1034 - accuracy: 0.9673 - val_loss: 0.0966 - val_accuracy: 0.9722
Epoch 5/5
540/540 - 6s - loss: 0.0896 - accuracy: 0.9728 - val_loss: 0.0820 - val_accuracy: 0.9723


<tensorflow.python.keras.callbacks.History at 0x63a191310>

In [29]:
test_loss,test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy:{1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.10. Test accuracy:97.00%.


It seems not obvious improvement of the accuracy.

So we should adjust the depth as well as width at the same time.


# Q3

The width and depth of the algorithm. Add as many additional layers as you need to reach 5 hidden layers. Moreover, adjust the width of the algorithm as you find suitable. How does the validation accuracy change? What about the time it took the algorithm to train?

In [30]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [31]:
mnist_dataset, mnist_info=tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples,tf.int64)

def scale(image, label):
    image=tf.cast(image,tf.float32)
    image/=225.
    return image, label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs, validation_targets=next(iter(validation_data))

In [32]:
input_size=28*28
output_size=10
hidden_layer_size=1000

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch,validation_data=(validation_inputs,validation_targets),verbose=2)

Epoch 1/5
540/540 - 65s - loss: 0.2367 - accuracy: 0.9287 - val_loss: 0.1480 - val_accuracy: 0.9593
Epoch 2/5
540/540 - 64s - loss: 0.1093 - accuracy: 0.9688 - val_loss: 0.1089 - val_accuracy: 0.9693
Epoch 3/5
540/540 - 65s - loss: 0.0814 - accuracy: 0.9771 - val_loss: 0.0708 - val_accuracy: 0.9783
Epoch 4/5
540/540 - 64s - loss: 0.0627 - accuracy: 0.9822 - val_loss: 0.0909 - val_accuracy: 0.9718
Epoch 5/5
540/540 - 64s - loss: 0.0558 - accuracy: 0.9840 - val_loss: 0.0756 - val_accuracy: 0.9848


<tensorflow.python.keras.callbacks.History at 0x63a604110>

In [33]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.09. Test accuracy: 97.72%.


Change not so much among the accuracy. 

# Q4

Fiddle with the activation functions. Try applying sigmoid transformation to both layers. The sigmoid activation is given by the string 'sigmoid'.

In [None]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [34]:
mnist_dataset,mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image,label

scaled_train_and_validation_data=mnist_train.map(scale)

test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)

batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs, validation_targets=next(iter(validation_data))

In [35]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='sigmoid'),
    tf.keras.layers.Dense(hidden_layer_size, activation='sigmoid'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 9s - loss: 1.0410 - accuracy: 0.7581 - val_loss: 0.4217 - val_accuracy: 0.8957
Epoch 2/5
540/540 - 6s - loss: 0.3301 - accuracy: 0.9128 - val_loss: 0.2676 - val_accuracy: 0.9277
Epoch 3/5
540/540 - 6s - loss: 0.2405 - accuracy: 0.9312 - val_loss: 0.2105 - val_accuracy: 0.9393
Epoch 4/5
540/540 - 6s - loss: 0.1959 - accuracy: 0.9436 - val_loss: 0.1772 - val_accuracy: 0.9495
Epoch 5/5
540/540 - 6s - loss: 0.1645 - accuracy: 0.9524 - val_loss: 0.1532 - val_accuracy: 0.9558


<tensorflow.python.keras.callbacks.History at 0x63d2fe410>

In [37]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.16. Test accuracy: 95.24%.


Adjust the activation from 'relu' to 'sigmoid'

Generally, we should reach an inferior solution. That is because 'relu' is 'clean' the noise in the data. If a value is negative, relu filters it out, while if it is positive, it takes it into account. 

For the MNIST dataset, we care only about the intensely black and white parts in the images of the digits, so such filtering proves beneficial.

The sigmoid does not filter the signals as well as relu, but still reaches a respectable result, like around 95%.

Try using 'softmax' activation for all layers below.

In [38]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='softmax'),
    tf.keras.layers.Dense(hidden_layer_size, activation='softmax'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 6s - loss: 2.1939 - accuracy: 0.4535 - val_loss: 1.9750 - val_accuracy: 0.7478
Epoch 2/5
540/540 - 6s - loss: 1.6098 - accuracy: 0.7548 - val_loss: 1.2562 - val_accuracy: 0.7603
Epoch 3/5
540/540 - 6s - loss: 1.0076 - accuracy: 0.7649 - val_loss: 0.8413 - val_accuracy: 0.7788
Epoch 4/5
540/540 - 5s - loss: 0.7432 - accuracy: 0.7875 - val_loss: 0.6879 - val_accuracy: 0.8063
Epoch 5/5
540/540 - 5s - loss: 0.6387 - accuracy: 0.8153 - val_loss: 0.6223 - val_accuracy: 0.8138


<tensorflow.python.keras.callbacks.History at 0x639ec2610>

In [39]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.62. Test accuracy: 80.95%.


If change the activation of all hidden layers from 'relu' to 'softmax', the accuracy will decrease very much.

# Q5

Fiddle with the activation functions. Try applying a ReLu to the first hidden layer and tanh to the second one. The tanh activation is given by the string 'tanh'.

In [40]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [43]:
mnist_dataset, mnist_info=tfds.load(name='mnist',with_info=True, as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image,label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs,validation_targets=next(iter(validation_data))

In [44]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy', metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 9s - loss: 0.4063 - accuracy: 0.8875 - val_loss: 0.1947 - val_accuracy: 0.9423
Epoch 2/5
540/540 - 7s - loss: 0.1629 - accuracy: 0.9515 - val_loss: 0.1395 - val_accuracy: 0.9577
Epoch 3/5
540/540 - 6s - loss: 0.1211 - accuracy: 0.9636 - val_loss: 0.1135 - val_accuracy: 0.9670
Epoch 4/5
540/540 - 6s - loss: 0.0952 - accuracy: 0.9709 - val_loss: 0.0910 - val_accuracy: 0.9723
Epoch 5/5
540/540 - 6s - loss: 0.0774 - accuracy: 0.9768 - val_loss: 0.0885 - val_accuracy: 0.9725


<tensorflow.python.keras.callbacks.History at 0x63c97b090>

In [45]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 96.67%.


Analogically, we can change the activation functions. This time though, we'll use different activators for the different layers.

The result is not significantly different. 

However, if with different width and depth, that may change.

# Q6

Adjust the batch size. Try a batch size of 10000. How does the required time change? What about the accuracy?

In [55]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [49]:
mnist_dataset, mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples,tf.int64)

def scale(image,label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image,label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=10000

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs, validation_targets=next(iter(validation_data))

In [50]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs,validation_targets), verbose=2)

Epoch 1/5
6/6 - 6s - loss: 2.2107 - accuracy: 0.2194 - val_loss: 1.9982 - val_accuracy: 0.3933
Epoch 2/5
6/6 - 3s - loss: 1.8799 - accuracy: 0.4570 - val_loss: 1.6574 - val_accuracy: 0.5593
Epoch 3/5
6/6 - 4s - loss: 1.5286 - accuracy: 0.6159 - val_loss: 1.3028 - val_accuracy: 0.7102
Epoch 4/5
6/6 - 4s - loss: 1.1825 - accuracy: 0.7441 - val_loss: 0.9843 - val_accuracy: 0.7802
Epoch 5/5
6/6 - 4s - loss: 0.8843 - accuracy: 0.7976 - val_loss: 0.7457 - val_accuracy: 0.8112


<tensorflow.python.keras.callbacks.History at 0x639550350>

In [51]:
test_loss,test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss,test_accuracy*100.))

Test loss: 0.72. Test accuracy: 82.33%.


change the batch_size from 100 to 10000.

A bigger batch size results in slower training. That's what we expected from the theory. We are taking advantage of batching becuz of the amazing speed increase.

Notice that the validation accuracy starts from a low number and with 5 epochs actually finishes at a lower number. That's becuz there are fewer updates in a single epoch.

If try a batch size of 30000 or 50000. That's very close to single batch GD for this problem. 

Change the max epochs to 100 (for instance), as 5 epochs won't be enough to train the model.

In [54]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)


def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)


BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)


BATCH_SIZE = 30000

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))




input_size = 784
output_size = 10
hidden_layer_size = 50

model = tf.keras.Sequential([
                            tf.keras.layers.Flatten(input_shape=(28,28,1)),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(output_size, activation='softmax')   
                            ])



model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


NUM_EPOCHS = 5

model.fit(train_data, epochs = NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)



test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Epoch 1/5
2/2 - 3s - loss: 2.3696 - accuracy: 0.0917 - val_loss: 2.2584 - val_accuracy: 0.1598
Epoch 2/5
2/2 - 2s - loss: 2.2353 - accuracy: 0.1898 - val_loss: 2.1624 - val_accuracy: 0.3148
Epoch 3/5
2/2 - 2s - loss: 2.1427 - accuracy: 0.3388 - val_loss: 2.0803 - val_accuracy: 0.4288
Epoch 4/5
2/2 - 2s - loss: 2.0599 - accuracy: 0.4426 - val_loss: 1.9964 - val_accuracy: 0.4917
Epoch 5/5
2/2 - 2s - loss: 1.9733 - accuracy: 0.5020 - val_loss: 1.9063 - val_accuracy: 0.5282
Test loss: 1.89. Test accuracy: 53.98%


If change the batch_size to 30000, it is like doing the single batch GD.

In [56]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)


def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)


BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)


BATCH_SIZE = 50000

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))




input_size = 784
output_size = 10
hidden_layer_size = 50

model = tf.keras.Sequential([
                            tf.keras.layers.Flatten(input_shape=(28,28,1)),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(output_size, activation='softmax')   
                            ])



model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


NUM_EPOCHS = 5

model.fit(train_data, epochs = NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)



test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Epoch 1/5
2/2 - 1s - loss: 2.3073 - accuracy: 0.1555 - val_loss: 2.2128 - val_accuracy: 0.2575
Epoch 2/5
2/2 - 0s - loss: 2.2080 - accuracy: 0.2656 - val_loss: 2.1215 - val_accuracy: 0.3787
Epoch 3/5
2/2 - 0s - loss: 2.1145 - accuracy: 0.3873 - val_loss: 2.0183 - val_accuracy: 0.4617
Epoch 4/5
2/2 - 0s - loss: 2.0101 - accuracy: 0.4761 - val_loss: 1.9024 - val_accuracy: 0.5312
Epoch 5/5
2/2 - 0s - loss: 1.8939 - accuracy: 0.5425 - val_loss: 1.7793 - val_accuracy: 0.5852
Test loss: 1.77. Test accuracy: 59.95%


If change the batch size to 50000, then it's like doing single batch GD.

# Q7

Adjust the batch size. Try a batch size of 1. That's the SGD. How do the time and accuracy change? Is the result coherent with the theory?

In [58]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [59]:
mnist_dataset, mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train, mnist_test=mnist_dataset['train'], mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples,tf.int64)

def scale(image, label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image, label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=1

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_input, validation_targets=next(iter(validation_data))

In [60]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size,activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size,activation='relu'),
    tf.keras.layers.Dense(output_size,activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
54000/54000 - 100s - loss: 0.2501 - accuracy: 0.9263 - val_loss: 12.1413 - val_accuracy: 0.0893
Epoch 2/5
54000/54000 - 93s - loss: 0.1555 - accuracy: 0.9573 - val_loss: 18.3165 - val_accuracy: 0.0885
Epoch 3/5
54000/54000 - 94s - loss: 0.1407 - accuracy: 0.9637 - val_loss: 22.2234 - val_accuracy: 0.0895
Epoch 4/5
54000/54000 - 99s - loss: 0.1316 - accuracy: 0.9658 - val_loss: 25.6107 - val_accuracy: 0.0895
Epoch 5/5
54000/54000 - 95s - loss: 0.1359 - accuracy: 0.9682 - val_loss: 26.1097 - val_accuracy: 0.0907


<tensorflow.python.keras.callbacks.History at 0x6393ffcd0>

In [61]:
test_loss,test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.17. Test accuracy: 95.90%.


Change batch_size from 100 to 1.

A batch size of 1 results in the SGD. It takes the algorithm very little time to process a single batch, as it is one data point, but there're thousands of batches (54000 to be precise), thus the algorithm is actually slow. Remember that this depends on the number of cores that you train on. If you are using a CPU with 4 or 8 cores, you can only train 4 or 8 batches at once. The middle ground (mini_batching such as 100 samples per batch) is optimal.

Notice that the validation accuracy starts from high number. That's becuz there're lots of updates in a single epoch. Once the training is over, the accuracy is lower than all other batch sizes (SGD was an approximation).




# Q8

Adjust the learning rate. Try a value of 0.0001. Does it make a difference?



In [68]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [69]:
mnist_dataset,mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image, label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)
validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs, validation_targets=next(iter(validation_data))

In [70]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

custom_optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy',metrics=['accuracy'])

num_epoch=50
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/50
540/540 - 9s - loss: 1.2111 - accuracy: 0.6664 - val_loss: 0.5791 - val_accuracy: 0.8470
Epoch 2/50
540/540 - 6s - loss: 0.4404 - accuracy: 0.8839 - val_loss: 0.3847 - val_accuracy: 0.8947
Epoch 3/50
540/540 - 6s - loss: 0.3350 - accuracy: 0.9071 - val_loss: 0.3212 - val_accuracy: 0.9092
Epoch 4/50
540/540 - 6s - loss: 0.2903 - accuracy: 0.9190 - val_loss: 0.2877 - val_accuracy: 0.9157
Epoch 5/50
540/540 - 6s - loss: 0.2597 - accuracy: 0.9276 - val_loss: 0.2623 - val_accuracy: 0.9245
Epoch 6/50
540/540 - 6s - loss: 0.2385 - accuracy: 0.9335 - val_loss: 0.2419 - val_accuracy: 0.9320
Epoch 7/50
540/540 - 6s - loss: 0.2231 - accuracy: 0.9373 - val_loss: 0.2296 - val_accuracy: 0.9333
Epoch 8/50
540/540 - 6s - loss: 0.2087 - accuracy: 0.9410 - val_loss: 0.2158 - val_accuracy: 0.9390
Epoch 9/50
540/540 - 6s - loss: 0.1969 - accuracy: 0.9451 - val_loss: 0.2034 - val_accuracy: 0.9420
Epoch 10/50
540/540 - 6s - loss: 0.1869 - accuracy: 0.9470 - val_loss: 0.1950 - val_accuracy: 0.9437

<tensorflow.python.keras.callbacks.History at 0x638e83090>

In [71]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.10. Test accuracy: 96.98%.


Since the learning_rate is lower than normal, need to adjust the max_epoch to bigger (try 50).

The result is basically the same, but we reach it much slower.

While Adam adapts to the problem, if the orders of magnitude are too different, it may not have enough time to adjust accordingly.



# Q9
Adjust the learning rate. Try a value of 0.02. Does it make a difference?



In [77]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [78]:
mnist_dataset, mnist_info=tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image, label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000
shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)
validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=100

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs,validation_targets=next(iter(validation_data))

In [75]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

custom_optimizer=tf.keras.optimizers.Adam(learning_rate=0.02)
model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 9s - loss: 0.3159 - accuracy: 0.9055 - val_loss: 0.2445 - val_accuracy: 0.9360
Epoch 2/5
540/540 - 6s - loss: 0.2111 - accuracy: 0.9412 - val_loss: 0.2186 - val_accuracy: 0.9417
Epoch 3/5
540/540 - 6s - loss: 0.1904 - accuracy: 0.9474 - val_loss: 0.1982 - val_accuracy: 0.9468
Epoch 4/5
540/540 - 6s - loss: 0.1773 - accuracy: 0.9515 - val_loss: 0.2002 - val_accuracy: 0.9475
Epoch 5/5
540/540 - 6s - loss: 0.1695 - accuracy: 0.9539 - val_loss: 0.1708 - val_accuracy: 0.9542


<tensorflow.python.keras.callbacks.History at 0x636438450>

In [76]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.19. Test accuracy: 95.20%.


While Adam adapts to the problem, if the orders of magnitude are too different, it may not have time to adjust accordingly. We start overfitting before we can reach a neat solution.

Therefore, for this problem, even 0.02 is a HIGH starting learning rate. 

Try 0.001, 0.0001, 0.00001. If it makes no difference, pick whatever, otherwise it makes sense to fiddle with the learning rate.



In [82]:
input_size=28*28
output_size=10
hidden_layer_size=50

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

custom_optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001)
model.compile(optimizer=custom_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

num_epoch=5
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)


test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Epoch 1/5
540/540 - 6s - loss: 2.1598 - accuracy: 0.2674 - val_loss: 2.0341 - val_accuracy: 0.3998
Epoch 2/5
540/540 - 6s - loss: 1.8762 - accuracy: 0.5063 - val_loss: 1.7056 - val_accuracy: 0.5982
Epoch 3/5
540/540 - 6s - loss: 1.5360 - accuracy: 0.6496 - val_loss: 1.3815 - val_accuracy: 0.6878
Epoch 4/5
540/540 - 6s - loss: 1.2525 - accuracy: 0.7185 - val_loss: 1.1419 - val_accuracy: 0.7395
Epoch 5/5
540/540 - 6s - loss: 1.0461 - accuracy: 0.7621 - val_loss: 0.9683 - val_accuracy: 0.7755
Test loss: 0.94. Test accuracy: 79.03%.


Accuracy record: 

    learning_rate = 1: 9%
    
    learning_rate = 0.001: 95.4%
    
    learning_rate = 0.0001: 92.84%
    
    learning_rate = 0.00001: 79.02%
    
So learning_rate around 0.02 or just let 'Adam' handles it will be great. 

# Q10
Combine all the methods above and try to reach a validation accuracy of 98.5+ percent.

In [7]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [8]:
mnist_dataset, mnist_info=tfds.load(name='mnist',with_info=True,as_supervised=True)

mnist_train,mnist_test=mnist_dataset['train'],mnist_dataset['test']

num_validation_samples=0.1*mnist_info.splits['train'].num_examples
num_validation_samples=tf.cast(num_validation_samples, tf.int64)

num_test_samples=mnist_info.splits['test'].num_examples
num_test_samples=tf.cast(num_test_samples, tf.int64)

def scale(image, label):
    image=tf.cast(image, tf.float32)
    image/=225.
    return image, label

scaled_train_and_validation_data=mnist_train.map(scale)
test_data=mnist_test.map(scale)


buffer_size=10000

shuffled_train_and_validation_data=scaled_train_and_validation_data.shuffle(buffer_size)

validation_data=shuffled_train_and_validation_data.take(num_validation_samples)
train_data=shuffled_train_and_validation_data.skip(num_validation_samples)


batch_size=150

train_data=train_data.batch(batch_size)
validation_data=validation_data.batch(num_validation_samples)
test_data=test_data.batch(num_test_samples)

validation_inputs, validation_targets=next(iter(validation_data))

In [9]:
input_size=28*28
output_size=10
hidden_layer_size=5000

model=tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

num_epoch=10
model.fit(train_data, epochs=num_epoch, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
360/360 - 1486s - loss: 1.0597 - accuracy: 0.6385 - val_loss: 0.2777 - val_accuracy: 0.9347
Epoch 2/10
360/360 - 1309s - loss: 0.2227 - accuracy: 0.9470 - val_loss: 0.7366 - val_accuracy: 0.8513
Epoch 3/10
360/360 - 1341s - loss: 0.3108 - accuracy: 0.9270 - val_loss: 0.2272 - val_accuracy: 0.9575
Epoch 4/10
360/360 - 1372s - loss: 0.1690 - accuracy: 0.9638 - val_loss: 0.1780 - val_accuracy: 0.9645
Epoch 5/10
360/360 - 1379s - loss: 0.1861 - accuracy: 0.9644 - val_loss: 0.1268 - val_accuracy: 0.9717
Epoch 6/10
360/360 - 1371s - loss: 0.1841 - accuracy: 0.9643 - val_loss: 0.1430 - val_accuracy: 0.9720
Epoch 7/10
360/360 - 1305s - loss: 0.1222 - accuracy: 0.9735 - val_loss: 0.1998 - val_accuracy: 0.9638
Epoch 8/10
360/360 - 1344s - loss: 0.1166 - accuracy: 0.9749 - val_loss: 0.1315 - val_accuracy: 0.9773
Epoch 9/10
360/360 - 1373s - loss: 0.0994 - accuracy: 0.9793 - val_loss: 0.0773 - val_accuracy: 0.9778
Epoch 10/10
360/360 - 1352s - loss: 0.0778 - accuracy: 0.9839 - val_loss:

<tensorflow.python.keras.callbacks.History at 0x21c03342308>

In [10]:
test_loss, test_accuracy=model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%.'.format(test_loss, test_accuracy*100.))

Test loss: 0.15. Test accuracy: 97.60%.


Combine all the methods and try to achieve 98.5%+ accuracy. Achieving 98.5% accuracy with the methodology we've seen so far is extremely hard. A more realistic exercise would be to achieve 98%+ accuracy. However, being pushed to the limit (trying to achieve 98.5%), you have probably learned a whole lot about the machine learning process.

Here is a link where you can check the results that some leading academics got on the MNIST (using different methodologies):
https://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results


hidden_layer_size=5000 

batch_size=150

num_epochs=10

activation are 'relu'

There're better solutions using this methodology, this one is just superior to the one in the lessons. Due to the width and depth of the algorithm, it took like 3 hours and 47 mins to train it.

The final accuracy is like 97.60%.