# Experimenting with different optimizers, network layouts, and activation functions

Drawing on the simple MNIST classifier from task 6.2, evaluate different optimizers, network layouts, and activation functions.  
To this end, first learn how to visualize the training history of a Keras model:
* [Using the `History` of a training](https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras) (not detailed enough)
* [Visualizing loss and metrics during training](https://www.tensorflow.org/guide/keras/training_with_built_in_methods#visualizing_loss_and_metrics_during_training)
* [Using the `TensorBoard` callback in Keras](https://www.tensorflow.org/tensorboard/scalars_and_keras)

To see the loss evolve in detail, log it at the granularity of individual mini-batches rather than only once per epoch.
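Without TensorBoard, a small custom callback can collect the loss after every batch. This is a sketch: the class name `BatchLossHistory` is hypothetical, while `keras.callbacks.Callback` and its `on_train_begin` / `on_train_batch_end` hooks are part of Keras. With TensorBoard, `update_freq='batch'` serves the same purpose.

```python
from tensorflow import keras


class BatchLossHistory(keras.callbacks.Callback):
    """Hypothetical helper callback: records the training loss after every mini-batch."""

    def on_train_begin(self, logs=None):
        # Reset the record at the start of each call to fit()
        self.batch_losses = []

    def on_train_batch_end(self, batch, logs=None):
        # Note: recent Keras versions report the running mean of the loss over the
        # current epoch here, not the loss of this single batch alone.
        self.batch_losses.append(float(logs['loss']))
```

Pass an instance via `model.fit(..., callbacks=[batch_history])` and plot `batch_history.batch_losses` with matplotlib.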

## Task 7.1

Consider the simple network below (a single hidden layer of 64 neurons) and compare the learning curves obtained with ReLU, logistic (sigmoid), and tanh activations in the hidden layer.
What happens if you also use these activation functions in the output layer?
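The activations differ mostly in their derivatives, which scale the gradient flowing back through the hidden layer. A NumPy sketch of the three functions and their derivatives (function names are illustrative, not Keras API):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    # (Sub)gradient: 0 for z <= 0, 1 for z > 0
    return (z > 0).astype(float)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(z):
    s = logistic(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2  # at most 1, reached at z = 0

z = np.array([-5.0, 0.0, 5.0])
for name, grad in [('relu', relu_grad), ('logistic', logistic_grad), ('tanh', tanh_grad)]:
    print(name, grad(z))
```

The logistic and tanh derivatives become tiny for large |z| (saturation), and the logistic derivative never exceeds 0.25, while ReLU passes the gradient through unchanged for positive inputs.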

## Task 7.2

Going back to the original ReLU-based network, compare [various optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers), e.g. SGD, RMSprop, Adam, ...
How does their performance differ?
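For intuition before the MNIST comparison, the textbook update rules of SGD, RMSprop, and Adam can be sketched in NumPy on a toy, ill-conditioned quadratic loss. This is an illustration under assumptions: the toy loss, the learning rates, and the function names are made up here, and the Keras implementations may differ in details such as where the epsilon term enters.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * (w0**2 + 10 * w1**2): a stretched bowl
    return np.array([1.0, 10.0]) * w

def sgd(w, lr=0.05, steps=500):
    for _ in range(steps):
        w = w - lr * grad(w)  # one global step size for all parameters
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-7, steps=500):
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1 - rho) * g**2       # running mean of squared gradients
        w = w - lr * g / (np.sqrt(v) + eps)  # per-parameter, scale-free step
    return w

def adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-7, steps=500):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g**2   # running mean of squared gradients
        m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)  # bias correction
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_start = np.array([1.0, 1.0])
for opt in (sgd, rmsprop, adam):
    print(opt.__name__, opt(w_start.copy()))
```

Try raising SGD's `lr` above 0.2: the step then overshoots along the steep `w1` direction (curvature 10) and diverges, whereas RMSprop and Adam normalize the step size per parameter.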

## Task 7.3

Compare various network layouts:
* Vary the number of hidden neurons, e.g. 32, 64, 128.
* Add more hidden layers.
* What is the best performance you can achieve (neglecting all regularization approaches for now)?
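When comparing layouts, it helps to know how many trainable parameters each one has. A Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below is hypothetical; its totals should agree with what `model.count_params()` or `model.summary()` reports for the same layout.

```python
def dense_param_count(hidden_sizes, n_inputs=28 * 28, n_outputs=10):
    """Trainable weights plus biases of an MLP with the given hidden layer sizes."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    # Each Dense layer contributes fan_in * fan_out weights and fan_out biases
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))

for hidden in [[32], [64], [128], [256], [64, 64], [128, 64], [64, 64, 64]]:
    print(hidden, dense_param_count(hidden))
```

Most parameters sit in the first layer (784 inputs): going from `[64]` (50,890 parameters) to `[64, 64]` (55,050) adds only 4,160, whereas `[128]` already has 101,770.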

In [3]:
from tensorflow import keras

# Load MNIST data set
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

In [None]:
# Neural Network with a single hidden layer
from tensorflow.keras import models, layers
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(10))
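For reference, this model computes `relu(x @ W1 + b1) @ W2 + b2`. The NumPy sketch below uses random stand-in weights (not Keras's own initialization) only to show the shapes, and that the output layer, having no activation, produces raw logits rather than probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random weights with the same shapes Keras creates for this model
W1, b1 = rng.normal(0.0, 0.05, size=(784, 64)), np.zeros(64)
W2, b2 = rng.normal(0.0, 0.05, size=(64, 10)), np.zeros(10)

def forward(x):
    """x: (batch, 784) -> logits: (batch, 10)."""
    hidden = np.maximum(x @ W1 + b1, 0.0)  # Dense(64, activation='relu')
    return hidden @ W2 + b2                # Dense(10): no activation, raw logits

logits = forward(rng.random((5, 784)))
print(logits.shape)  # (5, 10)
```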

### Visualization of Training & Validation accuracies and losses
There are two ways to do this:

- plot the history values recorded by Keras manually with matplotlib, or
- let TensorBoard plot them automatically.
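Per-batch curves are noisy. TensorBoard offers a smoothing slider for this; when plotting manually, a similar effect can be sketched with a bias-corrected exponential moving average. The helper name and default weight are assumptions, and this is not claimed to be TensorBoard's exact formula.

```python
def ema_smooth(values, weight=0.6):
    """Exponential moving average of a noisy curve; weight must lie in [0, 1)."""
    smoothed, last = [], 0.0
    for i, v in enumerate(values, start=1):
        last = weight * last + (1 - weight) * v
        # Divide out the bias caused by starting the average at 0
        smoothed.append(last / (1 - weight ** i))
    return smoothed
```

For example, `plt.plot(ema_smooth(batch_losses, weight=0.9))`, where `batch_losses` stands for any list of per-batch loss values.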

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Plot training and validation accuracy/loss over epochs for several labelled runs.
# histories: dict mapping a label to a Keras History.history dict;
# path_to_save: optional file name for the figure; model_name: optional title prefix
def plot_results(histories, path_to_save=None, model_name=None):
    fig, axs = plt.subplots(1, 2, figsize=(25, 8))
    for key, history in histories.items():
        # summarize history for accuracy
        axs[0].plot(history['sparse_categorical_accuracy'], label=key + ' train')
        axs[0].plot(history['val_sparse_categorical_accuracy'], label=key + ' val')
        # summarize history for loss
        axs[1].plot(history['loss'], label=key + ' train')
        axs[1].plot(history['val_loss'], label=key + ' val')

    # Titles, labels, and legends only need to be set once per axis
    prefix = model_name + ': ' if model_name else ''
    for ax, metric in zip(axs, ['Accuracy', 'Loss']):
        ax.set_title(prefix + 'Model ' + metric)
        ax.set_xlabel('Epoch')
        ax.set_ylabel(metric)
        ax.legend()

    if path_to_save is not None:
        fig.savefig(path_to_save)  # save before show(), which may release the figure
    plt.show()

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Clear any logs from previous runs
!rm -rf ./logs

In [None]:
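def train_model(model, label, optimizer='rmsprop', epochs=10):
    # Pass the optimizer as a string or as a fresh instance: a string makes compile()
    # create a new optimizer per model, whereas an instance used as default argument
    # would be shared across all calls of this function
    model.compile(optimizer=optimizer,
                  # Loss function to minimize; it expects raw logits (no softmax in the model)
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  # List of metrics to monitor (additionally to loss)
                  metrics=['sparse_categorical_accuracy'])

    # Create TensorBoard callback writing per-batch data into a label-specific
    # subdirectory of the global log_dir, which each task cell sets before training
    tb = keras.callbacks.TensorBoard(log_dir=log_dir + label, update_freq='batch')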
    # Fit data to the model!
    history = model.fit(x_train, y_train, batch_size=100, epochs=epochs,
                        validation_data=(x_val, y_val),
                        callbacks=[tb])  # callbacks are called after each batch and epoch

    # return Keras history as dictionary indexed by label
    return {label: history.history}

results = dict() # initialize empty results dict
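`train_model` compiles with `SparseCategoricalCrossentropy(from_logits=True)`, i.e. the loss applies the softmax itself. A NumPy sketch of the quantity being minimized, averaged over the batch and computed via a numerically stable log-softmax (the function name is hypothetical):

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy of integer labels given raw logits of shape (batch, classes)."""
    # Subtracting the row maximum avoids overflow in exp() without changing the softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    # Pick the log-probability of the true class in each row and average
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
print(sparse_ce_from_logits(logits, labels))
```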

### Task 7.1

In [ ]:
log_dir = 'logs/activations/'

for activation in ['relu', 'tanh', 'sigmoid']:
    model = models.Sequential()
    model.add(layers.Dense(64, activation=activation, input_shape=(28 * 28,)))
    model.add(layers.Dense(10))
    results.update(train_model(model, label=activation))

for activation in ['relu', 'tanh', 'sigmoid']:
    model = models.Sequential()
    model.add(layers.Dense(64, activation=activation, input_shape=(28 * 28,)))
    model.add(layers.Dense(10, activation=activation))
    results.update(train_model(model, label=activation + ' out'))
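One way to reason about the second loop: the loss still applies a softmax to the network outputs, so bounding the outputs bounds the achievable confidence. If every output lies in [lo, hi], the best case puts the true class at hi and the other nine classes at lo. A small sketch of that bound (helper name hypothetical):

```python
import numpy as np

def max_softmax_prob(lo, hi, n_classes=10):
    """Upper bound on the softmax probability of the true class when all logits
    lie in [lo, hi]: true class at hi, all other classes at lo."""
    return np.exp(hi) / (np.exp(hi) + (n_classes - 1) * np.exp(lo))

for name, (lo, hi) in {'sigmoid': (0.0, 1.0), 'tanh': (-1.0, 1.0)}.items():
    p = max_softmax_prob(lo, hi)
    print(f'{name}: max prob {p:.3f}, min loss {-np.log(p):.3f}')
```

For sigmoid outputs in (0, 1) this caps the predicted probability at about 0.23 (loss at least about 1.46), for tanh at about 0.45 (loss at least about 0.80). ReLU outputs have no upper bound, but they cannot become negative, and an output unit stuck at zero passes no gradient.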

In [None]:
# Show the results
plot_results(results)
%tensorboard --logdir logs/activations

### Task 7.2

In [ ]:
log_dir = 'logs/optimizers/'

from tensorflow.keras.optimizers import Adadelta, Adagrad, Adam, RMSprop, SGD

results = dict()  # fresh results so the plot only shows the optimizer runs
for opt in [Adadelta(), Adagrad(), Adam(), RMSprop(), SGD()]:
    label = type(opt).__name__
    print(label)
    # recreate model to have a fresh set of random weights
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(28 * 28,)))
    model.add(layers.Dense(10))
    results.update(train_model(model, label=label, optimizer=opt, epochs=20))

In [None]:
# Show the results
plot_results(results)
%tensorboard --logdir logs/optimizers

### Task 7.3

In [ ]:
log_dir = 'logs/nets/'

results = dict()
for hidden_sizes in [[32], [64], [128], [256], [64, 64], [128, 64], [64, 64, 64]]:
    label = ' - '.join(str(n) for n in hidden_sizes)
    print(label)
    # Declare the input shape once; each hidden layer infers its input size
    # from the previous layer
    model = models.Sequential([layers.Input(shape=(28 * 28,))])
    for num in hidden_sizes:
        model.add(layers.Dense(num, activation='relu'))
    model.add(layers.Dense(10))
    results.update(train_model(model, label=label, epochs=20))

In [None]:
# Show the results
plot_results(results)
%tensorboard --logdir logs/nets