# Experimenting with different optimizers, network layouts, and activation functions

Drawing on the simple MNIST classifier from task 6.2, evaluate different optimizers, network layouts, and activation functions.  
To this end, first learn how to visualize the training history of a Keras model:
* [Using the `History` of a training](https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras) (per-epoch values only, which is not detailed enough here)
* [Visualizing loss and metrics during training](https://www.tensorflow.org/guide/keras/train_and_evaluate#visualizing_loss_and_metrics_during_training)
* [Using the `TensorBoard` callback in Keras](https://www.tensorflow.org/tensorboard/scalars_and_keras)

To see how the error evolves in detail, record it at a resolution as fine-grained as your mini-batches rather than once per epoch.
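One way to get per-batch resolution is a small custom callback. The following is a minimal sketch and not part of the original task: the class name `BatchLossHistory` and its `series` method are hypothetical helpers. In recent Keras versions the values passed to `on_train_batch_end` appear to be running averages over the current epoch rather than the raw loss of a single batch, so the recorded curve may look smoother than the true per-batch loss.

```python
from tensorflow import keras


class BatchLossHistory(keras.callbacks.Callback):
    """Hypothetical helper: store the logged metrics after every mini-batch."""

    def on_train_begin(self, logs=None):
        # One dict of metric values per processed mini-batch
        self.batch_logs = []

    def on_train_batch_end(self, batch, logs=None):
        # Copy the logs, because Keras may reuse the dict between steps
        self.batch_logs.append(dict(logs or {}))

    def series(self, key='loss'):
        # Return the recorded values of one metric as a plain list of floats
        return [float(entry[key]) for entry in self.batch_logs if key in entry]
```

Passing `callbacks=[BatchLossHistory()]` to `model.fit` should then collect one entry per mini-batch, i.e. 500 entries per epoch for 50,000 training samples and `batch_size=100`. The `update_freq='batch'` argument of the `TensorBoard` callback is an alternative if you prefer TensorBoard.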

## Task 7.1

Consider the simple network below (with a single hidden layer of 64 neurons) and compare the learning curves of networks with ReLU, logistic (sigmoid), and tanh activations.
What happens if you also use different activation functions in the output layer?
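To keep the comparison fair, it may help to build every variant from the same factory function, so that only the activation changes between runs. The sketch below assumes the flattened 784-dimensional MNIST input and the RMSprop setup of the network in this notebook. The names `build_model` and `ACTIVATIONS` are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 'sigmoid' is the Keras name for the logistic function
ACTIVATIONS = ['relu', 'sigmoid', 'tanh']


def build_model(hidden_activation='relu', hidden_units=(64,)):
    """Build and compile a fresh fully connected classifier for 784-d inputs."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(784,)))
    for units in hidden_units:
        model.add(layers.Dense(units, activation=hidden_activation))
    # Output layer without activation: the loss below expects raw logits
    model.add(layers.Dense(10))
    model.compile(
        optimizer=keras.optimizers.RMSprop(),
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['sparse_categorical_accuracy'])
    return model
```

A loop such as `for act in ACTIVATIONS: histories[act] = build_model(act).fit(...)` would then train one fresh model per activation and keep the histories for plotting.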

## Task 7.2

Going back to the original ReLU-based network, compare [various optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers), e.g. SGD, RMSprop, Adam, ...
How does their performance differ?
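Optimizers keep internal state and are tied to the model they train, so each run should get a freshly created instance. The sketch below uses a hypothetical helper `make_optimizers` that returns new optimizer objects with their Keras default settings. The momentum value of 0.9 is an assumption chosen for illustration. Keep in mind that the default learning rates differ (0.01 for SGD, 0.001 for RMSprop and Adam), so a comparison at default settings mixes the effect of the algorithm with that of the step size.

```python
from tensorflow import keras


def make_optimizers():
    """Return freshly constructed optimizers, keyed by a display name."""
    return {
        'SGD': keras.optimizers.SGD(),
        # momentum=0.9 is an illustrative assumption, not a value from the task
        'SGD + momentum': keras.optimizers.SGD(momentum=0.9),
        'RMSprop': keras.optimizers.RMSprop(),
        'Adam': keras.optimizers.Adam(),
    }
```

For each entry, you would build a new, untrained model, pass the optimizer to `model.compile`, and store the resulting `History` under the same name.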

## Task 7.3

Compare various network layouts:
* Try varying the number of hidden neurons, e.g. 32, 64, 128.
* Try to add more hidden layers.
* What is the best performance you can achieve (neglecting all regularization approaches for now)?
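When comparing layouts, it can be useful to know how many trainable parameters each one has. A dense layer with `n_in` inputs and `n_out` outputs has `(n_in + 1) * n_out` parameters (weights plus biases). The helper `dense_param_count` below is a hypothetical name for a small sketch of that arithmetic.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected net given its layer widths."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Single hidden layer of varying width, 784 inputs and 10 outputs
for hidden in (32, 64, 128):
    print(hidden, dense_param_count([784, hidden, 10]))
# prints:
# 32 25450
# 64 50890
# 128 101770
```

Adding a second hidden layer of 64 neurons, i.e. `dense_param_count([784, 64, 64, 10])`, increases the count only slightly to 55050, because most parameters sit in the first layer.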

In [3]:
from tensorflow import keras

# Load MNIST data set
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are NumPy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

In [None]:
# Neural Network with a single hidden layer
from tensorflow import keras
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(10))

model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])

history = model.fit(x_train, y_train, batch_size=100, epochs=3,
                    validation_data=(x_val, y_val))
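The object returned by `model.fit` stores the per-epoch metrics in its `history` attribute, a dict that maps metric names to lists. The following sketch plots one metric together with its validation counterpart. It assumes Matplotlib is installed, and `plot_history` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict, metric='loss'):
    """Plot a training metric and, if present, its 'val_' counterpart per epoch."""
    fig, ax = plt.subplots()
    epochs = range(1, len(history_dict[metric]) + 1)
    ax.plot(epochs, history_dict[metric], label='training')
    val_key = 'val_' + metric
    if val_key in history_dict:
        ax.plot(epochs, history_dict[val_key], label='validation')
    ax.set_xlabel('epoch')
    ax.set_ylabel(metric)
    ax.legend()
    return fig
```

For the model above, `plot_history(history.history, 'sparse_categorical_accuracy')` should show the training and validation accuracy over the three epochs.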