# Optimizers

In this notebook we will look at several of the optimizers available in Keras and compare their training curves in TensorBoard.

## The data
We will use our old friend MNIST for its simplicity. 

<font color=red><b>Load the dataset and preprocess it. Keep in mind that we are going to use convolutions, so the images need to be reshaped into 4-dimensional tensors.
<br>Hint: use the expand_dims function from numpy.</b>
</font>

In [1]:
import os, time
from numpy import expand_dims
import tensorflow as tf
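physical_devices = tf.config.experimental.list_physical_devices('GPU')
# Enable memory growth on every visible GPU; the loop does nothing on a CPU-only machine
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)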

tf.keras.backend.clear_session() 
from tensorflow.keras.datasets import mnist

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

x_train = expand_dims(x_train, 3)
x_test = expand_dims(x_test, 3)

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
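To see in isolation what `expand_dims` does, here is a minimal sketch on a dummy array. The array `batch` and its size are made up for illustration and stand in for the MNIST images.

```python
import numpy as np

# A dummy batch of 2 grayscale "images" of 28x28 pixels (illustrative data, not MNIST)
batch = np.zeros((2, 28, 28), dtype="float32")

# Insert a channel axis at position 3: (batch, height, width) -> (batch, height, width, channels)
batch_4d = np.expand_dims(batch, 3)

# For a 3-dimensional input, axis=-1 ("append at the end") gives the same result
batch_4d_alt = np.expand_dims(batch, -1)

print(batch.shape)
print(batch_4d.shape)
```

The new axis has length 1 because MNIST images have a single grayscale channel.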


## Model Architecture
We are going to use convolutions in this example. Don't worry, you will cover convolution theory in depth later, in the CNN block. Our model will consist of:
- A Conv2D layer with 16 filters and a 3x3 kernel, ReLU-activated.
- A max-pooling layer.
- A Conv2D layer with 16 filters and a 3x3 kernel, ReLU-activated.
- A max-pooling layer.
- A flatten layer, followed by a softmax-activated dense layer with as many units as there are categories.
- Our optimizer will be 'SGD' and we will optimize sparse categorical crossentropy. Add accuracy as a metric.
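As a rough illustration of the loss named in the last bullet, the sketch below computes sparse categorical crossentropy for a single sample in plain Python. The "sparse" variant takes the label as an integer class index instead of a one-hot vector. The function and the probability values are hypothetical and only mirror the idea; Keras applies the same formula to whole batches and averages the result.

```python
import math

def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample: minus the log of the probability assigned to the true class."""
    return -math.log(probabilities[label])

# Hypothetical softmax output over 3 classes (illustrative values)
probs = [0.1, 0.7, 0.2]

loss_correct = sparse_categorical_crossentropy(1, probs)  # true class received 0.7
loss_wrong = sparse_categorical_crossentropy(0, probs)    # true class received 0.1

print(loss_correct, loss_wrong)
```

The loss is small when the model puts most of the probability on the true class and grows as that probability shrinks.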

<font color=red><b> Build the model.</b>
</font>

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

optimizer = 'SGD'

model = Sequential()
model.add(Conv2D(16, (3,3), activation='relu', input_shape=x_train[0].shape))
model.add(MaxPooling2D())
model.add(Conv2D(16, (3,3), activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()




Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 16)        2320      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten (Flatten)            (None, 400)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                4010      
Total params: 6,490
Trainable params: 6,490
Non-trainable params: 0
______________________________________________________
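The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used in the cell above: "valid" padding and stride 1 for `Conv2D`, and a 2x2 pool for `MaxPooling2D`. The helper names are made up for this example.

```python
def conv_output_size(size, kernel=3):
    # "valid" padding with stride 1: the kernel fits in size - kernel + 1 positions
    return size - kernel + 1

def pool_output_size(size, pool=2):
    # Non-overlapping pooling windows; leftover rows and columns are dropped
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    # Each filter has kernel * kernel * in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

size = 28
size = conv_output_size(size)  # first Conv2D
size = pool_output_size(size)  # first MaxPooling2D
size = conv_output_size(size)  # second Conv2D
size = pool_output_size(size)  # second MaxPooling2D
flat = size * size * 16        # Flatten: height * width * channels

# Dense layer: one weight per input for each of the 10 units, plus 10 biases
params = [conv_params(1, 16), conv_params(16, 16), (flat + 1) * 10]
print(flat, params, sum(params))
```

The values match the summary: 400 flattened features and 160 + 2,320 + 4,010 = 6,490 trainable parameters.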

<font color=red><b> Train the model for 5 epochs, with a batch size of 32. Evaluate its performance.</b>
</font>

In [3]:
model.fit(x_train, y_train, batch_size=32, epochs=5)
model.evaluate(x_test, y_test, verbose=0)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.09189745162166656, 0.9724]
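`model.evaluate` returns the loss followed by the metrics passed to `compile`, so the list above is `[test loss, test accuracy]`. The small sketch below unpacks it and turns the accuracy into a count of misclassified test images. The variable names are chosen for illustration.

```python
# Values copied from the evaluation output above
results = [0.09189745162166656, 0.9724]
test_loss, test_accuracy = results

# The test set has 10000 images, so the error rate translates directly into a count
n_test = 10000
misclassified = round((1 - test_accuracy) * n_test)

print(f"loss={test_loss:.4f} accuracy={test_accuracy:.2%} misclassified={misclassified}")
```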

## Optimizer benchmark

We are going to compare more than one optimizer on the same problem. 
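Before running the benchmark, it may help to see how two update rules differ on a toy problem. The sketch below minimizes the one-dimensional loss (w - 3)^2 with plain SGD and with a simplified Adam update. The toy loss, the learning rate and the step count are illustrative assumptions, and the code is not how Keras implements its optimizers.

```python
import math

def grad(w):
    # Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=50):
    # Plain SGD: step against the gradient, scaled by the learning rate
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=50):
    # Adam keeps running averages of the gradient (m) and of its square (v)
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd(0.0)
w_adam = run_adam(0.0)
print(w_sgd, w_adam)
```

Both rules move `w` from 0 toward the minimum at 3, but along different trajectories: SGD scales its step with the gradient, while Adam normalizes the step by a running estimate of the gradient magnitude. Differences of this kind are what the TensorBoard curves in the benchmark make visible.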

<font color=red><b> Create a build_model function that takes the optimizer as a parameter and builds the same model architecture. Try it with any optimizer you like.</b>
</font>

In [4]:
def build_model(optimizer):
    model = Sequential()
    model.add(Conv2D(16, (3,3), activation='relu', input_shape=x_train[0].shape))
    model.add(MaxPooling2D())
    model.add(Conv2D(16, (3,3), activation='relu'))
    model.add(MaxPooling2D())
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

Now let's train the model with each optimizer in the list and view the results in TensorBoard.


<font color=red><b> Let's do precisely that!
    <br> Hint: remember to pass the TensorBoard callback to the training call.
    <br> Hint 2: use the function os.path.join to include the optimizer name in each model's log directory.</b>
    
</font>
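As a small sketch of the second hint, the snippet below builds one log directory per optimizer with `os.path.join`. The base directory `logs/optimizers_experiment` is a hypothetical placeholder; use any path that suits your machine.

```python
import os
import time

# Hypothetical base directory for this example
base_log_dir = os.path.join("logs", "optimizers_experiment")

run_dirs = {}
for name in ["Adam", "SGD"]:
    # The timestamp suffix keeps repeated runs of the same optimizer in separate folders
    run_dirs[name] = os.path.join(base_log_dir, f"{name}_{int(time.time())}")

print(run_dirs)
```

Because every run writes to its own subdirectory of the same base directory, pointing TensorBoard at the base directory (`tensorboard --logdir logs/optimizers_experiment` for this sketch) shows each run as a separate curve.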

In [5]:
from tensorflow.keras.callbacks import TensorBoard
log_path = '/home/fer/data/formaciones/afi/tensorboard_log/optimizers_experiment'
optimizers = [
    'Adadelta',
    'Adagrad',
    'Adam',
    'Adamax',
    'Nadam',
    'RMSprop',
    'SGD']
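for optimizer in optimizers:
    # Build a fresh model for each optimizer so every run starts from newly initialized weights
    model = build_model(optimizer)
    # One log directory per run; the timestamp keeps repeated runs of the same optimizer apart
    tensorboard = TensorBoard(os.path.join(log_path, f'{optimizer}_{time.time()}'))
    model.fit(x_train, y_train, batch_size=32, epochs=5, callbacks=[tensorboard])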

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
