# <center> Keras </center>
## <center>1.6.2 Optimizers</center>

# Optimizers

An optimizer determines how the network's weights are updated based on the gradient of the loss function. Each optimizer implements a specific variant of stochastic gradient descent.
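
As a rough sketch of what such an update looks like, plain gradient descent moves the weights a small step against the gradient of the loss. The symbols here are notation introduced for illustration and do not come from the text above: $\theta$ stands for the weights, $L$ for the loss, $\eta$ for the learning rate, and $t$ for the step index.

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)$$

Most optimizers can be read as variations on this rule that scale or smooth the step in different ways.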

The optimizer tries to minimize the loss.

<img src="img/structure_optimizer.PNG" width="60%" />

Some of the optimizers available in Keras:
- SGD
- RMSprop
- Adagrad
- Adadelta
- Adam
- Adamax
- Nadam

### Follow the opposite direction of the gradient to reduce the loss.
<img src="img/optimizer2.png" width="40%" />
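
The idea of stepping against the gradient can be illustrated with a few lines of plain Python. This is only a minimal sketch on a made-up one-dimensional loss; the names `gradient_descent` and `toy_grad`, the loss itself, and the learning rate are assumptions chosen for illustration, not part of Keras.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Hypothetical toy loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
def toy_grad(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(toy_grad, x0=0.0)
print(x_min)  # very close to 3.0, the minimum of the toy loss
```

Each step shrinks the distance to the minimum because the gradient always points uphill, so subtracting it moves downhill.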

### The algorithm can get stuck in a local minimum, so the global minimum is never reached.
<img src="img/optimizer1.png" width="40%" />
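
A small experiment can make this concrete. The sketch below uses a made-up loss with two valleys (the function, the starting points, and the names `loss`, `loss_grad`, and `descend` are all assumptions for illustration). Depending on where plain gradient descent starts, it ends up in a different valley.

```python
def loss(x):
    # Hypothetical non-convex toy loss with a shallow and a deep valley
    return x**4 - 3 * x**2 + x


def loss_grad(x):
    # Derivative of the toy loss above
    return 4 * x**3 - 6 * x + 1


def descend(x0, learning_rate=0.01, steps=1000):
    """Plain gradient descent from starting point x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * loss_grad(x)
    return x


stuck = descend(x0=2.0)   # ends near x = 1.13, the shallower valley
found = descend(x0=-2.0)  # ends near x = -1.30, the deeper valley
print(stuck, loss(stuck))
print(found, loss(found))
```

The run that starts on the right settles in the shallower valley and never sees the lower loss that is available on the left.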

### Other optimizers can help avoid this issue.
<img src="img/optimizer.gif" width="50%" />
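
One technique that several optimizers build on is momentum (Keras's `SGD` accepts a `momentum` argument, for instance). The sketch below is a hedged illustration on a made-up toy loss; the function, the hyperparameters, and the name `descend` are assumptions. On this particular example momentum carries the search past the shallow valley, but that is not guaranteed in general.

```python
def loss_grad(x):
    # Gradient of the hypothetical toy loss x^4 - 3x^2 + x, which has a
    # local minimum near x = 1.13 and its global minimum near x = -1.30
    return 4 * x**3 - 6 * x + 1


def descend(x0, momentum, learning_rate=0.01, steps=1000):
    """Gradient descent with classical momentum; momentum=0 is plain descent."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradient steps
        velocity = momentum * velocity - learning_rate * loss_grad(x)
        x += velocity
    return x


plain = descend(x0=2.0, momentum=0.0)
with_momentum = descend(x0=2.0, momentum=0.9)
print(plain)          # stays in the shallower valley on the right
print(with_momentum)  # rolls over the hump into the deeper valley on the left
```

Both runs start from the same point with the same learning rate; only the accumulated velocity differs.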

### Evolution of optimizers
<img src="img/optimizers.png" width="70%" />

## Best practice
If you are unsure which optimizer to choose, Adam is a reasonable default.
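
To give a feel for what Adam does internally, here is a minimal one-dimensional sketch of its update rule in plain Python. It is not the Keras implementation; the function name `adam_minimize`, the toy loss, and the learning rate of 0.1 are assumptions picked so the example converges quickly.

```python
import math


def adam_minimize(grad, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-dimensional function with the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Hypothetical toy loss f(x) = (x - 3)^2 with gradient 2 * (x - 3)
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # close to 3.0
```

Adam combines a momentum-like average of the gradients with a per-parameter scaling by their recent magnitude, which is part of why it tends to work acceptably across many problems without much tuning.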

# Code

Let us run the code from the previous section first.

In [None]:
# Importing the MNIST dataset
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Processing the input data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

# Processing the output data
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Build a network
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(units=512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(units=10, activation='softmax'))

Let us now examine the code related to compiling a network. 

In [None]:
# Compile the network
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Here we are using `rmsprop` as the optimizer and `categorical_crossentropy` as the loss function. The performance metric is `accuracy`.
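
The string `'rmsprop'` selects the optimizer with its default settings. As a hedged sketch, an optimizer object can be passed instead when a setting such as the learning rate needs to be changed. This assumes a Keras version in which the argument is named `learning_rate` (older releases called it `lr`). The name `demo_network` and the value 0.001 are illustrative choices; a separate model is built so the sketch does not interfere with `network`.

```python
from keras import layers, models, optimizers

# A separate copy of the architecture so this sketch is self-contained
demo_network = models.Sequential()
demo_network.add(layers.Dense(units=512, activation='relu', input_shape=(28 * 28,)))
demo_network.add(layers.Dense(units=10, activation='softmax'))

# Assumption: this Keras version names the argument `learning_rate`
rmsprop = optimizers.RMSprop(learning_rate=0.001)

# Pass the optimizer object instead of the string 'rmsprop'
demo_network.compile(optimizer=rmsprop,
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])
```

Swapping `optimizers.RMSprop` for another class such as `optimizers.Adam` or `optimizers.SGD` follows the same pattern.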

In [None]:
import matplotlib.pyplot as plt

def plot_training_history(history):
    """Plot accuracy and loss per epoch for the training and test data."""
    hist = history.history
    # Newer Keras versions store accuracy under 'accuracy', older ones under 'acc'
    acc_key = 'accuracy' if 'accuracy' in hist else 'acc'

    # Accuracy
    plt.plot(hist[acc_key])
    plt.plot(hist['val_' + acc_key])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

    # Loss
    plt.plot(hist['loss'])
    plt.plot(hist['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

In [None]:
# Train the network
history = network.fit(train_images, train_labels, epochs=5, batch_size=128, 
                      verbose=1, validation_data=(test_images, test_labels))

# Plot the training results
plot_training_history(history)

# Task
Change the optimizer and note down the results.

# Summary
In this section we learned what an optimizer does, which optimizers Keras provides, and how to select one when compiling a network.

# Feedback
<a href = "http://goto/ml101_doc/Keras10">Feedback: Optimizers</a> <br>

# Navigation

<div>
<span> <h3 style="display:inline">&lt;&lt; Prev: <a href = "keras09.ipynb">Loss functions</a></h3> </span>
<span style="float: right"><h3 style="display:inline">Next: <a href = "keras11.ipynb">Train a network</a> &gt;&gt; </h3></span>
</div>