# Different Optimizers
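The most popular and well-known optimizer is Stochastic Gradient Descent (SGD), a technique that is also widely used to train other machine learning models. SGD finds a minimum (or, with the sign flipped, a maximum) of a function iteratively. At each step it estimates the gradient of the loss on a small batch of training samples and moves the parameters a small step in the opposite direction, with the step size set by the learning rate. Many popular variants of SGD try to speed up convergence and reduce the need for tuning by adapting the learning rate during training.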
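A minimal sketch of that difference on a toy problem rather than on the network below: it minimizes f(w) = (w − 3)² once with the plain update w ← w − η·f′(w), and once with an Adagrad-style update that divides each step by the square root of the running sum of squared gradients, so the effective learning rate adapts by itself. The objective, starting point and learning rates are arbitrary assumptions, and the gradient is exact rather than estimated from a random mini-batch, so only the update rules are illustrated.

```python
import math


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)


def sgd(w, lr=0.1, steps=100):
    """Plain gradient descent: w <- w - lr * g, with a fixed learning rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def adagrad(w, lr=0.5, steps=500, eps=1e-8):
    """Adagrad-style update: the step lr / sqrt(sum of g**2) shrinks over time."""
    accum = 0.0
    for _ in range(steps):
        g = grad(w)
        accum += g * g
        w -= lr * g / (math.sqrt(accum) + eps)
    return w


# Both start at w = 0 and approach the minimum at w = 3
print(sgd(0.0))
print(adagrad(0.0))
```

With plain SGD the learning rate must be chosen by hand (too large and the iterates diverge); the adaptive variant scales its own steps, which is the kind of tuning relief the optimizers compared below aim for.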

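```python
# Written for standalone Keras 2.x with the TensorFlow backend; newer tf.keras / Keras 3
# releases rename some arguments (e.g. lr -> learning_rate) and history keys (val_acc -> val_accuracy)
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.optimizers import SGD, Adadelta, Adam, RMSprop, Adagrad, Nadam, Adamax

# Seed for the data splits (the Keras weight initialisation is not seeded here)
SEED = 2017
```

```text
Using TensorFlow backend.
```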


### Import the dataset and extract the target variable

```python
# The red-wine file is semicolon-separated; 'quality' is the score we want to predict
data = pd.read_csv('data/winequality-red.csv', sep=';')
y = data['quality']
X = data.drop(['quality'], axis=1)
```
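The `sep=';'` argument matters because the UCI wine-quality files use semicolons rather than commas. The sketch below mimics `read_csv(..., sep=';')` followed by `drop(['quality'], axis=1)` using only the standard library; `load_features_and_target` is a hypothetical helper, and the three-column excerpt and its values are made up for illustration (the real file has eleven feature columns plus `quality`).

```python
import csv
import io

# Hypothetical excerpt in the same layout as the real file: quoted header,
# semicolon separators, 'quality' as the last column. Values are invented.
SAMPLE = '''"fixed acidity";"alcohol";"quality"
7.0;9.5;5
8.1;11.2;6
'''


def load_features_and_target(text, target='quality'):
    """Split semicolon-separated rows into feature dicts and a list of labels."""
    reader = csv.DictReader(io.StringIO(text), delimiter=';')
    X, y = [], []
    for row in reader:
        y.append(int(row.pop(target)))                    # like data['quality']
        X.append({k: float(v) for k, v in row.items()})   # like data.drop(['quality'], axis=1)
    return X, y


X, y = load_features_and_target(SAMPLE)
print(y)     # [5, 6]
print(X[0])  # {'fixed acidity': 7.0, 'alcohol': 9.5}
```

Reading the file with the default comma separator would instead yield a single column whose header is the whole semicolon-joined line.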

### Split the dataset for training, validation, and testing

```python
# Hold out 20% for testing, then 20% of the remainder for validation (64/16/20 overall)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=SEED)
```
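Two successive 20% splits leave 64% of the rows for training, 16% for validation and 20% for testing. The arithmetic below, assuming the red-wine file's 1,599 rows and scikit-learn's rule of rounding the test fraction up (`ceil(test_size * n)`), gives the exact split sizes; `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """(n_train, n_test) for a float test_size: the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 1599                                   # rows in winequality-red.csv
n_train_full, n_test = split_sizes(n_total, 0.2)
n_train, n_val = split_sizes(n_train_full, 0.2)
print(n_train, n_val, n_test)                    # 1023 256 320
```

These sizes match the accuracies reported at the end: every validation accuracy is a multiple of 1/256 and every test accuracy a multiple of 1/320 (for example 0.59375 = 190/320).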

### Define a function that creates the model

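```python
def create_model(opt):
    # `opt` is not used here; the optimizer is passed to model.compile() instead
    model = Sequential()
    # Four ReLU hidden layers of decreasing width
    model.add(Dense(100, input_dim=X_train.shape[1], activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(25, activation='relu'))
    model.add(Dense(10, activation='relu'))
    # A single linear output unit: quality is predicted as a regression target
    model.add(Dense(1, activation='linear'))
    return model
```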
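For a sense of the model's size, each `Dense` layer holds an `n_in × n_out` weight matrix plus `n_out` biases. The sketch below counts them for this architecture, assuming the 11 input features left after dropping `quality`; `dense_param_count` is a hypothetical helper, and `model.summary()` should report the same total.

```python
def dense_param_count(layer_sizes):
    """Weights + biases of a stack of fully connected layers.

    layer_sizes[0] is the input width; each later entry is a Dense layer's units.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total


# 11 input features, then Dense(100), Dense(50), Dense(25), Dense(10), Dense(1)
print(dense_param_count([11, 100, 50, 25, 10, 1]))  # 7796
```

Roughly 7,800 trainable parameters against about a thousand training rows is one reason the validation-based early stopping below is worth having.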

### Define a function that creates the callbacks used during training

```python
def create_callbacks(opt):
    callbacks = [
        # Stop once validation accuracy has not improved for 200 epochs
        EarlyStopping(monitor='val_acc', patience=200, verbose=2),
        # Keep only the weights with the best validation accuracy, one file per optimizer
        ModelCheckpoint('data/optimizers_best_' + opt + '.h5', monitor='val_acc',
                        save_best_only=True, verbose=0),
    ]
    return callbacks
```
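Early stopping with `patience=200` ends a run 200 epochs after its last improvement in validation accuracy, while `ModelCheckpoint(save_best_only=True)` keeps the weights from that best epoch. The function below is a simplified stand-in for the early-stopping bookkeeping (a hypothetical helper, not Keras's implementation); it treats only a strict increase as an improvement, as the max mode used for `val_acc` does. It shows why each `Epoch 00NNN: early stopping` line printed during training is exactly 201 more than the corresponding best epoch in the results table, e.g. 221 → 422 for `sgd-0001`.

```python
def early_stopping_epoch(val_scores, patience):
    """Zero-based epoch at which training stops, or None if it runs to the end."""
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best:          # strict improvement resets the counter
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical curve: improves until epoch 221, then stays lower for the rest of 1000 epochs
curve = [i / 1000 for i in range(222)] + [0.2] * 778
stop = early_stopping_epoch(curve, patience=200)
print(stop, stop + 1)  # 421 422 (the log line counts epochs from 1)
```

A curve that never improves after the first epoch, like the plain `sgd` run, stops at zero-based epoch 200, which is logged as epoch 201.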

### Create a dict of the optimizers we want to try

```python
# Library defaults unless stated; the "-0001" variants use a learning rate of 0.0001
opts = {
    'sgd': SGD(),
    'sgd-0001': SGD(lr=0.0001, decay=0.00001),
    'adam': Adam(),
    'adadelta': Adadelta(),
    'rmsprop': RMSprop(),
    'rmsprop-0001': RMSprop(lr=0.0001),
    'nadam': Nadam(),
    'adamax': Adamax(),
}
```
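The optimizers in `opts` differ mainly in how they turn a gradient into a parameter update. Below is a minimal single-parameter sketch of three of them in textbook form: the time-based decay that Keras 2.x `SGD` applies when `decay` is set (the learning rate is divided by `1 + decay * iterations`, where iterations counts batch updates), RMSprop, and Adam. The defaults mirror the Keras 2.x defaults as we understand them (`lr=0.001` and `rho=0.9` for RMSprop; `beta_1=0.9` and `beta_2=0.999` for Adam); `eps=1e-7` is an assumption, the function names are hypothetical, and Keras's own code differs in small details such as where epsilon enters.

```python
import math


def sgd_lr(iterations, lr=1e-4, decay=1e-5):
    """Time-based decay, as configured for 'sgd-0001': lr / (1 + decay * iterations)."""
    return lr / (1.0 + decay * iterations)


def rmsprop_step(w, g, state, lr=1e-3, rho=0.9, eps=1e-7):
    """RMSprop: divide the gradient by a running root-mean-square of past gradients."""
    state['v'] = rho * state.get('v', 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state['v']) + eps)


def adam_step(w, g, state, lr=1e-3, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Adam: bias-corrected running means of the gradient and of its square."""
    t = state.get('t', 0) + 1
    m = beta_1 * state.get('m', 0.0) + (1 - beta_1) * g
    v = beta_2 * state.get('v', 0.0) + (1 - beta_2) * g * g
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta_1 ** t)   # undo the bias towards the zero initialisation
    v_hat = v / (1 - beta_2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


print(sgd_lr(422 * 8))              # learning rate after ~3,376 batch updates
print(adam_step(0.0, 100.0, {}))    # first Adam step is about -lr, whatever the gradient scale
print(rmsprop_step(0.0, 100.0, {})) # first RMSprop step is larger: lr / sqrt(1 - rho)
```

With 1,023 training rows and `batch_size=128` there are 8 batches per epoch, so the 422 epochs that `sgd-0001` ran amount to 3,376 updates and its learning rate only decayed to about 1e-4 / 1.034 ≈ 9.7e-5. The adaptive methods, by contrast, rescale every step by recent gradient magnitudes.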

### Train our networks and store results

```python
batch_size = 128
n_epochs = 1000

results = []
# Train one network per optimizer (dicts keep insertion order in Python 3.7+)
for opt in opts:
    model = create_model(opt)
    callbacks = create_callbacks(opt)
    # Regression with an MSE loss; 'accuracy' is tracked for early stopping and checkpointing
    model.compile(loss='mse', optimizer=opts[opt], metrics=['accuracy'])
    hist = model.fit(X_train.values, y_train, batch_size=batch_size, epochs=n_epochs,
                     validation_data=(X_val.values, y_val), verbose=0,
                     callbacks=callbacks)
    # Zero-based index of the epoch with the highest validation accuracy
    best_epoch = np.argmax(hist.history['val_acc'])
    best_acc = hist.history['val_acc'][best_epoch]
    # Rebuild the network and load the best weights saved by ModelCheckpoint
    best_model = create_model(opt)
    best_model.load_weights('data/optimizers_best_' + opt + '.h5')
    best_model.compile(loss='mse', optimizer=opts[opt], metrics=['accuracy'])
    # evaluate() returns [loss, accuracy]
    score = best_model.evaluate(X_test.values, y_test, verbose=0)
    results.append([opt, best_epoch, best_acc, score[1]])
```

```text
Epoch 00201: early stopping
Epoch 00422: early stopping
Epoch 00393: early stopping
Epoch 00471: early stopping
Epoch 00450: early stopping
Epoch 00484: early stopping
Epoch 00292: early stopping
Epoch 00455: early stopping
```
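The `accuracy` metric needs a caveat: the network has a single linear output trained with MSE, so it is not a classifier. As far as we can tell, standalone Keras 2.x resolves `'accuracy'` for a one-unit output to `binary_accuracy`, which compares `round(y_pred)` with `y_true`; because quality scores are integers, the reported number is the share of wines whose rounded prediction hits the exact score. Newer tf.keras versions threshold predictions at 0.5 in `binary_accuracy` instead, so the same code may report a different, less meaningful number there. The sketch below reproduces the rounding version on hypothetical labels and predictions; `rounded_accuracy` is a hypothetical helper.

```python
def rounded_accuracy(y_true, y_pred):
    """Fraction of samples whose rounded regression output equals the integer label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if round(p) == t)
    return hits / len(y_true)


# Hypothetical quality labels and network outputs
y_true = [5, 6, 5, 7]
y_pred = [5.2, 5.4, 4.6, 6.6]
print(rounded_accuracy(y_true, y_pred))  # 0.75 (only 5.4 rounds to the wrong score)
```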


### Compare the results

```python
res = pd.DataFrame(results)
# best_epoch is the zero-based epoch with the highest validation accuracy
res.columns = ['optimizer', 'best_epoch', 'val_accuracy', 'test_accuracy']
res
```

| optimizer | best_epoch | val_accuracy | test_accuracy |
|---|---|---|---|
| sgd | 0 | 0.0 | 0.0 |
| sgd-0001 | 221 | 0.566406 | 0.59375 |
| adam | 192 | 0.589844 | 0.59375 |
| adadelta | 270 | 0.582031 | 0.56875 |
| rmsprop | 249 | 0.578125 | 0.590625 |
| rmsprop-0001 | 283 | 0.574219 | 0.58125 |
| nadam | 91 | 0.578125 | 0.571875 |
| adamax | 254 | 0.585938 | 0.603125 |


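Results of training with different optimizers on the Wine Quality dataset. Apart from plain `sgd`, every optimizer reaches a validation accuracy of about 0.57 to 0.59 and a test accuracy of about 0.57 to 0.60, with `adamax` scoring highest on the test set. The default `sgd` run never improved on its first-epoch validation accuracy of 0.0, hence early stopping at epoch 201 with a patience of 200; a likely cause is that the default learning rate of 0.01 makes training diverge on these unscaled features.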
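Because the test set is small, it helps to convert the accuracies back into counts of correctly scored wines. The sketch below does this for the test accuracies from the results table, assuming the 320-sample test split that follows from 1,599 rows and a 20% hold-out.

```python
# Test accuracies copied from the results table (plain 'sgd' omitted: it scored 0.0)
test_accuracy = {
    'sgd-0001': 0.59375, 'adam': 0.59375, 'adadelta': 0.56875,
    'rmsprop': 0.590625, 'rmsprop-0001': 0.58125, 'nadam': 0.571875,
    'adamax': 0.603125,
}
N_TEST = 320  # 20% of the 1,599 rows, rounded up

# Number of test wines each model scored exactly
correct = {name: round(acc * N_TEST) for name, acc in test_accuracy.items()}
for name, n in sorted(correct.items(), key=lambda kv: -kv[1]):
    print(f'{name:>13}: {n}/{N_TEST}')
```

The best and worst of these optimizers differ by 11 of 320 wines (193 vs. 182), and `adamax` beats `adam` by three wines, so the ranking should not be read as a clear win without repeating the experiment over several seeds.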