We will analyze the effect of learning rate adaptation while training the models on the skin cancer dataset. Keras has a time-based learning rate schedule built in. The stochastic gradient descent implementation in the SGD class has an argument called decay, which is used in the time-based learning rate decay equation as follows:

LearningRate = LearningRate * 1/(1 + decay * epoch)

When the decay argument is zero (the default), this has no effect on the learning rate (e.g. 0.1):

LearningRate = 0.1 * 1/(1 + 0.0 * 1)

LearningRate = 0.1

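When the decay argument is specified, the learning rate shrinks as training progresses, because the denominator of the equation grows with each step. Note that in the Keras SGD implementation the counter in this equation advances with every weight update (batch), not once per epoch, so the rate falls faster than the per-epoch form suggests. You can create a reasonable default schedule by setting the decay value as follows: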

Decay = LearningRate / Epochs
Decay = 0.1 / 100
Decay = 0.001
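
The schedule this produces can be tabulated with a few lines of plain Python. This is a minimal sketch rather than Keras code: the helper name `time_based_lr` is hypothetical, and it assumes the rate is always computed from the initial learning rate using the equation above, with `step` standing in for the epoch counter.

```python
def time_based_lr(initial_lr, decay, step):
    """Learning rate after `step` steps of time-based decay (illustrative helper)."""
    return initial_lr * 1.0 / (1.0 + decay * step)

initial_lr = 0.1
epochs = 100
decay = initial_lr / epochs  # 0.001, the default-style choice described above

# Print the learning rate at a few selected steps.
for step in (0, 1, 10, 50, 100):
    print(f"step {step:3d}: lr = {time_based_lr(initial_lr, decay, step):.5f}")
```

With these values the rate declines gently: at step 100 it is roughly 9% below its starting value. With decay set to zero the function returns the initial rate unchanged.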

Let's look at the time-based learning rate adaptation schedule in Keras. A small neural network model is constructed with a single hidden layer of 50 neurons using the rectifier activation function. The output layer has three neurons and uses the sigmoid activation function in order to output probability-like values. The learning rate for stochastic gradient descent is set to a relatively high value of 0.1. The model is trained for 100 epochs and the decay argument is set to 0.001, calculated as 0.1/100. Additionally, it can be a good idea to use momentum when using an adaptive learning rate. In this case we use a momentum value of 0.9.
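
To make the interaction between momentum and a decaying learning rate concrete, here is a minimal sketch on a one-dimensional toy problem, f(w) = w^2. It is not the Keras implementation: the function name `sgd_momentum_with_decay`, the toy objective, and the starting point are assumptions chosen for illustration. It uses the classical momentum update (velocity = momentum * velocity - lr * gradient) together with the time-based decay equation.

```python
def sgd_momentum_with_decay(grad_fn, w0, initial_lr=0.1, decay=0.001,
                            momentum=0.9, steps=100):
    """Minimise a 1-D function with momentum SGD and time-based decay (toy sketch)."""
    w, velocity = w0, 0.0
    for step in range(steps):
        # Time-based decay: the rate shrinks as the step counter grows.
        lr = initial_lr / (1.0 + decay * step)
        # Classical momentum: accumulate a velocity, then move along it.
        velocity = momentum * velocity - lr * grad_fn(w)
        w = w + velocity
    return w

# Gradient of the toy objective f(w) = w^2 is 2w; its minimum is at w = 0.
final_w = sgd_momentum_with_decay(lambda w: 2.0 * w, w0=5.0)
print(final_w)
```

Starting from w = 5, the iterate ends up close to the minimum at zero. The velocity term carries information from earlier gradients forward, which helps keep progress steady as the learning rate shrinks.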

In [None]:
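# Load libraries. This notebook targets the legacy Keras 2.x API
# (keras.wrappers.scikit_learn, keras.utils.np_utils, SGD(lr=..., decay=...)).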
import numpy
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from keras.utils import np_utils
from keras.optimizers import SGD
import math
from keras.callbacks import LearningRateScheduler
%matplotlib inline

In [None]:
#load the data and perform one-hot encoding
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
dataframe = read_csv("Book1.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:15].astype(float)
Y = dataset[:,15]

#One hot encoding or creating dummy variables from a categorical variable (class)
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
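
For readers unfamiliar with the two encoding steps, the following pure-Python sketch mimics what they produce. The helper names `label_encode` and `one_hot` and the class labels are hypothetical placeholders, not values from the dataset; the sketch assumes classes are numbered in sorted order, as `LabelEncoder` does.

```python
def label_encode(labels):
    """Map each distinct label to an integer, numbering classes in sorted order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes

def one_hot(encoded, num_classes):
    """Turn integer class ids into rows with a single 1.0 per row."""
    return [[1.0 if i == k else 0.0 for i in range(num_classes)] for k in encoded]

labels = ["class_a", "class_b", "class_a", "class_c"]  # hypothetical labels
encoded, classes = label_encode(labels)
dummy = one_hot(encoded, len(classes))
print(encoded)
print(dummy)
```

Each row of `dummy` has one column per class, which matches the three-neuron output layer used by the model.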

In [None]:
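# Let's scale the data: min-max scaling followed by standardization.
# Note: the pipeline below applies the same two scalers again inside each CV fold.
scaler = MinMaxScaler().fit(X)
X = scaler.transform(X)
scaler = StandardScaler().fit(X)
X = scaler.transform(X)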

In [None]:
# Let's create a simple baseline model
def baseline_model():
    model = Sequential()
    model.add(Dense(50, input_dim=15, kernel_initializer='normal', activation='relu'))
    model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
    
    # Compile model
    epochs = 100
    learning_rate = 0.1
    decay_rate = learning_rate / epochs
    momentum = 0.9
    sgd = SGD(lr=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

estimators = []
estimators.append(('MinMaxScale', MinMaxScaler()))
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=baseline_model, epochs=100,
                                          batch_size=5, verbose=1)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Drop-based learning rate schedule:

Another popular learning rate schedule used with deep learning models is to systematically drop the learning rate at specific times during training. Often this method is implemented by dropping the learning rate by half every fixed number of epochs.

For example, we may have an initial learning rate of 0.1 and drop it by a factor of 0.5 every 10 epochs. The first 10 epochs of training would use a value of 0.1, the next 10 epochs would use a learning rate of 0.05, and so on. We can implement this in Keras using the LearningRateScheduler callback when fitting the model. The LearningRateScheduler callback allows us to define a function that takes the epoch number as an argument and returns the learning rate to use in stochastic gradient descent. When the callback is used, the learning rate configured on the SGD optimizer is ignored.

A new step_decay() function is defined that implements the equation:

LearningRate = InitialLearningRate * DropRate^floor((1 + Epoch) / EpochDrop)

Where InitialLearningRate is the learning rate at the beginning of the run, EpochDrop is how often the learning rate is dropped (in epochs), and DropRate is the factor by which the learning rate is multiplied each time it is dropped.

In [None]:
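# learning rate schedule: multiply the rate by `drop` every `epochs_drop` epochs
def step_decay(epoch):
    initial_lrate = 0.2   # starting learning rate
    drop = 0.5            # multiplicative factor applied at each drop
    epochs_drop = 10.0    # number of epochs between drops
    # epoch is zero-based, so the first drop takes effect at epoch index 9
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate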
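
To see what such a schedule returns, the sketch below wraps the same formula in a small factory function and prints the rate at a few epoch indices. The name `make_step_decay` is hypothetical, and the defaults follow the 0.1 / 0.5 / 10 example from the text rather than the 0.2 used in the cell above.

```python
import math

def make_step_decay(initial_lr=0.1, drop=0.5, epochs_drop=10):
    """Return a schedule function mapping a zero-based epoch index to a learning rate."""
    def schedule(epoch):
        return initial_lr * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return schedule

schedule = make_step_decay()

# Show the rate just before and just after each drop.
for epoch in (0, 8, 9, 18, 19, 29):
    print(f"epoch {epoch:2d}: lr = {schedule(epoch):.5f}")
```

Because of the `1 + epoch` term, the first drop takes effect at epoch index 9, i.e. the tenth epoch of training. The returned function takes a single epoch argument, like `step_decay`, so it could be passed to `LearningRateScheduler` in the same way.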

In [None]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
dataframe = read_csv("Book1.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:15].astype(float)
Y = dataset[:,15]

#One hot encoding or creating dummy variables from a categorical variable (class)
# encode class values as integers

encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

In [None]:
# create new baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(50, input_dim=15, kernel_initializer='normal', activation='relu'))
    model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
    # lr is set to 0.0 because the LearningRateScheduler callback supplies the rate each epoch
    sgd = SGD(lr=0.0, momentum=0.9, decay=0, nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model
    
# learning schedule callback
lrate = LearningRateScheduler(step_decay)
callbacks_list = [lrate]
estimators = []
estimators.append(('MinMaxScale', MinMaxScaler()))
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=baseline_model, epochs=100,
                                          batch_size=5, callbacks=callbacks_list, verbose=1)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))