A learning rate schedule, often called an adaptive learning rate or an annealed learning rate, is a technique in which the learning rate used by stochastic gradient descent changes while your model trains.

Keras has a time-based learning rate schedule built into the implementation of the stochastic gradient descent algorithm in the SGD class.

When constructing the class, you can specify the decay, which controls how quickly your learning rate (also specified) decreases during training. Keras applies the decay on every update (that is, every batch), not once per epoch. When using learning rate decay, you should raise your initial learning rate and consider adding a large momentum value such as 0.8 or 0.9.

Your goal in this lesson is to experiment with the time-based learning rate schedule built into Keras.

For example, you can specify a learning rate schedule that starts at 0.1 and uses a decay of 0.0001 as follows:

from keras.optimizers import SGD

...

sgd = SGD(lr=0.1, momentum=0.9, decay=0.0001, nesterov=False)

model.compile(..., optimizer=sgd)
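To get a feel for what a decay of 0.0001 does, the schedule can be sketched in plain Python. This is a minimal sketch that assumes the time-based rule lr = initial_lr / (1 + decay * iteration), where the iteration counter advances once per update; the helper name `time_based_lr` is hypothetical and not part of Keras.

```python
def time_based_lr(initial_lr, decay, iteration):
    """Learning rate after `iteration` updates under time-based decay.

    Assumed rule: initial_lr / (1 + decay * iteration).
    """
    return initial_lr / (1.0 + decay * iteration)


# Values matching the SGD(lr=0.1, decay=0.0001) example above.
initial_lr = 0.1
decay = 0.0001

# Print the learning rate at a few points during training.
for iteration in (0, 1, 100, 10000):
    print(iteration, time_based_lr(initial_lr, decay, iteration))
```

Under this rule the rate falls slowly at first and reaches roughly half of its starting value after 10,000 updates, which is why a larger initial learning rate is usually paired with decay.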

Reference: https://keras.io/optimizers/#sgd

keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)

In [None]:
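from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Activation

# Small example network: one dense layer followed by a softmax activation
model = Sequential()
model.add(Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(Activation('softmax'))

# SGD with a small per-update decay and Nesterov momentum
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)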

In [None]:
# pass optimizer by name: default parameters will be used
model.compile(loss='mean_squared_error', optimizer='sgd')

In [None]:
# SGD constructor shown with its default arguments
optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)

<b>Arguments</b>

lr: float >= 0. Learning rate.

momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations.

decay: float >= 0. Learning rate decay over each update.

nesterov: boolean. Whether to apply Nesterov momentum.
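How the four arguments interact can be illustrated with a small pure-Python sketch that minimizes f(w) = w**2. It is an illustration under stated assumptions, not the Keras implementation: the function names `sgd_step` and `minimize` are hypothetical, the velocity form of momentum is assumed, and the decay is assumed to follow lr / (1 + decay * iteration).

```python
def sgd_step(param, velocity, grad, lr, momentum=0.0, nesterov=False):
    """One SGD update for a single scalar parameter.

    Returns the new (param, velocity) pair.
    """
    # Velocity accumulates past gradients, scaled by the momentum factor.
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # Nesterov variant: step along the updated velocity plus the gradient.
        param = param + momentum * velocity - lr * grad
    else:
        param = param + velocity
    return param, velocity


def minimize(lr=0.1, momentum=0.9, decay=0.0, nesterov=False, steps=200):
    """Minimize f(w) = w**2 starting from w = 5.0 and return the final w."""
    w, v = 5.0, 0.0
    for iteration in range(steps):
        grad = 2.0 * w  # derivative of w**2
        # Assumed time-based decay, applied on every update.
        current_lr = lr / (1.0 + decay * iteration)
        w, v = sgd_step(w, v, grad, current_lr, momentum, nesterov)
    return w


print(minimize(nesterov=False))
print(minimize(nesterov=True))
```

Both calls should end close to the minimum at w = 0. Changing `momentum`, `decay`, or `nesterov` in `minimize` is a quick way to see how each argument affects convergence before trying the same settings on a real model.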