A smaller learning rate takes more careful steps, which can help the optimizer avoid poor local optima; however, training then usually takes longer to converge. A warm-up period can shorten the training time: for the first few epochs we use a larger learning rate, and after a certain number of epochs we decrease it.
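Below is a minimal sketch of a schedule of this kind. It assumes a simple "step decay" rule: start at a large rate and multiply it by a fixed factor every few epochs. The function `step_decay` and its defaults (start at 0.1, multiply by 0.1 every 10 epochs) are illustrative assumptions, not values prescribed by this notebook:

```python
def step_decay(epoch, initial_lr=0.1, drop=0.1, epochs_per_drop=10):
    """Learning rate for a zero-based epoch index under step decay (illustrative).

    The rate starts at initial_lr and is multiplied by `drop` once every
    `epochs_per_drop` epochs, so it stays constant in between.
    """
    return initial_lr * drop ** (epoch // epochs_per_drop)


# Print the rate at a few epochs around the drop points.
for epoch in (0, 9, 10, 19, 20):
    print(epoch, step_decay(epoch))
```

The rate stays at its large starting value for the first block of epochs and only then drops, which is the "large first, smaller later" pattern described above.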
It is even possible to decrease the learning rate after every step, but this is not recommended: you might be better off using a different optimizer instead (for example, if you want to use decay, you can specify it as a hyperparameter of the optimizer). In theory, if the learning rate is too large during the warm-up period, the optimizer may overshoot and never reach the global optimum at all.
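For comparison, decay after every step can be written as time-based decay, lr_t = lr_0 / (1 + decay · t), where t counts optimizer updates. As far as we know, this is the rule older Keras optimizers applied through their `decay` argument, but treat that as an assumption to check against your Keras version. The function name and the settings below are hypothetical:

```python
def time_based_decay(initial_lr, decay, iteration):
    """Learning rate after `iteration` updates: initial_lr / (1 + decay * iteration).

    Unlike step decay, the rate shrinks a little after every single update.
    """
    return initial_lr / (1.0 + decay * iteration)


# Hypothetical settings: start at 0.1 and use decay = 1e-3.
for step in (0, 100, 1000, 10000):
    print(step, round(time_based_decay(0.1, 1e-3, step), 5))
```

With these settings the rate has halved after 1,000 updates, without any explicit per-epoch bookkeeping.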

In [None]:
import math

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator

from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint, LearningRateScheduler, Callback
from keras import backend as K

In [None]:
n_classes = 5
batch_size = 128  # must be defined before the data generators below use it

In [None]:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True,
                                   vertical_flip=False,
                                   validation_split=0.25)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory('data',
                                                 target_size=(150, 150),
                                                 batch_size=batch_size,
                                                 class_mode='categorical',
                                                 subset='training')

validation_set = train_datagen.flow_from_directory('data',
                                                   target_size=(150, 150),
                                                   batch_size=batch_size,
                                                   class_mode='categorical',
                                                   subset='validation')

Besides writing a custom callback, you can use two built-in Keras callbacks: `LearningRateScheduler` and `ReduceLROnPlateau`.

With these callbacks, you can implement an epoch-dependent learning rate scheme, or reduce the learning rate when a monitored loss or metric stops improving (reaches a plateau).
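Here is a plain-Python sketch of the logic behind both callbacks. `epoch_schedule` follows the `(epoch, lr) -> new_lr` form that `LearningRateScheduler` accepts and reuses the epoch-to-rate values of the custom callback in this notebook. `reduce_on_plateau` is a simplified, hypothetical re-implementation of the plateau rule; it omits options the real `ReduceLROnPlateau` has, such as `min_delta` and `cooldown`:

```python
def epoch_schedule(epoch, lr):
    """(epoch, lr) -> new_lr: switch rates at epochs 0, 10 and 25, else keep lr."""
    boundaries = {0: 0.1, 10: 0.01, 25: 0.0025}
    return boundaries.get(epoch, lr)


def reduce_on_plateau(val_losses, initial_lr=0.1, factor=0.1, patience=2, min_lr=1e-4):
    """Simplified plateau rule (hypothetical helper).

    After each epoch, if the monitored loss has not improved for `patience`
    consecutive epochs, multiply the rate by `factor` (never below min_lr).
    Returns the learning rate that was used during each epoch.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    used = []
    for loss in val_losses:
        used.append(lr)              # rate in effect during this epoch
        if loss < best:              # improvement: remember it, reset the counter
            best = loss
            wait = 0
        else:                        # no improvement this epoch
            wait += 1
            if wait >= patience:     # plateau detected: lower the rate for the next epoch
                lr = max(lr * factor, min_lr)
                wait = 0
    return used


# Made-up validation losses that stall after epoch 1.
print(reduce_on_plateau([1.0, 0.8, 0.8, 0.81, 0.7, 0.7, 0.7]))
```

In the notebook itself you would pass the real callbacks to `model.fit` instead, for example `LearningRateScheduler(epoch_schedule)` or, after importing it from `keras.callbacks`, `ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=2, min_lr=1e-4)`.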

In [None]:
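# Learning rate to switch to at the start of the given (zero-based) epoch.
# The values must be numbers, not strings, so they can be written to the optimizer.
# Note: with n_epochs = 20, the entry for epoch 25 is never reached.
learning_rate_schedule = {0: 0.1, 10: 0.01, 25: 0.0025}

class get_learning_rate(Callback):
    def on_epoch_begin(self, epoch, logs=None):
        # Set the new rate before the epoch starts, so it applies to this epoch.
        optimizer = self.model.optimizer
        if epoch in learning_rate_schedule:
            K.set_value(optimizer.learning_rate, learning_rate_schedule[epoch])
        lr = K.eval(optimizer.learning_rate)
        print('\nEpoch {}: lr = {:.4f}'.format(epoch + 1, lr))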

In [None]:
# 'val_accuracy' matches metrics=['accuracy'] in current Keras (older versions logged 'val_acc').
callbacks = [EarlyStopping(monitor='val_accuracy', patience=5, verbose=2),
             ModelCheckpoint('checkpoints/{epoch:02d}.h5',
                             save_best_only=True),
             TensorBoard('~/notebooks/logs-lrscheduler',
                         write_graph=True, write_grads=True,
                         write_images=True, embeddings_freq=0,
                         embeddings_layer_names=None,
                         embeddings_metadata=None),
             get_learning_rate()
             ]

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape = (150, 150,3)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
model.summary()

In [None]:
# Plain SGD; the get_learning_rate callback overwrites its default learning rate during training.
optimizer = SGD()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

In [None]:
n_epochs = 20
batch_size = 128

# The generators already yield batches of batch_size images,
# so batch_size must not be passed to fit() again.
history = model.fit(training_set, epochs=n_epochs,
                    validation_data=validation_set,
                    verbose=1, callbacks=callbacks)