Adapting the learning rate during stochastic gradient descent can improve model performance and reduce training
time. A large learning rate is often used at the beginning of training and reduced toward the end. Two popular ways
to do this are:
   >Decreasing the learning rate gradually based on the epoch
   >Decreasing the learning rate using punctuated large drops at specific epochs
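
The gradual approach is usually written as a time-based decay, lr = lr0 / (1 + decay * step). The sketch below is plain Python and only illustrates that formula; the function name `time_based_lr` is hypothetical. In the legacy `SGD(decay=...)` argument the counter is understood to advance once per batch update rather than once per epoch, so `step` is treated here as an update count (an assumption worth checking against your Keras version).

```python
def time_based_lr(initial_lr, decay, step):
    """Learning rate after `step` updates under time-based decay."""
    return initial_lr / (1.0 + decay * step)


initial_lr = 0.1
decay = initial_lr / 50  # initial rate divided by the number of epochs

# The rate falls quickly at first and then flattens out.
for step in (0, 100, 450):
    print(step, round(time_based_lr(initial_lr, decay, step), 4))
```

With a small `decay` the rate changes slowly, so the early updates still use a value close to the initial rate.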

In [1]:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from sklearn.preprocessing import LabelEncoder

Using TensorFlow backend.


In [2]:
seed=7
np.random.seed(seed)
dataframe=pd.read_csv('/home/tri/Downloads/MLdatasets/ionosphere.data',header=None)
dataset=dataframe.values

In [3]:
X=dataset[:,0:34].astype(float)
y=dataset[:,34]
# transform the categorical class labels into integers
encoder = LabelEncoder()
encoder.fit(y)
y=encoder.transform(y)

In [4]:
def create_model():
    model =Sequential()
    model.add(Dense(34,input_dim=34, kernel_initializer='normal',activation='relu'))
    model.add(Dense(1,kernel_initializer='normal',activation='sigmoid'))
    # compile model with time-based decay: the learning rate shrinks as training proceeds
    epochs = 50
    learning_rate = 0.1
    decay_rate = learning_rate / epochs
    momentum = 0.8
    sgd = SGD(lr=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
    model.compile(loss='binary_crossentropy',optimizer=sgd,metrics=['accuracy'])
    model.fit(X,y,validation_split=0.33,epochs=epochs, batch_size=28, verbose=2)
    return model
create_model()

Train on 235 samples, validate on 116 samples
Epoch 1/50
2s - loss: 0.6812 - acc: 0.6468 - val_loss: 0.6353 - val_acc: 0.8793
Epoch 2/50
0s - loss: 0.6349 - acc: 0.7319 - val_loss: 0.5214 - val_acc: 0.8362
Epoch 3/50
0s - loss: 0.5534 - acc: 0.8213 - val_loss: 0.4690 - val_acc: 0.8793
Epoch 4/50
0s - loss: 0.4609 - acc: 0.8468 - val_loss: 0.4394 - val_acc: 0.9310
Epoch 5/50
0s - loss: 0.3788 - acc: 0.8596 - val_loss: 0.2744 - val_acc: 0.9483
Epoch 6/50
0s - loss: 0.3110 - acc: 0.8894 - val_loss: 0.3868 - val_acc: 0.8879
Epoch 7/50
0s - loss: 0.2725 - acc: 0.9106 - val_loss: 0.2212 - val_acc: 0.9483
Epoch 8/50
0s - loss: 0.2383 - acc: 0.9106 - val_loss: 0.1424 - val_acc: 0.9569
Epoch 9/50
0s - loss: 0.2420 - acc: 0.9106 - val_loss: 0.2207 - val_acc: 0.9397
Epoch 10/50
0s - loss: 0.2004 - acc: 0.9234 - val_loss: 0.2512 - val_acc: 0.9224
Epoch 11/50
0s - loss: 0.1903 - acc: 0.9277 - val_loss: 0.1829 - val_acc: 0.9569
Epoch 12/50
0s - loss: 0.1709 - acc: 0.9447 - val_loss: 0.1107 - val_acc

<keras.models.Sequential at 0x7f8d5f2c0b00>

## Drop-based learning rate schedule
In this method, the learning rate is dropped at specific points during training. For example, the learning rate can be halved every fixed number of epochs. Keras provides the LearningRateScheduler callback for this; it is passed to fit() when training the model.

In [5]:
import math
from keras.callbacks import LearningRateScheduler

In [13]:
# Learning rate schedule
def step_decay(epoch):
    initial_lrate = 0.1
    drop = 0.5
    epochs_drop =10.0
    lrate = initial_lrate * math.pow(drop,math.floor((1+epoch)/epochs_drop))
    return lrate
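
The `step_decay` function above hard-codes its constants. A parameterised variant makes it easier to see when the drops actually happen. This is a minimal sketch in plain Python; `make_step_decay` is a hypothetical helper name, not part of Keras, and it uses the same formula and defaults as `step_decay`.

```python
import math


def make_step_decay(initial_lrate=0.1, drop=0.5, epochs_drop=10):
    """Return a schedule function mapping a 0-based epoch index to a learning rate."""
    def schedule(epoch):
        exponent = math.floor((1 + epoch) / epochs_drop)
        return initial_lrate * math.pow(drop, exponent)
    return schedule


schedule = make_step_decay()

# Print only the epoch indices at which the rate changes.
previous = None
for epoch in range(50):
    lrate = schedule(epoch)
    if lrate != previous:
        print("epoch index", epoch, "-> lr", lrate)
        previous = lrate
```

Because of the `1 + epoch` term, the rate halves at epoch indices 9, 19, 29, 39 and 49, that is, on the 10th, 20th, 30th, 40th and 50th epochs shown in the training log. The returned `schedule` takes a single epoch argument, which is the shape that `LearningRateScheduler` is given in this notebook.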

In [14]:
seed=7
np.random.seed(seed)
def create_model():
    model=Sequential()
    model.add(Dense(34,input_dim=34,kernel_initializer='normal',activation='relu'))
    model.add(Dense(1,kernel_initializer='normal',activation='sigmoid'))
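    # lr=0.0 is only a placeholder: the LearningRateScheduler callback sets the rate at the start of each epoch
    sgd = SGD(lr=0.0, momentum=0.9, decay=0.0, nesterov=False)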
    model.compile(loss='binary_crossentropy',optimizer=sgd, metrics=['accuracy'])
    # learning schedule callback
    lrate = LearningRateScheduler(step_decay)
    callbacks_list = [lrate]
    model.fit(X,y,validation_split=0.33,epochs=50,batch_size=28,callbacks=callbacks_list,verbose=2)
    return model
create_model()

Train on 235 samples, validate on 116 samples
Epoch 1/50
0s - loss: 0.6801 - acc: 0.6426 - val_loss: 0.6155 - val_acc: 0.9138
Epoch 2/50
0s - loss: 0.6167 - acc: 0.7277 - val_loss: 0.4700 - val_acc: 0.9052
Epoch 3/50
0s - loss: 0.4906 - acc: 0.8255 - val_loss: 0.3475 - val_acc: 0.9483
Epoch 4/50
0s - loss: 0.3570 - acc: 0.8681 - val_loss: 0.3542 - val_acc: 0.8966
Epoch 5/50
0s - loss: 0.2710 - acc: 0.8936 - val_loss: 0.1468 - val_acc: 0.9655
Epoch 6/50
0s - loss: 0.2139 - acc: 0.9234 - val_loss: 0.2216 - val_acc: 0.9397
Epoch 7/50
0s - loss: 0.1775 - acc: 0.9362 - val_loss: 0.1500 - val_acc: 0.9741
Epoch 8/50
0s - loss: 0.1564 - acc: 0.9404 - val_loss: 0.0860 - val_acc: 0.9741
Epoch 9/50
0s - loss: 0.1700 - acc: 0.9319 - val_loss: 0.1590 - val_acc: 0.9569
Epoch 10/50
0s - loss: 0.1239 - acc: 0.9617 - val_loss: 0.0918 - val_acc: 0.9914
Epoch 11/50
0s - loss: 0.1112 - acc: 0.9660 - val_loss: 0.1070 - val_acc: 0.9914
Epoch 12/50
0s - loss: 0.1036 - acc: 0.9617 - val_loss: 0.0771 - val_acc

<keras.models.Sequential at 0x7f8d5db88e48>

## Guidelines for using learning rate schedules
* Increase the initial learning rate: a high initial learning rate produces large weight changes early in training, which then benefit from fine-tuning as the rate decays.
* Use a large momentum: this helps the optimization algorithm keep making updates in the right direction when the learning rate shrinks to small values.
* Experiment with different schedules: try schedules that decay exponentially, and even schedules that respond to the accuracy of your model.
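
As one way to try the exponential schedule mentioned in the last guideline, the sketch below computes lr = lr0 * exp(-k * epoch). The helper name `make_exp_decay` and the decay constant `k=0.1` are illustrative assumptions, not values taken from this notebook.

```python
import math


def make_exp_decay(initial_lrate=0.1, k=0.1):
    """Return a schedule function: 0-based epoch index -> exponentially decayed rate."""
    def schedule(epoch):
        return initial_lrate * math.exp(-k * epoch)
    return schedule


schedule = make_exp_decay()

# Inspect the rate at a few epochs before using the schedule for training.
for epoch in (0, 10, 25, 49):
    print(epoch, round(schedule(epoch), 5))
```

The returned function has the same epoch-to-rate shape as `step_decay`, so it should be usable with `LearningRateScheduler` in the same way. For schedules that react to a monitored metric instead of the epoch number, Keras also ships a `ReduceLROnPlateau` callback.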