## Learning Rate Schedule For Training Models
Adapting the learning rate during stochastic gradient descent can improve model performance and reduce training time. Two simple and commonly used learning rate schedules are:

1. Decrease the learning rate gradually based on the epoch.
2. Decrease the learning rate using punctuated large drops at specific epochs.

### Time Based Learning Rate Schedule
Keras has a time-based learning rate schedule built in. The SGD class, which implements stochastic gradient descent, takes an argument called decay. When decay is zero (the default), the learning rate stays constant. When it is non-zero, the learning rate is reduced as training progresses according to the time-based decay equation:
![Screen%20Shot%202018-06-17%20at%202.48.25%20PM.png](attachment:Screen%20Shot%202018-06-17%20at%202.48.25%20PM.png)
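
To make the decay equation concrete, here is a minimal sketch in plain Python. It assumes the form used by the legacy Keras SGD optimizer, where the rate is recomputed as `initial_lr / (1 + decay * iterations)` and `iterations` counts batch updates rather than epochs. The function name `time_based_lr` and the `updates_per_epoch` value are illustrative and not part of Keras.

```python
import math


def time_based_lr(initial_lr, decay, iteration):
    """Learning rate after `iteration` updates under time-based decay."""
    return initial_lr / (1.0 + decay * iteration)


initial_lr = 0.1
epochs = 50
decay = initial_lr / epochs  # 0.1 / 50 = 0.002, the rule of thumb used in this section

# Assumption: 235 training samples and a batch size of 28, as in the example run
updates_per_epoch = math.ceil(235 / 28)

for epoch in (0, 1, 10, 49):
    # Number of updates completed before this epoch starts
    iteration = epoch * updates_per_epoch
    lr = time_based_lr(initial_lr, decay, iteration)
    print(f"epoch {epoch + 1:2d}: lr = {lr:.5f}")
```

Because the decay is applied per update under this assumption, the rate falls faster than an epoch-based reading of the equation would suggest when each epoch contains several batches.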

In [2]:
# time based learning rate decay
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from sklearn.preprocessing import LabelEncoder

Using TensorFlow backend.


In [18]:
np.random.seed(7)

#load dataset
dataframe = read_csv("Accessory_files/ionosphere.csv", header = None)
dataset = dataframe.values

#split the inputs
X = dataset[:,0:34].astype(float)
Y = dataset[:,34]

#encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

#create model
model = Sequential()
model.add(Dense(34, input_dim = 34, kernel_initializer='normal', activation='relu'))
model.add(Dense(1,kernel_initializer='normal',activation='sigmoid'))

#compile
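epochs = 50
learning_rate = 0.1
# decay = initial learning rate / number of epochs = 0.1 / 50 = 0.002
decay_rate = learning_rate/epochs
momentum=0.9
# a non-zero decay enables the built-in time-based schedule
sgd = SGD(lr = learning_rate, momentum=momentum, decay = decay_rate, nesterov=False)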
model.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics=['accuracy'])

model.fit(X,encoded_Y, validation_split=0.33, epochs=epochs, batch_size=28, verbose=2)

Train on 235 samples, validate on 116 samples
Epoch 1/50
 - 0s - loss: 0.6803 - acc: 0.6468 - val_loss: 0.6203 - val_acc: 0.9138
Epoch 2/50
 - 0s - loss: 0.6201 - acc: 0.7234 - val_loss: 0.4763 - val_acc: 0.8707
Epoch 3/50
 - 0s - loss: 0.5015 - acc: 0.8298 - val_loss: 0.3793 - val_acc: 0.9397
Epoch 4/50
 - 0s - loss: 0.3668 - acc: 0.8553 - val_loss: 0.3906 - val_acc: 0.8621
Epoch 5/50
 - 0s - loss: 0.2855 - acc: 0.8681 - val_loss: 0.1530 - val_acc: 0.9655
Epoch 6/50
 - 0s - loss: 0.2199 - acc: 0.9277 - val_loss: 0.2410 - val_acc: 0.9310
Epoch 7/50
 - 0s - loss: 0.1848 - acc: 0.9362 - val_loss: 0.1411 - val_acc: 0.9655
Epoch 8/50
 - 0s - loss: 0.1685 - acc: 0.9277 - val_loss: 0.0920 - val_acc: 0.9741
Epoch 9/50
 - 0s - loss: 0.1763 - acc: 0.9362 - val_loss: 0.2079 - val_acc: 0.9397
Epoch 10/50
 - 0s - loss: 0.1451 - acc: 0.9404 - val_loss: 0.1241 - val_acc: 0.9828
Epoch 11/50
 - 0s - loss: 0.1261 - acc: 0.9532 - val_loss: 0.0829 - val_acc: 0.9914
Epoch 12/50
 - 0s - loss: 0.1182 - acc:

<keras.callbacks.History at 0x1a20053a90>

### Drop Based Learning Rate Schedule
Another popular learning rate schedule for deep learning models systematically drops the learning rate at specific times during training. This is often implemented by halving the learning rate every fixed number of epochs. For example, with an initial learning rate of 0.1 and a drop factor of 0.5 every 10 epochs, the first 10 epochs of training use a learning rate of 0.1, the next 10 epochs use 0.05, and so on. Plotting the learning rates for this example out to 100 epochs gives the graph below, with learning rate on the y-axis and epoch on the x-axis.
![Screen%20Shot%202018-06-17%20at%203.05.07%20PM.png](attachment:Screen%20Shot%202018-06-17%20at%203.05.07%20PM.png)
![Screen%20Shot%202018-06-17%20at%203.03.30%20PM.png](attachment:Screen%20Shot%202018-06-17%20at%203.03.30%20PM.png)
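
The schedule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the Keras callback itself: the name `step_schedule` is hypothetical, and `epoch` is assumed to be a 0-based index, so epoch indices 0 to 9 use the initial rate.

```python
import math


def step_schedule(epoch, initial_lr=0.1, drop=0.5, epochs_drop=10):
    """Learning rate for a 0-based epoch index under a drop-based schedule."""
    # Number of drops that have been applied by this epoch
    num_drops = math.floor(epoch / epochs_drop)
    return initial_lr * math.pow(drop, num_drops)


for epoch in (0, 9, 10, 19, 20, 99):
    print(f"epoch index {epoch:2d}: lr = {step_schedule(epoch):.6f}")
```

Note that the `step_decay` function in this section uses `(1+epoch)` in the exponent, so with a 0-based epoch index its first drop lands one epoch earlier than in this sketch.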

In [19]:
from keras.callbacks import LearningRateScheduler
import math

In [20]:
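# learning rate schedule
def step_decay(epoch):
    # Keras passes a 0-based epoch index to the schedule function
    initial_lrate = 0.1
    drop = 0.5  # multiply the rate by this factor at each drop
    epochs_drop = 10.0  # number of epochs between drops
    # (1+epoch) makes each drop take effect on the 10th, 20th, ... epoch
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate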

In [22]:
np.random.seed(7)

#load dataset
dataframe = read_csv("Accessory_files/ionosphere.csv", header = None)
dataset = dataframe.values

#split the inputs
X = dataset[:,0:34].astype(float)
Y = dataset[:,34]

#encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

#create model
model = Sequential()
model.add(Dense(34, input_dim = 34, kernel_initializer='normal', activation='relu'))
model.add(Dense(1,kernel_initializer='normal',activation='sigmoid'))

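#compile
# lr is 0.0 here because the LearningRateScheduler callback sets the rate at the start of each epoch;
# decay is 0.0 so the built-in time-based schedule is disabled
sgd = SGD(lr = 0.0, momentum=0.9, decay = 0.0, nesterov=False)
model.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics=['accuracy'])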

#learning schedule callback
lrate = LearningRateScheduler(step_decay)
callbacks_list=[lrate]

#Fit the model
model.fit(X,encoded_Y, validation_split=0.33, epochs=50, batch_size=28, callbacks=callbacks_list, verbose=2)

Train on 235 samples, validate on 116 samples
Epoch 1/50
 - 0s - loss: 0.6803 - acc: 0.6468 - val_loss: 0.6199 - val_acc: 0.9138
Epoch 2/50
 - 0s - loss: 0.6195 - acc: 0.7234 - val_loss: 0.4761 - val_acc: 0.8621
Epoch 3/50
 - 0s - loss: 0.4985 - acc: 0.8255 - val_loss: 0.3656 - val_acc: 0.9483
Epoch 4/50
 - 0s - loss: 0.3623 - acc: 0.8596 - val_loss: 0.3781 - val_acc: 0.8879
Epoch 5/50
 - 0s - loss: 0.2791 - acc: 0.8809 - val_loss: 0.1528 - val_acc: 0.9655
Epoch 6/50
 - 0s - loss: 0.2181 - acc: 0.9234 - val_loss: 0.2144 - val_acc: 0.9397
Epoch 7/50
 - 0s - loss: 0.1797 - acc: 0.9362 - val_loss: 0.1483 - val_acc: 0.9655
Epoch 8/50
 - 0s - loss: 0.1586 - acc: 0.9404 - val_loss: 0.0848 - val_acc: 0.9741
Epoch 9/50
 - 0s - loss: 0.1828 - acc: 0.9362 - val_loss: 0.1650 - val_acc: 0.9569
Epoch 10/50
 - 0s - loss: 0.1267 - acc: 0.9574 - val_loss: 0.0947 - val_acc: 0.9914
Epoch 11/50
 - 0s - loss: 0.1146 - acc: 0.9574 - val_loss: 0.1018 - val_acc: 0.9914
Epoch 12/50
 - 0s - loss: 0.1055 - acc:

<keras.callbacks.History at 0x1a20247f98>

## Tips for using learning rate schedules
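1. Increase the initial learning rate. Because the rate decreases during training, start from a larger value than you would use with a fixed rate.
2. Use a large momentum. A larger momentum value helps the optimizer keep making updates in a useful direction as the learning rate shrinks.
3. Experiment with different schedules. It is rarely clear in advance which schedule suits a given problem, so try several and compare the results.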
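
As one way to act on the third tip, note that a schedule is just a function from epoch index to learning rate, so alternatives are easy to try. The sketch below shows an exponential decay schedule. The name `exp_decay` and the constant `k = 0.1` are illustrative choices, not values from this document.

```python
import math


def exp_decay(epoch, initial_lr=0.1, k=0.1):
    """Exponential decay: initial_lr * exp(-k * epoch), with a 0-based epoch index."""
    return initial_lr * math.exp(-k * epoch)


for epoch in (0, 10, 25, 49):
    print(f"epoch index {epoch:2d}: lr = {exp_decay(epoch):.6f}")
```

A function of this shape (one epoch-index argument, returning a float) should be usable with LearningRateScheduler in place of step_decay, assuming a Keras version that accepts single-argument schedule functions as the one in this document does.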