### Learning Rate Schedulers
Two kinds of learning rate schedulers are discussed here: (i) polynomial decay and (ii) the Keras default decay.

Starting with a high learning rate lets the optimizer take big jumps across the loss landscape. Slowly decreasing the learning rate afterwards helps it converge to a local (or global) minimum. Suitable high and minimum learning rates can be found by experimenting with a range of learning rates; this notebook assumes you have already found them.
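
A minimal sketch of what "exploring various learning rates" can look like, assuming a toy quadratic loss f(w) = w² in place of a real model (`final_loss` and `candidate_lrs` are hypothetical names). Learning rates that are too small barely reduce the loss, those that are too large diverge, and a useful high LR sits just below the point of divergence.

```python
import numpy as np

# Hypothetical toy problem (not from this notebook): minimise f(w) = w**2
# with plain gradient descent; the gradient of w**2 is 2*w.
def final_loss(lr, w0=1.0, steps=20):
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w  # gradient descent update
    return w ** 2

# sweep learning rates on a log scale and record the loss after the steps
candidate_lrs = np.geomspace(1e-4, 2.0, num=12)
losses = {float(lr): final_loss(lr) for lr in candidate_lrs}

for lr, loss in losses.items():
    print(f"lr={lr:.2e}  final loss={loss:.3e}")
```

For this toy loss each update multiplies w by (1 - 2·lr), so training converges only for 0 < lr < 1; a real model has no such closed form, which is why the sweep is done empirically.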

The purpose of this Jupyter notebook is to plot the schedulers and check whether they decay down to the minimum LR.
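
Whether a schedule "goes down to the min LR" can also be checked numerically. A sketch with a hypothetical helper `reaches_min_lr` and an assumed linear schedule and target minimum LR (neither is part of the original notebook):

```python
# Hypothetical helper (not part of the original notebook): evaluate a per-epoch
# schedule for `epochs` epochs and check whether the last learning rate is at
# or below an assumed target minimum LR.
def reaches_min_lr(schedule, epochs, min_lr):
    lrs = [schedule(epoch) for epoch in range(epochs)]
    return lrs[-1] <= min_lr, lrs[-1]


# example: an assumed linear schedule starting at 0.01 over 100 epochs
def linear_schedule(epoch, init_lr=0.01, max_epochs=100):
    return init_lr * (1 - epoch / max_epochs)


ok, last_lr = reaches_min_lr(linear_schedule, epochs=100, min_lr=1e-3)
print(ok, last_lr)
```

Any callable that takes an epoch index works here. Note that calling a `PolynomialDecay` or `KerasDecay` instance appends to its `epochs`/`lrs` lists, so a fresh instance avoids duplicated points in its plot.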

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt 
%matplotlib inline

In [None]:
# configuration parameters 
TRAIN_SAMPLES = 2400
BATCH_SIZE = 32
EPOCHS = 100
INIT_LR = 0.01
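
The Keras decay further below is applied per batch update, so it helps to know how many updates this configuration produces. A small sketch (`steps_per_epoch` and `total_steps` are hypothetical names):

```python
import math

# values copied from the configuration cell above
TRAIN_SAMPLES = 2400
BATCH_SIZE = 32
EPOCHS = 100

# number of batch updates (optimizer steps) per epoch; a final partial
# batch would still count as one step, hence the ceiling
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)  # 75 7500
```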

In [None]:
# polynomial decay of the learning rate, which can be used with the SGD optimizer
class PolynomialDecay:
    """ polynomial decay of the learning rate. Note: if power = 1, the decay is linear """
    def __init__(self, maxEpochs=EPOCHS, initLR=INIT_LR, power=1.0):
        self.maxEpochs = maxEpochs 
        self.initLR = initLR 
        self.power = power 
        self.epochs = []
        self.lrs = []
        
    def __call__(self, epoch):
        """ compute the new learning rate based on polynomial decay """
        decay = (1 - (epoch / float(self.maxEpochs))) ** self.power 
        lr = self.initLR * decay
        
        # save the epochs and lrs for plotting 
        self.epochs.append(epoch)
        self.lrs.append(lr)
        
        return float(lr)
    
    def plot(self):        
        plt.style.use("ggplot")
        plt.figure()
        plt.plot(self.epochs, self.lrs)
        plt.title(f"Polynomial LR Scheduler (power = {self.power})")
        plt.xlabel("Epoch #")
        plt.ylabel("Learning Rate")
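
Written out, `PolynomialDecay` computes, for epoch $e$, maximum number of epochs $E$, initial learning rate $\eta_0$ and power $p$:

$$\eta(e) = \eta_0 \left(1 - \frac{e}{E}\right)^{p}$$

For $p = 1$, consecutive epochs differ by the constant amount $\eta(e) - \eta(e+1) = \eta_0 / E$, which is why power 1 gives a linear decay. For $p > 1$ the initial slope is $p$ times steeper and the curve flattens out as $e$ approaches $E$.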

In [None]:
poly_decay = PolynomialDecay(power=2)

for i in range(EPOCHS):
    print(poly_decay(i))
    
poly_decay.plot()
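
Because the loop runs over `range(EPOCHS)`, the last epoch evaluated is `EPOCHS - 1`, so the schedule never reaches exactly zero: its final value is `INIT_LR * (1 / EPOCHS) ** power`. A quick check of that arithmetic, using a hypothetical stand-alone `poly_lr` function that mirrors `PolynomialDecay.__call__` with the notebook's `INIT_LR = 0.01` and `EPOCHS = 100`:

```python
# Closed form of PolynomialDecay as a plain function (hypothetical name),
# so values can be checked without touching the plotting lists.
def poly_lr(epoch, init_lr=0.01, max_epochs=100, power=1.0):
    return init_lr * (1 - epoch / float(max_epochs)) ** power


# range(100) stops at epoch 99, so the last LR is init_lr * (1/100) ** power
for power in (1, 2, 3):
    print(f"power={power}: last-epoch LR = {poly_lr(99, power=power):.1e}")
```

With these values the last-epoch LR is about 1e-4 for power 1 and 1e-6 for power 2, so a higher power reaches a much smaller minimum LR.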

In [None]:
# Keras-style decay of the learning rate, which can also be used with the SGD optimizer
class KerasDecay:
    """ Keras default decay of the learning rate: lr = initLR / (1 + decay * iterations) """
    def __init__(self, initLR=INIT_LR, decay=None):
        self.initLR = initLR
        if decay is None: # then use a default decay value 
            self.decay = self.initLR / BATCH_SIZE
            print(f"using the default decay rate: {self.decay}")
        else:
            self.decay = decay 
        # steps per epoch, i.e. the total number of batches per epoch (a last partial batch still counts)
        self.iterations = int(np.ceil(TRAIN_SAMPLES / BATCH_SIZE))
        self.epochs = []
        self.lrs = []
        
    def __call__(self, epoch):
        """ compute the learning rate at the start of `epoch`, i.e. after epoch * steps-per-epoch batch updates """
        lr = self.initLR * (1. / (1. + self.decay * (epoch * self.iterations)))
        
        # save the epochs and lrs for plotting 
        self.epochs.append(epoch)
        self.lrs.append(lr)
        return float(lr)
    
    def plot(self):        
        plt.style.use("ggplot")
        plt.figure()
        plt.plot(self.epochs, self.lrs)
        plt.title("Keras default LR Scheduler")
        plt.xlabel("Epoch #")
        plt.ylabel("Learning Rate")
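
`KerasDecay` evaluates, at epoch granularity, the rule that the legacy Keras optimizers apply through their `decay` argument: the learning rate after `t` batch updates is `lr0 / (1 + decay * t)`. A minimal sketch (the function `keras_lr` and the constant `STEPS_PER_EPOCH` are hypothetical names, and `decay=1e-4` is simply `INIT_LR / EPOCHS`) showing that sampling the per-batch curve at multiples of the steps per epoch gives the per-epoch values:

```python
# Hypothetical sketch: the per-iteration rule behind the legacy Keras `decay`
# argument, lr_t = lr0 / (1 + decay * t), where t counts batch updates.
def keras_lr(iteration, init_lr=0.01, decay=1e-4):
    return init_lr / (1.0 + decay * iteration)


STEPS_PER_EPOCH = 75  # 2400 samples / batch size 32

# learning rate seen by every batch in the first two epochs
per_batch = [keras_lr(t) for t in range(2 * STEPS_PER_EPOCH)]

# a per-epoch scheduler samples the same curve only at epoch boundaries
per_epoch = [keras_lr(epoch * STEPS_PER_EPOCH) for epoch in range(2)]

print(per_batch[0], per_batch[STEPS_PER_EPOCH], per_epoch)
```

Inside an epoch the optimizer keeps decaying the LR batch by batch, whereas an epoch-level scheduler holds it constant until the next epoch starts; at each epoch boundary the two agree.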

In [None]:
# other settings to try:
# keras_decay = KerasDecay(decay=1e-2)
# keras_decay = KerasDecay()  # uses the default decay, INIT_LR / BATCH_SIZE
keras_decay = KerasDecay(decay=INIT_LR / EPOCHS)

for i in range(EPOCHS):
    print(keras_decay(i))
    
keras_decay.plot()
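
With `decay = INIT_LR / EPOCHS`, the Keras decay falls far more slowly than the polynomial schedule. A short worked check, assuming the notebook's values (`INIT_LR = 0.01`, `EPOCHS = 100`, 75 steps per epoch) and a hypothetical target `MIN_LR = 1e-4`: it computes the learning rate at the start of the last epoch, then solves `INIT_LR / (1 + d * t) = MIN_LR` for the decay `d` that would reach the target.

```python
# Final learning rate of the KerasDecay setting used above
# (INIT_LR = 0.01, decay = INIT_LR / EPOCHS, 75 steps per epoch, 100 epochs).
INIT_LR, EPOCHS, STEPS_PER_EPOCH = 0.01, 100, 75
decay = INIT_LR / EPOCHS

last_t = (EPOCHS - 1) * STEPS_PER_EPOCH  # batch updates before the last epoch
last_lr = INIT_LR / (1.0 + decay * last_t)
print(f"last epoch LR: {last_lr:.6f}")

# Solving INIT_LR / (1 + d * t) = MIN_LR for d gives the decay needed to reach
# a (hypothetical) target minimum LR by the last epoch.
MIN_LR = 1e-4
required_decay = (INIT_LR / MIN_LR - 1.0) / last_t
print(f"decay needed to reach {MIN_LR}: {required_decay:.3e}")
```

With these numbers the last-epoch LR is about 0.0057, so the schedule only reduces the LR by roughly 43%, while the decay needed to reach `1e-4` is 99 / 7425 ≈ 1.3e-2, close to the commented-out `decay=1e-2` option.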