# Foundations of AI & ML
## Session 05
### Experiment 1 - Part 6
# Effect of LR Decay

**Objectives:** In this experiment you will vary the learning rate, observe how the fit oscillates around the optimal value when the rate is too high, and see the effect of decaying the learning rate during training.

**Expected Time:** This experiment should take around 15 minutes.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

## Read the data

In [None]:
data = pd.read_csv("../Datasets/regr01.txt", sep=" ", header=None, names=['l', 't'])
print(data.head())
print(data.tail())

In [None]:
l = data['l'].values
t = data['t'].values
tsq = t * t

# Vanilla/Batch Gradient Descent

In [None]:
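def train(x, y, m, c, eta):
    """Perform one batch gradient-descent step for the line y = m*x + c."""
    const = -2.0 / len(y)
    ycalc = m * x + c                        # predictions with the current line
    delta_m = const * sum(x * (y - ycalc))   # gradient of the MSE w.r.t. m
    delta_c = const * sum(y - ycalc)         # gradient of the MSE w.r.t. c
    m = m - delta_m * eta
    c = c - delta_c * eta
    error = sum((y - ycalc)**2) / len(y)     # MSE measured before this update
    return m, c, error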

def train_on_all(x, y, m, c, eta, iterations=1000):
    for steps in range(iterations):
        m, c, err = train(x, y, m, c, eta)
    return m, c, err
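
For reference, the updates in `train` correspond to the gradient of the mean squared error. The symbols $n$ (number of data points) and $\hat{y}_i$ (the current prediction) are introduced here only for illustration:

$$
E(m, c) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad \hat{y}_i = m x_i + c
$$

$$
\frac{\partial E}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left(y_i - \hat{y}_i\right), \qquad
\frac{\partial E}{\partial c} = -\frac{2}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)
$$

$$
m \leftarrow m - \eta \, \frac{\partial E}{\partial m}, \qquad c \leftarrow c - \eta \, \frac{\partial E}{\partial c}
$$

In the code, `const` is the factor $-2/n$, and `delta_m` and `delta_c` are the two partial derivatives.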

# Effect of varying LR on error and final line

Let us vary the LR and see how the error decreases in each case, and how the final line looks, by training each case for the same number of iterations: 2000.
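
The same comparison can also be written as a single loop over the learning rates. The following is only a minimal sketch: `gd_step` is a local copy of the update in `train`, and `x_demo`/`y_demo` are hypothetical synthetic data standing in for `regr01.txt`, so the numbers it prints will differ from those of the notebook.

```python
import numpy as np

def gd_step(x, y, m, c, eta):
    """One batch gradient-descent step; mirrors train() from this notebook."""
    y_pred = m * x + c
    grad_m = -2.0 / len(y) * np.sum(x * (y - y_pred))
    grad_c = -2.0 / len(y) * np.sum(y - y_pred)
    error = np.mean((y - y_pred) ** 2)  # error measured before the update
    return m - eta * grad_m, c - eta * grad_c, error

# Hypothetical stand-in data (assumption: NOT the regr01.txt dataset).
rng = np.random.default_rng(0)
x_demo = np.linspace(0.1, 1.0, 50)
y_demo = 4.0 * x_demo + 0.1 + rng.normal(0.0, 0.05, size=x_demo.size)

# Train from the same starting point for each learning rate.
results = {}
for eta in (0.1, 0.01, 0.001, 0.0001):
    m, c = 0.0, 0.0
    errors = []
    for _ in range(2000):
        m, c, error = gd_step(x_demo, y_demo, m, c, eta)
        errors.append(error)
    results[eta] = (m, c, errors)

for eta, (m, c, errors) in results.items():
    print(f"eta={eta}: m={m:.4f} c={c:.4f} final error={errors[-1]:.6f}")
```

Keeping the results in a dictionary keyed by the learning rate avoids one set of variable names per run; the cells below keep the explicit version so that each step stays visible.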

### $\eta$ = 0.1

In [None]:
# Check the algorithm for different learning rates
# Save errors
errs_1 = []
m, c = 0, 0
eta = 0.1
for iteration in range(2000):
    m, c, error = train(l, tsq, m, c, eta)
    errs_1.append(error)

# Save final line
m_1, c_1 = m, c

### $\eta$ = 0.01

In [None]:
errs_01 = []
m, c = 0, 0
eta = 0.01
for iteration in range(2000):
    m, c, error = train(l, tsq, m, c, eta)
    errs_01.append(error)

# Save final line
m_01, c_01 = m, c

### $\eta$ = 0.001

In [None]:
errs_001 = []
m, c = 0, 0
eta = 0.001
for iteration in range(2000):
    m, c, error = train(l, tsq, m, c, eta)  # record the error at every iteration
    errs_001.append(error)

# Save final line
m_001, c_001 = m, c

### $\eta$ = 0.0001

In [None]:
errs_0001 = []
m, c = 0, 0
eta = 0.0001
for iteration in range(2000):
    m, c, error = train(l, tsq, m, c, eta)  # record the error at every iteration
    errs_0001.append(error)

# Save final line
m_0001, c_0001 = m, c

## Plot of lines vs $\eta$

In [None]:
# Find the lines
y_1 = m_1 * l + c_1
y_01 = m_01 * l + c_01
y_001 = m_001 * l + c_001
y_0001 = m_0001 * l + c_0001

In [None]:
plt.figure(figsize=(15, 8))
plt.plot(l, tsq, '.k')
plt.plot(l, y_1, "g")
plt.plot(l, y_01, "r")
plt.plot(l, y_001, "b")
plt.plot(l, y_0001, "y")
plt.legend(["l vs tsq","eta = 0.1","eta = 0.01","eta = 0.001","eta = 0.0001"])
plt.show()

'''The plot shows the fitted line for each learning rate.'''

Thus, within the same 2000 iterations, the higher learning rates get closer to the best-fit line than the lower ones, because each update takes a larger step.

## Plot of errors vs epochs for each $\eta$

In [None]:
epochs = range(0,2000)
plt.figure(figsize=(16,10))
plt.plot(epochs, errs_1, "g")
plt.plot(epochs, errs_01,"r")
plt.plot(epochs, errs_001,"b")
plt.plot(epochs, errs_0001,"y")
plt.legend(["eta = 0.1","eta = 0.01","eta = 0.001","eta = 0.0001"])
plt.show()

'''Plot of error vs. epochs for each learning rate.'''

# With LR Decay

In some cases, the learning rate might be too high to give a good fitting line. For example, let us train with a constant LR of 0.8 and look at the final line after 1000 iterations:

### $\eta$ = 0.8

In [None]:
errs = []
m, c = 0, 0
eta = 0.8
for times in range(1000):
    m, c, error = train(l, tsq, m, c, eta)
    errs.append(error)
    
m_normal, c_normal = m, c

Let us see the plot of error vs iterations:

In [None]:
plt.plot(range(len(errs)), errs)
plt.xlabel("Iterations")
plt.ylabel("Error")
plt.show()

We see that the error quickly drops to almost 0, but after some iterations it blows up.

Let us check the "best fit" line that is found:

In [None]:
print("m = {0:.6} c = {1:.6} Error = {2:.6}".format(m_normal, c_normal, errs[-1]))

In [None]:
y = m_normal * l + c_normal
plt.plot(l, tsq, '.k')
plt.plot(l, y, "r")
plt.show()

Clearly, this line is a poor fit to the data.

This was a simple case in which it is easy to see that the learning rate is too high; in other cases it may be harder to tell. Simply choosing a very low learning rate is not a good fix either, because training would take too long.
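
For a quadratic loss such as the mean squared error of a line, a constant learning rate is stable only while it stays below $2/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the Hessian of the loss. The sketch below estimates that limit from the inputs alone. The helper `max_stable_eta` and the data `x_demo` are hypothetical and not part of the notebook; to apply the check to this experiment, pass the array `l` instead.

```python
import numpy as np

def max_stable_eta(x):
    """Upper bound (exclusive) on a constant learning rate for which batch
    gradient descent on the MSE of y = m*x + c does not diverge."""
    x = np.asarray(x, dtype=float)
    # Hessian of E(m, c) = mean((y - (m*x + c))**2) with respect to (m, c).
    hessian = 2.0 * np.array([[np.mean(x * x), np.mean(x)],
                              [np.mean(x), 1.0]])
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    largest_eigenvalue = np.linalg.eigvalsh(hessian)[-1]
    return 2.0 / largest_eigenvalue

x_demo = np.linspace(0.1, 1.0, 50)  # hypothetical inputs, not regr01.txt
eta_limit = max_stable_eta(x_demo)
print(f"constant learning rates above about {eta_limit:.3f} diverge on x_demo")
```

The limit depends only on the inputs, not on the targets, which is why rescaling the inputs changes which learning rates are usable.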

**Solution: Decay the learning rate.**

Now let us train another model with a decaying LR: starting from 0.5, the LR is multiplied by 0.99 at every iteration, but it is never allowed to drop below 0.0001.
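
To get a feel for this schedule before training, the learning rate used at each iteration can be listed on its own. This is a minimal sketch; the helper name `decayed_learning_rates` is hypothetical, and its loop reproduces the `max(0.0001, eta * decay_factor)` rule from the training cell below.

```python
def decayed_learning_rates(eta0, decay_factor, eta_min, iterations):
    """Return the learning rate used at each iteration when eta is multiplied
    by decay_factor before every step and never allowed below eta_min."""
    rates = []
    eta = eta0
    for _ in range(iterations):
        eta = max(eta_min, eta * decay_factor)
        rates.append(eta)
    return rates

rates = decayed_learning_rates(eta0=0.5, decay_factor=0.99,
                               eta_min=0.0001, iterations=1000)

# Index of the first iteration at which the floor value is used.
first_floor = next(i for i, rate in enumerate(rates) if rate == 0.0001)
print(f"first rate: {rates[0]:.4f}, last rate: {rates[-1]:.4f}")
print(f"floor first reached at iteration {first_floor}")
```

Because the rate is decayed before the first update, training starts slightly below the initial value, and the floor takes over only in the later part of the 1000 iterations.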

In [None]:
errs_decay = []
m, c = 0, 0
eta = 0.5
decay_factor = 0.99
for iteration in range(1000):
    eta = max(0.0001, eta * decay_factor)
    m, c, error = train(l, tsq, m, c, eta)
    errs_decay.append(error)

m_decay, c_decay = m, c

In [None]:
print("m = {0:.6} c = {1:.6} Error = {2:.6}".format(m_decay, c_decay, errs_decay[-1]))

In [None]:
plt.plot(range(len(errs_decay)), errs_decay)
plt.xlabel("Iterations")
plt.ylabel("Error")
plt.show()

In [None]:
y = m_decay * l + c_decay
plt.plot(l, tsq, '.k')
plt.plot(l, y, "r")
plt.show()

Thus, with a decaying learning rate the training converges, and the final line fits the data.