# Implement an exponential $\epsilon$ schedule

In the last lesson, we implemented a **linear** epsilon schedule with the function `linear_schedule()`. The function is given below for reference.

In [None]:
def linear_schedule(step_num, end_step_num, start_step_num=0, end_value=0, start_value=1):
    """
    Returns: the value of epsilon at step_num
    y = slope * x + intercept, with x measured from start_step_num
    """
    slope = (end_value - start_value) / (end_step_num - start_step_num)
    return start_value + slope * (step_num - start_step_num)

# Your job in this assignment is to implement an exponentially decreasing schedule 

- Use the equation `epsilon = a * exp(-b * step_num)` to model the exponentially decreasing schedule.
- You can calculate the correct values of `a` and `b` by using the following conditions.
    - `epsilon = start_value` at `step_num = start_step_num`
    - `epsilon = end_value` at `step_num = end_step_num`.
- Note: Since an exponential function can never take the value `0`, we use a very small value close to zero as the default `end_value` for `epsilon` (see the cell below). 
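
If you get stuck on the algebra, here is one possible way to solve the two conditions for `a` and `b`. Dividing the first condition by the second eliminates `a` and gives `start_value / end_value = exp(b * (end_step_num - start_step_num))`, so `b` follows from a logarithm, and `a` follows by substituting `b` back into the first condition. The helper name `exponential_coefficients` is hypothetical and only serves this sketch; it is not part of the assignment template.

```python
from math import exp, log

def exponential_coefficients(end_step_num, start_step_num=0, end_value=0.0001, start_value=1):
    """
    Hypothetical helper: solves
        start_value = a * exp(-b * start_step_num)
        end_value   = a * exp(-b * end_step_num)
    for a and b.
    """
    # Dividing the two conditions removes a:
    # start_value / end_value = exp(b * (end_step_num - start_step_num))
    b = log(start_value / end_value) / (end_step_num - start_step_num)
    # Substituting b into the first condition gives a.
    a = start_value * exp(b * start_step_num)
    return a, b

a, b = exponential_coefficients(10000)
print(a, b)
```

With the default `start_step_num=0`, `a` is simply `start_value`. Note that the logarithm requires `end_value > 0`, which is why the default `end_value` cannot be `0`.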

Ready? Your code goes below!

In [None]:
from math import exp, log

def exponential_schedule(step_num, end_step_num, start_step_num=0, end_value=0.0001, start_value=1):
    """
    Returns: the value of epsilon at step_num assuming an exponentially decreasing schedule
    epsilon = a * exp(-b * step_num)
    """
    # Your code goes here

# Check if your implementation is correct

- Run the cells below.
- The first cell should return `1.0`.
- The second cell should return approximately `0.0001` (tiny floating-point differences are fine).
- The third cell should return a value around `0.00158`.

In [None]:
exponential_schedule(0, 10000)

In [None]:
exponential_schedule(10000, 10000)

In [None]:
exponential_schedule(7000, 10000)
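
Instead of comparing the three outputs by eye, you could also check them with a tolerance. The sketch below assumes a hypothetical helper named `check_schedule` that takes your schedule function as an argument; the 1% relative tolerance is an arbitrary choice that absorbs floating-point error and the rounding of `0.00158`.

```python
from math import isclose

def check_schedule(schedule):
    """Hypothetical helper: compares a schedule function with the three reference values."""
    expected = [(0, 1.0), (10000, 0.0001), (7000, 0.00158)]
    for step_num, target in expected:
        value = schedule(step_num, 10000)
        # Fail with the offending step and values if the result is off by more than 1%.
        assert isclose(value, target, rel_tol=1e-2), (step_num, value, target)
    return True
```

Once your implementation is in place, calling `check_schedule(exponential_schedule)` should return `True`; an `AssertionError` tells you which step produced an unexpected value.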

# Finally, plot the exponentially decreasing schedule for `end_step_num=10000`
- Just run the cell below.
- If your implementation was correct, you should see a beautiful exponentially decreasing curve.

In [None]:
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
step_nums = np.arange(10000)
values = np.array([exponential_schedule(i, 10000) for i in step_nums])
ax.plot(step_nums, values)
ax.set(xlabel="policy improvement steps", ylabel="epsilon", title="exponential epsilon schedule")
fig.show()

# If you managed to implement the function correctly, congrats! 
- A decreasing epsilon schedule means that the agent keeps exploring until `end_step_num`, but the amount of exploration shrinks as it approaches `end_step_num`.
- At the end, epsilon equals `end_value`, so the policy is almost fully greedy, and we hope that we explored enough until `end_step_num` to see all state-action pairs of the MDP a sufficient number of times (the GLIE condition: Greedy in the Limit with Infinite Exploration).
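
To make the link between epsilon and exploration concrete, here is a minimal sketch of how an epsilon value from a schedule might be used to pick an action. The function `epsilon_greedy_action` and the list `q_values` (one action-value estimate per action) are assumptions made for this illustration, not code from the lessons.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Hypothetical helper: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda action: q_values[action])

q_values = [0.1, 0.7, 0.2]
print(epsilon_greedy_action(q_values, epsilon=1.0))  # always a random action
print(epsilon_greedy_action(q_values, epsilon=0.0))  # always the greedy action
```

As the schedule drives epsilon toward `end_value`, the random branch is taken less and less often, which is the shrinking exploration described above.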