# Parameter Study for Lunar Lander

Adapted from the Reinforcement Learning specialization course on Coursera, this project involves several crucial steps: developing a realistic simulator, selecting and implementing a suitable reinforcement learning algorithm, and optimizing its hyperparameters.

Having completed the implementation of the Lunar Lander environment and the agent using neural networks and the Adam optimizer, attention now turns to understanding how different meta-parameters affect agent performance. Key meta-parameters include step-size, the temperature parameter for the softmax policy, and the replay buffer capacity. While rules of thumb can guide initial choices, a detailed analysis of these parameters’ impact on performance provides deeper insights.

In this notebook, the focus will be on analyzing agent performance across various step-size parameters through careful experimentation.

**Objectives:**

- Develop a script to run the agent and environment with a range of parameter values to assess performance.
- Analyze the effect of the step-size parameter on agent performance by examining its sensitivity curve.

## Packages

- [numpy](www.numpy.org) : Fundamental package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) : Library for plotting graphs in Python.
- [RL-Glue](http://www.jmlr.org/papers/v10/tanner09a.html) : Library for reinforcement learning experiments.
- [tqdm](https://tqdm.github.io/) : A package to display a progress bar when running experiments.

In [None]:
# uncomment the following line to install the packages.
# !pip install numpy matplotlib tqdm

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import os
from tqdm import tqdm

from rl_glue import RLGlue
from environment import BaseEnvironment
from agent import BaseAgent
from dummy_environment import DummyEnvironment
from dummy_agent import DummyAgent


## 1. Parameter Study Script

This section involves writing a script to conduct parameter studies. The task is to implement the `run_experiment()` function, which takes an environment and an agent and runs a parameter study over both the step-size and the temperature parameter $\tau$.

In [2]:
def run_experiment(environment, agent, environment_parameters, agent_parameters, experiment_parameters):
    
    """
    Assume environment_parameters dict contains:
    {
        input_dim: integer,
        num_actions: integer,
        discount_factor: float
    }
    
    Assume agent_parameters dict contains:
    {
        step_size: 1D numpy array of floats,
        tau: 1D numpy array of floats,
        seed: integer
    }
    
    Assume experiment_parameters dict contains:
    {
        num_runs: integer,
        num_episodes: integer
    }    
    """
    
    ### Instantiate rl_glue from RLGlue    
    rl_glue = RLGlue(environment, agent)

    os.system('sleep 1') # to prevent tqdm printing out-of-order
    
    ### Initialize agent_sum_reward to zero in the form of a numpy array
    # with shape (number of values for tau, number of step-sizes, number of runs, number of episodes)
    agent_sum_reward = np.zeros((len(agent_parameters["tau"]),
                                 len(agent_parameters["step_size"]),
                                 experiment_parameters["num_runs"],
                                 experiment_parameters["num_episodes"]))
    

    # for loop over different values of tau
    # tqdm is used to show a progress bar for completing the parameter study
    for i in tqdm(range(len(agent_parameters["tau"]))):
    
        # for loop over different values of the step-size
        for j in range(len(agent_parameters["step_size"])): 

            ### Specify env_info 
            env_info = {}

            ### Specify agent_info
            agent_info = {"num_actions": environment_parameters["num_actions"],
                          "input_dim": environment_parameters["input_dim"],
                          "discount_factor": environment_parameters["discount_factor"],
                          "tau": agent_parameters["tau"][i],
                          "step_size": agent_parameters["step_size"][j]}

            # for loop over runs
            for run in range(experiment_parameters["num_runs"]): 
                
                # Set the seed
                agent_info["seed"] = agent_parameters["seed"] * experiment_parameters["num_runs"] + run
                
                # Beginning of the run            
                rl_glue.rl_init(agent_info, env_info)

                for episode in range(experiment_parameters["num_episodes"]): 
                    
                    # Run episode
                    rl_glue.rl_episode(0) # no step limit

                    ### Store sum of reward
                    agent_sum_reward[i, j, run, episode] = rl_glue.rl_agent_message("get_sum_reward")

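            # The results array is rewritten after every (tau, step-size) setting,
            # so partial results are on disk if the study is interrupted
            os.makedirs('results', exist_ok=True)

            save_name = "{}".format(rl_glue.agent.name).replace('.','')

            # save sum reward
            np.save("results/sum_reward_{}".format(save_name), agent_sum_reward)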
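The seeding rule inside `run_experiment()` is worth a closer look. The seed depends only on the base seed and the run index, not on $\tau$ or the step-size, so every parameter setting is evaluated on the same sequence of seeds. The snippet below is a minimal sketch of that rule in isolation; the helper name `run_seeds` is hypothetical and not part of the notebook.

```python
def run_seeds(base_seed, num_runs):
    """Seeds assigned to each run, mirroring base_seed * num_runs + run."""
    return [base_seed * num_runs + run for run in range(num_runs)]

# With base seed 0 the runs simply get seeds 0 .. num_runs - 1.
seeds_a = run_seeds(base_seed=0, num_runs=5)

# A different base seed yields a block of seeds that does not overlap the first.
seeds_b = run_seeds(base_seed=1, num_runs=5)

print(seeds_a, seeds_b)
```

Because the blocks for different base seeds do not overlap, repeating the whole study with another base seed gives independent runs.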
    

Execute the following code to test the implementation of `run_experiment()` with a dummy agent and a dummy environment. The test will cover 100 runs, 100 episodes, 12 step-size values, and 4 values for $\tau$.

In [3]:
# Experiment parameters
experiment_parameters = {
    "num_runs" : 100,
    "num_episodes" : 100,
}

# Environment parameters
environment_parameters = {
    "input_dim" : 8,
    "num_actions": 4, 
    "discount_factor" : 0.99
}

agent_parameters = {
    "step_size": 3e-5 * np.power(2.0, np.array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])),
    "tau": np.array([0.001, 0.01, 0.1, 1.0]),
    "seed": 0
}

test_env = DummyEnvironment
test_agent = DummyAgent

run_experiment(test_env, 
               test_agent, 
               environment_parameters, 
               agent_parameters, 
               experiment_parameters)

sum_reward_dummy_agent = np.load("results/sum_reward_dummy_agent.npy")
sum_reward_dummy_agent_answer = np.load("asserts/sum_reward_dummy_agent.npy")
assert np.allclose(sum_reward_dummy_agent, sum_reward_dummy_agent_answer)

print("Passed the assert!")


100%|██████████| 4/4 [00:11<00:00,  2.81s/it]

Passed the assert!





## 2. Parameter Study for the Neural Network Agent with Adam Optimizer

With `run_experiment()` implemented for a dummy agent, the next step is to evaluate the performance of the agent developed in [Lunar Lander Agent](TODO) with various step-size parameters. This analysis will use parameter sensitivity curves, where the y-axis represents performance metrics and the x-axis shows the tested parameter values. Performance will be measured as the average return over episodes, averaged across 30 runs.

In [Lunar Lander Agent](TODO), a step-size of $10^{-3}$ was used, yielding reasonable results. To explore other step-sizes, we can vary this value by multiplying it with powers of two:

$10^{-3} \times 2^x$ where $x \in \{-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3\}$


Powers of two space the candidate values evenly on a logarithmic scale: each step-size is double the previous one, so the increments are fine at small values and coarse at large ones.
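As a quick illustration, the grid of step-sizes described above can be generated with numpy. This is a minimal sketch; the variable names `base_step_size`, `exponents`, and `ratios` are chosen here for illustration.

```python
import numpy as np

base_step_size = 1e-3            # step-size used in the Lunar Lander agent
exponents = np.arange(-9, 4)     # x in {-9, ..., 3}, 13 values in total

# Candidate step-sizes: 10^-3 * 2^x
step_sizes = base_step_size * np.power(2.0, exponents)

# Ratio between consecutive candidates; constant on a geometric grid
ratios = step_sizes[1:] / step_sizes[:-1]

print(step_sizes)
```

The smallest candidate is $10^{-3} / 512$ and the largest is $8 \times 10^{-3}$.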

The results for this set of step-sizes are shown below:

<img src="parameter_study.png" alt="Parameter Study" style="width: 500px;"/>

The optimal performance is observed for step-sizes in the range $[10^{-4}, 10^{-3}]$. Performance tends to decline for both larger and smaller step-size values. This narrow range where the agent performs well indicates that selecting an appropriate step-size is crucial.
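A sensitivity curve like this one can be computed from the array that `run_experiment()` saves, which has shape (number of values for $\tau$, number of step-sizes, number of runs, number of episodes). The following is a sketch under assumptions: the helper `sensitivity_curve` is hypothetical, the standard error is one possible choice of error bar, and the data is synthetic noise rather than real agent results.

```python
import numpy as np

def sensitivity_curve(sum_reward, tau_index=0):
    """Mean and standard error of per-run average return, for each step-size."""
    # Average return over episodes for each run: shape (num_step_sizes, num_runs)
    per_run = sum_reward[tau_index].mean(axis=-1)
    mean = per_run.mean(axis=-1)
    # Standard error across runs (sample standard deviation, ddof=1)
    stderr = per_run.std(axis=-1, ddof=1) / np.sqrt(per_run.shape[-1])
    return mean, stderr

# Synthetic stand-in for saved results: 1 tau, 13 step-sizes, 30 runs, 50 episodes
rng = np.random.default_rng(0)
step_sizes = 1e-3 * np.power(2.0, np.arange(-9, 4))
synthetic = rng.normal(size=(1, len(step_sizes), 30, 50))

mean, stderr = sensitivity_curve(synthetic)
best_step_size = step_sizes[np.argmax(mean)]
```

The resulting `mean` and `stderr` can be passed to `plt.errorbar` with a logarithmic x-axis to draw the curve.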

For performance assessment, the average return over episodes, averaged over 30 runs, was used. However, to analyze the impact of the step-size on the agent’s early or final performance, alternative metrics might be considered. For instance, to examine early performance, one could use the average return over the first 100 episodes, averaged over 30 runs. Adjusting the performance metric can provide different insights into how the step-size parameter affects the agent’s behavior.
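Switching between these metrics only requires changing the window of episodes that is averaged. Here is a minimal sketch, assuming the same array layout as the output of `run_experiment()`; the helper `windowed_performance` is hypothetical, and the returns are synthetic values that increase linearly with the episode index.

```python
import numpy as np

def windowed_performance(sum_reward, start=None, stop=None):
    """Average return over episodes[start:stop], averaged over runs."""
    window = sum_reward[..., start:stop]
    # Last two axes are (runs, episodes); average over both
    return window.mean(axis=(-2, -1))

# Synthetic returns: the return of episode k is simply k, identical for every run
num_episodes = 300
returns = np.broadcast_to(np.arange(num_episodes, dtype=float),
                          (1, 2, 30, num_episodes))

overall = windowed_performance(returns)             # all episodes
early = windowed_performance(returns, stop=100)     # first 100 episodes
final = windowed_performance(returns, start=-100)   # last 100 episodes
```

On this synthetic data the three metrics differ widely, which shows why the choice of window matters when comparing step-sizes.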