# **L2RPN-WCCI Example Run of RL Agents**

This short tutorial notebook is a quick guide to installing and testing some Reinforcement Learning (RL) algorithms with the Grid2Op framework. The RL algorithm used in **Section-I** is taken from [l2rpn_baselines](https://github.com/rte-france/l2rpn-baselines/tree/master/l2rpn_baselines), and the one used in **Section-II** is taken from [Ray-RLlib](https://docs.ray.io/en/master/rllib.html).

**A quick walkthrough:**
- Install Grid2Op and l2rpn_baselines using pip.
- Sample code for DeepQSimple, DuelQSimple, DuelQLeapNet, DoubleDuelingDQN, and DoubleDuelingRDQN is available in l2rpn_baselines. For brevity, **Section-I** shows the usage of a single baseline, PPO_SB3.
- Please note that this code only illustrates the implementation. Its performance has not been tuned: the action space, observation space, and neural network architecture were chosen arbitrarily.
- "l2rpn_wcci_2022" is used as the environment for this example.
- Please note that using expert_agent (available in l2rpn_baselines) requires installing [ExpertOp4Grid](https://expertop4grid.readthedocs.io/en/latest/).

- In **Section-II**, install RLlib. An RLlib algorithm (PPO in the code below) is used as an example. Here too, the performance has not been tuned. Check the [training API](https://docs.ray.io/en/master/rllib-training.html) for the available RLlib algorithms.
- Please note that using a grid2op environment with RLlib requires bridging the gap between grid2op and OpenAI Gym environments. Hence, the observation space and action space are made compatible with the gym environment. To learn more, please check [grid2op.gym_compat](https://grid2op.readthedocs.io/en/latest/gym.html).

# **Section-I (RL Algorithms from l2rpn_baselines)**

In [None]:
#!pip3 install grid2op  # for use with google colab (grid2Op is not installed by default)
#!pip3 install l2rpn_baselines   # for use with google colab (l2rpn_baselines is not installed by default)
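
Before running the cells below, it can help to confirm that both packages are importable. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and is not part of grid2op or l2rpn_baselines.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names (not necessarily the pip names) used by this notebook.
required = ["grid2op", "l2rpn_baselines"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If a package is reported as missing, uncomment the matching `pip3 install` line above and rerun the cell.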

In [None]:
import grid2op
from l2rpn_baselines.PPO_SB3 import train as ppo_train
from l2rpn_baselines.PPO_SB3 import evaluate as ppo_evaluate

**Train and evaluate a Proximal Policy Optimization agent:**

In [None]:
env = grid2op.make("l2rpn_wcci_2022")
agent = ppo_train(env, name="PPO_SB3", save_path="baseline", iterations=1)

In [None]:
g2op_agent, res = ppo_evaluate(
    env,
    load_path="baseline/",
    name="PPO_SB3",
    nb_episode=10,
    obs_space_kwargs={},
    act_space_kwargs={},
)
# print one summary line per evaluated episode
for _, chron_name, cum_reward, nb_time_step, max_ts in res:
    msg_tmp = "chronics at: {}".format(chron_name)
    msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
    msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
    print(msg_tmp)
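
The loop above prints one line per episode. To compare runs it can be convenient to aggregate the same 5-tuples into a few numbers. The sketch below assumes each entry of `res` has the layout unpacked in the loop; the helper `summarize_episodes` is a hypothetical name, and the sample values are made up for illustration and are not real results.

```python
from statistics import mean


def summarize_episodes(res):
    """Aggregate (path, name, cum_reward, nb_time_step, max_ts) tuples."""
    rewards = [cum_reward for _, _, cum_reward, _, _ in res]
    # fraction of the scenario the agent survived, per episode
    survival = [nb_ts / max_ts for _, _, _, nb_ts, max_ts in res]
    # episodes that reached the last time step
    completed = sum(1 for _, _, _, nb_ts, max_ts in res if nb_ts == max_ts)
    return {
        "mean_reward": mean(rewards),
        "mean_survival": mean(survival),
        "completed": completed,
    }


# Made-up values for illustration only; in the notebook, pass the `res`
# list returned by the evaluation call instead.
example_res = [
    ("path/a", "chronic_a", 100.0, 50, 100),
    ("path/b", "chronic_b", 300.0, 100, 100),
]
summary = summarize_episodes(example_res)
print(summary)
```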

# **Section-II (RL Algorithms from Ray-RLlib)**

**Installation of RLlib**

In [None]:
#import sys
#!$sys.executable -m pip install 'ray[rllib]' # Install RLLib
#!pip install tensorflow

# **RLlib code for PPO**

In [None]:
import os
import shutil

import gym
import numpy as np
import ray
from ray.tune.logger import pretty_print


class MyEnv(gym.Env):
    def __init__(self, env_config):
        import grid2op
        from grid2op.gym_compat import GymEnv
        from grid2op.gym_compat import BoxGymActSpace
        from grid2op.Reward import L2RPNReward


        # 1. create the grid2op environment
        if "env_name" not in env_config:
            raise RuntimeError("The configuration for RLLIB should provide the env name")
        nm_env = env_config["env_name"]
        del env_config["env_name"]
        self.env_glop = grid2op.make(nm_env, **env_config, reward_class=L2RPNReward)

        # 2. create the gym environment
        self.env_gym = GymEnv(self.env_glop)
        obs_gym = self.env_gym.reset()

        # 3. (optional) customize it (see the grid2op.gym_compat documentation for more information)
        ## customize action space
        self.env_gym.action_space = BoxGymActSpace(self.env_glop.action_space,
                                                     attr_to_keep=["redispatch", "curtail", "set_storage"])
        # The possible attribute you can provide in the "attr_to_keep" are:
        # - "redispatch"
        # - "set_storage"
        # - "curtail"
        # - "curtail_mw" (same effect as "curtail")

        ## customize observation space
        ob_space = self.env_gym.observation_space
        ob_space = ob_space.keep_only_attr(["rho"])
        
        self.env_gym.observation_space = ob_space

        # 4. specific to RLlib
        self.action_space = self.env_gym.action_space
        self.observation_space = self.env_gym.observation_space
        self.step_count = 0
        self.case_no = 0
        self.reward_sum = 0
        

    def reset(self):
        obs = self.env_gym.reset()
        self.case_no += 1
        self.reward_sum = 0
        return obs

    def step(self, action):
        self.step_count += 1
        obs, reward, done, info = self.env_gym.step(action)
        self.reward_sum += reward
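        return obs, reward, done, info


# start from a clean state: remove checkpoints and logs left by previous runs
CHECKPOINT_ROOT = "tmp/rllib"
shutil.rmtree(CHECKPOINT_ROOT, ignore_errors=True)

# expanduser also works when the HOME environment variable is not set
ray_results = os.path.join(os.path.expanduser("~"), "ray_results")
shutil.rmtree(ray_results, ignore_errors=True)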
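
`MyEnv` follows a plain delegation pattern: it forwards `reset` and `step` to the wrapped environment and keeps a few counters on the side. The sketch below isolates that pattern with a toy environment so that it runs without grid2op, gym, or Ray. `ToyEnv` and `CountingWrapper` are hypothetical names used only for illustration.

```python
class ToyEnv:
    """Stand-in environment: an episode lasts `horizon` steps, reward 1.0 per step."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t  # the observation is simply the step index

    def step(self, action):
        self._t += 1
        done = self._t >= self.horizon
        return self._t, 1.0, done, {}


class CountingWrapper:
    """Forward calls to the wrapped env and keep the same counters as MyEnv."""

    def __init__(self, env):
        self.env = env
        self.step_count = 0    # total steps over all episodes
        self.case_no = 0       # number of episodes started
        self.reward_sum = 0.0  # cumulative reward of the current episode

    def reset(self):
        self.case_no += 1
        self.reward_sum = 0.0
        return self.env.reset()

    def step(self, action):
        self.step_count += 1
        obs, reward, done, info = self.env.step(action)
        self.reward_sum += reward
        return obs, reward, done, info


wrapped = CountingWrapper(ToyEnv(horizon=3))
for _ in range(2):  # run two short episodes
    obs, done = wrapped.reset(), False
    while not done:
        obs, reward, done, info = wrapped.step(action=None)

print(wrapped.case_no, wrapped.step_count, wrapped.reward_sum)
```

Note that `step_count` accumulates across episodes while `reward_sum` is reset at every `reset`, which mirrors the bookkeeping in `MyEnv`.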

In [None]:
## Check this link for RLlib Training API: https://docs.ray.io/en/master/rllib-training.html
from ray.rllib.agents import ppo  # import the type of agents (change accordingly for DQN / ARS / APPO / A3C / A2C)

nb_step_train = 1  # remember: don't forget to change this number to perform an actual training!

s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"

# first build the trainer configuration
config = ppo.DEFAULT_CONFIG.copy()
config["timesteps_per_iteration"] = 10
config["num_workers"] = 1
config["env_config"] = {"env_name": "l2rpn_wcci_2022"}  # config to pass to env class

# then initialize ray
ray.init()
try:
    # then define a "trainer" (change accordingly for DQN / ARS / APPO / A3C / A2C)
    trainer = ppo.PPOTrainer(env=MyEnv, config=config)
    # and then train it for a given number of iterations
    for step in range(nb_step_train):
        result = trainer.train()

        file_name = trainer.save(CHECKPOINT_ROOT)

        print(s.format(
            step + 1,
            result["episode_reward_min"],
            result["episode_reward_mean"],
            result["episode_reward_max"],
            result["episode_len_mean"],
            file_name
        ))
        #print(pretty_print(result))
finally:
    # shutdown ray
    ray.shutdown()
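
The summary line printed during training can be produced by a small helper, which makes it easy to test without starting Ray. This is a minimal sketch: `format_result` is a hypothetical name, the dictionary below is made-up data that only mimics the keys read in the loop above, and falling back to NaN for missing keys is an assumption made for this example.

```python
def format_result(iteration, result, checkpoint):
    """Build the one-line training summary from a result dictionary."""
    template = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"
    nan = float("nan")  # assumption: show NaN when a statistic is absent
    return template.format(
        iteration,
        result.get("episode_reward_min", nan),
        result.get("episode_reward_mean", nan),
        result.get("episode_reward_max", nan),
        result.get("episode_len_mean", nan),
        checkpoint,
    )


# Made-up numbers for illustration only, not the output of a real training run.
fake_result = {
    "episode_reward_min": -1.5,
    "episode_reward_mean": 2.0,
    "episode_reward_max": 10.25,
    "episode_len_mean": 37.0,
}
line = format_result(1, fake_result, "tmp/rllib/checkpoint")
print(line)
```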