# **L2RPN-ICAPS Example Run of RL Agents**

This short tutorial notebook is a quick guide to installing and testing some Reinforcement Learning (RL) algorithms with the Grid2Op framework. The RL algorithm used in **Section-I** is taken from [l2rpn_baselines](https://github.com/rte-france/l2rpn-baselines/tree/master/l2rpn_baselines), and the one used in **Section-II** is taken from [Ray-RLlib](https://docs.ray.io/en/master/rllib.html).

**A quick walkthrough:**
- Install Grid2Op and l2rpn_baselines using the pip command.
- Sample code for DeepQSimple, DuelQSimple, DuelQLeapNet, DoubleDuelingDQN and DoubleDuelingRDQN is available in l2rpn_baselines; for brevity, only the usage of DeepQSimple is shown in **Section-I**.
- Please note that this code only demonstrates the implementation. Its performance has not been tuned: the action space, observation space and neural network architecture were chosen arbitrarily.
- "l2rpn_icaps_2021_small" is used as the environment for this example.
- Please note that to use expert_agent (available in l2rpn_baselines), one needs to install [ExpertOp4Grid](https://expertop4grid.readthedocs.io/en/latest/).

- In **Section-II**, install RLlib. The DQN algorithm from RLlib is used as an example. Here too, the performance is not tuned. See the [training API](https://docs.ray.io/en/master/rllib-training.html) for the RLlib algorithms.
- Please note that to use a grid2op environment with RLlib, the gap between grid2op and OpenAI Gym environments has to be bridged. Hence, the observation space and action space are made compatible with the Gym environment. To learn more about this, please check [grid2op.gym_compat](https://grid2op.readthedocs.io/en/latest/gym.html).
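
To give an idea of what "making the spaces compatible with Gym" means, here is a minimal sketch that uses neither grid2op nor gym. The names `toy_observation`, `keep_only_attr` and `build_discrete_actions` are hypothetical and the values are made up. The idea it tries to convey is that a structured observation is reduced to a few selected attributes, and that structured actions are enumerated so an agent can pick one with a single integer.

```python
# Conceptual sketch only: plain Python, no grid2op or gym required.
# All names and values below are hypothetical and chosen for illustration.

# A structured observation, as a mapping from attribute name to a vector.
toy_observation = {
    "rho": [0.42, 0.87, 0.15],          # line loading ratios (made-up values)
    "line_status": [True, True, False],
    "hour_of_day": [13],
}


def keep_only_attr(observation, attr_names):
    """Return a copy of the observation restricted to the given attributes."""
    return {name: observation[name] for name in attr_names}


def build_discrete_actions(n_lines):
    """Enumerate actions so that an agent can pick one with a single integer.

    Index 0 is "do nothing"; index i (i >= 1) toggles the status of line i - 1.
    """
    actions = [{"type": "do_nothing"}]
    for line_id in range(n_lines):
        actions.append({"type": "change_line_status", "line_id": line_id})
    return actions


filtered = keep_only_attr(toy_observation, ["rho"])
actions = build_discrete_actions(n_lines=len(toy_observation["rho"]))

print(filtered)      # only the "rho" entry remains
print(len(actions))  # 1 "do nothing" + 3 line toggles
print(actions[2])    # the structured action behind integer id 2
```

In the notebook itself this work is delegated to `grid2op.gym_compat`; the sketch only mirrors the general idea, not that module's implementation.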

# **Section-I (RL Algorithms from l2rpn_baselines)**

In [None]:
#!pip3 install grid2op  # for use with Google Colab (grid2op is not installed by default)
#!pip3 install l2rpn_baselines  # for use with Google Colab (l2rpn_baselines is not installed by default)

# **Sample Code for Training // Algorithm: DeepQSimple**

In [None]:
import grid2op
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.utils import TrainingParam, NNParam
from l2rpn_baselines.DeepQSimple import train
# define the environment
env = grid2op.make("l2rpn_icaps_2021_small",
                    reward_class=L2RPNReward)
# use the default training parameters
tp = TrainingParam()
# list of the observation attributes to keep as input of the neural network
# more information on https://grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
li_attr_obs_X = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                  "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                  "time_before_cooldown_sub", "rho", "timestep_overflow", "line_status"]
# neural network architecture
observation_size = NNParam.get_obs_size(env, li_attr_obs_X)
sizes = [800, 800, 800, 494, 494, 494]  # size of each hidden layer
kwargs_archi = {'observation_size': observation_size,
                'sizes': sizes,
                'activs': ["relu" for _ in sizes],  # ReLU activation for every hidden layer
                "list_attr_obs": li_attr_obs_X}
# select some part of the action
# more information at https://grid2op.readthedocs.io/en/latest/converter.html#grid2op.Converter.IdToAct.init_converter
kwargs_converters = {"all_actions": None,
                      "set_line_status": False,
                      "change_bus_vect": True,
                      "set_topo_vect": False
                      }
# define the name of the model
nm_ = "AnneOnymous"
try:
    train(env,
          name=nm_,
          iterations=100,  # change the number of iterations
          save_path=None,
          load_path=None,
          logs_dir=None,
          training_param=tp,
          kwargs_converters=kwargs_converters,
          kwargs_archi=kwargs_archi)
finally:
    env.close()
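
The role of `li_attr_obs_X` and `sizes` in the cell above can be pictured with a small library-free sketch. It assumes that the network input is the concatenation of the selected attributes (so its length is the sum of their lengths) and that the hidden layers are stacked one after another. The helper names and the toy attribute lengths are hypothetical; in the notebook the real input size comes from `NNParam.get_obs_size`.

```python
# Conceptual sketch only: plain Python, hypothetical names and toy sizes.
# In the notebook the real input size is returned by NNParam.get_obs_size.

# Made-up lengths of a few observation attributes for a tiny grid.
toy_attr_sizes = {"hour_of_day": 1, "prod_p": 5, "load_p": 8, "rho": 12}


def observation_size(attr_sizes, selected):
    """Length of the vector obtained by concatenating the selected attributes."""
    return sum(attr_sizes[name] for name in selected)


def layer_shapes(input_size, hidden_sizes):
    """(inputs, outputs) of each dense layer in a plain feed-forward stack."""
    dims = [input_size] + list(hidden_sizes)
    return list(zip(dims[:-1], dims[1:]))


selected = ["hour_of_day", "prod_p", "rho"]
obs_size = observation_size(toy_attr_sizes, selected)
shapes = layer_shapes(obs_size, [32, 16])

print(obs_size)  # 1 + 5 + 12 = 18
print(shapes)    # [(18, 32), (32, 16)]
```

Adding or removing an attribute from the list therefore changes the size of the first layer, which is why `observation_size` is recomputed from the list rather than hard-coded.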

# **Section-II (RL Algorithms from Ray-RLlib)**

**Installation of RLlib**

In [None]:
import sys
!$sys.executable -m pip install 'ray[rllib]'  # install RLlib
!$sys.executable -m pip install tensorflow==2.2.0  # make sure to have tensorflow<=2.2.0 so that it works with Ray for now
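
After the installation steps, it can be handy to check which of the packages used in this notebook are actually importable in the current kernel. The snippet below is a small sketch using only the standard library; `REQUIRED` and `is_installed` are hypothetical names introduced here, and the check only looks for the top-level module without importing it or verifying its version.

```python
# Conceptual helper: uses only the standard library.
# `is_installed` and `REQUIRED` are hypothetical names, not part of any package.
import importlib.util

REQUIRED = ["grid2op", "l2rpn_baselines", "ray", "tensorflow"]


def is_installed(module_name):
    """True if a top-level module with this name can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


status = {name: is_installed(name) for name in REQUIRED}
for name, found in status.items():
    print(f"{name:16s} {'found' if found else 'MISSING'}")
```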

# **RLlib code for DQN**

In [None]:
import os
import shutil

import gym
import numpy as np
import ray
from ray.tune.logger import pretty_print


class MyEnv(gym.Env):
    def __init__(self, env_config):
        import grid2op
        from grid2op.gym_compat import GymEnv
        from grid2op.gym_compat import ScalerAttrConverter, ContinuousToDiscreteConverter, MultiToTupleConverter
        from grid2op.gym_compat import DiscreteActSpace
        from grid2op.Reward import L2RPNReward

        # 1. create the grid2op environment
        if "env_name" not in env_config:
            raise RuntimeError("The configuration for RLlib should provide the env name")
        nm_env = env_config["env_name"]
        del env_config["env_name"]
        self.env_glop = grid2op.make(nm_env, **env_config, reward_class=L2RPNReward)

        # 2. create the gym environment
        self.env_gym = GymEnv(self.env_glop)
        obs_gym = self.env_gym.reset()

        # 3. (optional) customize it (see the grid2op.gym_compat documentation for more information)
        ## customize action space
        self.env_gym.action_space = DiscreteActSpace(self.env_glop.action_space,
                                                     attr_to_keep=['change_line_status'])
        # The possible attribute you can provide in the "attr_to_keep" are:
        # - "set_line_status"
        # - "change_line_status"
        # - "set_bus": corresponds to changing the topology using the "set_bus" (equivalent to the
        #   "one_sub_set" keyword in the "attr_to_keep" of the :class:`MultiDiscreteActSpace`)
        # - "change_bus": corresponds to changing the topology using the "change_bus" (equivalent to the
        #   "one_sub_change" keyword in the "attr_to_keep" of the :class:`MultiDiscreteActSpace`)
        # - "redispatch"
        # - "set_storage"
        # - "curtail"
        # - "curtail_mw" (same effect as "curtail")

        ## customize observation space
        ob_space = self.env_gym.observation_space
        ob_space = ob_space.keep_only_attr(["rho"])
        
        self.env_gym.observation_space = ob_space

        # 4. specific to rllib
        self.action_space = self.env_gym.action_space
        self.observation_space = self.env_gym.observation_space
        self.step_count = 0
        self.case_no = 0
        self.reward_sum = 0

    def reset(self):
        obs = self.env_gym.reset()
        self.case_no += 1
        self.reward_sum = 0
        return obs

    def step(self, action):
        self.step_count += 1
        obs, reward, done, info = self.env_gym.step(action)
        self.reward_sum += reward
        return obs, reward, done, info


# start from clean checkpoint and result directories
CHECKPOINT_ROOT = "tmp/rllib"
shutil.rmtree(CHECKPOINT_ROOT, ignore_errors=True)

ray_results = os.path.join(os.path.expanduser("~"), "ray_results")
shutil.rmtree(ray_results, ignore_errors=True)
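
The wrapper pattern used by `MyEnv` (delegate `reset` and `step` to an inner environment while keeping a few counters) can be tried without grid2op, gym or ray. The sketch below uses a hypothetical `ToyEnv` whose episodes last three steps, wrapped by a hypothetical `CountingWrapper` that keeps the same three counters as `MyEnv`. It is only meant to show how the counters evolve, not how RLlib drives the environment.

```python
# Conceptual sketch only: no grid2op, gym or ray needed.
# ToyEnv and CountingWrapper are hypothetical names used for illustration.


class ToyEnv:
    """A tiny episodic environment: each step gives reward 1.0, episodes last 3 steps."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t  # the observation is just the current time step

    def step(self, action):
        self._t += 1
        done = self._t >= self.episode_length
        return self._t, 1.0, done, {}


class CountingWrapper:
    """Delegates to an inner environment and tracks simple statistics."""

    def __init__(self, inner_env):
        self.inner_env = inner_env
        self.step_count = 0   # total steps over all episodes
        self.case_no = 0      # number of episodes started
        self.reward_sum = 0   # cumulated reward of the current episode

    def reset(self):
        self.case_no += 1
        self.reward_sum = 0
        return self.inner_env.reset()

    def step(self, action):
        self.step_count += 1
        obs, reward, done, info = self.inner_env.step(action)
        self.reward_sum += reward
        return obs, reward, done, info


wrapped = CountingWrapper(ToyEnv())
for _ in range(2):  # run two full episodes with a dummy action
    obs = wrapped.reset()
    done = False
    while not done:
        obs, reward, done, info = wrapped.step(action=0)

print(wrapped.case_no, wrapped.step_count, wrapped.reward_sum)  # 2 6 3.0
```

Note that `step_count` keeps growing across episodes while `reward_sum` is reset at every `reset()`, which matches the bookkeeping in `MyEnv`.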

In [None]:
## See the RLlib training API: https://docs.ray.io/en/master/rllib-training.html
from ray.rllib.agents import dqn  # type of agent (change accordingly for PPO / ARS / APPO / A3C / A2C)

nb_step_train = 1  # remember: don't forget to increase this number to perform an actual training!

s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"

# start from the default DQN configuration and override what is needed
config = dqn.DEFAULT_CONFIG.copy()
config["timesteps_per_iteration"] = 10
config["env_config"] = {"env_name": "l2rpn_icaps_2021_small"}  # config to pass to the env class

# first initialize ray
ray.init()
try:
    # then define a "trainer" (change accordingly for PPO / ARS / APPO / A3C / A2C)
    trainer = dqn.DQNTrainer(env=MyEnv, config=config)
    # and then train it for the given number of iterations
    for step in range(nb_step_train):
        result = trainer.train()
        file_name = trainer.save(CHECKPOINT_ROOT)

        print(s.format(
            step + 1,
            result["episode_reward_min"],
            result["episode_reward_mean"],
            result["episode_reward_max"],
            result["episode_len_mean"],
            file_name
        ))
        # print(pretty_print(result))
finally:
    # shutdown ray
    ray.shutdown()