# Grid2Op integration with ray / rllib framework

Try me out interactively with: [![Binder](./img/badge_logo.svg)](https://mybinder.org/v2/gh/rte-france/Grid2Op/master)


**Objectives** This notebook briefly explains how to use grid2op with the ray (rllib) RL framework. Make sure to read the previous notebook 11_IntegrationWithExistingRLFrameworks.ipynb for a deeper dive into what happens. We only show the working solution here.

<font color='red'> This notebook explains the ideas and shows a self-contained, fairly minimal example of using the ray / rllib framework with grid2op. It is not meant to be fully generic; the code might need to be adjusted.</font> 

This notebook is more an "example of what works" than a deep-dive tutorial.

See https://docs.ray.io/en/latest/rllib/rllib-env.html#configuring-environments for more detailed information.

See also https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html for other details.

This notebook was tested with grid2op 1.10 and ray 2.23 on an Ubuntu 20.04 machine.


## 1 Create the "Grid2opEnv" class

In the next cell, we define a custom environment (that will internally use the `GymEnv` grid2op class) that is needed for ray / rllib.

Indeed, in order to work with ray / rllib you need to define a custom wrapper on top of the GymEnv wrapper. You then have:

- `self._g2op_env`, which is the default grid2op environment, receiving grid2op Action and producing grid2op Observation.
- `self._gym_env`, which is the `gymnasium` environment defined by grid2op. It cannot be used directly with ray / rllib.
- `Grid2opEnv`, which is the wrapper on top of `self._gym_env` that makes it usable with ray / rllib.

Ray / rllib expects the gymnasium environment to inherit from `gymnasium.Env` and to be initialized with a given configuration. This is why you need to create the `Grid2opEnv` wrapper on top of `GymEnv`.

In the initialization of `Grid2opEnv`, the `env_config` variable is a dictionary that accepts the following keys:

- `backend_cls`: the class of the backend. If not provided, `LightSimBackend` from the `lightsim2grid` package is used.
- `backend_options`: the options used to create the backend for your environment. Your backend will be created by calling
   `backend_cls(**backend_options)`. For example, if you want to build `LightSimBackend(detailed_info_for_cascading_failure=False)` you can pass `{"backend_cls": LightSimBackend, "backend_options": {"detailed_info_for_cascading_failure": False}}`.
- `env_name`: name of the grid2op environment you want to use. By default it is `"l2rpn_case14_sandbox"`.
- `env_is_test`: whether to create the grid2op environment with `test=True`, i.e. by calling `grid2op.make(..., test=True)`. By default it is `False`.
- `obs_attr_to_keep`: in this wrapper, the agent only sees a `Box` as an observation. This parameter controls which attributes of the grid2op observation are present in the agent's observation space. By default it is `["rho", "p_or", "gen_p", "load_p"]`, which is a fairly arbitrary choice and is probably not suited for every agent.
- `act_type`: controls the type of actions your agent will be able to perform. Already coded in this notebook are:
   - `"discrete"` to use a `Discrete` action space
   - `"box"` to use a `Box` action space
   - `"multi_discrete"` to use a `MultiDiscrete` action space
- `act_attr_to_keep`: allows you to customize the action space. If not provided, it defaults to:
  - `["set_line_status_simple", "set_bus"]` if `act_type` is `"discrete"` 
  - `["redispatch", "set_storage", "curtail"]` if `act_type` is `"box"` 
  - `["one_line_set", "one_sub_set"]` if `act_type` is `"multi_discrete"`

If you want to add more customization (for example the reward function or the parameters of the environment), feel free to take inspiration from this code and extend it. Any PR in this regard is more than welcome.
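
The defaults listed above can also be resolved in one place, before the environment is built. The sketch below is only an illustration: `resolve_env_config` and `DEFAULT_ACT_ATTR` are hypothetical names (they are not part of grid2op or ray), and the backend keys are left out so that the snippet runs without `lightsim2grid`. It uses only the key names and default values described in this notebook.

```python
import copy
from typing import Any, Dict, Optional

# Default action attributes per action type, as listed in this notebook.
DEFAULT_ACT_ATTR = {
    "discrete": ["set_line_status_simple", "set_bus"],
    "box": ["redispatch", "set_storage", "curtail"],
    "multi_discrete": ["one_line_set", "one_sub_set"],
}


def resolve_env_config(env_config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `env_config` with defaults filled in."""
    user_cfg = copy.deepcopy(env_config) if env_config else {}
    act_type = user_cfg.get("act_type", "discrete")
    if act_type not in DEFAULT_ACT_ATTR:
        raise NotImplementedError(f"action type '{act_type}' is not currently supported.")
    resolved = {
        "env_name": "l2rpn_case14_sandbox",
        "env_is_test": False,
        "obs_attr_to_keep": ["rho", "p_or", "gen_p", "load_p"],
        "act_type": act_type,
        "act_attr_to_keep": list(DEFAULT_ACT_ATTR[act_type]),
    }
    resolved.update(user_cfg)  # values provided by the user win over the defaults
    return resolved


cfg = resolve_env_config({"act_type": "box", "obs_attr_to_keep": ["rho", "gen_p"]})
print(cfg["act_attr_to_keep"])
print(cfg["obs_attr_to_keep"])
```

A dictionary such as `{"act_type": "box", "obs_attr_to_keep": ["rho", "gen_p"]}` can be passed as is to `Grid2opEnv`; the helper only makes the effective configuration explicit, which can be handy for logging.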

In [None]:
import copy  # needed for the copy.deepcopy(...) calls below

from gymnasium import Env
from gymnasium.spaces import Discrete, MultiDiscrete, Box

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms import ppo

from typing import Dict, Literal, Any

import grid2op
from grid2op.gym_compat import GymEnv, BoxGymObsSpace, DiscreteActSpace, BoxGymActSpace, MultiDiscreteActSpace
from lightsim2grid import LightSimBackend


class Grid2opEnv(Env):
    def __init__(self,
                 env_config: Dict[Literal["backend_cls",
                                          "backend_options",
                                          "env_name",
                                          "env_is_test",
                                          "obs_attr_to_keep",
                                          "act_type",
                                          "act_attr_to_keep"],
                                  Any]):
        super().__init__()
        if env_config is None:
            env_config = {}

        # handle the backend
        backend_cls = LightSimBackend
        if "backend_cls" in env_config:
            backend_cls = env_config["backend_cls"]
        backend_options = {}
        if "backend_options" in env_config:
            backend_options = env_config["backend_options"]
        backend = backend_cls(**backend_options)

        # create the grid2op environment
        env_name = "l2rpn_case14_sandbox"
        if "env_name" in env_config:
            env_name = env_config["env_name"]
        if "env_is_test" in env_config:
            is_test = bool(env_config["env_is_test"])
        else:
            is_test = False
        self._g2op_env = grid2op.make(env_name, backend=backend, test=is_test)
        # NB by default this might be really slow (when the environment is reset)
        # see https://grid2op.readthedocs.io/en/latest/data_pipeline.html for maybe 10x speed ups !
        # TODO customize reward or action_class for example !

        # create the gym env (from grid2op)
        self._gym_env = GymEnv(self._g2op_env)

        # customize observation space
        obs_attr_to_keep = ["rho", "p_or", "gen_p", "load_p"]
        if "obs_attr_to_keep" in env_config:
            obs_attr_to_keep = copy.deepcopy(env_config["obs_attr_to_keep"])
        self._gym_env.observation_space.close()
        self._gym_env.observation_space = BoxGymObsSpace(self._g2op_env.observation_space,
                                                         attr_to_keep=obs_attr_to_keep
                                                         )
        # export observation space for the Grid2opEnv
        self.observation_space = Box(shape=self._gym_env.observation_space.shape,
                                     low=self._gym_env.observation_space.low,
                                     high=self._gym_env.observation_space.high)

        # customize the action space
        act_type = "discrete"
        if "act_type" in env_config:
            act_type = env_config["act_type"]

        self._gym_env.action_space.close()
        if act_type == "discrete":
            # user wants a discrete action space
            act_attr_to_keep =  ["set_line_status_simple", "set_bus"]
            if "act_attr_to_keep" in env_config:
                act_attr_to_keep = copy.deepcopy(env_config["act_attr_to_keep"])
            self._gym_env.action_space = DiscreteActSpace(self._g2op_env.action_space,
                                                          attr_to_keep=act_attr_to_keep)
            self.action_space = Discrete(self._gym_env.action_space.n)
        elif act_type == "box":
            # user wants continuous action space
            act_attr_to_keep =  ["redispatch", "set_storage", "curtail"]
            if "act_attr_to_keep" in env_config:
                act_attr_to_keep = copy.deepcopy(env_config["act_attr_to_keep"])
            self._gym_env.action_space = BoxGymActSpace(self._g2op_env.action_space,
                                                        attr_to_keep=act_attr_to_keep)
            self.action_space = Box(shape=self._gym_env.action_space.shape,
                                    low=self._gym_env.action_space.low,
                                    high=self._gym_env.action_space.high)
        elif act_type == "multi_discrete":
            # user wants a multi-discrete action space
            act_attr_to_keep = ["one_line_set", "one_sub_set"]
            if "act_attr_to_keep" in env_config:
                act_attr_to_keep = copy.deepcopy(env_config["act_attr_to_keep"])
            self._gym_env.action_space = MultiDiscreteActSpace(self._g2op_env.action_space,
                                                               attr_to_keep=act_attr_to_keep)
            self.action_space = MultiDiscrete(self._gym_env.action_space.nvec)
        else:
            raise NotImplementedError(f"action type '{act_type}' is not currently supported.")
            
            
    def reset(self, *, seed=None, options=None):
        # use default _gym_env (from grid2op.gym_compat module)
        return self._gym_env.reset(seed=seed, options=options)
        
    def step(self, action):
        # use default _gym_env (from grid2op.gym_compat module)
        return self._gym_env.step(action)
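
Before handing the wrapper to ray / rllib, it can help to check that it follows the gymnasium contract: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. The sketch below plays one episode with random actions. `random_rollout` is a hypothetical helper, and `ToyEnv` / `ToyActionSpace` are stand-ins used here only so that the snippet runs without grid2op installed.

```python
import random
from typing import Any, Dict, Tuple


class ToyActionSpace:
    """Stand-in for a gymnasium `Discrete` space (only `sample` is needed here)."""

    def __init__(self, n: int, seed: int = 0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self) -> int:
        return self._rng.randrange(self.n)


class ToyEnv:
    """Stand-in environment: each episode ends after `max_steps` steps."""

    def __init__(self, max_steps: int = 5):
        self.action_space = ToyActionSpace(3)
        self._max_steps = max_steps
        self._t = 0

    def reset(self, *, seed=None, options=None) -> Tuple[float, Dict[str, Any]]:
        self._t = 0
        return 0.0, {}

    def step(self, action):
        self._t += 1
        terminated = self._t >= self._max_steps
        return float(self._t), 1.0, terminated, False, {}


def random_rollout(env, max_steps: int = 100) -> Tuple[int, float]:
    """Play one episode with random actions; return (number of steps, total reward)."""
    obs, info = env.reset(seed=0, options=None)
    n_steps, total_reward = 0, 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        n_steps += 1
        total_reward += float(reward)
        if terminated or truncated:
            break
    return n_steps, total_reward


print(random_rollout(ToyEnv(max_steps=5)))
```

With the class defined in this notebook, the same helper would be called as `random_rollout(Grid2opEnv({}))`, since `Grid2opEnv` exposes `action_space`, `reset` and `step` in the same way.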
        

Now we initialize ray before building the algorithms.

In [None]:
ray.init()

## 2 Make a default environment, and train a PPO agent for one iteration

In [None]:
# example taken directly from the documentation
# see https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html

# Construct a generic config object, specifying values within different
# sub-categories, e.g. "training".
config = (PPOConfig().training(gamma=0.9, lr=0.01)
          .environment(env=Grid2opEnv, env_config={})
          .resources(num_gpus=0)
          .env_runners(num_env_runners=0)
          .framework("tf2")
         )

# A config object can be used to construct the respective Algorithm.
rllib_algo = config.build()


Now we train it for one training iteration (this might call `env.reset()` and `env.step()` multiple times).

In [None]:
print(rllib_algo.train())
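
A single call to `train()` runs one iteration and returns a result dictionary, so real training usually loops over many iterations and tracks one metric. The sketch below shows such a loop. `train_for` and `find_metric` are hypothetical helpers, `FakeAlgo` is a stand-in that mimics an object with a `train()` method, and the metric name `"episode_return_mean"` as well as the nested layout of the fake result are assumptions: the actual keys depend on the ray version, so print one real result first to see what is available.

```python
from typing import Any, Dict, List, Optional


def find_metric(result: Dict[str, Any], name: str) -> Optional[float]:
    """Depth-first search for the first numeric entry called `name` in a nested dict."""
    for key, value in result.items():
        if key == name and isinstance(value, (int, float)):
            return float(value)
        if isinstance(value, dict):
            found = find_metric(value, name)
            if found is not None:
                return found
    return None


def train_for(algo, n_iterations: int, metric: str) -> List[Optional[float]]:
    """Call `algo.train()` repeatedly and record `metric` after each iteration."""
    history = []
    for it in range(n_iterations):
        result = algo.train()
        history.append(find_metric(result, metric))
        print(f"iteration {it}: {metric} = {history[-1]}")
    return history


class FakeAlgo:
    """Stand-in for an rllib algorithm; the result layout here is made up."""

    def __init__(self):
        self._it = 0

    def train(self) -> Dict[str, Any]:
        self._it += 1
        return {"training_iteration": self._it,
                "env_runners": {"episode_return_mean": 10.0 * self._it}}


history = train_for(FakeAlgo(), n_iterations=3, metric="episode_return_mean")
```

With the algorithm built above, the call would look like `train_for(rllib_algo, n_iterations=10, metric=...)`, using a metric name that exists in the result dictionary of your ray version. If the name is not found, the recorded value is `None`.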

## 3 Train a PPO agent using 2 "runners" to make the rollouts

In [None]:
# see https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html

# use multiple runners
config2 = (PPOConfig().training(gamma=0.9, lr=0.01)
           .environment(env=Grid2opEnv, env_config={})
           .resources(num_gpus=0)
           .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)
           .framework("tf2")
          )

# A config object can be used to construct the respective Algorithm.
rllib_algo2 = config2.build()

Now we train it for one training iteration (this might call `env.reset()` and `env.step()` multiple times).

In [None]:
print(rllib_algo2.train())

## 4 Use non-default parameters to make the l2rpn environment

In this first example, we will train a policy using the "box" action space.

In [None]:
# see https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html

# Use a "Box" action space (mainly to use redispatching, curtailment and storage units)
env_config = {"env_name": "l2rpn_idf_2023",
              "env_is_test": True,
              "act_type": "box",
             }
config3 = (PPOConfig().training(gamma=0.9, lr=0.01)
           .environment(env=Grid2opEnv, env_config=env_config)
           .resources(num_gpus=0)
           .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)
           .framework("tf2")
          )

# A config object can be used to construct the respective Algorithm.
rllib_algo3 = config3.build()

Now we train it for one training iteration (this might call `env.reset()` and `env.step()` multiple times).

In [None]:
print(rllib_algo3.train())

And now a policy using the "multi discrete" action space: 

In [None]:
# see https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html

# Use a "MultiDiscrete" action space (with the default "one_line_set" and "one_sub_set" attributes)
env_config4 = {"env_name": "l2rpn_idf_2023",
               "env_is_test": True,
               "act_type": "multi_discrete",
               }
config4 = (PPOConfig().training(gamma=0.9, lr=0.01)
           .environment(env=Grid2opEnv, env_config=env_config4)
           .resources(num_gpus=0)
           .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)
           .framework("tf2")
          )

# A config object can be used to construct the respective Algorithm.
rllib_algo4 = config4.build()

Now we train it for one training iteration (this might call `env.reset()` and `env.step()` multiple times).

In [None]:
print(rllib_algo4.train())

## 5 Customize the policy (number of layers, size of layers etc.)

This notebook does not aim at covering all possibilities offered by ray / rllib. For that you need to refer to the ray / rllib documentation.

We will simply show how to change the size of the neural network used as a policy.

In [None]:
# see https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html

# Use the default ("discrete") action space, with a custom size for the policy network
config5 = (PPOConfig().training(gamma=0.9, lr=0.01)
           .environment(env=Grid2opEnv, env_config={})
           .resources(num_gpus=0)
           .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)
           .framework("tf2")
           .rl_module(
             model_config_dict={"fcnet_hiddens": [32, 32, 32]},  # 3 layers (fully connected) of 32 units each
           )
          )

# A config object can be used to construct the respective Algorithm.
rllib_algo5 = config5.build()

Now we train it for one training iteration (this might call `env.reset()` and `env.step()` multiple times).

In [None]:
print(rllib_algo5.train())
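
To get a feel for what `fcnet_hiddens` changes, one can count the weights of a plain stack of fully connected layers. This is only a rough sketch: `count_dense_params` is a hypothetical helper, it assumes dense layers with a bias term, and it ignores the output heads and anything else rllib adds around the hidden layers. The observation size of 57 is an illustrative value; the real one can be read from `Grid2opEnv({}).observation_space.shape`.

```python
from typing import Sequence


def count_dense_params(input_size: int, hidden_sizes: Sequence[int]) -> int:
    """Number of weights and biases in a stack of fully connected layers."""
    total = 0
    previous = input_size
    for size in hidden_sizes:
        total += previous * size + size  # weight matrix + bias vector
        previous = size
    return total


obs_size = 57  # illustrative value only, not read from a real environment
small = count_dense_params(obs_size, [32, 32, 32])
large = count_dense_params(obs_size, [256, 256])
print(f"[32, 32, 32]: {small} parameters")
print(f"[256, 256]:   {large} parameters")
```

The comparison shows that the width of the layers matters far more than their number: two layers of 256 units hold many more parameters than three layers of 32 units.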