# Grid2op integration with existing frameworks

Try me out interactively with: [![Binder](./img/badge_logo.svg)](https://mybinder.org/v2/gh/rte-france/Grid2Op/master)


**Objectives** This notebook briefly explains how to use grid2op with commonly used RL frameworks.

The structure is always very similar:
1. Create a grid2op environment
2. Convert it to a gym environment
3. (optional) Customize the action space and observation space
4. Use the framework to train an agent

In this notebook, we demonstrate its usage with 3 different frameworks. The code provided here is given as an example only, and we make no claim about its performance or fitness for use. More detailed examples will be provided in the l2rpn-baselines repository in due time (work in progress at the time of writing this notebook). The 3 frameworks demonstrated here are:

- ray (rllib): see [ray on github](https://github.com/ray-project/ray) or [rllib on github](https://github.com/ray-project/ray/blob/master/doc/source/rllib.rst)
- stable-baselines3: see [stable-baselines3 on github](https://github.com/DLR-RM/stable-baselines3)
- tf_agents: see [tf_agents on github](https://github.com/tensorflow/agents)

Other RL frameworks, such as the ones listed below, are not covered here. If you already use them, let us know!
- https://github.com/wau/keras-rl2
- https://github.com/deepmind/acme


<img src="https://colab.research.google.com/assets/colab-badge.svg" width="200">
Execute the cell below by removing the `#` characters if you use Google Colab!

The cell will look like:
```python
import sys
!$sys.executable -m pip install grid2op[optional]  # for use with google colab (grid2Op is not installed by default)
!$sys.executable -m pip install tensorflow torch stable-baselines3 'ray[rllib]' tf_agents
```

It might take a while.
<img src="https://colab.research.google.com/assets/colab-badge.svg" width="200">

In [1]:
import sys
# !$sys.executable -m pip install grid2op[optional]  # for use with google colab (grid2Op is not installed by default)
# !$sys.executable -m pip install stable-baselines3 'ray[rllib]' tf_agents

## 0) Recommended initial steps

### Create a grid2op environment

This is a rather standard step, largely inspired by the OpenAI gym framework, and there is nothing specific to note here.

In [2]:
import grid2op
env_name = "l2rpn_case14_sandbox"
env_glop = grid2op.make(env_name, test=True)  # NOTE: do not set the flag "test=True" for a real usage !
# This flag is here for testing purpose !!!
obs_glop = env_glop.reset()
obs_glop



<grid2op.Space.GridObjects.CompleteObservation_l2rpn_case14_sandbox at 0x7fa66e09dfd0>

### Convert it to a gym environment

To that end, we recommend using the "gym_compat" module. More information is given in the [official grid2op documentation](https://grid2op.readthedocs.io/en/latest/gym.html)

In [3]:
import gym
import numpy as np
from grid2op.gym_compat import GymEnv
env_gym = GymEnv(env_glop)
print(f"The \"env_gym\" is a gym environment: {isinstance(env_gym, gym.Env)}")
# obs_gym = env_gym.reset()
# obs_gym

The "env_gym" is a gym environment: True


### Customize the action space and observation space

This step is optional, but highly recommended.

By default, grid2op action and observation spaces are huge. Even for this very simple example, their sizes are already large:

In [4]:
dim_act_space = np.sum([np.sum(env_gym.action_space[el].shape) for el in env_gym.action_space.spaces])
print(f"The size of the action space is : "
      f"{dim_act_space}")
dim_obs_space = np.sum([np.sum(env_gym.observation_space[el].shape).astype(int) 
                        for el in env_gym.observation_space.spaces])
print(f"The size of the observation space is : "
      f"{dim_obs_space}")

The size of the action space is : 160
The size of the observation space is : 432


#### Action space
This is partly because, in grid2op, the same concept (*e.g.* reconnecting a powerline) can be represented in different ways. In this case, you can either "toggle a switch" (if the powerline was connected it is disconnected, otherwise it is reconnected) or say "I want this line connected, whatever its original state". This behaviour is detailed in the [official grid2op documentation](https://grid2op.readthedocs.io/en/latest/action.html#usage-examples).
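
To make this redundancy concrete, here is a minimal sketch of the two encodings for line status that does not use grid2op at all. The helper names `apply_change` and `apply_set` are hypothetical, and the conventions are assumptions made for illustration: a boolean mask for "change", and -1 / 0 / +1 for "set", with 0 meaning "do nothing".

```python
def apply_change(status, change_mask):
    """Toggle every line whose mask entry is True (hypothetical helper)."""
    return [(not s) if c else s for s, c in zip(status, change_mask)]


def apply_set(status, set_vector):
    """Force a state: +1 connects, -1 disconnects, 0 leaves the line untouched."""
    result = []
    for s, v in zip(status, set_vector):
        if v == 1:
            result.append(True)
        elif v == -1:
            result.append(False)
        else:
            result.append(s)
    return result


# Three lines; the second one is currently disconnected.
status = [True, False, True]

# Both encodings can express "reconnect the second line".
toggled = apply_change(status, [False, True, False])
forced = apply_set(status, [0, 1, 0])
print(toggled == forced)
```

Both calls lead to the same grid state here, which is why keeping only one of the two encodings removes entries from the action space without removing reachable states in this example.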

To reduce the size of the action space (in general by a factor of about 2), you can represent these actions using only the "change" method, for example. You can do that with:

In [5]:
# example: ignore the "set_line_status" and "set_bus" types of actions, which are covered by
# "change_line_status" and "change_bus"

env_gym.action_space = env_gym.action_space.ignore_attr("set_bus").ignore_attr("set_line_status")

new_dim_act_space = np.sum([np.sum(env_gym.action_space[el].shape) for el in env_gym.action_space.spaces])
print(f"The new size of the action space is : {new_dim_act_space}")

The new size of the action space is : 83


Grid2op environments allow for both continuous and discrete actions. For the sake of the example, let's convert the continuous actions into discrete ones (this is done by "binning" the values, as explained in more detail [in the documentation](https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.ContinuousToDiscreteConverter)).

In [6]:
# example: convert the continuous action type "redispatch" to a discrete action type
from grid2op.gym_compat import ContinuousToDiscreteConverter
env_gym.action_space = env_gym.action_space.reencode_space("redispatch",
                                                           ContinuousToDiscreteConverter(nb_bins=11)
                                                           )

In [7]:
# And now our action space looks like:
env_gym.action_space

Dict(change_bus:MultiBinary(57), change_line_status:MultiBinary(20), redispatch:MultiDiscrete([11 11  1  1  1 11]))
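
The idea of binning can be sketched without grid2op. The helper names and the uniform spacing of the bins below are assumptions made for illustration, and the bounds of -5 and 5 are made-up values. The actual `ContinuousToDiscreteConverter` may place its bins differently.

```python
def make_bin_centers(low, high, nb_bins):
    """Return nb_bins evenly spaced values between low and high (assumed layout)."""
    if nb_bins == 1:
        return [(low + high) / 2]
    step = (high - low) / (nb_bins - 1)
    return [low + i * step for i in range(nb_bins)]


def to_discrete(value, centers):
    """Index of the bin whose center is closest to the continuous value."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def to_continuous(index, centers):
    """Continuous value represented by a bin index."""
    return centers[index]


# Hypothetical redispatch range of [-5, 5] MW split into 11 bins.
centers = make_bin_centers(-5.0, 5.0, 11)
index = to_discrete(0.3, centers)
print(index, to_continuous(index, centers))
```

With this layout, an agent picks an integer between 0 and 10 and the environment receives the corresponding continuous value. The price is a coarser control: 0.3 MW is rounded to the nearest bin center.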

#### Observation space

For the observation space, we will remove many attributes that are not needed here (remember, this is for the sake of the example) and rescale some others so that their values lie roughly between 0 and 1, which stabilizes the learning process.

In [8]:
# first let's see which are the attributes in the observation space:
# More information on
# https://beta-grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
# and 
# https://grid2op.readthedocs.io/en/latest/gym.html#observation-space-and-action-space-customization
env_gym.observation_space

Dict(a_ex:Box(20,), a_or:Box(20,), actual_dispatch:Box(6,), curtailment:Box(6,), curtailment_limit:Box(6,), day:Discrete(32), day_of_week:Discrete(8), duration_next_maintenance:Box(20,), gen_p:Box(6,), gen_p_before_curtail:Box(6,), gen_q:Box(6,), gen_v:Box(6,), hour_of_day:Discrete(24), line_status:MultiBinary(20), load_p:Box(11,), load_q:Box(11,), load_v:Box(11,), minute_of_hour:Discrete(60), month:Discrete(13), p_ex:Box(20,), p_or:Box(20,), q_ex:Box(20,), q_or:Box(20,), rho:Box(20,), storage_charge:Box(0,), storage_power:Box(0,), storage_power_target:Box(0,), target_dispatch:Box(6,), time_before_cooldown_line:Box(20,), time_before_cooldown_sub:Box(14,), time_next_maintenance:Box(20,), timestep_overflow:Box(20,), topo_vect:Box(57,), v_ex:Box(20,), v_or:Box(20,), year:Discrete(2100))

Let's keep only the information about the flows on the powerlines (`rho`), the generation (`gen_p`), the dispatch actually applied (`actual_dispatch`), the load (`load_p`) and the representation of the topology (`topo_vect`), once again for the sake of the example.

In [9]:
env_gym.observation_space = env_gym.observation_space.keep_only_attr(["rho", "gen_p", "load_p", "topo_vect", 
                                                                      "actual_dispatch"])
new_dim_obs_space = np.sum([np.sum(env_gym.observation_space[el].shape).astype(int) 
                        for el in env_gym.observation_space.spaces])
print(f"The new size of the observation space is : "
      f"{new_dim_obs_space} (it was {dim_obs_space} before!)")

The new size of the observation space is : 100 (it was 432 before!)


One other detail: the generation and loads are not scaled (they are given in MW). We recommend scaling them so that their values lie roughly between 0 and 1, for stability during learning.

This can be done pretty easily with the code below:

In [10]:
from grid2op.gym_compat import ScalerAttrConverter
ob_space = env_gym.observation_space
ob_space = ob_space.reencode_space("actual_dispatch",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=env_glop.gen_pmax
                                                       )
                                   )
ob_space = ob_space.reencode_space("gen_p",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=env_glop.gen_pmax
                                                       )
                                   )
# ob_space = ob_space.reencode_space("load_p",
#                                   ScalerAttrConverter(substract=obs_gym["load_p"],
#                                                       divide=0.5 * obs_gym["load_p"]
#                                                       )
#                                   )

env_gym.observation_space = ob_space
env_gym.observation_space

Dict(actual_dispatch:Box(6,), gen_p:Box(6,), load_p:Box(11,), rho:Box(20,), topo_vect:Box(57,))
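
The scaling used above can be read as the affine transform `(x - substract) / divide`, applied element by element. The sketch below illustrates that reading in plain Python. The function name `scale` and the numbers are hypothetical, and the exact behaviour of `ScalerAttrConverter` should be checked in the grid2op documentation.

```python
def scale(values, substract, divide):
    """Element-wise (x - substract) / divide, the assumed form of the scaler."""
    return [(v - s) / d for v, s, d in zip(values, substract, divide)]


# Made-up production (MW) and maximum production (MW) for two generators.
gen_p = [40.0, 80.0]
gen_pmax = [100.0, 160.0]

scaled = scale(gen_p, [0.0, 0.0], gen_pmax)
print(scaled)  # prints [0.4, 0.5]
```

Dividing by the maximum production keeps `gen_p` between 0 and 1 as long as the production is non-negative.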

## 1) RLLIB

This part is not a tutorial on how to use rllib. Please refer to [their documentation](https://docs.ray.io/en/master/rllib.html) for more detailed information.

As explained in the header of this notebook, we will follow the recommended usage:
1. Create a grid2op environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
2. Convert it to a gym environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
3. (optional) Customize the action space and observation space (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
4. Use the framework to train an agent  **(only this part is framework specific)**


The issue with rllib is that it handles neither MultiBinary nor MultiDiscrete action spaces (see https://github.com/ray-project/ray/issues/1519), so we need some way to encode these types of actions. This can be done automatically with the `MultiToTupleConverter` provided in grid2op (as always, more information [in the documentation](https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.MultiToTupleConverter)).

We will then use this to customize our environment previously defined:
    

In [11]:
import copy
env_rllib = copy.deepcopy(env_gym)
from grid2op.gym_compat import MultiToTupleConverter
env_rllib.action_space = env_rllib.action_space.reencode_space("change_bus", MultiToTupleConverter())
env_rllib.action_space = env_rllib.action_space.reencode_space("change_line_status", MultiToTupleConverter())
env_rllib.action_space = env_rllib.action_space.reencode_space("redispatch", MultiToTupleConverter())
env_rllib.action_space

Dict(change_bus:Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2)), change_line_status:Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Dis
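
Conceptually, this conversion turns each component of a vector-valued action into its own entry of a tuple. The sketch below shows that idea on plain Python data. The helper names are hypothetical and this is not the code of `MultiToTupleConverter`.

```python
def multibinary_to_tuple(vector):
    """One tuple entry (0 or 1) per component of the binary vector."""
    return tuple(int(v) for v in vector)


def tuple_to_multibinary(components):
    """Inverse operation: gather the tuple entries back into a vector."""
    return [int(c) for c in components]


# Hypothetical "change_line_status" action on a 4-line grid.
change_line_status = [0, 1, 0, 0]
as_tuple = multibinary_to_tuple(change_line_status)
back = tuple_to_multibinary(as_tuple)
print(as_tuple, back == change_line_status)
```

Because the mapping is one-to-one, converting back and forth should not change the action, which is what the next cell checks on the real spaces.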

In [12]:
act_gym = env_rllib.action_space.sample()
act_glop = env_rllib.action_space.from_gym(act_gym)
act_gym2 = env_rllib.action_space.to_gym(act_glop)
act_glop2 = env_rllib.action_space.from_gym(act_gym2)
for k in act_gym.keys():
    assert np.array_equal(act_gym[k], act_gym2[k]), f"error for {k}"
for k in act_gym2.keys():
    assert np.array_equal(act_gym[k], act_gym2[k]), f"error for {k}"
assert act_glop == act_glop2



In [18]:
act_gym

OrderedDict([('change_bus',
              (0,
               1,
               0,
               1,
               1,
               0,
               0,
               1,
               1,
               1,
               0,
               1,
               0,
               1,
               0,
               1,
               0,
               1,
               1,
               1,
               1,
               0,
               1,
               0,
               0,
               1,
               1,
               0,
               0,
               1,
               0,
               0,
               0,
               0,
               0,
               1,
               1,
               0,
               0,
               1,
               1,
               0,
               0,
               0,
               1,
               1,
               1,
               1,
               1,
               1,
               0,
               1,
               0,
               0,


In [14]:
act_gym2[k]

(5, 4, 0, 0, 0, 5)

## 2) Stable baselines

This part is not a tutorial on how to use stable baselines. Please refer to [their documentation](https://stable-baselines3.readthedocs.io/en/master/) for more detailed information.

As explained in the header of this notebook, we will follow the recommended usage:
1. Create a grid2op environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
2. Convert it to a gym environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
3. (optional) Customize the action space and observation space (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
4. Use the framework to train an agent  **(only this part is framework specific)**


The issue with stable-baselines3 is that it expects standard action / observation types, as explained here:
https://stable-baselines3.readthedocs.io/en/master/guide/algos.html#rl-algorithms

> Non-array spaces such as Dict or Tuple are not currently supported by any algorithm.

Unfortunately, it is not possible to convert an action space of dictionary type to a vector without some "loss of information".
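
A small sketch of what flattening a dictionary into a single vector involves is given below. It is written in plain Python with hypothetical helper names and made-up values, and it is not what grid2op or stable-baselines3 do internally.

```python
def flatten(space_dict, keys):
    """Concatenate the entries of a dict into one flat list of floats."""
    vector = []
    for key in keys:
        vector.extend(float(x) for x in space_dict[key])
    return vector


def unflatten(vector, keys, sizes):
    """Cut the flat vector back into named pieces, using the known sizes."""
    out, start = {}, 0
    for key in keys:
        out[key] = vector[start:start + sizes[key]]
        start += sizes[key]
    return out


# Made-up example mixing a continuous attribute and an integer one.
obs = {"rho": [0.5, 0.9], "topo_vect": [1, 2, 1]}
keys = sorted(obs)
sizes = {key: len(obs[key]) for key in keys}

vec = flatten(obs, keys)
restored = unflatten(vec, keys, sizes)
print(vec)
print(restored["topo_vect"])
```

The values survive the round trip, but the integer nature of `topo_vect` does not: it comes back as floats. For actions this matters more, because a policy that outputs an arbitrary real vector can produce values (such as 1.4) that must be rounded or clipped before they mean anything as a discrete choice.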

TODO

In [15]:
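import copy
from grid2op.gym_compat import BoxGymObsSpace, BoxGymActSpace
env_stable_base = copy.deepcopy(env_gym)
# required for stable_baselines: `vectorize()` does not exist on these spaces, so we ASSUME a
# grid2op version whose gym_compat module provides BoxGymObsSpace and BoxGymActSpace
obs_attr = ["rho", "gen_p", "load_p", "topo_vect", "actual_dispatch"]
env_stable_base.observation_space = BoxGymObsSpace(env_glop.observation_space, attr_to_keep=obs_attr)
# a Box action space can only hold continuous actions (here: the redispatch)
env_stable_base.action_space = BoxGymActSpace(env_glop.action_space, attr_to_keep=["redispatch"])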

In [None]:
env_stable_base.observation_space

In [None]:
from stable_baselines3 import A2C

model = A2C('MlpPolicy', env_stable_base, verbose=1)  # use the environment prepared for stable-baselines3
model.learn(total_timesteps=10000)