# Grid2op integration with existing frameworks

Try me out interactively with: [![Binder](./img/badge_logo.svg)](https://mybinder.org/v2/gh/rte-france/Grid2Op/master)


**Objectives** This notebook briefly explains how to use grid2op with commonly used RL frameworks.

The structure is always very similar:
1. Create a grid2op environment
2. Convert it to a gym environment
3. (optional) Customize the action space and observation space
4. Use the framework to train an agent

In this notebook, we demonstrate this usage with 3 different frameworks. The code provided here is given as an example, and we make no claim about its performance or fitness for use. More detailed examples will be provided in the l2rpn-baselines repository in due time (work in progress at the time of writing this notebook). The 3 frameworks we will demonstrate are:

- ray (rllib): see [ray on github](https://github.com/ray-project/ray) or [rllib on github](https://github.com/ray-project/ray/blob/master/doc/source/rllib.rst)
- stable-baselines3: see [stable-baselines3 on github](https://github.com/DLR-RM/stable-baselines3)
- tf_agents: see [tf_agents on github](https://github.com/tensorflow/agents)

Other RL frameworks are not covered here. If you already use them, let us know!
- https://github.com/wau/keras-rl2
- https://github.com/deepmind/acme


<img src="https://colab.research.google.com/assets/colab-badge.svg" width="200">
Execute the cell below by removing the `#` characters if you use google colab !

Cell will look like:
```python
import sys
!$sys.executable -m pip install grid2op[optional]  # for use with google colab (grid2Op is not installed by default)
!$sys.executable -m pip install tensorflow torch stable-baselines3 'ray[rllib]' tf_agents
```

It might take a while
<img src="https://colab.research.google.com/assets/colab-badge.svg" width="200">

In [1]:
import sys
# !$sys.executable -m pip install grid2op[optional]  # for use with google colab (grid2Op is not installed by default)
# !$sys.executable -m pip install stable-baselines3 'ray[rllib]' tf_agents

In [2]:
# because this notebook is part of some tests, we train the agent for only a small number of steps
nb_step_train = 0 

## Organisation of this notebook

TODO

## 0) Recommended initial steps

### Create a grid2op environment

This is a rather standard step, with lots of inspiration drawn from the openAI gym framework. Nothing here is specific to a given RL framework.

In [3]:
import grid2op
env_name = "l2rpn_case14_sandbox"
env_glop = grid2op.make(env_name, test=True)  # NOTE: do not set the flag "test=True" for a real usage !
# This flag is here for testing purpose !!!
obs_glop = env_glop.reset()
obs_glop



<grid2op.Space.GridObjects.CompleteObservation_l2rpn_case14_sandbox at 0x7f48d25c4970>

### Convert it to a gym environment

To that end, we recommend using the "gym_compat" module. More information is given in the [official grid2op documentation](https://grid2op.readthedocs.io/en/latest/gym.html)

In [4]:
import gym
import numpy as np
from grid2op.gym_compat import GymEnv
env_gym = GymEnv(env_glop)
print(f"The \"env_gym\" is a gym environment: {isinstance(env_gym, gym.Env)}")
obs_gym = env_gym.reset()
# obs_gym

The "env_gym" is a gym environment: True


### Customize the action space and observation space

This step is optional, but highly recommended.

By default, grid2op actions and observations are huge. Even for this very simple example, their dimensions are large:

In [5]:
dim_act_space = np.sum([np.sum(env_gym.action_space[el].shape) for el in env_gym.action_space.spaces])
print(f"The size of the action space is : "
      f"{dim_act_space}")
dim_obs_space = np.sum([np.sum(env_gym.observation_space[el].shape).astype(int) 
                        for el in env_gym.observation_space.spaces])
print(f"The size of the observation space is : "
      f"{dim_obs_space}")

The size of the action space is : 160
The size of the observation space is : 432


#### Action space
This is partly because, in grid2op, the same concept (*eg* reconnecting a powerline) can be represented in different manners. In this case, you can either "toggle a switch" (if the powerline was connected, it is disconnected; otherwise it is reconnected) or say "I want this line connected, whatever its original state". This behaviour is detailed in the [official grid2op documentation](https://grid2op.readthedocs.io/en/latest/action.html#usage-examples).
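
To make this redundancy concrete, here is a minimal pure-Python sketch. It is not grid2op code: the helpers `apply_change` and `apply_set` are hypothetical and only mimic the two semantics on a list of line statuses (for "set", we assume the usual convention +1 = connect, -1 = disconnect, 0 = do nothing).

```python
def apply_change(status, toggle):
    """'change' semantics: flip the status of every line whose flag is True."""
    return [(not s) if t else s for s, t in zip(status, toggle)]


def apply_set(status, target):
    """'set' semantics: +1 connects, -1 disconnects, 0 leaves the line untouched."""
    return [s if t == 0 else (t == 1) for s, t in zip(status, target)]


# Hypothetical grid with 3 powerlines; the second one is disconnected
status = [True, False, True]

# Two different encodings of "reconnect the second powerline"
after_change = apply_change(status, [False, True, False])
after_set = apply_set(status, [0, 1, 0])

print(after_change == after_set)  # True
```

Both encodings lead to the same grid state, which is why one of the two families of attributes can usually be dropped.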

To reduce the action space (in general by a factor of about 2), you can represent these actions using only the "change" method, for example. You can do that with:

In [6]:
# example: ignore the "set_line_status" and "set_bus" types of actions, which are covered by
# "change_line_status" and "change_bus"

env_gym.action_space = env_gym.action_space.ignore_attr("set_bus").ignore_attr("set_line_status")

new_dim_act_space = np.sum([np.sum(env_gym.action_space[el].shape) for el in env_gym.action_space.spaces])
print(f"The new size of the action space is : {new_dim_act_space}")

The new size of the action space is : 83
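
The two printed sizes can be checked by hand. The sketch below assumes that the default action space holds the five attributes listed in `action_dims`, and that `set_bus` and `set_line_status` have the same dimensions as their "change" counterparts. This breakdown is an assumption, chosen because it matches the totals printed above.

```python
# Assumed per-attribute dimensions (hypothetical breakdown, consistent with the printed totals)
action_dims = {
    "set_bus": 57,
    "change_bus": 57,
    "set_line_status": 20,
    "change_line_status": 20,
    "redispatch": 6,
}

full_size = sum(action_dims.values())

# Attributes removed with `ignore_attr` in the cell above
ignored = {"set_bus", "set_line_status"}
reduced_size = sum(dim for name, dim in action_dims.items() if name not in ignored)

print(full_size, reduced_size)  # 160 83
```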


Grid2op environments allow for both continuous and discrete actions. For the sake of the example, let's "convert" the continuous actions into discrete ones (this is done by "binning" the values, as explained in more detail [in the documentation](https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.ContinuousToDiscreteConverter) )

In [7]:
# example: convert the continuous action type "redispatch" to a discrete action type
from grid2op.gym_compat import ContinuousToDiscreteConverter
env_gym.action_space = env_gym.action_space.reencode_space("redispatch",
                                                           ContinuousToDiscreteConverter(nb_bins=11)
                                                           )

In [8]:
# And now our action space looks like:
env_gym.action_space

Dict(change_bus:MultiBinary(57), change_line_status:MultiBinary(20), redispatch:MultiDiscrete([11 11  1  1  1 11]))
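
The idea behind "binning" can be sketched in a few lines of pure Python. This is only a conceptual illustration: the helper `bin_to_value` and the bounds used are hypothetical, and the exact way `ContinuousToDiscreteConverter` places its bins may differ (refer to its documentation).

```python
def bin_to_value(bin_index, low, high, nb_bins=11):
    """Map a bin index in [0, nb_bins - 1] to an evenly spaced value in [low, high]."""
    if nb_bins < 2:
        # a single bin can only represent one value: take the middle of the range
        return (low + high) / 2.0
    step = (high - low) / (nb_bins - 1)
    return low + bin_index * step


# Hypothetical generator that can be redispatched between -5 MW and +5 MW
values = [bin_to_value(i, -5.0, 5.0) for i in range(11)]
print(values[0], values[5], values[10])  # -5.0 0.0 5.0
```

With an odd number of bins, the middle bin corresponds to "do nothing" in this sketch, which is a convenient property for an agent.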

#### Observation space

For the observation space, we will remove lots of attributes that are not needed here (remember, this is for the sake of the example) and rescale some others so that their values lie roughly between 0 and 1, which stabilizes the learning process.

In [9]:
# first let's see which are the attributes in the observation space:
# More information on
# https://beta-grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
# and 
# https://grid2op.readthedocs.io/en/latest/gym.html#observation-space-and-action-space-customization
env_gym.observation_space

Dict(a_ex:Box(0.0, inf, (20,), float32), a_or:Box(0.0, inf, (20,), float32), actual_dispatch:Box(-140.0, 140.0, (6,), float32), curtailment:Box(0.0, 1.0, (6,), float32), curtailment_limit:Box(0.0, 1.0, (6,), float32), day:Discrete(32), day_of_week:Discrete(8), duration_next_maintenance:Box(-1, 2147483647, (20,), int32), gen_p:Box(0.0, 168.0, (6,), float32), gen_p_before_curtail:Box(0.0, 168.0, (6,), float32), gen_q:Box(-inf, inf, (6,), float32), gen_v:Box(0.0, inf, (6,), float32), hour_of_day:Discrete(24), line_status:MultiBinary(20), load_p:Box(-inf, inf, (11,), float32), load_q:Box(-inf, inf, (11,), float32), load_v:Box(0.0, inf, (11,), float32), minute_of_hour:Discrete(60), month:Discrete(13), p_ex:Box(-inf, inf, (20,), float32), p_or:Box(-inf, inf, (20,), float32), q_ex:Box(-inf, inf, (20,), float32), q_or:Box(-inf, inf, (20,), float32), rho:Box(0.0, inf, (20,), float32), target_dispatch:Box(-140.0, 140.0, (6,), float32), time_before_cooldown_line:Box(0, 10, (20,), int32), time_bef

Let's keep only the information about the flow on the powerlines `rho`, the generation `gen_p`, the load `load_p`, the representation of the topology `topo_vect` and the redispatching currently applied `actual_dispatch` (for the sake of the example, once again)

In [10]:
env_gym.observation_space = env_gym.observation_space.keep_only_attr(["rho", "gen_p", "load_p", "topo_vect", 
                                                                      "actual_dispatch"])
new_dim_obs_space = np.sum([np.sum(env_gym.observation_space[el].shape).astype(int) 
                        for el in env_gym.observation_space.spaces])
print(f"The new size of the observation space is : "
      f"{new_dim_obs_space} (it was {dim_obs_space} before!)")

The new size of the observation space is : 100 (it was 432 before!)


One other detail here: the generation and loads are not scaled (they are given in MW). We recommend scaling them to have values roughly between 0 and 1, for stability during learning.

This can be done pretty easily with the code below:

In [11]:
from grid2op.gym_compat import ScalerAttrConverter
ob_space = env_gym.observation_space
ob_space = ob_space.reencode_space("actual_dispatch",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=env_glop.gen_pmax
                                                       )
                                   )
ob_space = ob_space.reencode_space("gen_p",
                                   ScalerAttrConverter(substract=0.,
                                                       divide=env_glop.gen_pmax
                                                       )
                                   )
ob_space = ob_space.reencode_space("load_p",
                                  ScalerAttrConverter(substract=obs_gym["load_p"],
                                                      divide=0.5 * obs_gym["load_p"]
                                                      )
                                  )

env_gym.observation_space = ob_space
env_gym.observation_space

Dict(actual_dispatch:Box(-1.0, 1.0, (6,), float32), gen_p:Box(0.0, 1.2000000476837158, (6,), float32), load_p:Box(-inf, inf, (11,), float32), rho:Box(0.0, inf, (20,), float32), topo_vect:Box(-1, 2, (57,), int32))
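
As an illustration of what this scaling does, here is a pure-Python sketch, under the assumption that the converter computes `(value - substract) / divide` component by component. The helper `scale` and all the numbers are hypothetical.

```python
def scale(values, substract, divide):
    """Return (value - substract) / divide for each component."""
    return [(v - s) / d for v, s, d in zip(values, substract, divide)]


# Hypothetical values, in MW
gen_pmax = [140.0, 100.0]    # maximum production of two generators
gen_p = [70.0, 25.0]         # current production
load_p_ref = [20.0, 80.0]    # loads observed right after a reset
load_p_now = [22.0, 76.0]    # loads at some later step

# generation: divided by the maximum, so values lie between 0 and 1
scaled_gen = scale(gen_p, [0.0, 0.0], gen_pmax)

# loads: centered on the reference loads, then divided by half of them
scaled_load = scale(load_p_now, load_p_ref, [0.5 * ref for ref in load_p_ref])

print(scaled_gen)
print(scaled_load)
```

In this sketch a load that is 10% above its reference value is encoded as 0.2, and one that is 5% below is encoded as -0.1.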

## 1) RLLIB

This part is not a tutorial on how to use rllib. Please refer to [their documentation](https://docs.ray.io/en/master/rllib.html) for more detailed information.

As explained in the header of this notebook, we will follow the recommended usage:
1. Create a grid2op environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
2. Convert it to a gym environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
3. (optional) Customize the action space and observation space (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
4. Use the framework to train an agent  **(only this part is framework specific)**


The issue with rllib is that it does not handle `MultiBinary` or `MultiDiscrete` action spaces (see https://github.com/ray-project/ray/issues/1519), so we need some way to encode these types of actions. This can be done automatically with the `MultiToTupleConverter` provided in grid2op (as always, more information [in the documentation](https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.MultiToTupleConverter) ).

We will then use this to customize our environment previously defined:
    

In [12]:
import copy
env_rllib = copy.deepcopy(env_gym)
from grid2op.gym_compat import MultiToTupleConverter
env_rllib.action_space = env_rllib.action_space.reencode_space("change_bus", MultiToTupleConverter())
env_rllib.action_space = env_rllib.action_space.reencode_space("change_line_status", MultiToTupleConverter())
env_rllib.action_space = env_rllib.action_space.reencode_space("redispatch", MultiToTupleConverter())
env_rllib.action_space

Dict(change_bus:Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2)), change_line_status:Tuple(Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Discrete(2), Dis

Another specificity of RLLIB is that it handles the creation of environments "on its own". This implies that you need to provide a custom class representing an environment, rather than an already created python object.

And finally, you ask it to use this class, and learn a specific agent. This is really well explained in their documentation: https://docs.ray.io/en/master/rllib-env.html#configuring-environments.

In [13]:
# gym specific: we simply copy-paste what we did in the previous cells, wrapping it in the
# MyEnv class, and train a Proximal Policy Optimisation based agent
import gym
import ray
from ray.rllib.agents import ppo
import numpy as np

      
class MyEnv(gym.Env):
    def __init__(self, env_config):
        import grid2op
        from grid2op.gym_compat import GymEnv
        from grid2op.gym_compat import ScalerAttrConverter, ContinuousToDiscreteConverter, MultiToTupleConverter

        # 1. create the grid2op environment
        if "env_name" not in env_config:
            raise RuntimeError("The configuration for RLLIB should provide the env name")
        # work on a copy so that the configuration passed by rllib is not modified in place
        env_kwargs = dict(env_config)
        nm_env = env_kwargs.pop("env_name")
        self.env_glop = grid2op.make(nm_env, **env_kwargs)

        # 2. create the gym environment
        self.env_gym = GymEnv(self.env_glop)
        obs_gym = self.env_gym.reset()

        # 3. (optional) customize it (see section above for more information)
        ## customize action space
        self.env_gym.action_space = self.env_gym.action_space.ignore_attr("set_bus").ignore_attr("set_line_status")
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("redispatch",
                                                                             ContinuousToDiscreteConverter(nb_bins=11)
                                                                             )
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("change_bus", MultiToTupleConverter())
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("change_line_status",
                                                                             MultiToTupleConverter())
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("redispatch", MultiToTupleConverter())
        ## customize observation space
        ob_space = self.env_gym.observation_space
        ob_space = ob_space.keep_only_attr(["rho", "gen_p", "load_p", "topo_vect", "actual_dispatch"])
        ob_space = ob_space.reencode_space("actual_dispatch",
                                           ScalerAttrConverter(substract=0.,
                                                               divide=self.env_glop.gen_pmax
                                                               )
                                           )
        ob_space = ob_space.reencode_space("gen_p",
                                           ScalerAttrConverter(substract=0.,
                                                               divide=self.env_glop.gen_pmax
                                                               )
                                           )
        ob_space = ob_space.reencode_space("load_p",
                                           ScalerAttrConverter(substract=obs_gym["load_p"],
                                                               divide=0.5 * obs_gym["load_p"]
                                                               )
                                           )
        self.env_gym.observation_space = ob_space

        # 4. specific to rllib
        self.action_space = self.env_gym.action_space
        self.observation_space = self.env_gym.observation_space

    def reset(self):
        obs = self.env_gym.reset()
        return obs

    def step(self, action):
        obs, reward, done, info = self.env_gym.step(action)
        return obs, reward, done, info



In [14]:
test = MyEnv({"env_name": "l2rpn_case14_sandbox"})

And now you can train it:

In [15]:
if nb_step_train:  # remember: don't forget to change this number to perform an actual training !
    # first initialize ray
    ray.init()
    try:
        # then define a "trainer"
        trainer = ppo.PPOTrainer(env=MyEnv, config={
            "env_config": {"env_name":"l2rpn_case14_sandbox"},  # config to pass to env class
        })
        # and then train it for a given number of iterations
        for step in range(nb_step_train):
            trainer.train()
    finally:   
        # shutdown ray
        ray.shutdown()

**NB** We want to emphasize here that:
- This encoding is far from ideal. It is shown as an example, mainly to demonstrate the use of some parts of the gym_compat module.
- The actions in particular are not well suited. Actions in grid2op are relatively complex, and encoding them this way does not seem like a great idea. For example, with this encoding, the agent will have to learn that it cannot act on more than 2 lines or two substations at the same time...
- The "PPO" agent shown here, with default parameters, is unlikely to lead to a good agent. You might want to read the literature on past L2RPN agents or draw some inspiration from the L2RPN baselines packages for more information.

## 2) Stable baselines

This part is not a tutorial on how to use stable baselines. Please refer to [their documentation](https://stable-baselines3.readthedocs.io/en/master/) for more detailed information.

As explained in the header of this notebook, we will follow the recommended usage:
1. Create a grid2op environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
2. Convert it to a gym environment (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
3. (optional) Customize the action space and observation space (see section [0) Recommended initial steps](#0\)-Recommended-initial-steps))
4. Use the framework to train an agent  **(only this part is framework specific)**


The issue with stable baselines 3 is that it expects standard action / observation types, as explained there:
https://stable-baselines3.readthedocs.io/en/master/guide/algos.html#rl-algorithms

> Non-array spaces such as Dict or Tuple are not currently supported by any algorithm.

Unfortunately, it is not possible to convert an action space of dictionary type to a vector without any "loss of information".

It is possible to use the grid2op framework in such cases, and in this section, we will explain how.


First, as always, we convert the grid2op environment into a gym environment.

In [16]:
env_sb = GymEnv(env_glop)  # sb for "stable baselines"
glop_obs = env_glop.reset()

Then, we need to convert everything into a "Box", as it is the only kind of space that stable baselines seems to handle at the time of writing (March 2021).

### Observation Space

We explain here how to convert an observation into a single Box. This step is rather easy: you just need to specify which attributes of the observation you want to keep and whether you want to scale them (with the keywords `substract` and `divide`).

In [17]:
from grid2op.gym_compat import BoxGymObsSpace
env_sb.observation_space = BoxGymObsSpace(env_sb.init_env.observation_space,
                                          attr_to_keep=["gen_p", "load_p", "topo_vect",
                                                        "rho", "actual_dispatch", "connectivity_matrix"],
                                          divide={"gen_p": env_glop.gen_pmax,
                                                  "load_p": glop_obs.load_p,
                                                  "actual_dispatch": env_glop.gen_pmax},
                                          functs={"connectivity_matrix": (
                                                      lambda grid2obs: grid2obs.connectivity_matrix().flatten(),
                                                      0., 1., None, None,
                                                      )
                                                 }
                                         )
obs_gym = env_sb.reset()

In [18]:
obs_gym in env_sb.observation_space

True

**NB**: the above code is equivalent to something like:

```python
import numpy as np
from gym.spaces import Box
from grid2op.dtypes import dt_float

class BoxGymObsSpaceExample(Box):
    def __init__(self, observation_space):
        ob_sp = observation_space
        shape = (ob_sp.n_gen +          # dimension of gen_p
                 ob_sp.n_load +         # load_p
                 ob_sp.dim_topo +       # topo_vect
                 ob_sp.n_line +         # rho
                 ob_sp.n_gen +          # actual_dispatch
                 ob_sp.dim_topo ** 2,   # connectivity_matrix
                 )

        # lowest value the attribute can take (see doc for more information)
        low = np.concatenate((np.full(shape=(ob_sp.n_gen,), fill_value=0., dtype=dt_float),  # gen_p
                              np.full(shape=(ob_sp.n_load,), fill_value=-np.inf, dtype=dt_float),  # load_p
                              np.full(shape=(ob_sp.dim_topo,), fill_value=-1., dtype=dt_float),  # topo_vect
                              np.full(shape=(ob_sp.n_line,), fill_value=0., dtype=dt_float),  # rho
                              np.full(shape=(ob_sp.n_gen,), fill_value=-ob_sp.gen_pmax, dtype=dt_float),  # actual_dispatch
                              np.full(shape=(ob_sp.dim_topo**2,), fill_value=0., dtype=dt_float),  #  connectivity_matrix
                              ))

        # highest value the attribute can take
        high = np.concatenate((np.full(shape=(ob_sp.n_gen,), fill_value=np.inf, dtype=dt_float),  # gen_p
                               np.full(shape=(ob_sp.n_load,), fill_value=np.inf, dtype=dt_float),  # load_p
                               np.full(shape=(ob_sp.dim_topo,), fill_value=2., dtype=dt_float),  # topo_vect
                               np.full(shape=(ob_sp.n_line,), fill_value=np.inf, dtype=dt_float),  # rho
                               np.full(shape=(ob_sp.n_gen,), fill_value=ob_sp.gen_pmax, dtype=dt_float),  # actual_dispatch
                               np.full(shape=(ob_sp.dim_topo**2,), fill_value=1., dtype=dt_float),  #  connectivity_matrix
                               ))
        Box.__init__(self, low=low, high=high, shape=shape, dtype=dt_float)

    def to_gym(self, observation):
        # glop_obs and env_glop are the global variables defined in the cells above
        res = np.concatenate((observation.gen_p / observation.gen_pmax,
                              observation.load_p / glop_obs.load_p,
                              observation.topo_vect.astype(dt_float),
                              observation.rho,
                              observation.actual_dispatch / env_glop.gen_pmax,
                              observation.connectivity_matrix().flatten()
                             ))
        return res.astype(dt_float)
```

So if you want more customization, at the cost of less generic code (the `BoxGymObsSpace` works for all the attributes of the observation), you can adapt the snippet above or read the documentation here (TODO).

Only the "to_gym" function, with this exact signature, is important in this case. It should take an observation in the grid2op format and return the same observation in a form compatible with the gym Box (so a numpy array with the right shape and in the right range).
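
The role of such a conversion can be sketched without grid2op or gym. In the pure-Python illustration below, an observation is a plain dictionary, and the hypothetical helper `to_flat_vector` concatenates the kept attributes into a single flat list after optional scaling. The attribute names are the ones used in this notebook; the values are made up.

```python
def to_flat_vector(obs, attr_to_keep, substract=None, divide=None):
    """Concatenate the selected attributes of `obs` into one flat list,
    applying (value - substract) / divide when a scaling is provided."""
    substract = substract or {}
    divide = divide or {}
    flat = []
    for name in attr_to_keep:
        values = obs[name]
        sub = substract.get(name, [0.0] * len(values))
        div = divide.get(name, [1.0] * len(values))
        flat.extend((v - s) / d for v, s, d in zip(values, sub, div))
    return flat


# Hypothetical observation with 2 generators and 3 powerlines
obs = {"gen_p": [70.0, 35.0], "rho": [0.5, 0.9, 0.2], "hour_of_day": [13]}

vect = to_flat_vector(obs,
                      attr_to_keep=["gen_p", "rho"],
                      divide={"gen_p": [140.0, 70.0]})
print(vect)  # [0.5, 0.5, 0.5, 0.9, 0.2]
```

The order given in `attr_to_keep` fixes the layout of the vector, so it must stay the same between training and evaluation.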
                

### Action space

The goal here is to convert the grid2op actions into something that is neither a Tuple nor a Dict. The main restriction in these frameworks is that they do not allow for easy integration of environments where both discrete and continuous actions are possible.


#### Using a BoxGymActSpace

We can use the same kind of method as explained above, with the class `BoxGymActSpace`. In this case, you need to provide a way to convert a numpy array (an element of a gym Box) into a grid2op action.

**NB** This method is particularly suited if you want to focus on the continuous part of the action space, for example redispatching, curtailment or actions on storage units.

Though we made it possible to also use discrete actions, we do not recommend it. Prefer using the `MultiDiscreteActSpace` for that purpose.

In [37]:
from grid2op.gym_compat import BoxGymActSpace
scale_gen =  env_sb.init_env.gen_max_ramp_up + env_sb.init_env.gen_max_ramp_down
scale_gen[~env_sb.init_env.gen_redispatchable] = 1.0
env_sb.action_space = BoxGymActSpace(env_sb.init_env.action_space,
                                     attr_to_keep=["redispatch"],
                                     multiply={"redispatch": scale_gen},
                                    )
obs_gym = env_sb.reset()

**NB**: the above code is equivalent to something like:

```python
import numpy as np
from gym.spaces import Box
from grid2op.dtypes import dt_float

class BoxGymActSpaceExample(Box):
    def __init__(self, action_space):
        # one component per generator (redispatch)
        shape = (action_space.n_gen,)

        # lowest value the attribute can take (see doc for more information)
        low = np.full(shape=shape, fill_value=-1., dtype=dt_float)

        # highest value the attribute can take
        high = np.full(shape=shape, fill_value=1., dtype=dt_float)

        Box.__init__(self, low=low, high=high, shape=shape, dtype=dt_float)

        self.action_space = action_space

    def from_gym(self, gym_action):
        # scale_gen is the global variable defined in the cell above
        res = self.action_space()
        res.redispatch = gym_action * scale_gen
        return res
```

So if you want more customization, at the cost of less generic code (the `BoxGymActSpace` works for all the attributes of the action), you can adapt the snippet above or read the documentation here (TODO). The only important method you need to code is "from_gym", which should take an action as sampled from the gym Box and return a grid2op action.
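
To illustrate the scaling used for redispatching, here is a pure-Python sketch of the two steps: building the scale (as done with `scale_gen` in the cell above) and multiplying the vector sampled from the Box by it. The helpers `build_scale` and `from_gym_vector` are hypothetical, and the numbers are made up.

```python
def build_scale(max_ramp_up, max_ramp_down, redispatchable):
    """Sum of the ramps for redispatchable generators, 1.0 for the others."""
    return [up + down if ok else 1.0
            for up, down, ok in zip(max_ramp_up, max_ramp_down, redispatchable)]


def from_gym_vector(gym_action, scale):
    """Convert a vector with components in [-1, 1] into redispatch amounts (in MW)."""
    return [a * s for a, s in zip(gym_action, scale)]


# Hypothetical grid with 3 generators, the last one not redispatchable
scale = build_scale(max_ramp_up=[5.0, 10.0, 0.0],
                    max_ramp_down=[5.0, 10.0, 0.0],
                    redispatchable=[True, True, False])

redispatch = from_gym_vector([0.5, -0.25, 0.0], scale)
print(scale)       # [10.0, 20.0, 1.0]
print(redispatch)  # [5.0, -5.0, 0.0]
```

Setting the scale to 1.0 for generators that cannot be redispatched only avoids multiplying by zero ramps; it does not make these generators controllable.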


#### Using a MultiDiscreteActSpace

TODO

### Wrapping all up and starting the training

First, let's make sure our environment is compatible with stable baselines, thanks to their helper function.

This means that 

In [38]:
from stable_baselines3.common.env_checker import check_env
check_env(env_sb)

So as we see, the environment seems to be compatible with stable baselines. Now we can start the training.

In [40]:
from stable_baselines3 import PPO
model = PPO("MlpPolicy", env_sb, verbose=1)
if nb_step_train:
    model.learn(total_timesteps=nb_step_train)
    # model.save("ppo_stable_baselines3")

Using cpu device
Wrapping the env in a DummyVecEnv.
------------------------------------------
| time/                   |              |
|    fps                  | 57           |
|    iterations           | 1            |
|    time_elapsed         | 35           |
|    total_timesteps      | 2048         |
| train/                  |              |
|    approx_kl            | 0.0012200128 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -8.51        |
|    explained_variance   | -8.32e+09    |
|    learning_rate        | 0.0003       |
|    loss                 | 4.93e+07     |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.000874    |
|    std                  | 1            |
|    value_loss           | 9.86e+07     |
------------------------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 55           |
| 

Again, the goal of this section was not to demonstrate how to train a state of the art algorithm, but rather to demonstrate how to use grid2op with the stable baselines repository.

Most importantly, the neural networks used there are not customized for the environment, and default parameters are used. This is unlikely to work at all !

For more information, and for tips and tricks to get started with RL agents, the developers of "stable baselines" have done a really nice job. You can find some tips for training RL agents here
https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
and consult any of the resources listed there https://stable-baselines3.readthedocs.io/en/master/guide/rl.html

![](https://blog.planview.com/wp-content/uploads/2020/02/limiting-work-in-progress.jpg)