

This page is organized as follows:

Table of Contents


This module defines the Environment, the higher level representation of the world with which a grid2op.Agent.BaseAgent will interact.

The environment receives a grid2op.Action.BaseAction from the grid2op.Agent.BaseAgent in Environment.step and returns a grid2op.Observation.BaseObservation that the grid2op.Agent.BaseAgent will use to choose the next action.

An environment is better used inside a grid2op.Runner.Runner, mainly because runners abstract the interaction between environment and agent, and ensure the environment is properly reset after each episode.


In this section we present some ways to use the Environment class.

Basic Usage

This example is adapted from the gym documentation:

import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments
episode_count = 100  # i want to make 100 episodes

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
for i in range(episode_count):
    obs = env.reset()
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

# Close the env and write monitor result info to disk
env.close()
print("The total reward was {:.2f}".format(total_reward))

What happens here is the following:

  • obs = env.reset() will reset the environment to be usable again. By default, it will load the next "chronics" (you can imagine chronics as the graphics of a video game: it tells where the enemies are located, where the walls and the ground are, etc. Each chronics can be thought of as a different "game level").
  • action = agent.act(obs, reward, done) will choose an action given the observation obs. This action should be of type grid2op.Action.BaseAction (or one of its derived classes). In a video game, that would be you receiving an observation (usually displayed on the screen) and acting on a controller. For example you could choose to go "left" / "right" / "up" or "down". Of course, in the case of a powergrid the actions are more complicated than that.
  • obs, reward, done, info = env.step(action) is the call to go to the next step. You can imagine it as being the next "frame". To continue the parallel with video games, at the previous line you asked "pacman" to go left (for example) and then the next frame is displayed (here returned as a new observation obs).
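
The three calls described above can be wrapped in a small helper. The following is a minimal sketch, not part of grid2op: `run_episode`, `_CountdownEnv` and `_ConstantAgent` are hypothetical names, and the two stand-in classes exist only so that the snippet runs without a powergrid. With grid2op you would pass the real `env` and `agent` instead.

```python
def run_episode(env, agent):
    """Play one episode and return (total_reward, number_of_steps).

    `env` only needs `reset()` and `step(action)`, `agent` only needs
    `act(obs, reward, done)`: the same three calls as described above.
    """
    obs = env.reset()
    reward, done = 0.0, False
    total_reward, nb_steps = 0.0, 0
    while not done:
        action = agent.act(obs, reward, done)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        nb_steps += 1
    return total_reward, nb_steps


class _CountdownEnv:
    """Hypothetical stand-in environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


class _ConstantAgent:
    """Hypothetical stand-in agent: always returns the same (dummy) action."""

    def act(self, obs, reward, done):
        return None


total_reward, nb_steps = run_episode(_CountdownEnv(horizon=3), _ConstantAgent())
```

Using `while not done` instead of `while True` makes the end of the episode explicit, so no `break` can be forgotten.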

You might want to customize this general behaviour in multiple ways:

  • you might want to study only one chronics (equivalent to only one level of a video game), see Study always the same chronics
  • you might want to loop through the chronics, but not always in the same order. If that is the case, you can consult the section Shuffle the chronics order
  • you might also have spotted some chronics that have bad properties. In this case, you can "remove" them from the environment (they will be ignored). This is explained in Skipping some chronics
  • you might also want to select the next chronics at random. This allows a compromise between all the above solutions. Instead of ignoring some chronics you might want to select them less frequently; instead of always using the same one, you can sample it more often; and of course, because the sampling is done randomly, it's unlikely that the order will remain the same. To do that, check the section Sampling the chronics

In a different scenario, you might also want to skip the first time steps of the chronics, which would be equivalent to starting in the "middle" of a video game. If that is the case, the subsection Skipping some time steps is made for you.

Finally, you might have noticed that each call to "env.reset" might take a while. This can dramatically increase the training time, especially at the beginning. This is due to the fact that each time env.reset is called, the whole chronics is read from the hard drive. If you want to lower this impact, you can consult the Optimize the data pipeline section.

Chronics Customization

Study always the same chronics

If you spotted a particularly interesting chronics, or if you want, for some reason, your agent to see only one chronics, you can do this rather easily with grid2op.

All chronics are given a unique persistent ID (it means that as long as the data is not modified, the same chronics will always have the same ID each time you load the environment). The environment has a "set_id" method that allows you to use it. Just add "env.set_id(THE_ID_YOU_WANT)" before the call to "env.reset". This gives the following code:

import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments
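episode_count = 100  # i want to make 100 episodes

# id of the chronics to study (placeholder value, use the id you want)
THE_ID_YOU_WANT = 0

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
for i in range(episode_count):
    # this is the only line to add: select the chronics before each reset
    env.set_id(THE_ID_YOU_WANT)
    obs = env.reset()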

    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

# Close the env and write monitor result info to disk
env.close()
print("The total reward was {:.2f}".format(total_reward))

Shuffle the chronics order

In some other use cases, you might want to go through the whole set of chronics, and then loop again through them, but in a different order (remember that by default it will always loop in the same order 0, 1, 2, 3, ..., 0, 1, 2, 3, ..., 0, 1, 2, 3, ...).

Again, doing so with grid2op is rather easy. To that end you can use the chronics_handler.shuffle function. You can use it like this:

import numpy as np
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments
episode_count = 10000  # i want to make lots of episode

# total number of episode
total_episode = len(env.chronics_handler.subpaths)

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
for i in range(episode_count):

    if i % total_episode == 0:
        # I shuffle each time i need to
        env.chronics_handler.shuffle()

    obs = env.reset()
    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

Skipping some chronics

Some chronics might be too hard to start training with ("learn to walk before running") and conversely some chronics might be too easy after a while (you can basically solve them without doing anything). This is why grid2op gives you some control over which chronics will be used by the environment.

For this purpose you can use the chronics_handler.set_filter function. This function takes a "filtering function" as argument. This "filtering function" takes as argument the full path of the chronics and should return True / False depending on whether or not you want to keep the chronics. Here is an example:

import numpy as np
import re
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments

# this is the only line of code to add
# here i select only the chronics whose path contains "00" followed by a digit
env.chronics_handler.set_filter(lambda path: re.match(".*00[0-9].*", path) is not None)
kept = env.chronics_handler.reset()  # if you don't do that it will not have any effect
print(kept)  # i print the chronics kept

episode_count = 10000  # i want to make lots of episode

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
# it will only use the chronics selected
for i in range(episode_count):
    obs = env.reset()
    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break
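
A filtering function is plain Python, so it can be tried on a few paths before being given to set_filter. The sketch below is an illustration under assumptions: the paths are made up and `starts_with_00` is a hypothetical helper. It also shows that the pattern used in the lambda above matches "00" followed by a digit anywhere in the path, whereas testing the folder name itself is stricter.

```python
import posixpath
import re


def contains_00_digit(path):
    """Same test as the lambda above: "00" followed by a digit anywhere in the path."""
    return re.match(".*00[0-9].*", path) is not None


def starts_with_00(path):
    """Hypothetical stricter filter: the chronics folder name itself starts with "00"."""
    folder_name = posixpath.basename(path.rstrip("/"))
    return folder_name.startswith("00")


# made-up paths, for illustration only
paths = [
    "/data/my_env/chronics/0001",
    "/data/my_env/chronics/0042",
    "/data/my_env/chronics/1003",
]
loose = [p for p in paths if contains_00_digit(p)]  # keeps the three paths
strict = [p for p in paths if starts_with_00(p)]    # keeps only the first two
```

Either function could be passed to set_filter in place of the lambda; which one is appropriate depends on how the chronics folders of your environment are named.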

Sampling the chronics

Finally, for even more flexibility, you can choose to sample which chronics will be used next. To achieve that you can call chronics_handler.sample_next_chronics. This function takes a vector of probabilities as input (if not provided, it assumes all probabilities are equal) and will select an id based on this probability vector.

In the following example we assume that the vector of probabilities is always the same and that we want, for some reason, to oversample the first 10 chronics and undersample the last 10:

import numpy as np
import re
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments

episode_count = 10000  # i want to make lots of episode

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# total number of episode
total_episode = len(env.chronics_handler.subpaths)
probas = np.ones(total_episode)
# oversample the first 10 episode
probas[:10]*= 5
# undersample the last 10 episode
probas[-10:] /= 5

# and now the loop starts
# the next chronics is sampled before each reset
for i in range(episode_count):

    _ = env.chronics_handler.sample_next_chronics(probas)  # this is added
    obs = env.reset()

    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

NB here we have a constant vector of probabilities, but you might imagine adapting it during training, for example to oversample scenarios your agent is having trouble solving.
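
One possible way to adapt the probabilities is sketched below. This is an illustration under assumptions, not a grid2op feature: `update_weights` and the `survival` list (fraction of each chronics the agent survived) are hypothetical. The resulting vector would then be passed to sample_next_chronics as in the example above.

```python
def update_weights(weights, survival, floor=0.1):
    """Return new sampling probabilities: the less a chronics was survived, the higher.

    weights:  current (unnormalized) weight of each chronics
    survival: fraction of the episode survived on each chronics, between 0 and 1
    floor:    minimum factor, so that solved chronics are still sampled sometimes
    """
    new_weights = [w * max(floor, 1.0 - s) for w, s in zip(weights, survival)]
    total = sum(new_weights)
    return [w / total for w in new_weights]  # normalized, sums to 1


weights = [1.0, 1.0, 1.0, 1.0]
survival = [1.0, 0.5, 0.0, 0.9]  # made-up results of the last evaluation
probas = update_weights(weights, survival)
```

With these made-up numbers, the third chronics (never survived) gets the highest probability and the first one (always survived) keeps a small but non-zero probability thanks to `floor`.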

Skipping some time steps

Another way to customize which data your agent will face is to act as if the chronics started at a different date and time. This might be handy in case a scenario is hard at the beginning but less hard at the end, or if you want your agent to learn to start controlling the grid at any date and time (in grid2op most of the chronics data provided start at midnight for example).

To achieve this goal, you can use the BaseEnv.fast_forward_chronics function. This function skips a given number of steps. In the following example, we always skip the first 42 time steps before starting the episode:

import numpy as np
import re
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments

episode_count = 10000  # i want to make lots of episode

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
# each episode starts after the skipped time steps
for i in range(episode_count):
    obs = env.reset()

    # below are the two lines added
    env.fast_forward_chronics(42)  # skip the first 42 time steps
    obs = env.get_obs()  # observation after the skipped steps

    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break
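
To pick the number of steps to skip, it can help to convert between steps and clock time. The sketch below assumes a 5-minute time step and a chronics starting at midnight; both are assumptions to check against your environment. `time_after_skipping` and `steps_to_reach` are hypothetical helpers, not grid2op functions.

```python
from datetime import datetime, timedelta

STEP_DURATION = timedelta(minutes=5)  # assumption: one step lasts 5 minutes


def time_after_skipping(start, nb_steps, step=STEP_DURATION):
    """Date and time at which the episode starts once `nb_steps` steps are skipped."""
    return start + nb_steps * step


def steps_to_reach(start, target, step=STEP_DURATION):
    """Number of whole steps to skip to start the episode at `target`."""
    return (target - start) // step


start = datetime(2019, 1, 1, 0, 0)  # made-up start date, at midnight
after_42 = time_after_skipping(start, 42)  # 42 * 5 min = 3 h 30 min after midnight
to_8am = steps_to_reach(start, datetime(2019, 1, 1, 8, 0))  # 8 h / 5 min = 96 steps
```

Under these assumptions, skipping 42 steps starts the episode at 03:30, and starting at 08:00 requires skipping 96 steps.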

Generating chronics that are always new

New in version 1.6.6: this functionality is only available for some environments, for example "l2rpn_wcci_2022".


A much better alternative to this class is to have a "process" generate the data, thanks to the grid2op.Environment.Environment.generate_data method, and then to reload the data in a (separate) training script.

This is explained in section generate_data_flow of the documentation.

Though it is not recommended at all (for performance reasons), you have, starting from grid2op 1.6.6 (and using a compatible environment, e.g. "l2rpn_wcci_2022"), the possibility to generate a possibly infinite amount of data thanks to the grid2op.Chronics.FromChronix2grid class.

The data generation process is rather slow for different reasons. The main one is that the data need to meet a lot of "constraints" to be realistic, some of which are given in the modeled-elements-module module. On our machines, it takes roughly 40-50 seconds to generate a weekly scenario for the l2rpn_wcci_2022 environment, whereas an agent will usually fail in 1 or 2 s. This is why we do not recommend using it.

To generate data "on the fly" you simply need to create the environment with the right chronics class as follows:

import os
import grid2op
from grid2op.Chronics import FromChronix2grid
env_nm = "l2rpn_wcci_2022"  # only compatible environment at time of writing

env = grid2op.make(env_nm,
                   chronics_class=FromChronix2grid,  # generate the data "on the fly"
                   data_feeding_kwargs={"env_path": os.path.join(grid2op.get_current_local_dir(), env_nm),
                                        "with_maintenance": True,  # whether to include maintenance (optional)
                                        "max_iter": 2 * 288,  # duration (in number of steps) of the data generated (optional)
                                        }
                   )

And this is it. Each time you call env.reset(), it will internally call the chronix2grid package to generate new data for this environment (this is why env.reset() will take roughly 50 s...).


For this class to be available, you need to have the "chronix2grid" package installed and working.

Please install it with pip install grid2op[chronix2grid] and make sure to have the coinor-cbc solver available on your system (more information at


Because I know from experience that warnings are skipped half of the time: please consult generate_data_flow for a better way to generate infinite data!

Generate and use "infinite" data



For this class to be available, you need to have the "chronix2grid" package installed and working.

Please install it with pip install grid2op[chronix2grid] and make sure to have the coinor-cbc solver available on your system (more information at

In this section we present a way to generate a possibly infinite amount of data for training your agent (in case the data shipped with the environment are too limited).

One way to do this is to split the data "generation" process into one python script, and the data "consumption" process (for example training an agent) into another one.

This is much more efficient than using grid2op.Chronics.FromChronix2grid because you will not spend 50 s waiting for the data to be generated at each call to env.reset() after the episode is over.

First, create a script to generate all the data that you want. For example in the script "":

import grid2op            
env_name = "l2rpn_wcci_2022"  # only compatible with what comes next (at time of writing)
env = grid2op.make(env_name)
nb_year = 50  # or any "big" number...
env.generate_data(nb_year=nb_year)  # generates 50 years of data 
# (takes roughly 50s per week, around 45mins per year, in this case 50 * 45 mins = 37.5 hours)
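
The time estimate given in the comment above can be reproduced with a few lines of arithmetic. This sketch only uses the figure quoted on this page (about 50 s per generated week); `generation_time_hours` is a hypothetical helper, and the exact result differs slightly from the rounded 45 minutes per year.

```python
def generation_time_hours(nb_year, seconds_per_week=50.0, weeks_per_year=52):
    """Rough time (in hours) needed to generate `nb_year` years of data."""
    total_seconds = nb_year * weeks_per_year * seconds_per_week
    return total_seconds / 3600.0


one_year_minutes = generation_time_hours(1) * 60.0  # about 43 minutes
fifty_years_hours = generation_time_hours(50)       # about 36 hours
```

Such an estimate is useful to decide how long the generation script should run before the training script has enough data to start.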

Then create a script to "consume" your data, for example by training an agent (say "") [we demonstrate it with l2rpn baselines but you can use whatever you want]:

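import os
import grid2op
from lightsim2grid import LightSimBackend  # highly recommended for speed !

env_name = "l2rpn_wcci_2022"  # only compatible with what comes next (at time of writing)
env = grid2op.make(env_name, backend=LightSimBackend())

# now train an agent
# see l2rpn_baselines package for more information, for example
from l2rpn_baselines.PPO_SB3 import train
nb_iter = 10000  # train for that many iterations
agent_name = "WhateverIWant"  # or any other name
agent_path = os.path.expanduser("~")  # or anywhere else on your computer
# NB the keyword arguments below are assumed from the l2rpn_baselines PPO_SB3 "train"
# function, check them against the version you have installed
trained_agent = train(env,
                      iterations=nb_iter,
                      name=agent_name,
                      save_path=agent_path)
# this agent will be trained only on the data available at the creation of the environment

# the training loop will take some time, so more data will be generated when it's over
# reload them
env.chronics_handler.init_subpath()
env.chronics_handler.reset()

# and retrain your agent including the data you just generated
trained_agent = train(env,
                      iterations=nb_iter,
                      name=agent_name,
                      load_path=agent_path,  # to restart from where you stopped
                      save_path=agent_path)

# once it's over, more time has passed, and more data are available
# reload them
env.chronics_handler.init_subpath()
env.chronics_handler.reset()

# and retrain your agent
trained_agent = train(env,
                      iterations=nb_iter,
                      name=agent_name,
                      load_path=agent_path,
                      save_path=agent_path)

# well you got the idea
# etc. etc.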


This way of doing things will always increase the size of the data on your hard drive. We recommend deleting some of the data from time to time.

Deleting the data should be done before the call to env.chronics_handler.init_subpath().
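
A minimal sketch of such a cleanup is given below. It assumes that each generated scenario is a sub-folder of a single chronics directory and that the oldest folders come first in alphabetical order; both are assumptions to check on your installation. `delete_oldest_chronics` is a hypothetical helper, not a grid2op function. After calling it, reload the remaining data with env.chronics_handler.init_subpath().

```python
import os
import shutil


def delete_oldest_chronics(chronics_dir, nb_to_keep):
    """Delete all but the last `nb_to_keep` sub-folders of `chronics_dir`.

    Sub-folders are sorted by name, so "oldest" here means "first in
    alphabetical order" (an assumption about how the folders are named).
    Returns the names of the deleted folders.
    """
    folders = sorted(
        nm for nm in os.listdir(chronics_dir)
        if os.path.isdir(os.path.join(chronics_dir, nm))
    )
    to_delete = folders[:-nb_to_keep] if nb_to_keep > 0 else folders
    for nm in to_delete:
        shutil.rmtree(os.path.join(chronics_dir, nm))
    return to_delete
```

Since the deletion is permanent, it is safer to try the function on a copy of the data first.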

Splitting into training, validation, test scenarios

In machine learning the "training / validation / test" framework is particularly useful to avoid overfitting and develop models that perform as well as possible.

Grid2op allows for such usage at the environment level. It is possible to "split" an environment into training / validation and test (i.e. using only some chronics for training, some others for validation and some others for testing).

This can be done with:

import grid2op
env_name = "l2rpn_case14_sandbox"  # or any other...
env = grid2op.make(env_name)

# extract 1% of the "chronics" to be used in the validation environment and 1% in the
# test environment. The other 98% will be used for training
nm_env_train, nm_env_val, nm_env_test = env.train_val_split_random(pct_val=1., pct_test=1., add_for_test="test")

# and now you can use the training set only to train your agent:
print(f'The name of the training environment is "{nm_env_train}"')
print(f'The name of the validation environment is "{nm_env_val}"')
print(f'The name of the test environment is "{nm_env_test}"')
env_train = grid2op.make(nm_env_train)

You can then use, in the above case:

import grid2op
env_name = "l2rpn_case14_sandbox"  # matching above

env_train = grid2op.make(env_name+"_train")  # to only use the "training chronics"
# do whatever you want with env_train

And then, at time of validation:

import grid2op
env_name = "l2rpn_case14_sandbox"  # matching above

env_val = grid2op.make(env_name+"_val") # to only use the "validation chronics"
# do whatever you want with env_val

# and of course
env_test = grid2op.make(env_name+"_test")
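
To make the idea of the split concrete, here is a sketch of a random split of chronics names by percentage. It illustrates the principle only and is not the implementation used by train_val_split_random; `split_chronics` and the chronics names are hypothetical.

```python
import random


def split_chronics(names, pct_val=1.0, pct_test=1.0, seed=0):
    """Randomly split `names` into (train, val, test) lists.

    pct_val and pct_test are percentages (between 0 and 100) of the total
    number of chronics; at least one chronics goes to each of val and test.
    """
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    nb_val = max(1, round(len(shuffled) * pct_val / 100.0))
    nb_test = max(1, round(len(shuffled) * pct_test / 100.0))
    val = shuffled[:nb_val]
    test = shuffled[nb_val:nb_val + nb_test]
    train = shuffled[nb_val + nb_test:]
    return train, val, test


names = ["{:04d}".format(i) for i in range(1000)]  # made-up chronics names
train, val, test = split_chronics(names, pct_val=1.0, pct_test=1.0)
```

With 1000 made-up chronics and 1% for validation and test, this gives 980 training, 10 validation and 10 test chronics, with no chronics shared between the three sets.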


Environments can be customized in three major ways:

  • `Backend`: you change the solver that computes the state of the power grid, for example to make it faster or more realistic
  • `Parameters`: you change the behaviour of the Environment. For example you can prevent powerlines from being disconnected when too much current flows on them, etc.
  • `Rules`: you can change the operational constraints that your agent must meet. For example you can allow it to act on more or fewer powerlines in the same action, etc.

You can do these at creation time:

import grid2op
env_name = "l2rpn_case14_sandbox"  # or any other name

# create the regular environment:
env_reg = grid2op.make(env_name)

# to change the backend
# (here using the lightsim2grid faster backend)
from lightsim2grid import LightSimBackend
env_faster = grid2op.make(env_name, backend=LightSimBackend())

# to change the parameters, for example
# to prevent line disconnect when there is overflow
param = env_reg.parameters
param.NO_OVERFLOW_DISCONNECTION = True
env_easier = grid2op.make(env_name, param=param)

Of course you can combine everything. More examples are given in section env_cust_makeenv.

Detailed Documentation by class
