grid2op.Environment

Environment

This page is organized as follows:

Table of Contents

Objectives

This module defines the Environment, the higher-level representation of the world with which a grid2op.Agent.BaseAgent will interact.

The environment receives a grid2op.Action.BaseAction from the grid2op.Agent.BaseAgent in Environment.step and returns a grid2op.Observation.BaseObservation that the grid2op.Agent.BaseAgent will use to perform the next action.

An environment is better used inside a grid2op.Runner.Runner, mainly because runners abstract the interaction between environment and agent, and ensure the environment is properly reset after each episode.
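
As an illustration, a minimal sketch of this pattern is given below (it assumes the RandomAgent shipped with grid2op and the "l2rpn_case14_sandbox" environment used later on this page; any other agent or environment works the same way):

import grid2op
from grid2op.Agent import RandomAgent
from grid2op.Runner import Runner

env = grid2op.make("l2rpn_case14_sandbox")
# the runner re-creates the environment internally and takes care of resetting it after each episode
runner = Runner(**env.get_params_for_runner(), agentClass=RandomAgent)
# run 2 episodes; each element of "res" summarizes one episode (chronics used, total reward, number of steps, ...)
res = runner.run(nb_episode=2)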

Usage

In this section we present some ways to use the Environment class.

Basic Usage

This example is adapted from the gym documentation (see gym random_agent.py):

import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments
episode_count = 100  # i want to make 100 episodes

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
for i in range(episode_count):
    obs = env.reset()
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

# Close the env and write monitor result info to disk
env.close()
print("The total reward was {:.2f}".format(total_reward))

What happens here is the following:

  • obs = env.reset() will reset the environment to be usable again. It will load, by default, the next "chronics" (you can imagine chronics as the level data of a video game: it tells where the enemies are located, where the walls are, the ground etc. - each chronics can be thought of as a different "game level").
  • action = agent.act(obs, reward, done) will choose an action given the observation obs. This action should be of type grid2op.Action.BaseAction (or one of its derived classes). In the case of a video game, this would be you receiving an observation (usually displayed on the screen) and acting on a controller. For example you could choose to go "left" / "right" / "up" or "down". Of course in the case of the powergrid the actions are more complicated than that.
  • obs, reward, done, info = env.step(action) is the call to advance to the next step. You can think of it as the next "frame". To continue the parallel with video games, at the previous line you asked "pacman" to go left (for example) and then the next frame is displayed (here returned as a new observation obs).

You might want to customize this general behaviour in multiple ways:

  • you might want to study only one chronics (equivalent to only one level of a video game), see Study always the same chronics
  • you might want to loop through the chronics, but not always in the same order. If that is the case you might want to consult the section Shuffle the chronics order
  • you might also have spotted some chronics that have bad properties. In this case, you can "remove" them from the environment (they will be ignored). This is explained in Skipping some chronics
  • you might also want to select the next chronics at random. This offers a compromise between all the solutions above: instead of ignoring some chronics you can simply select them less frequently, instead of always using the same one you can sample it more often, and because the sampling is done randomly it is unlikely that the order will remain the same. To use that you can check the Sampling the chronics section

In other scenarios, you might also want to skip the first time steps of the chronics, which would be equivalent to starting in the "middle" of a video game. If that is the case, the subsection Skipping some time steps is made for you.

Finally, you might have noticed that each call to "env.reset" can take a while. This can dramatically increase the training time, especially at the beginning. This is because each time env.reset is called, the whole chronics is read from the hard drive. If you want to lower this impact you can consult the Optimize the data pipeline section.
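
As a preview, a minimal sketch of one such optimization is given below. It assumes the MultifolderWithCache class described in that section (the regular expression used for the filter is only an example); with it, the kept chronics are read from the hard drive once and then served from memory:

import re
import grid2op
from grid2op.Chronics import MultifolderWithCache

env = grid2op.make("l2rpn_case14_sandbox", chronics_class=MultifolderWithCache)
# only cache the chronics you actually plan to use (here: those whose path contains "00x")
env.chronics_handler.real_data.set_filter(lambda path: re.match(".*00[0-9].*", path) is not None)
env.chronics_handler.real_data.reset()  # loads the kept chronics in memory once
# subsequent calls to env.reset() reuse the cached data instead of reading the hard drive again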

Chronics Customization

Study always the same chronics

If you spotted a particularly interesting chronics, or if you want, for some reason, your agent to see only one chronics, you can do this rather easily with grid2op.

All chronics are given a unique persistent ID (meaning that, as long as the data is not modified, the same chronics will always have the same ID each time you load the environment). The environment has a "set_id" method that allows you to use it. Just add "env.set_id(THE_ID_YOU_WANT)" before the call to "env.reset". This gives the following code:

import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments
episode_count = 100  # i want to make 100 episodes

###################################
THE_CHRONIC_ID = 42
###################################

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
for i in range(episode_count):
    ###################################
    env.set_id(THE_CHRONIC_ID)
    ###################################

    obs = env.reset()

    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

# Close the env and write monitor result info to disk
env.close()
print("The total reward was {:.2f}".format(total_reward))

(as always, the lines added compared to the base code are highlighted: they are "circled" with #####)

Shuffle the chronics order

In some other use cases, you might want to go through the whole set of chronics, and then loop through them again, but in a different order (remember that by default it will always loop in the same order 0, 1, 2, 3, ..., 0, 1, 2, 3, ..., 0, 1, 2, 3, ...).

Again, doing so with grid2op is rather easy. To that end you can use the chronics_handler.shuffle function that will do exactly that. You can use it like this:

import numpy as np
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments
episode_count = 10000  # i want to make lots of episodes

# total number of episode
total_episode = len(env.chronics_handler.subpaths)

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
for i in range(episode_count):

    ###################################
    if i % total_episode == 0:
        # I shuffle each time i need to
        env.chronics_handler.shuffle()
    ###################################

    obs = env.reset()
    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

(as always, the lines added compared to the base code are highlighted: they are "circled" with #####)

Skipping some chronics

Some chronics might be too hard to start a training ("learn to walk before running") and conversely some chronics might be too easy after a while (you can basically solve them without doing anything). This is why grid2op allows you to have some control over which chronics will be used by the environment.

For this purpose you can use the chronics_handler.set_filter function. This function takes a "filtering function" as argument. This "filtering function" takes as argument the full path of the chronics and should return True / False depending on whether or not you want to keep the said chronics. Here is an example:

import numpy as np
import re
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments


###################################
# this is the only line of code to add
# here i select only the chronics that start by "00"
env.chronics_handler.set_filter(lambda path: re.match(".*00[0-9].*", path) is not None)
kept = env.chronics_handler.reset()  # if you don't do that it will not have any effect
print(kept)  # i print the chronics kept
###################################

episode_count = 10000  # i want to make lots of episodes

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
# it will only use the chronics selected
for i in range(episode_count):
    obs = env.reset()
    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

(as always, the lines added compared to the base code are highlighted: they are "circled" with #####)

Sampling the chronics

Finally, for even more flexibility, you can choose to sample which chronics will be used next. To achieve that you can call the chronics_handler.sample_next_chronics function. This function takes a vector of probabilities as input (if not provided it assumes all probabilities are equal) and will select an id based on this probability vector.

In the following example we assume that the vector of probabilities is always the same and that we want, for some reason, to oversample the first 10 chronics and undersample the last 10:

import numpy as np
import re
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments

episode_count = 10000  # i want to make lots of episodes

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

###################################
# total number of episode
total_episode = len(env.chronics_handler.subpaths)
probas = np.ones(total_episode)
# oversample the first 10 episode
probas[:10] *= 5
# undersample the last 10 episode
probas[-10:] /= 5
###################################

# and now the loop starts
# it will only use the chronics selected
for i in range(episode_count):

    ###################################
    _ = env.chronics_handler.sample_next_chronics(probas)  # this is added
    ###################################
    obs = env.reset()

    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

(as always, the lines added compared to the base code are highlighted: they are "circled" with #####)

NB here we have a constant vector of probabilities, but you might imagine adapting it during the training, for example to oversample scenarios your agent is having trouble solving.
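
A possible sketch of such an adaptive scheme is given below. It assumes, as suggested above, that sample_next_chronics returns the index of the selected chronics; the way the probabilities are updated is purely illustrative:

import numpy as np
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments

total_episode = len(env.chronics_handler.subpaths)
# number of steps the agent survived on each chronics (start at 1 to avoid dividing by zero)
survived = np.ones(total_episode)

reward = 0
done = False
for i in range(1000):
    # chronics on which the agent "died" early are sampled more often
    probas = 1.0 / survived
    chron_id = env.chronics_handler.sample_next_chronics(probas)
    obs = env.reset()

    nb_step = 0
    while True:
        action = agent.act(obs, reward, done)
        obs, reward, done, info = env.step(action)
        nb_step += 1
        if done:
            break
    # remember how long the agent survived on this chronics
    survived[chron_id] = nb_step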

Skipping some time steps

Another way to customize which data your agent will face is to make it as if the chronics started at a different date and time. This might be handy in case a scenario is hard at the beginning but less hard at the end, or if you want your agent to learn to start controlling the grid at any date and time (in grid2op most of the chronics data provided start at midnight for example).

To achieve this goal, you can use the BaseEnv.fast_forward_chronics function. This function skips a given number of steps. In the following example, we always skip the first 42 time steps before starting the episode:

import numpy as np
import re
import grid2op
from grid2op.Agent import RandomAgent
env = grid2op.make()
agent = RandomAgent(env.action_space)
env.seed(0)  # for reproducible experiments

episode_count = 10000  # i want to make lots of episodes

# i initialize some useful variables
reward = 0
done = False
total_reward = 0

# and now the loop starts
# each episode will skip the first 42 time steps
for i in range(episode_count):
    obs = env.reset()

    ###################################
    # below are the two lines added
    env.fast_forward_chronics(42)
    obs = env.get_obs()
    ###################################

    # now play the episode as usual
    while True:
       action = agent.act(obs, reward, done)
       obs, reward, done, info = env.step(action)
       total_reward += reward
       if done:
           # in this case the episode is over
           break

(as always, the lines added compared to the base code are highlighted: they are "circled" with #####)

Generating chronics that are always new

New in version 1.6.6: This functionality is only available for some environments, for example "l2rpn_wcci_2022".

Warning

A much better alternative to this class is to have a "process" generate the data, thanks to the grid2op.Environment.Environment.generate_data method, and then to reload the data in a (separate) training script.

This is explained in section generate_data_flow of the documentation.

Though it is not recommended at all (for performance reasons), you have, starting from grid2op 1.6.6 (and using a compatible environment, e.g. "l2rpn_wcci_2022"), the possibility to generate a virtually infinite amount of data thanks to the grid2op.Chronics.FromChronix2grid class.

The data generation process is rather slow for different reasons. The main one is that the data need to meet a lot of "constraints" to be realistic; some of them are given in the modeled-elements-module module. On our machines, it takes roughly 40-50 seconds to generate a weekly scenario for the l2rpn_wcci_2022 environment (usually an agent will fail in 1 or 2s... this is why we do not recommend using it).

To generate data "on the fly" you simply need to create the environment with the right chronics class as follows:

import os
import grid2op
from grid2op.Chronics import FromChronix2grid
env_nm = "l2rpn_wcci_2022"  # only compatible environment at time of writing

env = grid2op.make(env_nm,
                   chronics_class=FromChronix2grid,
                   data_feeding_kwargs={"env_path": os.path.join(grid2op.get_current_local_dir(), env_nm),
                                        "with_maintenance": True,  # whether to include maintenance (optional)
                                        "max_iter": 2 * 288,  # duration (in number of steps) of the data generated (optional)
                                        }
                   )

And this is it. Each time you call env.reset() it will internally call the chronix2grid package to generate new data for this environment (this is why env.reset() will take roughly 50s...).

Warning

For this class to be available, you need to have the "chronix2grid" package installed and working.

Please install it with pip install grid2op[chronix2grid] and make sure to have the coinor-cbc solver available on your system (more information at https://github.com/bdonnot/chronix2grid#installation)

Warning

Because I know from experience warnings are skipped half of the time: please consult generate_data_flow for a better way to generate infinite data !

Generate and use an "infinite" amount of data

New in version 1.6.6

Warning

For this class to be available, you need to have the "chronix2grid" package installed and working.

Please install it with pip install grid2op[chronix2grid] and make sure to have the coinor-cbc solver available on your system (more information at https://github.com/bdonnot/chronix2grid#installation)

In this section we present a new way to generate a possibly infinite amount of data for training your agent (in case the data shipped with the environment are too limited).

One way to do this is to split the data "generation" process into one python script, and the data "consumption" process (for example training an agent) into another one.

This is much more efficient than using the grid2op.Chronics.FromChronix2grid class because you will not spend 50s waiting for the data to be generated at each call to env.reset() after the episode is over.

First, create a script to generate all the data that you want. For example in the script "generation.py":

import grid2op            
env_name = "l2rpn_wcci_2022"  # only compatible with what comes next (at time of writing)
env = grid2op.make(env_name)
nb_year = 50  # or any "big" number...
env.generate_data(nb_year=nb_year)  # generates 50 years of data 
# (takes roughly 50s per week, around 45mins per year, in this case 50 * 45 mins = 37.5 hours)

Then create a script to "consume" your data, for example by training an agent (say "train.py") [we demonstrate it with l2rpn_baselines but you can use whatever you want]:

import os
import grid2op
from lightsim2grid import LightSimBackend  # highly recommended for speed !

env_name = "l2rpn_wcci_2022"  # only compatible with what comes next (at time of writing)
env = grid2op.make(env_name, backend=LightSimBackend())

# now train an agent
# see l2rpn_baselines package for more information, for example
# l2rpn-baselines.readthedocs.io/
from l2rpn_baselines.PPO_SB3 import train
nb_iter = 10000  # train for that many iterations
agent_name = "WhaetverIWant"  # or any other name
agent_path = os.path.expand("~")  # or anywhere else on your computer
trained_agent = train(env,
                      iterations=nb_iter,
                      name=agent_name,
                      save_path=agent_path)
# this agent will be trained only on the data available at the creation of the environment

# the training loop will take some time, so more data will be generated when it's over
# reload them
env.chronics_handler.init_subpath()
env.chronics_handler.reset()

# and retrain your agent including the data you just generated
trained_agent = train(env,
                      iterations=nb_iter,
                      name=agent_name,
                      save_path=agent_path,
                      load_path=agent_path
                      )

# once it's over, more time has passed, and more data are available
# reload them
env.chronics_handler.init_subpath()
env.chronics_handler.reset()

# and retrain your agent
trained_agent = train(env,
                      iterations=nb_iter,
                      name=agent_name,
                      save_path=agent_path,
                      load_path=agent_path
                      )

# well you got the idea
# etc. etc.

Warning

This way of doing things will keep increasing the amount of data on your hard drive. We do recommend deleting some of the data from time to time.

Deleting the data should be done before the call to env.chronics_handler.init_subpath(), for example as sketched below:
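
A minimal sketch of such a cleanup is given below. It assumes that env.chronics_handler.subpaths contains the full path of each scenario folder (as used earlier on this page) and that the oldest scenarios can safely be removed; double check the paths before deleting anything from your hard drive:

import os
import shutil
import grid2op

env_name = "l2rpn_wcci_2022"
env = grid2op.make(env_name)

# keep only the 100 most recently generated scenarios (arbitrary threshold)
all_scenarios = sorted(env.chronics_handler.subpaths, key=os.path.getmtime)
for path in all_scenarios[:-100]:
    shutil.rmtree(path)  # /!\ permanently deletes this scenario from the hard drive

# then reload the remaining data as shown above
env.chronics_handler.init_subpath()
env.chronics_handler.reset()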

Splitting into training, validation, test scenarios

In machine learning the "training / validation / test" framework is particularly useful to avoid overfitting and develop models that are as performant as possible.

Grid2op allows for such usage at the environment level. There is the possibility to "split" an environment into training / validation and test (i.e. using only some chronics for training, some others for validation and some others for testing).

This can be done with:

import grid2op
env_name = "l2rpn_case14_sandbox"  # or any other...
env = grid2op.make(env_name)

# extract 1% of the "chronics" to be used in the validation environment and 1% in the test
# environment. The remaining 98% are kept for training
nm_env_train, nm_env_val, nm_env_test = env.train_val_split_random(pct_val=1., pct_test=1.)

# and now you can use the training set only to train your agent:
print(f"The name of the training environment is \\"{nm_env_train}\\"")
print(f"The name of the validation environment is \\"{nm_env_val}\\"")
print(f"The name of the test environment is \\"{nm_env_test}\\"")
env_train = grid2op.make(nm_env_train)

You can then use, in the above case:

import grid2op
env_name = "l2rpn_case14_sandbox"  # matching above

env_train = grid2op.make(env_name+"_train")  # to only use the "training chronics"
# do whatever you want with env_train

And then, at time of validation:

import grid2op
env_name = "l2rpn_case14_sandbox"  # matching above

env_val = grid2op.make(env_name+"_val") # to only use the "validation chronics"
# do whatever you want with env_val

# and of course
env_test = grid2op.make(env_name+"_test")

Customization

Environments can be customized in three major ways:

  • `Backend`: you change the solver that computes the state of the power grid, making the computation faster or more realistic
  • `Parameters`: you change the behaviour of the Environment. For example you can prevent powerlines from being disconnected when too much current flows through them, etc.
  • `Rules`: you change the operational constraints that your agent must meet. For example you can allow actions to affect more or fewer powerlines at the same time, etc. (a possible sketch is given after the code block below)

You can do these at creation time:

import grid2op
env_name = "l2rpn_case14_sandbox"  # or any other name

# create the regular environment:
env_reg = grid2op.make(env_name)

# to change the backend
# (here using the lightsim2grid faster backend)
from lightsim2grid import LightSimBackend
env_faster = grid2op.make(env_name, backend=LightSimBackend())

# to change the parameters, for example
# to prevent line disconnect when there is overflow
param = env_reg.parameters
param.NO_OVERFLOW_DISCONNECTION = True
env_easier = grid2op.make(env_name, param=param)
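
The rules can be changed in a similar way. A possible sketch is given below; it assumes the AlwaysLegal rules class shipped with grid2op and the gamerules_class keyword of grid2op.make (check the Rules documentation for the class matching your needs):

from grid2op.Rules import AlwaysLegal

# to relax the operational rules (here: every action is considered legal)
env_no_rules = grid2op.make(env_name, gamerules_class=AlwaysLegal)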

Of course you can combine everything. More examples are given in section env_cust_makeenv.

Detailed Documentation by class

grid2op.Environment