
Garbage collection bug for forecast observation #568

Closed
jna29 opened this issue Dec 7, 2023 · 15 comments

Labels
documentation Improvements or additions to documentation

@jna29

jna29 commented Dec 7, 2023

Environment

  • Grid2op version: 1.9.7
  • System: Springdale Open Enterprise Linux 9.2 (Parma)

Bug description

Appending the Observation object returned by forecast_obs = obs.get_forecast_env().reset() to a list class attribute inside a function results in forecast_obs being garbage collected when the function is exited. I believe this is because __del__ is called on forecast_obs._obs_env when the function foo below is exited.

The error persists if we remove LightSimBackend (i.e. use the default backend), if we change the environment, or if we copy forecast_obs before adding it to the list.

How to reproduce


Code snippet

import grid2op
from lightsim2grid import LightSimBackend

env_name = "l2rpn_wcci_2022"
env = grid2op.make(env_name, backend=LightSimBackend())
_ = env.reset()

class Manager:
    obs_list = []

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    Manager.obs_list.append(forecast_obs)
    print(f"forecast_obs: {forecast_obs}")
    print(f"[FOO] Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"[FOO] Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")


if __name__ == '__main__':
    foo()
    print(f"Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")

Current output

forecast_obs: <grid2op.Space.GridObjects.ObservationWCCI2022_l2rpn_wcci_2022 object at 0x7f2f7e5f3e80>
[FOO] Manager.obs_list[0]._obs_env: <grid2op.Environment._obsEnv._ObsEnv_l2rpn_wcci_2022 object at 0x7f2f7e5f3ee0>
[FOO] Manager.obs_list[0]._obs_env.backend is None: False
Manager.obs_list[0]._obs_env: <grid2op.Environment._obsEnv._ObsEnv_l2rpn_wcci_2022 object at 0x7f2f7e5f3ee0>
Manager.obs_list[0]._obs_env.backend is None: True

Expected output

Manager.obs_list[0]._obs_env.backend is None: False

The backend should remain even after we exit foo.

@jna29 jna29 added the bug Something isn't working label Dec 7, 2023
@BDonnot
Collaborator

BDonnot commented Dec 22, 2023

Hello,

This seems to be a bug indeed. In the short term (I will not fix this in 2023) you can call:

forecast_obs = forecast_env.reset().copy()
forecast_obs._obs_env = forecast_obs._obs_env.copy()

EDIT: I now (after further investigation) believe this is not a good idea to do this. See my next post for more information.

@jna29
Author

jna29 commented Jan 5, 2024

Thanks, this worked.

@BDonnot
Collaborator

BDonnot commented Jan 10, 2024

Hello,

Upon further investigation I am not sure this is a bug from grid2op (though it sure looks like it).

What happens here is that when the "forecast environment" is deleted, all the resources it holds are closed. This is a normal / pythonic behaviour.
But as grid2op might use some external libraries (coded in lower level languages such as c++, or even running remotely), I chose to dive a bit deeper into the "ownership" of the attributes. A fancy word to say that "everything belongs to the original environment; if you delete it, it deletes everything it possesses" (unless you force the copies of course, which is not the best solution I think, now that I have looked at it).
This concept of "the environment owns the objects and might delete them regardless of what is done elsewhere" is not really pythonic, I fully agree. I'll document it somewhere.

If I understood your usecase, you want to store the forecasted observations.

Maybe you can use https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecast_arrays or https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecasted_inj

If you want to still be able to perform powerflows with what you have, you can also use the Simulator class which is basically a super light environment that owns its simulator, see https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_simulator
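
To illustrate those two options, here is a minimal, hedged sketch (reusing env and obs from the snippet above; do_nothing is a placeholder action introduced only for the example):

# option 1: keep only the raw forecast data (plain numpy arrays)
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()

# option 2: work with a Simulator, which owns its own copy of the backend
simulator = obs.get_simulator()
do_nothing = env.action_space({})        # placeholder action for the example
sim_next = simulator.predict(act=do_nothing)
print(sim_next.current_obs.rho.max())    # the resulting observation stays usable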

If you detail your needs a bit more, I might help you find the most efficient way to do it :-)

And, if you want minimal changes to the code you sent me, you can of course still keep a reference to each forecast env you create (to avoid its deletion) with:

import grid2op
from lightsim2grid import LightSimBackend

env_name = "l2rpn_wcci_2022"
env = grid2op.make(env_name, backend=LightSimBackend())
_ = env.reset()

class Manager:
    obs_list = []
    for_env_list = []

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    Manager.obs_list.append(forecast_obs)
    Manager.for_env_list.append(forecast_env)
    print(f"forecast_obs: {forecast_obs}")
    print(f"[FOO] Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"[FOO] Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")


if __name__ == '__main__':
    foo()
    print(f"Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")

(I am not sure saving data as a class attribute in this way is a good thing to do in general [just like global variables]; it can lead to wrong behaviour that is really hard to spot and debug. I would advise against it.)
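
If it helps, a hypothetical instance-based variant of the Manager above (same idea, but the stored objects live on an instance rather than on the class itself):

class Manager:
    def __init__(self):
        # per-instance storage instead of class-level (shared) attributes
        self.obs_list = []
        self.for_env_list = []

    def record(self, forecast_env, forecast_obs):
        # keeping a reference to the forecast env prevents its deletion
        self.for_env_list.append(forecast_env)
        self.obs_list.append(forecast_obs)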

@BDonnot BDonnot added documentation Improvements or additions to documentation and removed bug Something isn't working labels Jan 10, 2024
BDonnot added a commit to BDonnot/Grid2Op that referenced this issue Jan 10, 2024
@jna29
Author

jna29 commented Jan 16, 2024

So I need to store the forecasted observations to use forecast_obs.get_env_from_external_forecasts later on in other functions, and I need to make simulations from arbitrary states. The Simulator class doesn't allow this, as the returned observations can't be used later on to create another Simulator (as far as I know). Also, I believe the Simulator.predict method doesn't compute the reward of a (state, action) pair.

@BDonnot
Collaborator

BDonnot commented Jan 16, 2024

Hello,

A Simulator can be copied and you can generate another "observation" from it if you want.

That's how it's designed to work if I remember correctly.

Though I agree, you don't get any reward from the simulator. Only an environment can compute it. That being said, you can compute something like a reward from the observation you got with the simulator.
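
For example, a sketch of a reward-like score computed only from what the Simulator exposes (pseudo_reward and the margin-based formula are illustrations introduced here, not grid2op rewards):

import numpy as np

def pseudo_reward(sim):
    # sim is the Simulator returned by predict(); a diverging powerflow
    # is treated here as the worst possible score
    if not sim.converged:
        return 0.0
    obs = sim.current_obs
    # arbitrary example: average remaining thermal margin on the powerlines
    return float(np.mean(np.clip(1.0 - obs.rho, 0.0, 1.0)))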

@jna29
Author

jna29 commented Jan 17, 2024

So the issue is that I want to perform a prediction from the observation I get from Simulator.predict, but calling sim_stressed.current_obs.get_simulator() where sim_stressed = obs.predict() returns the error:

grid2op.Exceptions.observationExceptions.BaseObservationError: Grid2OpException BaseObservationError "Impossible to build a simulator is the observation space does not support it. This can be the case if the observation is loaded from disk or if the backend cannot be copied for example."

@BDonnot
Collaborator

BDonnot commented Jan 18, 2024

I'm not sure you need the observation for this.

You get a new simulator class (with its observation) each time you call simulator.predict

So basically your workflow could look like:

simulator_init = obs.get_simulator()
# to simulate an action "whatever" (a placeholder) on the current step
simulator1 = simulator_init.predict(act=whatever)
simulator2 = simulator_init.predict(act=whatever2)

# to pass to the next forecast step
forecast_horizon = 1
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()

simulator1_h1 = simulator_init.predict(act=whatever,
                                        new_gen_p=prod_p[forecast_horizon, :],
                                        new_load_p=load_p[forecast_horizon, :])  # etc. for the other injections

This, if I understood correctly, should do what you want.

@jna29
Author

jna29 commented Feb 8, 2024

Sorry for the late reply.

So what you say would work for my use case, other than the fact that I can't compute rewards and termination (whether done=True) without having access to an environment.

@BDonnot
Collaborator

BDonnot commented Feb 8, 2024

I don't know your use case, you did not really describe it. Or if you did, I missed it.

What I'm saying is that from what I understand maybe the Simulator would work.

You have access to a done flag there (not as an output of the predict function, but as an attribute of the Simulator object). And also, as far as I know, every reward in grid2op can be computed using the observation (and maybe past observations). I never encountered a reward that could not be computed from the actual state of the grid.

Not sure if it answers your (non) question. But now you maybe have more information to decide what you want to use and how.

Because at the end, if it works and you're happy about the solution then you can use whatever you want 😊

@jna29
Author

jna29 commented Feb 8, 2024

So I want to be able to perform rollouts with Grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions of the load and generator values from an external source when performing an action. This is why I use a forecast environment, as it allows me to do all of this.

I inspected the Simulator object and I couldn't see any attribute that corresponds to the done flag (I only saw backend, converged, current_obs and error), and the reward classes derived from BaseReward need an Environment object as input to the BaseReward.__call__ method (https://grid2op.readthedocs.io/en/latest/reward.html#grid2op.Reward.BaseReward.__call__). For the latter, are these the correct reward functions to look at?

@BDonnot
Collaborator

BDonnot commented Feb 9, 2024

Hello,

So I want to be able to perform rollouts with Grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions of the load and generator values from an external source when performing an action. This is why I use a forecast environment, as it allows me to do all of this.

Thanks for describing me your usecase. It's clearer now.

Some people have already applied MCTS to grid2op with success, so I'm confident it's doable.

And if you find that ForecastedEnv works for you then no issue at all with me :-)

I inspected the Simulator object and I couldn't see any attribute that corresponds to the done flag (I only saw backend, converged, current_obs and error)

converged should match the done flag.

The main difference for the simulator is that it will not check the validity of the action. It will do the action even though "in reality" this action would be illegal.
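
Something along these lines, as a sketch (some_action is a placeholder, and treating a non-converged powerflow as done is an assumption based on the above):

sim_next = simulator.predict(act=some_action)
done = not sim_next.converged   # roughly: the simulated powerflow diverged
if done:
    print("this action leads to a game-over-like state in the simulation")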

Reward classes derived from BaseReward need an Environment object as input to the BaseReward.__call__ method (https://grid2op.readthedocs.io/en/latest/reward.html#grid2op.Reward.BaseReward.__call__). For the latter, are these the correct reward functions to look at?

Yes these are.

Notice I did not say "currently the reward is computed without the environment". I said that for the vast majority of rewards, you can in fact just use an observation.

For example, for EconomicReward you have:

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if has_error or is_illegal or is_ambiguous:
            res = self.reward_min
        else:
            # compute the cost of the grid
            res = dt_float((env.get_obs(_do_copy=False).prod_p * env.gen_cost_per_MW).sum() * env.delta_time_seconds / 3600.0)
            # we want to minimize the cost by maximizing the reward so let's take the opposite
            res *= dt_float(-1.0)
            # to be sure it's positive, add the highest possible cost
            res += self.worst_cost

        res = np.interp(
            res, [dt_float(0.0), self.worst_cost], [self.reward_min, self.reward_max]
        )
        return dt_float(res)

This reward literally calls env.get_obs(), so it does not use anything from the environment that is not already available in the observation.

Another example is the EpisodeDurationReward, for which you have the implementation:

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        if is_done:
            res = env.nb_time_step
            if np.isfinite(self.total_time_steps):
                res /= self.total_time_steps
        else:
            res = self.reward_min
        return res

Here it uses env.nb_time_step, which is the same as https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.current_step, and also self.total_time_steps, which is initialized a bit above (in the actual code, not in the piece of code shown here) with https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.max_step

So again, no issue with using an observation in this case. And I could continue for most rewards available in grid2op.
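
To make that concrete, a hedged sketch of observation-only variants of those two rewards (economic_score, episode_duration_score, worst_cost and step_duration_minutes are names introduced for the example; the [0, 1] range is an arbitrary choice, and I assume grid attributes like gen_cost_per_MW are readable directly on the observation):

import numpy as np

def economic_score(obs, worst_cost, step_duration_minutes=5.0):
    # same quantity EconomicReward reads through env.get_obs():
    # the generation cost over one step, flipped so that cheaper is better
    cost = (obs.prod_p * obs.gen_cost_per_MW).sum() * step_duration_minutes / 60.0
    return float(np.interp(worst_cost - cost, [0.0, worst_cost], [0.0, 1.0]))

def episode_duration_score(obs):
    # observation-only equivalent of EpisodeDurationReward at the end of an episode
    if np.isfinite(obs.max_step):
        return obs.current_step / obs.max_step
    return obs.current_step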

@jna29
Author

jna29 commented Feb 13, 2024

Thanks for clarifying my questions.

However, I don't understand why the reward functions require an Environment as input. If a reward only depends on the observation, wouldn't it make sense to just have an observation as input instead of an environment?

@BDonnot
Collaborator

BDonnot commented Feb 14, 2024

Because in the RL community, in theory, you can design rewards that use parts of the environment not shown in the observation.
If you look at Markov Decision Processes (MDP) and especially Partially Observable Markov Decision Processes (POMDP), you notice that the reward kernel takes as input the state of the environment (and not the observation) and the action.

Besides, there are settings in which it would not make real sense to use the observation. For example if you consider noisy observations, or a multi agent setting.
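
In (PO)MDP notation the distinction is roughly the following (a generic sketch, not grid2op-specific):

r \colon \mathcal{S} \times \mathcal{A} \to \mathbb{R}
    \qquad \text{(reward kernel, defined on the hidden state } s_t \text{)}

\hat{r} \colon \Omega \times \mathcal{A} \to \mathbb{R},
    \quad o_t \sim \mathbb{O}(\,\cdot \mid s_t\,)
    \qquad \text{(observation-based proxy, all an agent-side reward can use)}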

@jna29
Author

jna29 commented Feb 17, 2024

I see. I'll try and modify the reward function for my purpose, as I want to use reward functions that only use the observation.

@BDonnot
Collaborator

BDonnot commented Feb 19, 2024

Let me know if you encounter any issues :-)

@BDonnot BDonnot closed this as completed Mar 15, 2024