Garbage collection bug for forecast observation #568
Comments
Hello,

This seems to be a bug indeed. In the short term (I will not fix this in 2023) you can call:

```python
forecast_obs = forecast_env.reset().copy()
forecast_obs._obs_env = forecast_obs._obs_env.copy()
```

EDIT: After further investigation, I now believe this is not a good idea. See my next post for more information.
Thanks, this worked.
Hello,

Upon further investigation I am not sure this is a bug in grid2op (though it sure looks like it). What happens here is that when the "forecast environment" is deleted, all the resources it holds are closed. This is normal / pythonic behaviour.

If I understood your use case, you want to store the forecasted observations. Maybe you can use https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecast_arrays or https://grid2op.readthedocs.io/en/latest/observation.html#grid2op.Observation.BaseObservation.get_forecasted_inj

If you want to still be able to perform powerflows with what you have, you can also use the `Simulator`. If you detail a bit more your needs I might help you find the most efficient way to do it :-)

And, if you want minimal changes to the code you sent me, you can of course still keep a reference to each forecast env you create (to avoid their deletion) with:

```python
import grid2op
from lightsim2grid import LightSimBackend

env_name = "l2rpn_wcci_2022"
env = grid2op.make(env_name, backend=LightSimBackend())
_ = env.reset()

class Manager:
    obs_list = []
    for_env_list = []

def foo():
    obs = env.reset()
    forecast_env = obs.get_forecast_env()
    forecast_obs = forecast_env.reset()
    Manager.obs_list.append(forecast_obs)
    Manager.for_env_list.append(forecast_env)
    print(f"forecast_obs: {forecast_obs}")
    print(f"[FOO] Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"[FOO] Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")

if __name__ == '__main__':
    foo()
    print(f"Manager.obs_list[0]._obs_env: {Manager.obs_list[0]._obs_env}")
    print(f"Manager.obs_list[0]._obs_env.backend is None: {Manager.obs_list[0]._obs_env.backend is None}")
```

(I am not sure saving some data as a class attribute in this way is a good thing to do in general [just like global variables]; it might lead to really hard to spot and debug wrong behaviour. I would advise against it.)
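The "normal / pythonic behaviour" described above can be illustrated without grid2op at all. The following sketch uses entirely hypothetical stand-in classes (`Backend`, `ForecastEnv`, `ObsEnv`, `Observation`) to mimic the mechanism: the forecast env closes the backend it shares with the observation's `_obs_env` when it is garbage collected, unless an extra reference keeps it alive:

```python
class Backend:
    """Stand-in for a grid2op backend (hypothetical)."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class ForecastEnv:
    """Owns the backend; closes it when garbage collected, like grid2op envs do."""
    def __init__(self, backend):
        self.backend = backend

    def __del__(self):
        self.backend.close()

class ObsEnv:
    """Shares the forecast env's backend (like forecast_obs._obs_env)."""
    def __init__(self, backend):
        self.backend = backend

class Observation:
    def __init__(self, obs_env):
        self._obs_env = obs_env

def make_obs(keep_env_in=None):
    backend = Backend()
    forecast_env = ForecastEnv(backend)
    obs = Observation(ObsEnv(backend))
    if keep_env_in is not None:
        keep_env_in.append(forecast_env)  # extra reference keeps the env alive
    return obs
    # without an extra reference, forecast_env is collected when the function
    # returns, its __del__ runs, and the shared backend is closed

obs1 = make_obs()                  # env not kept -> backend gets closed
envs = []
obs2 = make_obs(keep_env_in=envs)  # env kept alive -> backend stays open
print(obs1._obs_env.backend.closed)  # True
print(obs2._obs_env.backend.closed)  # False
```

This is why appending the forecast env to `Manager.for_env_list` above keeps the observation usable: the extra reference prevents `__del__` from running.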
So I need to store the forecasted observations to use them later.
Hello,

That's how it's designed to work, if I remember correctly. Though I agree, you don't get any reward from the simulator. Only an environment can compute it. That being said, you can compute something like a reward from the observation you got with the simulator.
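As a sketch of "computing something like a reward from the observation": with plain numpy arrays standing in for the observation's `prod_p` and the grid's `gen_cost_per_MW` (all values here are hypothetical), an economic cost can be computed with no environment at all:

```python
import numpy as np

# Hypothetical observation data: active production per generator (MW)
prod_p = np.array([80.0, 120.0, 50.0])
# Hypothetical cost per MW for each generator
gen_cost_per_MW = np.array([40.0, 25.0, 70.0])

# Cost of operating the grid over one 5-minute step
delta_time_seconds = 300.0
cost = (prod_p * gen_cost_per_MW).sum() * delta_time_seconds / 3600.0
print(cost)  # 9700 MW-cost scaled to 5 minutes -> 808.33...
```

Any quantity derived this way from the simulator's observation can serve as a reward proxy, even though the simulator itself never computes one.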
So the issue is that I want to perform a prediction from the observation I get from the simulator.
I'm not sure you need the observation for this. You get a new simulator class (with its observation) each time you call `obs.get_simulator()`.

So basically your workflow could look like:

```python
simulator_init = obs.get_simulator()

# to simulate action `whatever` on the current step
simulator1 = simulator_init.predict(act=whatever)
simulator2 = simulator_init.predict(act=whatever2)

# to pass to the next forecast step
forecast_horizon = 1
load_p, load_q, prod_p, prod_v, maintenance = obs.get_forecast_arrays()
simulator1_h1 = simulator_init.predict(act=whatever,
                                       new_gen_p=prod_p[forecast_horizon, :],
                                       new_load_p=load_p[forecast_horizon, :],
                                       # etc.
                                       )
```

This, if I understood correctly, should do what you want.
Sorry for the late reply. So what you say would work for my use case, other than the fact that I can't compute rewards and terminality (whether the episode is done).
I don't know your use case, you did not really describe it. Or if you did, I missed it. What I'm saying is that, from what I understand, maybe the Simulator would work.

Not sure if it answers your question. But now you have maybe more information to decide what you want to use and how. Because at the end, if it works and you're happy about the solution, then you can use whatever you want 😊
So I want to be able to perform rollouts with grid2op environments in an MCTS style, where I can start a rollout of arbitrary length from an arbitrary state I have visited before and get the next state, reward and done flag after every action. I would also like to inject predictions from an external source of the load and generator values when performing an action. This is why I use a forecast environment, as it allows me to do all of this. I inspected the Simulator object and I couldn't see any attribute that corresponds to the reward or the done flag.
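A sketch of the rollout loop described here, written against a gym-style `reset`/`step` interface. `DummyForecastEnv` is a hypothetical stand-in so the sketch runs without grid2op; in real code, `obs.get_forecast_env()` would play its role:

```python
class DummyForecastEnv:
    """Hypothetical stand-in for a grid2op forecast environment."""
    def __init__(self, horizon=12):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"step": self.t}  # toy observation

    def step(self, action):
        self.t += 1
        obs = {"step": self.t}
        reward = 1.0                   # toy reward
        done = self.t >= self.horizon  # terminality flag
        return obs, reward, done, {}

def rollout(env, policy, max_len=10):
    """MCTS-style rollout: play `policy` from the env's current state,
    accumulating reward until `done` or the length cap is reached."""
    obs = env.reset()
    total, done, steps = 0.0, False, 0
    while not done and steps < max_len:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        steps += 1
    return total, done

total, done = rollout(DummyForecastEnv(horizon=5), policy=lambda obs: 0)
print(total, done)  # 5 steps until done -> 5.0 True
```

With grid2op, each node expansion would call `obs.get_forecast_env()` (keeping a reference to the env alive, per the earlier comments) and pass real grid2op actions to `step`.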
Hello,
Thanks for describing your use case, it's clearer now. Some people have already applied MCTS to grid2op with success, so I'm confident it's doable.
The main difference for the simulator is that it will not check the validity of the action. It will perform the action even though "in reality" this action would be illegal.
Yes, they are. Notice I did not say "currently the reward is computed without the environment". I said that for the vast majority of rewards, in fact, you can just use an observation. For example, consider this reward:

```python
def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if has_error or is_illegal or is_ambiguous:
        res = self.reward_min
    else:
        # compute the cost of the grid
        res = dt_float(
            (env.get_obs(_do_copy=False).prod_p * env.gen_cost_per_MW).sum()
            * env.delta_time_seconds
            / 3600.0
        )
        # we want to minimize the cost by maximizing the reward so let's take the opposite
        res *= dt_float(-1.0)
        # to be sure it's positive, add the highest possible cost
        res += self.worst_cost
        res = np.interp(
            res, [dt_float(0.0), self.worst_cost], [self.reward_min, self.reward_max]
        )
    return dt_float(res)
```

This reward literally calls `env.get_obs()`, so it does not use anything from the environment that would be hidden from the observation. Another example:

```python
def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
    if is_done:
        res = env.nb_time_step
        if np.isfinite(self.total_time_steps):
            res /= self.total_time_steps
    else:
        res = self.reward_min
    return res
```

Here it uses `env.nb_time_step` only. So again, no issue to use an observation in this case. And I could continue for most rewards available in grid2op.
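Following the first reward above, here is a sketch of how its logic can be reproduced from observation data alone. `BaseReward` is stubbed out so the sketch runs without grid2op (in real code you would subclass grid2op's reward base class), and all numeric values are hypothetical:

```python
import numpy as np

class BaseReward:
    """Stub standing in for grid2op's reward base class (hypothetical)."""
    def __init__(self):
        self.reward_min = -1.0
        self.reward_max = 1.0

class ObsOnlyEconomicReward(BaseReward):
    """Same shape as the economic reward above, but fed only observation data."""
    def __init__(self, worst_cost, delta_time_seconds=300.0):
        super().__init__()
        self.worst_cost = worst_cost
        self.delta_time_seconds = delta_time_seconds

    def from_obs(self, prod_p, gen_cost_per_MW):
        # cost of the grid over one step
        res = float((prod_p * gen_cost_per_MW).sum()
                    * self.delta_time_seconds / 3600.0)
        # minimize the cost by maximizing the reward -> take the opposite
        res *= -1.0
        # shift to be positive by adding the highest possible cost
        res += self.worst_cost
        # rescale into [reward_min, reward_max]
        return float(np.interp(res, [0.0, self.worst_cost],
                               [self.reward_min, self.reward_max]))

rew = ObsOnlyEconomicReward(worst_cost=2000.0)
r = rew.from_obs(np.array([80.0, 120.0, 50.0]),   # hypothetical prod_p
                 np.array([40.0, 25.0, 70.0]))    # hypothetical gen costs
print(r)
```

The only change from the environment-based version is that `prod_p` and `gen_cost_per_MW` are passed in directly instead of being read via `env.get_obs()`.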
Thanks for clarifying my questions. However, I don't understand why the reward functions require an environment as input.
Because in the RL community, in theory you can design some rewards that use parts of the environment not shown in the observation. Besides, you have settings in which it would not make real sense to use the observation. For example, if you consider noisy observations, or a multi-agent setting.
I see. I'll try to modify the reward function for my purpose, as I want to use reward functions that only use the observation.
Let me know if you encounter any issues :-)
Environment

- grid2op version: 1.9.7
- System: Springdale Open Enterprise Linux 9.2 (Parma)

Bug description

Appending an `Observation` object returned from `forecast_obs = obs.forecast_env().reset()` to a list class attribute in a function results in `forecast_obs` being garbage collected when the function is exited. I believe that this is due to `__del__` being called on `forecast_obs._obs_env` when the function `foo` below is exited.

The error still persists if we remove `LightSimBackend` (i.e. use the default backend), if we change the environment, or if we copy `forecast_obs` before adding it to the list.

How to reproduce

Command line

Code snippet

Current output

Expected output

The backend should still remain even after we exit `foo`.