[BUG] GymWrapper does not work with nested observation gym.spaces.Dict #640
Comments
Hey @raphajaner, thanks for this, and glad you like the lib.
Oh, by the way, since you have worked out the solution, do you want to implement the fix? I can take care of it otherwise. The only thing missing, from what I see, would be to reproduce your code example in a test. Side note: since we send envs and objects from process to process, I usually try to avoid lambda functions, as they don't serialize well. There are workarounds, but not using one is always easier :)
Yes, sure, I'll take care of it :) Thanks for the feedback!
Hi @vmoens,

```python
if not isinstance(self.observation_spec, CompositeSpec):
    if self.from_pixels:
        self.observation_spec = CompositeSpec(next_pixels=self.observation_spec)
    else:
        self.observation_spec = CompositeSpec(
            next_observation=self.observation_spec
        )
```

So for all other obs space types, the "next_" prefix is only added at this point, summarizing the whole space under the key "next_observation" or "next_pixels", respectively. The keys are then renamed with `new_keys = [key[5:] for key in keys]`, which works fine when the obs are always under the "next_observation" key. This won't work for CompositeSpecs that are directly created from nested Dict gym obs spaces. A fix here should also be easy by renaming the keys there as well. However, I'm in general a bit unsure whether, for Dict gym obs spaces, all the keys of that Dict should be renamed and kept separately, instead of simply summarizing the whole Dict under a single key.
Hey! First, here's what I would see:

```python
TensorDict({
    "state": stuff,
    "reward": reward,
    "done": done,
    "action": action,
    "next_state": stuff,
    "other": foo,
}, [])
```

We would change that into:

```python
TensorDict({
    "state": stuff,
    "reward": reward,
    "done": done,
    "action": action,
    "next": TensorDict({
        "state": stuff,
    }, []),
    "other": foo,
}, [])
```

That way, everything related to the next step is grouped under "next". So in your case you'd have this:

```python
TensorDict({
    "state": stuff,
    "reward": reward,
    "done": done,
    "action": action,
    "camera": cam,
    "next": TensorDict({
        "state": stuff,
        "camera": cam,
    }, []),
    "other": foo,
}, [])
```

Thoughts? cc @shagunsodhani (by the way, it's funny that we were just talking about that feature a couple of hours ago, and @raphajaner came up with a very similar idea!)
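The flat-to-nested restructuring could be sketched like this, using plain dicts as stand-ins for TensorDicts (the helper name is hypothetical, not part of TorchRL):

```python
def nest_next_keys(td):
    # Move every "next_<name>" entry into a nested "next" dict,
    # so "next_state" becomes td["next"]["state"].
    out, nxt = {}, {}
    for key, value in td.items():
        if key.startswith("next_"):
            nxt[key[len("next_"):]] = value
        else:
            out[key] = value
    out["next"] = nxt
    return out

flat = {"state": 1, "reward": 0.0, "done": False,
        "action": 2, "next_state": 3, "other": "foo"}
print(nest_next_keys(flat))
# {'state': 1, 'reward': 0.0, 'done': False, 'action': 2,
#  'other': 'foo', 'next': {'state': 3}}
```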
Forgot to mention: in the MCTS case, we'd have a slightly different tensordict:

```python
TensorDict({
    "state": stuff,
    "reward": reward,
    "done": done,
    "action": action,
    "next": TensorDict({
        "0": TensorDict({
            "state": stuff,
            "reward": reward,
            "done": done,
        }, []),
        "1": TensorDict({
            "state": stuff,
            "reward": reward,
            "done": done,
        }, []),
    }, []),
    "other": foo,
}, [])
```

where the keys of "next" index the actions that were explored. For continuous action domains, we'd need a custom hashing function given by the user to convert actions to a string.
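One possible shape for such a user-supplied hashing function (purely illustrative) is to round the continuous action and serialize it to a stable string key:

```python
def action_key(action, decimals=3):
    # Round each component to limit the branching factor, then join
    # into a string key such as "0.5,-1.25" for the "next" sub-dict.
    # Unlike id(), this is stable across processes, which matters for
    # the distributed settings typical of MCTS.
    return ",".join(f"{round(a, decimals):g}" for a in action)

print(action_key([0.5004, -1.25]))  # "0.5,-1.25"
```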
Yes, seems like a very reasonable solution to it! I like the thought of having more flexibility in "next" for other cases. Just a small note about the example with the "camera": I'd see the "camera" as part of the overall "state", along with other sensors, e.g., like this:

```python
TensorDict({
    "state": TensorDict({
        "vehicle_state": stuff,
        "camera": cam,
    }, []),
    "reward": reward,
    "done": done,
    "action": action,
    "next": TensorDict({
        "state": TensorDict({
            "vehicle_state": stuff,
            "camera": cam,
        }, []),
    }, []),
    "other": foo,
}, [])
```

As you're already thinking about redesigning the API, I guess it could also make sense to think about the "done" part, which has recently been split into "truncated" and "terminated" in gym. IMO that made a lot of sense. I think I see your point regarding continuous action domains in MCTS. I'm not sure if this makes sense, but couldn't it be possible to use the object id() of the next state as the hash index and include this directly in the action as info?
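For reference, gym >= 0.26 splits the single done flag into two booleans in the step return. A sketch of reading the new 5-tuple while keeping a legacy single "done" (the helper name is illustrative):

```python
def read_step(step_result):
    # gym >= 0.26 returns (obs, reward, terminated, truncated, info);
    # collapse the two flags into the legacy single "done" boolean.
    obs, reward, terminated, truncated, info = step_result
    return obs, reward, terminated or truncated, info

print(read_step(({"s": 0}, 1.0, False, True, {})))
# ({'s': 0}, 1.0, True, {})
```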
I thought about it, but I guess it'll break in distributed settings (which are typical for MCTS).
I am not sure how actively the community would pick up this change, so we may want to wait for a while before changing the API.
I'm a bit torn by this…
One thing I'm considering is to support it if the env that is being wrapped has the two boolean outputs, and stick to the old API otherwise.
#614 from @ordinskiy goes in that direction.
People seem to be quite opinionated about it, and I don't think it's up to TorchRL to lean towards one or the other option. Still, I kind of think that we should support both…
Supporting both makes sense. Though if this requires a lot of work, we may want to wait a while and see how strong the demand for the new API is.
Closed by #649 |
Describe the bug

Hi all,
First of all: thanks for the great work here!
I think I have encountered a bug in the GymWrapper in `torchrl.envs.libs.gym.GymWrapper`. When I use a `gym.Env` with an observation space containing a nested `gym.spaces.Dict`, a KeyError will be thrown, since the `GymLikeEnv.read_obs()` function only adds "next_" to the first level of the Dict but not to nested sub-Dicts. Since `_gym_to_torchrl_spec_transform()` in `torchrl.envs.libs.gym` adds "next_" in a recursive call to all sub-Dicts, the nested keys end up missing the necessary "next_" prefix. Nested Dict observation spaces are often used (https://www.gymlibrary.dev/api/spaces/#dict), so I guess this is required to work properly.

To Reproduce

Reason and Possible fixes

The issue can be fixed by adding a recursive function call in `GymLikeEnv.read_obs()` that also renames nested observation space Dicts correctly by adding "next_". The style checker requires not using lambda functions; otherwise, the fix could also be as simple as a one-line lambda.
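A sketch of the recursive rename described above, written as a named function to satisfy the no-lambda style rule (the function name is illustrative, not the actual patch, and plain dicts stand in for the observation structure):

```python
def add_next_prefix(obs):
    # Recursively prefix every key with "next_", including the keys
    # of sub-dicts coming from nested gym.spaces.Dict, so the result
    # matches the spec built recursively by the spec transform.
    renamed = {}
    for key, value in obs.items():
        if isinstance(value, dict):
            value = add_next_prefix(value)
        renamed[f"next_{key}"] = value
    return renamed

obs = {"vehicle_state": [0.0], "sensors": {"camera": [255]}}
print(add_next_prefix(obs))
# {'next_vehicle_state': [0.0], 'next_sensors': {'next_camera': [255]}}
```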