[BugFix] RewardSum transform for multiple reward keys #1544
Conversation
How is this supposed to work with many done signals?
I think we should wait for #1539 to land first
This transform is supposed to work according to the MARL grouping API. It will look for the _reset entry in the root and consider it the default reset, so each reward key will by default be associated with the _reset in its own td and, if that is not present, with the _reset in the root (if present). This behavior aligns with the MARL API.
It should not depend on #1539 if we follow this logic.
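For illustration, here is a rough Python sketch of that resolution rule (not the actual implementation; the helper name is made up):

```python
from tensordict import TensorDict


def resolve_reset_key(tensordict: TensorDict, reward_key):
    # Prefer a "_reset" living next to the reward key, else fall back to the
    # root "_reset", else signal that no reset information is available.
    if isinstance(reward_key, tuple) and len(reward_key) > 1:
        local_reset = reward_key[:-1] + ("_reset",)
        if local_reset in tensordict.keys(include_nested=True):
            return local_reset
    if "_reset" in tensordict.keys():
        return "_reset"
    return None
```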
I thought our default was that there wasn't a "_reset" at the root (only if there is a "done" at the root).
To be clear, the direction I understood we were moving towards:
TensorDict({"_reset": reset, "nested": {"done": done}}, [])  # not allowed!
TensorDict({"nested": {"_reset": reset, "done": done}}, [])  # allowed
TensorDict({"_reset": reset, "done": done}, [])  # allowed
The first is not allowed because:
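For concreteness, a runnable version of the two allowed layouts (shapes picked arbitrarily):

```python
import torch
from tensordict import TensorDict

reset = torch.zeros(1, dtype=torch.bool)
done = torch.zeros(1, dtype=torch.bool)

# allowed: the reset lives next to the done it refers to
td_nested = TensorDict({"nested": {"_reset": reset, "done": done}}, [])
# allowed: both live at the root
td_root = TensorDict({"_reset": reset, "done": done}, [])
```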
I have updated with what we discussed; now I'll just have to test it.
# Conflicts:
#	torchrl/envs/transforms/transforms.py
@matteobettini if you're happy with 8da298f, I'm good with merging this.
LGTM, some comments.
If no ``in_keys`` are specified, this transform assumes ``"reward"`` to be the input key.
However, multiple rewards (e.g. ``"reward1"`` and ``"reward2"``) can also be specified.
out_keys (list of NestedKeys, optional): The output sum keys, should be one for each input key.
reset_keys (list of NestedKeys, optional): the list of reset_keys to be
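A minimal usage sketch for these parameters (reward key names taken from the docstring example, output names invented for illustration):

```python
from torchrl.envs.transforms import RewardSum

# One running sum per reward key.
t = RewardSum(
    in_keys=["reward1", "reward2"],
    out_keys=["episode_reward1", "episode_reward2"],
)
```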
Here I preferred having done keys rather than reset keys, because users are familiar with what a done key is and might not know about reset keys. Plus, there is a 1:1 matching between the two.
Not the way it was done: if I pass env.done_keys there are some duplicates (e.g. truncation / termination / done). Having a done_keys list is more dangerous to me because of this, and in the end the only thing we're pointing at is the tree structure where the reset_keys should be found. I personally prefer to pass reset_keys: it's what is needed, i.e. there is a lower risk that refactoring the done_keys mechanism in the future will break this transform. Per se, asking users to pass a list X when we interpolate a list Y that is already present within the env as env.Y seems a convoluted way of doing things.
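To illustrate the asymmetry described above (a sketch only; it assumes the env exposes done_keys and reset_keys properties and that gymnasium is installed):

```python
from torchrl.envs import GymEnv

env = GymEnv("Pendulum-v1")
# Several done entries (done / terminated / truncated) typically map to a
# single reset entry, which is what makes passing done keys ambiguous.
print(env.done_keys)   # e.g. ["done", "terminated", "truncated"]
print(env.reset_keys)  # e.g. ["_reset"]
```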
Another thought about this: most users won't need to pass reset keys at all. We just support it in case someone really wants to do nasty things like summing part of the rewards but not all, etc. Advanced usage requires advanced understanding, so it's fine to ask for reset_keys even if this isn't something that is always user-facing.
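For that advanced case, a hypothetical sketch of passing reset_keys explicitly (key names invented for illustration):

```python
from torchrl.envs.transforms import RewardSum

# Sum only the group reward and tie it explicitly to that group's reset entry.
t = RewardSum(
    in_keys=[("agents", "reward")],
    out_keys=[("agents", "episode_reward")],
    reset_keys=[("agents", "_reset")],
)
```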
Signed-off-by: Matteo Bettini <matbet@meta.com>
# Conflicts:
#	torchrl/envs/transforms/transforms.py
…into fix_reward_sum
LGTM
Signed-off-by: Matteo Bettini <matbet@meta.com>
Co-authored-by: vmoens <vincentmoens@gmail.com>
This PR extends the RewardSum transform to work with multiple reward keys.
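As a hedged end-to-end sketch (single reward key for brevity; the same pattern extends to several in_keys/out_keys pairs, and it assumes gymnasium is installed):

```python
from torchrl.envs import GymEnv, TransformedEnv
from torchrl.envs.transforms import RewardSum

env = TransformedEnv(
    GymEnv("Pendulum-v1"),
    RewardSum(in_keys=["reward"], out_keys=["episode_reward"]),
)
rollout = env.rollout(10)
# The running sum is written next to the reward at each step.
print(rollout["next", "episode_reward"][-1])
```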