[BugFix] RewardSum transform for multiple reward keys #1544
Conversation
How is this supposed to work with many done signals?
I think we should wait for #1539 to land first
This transform is supposed to work according to the MARL grouping API. It will look for the _reset entry in the root and consider it the default reset, so each reward key will by default be associated with the _reset in its own td and, if that is not present, with the _reset in the root (if present). This behavior aligns with the MARL API.
It should not depend on #1539 if we follow this logic.
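For illustration, here is a rough Python sketch of that resolution rule (not the actual implementation; the helper name is made up):

```python
from tensordict import TensorDict


def resolve_reset_key(tensordict: TensorDict, reward_key):
    # Prefer a "_reset" living next to the reward key, else fall back to the
    # root "_reset", else signal that no reset information is available.
    if isinstance(reward_key, tuple) and len(reward_key) > 1:
        local_reset = reward_key[:-1] + ("_reset",)
        if local_reset in tensordict.keys(include_nested=True):
            return local_reset
    if "_reset" in tensordict.keys():
        return "_reset"
    return None
```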
I thought our default was that there wasn't a "_reset" at the root (only if there is a "done" at the root).
To be clear, the direction I understood we were moving towards:
TensorDict({"_reset": reset, "nested": {"done": done}}, [])  # not allowed!
TensorDict({"nested": {"_reset": reset, "done": done}}, [])  # allowed
TensorDict({"_reset": reset, "done": done}, [])  # allowed
The first is not allowed because:
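For concreteness, a runnable version of the two allowed layouts (shapes picked arbitrarily):

```python
import torch
from tensordict import TensorDict

reset = torch.zeros(1, dtype=torch.bool)
done = torch.zeros(1, dtype=torch.bool)

# allowed: the reset lives next to the done it refers to
td_nested = TensorDict({"nested": {"_reset": reset, "done": done}}, [])
# allowed: both live at the root
td_root = TensorDict({"_reset": reset, "done": done}, [])
```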
I have updated with what we discussed; now I'll just have to test it.
# Conflicts:
#	torchrl/envs/transforms/transforms.py
@matteobettini if you're happy with 8da298f, I'm good with merging this.
LGTM, some comments.
If no ``in_keys`` are specified, this transform assumes ``"reward"`` to be the input key.
However, multiple rewards (e.g. ``"reward1"`` and ``"reward2"``) can also be specified.
out_keys (list of NestedKeys, optional): The output sum keys, should be one for each input key.
reset_keys (list of NestedKeys, optional): the list of reset_keys to be
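A minimal usage sketch for these parameters (reward key names taken from the docstring example, output names invented for illustration):

```python
from torchrl.envs.transforms import RewardSum

# One running sum per reward key.
t = RewardSum(
    in_keys=["reward1", "reward2"],
    out_keys=["episode_reward1", "episode_reward2"],
)
```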
Here I preferred having done keys rather than reset keys, because users are familiar with what a done key is and might not know about reset keys. Plus, there is a 1:1 matching between the two.
Not the way it was done: if I pass env.done_keys there are some duplicates (e.g. truncation / termination / done). Having a done_keys list is more dangerous to me because of this, and in the end the only thing we're pointing at is the tree structure where the reset_keys should be found. I personally prefer to pass reset_keys: it's what is needed, i.e. there is a lower risk that refactoring the done_keys mechanism in the future will break this transform. Per se, asking users to pass a list X when we interpolate a list Y that is already present within the env as env.Y seems a convoluted way of doing things.
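To illustrate the asymmetry described above (a sketch only; it assumes the env exposes done_keys and reset_keys properties and that gymnasium is installed):

```python
from torchrl.envs import GymEnv

env = GymEnv("Pendulum-v1")
# Several done entries (done / terminated / truncated) typically map to a
# single reset entry, which is what makes passing done keys ambiguous.
print(env.done_keys)   # e.g. ["done", "terminated", "truncated"]
print(env.reset_keys)  # e.g. ["_reset"]
```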
Another thought about this: most users won't need to pass reset keys at all. We just support it in case someone really wants to do nasty things like summing part of the rewards but not all, etc. Advanced usage requires advanced understanding, so it's fine to ask for reset_keys even if this isn't something that is always user-facing.
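For that advanced case, a hypothetical sketch of passing reset_keys explicitly (key names invented for illustration):

```python
from torchrl.envs.transforms import RewardSum

# Sum only the group reward and tie it explicitly to that group's reset entry.
t = RewardSum(
    in_keys=[("agents", "reward")],
    out_keys=[("agents", "episode_reward")],
    reset_keys=[("agents", "_reset")],
)
```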
Signed-off-by: Matteo Bettini <matbet@meta.com>
# Conflicts:
#	torchrl/envs/transforms/transforms.py
…into fix_reward_sum
LGTM
Signed-off-by: Matteo Bettini <matbet@meta.com>
Co-authored-by: vmoens <vincentmoens@gmail.com>
This PR extends the RewardSum transform to work with multiple reward keys.
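As a hedged end-to-end sketch (single reward key for brevity; the same pattern extends to several in_keys/out_keys pairs, and it assumes gymnasium is installed):

```python
from torchrl.envs import GymEnv, TransformedEnv
from torchrl.envs.transforms import RewardSum

env = TransformedEnv(
    GymEnv("Pendulum-v1"),
    RewardSum(in_keys=["reward"], out_keys=["episode_reward"]),
)
rollout = env.rollout(10)
# The running sum is written next to the reward at each step.
print(rollout["next", "episode_reward"][-1])
```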