[Feature] RewardSum transform #751
Conversation
vmoens left a comment:
Sorry for the late review, I was on PTO.
Overall LGTM! I like that you code everything in the input-output tensordict. It is not well documented, but we should discourage setting attributes on the transform to track call-to-call carry-over variables; the tensordict API is made for that.
Also, because we change the output tensordict by adding a new key, it would be nice to add the out_keys to the observation_spec, or to change the reward_spec to a CompositeSpec with "reward" and "episode_reward" keys.
WDYT?
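A sketch of what that CompositeSpec option could look like (spec classes from torchrl.data; the shapes, and the exact constructor arguments, are assumptions and may differ between torchrl versions):

```python
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

# One possible reward_spec exposing both keys, per the suggestion above.
reward_spec = CompositeSpec(
    reward=UnboundedContinuousTensorSpec(shape=(1,)),
    episode_reward=UnboundedContinuousTensorSpec(shape=(1,)),
)
```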
Beyond this PR:
We should drop the inplace attribute of the transforms. In general, inplace is a recipe for bugs and PyTorch developers usually recommend not using it. The performance gain is usually marginal and it significantly narrows the range of applications.
```python
# reward=reward_spec,
# episode_reward=episode_reward_spec
# )
return reward_spec
```
Not sure if I would consider …
I tried to modify the reward_spec, but that generated problems somewhere else in the code.
In any case, is episode_reward part of the reward?
It's probably better to put it in the observation_spec.
Added the transform_observation_spec method and some code to test it.
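A minimal sketch of what such a transform_observation_spec could look like (the spec classes, shapes, and the way the out_keys are registered are assumptions, not the code from this PR):

```python
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

def transform_observation_spec(self, observation_spec):
    """Register the cumulative-reward out_keys in the observation spec."""
    if not isinstance(observation_spec, CompositeSpec):
        observation_spec = CompositeSpec(observation=observation_spec)
    for out_key in self.out_keys:
        # Each episode-reward entry is a single unbounded scalar per env.
        observation_spec[out_key] = UnboundedContinuousTensorSpec(shape=(1,))
    return observation_spec
```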
```python
else:
    # If reward_spec is not a CompositeSpec, the only in_key should be "reward"
    assert set(self.in_keys) == {
```
We should not use asserts anywhere other than in tests.
Here I think a KeyError would be appropriate.
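A sketch of the suggested change, assuming the truncated comparison above completes as set(self.in_keys) == {"reward"} (per the comment in the snippet); the error message is illustrative:

```python
# Raise instead of asserting, so the check also runs with python -O
# and surfaces a meaningful error to the user.
if set(self.in_keys) != {"reward"}:
    raise KeyError(
        "When reward_spec is not a CompositeSpec, in_keys must be ['reward'], "
        f"got {self.in_keys}."
    )
```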
ok!
```python
        tensordict[out_key] = 0.0

    # Batched environments
    elif "reset_workers" in tensordict.keys():
```
I guess that the default in the batched case where no "reset_workers" is present would be to reset them all, no?
Otherwise, in multi-agent settings for instance, a user could do
env.reset()
and end up with an episode reward that keeps on piling up.
You are right. Now, if no "reset_workers" key is found in the tensordict in the batched case, all envs are reset.
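A minimal sketch of the reset behaviour being described (the method name, key names and shapes are assumptions rather than the PR's actual code):

```python
import torch

def reset(self, tensordict):
    """Zero the cumulative rewards for the environments being reset."""
    for out_key in self.out_keys:
        if out_key not in tensordict.keys():
            continue
        if "reset_workers" in tensordict.keys():
            # Only zero the entries of the workers flagged for reset.
            reset_workers = tensordict.get("reset_workers")
            tensordict[out_key][reset_workers] = 0.0
        else:
            # No mask provided: reset the episode reward for all envs.
            tensordict[out_key] = torch.zeros_like(tensordict[out_key])
    return tensordict
```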
```python
reward = tensordict.get(in_key)
if out_key not in tensordict.keys():
    tensordict.set(
        out_key, torch.zeros(*tensordict.shape, 1, dtype=reward.dtype)
```
let's add device=reward.device here
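Applied to the snippet above, the suggestion would read roughly as follows (reusing the variables from the quoted diff):

```python
tensordict.set(
    out_key,
    torch.zeros(*tensordict.shape, 1, dtype=reward.dtype, device=reward.device),
)
```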
done
My code considered the possibility that reward spec could be an NdUnboundedContinuousTensorSpec, but I see that this class has been removed. Should I consider that reward spec can only be an UnboundedContinuousTensorSpec?

The Nd*Spec classes were confusing, so we removed them in favour of their parent class.
vmoens left a comment:
We should also add this transform to the doc in the envs.rst file
vmoens left a comment:
Looks like the tests are passing. The only thing missing is the doc in envs.rst
Codecov Report
```
@@           Coverage Diff           @@
##             main     #751   +/-   ##
=======================================
  Coverage   88.72%   88.72%
=======================================
  Files         123      123
  Lines       21033    21106    +73
=======================================
+ Hits        18661    18726    +65
- Misses       2372     2380     +8
```
Flags with carried forward coverage won't be shown.
I added the transform to the list of transforms in envs.rst, but I am not sure if that is enough.
Description
Adds a new Transform class called RewardSum, which tracks the cumulative reward of all episodes in progress and adds this information to the tensordict under a new key.
Motivation and Context
It can be informative to be able to access the training episode rewards.
e.g., it can be used to track the performance during training, as sketched below.
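A minimal sketch of such usage, assuming a Gym environment and the "episode_reward" out_key discussed in this PR (import paths and the exact placement of the key in the rollout output may differ between torchrl versions):

```python
from torchrl.envs import TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import RewardSum

# Wrap the base env so that rollouts carry a running episode reward.
env = TransformedEnv(GymEnv("CartPole-v1"), RewardSum())

td = env.rollout(max_steps=200)
# The cumulative reward collected so far can be logged during training.
print(td.get("episode_reward", None))
```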
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!