
Conversation

@albertbou92
Contributor

@albertbou92 albertbou92 commented Dec 16, 2022

Description

Adds a new Transform class, called RewardSum, which tracks the cumulative reward of all episodes in progress and adds this information to the tensordict as a new key.

Motivation and Context

It can be informative to be able to access the training episode rewards.

e.g. it can be used like this to track performance during training:

for batch in collector:
    train_episode_reward = batch["episode_reward"][batch["done"]]
    if train_episode_reward.numel() > 0:
        print(f"train_episode_rewards {train_episode_reward.mean()}")

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 16, 2022
Collaborator

@vmoens vmoens left a comment

Sorry for the late review, I was on PTO

Overall LGTM! I like that you code everything in the input-output tensordict. It is not well documented, but we should discourage setting attributes on the transform to track call-to-call carry-over variables; the tensordict API is made for that.

Also, because we change the output tensordict by adding a new key, it would be nice to add the out-keys to the observation_spec or change the reward_spec to a CompositeSpec with "reward" and "episode_reward" keys.
WDYT?

Beyond this PR:
We should drop the inplace attribute of the transforms. In general, inplace is a recipe for bugs, and PyTorch developers usually recommend not using it. The perf gain is usually marginal, and it significantly narrows the range of applications.
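A minimal sketch of what such a CompositeSpec could look like (constructor signatures assumed from current torchrl; not code from this PR):

from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

# a composite reward spec exposing both the per-step and the cumulative reward
reward_spec = CompositeSpec(
    reward=UnboundedContinuousTensorSpec(shape=(1,)),
    episode_reward=UnboundedContinuousTensorSpec(shape=(1,)),
)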

@albertbou92
Contributor Author

Sorry for the late review, I was on PTO

Overall LGTM! I like that you code everything in the input-output tensordict. It is not well documented, but we should discourage setting attributes on the transform to track call-to-call carry-over variables; the tensordict API is made for that.

Also, because we change the output tensordict by adding a new key, it would be nice to add the out-keys to the observation_spec or change the reward_spec to a CompositeSpec with "reward" and "episode_reward" keys. WDYT?

Beyond this PR: We should drop the inplace attribute of the transforms. In general, inplace is a recipe for bugs, and PyTorch developers usually recommend not using it. The perf gain is usually marginal, and it significantly narrows the range of applications.

Not sure whether I would consider episode_reward part of the observation or part of the reward when adding it to the spec. In any case, I tried to add the key to the reward spec by creating a CompositeSpec and ran into some problems. Should we do it?

@albertbou92 albertbou92 reopened this Dec 22, 2022
# reward=reward_spec,
# episode_reward=episode_reward_spec
# )
return reward_spec
Contributor Author

@albertbou92 albertbou92 Dec 22, 2022

I tried to modify the reward_spec, but that generated problems somewhere else in the code.

In any case, is episode_reward part of the reward?

@vmoens
Collaborator

vmoens commented Dec 22, 2022

Not sure whether I would consider episode_reward part of the observation or part of the reward when adding it to the spec. In any case, I tried to add the key to the reward spec by creating a CompositeSpec and ran into some problems. Should we do it?

It's probably better to put it in the observation_spec.
The reason we need it is that the specs are used to create "fake_data" that will be used as buffers in parallel settings. If a key of the env output is not in the env spec, you risk missing it, and an error will be raised when creating that env in parallel.
If you want to check that the specs match the data from a rollout, you can use torchrl.envs.utils.check_env_specs.
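A hedged example of that check (the env construction is illustrative and assumes a Gym backend is installed):

from torchrl.envs import TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import RewardSum
from torchrl.envs.utils import check_env_specs

env = TransformedEnv(GymEnv("Pendulum-v1"), RewardSum())
check_env_specs(env)  # raises if rollout data does not match the declared specs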

@albertbou92
Contributor Author

albertbou92 commented Dec 23, 2022

Not sure whether I would consider episode_reward part of the observation or part of the reward when adding it to the spec. In any case, I tried to add the key to the reward spec by creating a CompositeSpec and ran into some problems. Should we do it?

It's probably better to put it in the observation_spec. The reason we need it is that the specs are used to create "fake_data" that will be used as buffers in parallel settings. If a key of the env output is not in the env spec, you risk missing it, and an error will be raised when creating that env in parallel. If you want to check that the specs match the data from a rollout, you can use torchrl.envs.utils.check_env_specs.

Added the transform_observation_spec method and some code to test it.
Now the transform accepts a set of in_keys that should be present in the reward spec (this is checked when calling transform_observation_spec). The out_keys are derived from the in_keys (e.g. reward to episode_reward). If no in_keys are provided, it is assumed that the key is simply "reward". Also, if reward_spec is not a CompositeSpec, it is assumed that the only in_key is "reward".
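A hedged usage sketch of the behaviour described above (the env choice is illustrative):

from torchrl.envs import TransformedEnv
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import RewardSum

# in_keys must be present in the reward spec; out_keys are derived from them,
# e.g. "reward" -> "episode_reward"
env = TransformedEnv(GymEnv("Pendulum-v1"), RewardSum(in_keys=["reward"]))
td = env.rollout(5)  # the rollout now carries an "episode_reward" entry alongside "reward"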

else:

    # If reward_spec is not a CompositeSpec, the only in_key should be "reward"
    assert set(self.in_keys) == {
Collaborator

We should not use asserts anywhere other than tests.
Here I think a KeyError would be appropriate.
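A sketch of the suggested change (variable names follow the snippet above):

if set(self.in_keys) != {"reward"}:
    raise KeyError(
        f"reward_spec is not a CompositeSpec, so the only valid in_key is 'reward', got {self.in_keys}"
    )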

Contributor Author

ok!

    tensordict[out_key] = 0.0

# Batched environments
elif "reset_workers" in tensordict.keys():
Collaborator

I guess that the default in the batched case, where no "reset_workers" is present, would be to reset them all, no?
Otherwise, in multi-agent settings for instance, a user could do

env.reset()

and end up with an episode reward that keeps piling up.

Contributor Author

You are right. Now, if no "reset_workers" is found in the tensordict in the batched case, all envs are reset.
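A hedged sketch of that default (names follow the surrounding snippet; torch is assumed imported):

if "reset_workers" in tensordict.keys():
    reset_workers = tensordict.get("reset_workers")
else:
    # no "reset_workers" entry: reset the episode reward of every sub-environment
    reset_workers = torch.ones(*tensordict.batch_size, 1, dtype=torch.bool)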

reward = tensordict.get(in_key)
if out_key not in tensordict.keys():
    tensordict.set(
        out_key, torch.zeros(*tensordict.shape, 1, dtype=reward.dtype)
Collaborator

let's add device=reward.device here
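The suggested fix, sketched against the snippet above:

tensordict.set(
    out_key,
    torch.zeros(*tensordict.shape, 1, dtype=reward.dtype, device=reward.device),
)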

Contributor Author

done

@albertbou92
Contributor Author

albertbou92 commented Jan 2, 2023

My code considered the possibility that the reward spec could be an NdUnboundedContinuousTensorSpec, but I see that this class has been removed. Should I consider that the reward spec can only be an UnboundedContinuousTensorSpec?

@vmoens
Collaborator

vmoens commented Jan 2, 2023

The Nd*Spec classes were confusing, so we removed them in favour of their parent classes.
Removing the Nd prefix should be enough.
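Sketch of the rename (the shape argument is illustrative):

# before: NdUnboundedContinuousTensorSpec(shape=(1,))
# after:
from torchrl.data import UnboundedContinuousTensorSpec
episode_reward_spec = UnboundedContinuousTensorSpec(shape=(1,))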

Collaborator

@vmoens vmoens left a comment

We should also add this transform to the docs in the envs.rst file.

Collaborator

@vmoens vmoens left a comment

Looks like the tests are passing. The only thing missing is the doc in envs.rst

@codecov

codecov bot commented Jan 2, 2023

Codecov Report

Merging #751 (ab0ad7b) into main (30175f0) will increase coverage by 0.00%.
The diff coverage is 84.93%.

❗ Current head ab0ad7b differs from pull request most recent head 0902247. Consider uploading reports for the commit 0902247 to get more accurate results

@@           Coverage Diff           @@
##             main     #751   +/-   ##
=======================================
  Coverage   88.72%   88.72%           
=======================================
  Files         123      123           
  Lines       21033    21106   +73     
=======================================
+ Hits        18661    18726   +65     
- Misses       2372     2380    +8     
Flag                 Coverage Δ
habitat-gpu          24.77% <13.33%> (-0.05%) ⬇️
linux-brax           29.37% <13.33%> (-0.07%) ⬇️
linux-cpu            85.27% <84.93%> (-0.01%) ⬇️
linux-gpu            86.16% <84.93%> (-0.01%) ⬇️
linux-jumanji        30.15% <13.33%> (-0.08%) ⬇️
linux-outdeps-gpu    72.17% <84.93%> (+0.06%) ⬆️
linux-stable-cpu     85.12% <84.93%> (-0.01%) ⬇️
linux-stable-gpu     85.82% <84.93%> (+0.01%) ⬆️
linux_examples-gpu   42.71% <13.33%> (-0.12%) ⬇️
macos-cpu            85.02% <84.93%> (-0.01%) ⬇️
olddeps-gpu          76.03% <84.93%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                          Coverage Δ
torchrl/envs/__init__.py                100.00% <ø> (ø)
torchrl/envs/transforms/__init__.py     100.00% <ø> (ø)
torchrl/envs/transforms/transforms.py   86.96% <75.55%> (-0.43%) ⬇️
test/test_transforms.py                 96.71% <100.00%> (+0.06%) ⬆️
torchrl/envs/vec_env.py                 69.42% <0.00%> (+0.49%) ⬆️


@albertbou92
Contributor Author

albertbou92 commented Jan 2, 2023

I added the transform to the list of transforms in envs.rst, but I am not sure if that is enough

@vmoens vmoens merged commit 5b9ff55 into pytorch:main Jan 2, 2023
@albertbou92 albertbou92 deleted the sumreward_transform branch January 18, 2024 10:08