[BugFix] Improve collector buffer initialisation when policy spec is unavailable #1547

Merged
merged 12 commits into from
Oct 1, 2023

Conversation

matteobettini
Contributor

@matteobettini matteobettini commented Sep 19, 2023

Depends on #1539
Fixes #1565

This PR makes the collector run a sample policy forward pass on the reset data to generate its buffer when the policy spec is unavailable or only partial.

The prior approach initialised the policy keys by passing the policy an env.fake_tensordict() full of zeros. This made certain policies that use action masks (or other masks) throw errors, because those masks were all False.

This solution makes the initialisation more general and ensures that the policy is fed data that is suited to it.
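
To illustrate the failure mode described above, here is a minimal sketch in plain Python. It does not use TorchRL; `masked_policy`, the dictionary keys, and the sample values are all hypothetical stand-ins for a policy that reads an action mask from its input.

```python
import random


def masked_policy(data):
    """Toy policy (hypothetical): pick uniformly among actions whose mask entry is True."""
    allowed = [i for i, ok in enumerate(data["action_mask"]) if ok]
    if not allowed:
        # An all-False mask leaves the policy with nothing to choose from.
        raise ValueError("action_mask allows no action")
    return {**data, "action": random.choice(allowed)}


# Data shaped like a real reset: at least one action is allowed.
reset_like = {"observation": [0.3, -1.2], "action_mask": [True, False, True]}
# Zero-filled placeholder: every mask entry is False.
zero_filled = {"observation": [0.0, 0.0], "action_mask": [False, False, False]}

out = masked_policy(reset_like)  # works, picks action 0 or 2

try:
    masked_policy(zero_filled)
    failed = False
except ValueError:
    failed = True  # the zero-filled input is rejected
```

A real masked policy would typically fail in its own way (for example when building a distribution over zero valid actions), but the cause is the same: the placeholder data is not something the environment would ever emit.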

Signed-off-by: Matteo Bettini <matbet@meta.com>
@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 19, 2023
@matteobettini matteobettini changed the title from "[BugFix] Make initialisation of collector buffer use a small rollout when policy spec is unavailable" to "[BugFix] Improve collector buffer initialisation when policy spec is unavailable" Sep 19, 2023
@matteobettini matteobettini added the bug label Sep 19, 2023
if key in self._tensordict_out.keys(isinstance(key, tuple)):
    continue
self._tensordict_out.set(key, spec.zero())

else:
Contributor Author

Above here I just refactored.

else:
    # otherwise, we perform a small number of steps with the policy to
    # determine the relevant keys with which to pre-populate _tensordict_out.
    # This is the safest thing to do if the spec has None fields or if there is
    # no spec at all.
    # See #505 for additional context.
    self._tensordict_out.update(self._tensordict)
Contributor Author

We update self._tensordict_out with the real data coming from env.reset()

.zero_()
)
# in addition to outputs of the policy, we add traj_ids and step_count to
self._tensordict_out = self.policy(self._tensordict_out.to(self.device))
Contributor Author

and feed that to the policy
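
The two steps described in these comments (copy the reset data into the output buffer, then run the policy on it) can be sketched with plain dictionaries. This is only an illustration under stated assumptions: `init_buffer` and `toy_policy` are hypothetical names, and the real collector works on TensorDict objects rather than dicts.

```python
def init_buffer(reset_data, policy):
    """Toy sketch: discover the policy's output keys by running it on reset data."""
    buffer = {}
    buffer.update(reset_data)  # step 1: real data from reset, not zeros
    buffer = policy(buffer)    # step 2: the policy writes its own keys
    return buffer


def toy_policy(data):
    # Hypothetical policy: takes the first allowed action and records a
    # placeholder log-probability under an assumed key name.
    allowed = [i for i, ok in enumerate(data["action_mask"]) if ok]
    return {**data, "action": allowed[0], "sample_log_prob": 0.0}


reset_data = {"observation": [0.3, -1.2], "action_mask": [False, True, True]}
buffer = init_buffer(reset_data, toy_policy)

# Keys contributed by the policy, which the buffer now knows about.
new_keys = sorted(set(buffer) - set(reset_data))
```

Because the policy sees data the environment actually produced, any mask it relies on is valid, and the buffer ends up with exactly the keys the policy writes.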

@matteobettini matteobettini marked this pull request as ready for review October 1, 2023 09:48
@matteobettini matteobettini added the Environments label Oct 1, 2023
Contributor

@vmoens vmoens left a comment

Tests are passing, so LGTM :) Thanks a mil for this!

@vmoens vmoens merged commit a02679b into pytorch:main Oct 1, 2023
57 of 59 checks passed
@matteobettini matteobettini deleted the fix_collector_init branch October 2, 2023 07:36
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023
…unavailable (pytorch#1547)

Signed-off-by: Matteo Bettini <matbet@meta.com>
Co-authored-by: vmoens <vincentmoens@gmail.com>
Labels
bug, CLA Signed, Environments
Development

Successfully merging this pull request may close these issues.

[BUG] fake_tensordict() in EnvBase is incompatible with OneHotDiscreteTensorSpec
3 participants