[BugFix] Improve collector buffer initialisation when policy spec is unavailable #1547

matteobettini · 2023-09-19T19:00:40Z

Depends on #1539
Fixes #1565

This PR makes the collector run a single policy forward pass on the reset data to build its buffer when the policy spec is unavailable or only partially defined.

The prior approach initialised the policy keys by passing the policy an env.fake_tensordict() full of zeros. This caused policies that rely on action masks (or other masks) to throw errors, because those masks were all False.

This solution makes the initialisation more general and ensures that the policy is fed data that suits it.
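To illustrate the failure mode described above, here is a minimal plain-Python sketch. It does not use the TorchRL API: `masked_policy`, `fake_input` and `reset_input` are hypothetical stand-ins for a masked policy, a zero-filled fake input and the data returned by a reset.

```python
import random


def masked_policy(data: dict) -> dict:
    """Toy policy (hypothetical): pick uniformly among actions allowed by the mask."""
    valid = [i for i, allowed in enumerate(data["action_mask"]) if allowed]
    if not valid:
        # With an all-False mask there is nothing to sample from.
        raise ValueError("action_mask has no valid action")
    out = dict(data)
    out["action"] = random.choice(valid)
    return out


# Zero-filled placeholder: every mask entry is False.
fake_input = {"observation": [0.0, 0.0], "action_mask": [False, False, False]}
# Data as a reset might return it: at least one action is allowed.
reset_input = {"observation": [0.3, -1.2], "action_mask": [True, False, True]}

try:
    masked_policy(fake_input)
    zero_input_failed = False
except ValueError:
    zero_input_failed = True

# The forward pass on reset-like data succeeds and reveals the output keys.
out = masked_policy(reset_input)
buffer_keys = sorted(out.keys())
```

In this toy setting the zero-filled input raises, while the reset-like input yields an output whose keys can be used to shape the buffer.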

Signed-off-by: Matteo Bettini <matbet@meta.com>

test/test_collector.py

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini · 2023-09-20T07:58:56Z

torchrl/collectors/collectors.py

+                    if key in self._tensordict_out.keys(isinstance(key, tuple)):
+                        continue
+                    self._tensordict_out.set(key, spec.zero())
+
        else:


Above this point I only refactored.
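The spec-based branch in the diff above only fills keys that are not already present in the output buffer. A rough plain-Python sketch of that pattern follows, assuming a flat dict as the buffer and a hypothetical `zero_factories` mapping in place of the real specs (neither is part of the TorchRL API):

```python
from typing import Callable, Dict, Hashable


def fill_missing_with_zeros(
    buffer: Dict[Hashable, object],
    zero_factories: Dict[Hashable, Callable[[], object]],
) -> Dict[Hashable, object]:
    """Add a zero placeholder for every key the buffer does not hold yet."""
    for key, make_zero in zero_factories.items():
        if key in buffer:
            # Keep existing entries untouched, as the `continue` in the diff does.
            continue
        buffer[key] = make_zero()
    return buffer


# The buffer already holds an observation; string and tuple keys are both allowed.
buffer = {"observation": [0.3, -1.2]}
zero_factories = {
    "observation": lambda: [0.0, 0.0],
    "action": lambda: 0,
    ("agents", "logits"): lambda: [0.0, 0.0, 0.0],
}
fill_missing_with_zeros(buffer, zero_factories)
```

Existing entries survive, and only the missing keys receive zeros.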

matteobettini · 2023-09-20T07:59:24Z

torchrl/collectors/collectors.py

        else:
            # otherwise, we perform a small number of steps with the policy to
            # determine the relevant keys with which to pre-populate _tensordict_out.
            # This is the safest thing to do if the spec has None fields or if there is
            # no spec at all.
            # See #505 for additional context.
+            self._tensordict_out.update(self._tensordict)


We update self._tensordict_out with the real data coming from env.reset()

matteobettini · 2023-09-20T07:59:44Z

torchrl/collectors/collectors.py

-                .zero_()
-            )
-        # in addition to outputs of the policy, we add traj_ids and step_count to
+                self._tensordict_out = self.policy(self._tensordict_out.to(self.device))


and feed that to the policy
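The two steps in these comments (copy the reset data into the buffer, then run the policy on it) can be sketched in plain Python. This is an illustration under assumptions, not the collector code: `init_buffer_from_reset` and `toy_policy` are hypothetical names, and dicts stand in for tensordicts.

```python
def toy_policy(data: dict) -> dict:
    """Hypothetical policy that writes keys no spec declared in advance."""
    out = dict(data)
    out["action"] = 0
    out["hidden"] = [0.0, 0.0]
    return out


def init_buffer_from_reset(buffer: dict, reset_data: dict, policy) -> dict:
    # Step 1: populate the buffer with real data from the reset.
    buffer.update(reset_data)
    # Step 2: one forward pass; its output defines the keys the buffer needs.
    return policy(buffer)


reset_data = {"observation": [0.3, -1.2], "done": False}
buffer = init_buffer_from_reset({}, reset_data, toy_policy)
discovered_keys = sorted(buffer.keys())
```

The buffer ends up with both the reset keys and whatever the policy writes, without needing a spec for the latter.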

vmoens

Tests are passing, so LGTM :) Thanks a mil for this!


amend

e04fb24

Signed-off-by: Matteo Bettini <matbet@meta.com>

facebook-github-bot added the CLA Signed label Sep 19, 2023

matteobettini added 5 commits September 19, 2023 20:03

amend

2a6d953

Signed-off-by: Matteo Bettini <matbet@meta.com>

amend

ccec23f

Signed-off-by: Matteo Bettini <matbet@meta.com>

move reset

31fffce

Signed-off-by: Matteo Bettini <matbet@meta.com>

amend

1656396

Signed-off-by: Matteo Bettini <matbet@meta.com>

amend

539aa03

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini commented Sep 19, 2023


test/test_collector.py (resolved)

amend

d0956ba

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini changed the title ~~[BugFix] Make initialisation of collector buffer use a small rollout when policy spec is unavailable~~ [BugFix] Improve collector buffer initialisation when policy spec is unavailable Sep 19, 2023

matteobettini added the bug label Sep 19, 2023

amend

1be51c2

Signed-off-by: Matteo Bettini <matbet@meta.com>

matteobettini commented Sep 20, 2023


matteobettini added 2 commits September 20, 2023 21:51

Merge branch 'main' into fix_collector_init

9cc07fb

Merge branch 'main' into fix_collector_init

fa87cc1

vmoens mentioned this pull request Sep 22, 2023

[BUG] fake_tensordict() in EnvBase is incompatible with OneHotDiscreteTensorSpec #1565

Closed

3 tasks

matteobettini and others added 2 commits September 26, 2023 10:01

Merge branch 'main' into fix_collector_init

71667ad

Merge remote-tracking branch 'origin/main' into fix_collector_init

a0c66fe

matteobettini marked this pull request as ready for review October 1, 2023 09:48

matteobettini added the Environments label Oct 1, 2023

vmoens approved these changes Oct 1, 2023


vmoens merged commit a02679b into pytorch:main Oct 1, 2023
57 of 59 checks passed

matteobettini deleted the fix_collector_init branch October 2, 2023 07:36

vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023

[BugFix] Improve collector buffer initialisation when policy spec is …

78a9961

…unavailable (pytorch#1547) Signed-off-by: Matteo Bettini <matbet@meta.com> Co-authored-by: vmoens <vincentmoens@gmail.com>
