[BugFix] Fix collector reset with truncation #1021
Conversation
torchrl/collectors/collectors.py
Outdated
if (_reset is None and done.any()) or (
    _reset is not None and done[_reset].any()
):
    reset_idx = done_or_terminated.squeeze(-1)
Here let's keep in mind that the shape of done is not always [*batch_size, 1]; it follows the done_spec, which could be [*batch_size, *F].
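For instance (a hypothetical sketch in a multi-agent setting; the spec shape below is illustrative, not taken from this PR), the done entry can carry trailing dims beyond the collector's batch size, so squeeze(-1) alone does not bring it down to [*batch_size]:

```python
import torch

# Illustrative shapes: the tensordict batch_size is [3], but done follows a
# done_spec of shape [3, 4, 1] (e.g. 4 agents per env).
done = torch.zeros(3, 4, 1, dtype=torch.bool)
done[0, 2, 0] = True  # one agent in env 0 is done

# squeeze(-1) only removes the last singleton dim: the result still has
# shape [3, 4], which cannot index a tensordict with batch_size [3].
print(done.squeeze(-1).shape)  # torch.Size([3, 4])
```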
got it
Then what should we do? Reduce done until its shape matches the tensordict's?
- reset_idx = done_or_terminated.squeeze(-1)
+ reset_idx = done_or_terminated
+ while reset_idx.ndim > self._tensordict.ndim:
+     reset_idx = reset_idx.any(-1)
We do it a few lines later:
done_or_terminated.sum(
    tuple(range(self._tensordict.batch_dims, done_or_terminated.ndim)),
    dtype=torch.bool,
)
We can do it once for these two use cases.
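A minimal sketch of that reduction (names and shapes are illustrative, assuming done_or_terminated has trailing dims beyond the tensordict batch dims):

```python
import torch

batch_dims = 1  # stands in for self._tensordict.batch_dims
done_or_terminated = torch.zeros(3, 4, 5, dtype=torch.bool)
done_or_terminated[0, 0, 0] = True

# Collapse every dim beyond the batch dims into one per-env flag, mirroring
# the sum(..., dtype=torch.bool) reduction quoted above.
traj_done_or_terminated = done_or_terminated.sum(
    tuple(range(batch_dims, done_or_terminated.ndim)), dtype=torch.bool
)
print(traj_done_or_terminated)  # tensor([ True, False, False])

# The any(-1) loop from the earlier suggestion produces the same mask.
reduced = done_or_terminated
while reduced.ndim > batch_dims:
    reduced = reduced.any(-1)
assert torch.equal(reduced, traj_done_or_terminated)
```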
got it, done
torchrl/collectors/collectors.py
Outdated
    td_reset[traj_done_or_terminated], inplace=True
)
done = self._tensordict[traj_done_or_terminated].get("done")
if (_reset is None and done.any()) or (_reset is not None and done.any()):
This check here might be worth keeping like before, because
self._tensordict.get("done")[_reset]
checks all the done_spec dims, while
self._tensordict[traj_done_or_terminated].get("done")
uses only traj_done_or_terminated,
which is smaller than _reset.
Not sure I get it.
Say done.shape = [3, 4, 5] and traj_done_or_terminated.shape = [3].
self._tensordict[traj_done_or_terminated].get("done").any() tells us if any (leaf) env is done.
What are you suggesting we do instead?
Taking your example: if _reset.shape = [3, 4, 5] but only _reset[0, 0, 0] = True, then when you compute traj_done_or_terminated you get [True, False, False].
Now when you do self._tensordict[traj_done_or_terminated].get("done") you get a done of shape [4, 5], and you check that none of it is True, while you should check only [0, 0].
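A minimal sketch of that scenario (shapes follow the example above; names are illustrative and this is not the actual collector code):

```python
import torch

# done follows a done_spec of shape [3, 4, 5]; only the leaf [0, 0, 0] is done.
done = torch.zeros(3, 4, 5, dtype=torch.bool)
done[0, 0, 0] = True
_reset = done.clone()  # a reset is requested only for that single leaf

# Reduce to the tensordict batch dims to decide which envs to reset.
traj_done_or_terminated = done.sum((1, 2), dtype=torch.bool)  # [True, False, False]

# Indexing with the reduced mask selects every leaf of env 0 ...
picked = done[traj_done_or_terminated]  # shape [1, 4, 5]: the whole [4, 5] done of env 0
# ... whereas indexing with _reset selects only the single leaf that was reset.
print(picked.shape, done[_reset].shape)  # torch.Size([1, 4, 5]) torch.Size([1])
```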
I'm not sure I see why that is necessary, since both options will raise the exception if anything is done.
If in MARL you have agents that can be done at init, both these options will raise an error (because an agent is done).
In other words: if in MARL an agent can be done after init, then there's a chance that you'll bump into this error because it can be "naturally" done after a reset.
I guess my question is: in the scenario where there are bags of envs with a batch size that is richer than the tensordict batch size, is there any rationale for checking only that the envs that were done are not done anymore, rather than checking that, after we call reset (presumably only on these envs), nothing from these bags of envs is done anymore?
To rephrase: if in your example I just call reset on [0, 0, 0] and after reset I still have a done, it can only come from [0, 0, 0], since I did not touch the other envs.
I was thinking the same thing.
What about this: we take the simplest option (for now) and leave it as is until we figure out what to do with envs that can start with a done or envs that keep being executed when done?
Sure!
BTW, the if in this PR has an or which can be removed; not sure if you saw it.
Sorry, I did not get that.
No problem, it is fixed now.
Description
Alternative to #1015