[BugFix] ReplayBuffer's storage now signal back when changes happen #614

paulomarciano · 2022-10-26T13:33:37Z

Description

The prototype modular Replay Buffer has a bug (#606) when storage is shared across buffers. This happens because the different parts implicitly assume to always be aware of changes in storage, which was no true. This change introduces an API for storage to signal back to the buffers that changes were made. Buffers can then handle this signal however and whenever they see fit.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

close #606

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

vmoens

Awesome and really quick!
Thanks!
Can we discuss about ReplayBuffer.update_priority which seems a bit too customize for prioritized sampling?

vmoens · 2022-10-26T16:51:43Z

torchrl/data/replay_buffers/samplers.py

@@ -28,7 +28,7 @@ def extend(self, index: torch.Tensor) -> None:
        pass

    def update_priority(
-        self, index: Union[int, torch.Tensor], priority: Union[int, torch.Tensor]
+        self, index: Union[int, torch.Tensor], priority: Union[float, torch.Tensor]


thx for that :)

vmoens · 2022-10-26T17:06:04Z

torchrl/data/replay_buffers/storages.py

@@ -29,6 +29,9 @@ class Storage:

    def __init__(self, max_size: int) -> None:
        self.max_size = int(max_size)
+        # Prototype feature. RBs that use a given instance of Storage should add
+        # themselves to this set.
+        self.attached_entities = set()


do we want this to be public? Is any other object accessing that?

Nope. I can rename it.

vmoens · 2022-10-26T17:07:41Z

torchrl/data/replay_buffers/storages.py

    def __getitem__(self, item):
        return self.get(item)

    def __setitem__(self, index, value):
-        return self.set(index, value)
+        ret = self.set(index, value)
+        for i in self.attached_entities:


usually (at least in our code base) i points to an integer.
Can we rename this?

vmoens · 2022-10-26T17:10:46Z

torchrl/data/replay_buffers/rb_prototype.py

+        Args:
+            index: The modified index from storage.
+        """
+        return self.update_priority(index, self._sampler.default_priority)


That's cool
In the future we will probably want the user to update the priority of the sampler directly, without passing through the replay buffer (otherwise the replay buffer will need to implement custom update methods for all the modules e put in it).

One fix for now could be that the sampler has a mark_update method that falls back on update_priority for prioritized sampling. If sampling is uniform, mark_update is a no-op.
Like this we could remove the ReplayBuffer.update_priority method altogether (in a future PR).

Would that make sense?

It makes sense, yes.

I debated with myself for a while if we should attach samplers to storage or if we should attach RBs to storage, and decided for the latter because it is more extensible: It's possible that we need components other than the sampler to be aware of changes, and it's the RB's duty to coordinate the different moving parts.

The counterpoint to this would be YAGNI. It's simpler if we attach the samplers directly. I can definitely make that change.

By the way, another advantage of having the mark_update method instead of directly calling update_priority is that samplers can implement the update whenever they feel like it. We could, for example, simply store the modified indexes inside the sampler until the next call to sample, then update everything lazily.

I debated with myself for a while if we should attach samplers to storage or if we should attach RBs to storage, and decided for the latter because it is more extensible: It's possible that we need components other than the sampler to be aware of changes, and it's the RB's duty to coordinate the different moving parts.

I fully agree. I still think we should attach the RB to the storage, not all other children (the parent is the minimal sufficient information).

My point is mostly about generality of the name update_priority in the replay buffer. I would rather name that mark_update:

class ReplayBuffer: def mark_update(self, ...): self.sampler.mark_update(...) class PrioritizedSampler: def mark_update(self, ...): self.update_priority(...) def update_priority(self, ...): foo(...)

By the way, another advantage of having the mark_update method instead of directly calling update_priority is that samplers can implement the update whenever they feel like it. We could, for example, simply store the modified indexes inside the sampler until the next call to sample, then update everything lazily.

Yes that would be cool! That would definitely be an advantage (we could also update things in batch when needed, which I would expect to cut the compute time of those ops by a bit)

Ah yes. Just made that change.

vmoens

LGTM, thanks for the high-quality work

Can I ask you to create an issue for the lazy implementation of mark_update -> update_priority we talked about?
I can assign someone else to take care of it

vmoens · 2022-10-27T10:43:45Z

I think not all tests are running now, probably because you create the PR from your main branch an not a separate branch. Not sure what to do in that case. I will run the tests locally and see if it's all working properly.

ReplayBuffer's storage now signal back when changes happen.

64c9af0

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 26, 2022

vmoens reviewed Oct 26, 2022

View reviewed changes

paulomarciano and others added 4 commits October 27, 2022 10:21

Merge branch 'pytorch:main' into main

fc5552d

Storage attaches to sampler instead of Prototype Replay Buffer

47c36a3

Attach ReplayBuffers, but update is a pass-through

58f9049

Minor Linting Issues.

8f375e0

vmoens changed the title ~~ReplayBuffer's storage now signal back when changes happen.~~ [BugFix] ReplayBuffer's storage now signal back when changes happen Oct 27, 2022

vmoens added the bug Something isn't working label Oct 27, 2022

vmoens approved these changes Oct 27, 2022

View reviewed changes

vmoens merged commit 6bd3b47 into pytorch:main Oct 27, 2022

vmoens mentioned this pull request Nov 5, 2022

[BUG] GymWrapper does not work with nested observation gym.spaces.Dict #640

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BugFix] ReplayBuffer's storage now signal back when changes happen #614

[BugFix] ReplayBuffer's storage now signal back when changes happen #614

paulomarciano commented Oct 26, 2022

vmoens left a comment

vmoens Oct 26, 2022

vmoens Oct 26, 2022

paulomarciano Oct 27, 2022

vmoens Oct 26, 2022

paulomarciano Oct 27, 2022

vmoens Oct 26, 2022

paulomarciano Oct 27, 2022

vmoens Oct 27, 2022 •

edited

Loading

paulomarciano Oct 27, 2022

vmoens left a comment

vmoens commented Oct 27, 2022

[BugFix] ReplayBuffer's storage now signal back when changes happen #614

[BugFix] ReplayBuffer's storage now signal back when changes happen #614

Conversation

paulomarciano commented Oct 26, 2022

Description

Motivation and Context

Types of changes

Checklist

vmoens left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vmoens Oct 27, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vmoens left a comment

Choose a reason for hiding this comment

vmoens commented Oct 27, 2022

vmoens Oct 27, 2022 •

edited

Loading