[Feature] MultiCollector: track policy version per fresh continue command #3759

Merged
vmoens merged 6 commits into gh/vmoens/275/base from gh/vmoens/275/head on May 18, 2026
@vmoens vmoens (Collaborator) commented May 15, 2026

Stack from ghstack (oldest at bottom):

When the parent called ``update_policy_weights_()`` and immediately
sent a ``continue`` to a worker, the resulting batch was tagged with
the *pre-update* policy version. The worker's policy version is only
bumped when ``increment_version()`` is called, and that previously
happened only on direct user calls. As a result, off-policy logging,
staleness gating, and prioritized sampling that key on the version
saw a one-step lag.
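
To make the lag concrete, here is a minimal toy sketch in plain Python. It does not use TorchRL: ``ToyWorker``, ``load_weights``, ``collect`` and ``run`` are hypothetical names, and the "old" ordering (version bumped only after the batch has been tagged) is an assumption chosen to reproduce the one-step lag described above, not a copy of the collector internals.

```python
from dataclasses import dataclass


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a collector worker (not the TorchRL class)."""

    weights: int = 0
    policy_version: int = 0

    def load_weights(self, weights: int) -> None:
        # Receiving new weights does not touch the version counter.
        self.weights = weights

    def increment_version(self) -> None:
        self.policy_version += 1

    def collect(self) -> dict:
        # Tag the batch with whatever version the worker currently holds.
        return {"weights_used": self.weights, "policy_version": self.policy_version}


def run(bump_before_collect: bool, n_steps: int = 3) -> list:
    worker = ToyWorker()
    batches = []
    for step in range(1, n_steps + 1):
        worker.load_weights(step)  # parent side: weights are pushed to the worker
        if bump_before_collect:
            worker.increment_version()  # bump tied to the fresh continue
        batches.append(worker.collect())  # parent side: "continue" triggers a batch
        if not bump_before_collect:
            worker.increment_version()  # assumed old ordering: bump arrives too late
    return batches


lagged = [(b["weights_used"], b["policy_version"]) for b in run(False)]
aligned = [(b["weights_used"], b["policy_version"]) for b in run(True)]
print(lagged)   # version trails the weights by one step
print(aligned)  # version matches the weights used for the batch
```

In the lagged run each batch is collected with weights ``k`` but tagged with version ``k - 1``, which is the mismatch a version-keyed consumer would observe.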

Pass ``track_policy_version=(self.policy_version_tracker is not None)``
to the worker, plumb a ``fresh_command`` flag through the recv loop
so the worker knows whether ``msg`` was just received or is being
replayed from a timed-out iteration, and increment the version on each
fresh ``continue`` / ``continue_random``. The increment is skipped in
``run_free``, where the worker drives its own pacing.
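
The control flow described above can be sketched roughly as follows. This is a simplified, deterministic model rather than the actual worker code: ``worker_loop`` and its ``events`` list are hypothetical, with ``None`` standing in for "no new message arrived, replay the previous ``msg``". Only the names ``track_policy_version``, ``fresh_command``, ``run_free``, ``continue`` and ``continue_random`` come from the description.

```python
def worker_loop(events, track_policy_version, run_free=False):
    """Toy recv loop: returns the version tag attached to each collected batch.

    `events` is an illustrative list of received commands, where None means
    that nothing new was received and the previous command is replayed.
    """
    policy_version = 0
    tagged_versions = []
    msg = None
    for received in events:
        if received is not None:
            msg = received
            fresh_command = True   # msg was just received from the parent
        else:
            fresh_command = False  # msg is replayed from a timed-out iteration
        if msg == "close":
            break
        if msg in ("continue", "continue_random"):
            # Bump only for a freshly received command, only when tracking
            # is enabled, and never in run_free mode.
            if track_policy_version and fresh_command and not run_free:
                policy_version += 1
            tagged_versions.append(policy_version)
    return tagged_versions


# Mirrors the flag derivation in the description; `tracker` is a placeholder.
tracker = object()
track = tracker is not None

events = ["continue", None, "continue_random", "close"]
print(worker_loop(events, track_policy_version=track))
print(worker_loop(events, track_policy_version=track, run_free=True))
```

In this model the replayed iteration reuses the version of the command it replays, so a timeout does not inflate the counter.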

No new unit test: existing ``MultiCollector`` policy-version coverage
in ``test/test_collectors.py`` continues to assert the per-batch
version. End-to-end coverage comes through the example at the top of
the stack.


pytorch-bot Bot commented May 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3759

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 4 New Failures, 1 Cancelled Job

As of commit 4479a5d with merge base 0a01ee8 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vmoens vmoens left a comment


Unmergeable: this goes against the idea of policy version tracking. A continue is not a policy weight update.

@vmoens vmoens changed the title [Feature] MultiCollector: track policy version per fresh continue command [BugFix] MultiCollector: track policy version bug fixes May 16, 2026
@vmoens vmoens changed the title [BugFix] MultiCollector: track policy version bug fixes [Feature] MultiCollector: track policy version per fresh continue command May 17, 2026
vmoens added a commit that referenced this pull request May 18, 2026
…mand

ghstack-source-id: d780a00
Pull-Request: #3759
@vmoens vmoens merged commit 4479a5d into gh/vmoens/275/base May 18, 2026
106 of 113 checks passed
@vmoens vmoens deleted the gh/vmoens/275/head branch May 18, 2026 21:01

Labels

BugFix, CLA Signed, Collectors, Feature, Integrations/torch_geometric, Integrations, WeightUpdate
