[Feature] MultiCollector: track policy version per fresh continue command by vmoens · Pull Request #3759 · pytorch/rl

vmoens · 2026-05-15T07:20:51Z

Stack from ghstack (oldest at bottom):

When the parent calls update_policy_weights_() and immediately
sends a continue to a worker, the resulting batch was tagged
with the pre-update policy version. The worker's policy version
is bumped only when increment_version() is called, and that
previously happened only on direct user calls. Off-policy logging,
staleness gating, and prioritized sampling that key on the version
therefore saw a one-step lag.
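
The lag described above can be reproduced with a toy model. The sketch below is not TorchRL code: `ToyWorker`, `load_weights`, and `collect` are hypothetical names introduced only for illustration, and the single assumption carried over from the description is that syncing weights does not by itself bump the version counter.

```python
from dataclasses import dataclass


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a collector worker (not a TorchRL class)."""

    weights: int = 0         # pretend parameter snapshot
    policy_version: int = 0  # version tag stamped on every batch

    def load_weights(self, weights: int) -> None:
        # Assumption: a weight sync alone leaves the version counter untouched.
        self.weights = weights

    def increment_version(self) -> None:
        self.policy_version += 1

    def collect(self) -> dict:
        # Each batch records the weights it used and the tag it carries.
        return {"weights": self.weights, "policy_version": self.policy_version}


worker = ToyWorker()
first = worker.collect()    # weights 0, tagged with version 0

worker.load_weights(1)      # parent pushes updated weights ...
stale = worker.collect()    # ... but the batch still carries the old tag

worker.increment_version()  # an explicit bump brings the tag up to date
fixed = worker.collect()
```

In this toy, `stale` is collected with the new weights but still tagged with version 0, which is the kind of mismatch a staleness gate keyed on the version would misread.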

Pass track_policy_version=(self.policy_version_tracker is not None)
to the worker, plumb a fresh_command flag through the recv loop
so the worker knows whether msg was just received or replayed
from a timed-out iteration, and increment the version on each fresh
continue / continue_random (only when not in run_free,
where the worker drives its own pacing).
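
A minimal sketch of the rule described above, under stated assumptions: the worker's receive loop is modelled as a scripted list in which `None` stands for a receive that timed out, so the previous `msg` is replayed. The function name `run_worker`, the script format, and the returned tuples are all hypothetical; only the names `track_policy_version`, `fresh_command`, `run_free`, `continue`, and `continue_random` come from the description, and the actual loop in the collector may be structured differently.

```python
def run_worker(incoming, track_policy_version, run_free=False):
    """Toy receive loop: returns (msg, fresh_command, policy_version) per batch."""
    policy_version = 0
    msg = None
    tags = []
    for received in incoming:
        if received is not None:
            # A new command arrived on this iteration.
            msg, fresh_command = received, True
        else:
            # Simulated timeout: keep the previous msg and mark it as replayed.
            fresh_command = False
        if msg is None:
            continue  # nothing received yet, nothing to replay
        if msg == "close":
            break
        if msg in ("continue", "continue_random"):
            # Bump only on a freshly received command, and never in run_free.
            if track_policy_version and fresh_command and not run_free:
                policy_version += 1
            tags.append((msg, fresh_command, policy_version))
    return tags


script = ["continue", None, "continue_random", "close"]
tracked = run_worker(script, track_policy_version=True)
untracked = run_worker(script, track_policy_version=False)
free_running = run_worker(script, track_policy_version=True, run_free=True)
```

With tracking on, the replayed `continue` keeps version 1 while the fresh `continue_random` moves to version 2; with tracking off or in `run_free`, every batch stays at version 0. Note that the review comment later in this thread questions whether a `continue` should bump the version at all, so this sketch reflects the description rather than a confirmed final behavior.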

No new unit test: existing MultiCollector policy-version coverage
in test/test_collectors.py continues to assert the per-batch
version. End-to-end coverage comes through the example at the top of
the stack.

[ghstack-poisoned]

pytorch-bot · 2026-05-15T07:20:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3759

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

Run pull request jobs on OSDC runners in shadow mode

❌ 4 New Failures, 1 Cancelled Job

As of commit 4479a5d with merge base 0a01ee8:

NEW FAILURES - The following jobs have failed:

Build Windows Wheels / pytorch/rl / build-wheel-py3_10-cpu (gh)
Process completed with exit code 1.
Build Windows Wheels / pytorch/rl / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_rl__3.10_cpu_x64
Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Process completed with exit code 1.
Unit-tests on Linux / tests-optdeps (3.12, 13.0) / linux-job (gh)
test/transforms/test_env_transforms.py::TestTargetReturn::test_parallel_trans_env_check[device0-constant]

CANCELLED JOB - The following job was cancelled. Please retry:

Unit-tests on Windows / unittests-cpu (3.10, windows.4xlarge, cpu) / windows-job (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

vmoens

Unmergeable: this goes against the idea of policy version tracking. A continue command is not a policy weight update.

[ghstack-poisoned]

…mand

When the parent calls ``update_policy_weights_()`` and immediately sends a ``continue`` to a worker, the resulting batch was getting tagged with the *pre-update* policy version: the worker's policy version is only bumped when ``increment_version()`` is called, and that previously only fired on direct user calls. Off-policy logging, staleness gating, and prioritized sampling that key on the version saw a 1-step lag.

Pass ``track_policy_version=(self.policy_version_tracker is not None)`` to the worker, plumb a ``fresh_command`` flag through the recv loop so the worker knows whether ``msg`` was just received vs replayed from a timed-out iteration, and increment the version on each fresh ``continue`` / ``continue_random`` (only when not in ``run_free``, where the worker drives its own pacing).

No new unit test: existing ``MultiCollector`` policy-version coverage in ``test/test_collectors.py`` continues to assert the per-batch version. End-to-end coverage comes through the example at the top of the stack.

ghstack-source-id: d780a00
Pull-Request: #3759

Update

836774f

[ghstack-poisoned]

This was referenced May 15, 2026

[BugFix] Recurrent policy auto-register with policy_factory #3753

Merged

[Performance] LSTM/GRU scan: canonical strides + cuDNN flat-storage clones + thread-local recurrent mode #3754

Merged

github-actions Bot added Feature New feature Collectors Integrations/torch_geometric Integrations and removed Feature New feature labels May 15, 2026

github-actions Bot added the Feature New feature label May 15, 2026

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2026

vmoens mentioned this pull request May 15, 2026

[Feature] Collector.fake_tensordict() / MultiCollector.fake_tensordict() #3761

Closed

Update

ad550ce

[ghstack-poisoned]

This was referenced May 15, 2026

[Feature] timeit.mark_start/mark_end for non-context-manager timing #3762

Merged

[Example] Isaac Lab RNN PPO with compact memory + knowledge-base notes #3763

Merged

[Feature] Collector.fake_tensordict() / MultiCollector.fake_tensordict() #3764

Merged

vmoens commented May 15, 2026

Update

80cb99f

[ghstack-poisoned]

github-actions Bot added the WeightUpdate label May 15, 2026

This was referenced May 15, 2026

[Refactor] Keep [B, T] dim in value estimators #3767

Merged

[Refactor] Simplify LSTM/GRUModule recurrent-mode shape normalization #3768

Merged

[Example] Add Isaac RNN PPO rollout mode flags #3769

Merged

vmoens changed the title ~~[Feature] MultiCollector: track policy version per fresh continue command~~ [BugFix] MultiCollector: track policy version bug fixes May 16, 2026

github-actions Bot added the BugFix label May 16, 2026

Update

9e7d896

[ghstack-poisoned]

vmoens changed the title ~~[BugFix] MultiCollector: track policy version bug fixes~~ [Feature] MultiCollector: track policy version per fresh continue command May 17, 2026

This was referenced May 17, 2026

[Test] Enable scan compile RNN tests on Windows #3770

Closed

[BugFix] Fix GAE compact path bias on recurrent value nets at internal truncations #3771

Merged

Update

8df3671

[ghstack-poisoned]

This was referenced May 18, 2026

[Example] Expose compact GAE cat dimension #3775

Merged

[Doc] Migrate shifted=True callers to legacy/compact + docstring polish #3776

Merged

Update

4479a5d

[ghstack-poisoned]

vmoens merged commit 4479a5d into gh/vmoens/275/base May 18, 2026
106 of 113 checks passed

vmoens deleted the gh/vmoens/275/head branch May 18, 2026 21:01

vmoens merged 6 commits into gh/vmoens/275/base from gh/vmoens/275/head
