
[BugFix] Fix stale model reference in MultiCollector weight sync after device-cast #3587

Merged: vmoens merged 5 commits into main from fix-update-weights on Apr 5, 2026

Conversation

vmoens (Collaborator) commented Mar 30, 2026

Summary

  • Fixes a bug where update_policy_weights_() silently fails to update one worker's policy in MultiAsyncCollector/MultiSyncCollector when workers use different policy_device values.
  • Root cause: _make_policy_factory calls scheme.init_on_receiver(model=policy), which stores a weakref to the original policy, but _get_policy_and_device later deepcopies the policy to place it on the target device. The scheme's model reference then becomes stale, and weight updates go to the original (unused) object.
  • After register_scheme_receiver, we now check whether the scheme's model matches the collector's actual policy and repoint it if the two diverge.
  • Adds a regression test that zeros the weights and verifies that all workers produce zero actions.
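
The root cause and the fix described above can be reproduced without TorchRL. The following is a minimal sketch, not the actual collector code: `TinyPolicy` and `TinyScheme` are hypothetical stand-ins, the device cast is modelled as a plain `copy.deepcopy`, and only the method name `init_on_receiver` is borrowed from the description above.

```python
import copy
import weakref


class TinyPolicy:
    """Hypothetical stand-in for a policy module: a flat list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)

    def act(self, obs):
        # The action is a weighted sum, so all-zero weights give a zero action.
        return sum(w * obs for w in self.weights)


class TinyScheme:
    """Hypothetical stand-in for a weight sync scheme with a weak model reference."""

    def __init__(self):
        self._model_ref = None

    def init_on_receiver(self, model):
        self._model_ref = weakref.ref(model)

    @property
    def model(self):
        return None if self._model_ref is None else self._model_ref()

    def apply(self, new_weights):
        # Writes in place into whatever object the scheme currently points at.
        self.model.weights[:] = new_weights


original = TinyPolicy([1.0, 2.0])
scheme = TinyScheme()
scheme.init_on_receiver(model=original)  # reference taken before the copy

# The device cast is modelled as a deepcopy: the worker runs the copy.
worker_policy = copy.deepcopy(original)

scheme.apply([0.0, 0.0])  # lands on `original`, not on the worker's copy
stale_action = worker_policy.act(1.0)  # still uses the old weights

# Sketch of the fix: repoint the scheme if it diverged from the real policy.
if scheme.model is not worker_policy:
    scheme.init_on_receiver(model=worker_policy)

scheme.apply([0.0, 0.0])
fixed_action = worker_policy.act(1.0)  # now reflects the zeroed weights
```

Here `stale_action` keeps the pre-update value while `fixed_action` is zero, which mirrors the check the new test performs on each worker.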

Test plan

  • New test test_weight_update_after_device_cast passes (4 variants: Sync/Async × MP/SharedMem)
  • All existing TestPolicyFactory tests still pass
  • CI

🤖 Generated with Claude Code

…eepcopy

When policy_device differs from the policy's native device,
_get_policy_and_device creates a deepcopy on the target device. However,
the weight sync scheme's model reference was set before the deepcopy
(in _make_policy_factory), so subsequent weight updates via the background
thread would silently update the original (unused) object instead of the
collector's actual policy. This caused one worker to never receive weight
updates in MultiAsyncCollector when workers had heterogeneous devices.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pytorch-bot Bot commented Mar 30, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3587

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the "CLA Signed" label on Mar 30, 2026

Adds logging at key points in the weight sync pipeline to diagnose
why one async worker may not be receiving weight updates:

- MultiAsyncCollector: log which worker produced each batch, and
  weight param fingerprint (sum) when update_policy_weights_ is called
- _runner.py: log policy param fingerprint at rollout start, and
  whether the stale-model-reference fix fires
- _mp.py send(): log number of transports and weight fingerprint
- _mp.py _background_receive_loop(): log param fingerprint BEFORE and
  AFTER weight application per worker, plus model identity

All gated behind DEBUG level (torchrl_logger.isEnabledFor(10)).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
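
For reference, a DEBUG-gated parameter fingerprint of the kind this commit describes could look like the sketch below. It is an illustration only: it uses the standard `logging` module rather than `torchrl_logger`, plain lists of floats rather than tensors, and the names `param_fingerprint`, `log_fingerprint` and the logger name are hypothetical.

```python
import logging

# Hypothetical logger name; the commit itself gates on torchrl_logger.
logger = logging.getLogger("weight_sync_debug")


def param_fingerprint(params):
    """Cheap fingerprint: the sum of every value across all parameters."""
    return float(sum(sum(p) for p in params))


def log_fingerprint(tag, params):
    """Log the fingerprint only at DEBUG level; return whether it was logged."""
    # logging.DEBUG is the integer 10, so this matches an isEnabledFor(10) gate.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("%s: fingerprint=%.6f", tag, param_fingerprint(params))
        return True
    return False


params = [[1.0, 2.0], [3.0]]

logger.setLevel(logging.WARNING)
skipped = log_fingerprint("before update", params)  # gate closed, sum not computed

logger.setLevel(logging.DEBUG)
emitted = log_fingerprint("after update", params)  # gate open
```

Comparing the fingerprint before and after weight application on each worker is enough to spot a worker whose parameters never change, and the gate keeps the sum from being computed at all unless DEBUG logging is on.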

github-actions Bot commented Mar 30, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: 23. Worsened: 8.

Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_tensor_to_bytestream_speed[pickle] 78.0560μs 76.9697μs 12.9921 KOps/s 12.5707 KOps/s $\color{#35bf28}+3.35\%$
test_tensor_to_bytestream_speed[torch.save] 0.1420ms 0.1382ms 7.2361 KOps/s 7.2265 KOps/s $\color{#35bf28}+0.13\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1047s 0.1042s 9.5985 Ops/s 9.2060 Ops/s $\color{#35bf28}+4.26\%$
test_tensor_to_bytestream_speed[numpy] 2.5526μs 2.5322μs 394.9093 KOps/s 386.0641 KOps/s $\color{#35bf28}+2.29\%$
test_tensor_to_bytestream_speed[safetensors] 38.5535μs 38.1551μs 26.2088 KOps/s 28.2183 KOps/s $\textbf{\color{#d91a1a}-7.12\%}$
test_simple 0.5524s 0.5462s 1.8309 Ops/s 1.7822 Ops/s $\color{#35bf28}+2.73\%$
test_transformed 1.1939s 1.1031s 0.9065 Ops/s 0.9177 Ops/s $\color{#d91a1a}-1.21\%$
test_serial 1.6573s 1.6496s 0.6062 Ops/s 0.5938 Ops/s $\color{#35bf28}+2.09\%$
test_parallel 0.9969s 0.9940s 1.0060 Ops/s 0.9970 Ops/s $\color{#35bf28}+0.90\%$
test_step_mdp_speed[True-True-True-True-True] 0.2021ms 39.6338μs 25.2310 KOps/s 24.8805 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[True-True-True-True-False] 46.9810μs 21.9805μs 45.4949 KOps/s 45.0283 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-True-True-False-True] 59.6210μs 23.4043μs 42.7273 KOps/s 44.2039 KOps/s $\color{#d91a1a}-3.34\%$
test_step_mdp_speed[True-True-True-False-False] 37.2010μs 12.3613μs 80.8975 KOps/s 81.3374 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[True-True-False-True-True] 79.8910μs 43.1638μs 23.1675 KOps/s 23.4459 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[True-True-False-True-False] 55.0500μs 24.5238μs 40.7768 KOps/s 40.5577 KOps/s $\color{#35bf28}+0.54\%$
test_step_mdp_speed[True-True-False-False-True] 0.1053ms 25.8015μs 38.7575 KOps/s 40.3099 KOps/s $\color{#d91a1a}-3.85\%$
test_step_mdp_speed[True-True-False-False-False] 51.2010μs 15.0839μs 66.2958 KOps/s 67.5398 KOps/s $\color{#d91a1a}-1.84\%$
test_step_mdp_speed[True-False-True-True-True] 74.6710μs 45.9358μs 21.7695 KOps/s 22.5052 KOps/s $\color{#d91a1a}-3.27\%$
test_step_mdp_speed[True-False-True-True-False] 53.3710μs 27.2811μs 36.6555 KOps/s 37.2960 KOps/s $\color{#d91a1a}-1.72\%$
test_step_mdp_speed[True-False-True-False-True] 55.2610μs 25.6570μs 38.9757 KOps/s 40.2623 KOps/s $\color{#d91a1a}-3.20\%$
test_step_mdp_speed[True-False-True-False-False] 44.5300μs 14.7705μs 67.7025 KOps/s 67.3860 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-False-False-True-True] 0.1154ms 48.0401μs 20.8159 KOps/s 21.1825 KOps/s $\color{#d91a1a}-1.73\%$
test_step_mdp_speed[True-False-False-True-False] 69.6900μs 29.3416μs 34.0813 KOps/s 34.1542 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[True-False-False-False-True] 58.7910μs 27.6349μs 36.1862 KOps/s 36.1603 KOps/s $\color{#35bf28}+0.07\%$
test_step_mdp_speed[True-False-False-False-False] 44.6410μs 17.0353μs 58.7017 KOps/s 56.9676 KOps/s $\color{#35bf28}+3.04\%$
test_step_mdp_speed[False-True-True-True-True] 92.9410μs 45.3877μs 22.0324 KOps/s 22.3972 KOps/s $\color{#d91a1a}-1.63\%$
test_step_mdp_speed[False-True-True-True-False] 95.2210μs 27.1877μs 36.7814 KOps/s 37.3188 KOps/s $\color{#d91a1a}-1.44\%$
test_step_mdp_speed[False-True-True-False-True] 2.8978ms 30.6969μs 32.5766 KOps/s 35.2272 KOps/s $\textbf{\color{#d91a1a}-7.52\%}$
test_step_mdp_speed[False-True-True-False-False] 45.5610μs 16.5693μs 60.3525 KOps/s 61.1902 KOps/s $\color{#d91a1a}-1.37\%$
test_step_mdp_speed[False-True-False-True-True] 79.2410μs 47.9696μs 20.8465 KOps/s 21.2707 KOps/s $\color{#d91a1a}-1.99\%$
test_step_mdp_speed[False-True-False-True-False] 63.1800μs 29.5209μs 33.8744 KOps/s 34.1258 KOps/s $\color{#d91a1a}-0.74\%$
test_step_mdp_speed[False-True-False-False-True] 66.0510μs 31.7170μs 31.5289 KOps/s 32.6788 KOps/s $\color{#d91a1a}-3.52\%$
test_step_mdp_speed[False-True-False-False-False] 90.7810μs 19.2272μs 52.0098 KOps/s 54.0100 KOps/s $\color{#d91a1a}-3.70\%$
test_step_mdp_speed[False-False-True-True-True] 94.7910μs 51.0408μs 19.5922 KOps/s 19.7547 KOps/s $\color{#d91a1a}-0.82\%$
test_step_mdp_speed[False-False-True-True-False] 65.9410μs 32.6366μs 30.6405 KOps/s 31.1354 KOps/s $\color{#d91a1a}-1.59\%$
test_step_mdp_speed[False-False-True-False-True] 79.4910μs 31.4523μs 31.7942 KOps/s 32.3424 KOps/s $\color{#d91a1a}-1.69\%$
test_step_mdp_speed[False-False-True-False-False] 45.7300μs 18.6911μs 53.5013 KOps/s 53.5159 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[False-False-False-True-True] 0.1208ms 52.5186μs 19.0409 KOps/s 19.2844 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[False-False-False-True-False] 65.9700μs 34.5489μs 28.9445 KOps/s 29.3048 KOps/s $\color{#d91a1a}-1.23\%$
test_step_mdp_speed[False-False-False-False-True] 0.2214ms 32.5072μs 30.7624 KOps/s 31.0611 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[False-False-False-False-False] 49.7610μs 21.1047μs 47.3828 KOps/s 47.7245 KOps/s $\color{#d91a1a}-0.72\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8578s 0.7355s 1.3597 Ops/s 1.3906 Ops/s $\color{#d91a1a}-2.22\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7213s 0.6122s 1.6335 Ops/s 1.6997 Ops/s $\color{#d91a1a}-3.90\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7280s 1.6446s 0.6081 Ops/s 0.6245 Ops/s $\color{#d91a1a}-2.64\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4580s 1.3786s 0.7254 Ops/s 0.7292 Ops/s $\color{#d91a1a}-0.52\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9860s 1.8801s 0.5319 Ops/s 0.5427 Ops/s $\color{#d91a1a}-1.99\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7574s 1.6760s 0.5966 Ops/s 0.6129 Ops/s $\color{#d91a1a}-2.65\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6040s 4.5507s 0.2197 Ops/s 0.2215 Ops/s $\color{#d91a1a}-0.79\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.3993s 4.2966s 0.2327 Ops/s 0.2331 Ops/s $\color{#d91a1a}-0.15\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0059s 1.8483s 0.5410 Ops/s 0.5431 Ops/s $\color{#d91a1a}-0.38\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6573s 1.5670s 0.6382 Ops/s 0.6377 Ops/s $\color{#35bf28}+0.07\%$
test_values[generalized_advantage_estimate-True-True] 9.9790ms 9.7287ms 102.7882 Ops/s 96.3014 Ops/s $\textbf{\color{#35bf28}+6.74\%}$
test_values[vec_generalized_advantage_estimate-True-True] 16.3805ms 12.2679ms 81.5136 Ops/s 55.5394 Ops/s $\textbf{\color{#35bf28}+46.77\%}$
test_values[td0_return_estimate-False-False] 0.2456ms 0.1294ms 7.7298 KOps/s 7.6296 KOps/s $\color{#35bf28}+1.31\%$
test_values[td1_return_estimate-False-False] 26.9315ms 26.2447ms 38.1029 Ops/s 35.8893 Ops/s $\textbf{\color{#35bf28}+6.17\%}$
test_values[vec_td1_return_estimate-False-False] 18.3795ms 12.2319ms 81.7532 Ops/s 55.7823 Ops/s $\textbf{\color{#35bf28}+46.56\%}$
test_values[td_lambda_return_estimate-True-False] 39.5590ms 38.6869ms 25.8486 Ops/s 24.5863 Ops/s $\textbf{\color{#35bf28}+5.13\%}$
test_values[vec_td_lambda_return_estimate-True-False] 18.2593ms 12.2536ms 81.6090 Ops/s 56.3889 Ops/s $\textbf{\color{#35bf28}+44.73\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.2568ms 8.6472ms 115.6440 Ops/s 109.9454 Ops/s $\textbf{\color{#35bf28}+5.18\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.8542ms 1.5706ms 636.6880 Ops/s 655.2986 Ops/s $\color{#d91a1a}-2.84\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4618ms 0.4176ms 2.3948 KOps/s 2.4153 KOps/s $\color{#d91a1a}-0.85\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 31.7262ms 30.7958ms 32.4719 Ops/s 32.1320 Ops/s $\color{#35bf28}+1.06\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.2832ms 1.7976ms 556.3041 Ops/s 556.2903 Ops/s $+0.00\%$
test_dqn_speed[False-None] 1.8222ms 1.3678ms 731.1045 Ops/s 727.5999 Ops/s $\color{#35bf28}+0.48\%$
test_dqn_speed[False-backward] 1.9394ms 1.8908ms 528.8839 Ops/s 524.3179 Ops/s $\color{#35bf28}+0.87\%$
test_dqn_speed[True-None] 1.0153ms 0.5633ms 1.7752 KOps/s 1.6780 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_dqn_speed[True-backward] 1.0756ms 1.0323ms 968.7412 Ops/s 920.0856 Ops/s $\textbf{\color{#35bf28}+5.29\%}$
test_dqn_speed[reduce-overhead-None] 1.0200ms 0.5642ms 1.7725 KOps/s 1.7280 KOps/s $\color{#35bf28}+2.58\%$
test_ddpg_speed[False-None] 3.2684ms 2.8600ms 349.6463 Ops/s 348.9763 Ops/s $\color{#35bf28}+0.19\%$
test_ddpg_speed[False-backward] 4.2076ms 4.0543ms 246.6506 Ops/s 245.0534 Ops/s $\color{#35bf28}+0.65\%$
test_ddpg_speed[True-None] 1.9172ms 1.4729ms 678.9198 Ops/s 664.5766 Ops/s $\color{#35bf28}+2.16\%$
test_ddpg_speed[True-backward] 2.6031ms 2.5089ms 398.5815 Ops/s 389.5341 Ops/s $\color{#35bf28}+2.32\%$
test_ddpg_speed[reduce-overhead-None] 1.8913ms 1.4459ms 691.6318 Ops/s 687.9280 Ops/s $\color{#35bf28}+0.54\%$
test_sac_speed[False-None] 8.5723ms 8.0413ms 124.3579 Ops/s 123.2364 Ops/s $\color{#35bf28}+0.91\%$
test_sac_speed[False-backward] 12.1033ms 11.3442ms 88.1508 Ops/s 86.9935 Ops/s $\color{#35bf28}+1.33\%$
test_sac_speed[True-None] 2.7251ms 2.2790ms 438.7950 Ops/s 429.5764 Ops/s $\color{#35bf28}+2.15\%$
test_sac_speed[True-backward] 4.6346ms 4.2612ms 234.6771 Ops/s 206.9155 Ops/s $\textbf{\color{#35bf28}+13.42\%}$
test_sac_speed[reduce-overhead-None] 2.8124ms 2.2593ms 442.6196 Ops/s 421.9421 Ops/s $\color{#35bf28}+4.90\%$
test_redq_speed[False-None] 15.1188ms 11.0019ms 90.8933 Ops/s 90.7640 Ops/s $\color{#35bf28}+0.14\%$
test_redq_speed[False-backward] 24.1625ms 19.3675ms 51.6329 Ops/s 54.7298 Ops/s $\textbf{\color{#d91a1a}-5.66\%}$
test_redq_speed[True-None] 6.3102ms 4.9414ms 202.3702 Ops/s 207.8298 Ops/s $\color{#d91a1a}-2.63\%$
test_redq_speed[reduce-overhead-None] 5.2163ms 4.7525ms 210.4148 Ops/s 195.1319 Ops/s $\textbf{\color{#35bf28}+7.83\%}$
test_redq_deprec_speed[False-None] 12.0362ms 11.3966ms 87.7458 Ops/s 86.3507 Ops/s $\color{#35bf28}+1.62\%$
test_redq_deprec_speed[False-backward] 17.0810ms 16.3806ms 61.0480 Ops/s 60.0173 Ops/s $\color{#35bf28}+1.72\%$
test_redq_deprec_speed[True-None] 4.1366ms 3.7754ms 264.8746 Ops/s 248.8793 Ops/s $\textbf{\color{#35bf28}+6.43\%}$
test_redq_deprec_speed[True-backward] 8.1177ms 7.8627ms 127.1828 Ops/s 124.5740 Ops/s $\color{#35bf28}+2.09\%$
test_redq_deprec_speed[reduce-overhead-None] 4.1812ms 3.7379ms 267.5330 Ops/s 256.7343 Ops/s $\color{#35bf28}+4.21\%$
test_td3_speed[False-None] 8.1452ms 7.9915ms 125.1335 Ops/s 122.1691 Ops/s $\color{#35bf28}+2.43\%$
test_td3_speed[False-backward] 11.2616ms 10.9153ms 91.6143 Ops/s 90.3298 Ops/s $\color{#35bf28}+1.42\%$
test_td3_speed[True-None] 1.9927ms 1.9123ms 522.9255 Ops/s 510.1473 Ops/s $\color{#35bf28}+2.50\%$
test_td3_speed[True-backward] 3.9819ms 3.8023ms 262.9972 Ops/s 264.2577 Ops/s $\color{#d91a1a}-0.48\%$
test_td3_speed[reduce-overhead-None] 1.9398ms 1.8764ms 532.9388 Ops/s 528.5495 Ops/s $\color{#35bf28}+0.83\%$
test_cql_speed[False-None] 30.4610ms 27.1701ms 36.8051 Ops/s 37.0688 Ops/s $\color{#d91a1a}-0.71\%$
test_cql_speed[False-backward] 36.9375ms 36.2677ms 27.5728 Ops/s 26.9028 Ops/s $\color{#35bf28}+2.49\%$
test_cql_speed[True-None] 13.7107ms 13.1236ms 76.1987 Ops/s 76.5937 Ops/s $\color{#d91a1a}-0.52\%$
test_cql_speed[True-backward] 19.6710ms 19.0399ms 52.5212 Ops/s 53.1036 Ops/s $\color{#d91a1a}-1.10\%$
test_cql_speed[reduce-overhead-None] 13.5389ms 12.7863ms 78.2089 Ops/s 74.8299 Ops/s $\color{#35bf28}+4.52\%$
test_a2c_speed[False-None] 5.9817ms 5.5557ms 179.9950 Ops/s 176.3455 Ops/s $\color{#35bf28}+2.07\%$
test_a2c_speed[False-backward] 12.6884ms 12.1699ms 82.1702 Ops/s 80.6880 Ops/s $\color{#35bf28}+1.84\%$
test_a2c_speed[True-None] 4.2179ms 4.0065ms 249.5926 Ops/s 249.5182 Ops/s $\color{#35bf28}+0.03\%$
test_a2c_speed[True-backward] 9.2147ms 8.9838ms 111.3109 Ops/s 110.2261 Ops/s $\color{#35bf28}+0.98\%$
test_a2c_speed[reduce-overhead-None] 4.6544ms 3.9369ms 254.0070 Ops/s 247.9305 Ops/s $\color{#35bf28}+2.45\%$
test_ppo_speed[False-None] 6.4481ms 5.9763ms 167.3263 Ops/s 167.8428 Ops/s $\color{#d91a1a}-0.31\%$
test_ppo_speed[False-backward] 13.0462ms 12.6326ms 79.1602 Ops/s 77.9470 Ops/s $\color{#35bf28}+1.56\%$
test_ppo_speed[True-None] 4.3136ms 3.8667ms 258.6177 Ops/s 246.7814 Ops/s $\color{#35bf28}+4.80\%$
test_ppo_speed[True-backward] 9.0845ms 8.7531ms 114.2455 Ops/s 104.3004 Ops/s $\textbf{\color{#35bf28}+9.54\%}$
test_ppo_speed[reduce-overhead-None] 3.9664ms 3.8148ms 262.1353 Ops/s 252.2799 Ops/s $\color{#35bf28}+3.91\%$
test_reinforce_speed[False-None] 5.1098ms 4.6292ms 216.0188 Ops/s 212.7638 Ops/s $\color{#35bf28}+1.53\%$
test_reinforce_speed[False-backward] 10.8007ms 7.7688ms 128.7206 Ops/s 131.2831 Ops/s $\color{#d91a1a}-1.95\%$
test_reinforce_speed[True-None] 3.6644ms 3.0984ms 322.7432 Ops/s 315.0846 Ops/s $\color{#35bf28}+2.43\%$
test_reinforce_speed[True-backward] 8.8094ms 8.3748ms 119.4056 Ops/s 120.9890 Ops/s $\color{#d91a1a}-1.31\%$
test_reinforce_speed[reduce-overhead-None] 3.6034ms 3.0598ms 326.8158 Ops/s 317.4713 Ops/s $\color{#35bf28}+2.94\%$
test_iql_speed[False-None] 21.2401ms 20.3626ms 49.1096 Ops/s 48.2434 Ops/s $\color{#35bf28}+1.80\%$
test_iql_speed[False-backward] 32.1804ms 31.3636ms 31.8840 Ops/s 31.9018 Ops/s $\color{#d91a1a}-0.06\%$
test_iql_speed[True-None] 9.5261ms 8.9340ms 111.9322 Ops/s 110.9971 Ops/s $\color{#35bf28}+0.84\%$
test_iql_speed[True-backward] 18.1869ms 17.5181ms 57.0838 Ops/s 57.5661 Ops/s $\color{#d91a1a}-0.84\%$
test_iql_speed[reduce-overhead-None] 9.4336ms 9.0075ms 111.0184 Ops/s 110.3658 Ops/s $\color{#35bf28}+0.59\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.0760ms 5.8404ms 171.2208 Ops/s 169.1350 Ops/s $\color{#35bf28}+1.23\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.7364ms 0.3720ms 2.6883 KOps/s 3.2298 KOps/s $\textbf{\color{#d91a1a}-16.77\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7784ms 0.3060ms 3.2674 KOps/s 3.5676 KOps/s $\textbf{\color{#d91a1a}-8.41\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2125ms 5.6474ms 177.0731 Ops/s 175.3499 Ops/s $\color{#35bf28}+0.98\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7109ms 0.2995ms 3.3390 KOps/s 3.1575 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7901ms 0.2750ms 3.6364 KOps/s 3.3425 KOps/s $\textbf{\color{#35bf28}+8.79\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.8134ms 1.3247ms 754.9013 Ops/s 738.8817 Ops/s $\color{#35bf28}+2.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5164ms 1.2453ms 803.0171 Ops/s 795.2523 Ops/s $\color{#35bf28}+0.98\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.9148ms 5.9587ms 167.8223 Ops/s 170.7736 Ops/s $\color{#d91a1a}-1.73\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0222ms 0.4573ms 2.1869 KOps/s 2.0666 KOps/s $\textbf{\color{#35bf28}+5.82\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9135ms 0.4339ms 2.3045 KOps/s 2.1983 KOps/s $\color{#35bf28}+4.83\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1152ms 5.6553ms 176.8247 Ops/s 175.6041 Ops/s $\color{#35bf28}+0.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.3071ms 0.3576ms 2.7967 KOps/s 3.3244 KOps/s $\textbf{\color{#d91a1a}-15.87\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4935ms 0.2772ms 3.6077 KOps/s 2.8124 KOps/s $\textbf{\color{#35bf28}+28.28\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1537ms 5.6252ms 177.7717 Ops/s 177.0100 Ops/s $\color{#35bf28}+0.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.1891ms 0.2930ms 3.4133 KOps/s 2.7999 KOps/s $\textbf{\color{#35bf28}+21.91\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6358ms 0.4166ms 2.4006 KOps/s 2.9359 KOps/s $\textbf{\color{#d91a1a}-18.23\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1346ms 5.7351ms 174.3648 Ops/s 171.4488 Ops/s $\color{#35bf28}+1.70\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0013ms 0.4519ms 2.2128 KOps/s 1.8924 KOps/s $\textbf{\color{#35bf28}+16.93\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7682ms 0.4359ms 2.2941 KOps/s 1.9834 KOps/s $\textbf{\color{#35bf28}+15.66\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.9952s 24.8224ms 40.2863 Ops/s 49.3589 Ops/s $\textbf{\color{#d91a1a}-18.38\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.3688ms 2.0122ms 496.9686 Ops/s 502.3853 Ops/s $\color{#d91a1a}-1.08\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.0910ms 1.2142ms 823.5713 Ops/s 804.9381 Ops/s $\color{#35bf28}+2.31\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5553ms 4.9509ms 201.9848 Ops/s 194.1783 Ops/s $\color{#35bf28}+4.02\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 12.8358ms 1.9648ms 508.9454 Ops/s 489.3299 Ops/s $\color{#35bf28}+4.01\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.9187ms 1.1496ms 869.8773 Ops/s 857.5569 Ops/s $\color{#35bf28}+1.44\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.0830ms 5.1605ms 193.7808 Ops/s 190.9139 Ops/s $\color{#35bf28}+1.50\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.0159ms 1.9184ms 521.2801 Ops/s 513.7577 Ops/s $\color{#35bf28}+1.46\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3533ms 1.0789ms 926.8557 Ops/s 905.7621 Ops/s $\color{#35bf28}+2.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 40.4908ms 38.2294ms 26.1579 Ops/s 25.5181 Ops/s $\color{#35bf28}+2.51\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.3329ms 17.6835ms 56.5498 Ops/s 54.7876 Ops/s $\color{#35bf28}+3.22\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 42.6817ms 39.1820ms 25.5219 Ops/s 24.3539 Ops/s $\color{#35bf28}+4.80\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.0581ms 18.3473ms 54.5038 Ops/s 54.2682 Ops/s $\color{#35bf28}+0.43\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.3236ms 40.4809ms 24.7030 Ops/s 23.5990 Ops/s $\color{#35bf28}+4.68\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.0901ms 19.5350ms 51.1902 Ops/s 50.3379 Ops/s $\color{#35bf28}+1.69\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8760ms 0.2206ms 4.5336 KOps/s 4.5212 KOps/s $\color{#35bf28}+0.27\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.7037ms 1.4708ms 679.9022 Ops/s 647.1109 Ops/s $\textbf{\color{#35bf28}+5.07\%}$
test_storage_write_lazystack[100-img_shape2-large_img] 3.3047ms 2.5069ms 398.9028 Ops/s 391.8193 Ops/s $\color{#35bf28}+1.81\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.6278ms 3.1204ms 320.4712 Ops/s 305.9890 Ops/s $\color{#35bf28}+4.73\%$
test_storage_write_contiguous[50-img_shape0-small] 0.1985ms 0.1334ms 7.4961 KOps/s 7.3820 KOps/s $\color{#35bf28}+1.54\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3445ms 0.1872ms 5.3415 KOps/s 4.9100 KOps/s $\textbf{\color{#35bf28}+8.79\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.5630ms 1.9355ms 516.6550 Ops/s 520.5096 Ops/s $\color{#d91a1a}-0.74\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.7171ms 1.4275ms 700.5160 Ops/s 710.9646 Ops/s $\color{#d91a1a}-1.47\%$
test_collector_stack_then_write[50-img_shape0-small] 1.5572ms 1.1147ms 897.1170 Ops/s 904.6944 Ops/s $\color{#d91a1a}-0.84\%$
test_collector_stack_then_write[100-img_shape1-atari] 4.0506ms 3.5630ms 280.6598 Ops/s 273.0624 Ops/s $\color{#35bf28}+2.78\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.6762ms 5.9173ms 168.9971 Ops/s 166.8631 Ops/s $\color{#35bf28}+1.28\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.8508ms 7.3569ms 135.9273 Ops/s 135.9802 Ops/s $\color{#d91a1a}-0.04\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.7037ms 0.2719ms 3.6785 KOps/s 3.5202 KOps/s $\color{#35bf28}+4.49\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.9873ms 1.5266ms 655.0445 Ops/s 608.1558 Ops/s $\textbf{\color{#35bf28}+7.71\%}$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.9529ms 2.5729ms 388.6660 Ops/s 380.8271 Ops/s $\color{#35bf28}+2.06\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.7905ms 3.2511ms 307.5873 Ops/s 293.7758 Ops/s $\color{#35bf28}+4.70\%$
test_collector_without_rb[100-img_shape0-atari] 32.5890ms 32.1095ms 31.1434 Ops/s 30.4110 Ops/s $\color{#35bf28}+2.41\%$
test_collector_without_rb[200-img_shape1-large_batch] 63.4069ms 62.9628ms 15.8824 Ops/s 15.4028 Ops/s $\color{#35bf28}+3.11\%$
test_collector_with_rb[100-img_shape0-atari] 38.4355ms 36.5557ms 27.3555 Ops/s 26.6583 Ops/s $\color{#35bf28}+2.62\%$
test_collector_with_rb[200-img_shape1-large_batch] 72.6072ms 72.0978ms 13.8701 Ops/s 13.5890 Ops/s $\color{#35bf28}+2.07\%$
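
The Change column appears to be the relative difference between Ops and Ops on Repo HEAD. The bot comment does not state the formula, so the sketch below is an assumption, checked against two rows of the CPU table; `percent_change` is a hypothetical helper, not part of the benchmark tooling.

```python
def percent_change(ops: float, ops_on_head: float) -> float:
    """Relative change in throughput versus repo HEAD, in percent.

    Both arguments must be in the same unit (Ops/s or KOps/s).
    """
    return (ops - ops_on_head) / ops_on_head * 100.0


# Values copied from the CPU table above (both rows are in KOps/s).
pickle_change = percent_change(12.9921, 12.5707)
safetensors_change = percent_change(26.2088, 28.2183)

print(f"test_tensor_to_bytestream_speed[pickle]: {pickle_change:+.2f}%")
print(f"test_tensor_to_bytestream_speed[safetensors]: {safetensors_change:+.2f}%")
```

Under this assumption the two rows come out at +3.35% and -7.12%, matching the reported values.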

github-actions Bot commented Mar 30, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: 8. Worsened: 9.

Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_tensor_to_bytestream_speed[pickle] 81.0278μs 80.2491μs 12.4612 KOps/s 11.9585 KOps/s $\color{#35bf28}+4.20\%$
test_tensor_to_bytestream_speed[torch.save] 0.1410ms 0.1404ms 7.1202 KOps/s 7.0780 KOps/s $\color{#35bf28}+0.60\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1099s 0.1092s 9.1541 Ops/s 9.5327 Ops/s $\color{#d91a1a}-3.97\%$
test_tensor_to_bytestream_speed[numpy] 2.6168μs 2.6035μs 384.0947 KOps/s 403.9074 KOps/s $\color{#d91a1a}-4.91\%$
test_tensor_to_bytestream_speed[safetensors] 36.7711μs 36.5890μs 27.3306 KOps/s 27.1449 KOps/s $\color{#35bf28}+0.68\%$
test_simple 0.9235s 0.8226s 1.2157 Ops/s 1.2365 Ops/s $\color{#d91a1a}-1.68\%$
test_transformed 1.3840s 1.3801s 0.7246 Ops/s 0.7145 Ops/s $\color{#35bf28}+1.41\%$
test_serial 2.3253s 2.3161s 0.4318 Ops/s 0.4326 Ops/s $\color{#d91a1a}-0.19\%$
test_parallel 1.9129s 1.8068s 0.5535 Ops/s 0.5502 Ops/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[True-True-True-True-True] 0.2337ms 41.7979μs 23.9247 KOps/s 23.7834 KOps/s $\color{#35bf28}+0.59\%$
test_step_mdp_speed[True-True-True-True-False] 61.1810μs 23.4587μs 42.6280 KOps/s 43.5288 KOps/s $\color{#d91a1a}-2.07\%$
test_step_mdp_speed[True-True-True-False-True] 57.4410μs 23.4648μs 42.6171 KOps/s 42.4611 KOps/s $\color{#35bf28}+0.37\%$
test_step_mdp_speed[True-True-True-False-False] 40.0610μs 12.9605μs 77.1574 KOps/s 77.8724 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-True-False-True-True] 78.8010μs 44.8350μs 22.3040 KOps/s 22.9979 KOps/s $\color{#d91a1a}-3.02\%$
test_step_mdp_speed[True-True-False-True-False] 58.4110μs 25.7928μs 38.7705 KOps/s 39.8502 KOps/s $\color{#d91a1a}-2.71\%$
test_step_mdp_speed[True-True-False-False-True] 89.7820μs 26.3567μs 37.9411 KOps/s 38.9386 KOps/s $\color{#d91a1a}-2.56\%$
test_step_mdp_speed[True-True-False-False-False] 40.9310μs 15.6802μs 63.7745 KOps/s 66.1746 KOps/s $\color{#d91a1a}-3.63\%$
test_step_mdp_speed[True-False-True-True-True] 95.9530μs 46.4156μs 21.5445 KOps/s 21.4923 KOps/s $\color{#35bf28}+0.24\%$
test_step_mdp_speed[True-False-True-True-False] 60.8510μs 27.9247μs 35.8106 KOps/s 36.0940 KOps/s $\color{#d91a1a}-0.79\%$
test_step_mdp_speed[True-False-True-False-True] 60.6720μs 25.8868μs 38.6297 KOps/s 37.9891 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[True-False-True-False-False] 42.6410μs 15.5052μs 64.4946 KOps/s 64.3111 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-False-False-True-True] 0.1111ms 49.4319μs 20.2298 KOps/s 20.3737 KOps/s $\color{#d91a1a}-0.71\%$
test_step_mdp_speed[True-False-False-True-False] 62.6310μs 30.9371μs 32.3236 KOps/s 33.1618 KOps/s $\color{#d91a1a}-2.53\%$
test_step_mdp_speed[True-False-False-False-True] 63.9810μs 28.8396μs 34.6745 KOps/s 35.4076 KOps/s $\color{#d91a1a}-2.07\%$
test_step_mdp_speed[True-False-False-False-False] 46.5320μs 17.9528μs 55.7017 KOps/s 55.7393 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-True-True-True-True] 76.7220μs 47.0095μs 21.2723 KOps/s 21.2922 KOps/s $\color{#d91a1a}-0.09\%$
test_step_mdp_speed[False-True-True-True-False] 92.1720μs 28.3315μs 35.2964 KOps/s 35.7227 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[False-True-True-False-True] 2.5406ms 30.6587μs 32.6171 KOps/s 33.6092 KOps/s $\color{#d91a1a}-2.95\%$
test_step_mdp_speed[False-True-True-False-False] 47.1410μs 17.5303μs 57.0441 KOps/s 59.8289 KOps/s $\color{#d91a1a}-4.65\%$
test_step_mdp_speed[False-True-False-True-True] 88.0420μs 49.3152μs 20.2777 KOps/s 20.3972 KOps/s $\color{#d91a1a}-0.59\%$
test_step_mdp_speed[False-True-False-True-False] 59.7810μs 30.7375μs 32.5335 KOps/s 32.6283 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-True-False-False-True] 97.8520μs 32.1565μs 31.0979 KOps/s 31.2468 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[False-True-False-False-False] 54.1110μs 19.6096μs 50.9955 KOps/s 52.4813 KOps/s $\color{#d91a1a}-2.83\%$
test_step_mdp_speed[False-False-True-True-True] 86.4920μs 51.7592μs 19.3202 KOps/s 19.5614 KOps/s $\color{#d91a1a}-1.23\%$
test_step_mdp_speed[False-False-True-True-False] 64.9920μs 33.3664μs 29.9703 KOps/s 30.4508 KOps/s $\color{#d91a1a}-1.58\%$
test_step_mdp_speed[False-False-True-False-True] 81.4820μs 31.9744μs 31.2750 KOps/s 31.6391 KOps/s $\color{#d91a1a}-1.15\%$
test_step_mdp_speed[False-False-True-False-False] 49.0410μs 19.9587μs 50.1034 KOps/s 52.7596 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_step_mdp_speed[False-False-False-True-True] 0.1181ms 54.6081μs 18.3123 KOps/s 18.6815 KOps/s $\color{#d91a1a}-1.98\%$
test_step_mdp_speed[False-False-False-True-False] 67.2620μs 35.9456μs 27.8198 KOps/s 28.5320 KOps/s $\color{#d91a1a}-2.50\%$
test_step_mdp_speed[False-False-False-False-True] 60.8420μs 34.1038μs 29.3223 KOps/s 29.7542 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[False-False-False-False-False] 58.3210μs 22.0777μs 45.2946 KOps/s 46.2549 KOps/s $\color{#d91a1a}-2.08\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7337s 0.7204s 1.3881 Ops/s 1.3422 Ops/s $\color{#35bf28}+3.42\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7135s 0.6096s 1.6403 Ops/s 1.6407 Ops/s $\color{#d91a1a}-0.02\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7517s 1.6498s 0.6061 Ops/s 0.6108 Ops/s $\color{#d91a1a}-0.77\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5055s 1.4198s 0.7043 Ops/s 0.7031 Ops/s $\color{#35bf28}+0.17\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9588s 1.8789s 0.5322 Ops/s 0.5284 Ops/s $\color{#35bf28}+0.72\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.8011s 1.7067s 0.5859 Ops/s 0.6007 Ops/s $\color{#d91a1a}-2.46\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7207s 4.6133s 0.2168 Ops/s 0.2165 Ops/s $\color{#35bf28}+0.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5910s 4.4683s 0.2238 Ops/s 0.2271 Ops/s $\color{#d91a1a}-1.44\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0156s 1.8755s 0.5332 Ops/s 0.5376 Ops/s $\color{#d91a1a}-0.81\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6871s 1.5955s 0.6268 Ops/s 0.6286 Ops/s $\color{#d91a1a}-0.29\%$
test_values[generalized_advantage_estimate-True-True] 20.9816ms 20.5039ms 48.7712 Ops/s 49.4542 Ops/s $\color{#d91a1a}-1.38\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1420s 3.7516ms 266.5539 Ops/s 265.0445 Ops/s $\color{#35bf28}+0.57\%$
test_values[td0_return_estimate-False-False] 0.1047ms 82.1855μs 12.1676 KOps/s 12.1781 KOps/s $\color{#d91a1a}-0.09\%$
test_values[td1_return_estimate-False-False] 48.8558ms 48.2699ms 20.7168 Ops/s 20.2392 Ops/s $\color{#35bf28}+2.36\%$
test_values[vec_td1_return_estimate-False-False] 1.3192ms 1.0785ms 927.2482 Ops/s 918.2039 Ops/s $\color{#35bf28}+0.98\%$
test_values[td_lambda_return_estimate-True-False] 79.2459ms 78.5762ms 12.7265 Ops/s 12.5004 Ops/s $\color{#35bf28}+1.81\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2896ms 1.0692ms 935.2353 Ops/s 932.9930 Ops/s $\color{#35bf28}+0.24\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.8829ms 20.5285ms 48.7127 Ops/s 49.0397 Ops/s $\color{#d91a1a}-0.67\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0186ms 0.7474ms 1.3379 KOps/s 1.3380 KOps/s $-0.01\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8076ms 0.6654ms 1.5028 KOps/s 1.5023 KOps/s $\color{#35bf28}+0.04\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5535ms 1.4824ms 674.5713 Ops/s 676.9365 Ops/s $\color{#d91a1a}-0.35\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8462ms 0.6837ms 1.4626 KOps/s 1.4673 KOps/s $\color{#d91a1a}-0.32\%$
test_dqn_speed[False-None] 1.7342ms 1.5638ms 639.4656 Ops/s 638.1585 Ops/s $\color{#35bf28}+0.20\%$
test_dqn_speed[False-backward] 2.2907ms 2.1809ms 458.5309 Ops/s 457.9869 Ops/s $\color{#35bf28}+0.12\%$
test_dqn_speed[True-None] 0.7185ms 0.6014ms 1.6629 KOps/s 1.6374 KOps/s $\color{#35bf28}+1.56\%$
test_dqn_speed[True-backward] 1.2985ms 1.2681ms 788.5806 Ops/s 862.9976 Ops/s $\textbf{\color{#d91a1a}-8.62\%}$
test_dqn_speed[reduce-overhead-None] 0.7045ms 0.6252ms 1.5996 KOps/s 1.5894 KOps/s $\color{#35bf28}+0.64\%$
test_ddpg_speed[False-None] 3.3349ms 2.9646ms 337.3105 Ops/s 333.7800 Ops/s $\color{#35bf28}+1.06\%$
test_ddpg_speed[False-backward] 4.6976ms 4.3187ms 231.5523 Ops/s 234.2858 Ops/s $\color{#d91a1a}-1.17\%$
test_ddpg_speed[True-None] 1.4774ms 1.3855ms 721.7868 Ops/s 702.5776 Ops/s $\color{#35bf28}+2.73\%$
test_ddpg_speed[True-backward] 2.6320ms 2.5623ms 390.2729 Ops/s 404.4580 Ops/s $\color{#d91a1a}-3.51\%$
test_ddpg_speed[reduce-overhead-None] 1.4881ms 1.3882ms 720.3487 Ops/s 705.9544 Ops/s $\color{#35bf28}+2.04\%$
test_sac_speed[False-None] 8.7626ms 8.3558ms 119.6769 Ops/s 118.1789 Ops/s $\color{#35bf28}+1.27\%$
test_sac_speed[False-backward] 11.8982ms 11.4270ms 87.5120 Ops/s 88.4218 Ops/s $\color{#d91a1a}-1.03\%$
test_sac_speed[True-None] 2.1671ms 1.9509ms 512.5931 Ops/s 511.3271 Ops/s $\color{#35bf28}+0.25\%$
test_sac_speed[True-backward] 3.9185ms 3.7444ms 267.0688 Ops/s 276.3228 Ops/s $\color{#d91a1a}-3.35\%$
test_sac_speed[reduce-overhead-None] 16.7103ms 10.1693ms 98.3348 Ops/s 99.0951 Ops/s $\color{#d91a1a}-0.77\%$
test_redq_deprec_speed[False-None] 10.1744ms 9.3268ms 107.2173 Ops/s 106.3378 Ops/s $\color{#35bf28}+0.83\%$
test_redq_deprec_speed[False-backward] 13.0339ms 12.5535ms 79.6588 Ops/s 80.8345 Ops/s $\color{#d91a1a}-1.45\%$
test_redq_deprec_speed[True-None] 2.8300ms 2.7429ms 364.5822 Ops/s 361.7694 Ops/s $\color{#35bf28}+0.78\%$
test_redq_deprec_speed[True-backward] 4.8308ms 4.4493ms 224.7536 Ops/s 227.7992 Ops/s $\color{#d91a1a}-1.34\%$
test_redq_deprec_speed[reduce-overhead-None] 14.6907ms 9.6988ms 103.1050 Ops/s 103.6153 Ops/s $\color{#d91a1a}-0.49\%$
test_td3_speed[False-None] 8.2831ms 8.1801ms 122.2481 Ops/s 121.6831 Ops/s $\color{#35bf28}+0.46\%$
test_td3_speed[False-backward] 10.9511ms 10.6470ms 93.9231 Ops/s 93.0029 Ops/s $\color{#35bf28}+0.99\%$
test_td3_speed[True-None] 1.8056ms 1.7210ms 581.0548 Ops/s 582.5595 Ops/s $\color{#d91a1a}-0.26\%$
test_td3_speed[True-backward] 3.6590ms 3.2721ms 305.6143 Ops/s 303.1344 Ops/s $\color{#35bf28}+0.82\%$
test_td3_speed[reduce-overhead-None] 55.7214ms 26.0284ms 38.4196 Ops/s 38.1557 Ops/s $\color{#35bf28}+0.69\%$
test_cql_speed[False-None] 17.8139ms 17.4441ms 57.3259 Ops/s 57.0850 Ops/s $\color{#35bf28}+0.42\%$
test_cql_speed[False-backward] 23.2195ms 22.7313ms 43.9922 Ops/s 43.6171 Ops/s $\color{#35bf28}+0.86\%$
test_cql_speed[True-None] 3.6221ms 3.4901ms 286.5221 Ops/s 283.7604 Ops/s $\color{#35bf28}+0.97\%$
test_cql_speed[True-backward] 6.2393ms 5.7970ms 172.5041 Ops/s 171.1397 Ops/s $\color{#35bf28}+0.80\%$
test_cql_speed[reduce-overhead-None] 19.2525ms 11.9621ms 83.5976 Ops/s 83.0774 Ops/s $\color{#35bf28}+0.63\%$
test_a2c_speed[False-None] 3.3807ms 3.2726ms 305.5628 Ops/s 299.4716 Ops/s $\color{#35bf28}+2.03\%$
test_a2c_speed[False-backward] 6.7989ms 6.3269ms 158.0564 Ops/s 162.9348 Ops/s $\color{#d91a1a}-2.99\%$
test_a2c_speed[True-None] 1.5672ms 1.4594ms 685.2017 Ops/s 685.6255 Ops/s $\color{#d91a1a}-0.06\%$
test_a2c_speed[True-backward] 3.3961ms 3.3034ms 302.7210 Ops/s 317.3265 Ops/s $\color{#d91a1a}-4.60\%$
test_a2c_speed[reduce-overhead-None] 1.1877ms 1.1128ms 898.6612 Ops/s 892.4345 Ops/s $\color{#35bf28}+0.70\%$
test_ppo_speed[False-None] 4.0368ms 3.9315ms 254.3538 Ops/s 252.5184 Ops/s $\color{#35bf28}+0.73\%$
test_ppo_speed[False-backward] 7.6246ms 7.2240ms 138.4276 Ops/s 142.0952 Ops/s $\color{#d91a1a}-2.58\%$
test_ppo_speed[True-None] 1.7227ms 1.6099ms 621.1565 Ops/s 620.7946 Ops/s $\color{#35bf28}+0.06\%$
test_ppo_speed[True-backward] 3.6329ms 3.5077ms 285.0882 Ops/s 299.4155 Ops/s $\color{#d91a1a}-4.79\%$
test_ppo_speed[reduce-overhead-None] 1.3002ms 1.1720ms 853.2784 Ops/s 839.5047 Ops/s $\color{#35bf28}+1.64\%$
test_reinforce_speed[False-None] 2.4874ms 2.3554ms 424.5579 Ops/s 423.2604 Ops/s $\color{#35bf28}+0.31\%$
test_reinforce_speed[False-backward] 3.9267ms 3.4676ms 288.3830 Ops/s 285.1405 Ops/s $\color{#35bf28}+1.14\%$
test_reinforce_speed[True-None] 1.5887ms 1.4715ms 679.5749 Ops/s 690.6557 Ops/s $\color{#d91a1a}-1.60\%$
test_reinforce_speed[True-backward] 3.3588ms 3.3025ms 302.8032 Ops/s 298.9322 Ops/s $\color{#35bf28}+1.29\%$
test_reinforce_speed[reduce-overhead-None] 15.3608ms 8.7946ms 113.7064 Ops/s 114.0211 Ops/s $\color{#d91a1a}-0.28\%$
test_iql_speed[False-None] 9.8757ms 9.5551ms 104.6566 Ops/s 104.1396 Ops/s $\color{#35bf28}+0.50\%$
test_iql_speed[False-backward] 13.9267ms 13.4599ms 74.2949 Ops/s 74.5260 Ops/s $\color{#d91a1a}-0.31\%$
test_iql_speed[True-None] 2.5528ms 2.3397ms 427.4046 Ops/s 424.7639 Ops/s $\color{#35bf28}+0.62\%$
test_iql_speed[True-backward] 5.5268ms 4.9155ms 203.4382 Ops/s 196.2770 Ops/s $\color{#35bf28}+3.65\%$
test_iql_speed[reduce-overhead-None] 16.4241ms 10.0463ms 99.5396 Ops/s 99.9013 Ops/s $\color{#d91a1a}-0.36\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.7638ms 5.9940ms 166.8337 Ops/s 165.9973 Ops/s $\color{#35bf28}+0.50\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6121ms 0.3597ms 2.7804 KOps/s 2.5392 KOps/s $\textbf{\color{#35bf28}+9.50\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7049ms 0.3355ms 2.9808 KOps/s 2.6824 KOps/s $\textbf{\color{#35bf28}+11.12\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0520ms 5.8246ms 171.6857 Ops/s 172.2540 Ops/s $\color{#d91a1a}-0.33\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7697ms 0.2825ms 3.5397 KOps/s 3.3494 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4498ms 0.2645ms 3.7806 KOps/s 3.4218 KOps/s $\textbf{\color{#35bf28}+10.49\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4798ms 1.2657ms 790.1022 Ops/s 777.3782 Ops/s $\color{#35bf28}+1.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6118ms 1.2172ms 821.5385 Ops/s 851.2271 Ops/s $\color{#d91a1a}-3.49\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.9405ms 6.1085ms 163.7068 Ops/s 169.1323 Ops/s $\color{#d91a1a}-3.21\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.5269ms 0.5316ms 1.8810 KOps/s 2.2978 KOps/s $\textbf{\color{#d91a1a}-18.14\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8142ms 0.5152ms 1.9408 KOps/s 2.3967 KOps/s $\textbf{\color{#d91a1a}-19.02\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9597ms 5.8409ms 171.2067 Ops/s 174.0910 Ops/s $\color{#d91a1a}-1.66\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.4458ms 0.3979ms 2.5134 KOps/s 3.4631 KOps/s $\textbf{\color{#d91a1a}-27.42\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5508ms 0.3706ms 2.6986 KOps/s 2.6186 KOps/s $\color{#35bf28}+3.05\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0624ms 5.7622ms 173.5439 Ops/s 176.0249 Ops/s $\color{#d91a1a}-1.41\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9744ms 0.2881ms 3.4706 KOps/s 2.5502 KOps/s $\textbf{\color{#35bf28}+36.09\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5037ms 0.2672ms 3.7428 KOps/s 3.1381 KOps/s $\textbf{\color{#35bf28}+19.27\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.1462ms 5.9595ms 167.7981 Ops/s 166.6892 Ops/s $\color{#35bf28}+0.67\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1453ms 0.4420ms 2.2627 KOps/s 2.2552 KOps/s $\color{#35bf28}+0.33\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6194ms 0.4216ms 2.3718 KOps/s 2.3491 KOps/s $\color{#35bf28}+0.97\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.7223s 19.4072ms 51.5274 Ops/s 197.9671 Ops/s $\textbf{\color{#d91a1a}-73.97\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 11.9144ms 1.9918ms 502.0672 Ops/s 503.2511 Ops/s $\color{#d91a1a}-0.24\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 8.1804ms 1.2692ms 787.9214 Ops/s 748.1744 Ops/s $\textbf{\color{#35bf28}+5.31\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5333ms 5.0579ms 197.7089 Ops/s 195.2160 Ops/s $\color{#35bf28}+1.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9120ms 1.8211ms 549.1243 Ops/s 553.3194 Ops/s $\color{#d91a1a}-0.76\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.6603ms 1.2971ms 770.9533 Ops/s 1.0329 KOps/s $\textbf{\color{#d91a1a}-25.36\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.6674ms 5.2076ms 192.0275 Ops/s 187.0189 Ops/s $\color{#35bf28}+2.68\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 13.3225ms 2.2642ms 441.6645 Ops/s 497.1345 Ops/s $\textbf{\color{#d91a1a}-11.16\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2025ms 1.1160ms 896.0341 Ops/s 852.4634 Ops/s $\textbf{\color{#35bf28}+5.11\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 44.6031ms 39.4286ms 25.3623 Ops/s 25.4170 Ops/s $\color{#d91a1a}-0.22\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.7996ms 18.1330ms 55.1479 Ops/s 55.4363 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 43.7873ms 40.9133ms 24.4419 Ops/s 24.5962 Ops/s $\color{#d91a1a}-0.63\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.0668ms 18.7442ms 53.3499 Ops/s 54.6487 Ops/s $\color{#d91a1a}-2.38\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 45.2167ms 42.6320ms 23.4565 Ops/s 23.5330 Ops/s $\color{#d91a1a}-0.32\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.0187ms 20.0820ms 49.7959 Ops/s 50.8494 Ops/s $\color{#d91a1a}-2.07\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8649ms 0.2210ms 4.5243 KOps/s 4.5437 KOps/s $\color{#d91a1a}-0.43\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6872ms 1.4312ms 698.7385 Ops/s 698.3963 Ops/s $\color{#35bf28}+0.05\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.8142ms 2.3735ms 421.3184 Ops/s 421.5884 Ops/s $\color{#d91a1a}-0.06\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.2517ms 2.9815ms 335.4059 Ops/s 334.8433 Ops/s $\color{#35bf28}+0.17\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2908ms 0.1676ms 5.9648 KOps/s 6.0848 KOps/s $\color{#d91a1a}-1.97\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.4406ms 0.2576ms 3.8815 KOps/s 4.4575 KOps/s $\textbf{\color{#d91a1a}-12.92\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1433ms 1.8997ms 526.4079 Ops/s 536.6940 Ops/s $\color{#d91a1a}-1.92\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6500ms 1.4396ms 694.6334 Ops/s 691.6212 Ops/s $\color{#35bf28}+0.44\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3379ms 1.1638ms 859.2731 Ops/s 852.1754 Ops/s $\color{#35bf28}+0.83\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.6873ms 3.6843ms 271.4246 Ops/s 269.9490 Ops/s $\color{#35bf28}+0.55\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.9990ms 6.0955ms 164.0548 Ops/s 164.0041 Ops/s $\color{#35bf28}+0.03\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 12.6853ms 7.2608ms 137.7266 Ops/s 138.0057 Ops/s $\color{#d91a1a}-0.20\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.3919ms 0.2791ms 3.5826 KOps/s 3.6453 KOps/s $\color{#d91a1a}-1.72\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.8794ms 1.5490ms 645.5703 Ops/s 663.9785 Ops/s $\color{#d91a1a}-2.77\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 3.0263ms 2.5001ms 399.9782 Ops/s 401.2837 Ops/s $\color{#d91a1a}-0.33\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5455ms 3.2458ms 308.0944 Ops/s 313.2913 Ops/s $\color{#d91a1a}-1.66\%$
test_collector_without_rb[100-img_shape0-atari] 34.3355ms 33.3680ms 29.9688 Ops/s 30.8315 Ops/s $\color{#d91a1a}-2.80\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.2379ms 65.0223ms 15.3793 Ops/s 15.4402 Ops/s $\color{#d91a1a}-0.39\%$
test_collector_with_rb[100-img_shape0-atari] 38.5936ms 37.7283ms 26.5053 Ops/s 26.8487 Ops/s $\color{#d91a1a}-1.28\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.0783ms 73.4495ms 13.6148 Ops/s 13.7145 Ops/s $\color{#d91a1a}-0.73\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 55.0887ms 54.8049ms 18.2465 Ops/s 18.1603 Ops/s $\color{#35bf28}+0.47\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1096s 0.1091s 9.1662 Ops/s 9.1500 Ops/s $\color{#35bf28}+0.18\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 56.9176ms 56.6134ms 17.6637 Ops/s 17.4759 Ops/s $\color{#35bf28}+1.07\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1136s 0.1130s 8.8530 Ops/s 8.8408 Ops/s $\color{#35bf28}+0.14\%$

vmoens and others added 3 commits March 30, 2026 17:17
Adds worker startup logging (INFO level) showing:
- policy id, wrapped_policy id, and whether they match
- scheme model id and whether it matches policy/wrapped_policy
- param fingerprint at end of rollout (not just start)

This will reveal whether the scheme updates the same object the
collector uses for inference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
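
The identity check this logging performs can be illustrated with a small, self-contained sketch. It is not TorchRL code: `TinyPolicy`, `TinyScheme`, and `describe` are hypothetical stand-ins, and the deepcopy only mimics the device-cast copy described in this PR. It assumes the scheme holds a weak reference to the model it was initialised with.

```python
import copy
import weakref


class TinyPolicy:
    """Hypothetical stand-in for a policy; holds one mutable weight list."""

    def __init__(self, weights):
        self.weights = list(weights)


class TinyScheme:
    """Hypothetical scheme that keeps a weak reference to the model it updates."""

    def __init__(self, model):
        self._model_ref = weakref.ref(model)

    @property
    def model(self):
        return self._model_ref()

    def apply(self, new_weights):
        # Writes in place into whichever object the scheme points at.
        self.model.weights[:] = new_weights


def describe(policy, scheme):
    """Report the ids being compared and whether they refer to the same object."""
    return {
        "policy_id": id(policy),
        "scheme_model_id": id(scheme.model),
        "match": scheme.model is policy,
    }


original = TinyPolicy([1.0, 2.0])
scheme = TinyScheme(original)               # scheme registered before the copy
collector_policy = copy.deepcopy(original)  # mimics the copy made for the target device

report = describe(collector_policy, scheme)
scheme.apply([0.0, 0.0])                    # update lands on `original` only
```

In this sketch `report["match"]` is false, and after `apply` the copy used for inference still holds its old weights, which is the kind of divergence the id logging is meant to expose.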
…ging

SharedMemWeightSyncScheme.prepare_weights() only updated the first
unique weight buffer (index [0]), so when workers ran on different
devices (e.g. cuda:4 and cuda:6), only the first device's shared
memory buffer received new weights. The second worker stayed stale.

Fix: iterate over ALL unique weight buffers in prepare_weights().

Also removes the verbose diagnostic logging added during debugging
and reformats touched files with ufmt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
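
The difference between updating index `[0]` and iterating over every buffer can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual `SharedMemWeightSyncScheme` code: the function names are hypothetical, the buffers are ordinary lists, and the device names are used purely as dictionary keys.

```python
def prepare_first_only(buffers_by_device, new_weights):
    """Buggy pattern: only the first unique buffer is refreshed."""
    first = list(buffers_by_device.values())[0]
    first[:] = new_weights


def prepare_all(buffers_by_device, new_weights):
    """Fixed pattern: every unique buffer receives the new weights."""
    for buffer in buffers_by_device.values():
        buffer[:] = new_weights


def make_buffers():
    # One hypothetical buffer per worker device; no GPU is needed here.
    return {"cuda:4": [1.0, 1.0], "cuda:6": [1.0, 1.0]}


stale = make_buffers()
prepare_first_only(stale, [0.0, 0.0])  # the "cuda:6" buffer keeps its old values

fixed = make_buffers()
prepare_all(fixed, [0.0, 0.0])         # both buffers now hold the new values
```

With the first-only pattern the second device's buffer is left unchanged, mirroring the stale worker described above; iterating over all buffers removes that asymmetry.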
@vmoens vmoens merged commit be89a87 into main Apr 5, 2026
62 of 92 checks passed
@vmoens vmoens deleted the fix-update-weights branch April 20, 2026 20:36

Labels

BugFix, CLA Signed, Collectors, WeightUpdate


1 participant