Fix vLLM >= 0.17 compatibility: migrate to native WeightTransferConfig API #3556
vmoens wants to merge 1 commit into gh/vmoens/240/base
- Replace manual stateless_init_process_group + collective_rpc("update_weight")
  with vLLM's native WeightTransferConfig/NCCLWeightTransferEngine API
- Fix VLLM_USE_V1 env var removal (V1 always on in 0.17+)
- Fix NCCL weight sync deadlock by dispatching worker RPCs before trainer joins
- Fix LoRA weight extraction (merge_and_unload before state_dict)
- Fix weight transfer KeyError by using HF model directly (not TransformersWrapper)
- Fix prompt_logprobs length mismatch in _RequestOutput_tc for V1 engine
- Auto-propagate WANDB_API_KEY, HF_TOKEN, HF_HOME to Ray workers
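
The deadlock fix listed above is an ordering constraint: a collective rendezvous only completes once every rank has joined, so the trainer must not block on it before the workers have been told to join. The following is a minimal sketch of that ordering, not the code from this PR. It uses `threading.Barrier` as a stand-in for the NCCL group, and `sync_weights` and all other names are hypothetical.

```python
import threading


def sync_weights(num_workers: int, dispatch_workers_first: bool,
                 timeout: float = 1.0) -> bool:
    """Return True if every rank reached the rendezvous, False otherwise.

    The barrier simulates a collective that needs the trainer (rank 0)
    plus all workers. A real NCCL group would hang instead of timing out;
    the timeout only exists so this sketch terminates.
    """
    barrier = threading.Barrier(num_workers + 1)
    arrived = []

    def join_group(rank: int) -> None:
        try:
            barrier.wait(timeout=timeout)
            arrived.append(rank)
        except threading.BrokenBarrierError:
            pass  # rendezvous failed: the simulated deadlock

    workers = [
        threading.Thread(target=join_group, args=(rank,))
        for rank in range(1, num_workers + 1)
    ]

    if dispatch_workers_first:
        # Non-blocking dispatch (stand-in for an async worker RPC),
        # then the trainer joins and the rendezvous completes.
        for w in workers:
            w.start()
        join_group(0)
    else:
        # Trainer blocks before any worker was asked to join,
        # so the rendezvous can never complete.
        join_group(0)
        for w in workers:
            w.start()

    for w in workers:
        w.join()
    return len(arrived) == num_workers + 1
```

With the workers dispatched first, all ranks meet at the barrier. With the trainer joining first, the barrier times out and every later arrival fails.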
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ghstack-source-id: 1a2d958
Pull-Request: #3556
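
One way to forward the credentials named in the commit message (WANDB_API_KEY, HF_TOKEN, HF_HOME) to Ray workers is Ray's `runtime_env` option with an `env_vars` mapping. Whether this PR uses that mechanism is an assumption. The helper name `build_runtime_env` is hypothetical, and the sketch only builds the dictionary, so it runs without Ray installed.

```python
import os
from typing import Mapping, Optional, Sequence

# Variables named in the commit message; extend as needed.
FORWARDED_KEYS = ("WANDB_API_KEY", "HF_TOKEN", "HF_HOME")


def build_runtime_env(
    environ: Optional[Mapping[str, str]] = None,
    keys: Sequence[str] = FORWARDED_KEYS,
) -> dict:
    """Build a Ray-style runtime_env that forwards selected variables.

    Only variables that are actually set are forwarded, so unset
    credentials are not replaced by empty strings on the workers.
    """
    environ = os.environ if environ is None else environ
    env_vars = {key: environ[key] for key in keys if key in environ}
    return {"env_vars": env_vars}


# Hypothetical usage (requires Ray):
#   import ray
#   ray.init(runtime_env=build_runtime_env())
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test.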
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3556
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures, 1 Unrelated Failure
As of commit f1a6f7b with merge base 4e2e787:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 85.7745μs | 83.4153μs | 11.9882 KOps/s | 12.4565 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1443ms | 0.1418ms | 7.0509 KOps/s | 7.1844 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1045s | 0.1044s | 9.5776 Ops/s | 8.3270 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.6318μs | 2.6257μs | 380.8567 KOps/s | 408.2107 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.4488μs | 37.1111μs | 26.9462 KOps/s | 25.6734 KOps/s | |
| test_simple | 0.7963s | 0.7945s | 1.2586 Ops/s | 1.2220 Ops/s | |
| test_transformed | 1.3875s | 1.3863s | 0.7214 Ops/s | 0.7055 Ops/s | |
| test_serial | 2.3219s | 2.3199s | 0.4311 Ops/s | 0.4250 Ops/s | |
| test_parallel | 1.9119s | 1.8176s | 0.5502 Ops/s | 0.5602 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.3828ms | 42.2412μs | 23.6736 KOps/s | 24.3559 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 53.5410μs | 23.0973μs | 43.2951 KOps/s | 43.9947 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 52.6810μs | 23.9984μs | 41.6694 KOps/s | 42.9842 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 40.2310μs | 12.7556μs | 78.3972 KOps/s | 79.3294 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 87.3320μs | 44.6420μs | 22.4004 KOps/s | 22.6172 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 99.6720μs | 25.3218μs | 39.4917 KOps/s | 39.8397 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 0.1704ms | 25.8028μs | 38.7555 KOps/s | 38.4048 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 37.8900μs | 15.2955μs | 65.3789 KOps/s | 65.2202 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 95.6110μs | 46.4071μs | 21.5484 KOps/s | 21.2326 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 57.8010μs | 27.9194μs | 35.8173 KOps/s | 35.8963 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 59.6910μs | 26.2512μs | 38.0935 KOps/s | 38.4047 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 42.2110μs | 15.2830μs | 65.4322 KOps/s | 66.0097 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 84.6420μs | 49.3498μs | 20.2635 KOps/s | 20.5353 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 54.6910μs | 30.6723μs | 32.6027 KOps/s | 32.9814 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 52.1610μs | 28.6375μs | 34.9193 KOps/s | 35.3932 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 50.2210μs | 18.0048μs | 55.5406 KOps/s | 56.0665 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 88.6920μs | 48.0362μs | 20.8176 KOps/s | 21.0580 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 51.2810μs | 28.2226μs | 35.4326 KOps/s | 36.1634 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.5313ms | 30.3865μs | 32.9094 KOps/s | 33.5250 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 56.9810μs | 16.8343μs | 59.4025 KOps/s | 58.9124 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 88.2610μs | 49.0643μs | 20.3814 KOps/s | 20.0625 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 68.0810μs | 30.7046μs | 32.5684 KOps/s | 32.9124 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 70.3220μs | 31.6985μs | 31.5473 KOps/s | 30.8283 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 48.8410μs | 19.3474μs | 51.6866 KOps/s | 51.8173 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 96.9120μs | 51.7330μs | 19.3300 KOps/s | 19.3340 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 62.2710μs | 32.9416μs | 30.3567 KOps/s | 30.6320 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 62.8710μs | 31.9813μs | 31.2683 KOps/s | 30.4018 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 46.1110μs | 19.5249μs | 51.2166 KOps/s | 51.9201 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 84.6310μs | 53.9374μs | 18.5400 KOps/s | 18.6097 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 67.4510μs | 35.7415μs | 27.9787 KOps/s | 28.4094 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 68.5110μs | 34.0521μs | 29.3668 KOps/s | 29.4856 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 52.2910μs | 21.7844μs | 45.9044 KOps/s | 46.2394 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7267s | 0.7221s | 1.3849 Ops/s | 1.3331 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7140s | 0.6101s | 1.6390 Ops/s | 1.6276 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7426s | 1.6479s | 0.6068 Ops/s | 0.6079 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5162s | 1.4302s | 0.6992 Ops/s | 0.6998 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9853s | 1.8995s | 0.5264 Ops/s | 0.5243 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7547s | 1.6769s | 0.5963 Ops/s | 0.5931 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6473s | 4.5678s | 0.2189 Ops/s | 0.2165 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5397s | 4.4743s | 0.2235 Ops/s | 0.2254 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9675s | 1.8647s | 0.5363 Ops/s | 0.5246 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7390s | 1.6064s | 0.6225 Ops/s | 0.6339 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 21.2690ms | 20.7573ms | 48.1757 Ops/s | 48.0337 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1315s | 3.5590ms | 280.9817 Ops/s | 288.5059 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1077ms | 82.4213μs | 12.1328 KOps/s | 12.0235 KOps/s | |
| test_values[td1_return_estimate-False-False] | 49.1981ms | 48.7692ms | 20.5048 Ops/s | 20.4692 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3618ms | 1.0974ms | 911.2810 Ops/s | 917.4576 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 80.3804ms | 79.8272ms | 12.5271 Ops/s | 12.5672 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2816ms | 1.0957ms | 912.6582 Ops/s | 921.3412 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 21.0763ms | 20.7615ms | 48.1660 Ops/s | 45.6922 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0369ms | 0.7593ms | 1.3171 KOps/s | 1.3177 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7777ms | 0.6848ms | 1.4602 KOps/s | 1.4703 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5456ms | 1.4957ms | 668.5795 Ops/s | 671.2041 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8243ms | 0.6978ms | 1.4332 KOps/s | 1.4017 KOps/s | |
| test_dqn_speed[False-None] | 1.7065ms | 1.6056ms | 622.8138 Ops/s | 619.5534 Ops/s | |
| test_dqn_speed[False-backward] | 2.5243ms | 2.2581ms | 442.8498 Ops/s | 449.5003 Ops/s | |
| test_dqn_speed[True-None] | 1.1474ms | 0.5886ms | 1.6990 KOps/s | 1.6599 KOps/s | |
| test_dqn_speed[True-backward] | 1.1565ms | 1.1051ms | 904.8857 Ops/s | 884.1143 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.7692ms | 0.6102ms | 1.6387 KOps/s | 1.6174 KOps/s | |
| test_ddpg_speed[False-None] | 3.4341ms | 3.0315ms | 329.8660 Ops/s | 334.4742 Ops/s | |
| test_ddpg_speed[False-backward] | 4.7555ms | 4.3482ms | 229.9781 Ops/s | 226.9653 Ops/s | |
| test_ddpg_speed[True-None] | 1.8357ms | 1.3478ms | 741.9592 Ops/s | 745.2143 Ops/s | |
| test_ddpg_speed[True-backward] | 2.4365ms | 2.3476ms | 425.9649 Ops/s | 398.9026 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.5206ms | 1.3863ms | 721.3671 Ops/s | 728.7283 Ops/s | |
| test_sac_speed[False-None] | 8.9557ms | 8.5652ms | 116.7517 Ops/s | 118.1132 Ops/s | |
| test_sac_speed[False-backward] | 12.3451ms | 11.5036ms | 86.9294 Ops/s | 85.6685 Ops/s | |
| test_sac_speed[True-None] | 2.2532ms | 1.8691ms | 535.0098 Ops/s | 539.6322 Ops/s | |
| test_sac_speed[True-backward] | 3.7250ms | 3.6263ms | 275.7653 Ops/s | 277.5910 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 16.7048ms | 10.5867ms | 94.4581 Ops/s | 96.4204 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.3676ms | 9.5010ms | 105.2524 Ops/s | 104.5580 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.2590ms | 12.8208ms | 77.9981 Ops/s | 78.3113 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.7038ms | 2.5549ms | 391.4021 Ops/s | 370.6978 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.6343ms | 4.2242ms | 236.7334 Ops/s | 235.8948 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 14.7800ms | 9.6912ms | 103.1865 Ops/s | 102.3891 Ops/s | |
| test_td3_speed[False-None] | 8.4513ms | 8.3367ms | 119.9509 Ops/s | 119.2635 Ops/s | |
| test_td3_speed[False-backward] | 11.4108ms | 10.9571ms | 91.2650 Ops/s | 90.3033 Ops/s | |
| test_td3_speed[True-None] | 1.6393ms | 1.6188ms | 617.7249 Ops/s | 616.7764 Ops/s | |
| test_td3_speed[True-backward] | 3.5756ms | 3.1174ms | 320.7846 Ops/s | 332.5152 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 84.6411ms | 26.0669ms | 38.3628 Ops/s | 38.1620 Ops/s | |
| test_cql_speed[False-None] | 18.1758ms | 17.7220ms | 56.4271 Ops/s | 56.0658 Ops/s | |
| test_cql_speed[False-backward] | 23.8950ms | 23.3687ms | 42.7923 Ops/s | 43.3594 Ops/s | |
| test_cql_speed[True-None] | 3.3988ms | 3.3093ms | 302.1796 Ops/s | 302.6918 Ops/s | |
| test_cql_speed[True-backward] | 6.0478ms | 5.5641ms | 179.7246 Ops/s | 177.7661 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 0.8438s | 17.4792ms | 57.2109 Ops/s | 82.0797 Ops/s | |
| test_a2c_speed[False-None] | 3.5066ms | 3.3692ms | 296.8088 Ops/s | 297.5482 Ops/s | |
| test_a2c_speed[False-backward] | 6.9524ms | 6.5371ms | 152.9731 Ops/s | 154.1938 Ops/s | |
| test_a2c_speed[True-None] | 1.5514ms | 1.4035ms | 712.5155 Ops/s | 726.2850 Ops/s | |
| test_a2c_speed[True-backward] | 3.2082ms | 3.1559ms | 316.8629 Ops/s | 315.7619 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.0963ms | 1.0388ms | 962.6453 Ops/s | 961.8308 Ops/s | |
| test_ppo_speed[False-None] | 4.1666ms | 3.9963ms | 250.2333 Ops/s | 249.5948 Ops/s | |
| test_ppo_speed[False-backward] | 7.7311ms | 7.3085ms | 136.8273 Ops/s | 135.9632 Ops/s | |
| test_ppo_speed[True-None] | 1.6090ms | 1.5130ms | 660.9312 Ops/s | 663.8760 Ops/s | |
| test_ppo_speed[True-backward] | 3.3392ms | 3.2921ms | 303.7548 Ops/s | 316.3064 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.2102ms | 1.0976ms | 911.0559 Ops/s | 898.8570 Ops/s | |
| test_reinforce_speed[False-None] | 2.5416ms | 2.4088ms | 415.1505 Ops/s | 415.3324 Ops/s | |
| test_reinforce_speed[False-backward] | 3.9948ms | 3.5510ms | 281.6120 Ops/s | 294.4175 Ops/s | |
| test_reinforce_speed[True-None] | 1.4858ms | 1.3772ms | 726.1331 Ops/s | 731.6897 Ops/s | |
| test_reinforce_speed[True-backward] | 3.5383ms | 3.1997ms | 312.5336 Ops/s | 332.8618 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 17.1402ms | 9.5451ms | 104.7661 Ops/s | 110.9270 Ops/s | |
| test_iql_speed[False-None] | 10.3441ms | 9.7303ms | 102.7719 Ops/s | 102.5863 Ops/s | |
| test_iql_speed[False-backward] | 14.2655ms | 13.6743ms | 73.1297 Ops/s | 74.5150 Ops/s | |
| test_iql_speed[True-None] | 2.4533ms | 2.2558ms | 443.2961 Ops/s | 432.4438 Ops/s | |
| test_iql_speed[True-backward] | 5.7605ms | 4.9660ms | 201.3685 Ops/s | 207.9516 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 17.2094ms | 10.5162ms | 95.0910 Ops/s | 98.1469 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.3903ms | 6.0024ms | 166.5998 Ops/s | 167.9188 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7506ms | 0.3893ms | 2.5687 KOps/s | 2.7241 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6406ms | 0.3762ms | 2.6581 KOps/s | 2.8581 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.5916ms | 5.7992ms | 172.4384 Ops/s | 172.1634 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.0889ms | 0.2841ms | 3.5193 KOps/s | 2.8235 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6952ms | 0.2650ms | 3.7737 KOps/s | 2.9823 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.4828ms | 1.2747ms | 784.5133 Ops/s | 719.3486 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6013ms | 1.1890ms | 841.0692 Ops/s | 767.8807 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 10.2071ms | 6.1152ms | 163.5268 Ops/s | 166.8742 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.0153ms | 0.5216ms | 1.9173 KOps/s | 1.8620 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8438ms | 0.4517ms | 2.2138 KOps/s | 1.9115 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.8997ms | 5.7897ms | 172.7216 Ops/s | 171.4806 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.3093ms | 0.3363ms | 2.9732 KOps/s | 2.6002 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5694ms | 0.3735ms | 2.6777 KOps/s | 3.6550 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9900ms | 5.7245ms | 174.6868 Ops/s | 174.2136 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8898ms | 0.3502ms | 2.8553 KOps/s | 3.3835 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5762ms | 0.3294ms | 3.0361 KOps/s | 3.1507 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.3014ms | 5.9685ms | 167.5465 Ops/s | 166.1863 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8301ms | 0.5386ms | 1.8565 KOps/s | 2.2332 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6839ms | 0.4503ms | 2.2205 KOps/s | 2.3269 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.9606s | 24.6514ms | 40.5656 Ops/s | 195.2357 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 3.8998ms | 1.8811ms | 531.6135 Ops/s | 534.9487 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 7.7595ms | 1.3241ms | 755.2305 Ops/s | 1.0126 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.6255ms | 5.0666ms | 197.3696 Ops/s | 178.5610 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9971ms | 1.8303ms | 546.3627 Ops/s | 535.9744 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 6.7667ms | 1.2959ms | 771.6410 Ops/s | 1.0019 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.7388ms | 5.1807ms | 193.0241 Ops/s | 44.7407 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 13.5889ms | 2.3140ms | 432.1589 Ops/s | 482.3534 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.4270ms | 1.1256ms | 888.4480 Ops/s | 739.5860 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 41.9660ms | 39.1188ms | 25.5632 Ops/s | 25.4679 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.5407ms | 18.2151ms | 54.8996 Ops/s | 54.0815 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 44.0089ms | 39.7622ms | 25.1495 Ops/s | 24.5868 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.9725ms | 18.4942ms | 54.0711 Ops/s | 52.3686 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 46.9486ms | 42.0511ms | 23.7806 Ops/s | 23.6956 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.5972ms | 20.0924ms | 49.7701 Ops/s | 48.8793 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8953ms | 0.2229ms | 4.4863 KOps/s | 4.3290 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.7434ms | 1.4019ms | 713.3368 Ops/s | 710.5648 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.6762ms | 2.2543ms | 443.5958 Ops/s | 440.1980 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0542ms | 2.8801ms | 347.2056 Ops/s | 346.6363 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2526ms | 0.1646ms | 6.0770 KOps/s | 5.9803 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3994ms | 0.2543ms | 3.9328 KOps/s | 4.3310 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9422ms | 1.7986ms | 555.9987 Ops/s | 555.3127 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5732ms | 1.3763ms | 726.5858 Ops/s | 790.4390 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.8593ms | 1.1489ms | 870.3707 Ops/s | 861.2764 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7266ms | 3.5848ms | 278.9570 Ops/s | 271.9082 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.3721ms | 5.7802ms | 173.0053 Ops/s | 176.5960 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 15.0907ms | 7.1134ms | 140.5807 Ops/s | 143.5128 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4380ms | 0.2754ms | 3.6317 KOps/s | 3.4807 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6718ms | 1.5038ms | 664.9724 Ops/s | 664.4669 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.5946ms | 2.4039ms | 415.9880 Ops/s | 417.4639 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3848ms | 3.1113ms | 321.4081 Ops/s | 321.6165 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 34.5658ms | 33.0216ms | 30.2832 Ops/s | 29.9781 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 65.0397ms | 64.6868ms | 15.4591 Ops/s | 15.3127 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.3335ms | 37.6584ms | 26.5545 Ops/s | 26.3977 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 96.9552ms | 75.5378ms | 13.2384 Ops/s | 13.5347 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 56.0094ms | 55.6320ms | 17.9753 Ops/s | 17.9927 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1111s | 0.1109s | 9.0200 Ops/s | 9.0193 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 58.0584ms | 57.6895ms | 17.3342 Ops/s | 17.3643 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1153s | 0.1148s | 8.7093 Ops/s | 8.7474 Ops/s |
Stack from ghstack (oldest at bottom):