Fix vLLM >= 0.17 compatibility: migrate to native WeightTransferConfig API #3556

Open
vmoens wants to merge 1 commit into gh/vmoens/240/base from gh/vmoens/240/head

vmoens (Collaborator) commented Mar 21, 2026

Stack from ghstack (oldest at bottom):


  • Replace manual stateless_init_process_group + collective_rpc("update_weight")
    with vLLM's native WeightTransferConfig/NCCLWeightTransferEngine API
  • Fix VLLM_USE_V1 env var removal (V1 always on in 0.17+)
  • Fix NCCL weight sync deadlock by dispatching worker RPCs before trainer joins
  • Fix LoRA weight extraction (merge_and_unload before state_dict)
  • Fix weight transfer KeyError by using HF model directly (not TransformersWrapper)
  • Fix prompt_logprobs length mismatch in _RequestOutput_tc for V1 engine
  • Auto-propagate WANDB_API_KEY, HF_TOKEN, HF_HOME to Ray workers
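
The deadlock fix is described above only as "dispatching worker RPCs before trainer joins". As a rough illustration of why the ordering matters (this is not the code in this PR), the sketch below uses a `threading.Barrier` as a stand-in for the NCCL rendezvous and a thread pool as a stand-in for the worker RPCs. The names `sync_weights` and `worker_receive` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sync_weights(num_workers: int, timeout: float = 5.0) -> List[str]:
    """Toy model of a collective weight broadcast.

    The barrier plays the role of the collective rendezvous: nobody
    proceeds until all `num_workers + 1` parties (workers plus trainer)
    have arrived.
    """
    rendezvous = threading.Barrier(num_workers + 1)

    def worker_receive(rank: int) -> str:
        # Hypothetical worker-side RPC body: join the collective, then report.
        rendezvous.wait(timeout=timeout)
        return f"worker {rank} received"

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # 1. Dispatch the worker-side calls first; submit() does not block.
        futures = [pool.submit(worker_receive, rank) for rank in range(num_workers)]
        # 2. Only now does the trainer join the collective.
        rendezvous.wait(timeout=timeout)
        # 3. Collect the worker results once the collective has completed.
        return [future.result(timeout=timeout) for future in futures]
```

If the trainer called `rendezvous.wait()` before the `submit()` calls, no worker would ever reach the barrier, and the trainer would block until the timeout expired and then fail. That is the same shape of problem as a trainer entering a collective before the worker RPCs have been sent.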

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 21, 2026
…g API

- Replace manual stateless_init_process_group + collective_rpc("update_weight")
  with vLLM's native WeightTransferConfig/NCCLWeightTransferEngine API
- Fix VLLM_USE_V1 env var removal (V1 always on in 0.17+)
- Fix NCCL weight sync deadlock by dispatching worker RPCs before trainer joins
- Fix LoRA weight extraction (merge_and_unload before state_dict)
- Fix weight transfer KeyError by using HF model directly (not TransformersWrapper)
- Fix prompt_logprobs length mismatch in _RequestOutput_tc for V1 engine
- Auto-propagate WANDB_API_KEY, HF_TOKEN, HF_HOME to Ray workers
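
For the last item, one possible way to forward these variables is the `env_vars` field of Ray's `runtime_env`. Whether this commit uses that mechanism is an assumption; `build_runtime_env` and `FORWARDED_ENV_VARS` are hypothetical names used only for this sketch.

```python
import os

# Variable names taken from the commit message above.
FORWARDED_ENV_VARS = ("WANDB_API_KEY", "HF_TOKEN", "HF_HOME")


def build_runtime_env(names=FORWARDED_ENV_VARS, environ=None):
    """Return a Ray-style runtime_env dict with only the variables that are set."""
    environ = os.environ if environ is None else environ
    # Skip unset variables so workers do not receive empty credentials.
    env_vars = {name: environ[name] for name in names if name in environ}
    return {"env_vars": env_vars}
```

The resulting dict could then be passed as `ray.init(runtime_env=build_runtime_env())`, so that workers see the same credentials and cache location as the driver.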

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ghstack-source-id: 1a2d958
Pull-Request: #3556

pytorch-bot bot commented Mar 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3556

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 1 Unrelated Failure

As of commit f1a6f7b with merge base 4e2e787:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions bot (Contributor) commented

⚠️ PR Title Label Error

PR title must start with a label prefix in brackets (e.g., [BugFix]).

Current title: Fix vLLM >= 0.17 compatibility: migrate to native WeightTransferConfig API

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

github-actions bot added the llm/, sota-implementations/, Modules, and WeightUpdate labels Mar 21, 2026
meta-cla bot added the CLA Signed label Mar 21, 2026

github-actions bot (Contributor) commented

⚠️ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: 13. Worsened: 15.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_tensor_to_bytestream_speed[pickle] | 85.7745μs | 83.4153μs | 11.9882 KOps/s | 12.4565 KOps/s | -3.76% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1443ms | 0.1418ms | 7.0509 KOps/s | 7.1844 KOps/s | -1.86% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1045s | 0.1044s | 9.5776 Ops/s | 8.3270 Ops/s | **+15.02%** |
| test_tensor_to_bytestream_speed[numpy] | 2.6318μs | 2.6257μs | 380.8567 KOps/s | 408.2107 KOps/s | **-6.70%** |
| test_tensor_to_bytestream_speed[safetensors] | 37.4488μs | 37.1111μs | 26.9462 KOps/s | 25.6734 KOps/s | +4.96% |
| test_simple | 0.7963s | 0.7945s | 1.2586 Ops/s | 1.2220 Ops/s | +3.00% |
| test_transformed | 1.3875s | 1.3863s | 0.7214 Ops/s | 0.7055 Ops/s | +2.24% |
| test_serial | 2.3219s | 2.3199s | 0.4311 Ops/s | 0.4250 Ops/s | +1.42% |
| test_parallel | 1.9119s | 1.8176s | 0.5502 Ops/s | 0.5602 Ops/s | -1.80% |
| test_step_mdp_speed[True-True-True-True-True] | 0.3828ms | 42.2412μs | 23.6736 KOps/s | 24.3559 KOps/s | -2.80% |
| test_step_mdp_speed[True-True-True-True-False] | 53.5410μs | 23.0973μs | 43.2951 KOps/s | 43.9947 KOps/s | -1.59% |
| test_step_mdp_speed[True-True-True-False-True] | 52.6810μs | 23.9984μs | 41.6694 KOps/s | 42.9842 KOps/s | -3.06% |
| test_step_mdp_speed[True-True-True-False-False] | 40.2310μs | 12.7556μs | 78.3972 KOps/s | 79.3294 KOps/s | -1.18% |
| test_step_mdp_speed[True-True-False-True-True] | 87.3320μs | 44.6420μs | 22.4004 KOps/s | 22.6172 KOps/s | -0.96% |
| test_step_mdp_speed[True-True-False-True-False] | 99.6720μs | 25.3218μs | 39.4917 KOps/s | 39.8397 KOps/s | -0.87% |
| test_step_mdp_speed[True-True-False-False-True] | 0.1704ms | 25.8028μs | 38.7555 KOps/s | 38.4048 KOps/s | +0.91% |
| test_step_mdp_speed[True-True-False-False-False] | 37.8900μs | 15.2955μs | 65.3789 KOps/s | 65.2202 KOps/s | +0.24% |
| test_step_mdp_speed[True-False-True-True-True] | 95.6110μs | 46.4071μs | 21.5484 KOps/s | 21.2326 KOps/s | +1.49% |
| test_step_mdp_speed[True-False-True-True-False] | 57.8010μs | 27.9194μs | 35.8173 KOps/s | 35.8963 KOps/s | -0.22% |
| test_step_mdp_speed[True-False-True-False-True] | 59.6910μs | 26.2512μs | 38.0935 KOps/s | 38.4047 KOps/s | -0.81% |
| test_step_mdp_speed[True-False-True-False-False] | 42.2110μs | 15.2830μs | 65.4322 KOps/s | 66.0097 KOps/s | -0.87% |
| test_step_mdp_speed[True-False-False-True-True] | 84.6420μs | 49.3498μs | 20.2635 KOps/s | 20.5353 KOps/s | -1.32% |
| test_step_mdp_speed[True-False-False-True-False] | 54.6910μs | 30.6723μs | 32.6027 KOps/s | 32.9814 KOps/s | -1.15% |
| test_step_mdp_speed[True-False-False-False-True] | 52.1610μs | 28.6375μs | 34.9193 KOps/s | 35.3932 KOps/s | -1.34% |
| test_step_mdp_speed[True-False-False-False-False] | 50.2210μs | 18.0048μs | 55.5406 KOps/s | 56.0665 KOps/s | -0.94% |
| test_step_mdp_speed[False-True-True-True-True] | 88.6920μs | 48.0362μs | 20.8176 KOps/s | 21.0580 KOps/s | -1.14% |
| test_step_mdp_speed[False-True-True-True-False] | 51.2810μs | 28.2226μs | 35.4326 KOps/s | 36.1634 KOps/s | -2.02% |
| test_step_mdp_speed[False-True-True-False-True] | 2.5313ms | 30.3865μs | 32.9094 KOps/s | 33.5250 KOps/s | -1.84% |
| test_step_mdp_speed[False-True-True-False-False] | 56.9810μs | 16.8343μs | 59.4025 KOps/s | 58.9124 KOps/s | +0.83% |
| test_step_mdp_speed[False-True-False-True-True] | 88.2610μs | 49.0643μs | 20.3814 KOps/s | 20.0625 KOps/s | +1.59% |
| test_step_mdp_speed[False-True-False-True-False] | 68.0810μs | 30.7046μs | 32.5684 KOps/s | 32.9124 KOps/s | -1.05% |
| test_step_mdp_speed[False-True-False-False-True] | 70.3220μs | 31.6985μs | 31.5473 KOps/s | 30.8283 KOps/s | +2.33% |
| test_step_mdp_speed[False-True-False-False-False] | 48.8410μs | 19.3474μs | 51.6866 KOps/s | 51.8173 KOps/s | -0.25% |
| test_step_mdp_speed[False-False-True-True-True] | 96.9120μs | 51.7330μs | 19.3300 KOps/s | 19.3340 KOps/s | -0.02% |
| test_step_mdp_speed[False-False-True-True-False] | 62.2710μs | 32.9416μs | 30.3567 KOps/s | 30.6320 KOps/s | -0.90% |
| test_step_mdp_speed[False-False-True-False-True] | 62.8710μs | 31.9813μs | 31.2683 KOps/s | 30.4018 KOps/s | +2.85% |
| test_step_mdp_speed[False-False-True-False-False] | 46.1110μs | 19.5249μs | 51.2166 KOps/s | 51.9201 KOps/s | -1.35% |
| test_step_mdp_speed[False-False-False-True-True] | 84.6310μs | 53.9374μs | 18.5400 KOps/s | 18.6097 KOps/s | -0.37% |
| test_step_mdp_speed[False-False-False-True-False] | 67.4510μs | 35.7415μs | 27.9787 KOps/s | 28.4094 KOps/s | -1.52% |
| test_step_mdp_speed[False-False-False-False-True] | 68.5110μs | 34.0521μs | 29.3668 KOps/s | 29.4856 KOps/s | -0.40% |
| test_step_mdp_speed[False-False-False-False-False] | 52.2910μs | 21.7844μs | 45.9044 KOps/s | 46.2394 KOps/s | -0.72% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7267s | 0.7221s | 1.3849 Ops/s | 1.3331 Ops/s | +3.88% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7140s | 0.6101s | 1.6390 Ops/s | 1.6276 Ops/s | +0.70% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7426s | 1.6479s | 0.6068 Ops/s | 0.6079 Ops/s | -0.18% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5162s | 1.4302s | 0.6992 Ops/s | 0.6998 Ops/s | -0.09% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9853s | 1.8995s | 0.5264 Ops/s | 0.5243 Ops/s | +0.40% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7547s | 1.6769s | 0.5963 Ops/s | 0.5931 Ops/s | +0.55% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6473s | 4.5678s | 0.2189 Ops/s | 0.2165 Ops/s | +1.14% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5397s | 4.4743s | 0.2235 Ops/s | 0.2254 Ops/s | -0.83% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9675s | 1.8647s | 0.5363 Ops/s | 0.5246 Ops/s | +2.22% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7390s | 1.6064s | 0.6225 Ops/s | 0.6339 Ops/s | -1.80% |
| test_values[generalized_advantage_estimate-True-True] | 21.2690ms | 20.7573ms | 48.1757 Ops/s | 48.0337 Ops/s | +0.30% |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1315s | 3.5590ms | 280.9817 Ops/s | 288.5059 Ops/s | -2.61% |
| test_values[td0_return_estimate-False-False] | 0.1077ms | 82.4213μs | 12.1328 KOps/s | 12.0235 KOps/s | +0.91% |
| test_values[td1_return_estimate-False-False] | 49.1981ms | 48.7692ms | 20.5048 Ops/s | 20.4692 Ops/s | +0.17% |
| test_values[vec_td1_return_estimate-False-False] | 1.3618ms | 1.0974ms | 911.2810 Ops/s | 917.4576 Ops/s | -0.67% |
| test_values[td_lambda_return_estimate-True-False] | 80.3804ms | 79.8272ms | 12.5271 Ops/s | 12.5672 Ops/s | -0.32% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2816ms | 1.0957ms | 912.6582 Ops/s | 921.3412 Ops/s | -0.94% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 21.0763ms | 20.7615ms | 48.1660 Ops/s | 45.6922 Ops/s | **+5.41%** |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0369ms | 0.7593ms | 1.3171 KOps/s | 1.3177 KOps/s | -0.05% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7777ms | 0.6848ms | 1.4602 KOps/s | 1.4703 KOps/s | -0.69% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5456ms | 1.4957ms | 668.5795 Ops/s | 671.2041 Ops/s | -0.39% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8243ms | 0.6978ms | 1.4332 KOps/s | 1.4017 KOps/s | +2.25% |
| test_dqn_speed[False-None] | 1.7065ms | 1.6056ms | 622.8138 Ops/s | 619.5534 Ops/s | +0.53% |
| test_dqn_speed[False-backward] | 2.5243ms | 2.2581ms | 442.8498 Ops/s | 449.5003 Ops/s | -1.48% |
| test_dqn_speed[True-None] | 1.1474ms | 0.5886ms | 1.6990 KOps/s | 1.6599 KOps/s | +2.36% |
| test_dqn_speed[True-backward] | 1.1565ms | 1.1051ms | 904.8857 Ops/s | 884.1143 Ops/s | +2.35% |
| test_dqn_speed[reduce-overhead-None] | 0.7692ms | 0.6102ms | 1.6387 KOps/s | 1.6174 KOps/s | +1.32% |
| test_ddpg_speed[False-None] | 3.4341ms | 3.0315ms | 329.8660 Ops/s | 334.4742 Ops/s | -1.38% |
| test_ddpg_speed[False-backward] | 4.7555ms | 4.3482ms | 229.9781 Ops/s | 226.9653 Ops/s | +1.33% |
| test_ddpg_speed[True-None] | 1.8357ms | 1.3478ms | 741.9592 Ops/s | 745.2143 Ops/s | -0.44% |
| test_ddpg_speed[True-backward] | 2.4365ms | 2.3476ms | 425.9649 Ops/s | 398.9026 Ops/s | **+6.78%** |
| test_ddpg_speed[reduce-overhead-None] | 1.5206ms | 1.3863ms | 721.3671 Ops/s | 728.7283 Ops/s | -1.01% |
| test_sac_speed[False-None] | 8.9557ms | 8.5652ms | 116.7517 Ops/s | 118.1132 Ops/s | -1.15% |
| test_sac_speed[False-backward] | 12.3451ms | 11.5036ms | 86.9294 Ops/s | 85.6685 Ops/s | +1.47% |
| test_sac_speed[True-None] | 2.2532ms | 1.8691ms | 535.0098 Ops/s | 539.6322 Ops/s | -0.86% |
| test_sac_speed[True-backward] | 3.7250ms | 3.6263ms | 275.7653 Ops/s | 277.5910 Ops/s | -0.66% |
| test_sac_speed[reduce-overhead-None] | 16.7048ms | 10.5867ms | 94.4581 Ops/s | 96.4204 Ops/s | -2.04% |
| test_redq_deprec_speed[False-None] | 10.3676ms | 9.5010ms | 105.2524 Ops/s | 104.5580 Ops/s | +0.66% |
| test_redq_deprec_speed[False-backward] | 13.2590ms | 12.8208ms | 77.9981 Ops/s | 78.3113 Ops/s | -0.40% |
| test_redq_deprec_speed[True-None] | 2.7038ms | 2.5549ms | 391.4021 Ops/s | 370.6978 Ops/s | **+5.59%** |
| test_redq_deprec_speed[True-backward] | 4.6343ms | 4.2242ms | 236.7334 Ops/s | 235.8948 Ops/s | +0.36% |
| test_redq_deprec_speed[reduce-overhead-None] | 14.7800ms | 9.6912ms | 103.1865 Ops/s | 102.3891 Ops/s | +0.78% |
| test_td3_speed[False-None] | 8.4513ms | 8.3367ms | 119.9509 Ops/s | 119.2635 Ops/s | +0.58% |
| test_td3_speed[False-backward] | 11.4108ms | 10.9571ms | 91.2650 Ops/s | 90.3033 Ops/s | +1.06% |
| test_td3_speed[True-None] | 1.6393ms | 1.6188ms | 617.7249 Ops/s | 616.7764 Ops/s | +0.15% |
| test_td3_speed[True-backward] | 3.5756ms | 3.1174ms | 320.7846 Ops/s | 332.5152 Ops/s | -3.53% |
| test_td3_speed[reduce-overhead-None] | 84.6411ms | 26.0669ms | 38.3628 Ops/s | 38.1620 Ops/s | +0.53% |
| test_cql_speed[False-None] | 18.1758ms | 17.7220ms | 56.4271 Ops/s | 56.0658 Ops/s | +0.64% |
| test_cql_speed[False-backward] | 23.8950ms | 23.3687ms | 42.7923 Ops/s | 43.3594 Ops/s | -1.31% |
| test_cql_speed[True-None] | 3.3988ms | 3.3093ms | 302.1796 Ops/s | 302.6918 Ops/s | -0.17% |
| test_cql_speed[True-backward] | 6.0478ms | 5.5641ms | 179.7246 Ops/s | 177.7661 Ops/s | +1.10% |
| test_cql_speed[reduce-overhead-None] | 0.8438s | 17.4792ms | 57.2109 Ops/s | 82.0797 Ops/s | **-30.30%** |
| test_a2c_speed[False-None] | 3.5066ms | 3.3692ms | 296.8088 Ops/s | 297.5482 Ops/s | -0.25% |
| test_a2c_speed[False-backward] | 6.9524ms | 6.5371ms | 152.9731 Ops/s | 154.1938 Ops/s | -0.79% |
| test_a2c_speed[True-None] | 1.5514ms | 1.4035ms | 712.5155 Ops/s | 726.2850 Ops/s | -1.90% |
| test_a2c_speed[True-backward] | 3.2082ms | 3.1559ms | 316.8629 Ops/s | 315.7619 Ops/s | +0.35% |
| test_a2c_speed[reduce-overhead-None] | 1.0963ms | 1.0388ms | 962.6453 Ops/s | 961.8308 Ops/s | +0.08% |
| test_ppo_speed[False-None] | 4.1666ms | 3.9963ms | 250.2333 Ops/s | 249.5948 Ops/s | +0.26% |
| test_ppo_speed[False-backward] | 7.7311ms | 7.3085ms | 136.8273 Ops/s | 135.9632 Ops/s | +0.64% |
| test_ppo_speed[True-None] | 1.6090ms | 1.5130ms | 660.9312 Ops/s | 663.8760 Ops/s | -0.44% |
| test_ppo_speed[True-backward] | 3.3392ms | 3.2921ms | 303.7548 Ops/s | 316.3064 Ops/s | -3.97% |
| test_ppo_speed[reduce-overhead-None] | 1.2102ms | 1.0976ms | 911.0559 Ops/s | 898.8570 Ops/s | +1.36% |
| test_reinforce_speed[False-None] | 2.5416ms | 2.4088ms | 415.1505 Ops/s | 415.3324 Ops/s | -0.04% |
| test_reinforce_speed[False-backward] | 3.9948ms | 3.5510ms | 281.6120 Ops/s | 294.4175 Ops/s | -4.35% |
| test_reinforce_speed[True-None] | 1.4858ms | 1.3772ms | 726.1331 Ops/s | 731.6897 Ops/s | -0.76% |
| test_reinforce_speed[True-backward] | 3.5383ms | 3.1997ms | 312.5336 Ops/s | 332.8618 Ops/s | **-6.11%** |
| test_reinforce_speed[reduce-overhead-None] | 17.1402ms | 9.5451ms | 104.7661 Ops/s | 110.9270 Ops/s | **-5.55%** |
| test_iql_speed[False-None] | 10.3441ms | 9.7303ms | 102.7719 Ops/s | 102.5863 Ops/s | +0.18% |
| test_iql_speed[False-backward] | 14.2655ms | 13.6743ms | 73.1297 Ops/s | 74.5150 Ops/s | -1.86% |
| test_iql_speed[True-None] | 2.4533ms | 2.2558ms | 443.2961 Ops/s | 432.4438 Ops/s | +2.51% |
| test_iql_speed[True-backward] | 5.7605ms | 4.9660ms | 201.3685 Ops/s | 207.9516 Ops/s | -3.17% |
| test_iql_speed[reduce-overhead-None] | 17.2094ms | 10.5162ms | 95.0910 Ops/s | 98.1469 Ops/s | -3.11% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.3903ms | 6.0024ms | 166.5998 Ops/s | 167.9188 Ops/s | -0.79% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7506ms | 0.3893ms | 2.5687 KOps/s | 2.7241 KOps/s | **-5.70%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6406ms | 0.3762ms | 2.6581 KOps/s | 2.8581 KOps/s | **-7.00%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.5916ms | 5.7992ms | 172.4384 Ops/s | 172.1634 Ops/s | +0.16% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.0889ms | 0.2841ms | 3.5193 KOps/s | 2.8235 KOps/s | **+24.64%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6952ms | 0.2650ms | 3.7737 KOps/s | 2.9823 KOps/s | **+26.54%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.4828ms | 1.2747ms | 784.5133 Ops/s | 719.3486 Ops/s | **+9.06%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6013ms | 1.1890ms | 841.0692 Ops/s | 767.8807 Ops/s | **+9.53%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 10.2071ms | 6.1152ms | 163.5268 Ops/s | 166.8742 Ops/s | -2.01% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.0153ms | 0.5216ms | 1.9173 KOps/s | 1.8620 KOps/s | +2.97% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8438ms | 0.4517ms | 2.2138 KOps/s | 1.9115 KOps/s | **+15.82%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.8997ms | 5.7897ms | 172.7216 Ops/s | 171.4806 Ops/s | +0.72% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.3093ms | 0.3363ms | 2.9732 KOps/s | 2.6002 KOps/s | **+14.35%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5694ms | 0.3735ms | 2.6777 KOps/s | 3.6550 KOps/s | **-26.74%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9900ms | 5.7245ms | 174.6868 Ops/s | 174.2136 Ops/s | +0.27% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8898ms | 0.3502ms | 2.8553 KOps/s | 3.3835 KOps/s | **-15.61%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5762ms | 0.3294ms | 3.0361 KOps/s | 3.1507 KOps/s | -3.64% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.3014ms | 5.9685ms | 167.5465 Ops/s | 166.1863 Ops/s | +0.82% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8301ms | 0.5386ms | 1.8565 KOps/s | 2.2332 KOps/s | **-16.87%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6839ms | 0.4503ms | 2.2205 KOps/s | 2.3269 KOps/s | -4.57% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.9606s | 24.6514ms | 40.5656 Ops/s | 195.2357 Ops/s | **-79.22%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 3.8998ms | 1.8811ms | 531.6135 Ops/s | 534.9487 Ops/s | -0.62% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 7.7595ms | 1.3241ms | 755.2305 Ops/s | 1.0126 KOps/s | **-25.42%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.6255ms | 5.0666ms | 197.3696 Ops/s | 178.5610 Ops/s | **+10.53%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9971ms | 1.8303ms | 546.3627 Ops/s | 535.9744 Ops/s | +1.94% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 6.7667ms | 1.2959ms | 771.6410 Ops/s | 1.0019 KOps/s | **-22.98%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.7388ms | 5.1807ms | 193.0241 Ops/s | 44.7407 Ops/s | **+331.43%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 13.5889ms | 2.3140ms | 432.1589 Ops/s | 482.3534 Ops/s | **-10.41%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.4270ms | 1.1256ms | 888.4480 Ops/s | 739.5860 Ops/s | **+20.13%** |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 41.9660ms | 39.1188ms | 25.5632 Ops/s | 25.4679 Ops/s | +0.37% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.5407ms | 18.2151ms | 54.8996 Ops/s | 54.0815 Ops/s | +1.51% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 44.0089ms | 39.7622ms | 25.1495 Ops/s | 24.5868 Ops/s | +2.29% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.9725ms | 18.4942ms | 54.0711 Ops/s | 52.3686 Ops/s | +3.25% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 46.9486ms | 42.0511ms | 23.7806 Ops/s | 23.6956 Ops/s | +0.36% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.5972ms | 20.0924ms | 49.7701 Ops/s | 48.8793 Ops/s | +1.82% |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8953ms | 0.2229ms | 4.4863 KOps/s | 4.3290 KOps/s | +3.63% |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.7434ms | 1.4019ms | 713.3368 Ops/s | 710.5648 Ops/s | +0.39% |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.6762ms | 2.2543ms | 443.5958 Ops/s | 440.1980 Ops/s | +0.77% |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0542ms | 2.8801ms | 347.2056 Ops/s | 346.6363 Ops/s | +0.16% |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2526ms | 0.1646ms | 6.0770 KOps/s | 5.9803 KOps/s | +1.62% |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3994ms | 0.2543ms | 3.9328 KOps/s | 4.3310 KOps/s | **-9.19%** |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9422ms | 1.7986ms | 555.9987 Ops/s | 555.3127 Ops/s | +0.12% |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5732ms | 1.3763ms | 726.5858 Ops/s | 790.4390 Ops/s | **-8.08%** |
| test_collector_stack_then_write[50-img_shape0-small] | 1.8593ms | 1.1489ms | 870.3707 Ops/s | 861.2764 Ops/s | +1.06% |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7266ms | 3.5848ms | 278.9570 Ops/s | 271.9082 Ops/s | +2.59% |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.3721ms | 5.7802ms | 173.0053 Ops/s | 176.5960 Ops/s | -2.03% |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 15.0907ms | 7.1134ms | 140.5807 Ops/s | 143.5128 Ops/s | -2.04% |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4380ms | 0.2754ms | 3.6317 KOps/s | 3.4807 KOps/s | +4.34% |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6718ms | 1.5038ms | 664.9724 Ops/s | 664.4669 Ops/s | +0.08% |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.5946ms | 2.4039ms | 415.9880 Ops/s | 417.4639 Ops/s | -0.35% |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3848ms | 3.1113ms | 321.4081 Ops/s | 321.6165 Ops/s | -0.06% |
| test_collector_without_rb[100-img_shape0-atari] | 34.5658ms | 33.0216ms | 30.2832 Ops/s | 29.9781 Ops/s | +1.02% |
| test_collector_without_rb[200-img_shape1-large_batch] | 65.0397ms | 64.6868ms | 15.4591 Ops/s | 15.3127 Ops/s | +0.96% |
| test_collector_with_rb[100-img_shape0-atari] | 38.3335ms | 37.6584ms | 26.5545 Ops/s | 26.3977 Ops/s | +0.59% |
| test_collector_with_rb[200-img_shape1-large_batch] | 96.9552ms | 75.5378ms | 13.2384 Ops/s | 13.5347 Ops/s | -2.19% |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 56.0094ms | 55.6320ms | 17.9753 Ops/s | 17.9927 Ops/s | -0.10% |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1111s | 0.1109s | 9.0200 Ops/s | 9.0193 Ops/s | +0.01% |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 58.0584ms | 57.6895ms | 17.3342 Ops/s | 17.3643 Ops/s | -0.17% |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1153s | 0.1148s | 8.7093 Ops/s | 8.7474 Ops/s | -0.44% |
