@vmoens (Collaborator) commented Jan 19, 2026

Summary

  • Ignore test_llm.py in benchmark workflows (it downloads Qwen models from Hugging Face, which is not suitable for CI benchmarks)
  • Skip the reduce-overhead + backward test combinations, which cause segfaults due to torch.compile CUDA pool allocator issues

Test plan

  • The GPU benchmark CI should now pass without the test_llm.py errors and segfaults
  • The skipped tests will be marked as SKIPPED rather than causing crashes
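
The skip described in the summary can be sketched as a small guard evaluated at the top of each benchmark. This is a minimal illustration, not the PR's actual diff: the helper name `skip_reason` and the parameter names `compile_mode` and `backward` are assumptions, chosen to mirror the test IDs that appear in the benchmark output (for example `test_dqn_speed[reduce-overhead-None]` and `test_dqn_speed[True-backward]`).

```python
from typing import Optional, Union


# Hypothetical helper (not taken from the PR): decide whether a benchmark
# parametrization should be skipped instead of executed.
def skip_reason(compile_mode: Union[bool, str], backward: Optional[str]) -> Optional[str]:
    """Return a skip message for the unsafe combination, else None."""
    if compile_mode == "reduce-overhead" and backward is not None:
        return (
            "reduce-overhead + backward segfaults "
            "(torch.compile CUDA pool allocator issue)"
        )
    return None


# Inside a pytest benchmark the guard would be used roughly like this:
#
#     reason = skip_reason(compile_mode, backward)
#     if reason is not None:
#         pytest.skip(reason)  # reported as SKIPPED rather than crashing
#
if __name__ == "__main__":
    # Enumerate the assumed parameter grid and show which cases would run.
    for mode in (False, True, "reduce-overhead"):
        for bwd in (None, "backward"):
            print(mode, bwd, "->", skip_reason(mode, bwd) or "run")
```

Excluding `test_llm.py` needs no test-code change: pytest's `--ignore=<path>` option keeps a file out of collection. The exact path and workflow invocation are not shown in this conversation.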

@pytorch-bot (bot) commented Jan 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3347

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 15 Pending

As of commit a9b6bad with merge base a7e4b69:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla (bot) added the CLA Signed label Jan 19, 2026
@vmoens added the bug and CI labels Jan 19, 2026
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 153. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}10$.

Detailed results. Columns, in order: Name, Max, Mean, Ops, Ops on Repo HEAD, Change.
test_tensor_to_bytestream_speed[pickle] 79.3684μs 78.3870μs 12.7572 KOps/s 12.7899 KOps/s $\color{#d91a1a}-0.26\%$
test_tensor_to_bytestream_speed[torch.save] 0.1367ms 0.1359ms 7.3609 KOps/s 7.3524 KOps/s $\color{#35bf28}+0.12\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1011s 0.1009s 9.9074 Ops/s 9.5740 Ops/s $\color{#35bf28}+3.48\%$
test_tensor_to_bytestream_speed[numpy] 2.4414μs 2.4269μs 412.0448 KOps/s 412.8870 KOps/s $\color{#d91a1a}-0.20\%$
test_tensor_to_bytestream_speed[safetensors] 36.8024μs 36.5372μs 27.3694 KOps/s 26.1575 KOps/s $\color{#35bf28}+4.63\%$
test_simple 0.6420s 0.5538s 1.8058 Ops/s 1.8016 Ops/s $\color{#35bf28}+0.23\%$
test_transformed 1.2020s 1.1132s 0.8983 Ops/s 0.8957 Ops/s $\color{#35bf28}+0.29\%$
test_serial 1.6152s 1.6136s 0.6197 Ops/s 0.5998 Ops/s $\color{#35bf28}+3.32\%$
test_parallel 1.1579s 1.0823s 0.9239 Ops/s 0.9473 Ops/s $\color{#d91a1a}-2.47\%$
test_step_mdp_speed[True-True-True-True-True] 0.2962ms 43.7043μs 22.8810 KOps/s 23.1934 KOps/s $\color{#d91a1a}-1.35\%$
test_step_mdp_speed[True-True-True-True-False] 56.7110μs 24.6780μs 40.5220 KOps/s 41.4840 KOps/s $\color{#d91a1a}-2.32\%$
test_step_mdp_speed[True-True-True-False-True] 55.7710μs 24.0691μs 41.5471 KOps/s 41.1160 KOps/s $\color{#35bf28}+1.05\%$
test_step_mdp_speed[True-True-True-False-False] 43.5700μs 13.4973μs 74.0889 KOps/s 75.6162 KOps/s $\color{#d91a1a}-2.02\%$
test_step_mdp_speed[True-True-False-True-True] 80.3220μs 46.4003μs 21.5516 KOps/s 21.3453 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-True-False-True-False] 66.9910μs 26.6774μs 37.4850 KOps/s 37.6685 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[True-True-False-False-True] 69.3910μs 26.9927μs 37.0471 KOps/s 36.3738 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[True-True-False-False-False] 41.7010μs 16.0691μs 62.2314 KOps/s 62.2358 KOps/s $-0.01\%$
test_step_mdp_speed[True-False-True-True-True] 83.3110μs 49.7588μs 20.0970 KOps/s 20.2143 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[True-False-True-True-False] 58.1910μs 29.6286μs 33.7512 KOps/s 34.2072 KOps/s $\color{#d91a1a}-1.33\%$
test_step_mdp_speed[True-False-True-False-True] 51.7210μs 26.6412μs 37.5358 KOps/s 37.0920 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[True-False-True-False-False] 45.3910μs 16.0982μs 62.1186 KOps/s 63.1402 KOps/s $\color{#d91a1a}-1.62\%$
test_step_mdp_speed[True-False-False-True-True] 0.1130ms 50.9821μs 19.6147 KOps/s 19.4336 KOps/s $\color{#35bf28}+0.93\%$
test_step_mdp_speed[True-False-False-True-False] 68.1120μs 31.9493μs 31.2996 KOps/s 31.6314 KOps/s $\color{#d91a1a}-1.05\%$
test_step_mdp_speed[True-False-False-False-True] 72.6020μs 29.0699μs 34.3998 KOps/s 34.2564 KOps/s $\color{#35bf28}+0.42\%$
test_step_mdp_speed[True-False-False-False-False] 59.2410μs 18.5360μs 53.9490 KOps/s 55.1971 KOps/s $\color{#d91a1a}-2.26\%$
test_step_mdp_speed[False-True-True-True-True] 81.7220μs 49.1199μs 20.3584 KOps/s 20.6055 KOps/s $\color{#d91a1a}-1.20\%$
test_step_mdp_speed[False-True-True-True-False] 60.8720μs 29.1354μs 34.3225 KOps/s 34.7999 KOps/s $\color{#d91a1a}-1.37\%$
test_step_mdp_speed[False-True-True-False-True] 60.6620μs 30.2268μs 33.0833 KOps/s 32.5789 KOps/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[False-True-True-False-False] 42.9100μs 17.4705μs 57.2393 KOps/s 57.1411 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[False-True-False-True-True] 2.8266ms 51.8057μs 19.3029 KOps/s 19.7842 KOps/s $\color{#d91a1a}-2.43\%$
test_step_mdp_speed[False-True-False-True-False] 67.9510μs 32.0585μs 31.1930 KOps/s 31.6878 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[False-True-False-False-True] 75.4210μs 33.0275μs 30.2778 KOps/s 30.8347 KOps/s $\color{#d91a1a}-1.81\%$
test_step_mdp_speed[False-True-False-False-False] 0.1129ms 20.0678μs 49.8311 KOps/s 50.6412 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[False-False-True-True-True] 81.7520μs 53.9175μs 18.5469 KOps/s 18.6979 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[False-False-True-True-False] 72.9820μs 34.3627μs 29.1013 KOps/s 28.9160 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[False-False-True-False-True] 74.9920μs 33.1197μs 30.1935 KOps/s 30.3349 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[False-False-True-False-False] 50.1110μs 20.1075μs 49.7327 KOps/s 49.7484 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[False-False-False-True-True] 0.1042ms 55.1630μs 18.1281 KOps/s 18.0394 KOps/s $\color{#35bf28}+0.49\%$
test_step_mdp_speed[False-False-False-True-False] 80.1210μs 36.8470μs 27.1392 KOps/s 27.1697 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[False-False-False-False-True] 63.7810μs 34.7120μs 28.8085 KOps/s 28.9517 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[False-False-False-False-False] 52.3210μs 22.4478μs 44.5477 KOps/s 44.5703 KOps/s $\color{#d91a1a}-0.05\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8418s 0.7490s 1.3350 Ops/s 1.3604 Ops/s $\color{#d91a1a}-1.87\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7095s 0.6145s 1.6275 Ops/s 1.6368 Ops/s $\color{#d91a1a}-0.57\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7046s 1.6302s 0.6134 Ops/s 0.6141 Ops/s $\color{#d91a1a}-0.11\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4963s 1.4213s 0.7036 Ops/s 0.7086 Ops/s $\color{#d91a1a}-0.71\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9568s 1.8709s 0.5345 Ops/s 0.5336 Ops/s $\color{#35bf28}+0.17\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7356s 1.6635s 0.6011 Ops/s 0.6047 Ops/s $\color{#d91a1a}-0.59\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6969s 4.5672s 0.2190 Ops/s 0.2169 Ops/s $\color{#35bf28}+0.93\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5329s 4.4090s 0.2268 Ops/s 0.2284 Ops/s $\color{#d91a1a}-0.69\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.1226s 1.9412s 0.5152 Ops/s 0.5211 Ops/s $\color{#d91a1a}-1.14\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7116s 1.6344s 0.6118 Ops/s 0.6117 Ops/s $\color{#35bf28}+0.02\%$
test_values[generalized_advantage_estimate-True-True] 10.0904ms 9.6392ms 103.7433 Ops/s 103.7470 Ops/s $-0.00\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.4847ms 18.0409ms 55.4296 Ops/s 87.1845 Ops/s $\textbf{\color{#d91a1a}-36.42\%}$
test_values[td0_return_estimate-False-False] 0.2447ms 0.1284ms 7.7861 KOps/s 8.0026 KOps/s $\color{#d91a1a}-2.71\%$
test_values[td1_return_estimate-False-False] 25.8982ms 25.5428ms 39.1500 Ops/s 39.3461 Ops/s $\color{#d91a1a}-0.50\%$
test_values[vec_td1_return_estimate-False-False] 18.3691ms 18.0031ms 55.5460 Ops/s 87.2408 Ops/s $\textbf{\color{#d91a1a}-36.33\%}$
test_values[td_lambda_return_estimate-True-False] 38.3509ms 37.5066ms 26.6619 Ops/s 26.6240 Ops/s $\color{#35bf28}+0.14\%$
test_values[vec_td_lambda_return_estimate-True-False] 20.2606ms 18.3109ms 54.6122 Ops/s 87.4322 Ops/s $\textbf{\color{#d91a1a}-37.54\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.5131ms 8.3835ms 119.2815 Ops/s 118.0889 Ops/s $\color{#35bf28}+1.01\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.6933ms 1.4977ms 667.6694 Ops/s 664.6017 Ops/s $\color{#35bf28}+0.46\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4801ms 0.4045ms 2.4720 KOps/s 2.4854 KOps/s $\color{#d91a1a}-0.54\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.0342ms 30.8302ms 32.4357 Ops/s 52.7094 Ops/s $\textbf{\color{#d91a1a}-38.46\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8249ms 1.7176ms 582.2035 Ops/s 583.6819 Ops/s $\color{#d91a1a}-0.25\%$
test_dqn_speed[False-None] 1.5825ms 1.3477ms 742.0145 Ops/s 738.9007 Ops/s $\color{#35bf28}+0.42\%$
test_dqn_speed[False-backward] 1.9987ms 1.8769ms 532.7934 Ops/s 538.7475 Ops/s $\color{#d91a1a}-1.11\%$
test_dqn_speed[True-None] 0.6913ms 0.5270ms 1.8974 KOps/s 1.9350 KOps/s $\color{#d91a1a}-1.94\%$
test_dqn_speed[True-backward] 1.0431ms 0.9647ms 1.0366 KOps/s 922.8514 Ops/s $\textbf{\color{#35bf28}+12.32\%}$
test_dqn_speed[reduce-overhead-None] 0.5941ms 0.5103ms 1.9597 KOps/s 1.8710 KOps/s $\color{#35bf28}+4.74\%$
test_ddpg_speed[False-None] 3.0233ms 2.7592ms 362.4298 Ops/s 364.8385 Ops/s $\color{#d91a1a}-0.66\%$
test_ddpg_speed[False-backward] 4.0745ms 3.9589ms 252.5943 Ops/s 254.5332 Ops/s $\color{#d91a1a}-0.76\%$
test_ddpg_speed[True-None] 1.4841ms 1.3635ms 733.3855 Ops/s 713.4390 Ops/s $\color{#35bf28}+2.80\%$
test_ddpg_speed[True-backward] 2.3991ms 2.3472ms 426.0344 Ops/s 386.2531 Ops/s $\textbf{\color{#35bf28}+10.30\%}$
test_ddpg_speed[reduce-overhead-None] 1.4894ms 1.3524ms 739.4267 Ops/s 735.8682 Ops/s $\color{#35bf28}+0.48\%$
test_sac_speed[False-None] 8.3322ms 7.7953ms 128.2831 Ops/s 127.9239 Ops/s $\color{#35bf28}+0.28\%$
test_sac_speed[False-backward] 11.1650ms 10.9740ms 91.1242 Ops/s 90.6399 Ops/s $\color{#35bf28}+0.53\%$
test_sac_speed[True-None] 2.2575ms 2.1169ms 472.3807 Ops/s 463.5934 Ops/s $\color{#35bf28}+1.90\%$
test_sac_speed[True-backward] 4.1498ms 4.0059ms 249.6333 Ops/s 248.0142 Ops/s $\color{#35bf28}+0.65\%$
test_sac_speed[reduce-overhead-None] 2.3079ms 2.0880ms 478.9222 Ops/s 466.9217 Ops/s $\color{#35bf28}+2.57\%$
test_redq_speed[False-None] 15.2894ms 10.3622ms 96.5042 Ops/s 97.7759 Ops/s $\color{#d91a1a}-1.30\%$
test_redq_speed[False-backward] 17.8147ms 17.1261ms 58.3904 Ops/s 56.3173 Ops/s $\color{#35bf28}+3.68\%$
test_redq_speed[True-None] 4.7417ms 4.4788ms 223.2763 Ops/s 220.1933 Ops/s $\color{#35bf28}+1.40\%$
test_redq_speed[True-backward] 9.9995ms 9.6371ms 103.7655 Ops/s 104.2880 Ops/s $\color{#d91a1a}-0.50\%$
test_redq_speed[reduce-overhead-None] 4.6709ms 4.4541ms 224.5146 Ops/s 220.2449 Ops/s $\color{#35bf28}+1.94\%$
test_redq_deprec_speed[False-None] 11.5355ms 10.9346ms 91.4529 Ops/s 93.8910 Ops/s $\color{#d91a1a}-2.60\%$
test_redq_deprec_speed[False-backward] 16.1417ms 15.7859ms 63.3479 Ops/s 65.0519 Ops/s $\color{#d91a1a}-2.62\%$
test_redq_deprec_speed[True-None] 3.9020ms 3.7011ms 270.1903 Ops/s 266.3421 Ops/s $\color{#35bf28}+1.44\%$
test_redq_deprec_speed[True-backward] 8.1308ms 7.8639ms 127.1627 Ops/s 126.8080 Ops/s $\color{#35bf28}+0.28\%$
test_redq_deprec_speed[reduce-overhead-None] 3.8274ms 3.6855ms 271.3357 Ops/s 281.4211 Ops/s $\color{#d91a1a}-3.58\%$
test_td3_speed[False-None] 8.1187ms 7.7564ms 128.9259 Ops/s 128.2313 Ops/s $\color{#35bf28}+0.54\%$
test_td3_speed[False-backward] 10.9295ms 10.6159ms 94.1987 Ops/s 94.0771 Ops/s $\color{#35bf28}+0.13\%$
test_td3_speed[True-None] 1.9209ms 1.8053ms 553.9205 Ops/s 547.4127 Ops/s $\color{#35bf28}+1.19\%$
test_td3_speed[True-backward] 3.6980ms 3.5985ms 277.8973 Ops/s 259.3278 Ops/s $\textbf{\color{#35bf28}+7.16\%}$
test_td3_speed[reduce-overhead-None] 1.8084ms 1.7715ms 564.4931 Ops/s 558.9315 Ops/s $\color{#35bf28}+1.00\%$
test_cql_speed[False-None] 28.1497ms 25.5467ms 39.1440 Ops/s 39.7274 Ops/s $\color{#d91a1a}-1.47\%$
test_cql_speed[False-backward] 38.1276ms 34.6778ms 28.8369 Ops/s 29.0658 Ops/s $\color{#d91a1a}-0.79\%$
test_cql_speed[True-None] 15.2574ms 12.3339ms 81.0775 Ops/s 80.5211 Ops/s $\color{#35bf28}+0.69\%$
test_cql_speed[True-backward] 18.2206ms 17.8507ms 56.0202 Ops/s 55.8905 Ops/s $\color{#35bf28}+0.23\%$
test_cql_speed[reduce-overhead-None] 12.5935ms 12.2904ms 81.3643 Ops/s 80.8453 Ops/s $\color{#35bf28}+0.64\%$
test_a2c_speed[False-None] 5.6428ms 5.3862ms 185.6600 Ops/s 189.4827 Ops/s $\color{#d91a1a}-2.02\%$
test_a2c_speed[False-backward] 12.0185ms 11.7618ms 85.0209 Ops/s 85.5678 Ops/s $\color{#d91a1a}-0.64\%$
test_a2c_speed[True-None] 3.9244ms 3.7323ms 267.9292 Ops/s 270.1015 Ops/s $\color{#d91a1a}-0.80\%$
test_a2c_speed[True-backward] 8.9199ms 8.5538ms 116.9074 Ops/s 114.2244 Ops/s $\color{#35bf28}+2.35\%$
test_a2c_speed[reduce-overhead-None] 3.8471ms 3.7091ms 269.6082 Ops/s 271.1929 Ops/s $\color{#d91a1a}-0.58\%$
test_ppo_speed[False-None] 6.5226ms 5.8844ms 169.9411 Ops/s 171.8638 Ops/s $\color{#d91a1a}-1.12\%$
test_ppo_speed[False-backward] 12.8438ms 12.3913ms 80.7020 Ops/s 80.9857 Ops/s $\color{#d91a1a}-0.35\%$
test_ppo_speed[True-None] 3.8812ms 3.6187ms 276.3418 Ops/s 278.0913 Ops/s $\color{#d91a1a}-0.63\%$
test_ppo_speed[True-backward] 8.5288ms 8.3647ms 119.5495 Ops/s 118.7740 Ops/s $\color{#35bf28}+0.65\%$
test_ppo_speed[reduce-overhead-None] 3.7158ms 3.5650ms 280.5026 Ops/s 280.0398 Ops/s $\color{#35bf28}+0.17\%$
test_reinforce_speed[False-None] 4.9051ms 4.5225ms 221.1188 Ops/s 221.3967 Ops/s $\color{#d91a1a}-0.13\%$
test_reinforce_speed[False-backward] 7.6331ms 7.2951ms 137.0780 Ops/s 137.8708 Ops/s $\color{#d91a1a}-0.58\%$
test_reinforce_speed[True-None] 3.0526ms 2.8518ms 350.6529 Ops/s 338.7216 Ops/s $\color{#35bf28}+3.52\%$
test_reinforce_speed[True-backward] 7.9880ms 7.7826ms 128.4912 Ops/s 122.1860 Ops/s $\textbf{\color{#35bf28}+5.16\%}$
test_reinforce_speed[reduce-overhead-None] 3.0658ms 2.8435ms 351.6756 Ops/s 344.3290 Ops/s $\color{#35bf28}+2.13\%$
test_iql_speed[False-None] 20.6121ms 19.7569ms 50.6153 Ops/s 49.7594 Ops/s $\color{#35bf28}+1.72\%$
test_iql_speed[False-backward] 35.2516ms 30.5010ms 32.7858 Ops/s 32.2292 Ops/s $\color{#35bf28}+1.73\%$
test_iql_speed[True-None] 8.9107ms 8.5476ms 116.9913 Ops/s 116.9933 Ops/s $-0.00\%$
test_iql_speed[True-backward] 17.1021ms 16.6646ms 60.0075 Ops/s 59.8905 Ops/s $\color{#35bf28}+0.20\%$
test_iql_speed[reduce-overhead-None] 8.8051ms 8.5126ms 117.4729 Ops/s 114.6674 Ops/s $\color{#35bf28}+2.45\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.5020ms 5.8802ms 170.0628 Ops/s 168.8470 Ops/s $\color{#35bf28}+0.72\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5361ms 0.3130ms 3.1952 KOps/s 3.1978 KOps/s $\color{#d91a1a}-0.08\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5251ms 0.3279ms 3.0499 KOps/s 3.3378 KOps/s $\textbf{\color{#d91a1a}-8.63\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.7759ms 5.5540ms 180.0489 Ops/s 175.8605 Ops/s $\color{#35bf28}+2.38\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6867ms 0.3314ms 3.0175 KOps/s 3.1701 KOps/s $\color{#d91a1a}-4.81\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5426ms 0.3240ms 3.0864 KOps/s 3.4551 KOps/s $\textbf{\color{#d91a1a}-10.67\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5631ms 1.3327ms 750.3821 Ops/s 748.2673 Ops/s $\color{#35bf28}+0.28\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4773ms 1.2318ms 811.8215 Ops/s 792.4305 Ops/s $\color{#35bf28}+2.45\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.6218ms 5.8699ms 170.3614 Ops/s 169.8888 Ops/s $\color{#35bf28}+0.28\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0356ms 0.4777ms 2.0936 KOps/s 2.1374 KOps/s $\color{#d91a1a}-2.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6851ms 0.4581ms 2.1831 KOps/s 2.4089 KOps/s $\textbf{\color{#d91a1a}-9.37\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7175ms 5.6158ms 178.0687 Ops/s 173.8374 Ops/s $\color{#35bf28}+2.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7024ms 0.3203ms 3.1219 KOps/s 2.7642 KOps/s $\textbf{\color{#35bf28}+12.94\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5118ms 0.3097ms 3.2294 KOps/s 2.8582 KOps/s $\textbf{\color{#35bf28}+12.99\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.7728ms 5.5648ms 179.7023 Ops/s 177.9616 Ops/s $\color{#35bf28}+0.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7417ms 0.3081ms 3.2461 KOps/s 3.0490 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5494ms 0.2981ms 3.3541 KOps/s 3.2818 KOps/s $\color{#35bf28}+2.20\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2963ms 5.7073ms 175.2132 Ops/s 170.6989 Ops/s $\color{#35bf28}+2.64\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.4956ms 0.4738ms 2.1106 KOps/s 2.0465 KOps/s $\color{#35bf28}+3.13\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6314ms 0.4449ms 2.2477 KOps/s 2.0341 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.5727s 16.2752ms 61.4431 Ops/s 202.4618 Ops/s $\textbf{\color{#d91a1a}-69.65\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 8.6229ms 1.8275ms 547.1914 Ops/s 518.9210 Ops/s $\textbf{\color{#35bf28}+5.45\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.9171ms 1.2292ms 813.5657 Ops/s 925.0353 Ops/s $\textbf{\color{#d91a1a}-12.05\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.7004ms 4.8962ms 204.2385 Ops/s 201.8365 Ops/s $\color{#35bf28}+1.19\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.7845ms 1.6834ms 594.0399 Ops/s 577.1695 Ops/s $\color{#35bf28}+2.92\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 0.9694ms 0.8582ms 1.1652 KOps/s 794.7496 Ops/s $\textbf{\color{#35bf28}+46.61\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5082s 15.2356ms 65.6359 Ops/s 58.7044 Ops/s $\textbf{\color{#35bf28}+11.81\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 11.8744ms 2.0203ms 494.9723 Ops/s 536.7543 Ops/s $\textbf{\color{#d91a1a}-7.78\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.0285ms 1.0584ms 944.7806 Ops/s 955.8455 Ops/s $\color{#d91a1a}-1.16\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 36.6127ms 33.4824ms 29.8665 Ops/s 29.5633 Ops/s $\color{#35bf28}+1.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.0122ms 17.4053ms 57.4539 Ops/s 58.1923 Ops/s $\color{#d91a1a}-1.27\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 37.7876ms 34.6819ms 28.8334 Ops/s 28.8369 Ops/s $\color{#d91a1a}-0.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.0181ms 17.5610ms 56.9443 Ops/s 56.2631 Ops/s $\color{#35bf28}+1.21\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 37.3319ms 35.9621ms 27.8070 Ops/s 27.1577 Ops/s $\color{#35bf28}+2.39\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 19.9758ms 18.7628ms 53.2968 Ops/s 52.0499 Ops/s $\color{#35bf28}+2.40\%$
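
The Change column appears to be the relative difference between Ops and Ops on Repo HEAD once both are expressed in the same unit. The sketch below recomputes a few rows under that assumption; the helper names `to_ops_per_s` and `relative_change` are illustrative and not part of the benchmark tooling. Because the inputs are the rounded values printed in the table, results may differ from the reported percentages in the last digit.

```python
# Assumed multipliers for the throughput units used in the table.
UNIT_FACTORS = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_s(value: float, unit: str) -> float:
    """Convert a throughput figure to plain operations per second."""
    return value * UNIT_FACTORS[unit]


def relative_change(ops: float, ops_unit: str, head: float, head_unit: str) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    new = to_ops_per_s(ops, ops_unit)
    base = to_ops_per_s(head, head_unit)
    return (new - base) / base * 100.0


# (name, Ops, unit, Ops on Repo HEAD, unit), copied from the CPU table.
rows = [
    ("test_tensor_to_bytestream_speed[pickle]", 12.7572, "KOps/s", 12.7899, "KOps/s"),
    ("test_dqn_speed[True-backward]", 1.0366, "KOps/s", 922.8514, "Ops/s"),
    (
        "test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]",
        61.4431, "Ops/s", 202.4618, "Ops/s",
    ),
]

for name, ops, ops_unit, head, head_unit in rows:
    print(f"{name}: {relative_change(ops, ops_unit, head, head_unit):+.2f}%")
```

Note the mixed units in the second row: comparing 1.0366 with 922.8514 directly would suggest a collapse in throughput, whereas after conversion it is an improvement of roughly 12%.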

@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 148. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}8$.

Detailed results. Columns, in order: Name, Max, Mean, Ops, Ops on Repo HEAD, Change.
test_tensor_to_bytestream_speed[pickle] 83.0325μs 82.0969μs 12.1807 KOps/s 12.6731 KOps/s $\color{#d91a1a}-3.89\%$
test_tensor_to_bytestream_speed[torch.save] 0.1394ms 0.1370ms 7.2996 KOps/s 7.3855 KOps/s $\color{#d91a1a}-1.16\%$
test_tensor_to_bytestream_speed[untyped_storage] 98.3264ms 98.0389ms 10.2000 Ops/s 10.0754 Ops/s $\color{#35bf28}+1.24\%$
test_tensor_to_bytestream_speed[numpy] 2.3656μs 2.3615μs 423.4584 KOps/s 424.9207 KOps/s $\color{#d91a1a}-0.34\%$
test_tensor_to_bytestream_speed[safetensors] 38.0479μs 37.8619μs 26.4118 KOps/s 27.0731 KOps/s $\color{#d91a1a}-2.44\%$
test_simple 0.8041s 0.7960s 1.2563 Ops/s 1.2675 Ops/s $\color{#d91a1a}-0.89\%$
test_transformed 1.5122s 1.4194s 0.7045 Ops/s 0.7150 Ops/s $\color{#d91a1a}-1.46\%$
test_serial 2.3262s 2.2470s 0.4450 Ops/s 0.4323 Ops/s $\color{#35bf28}+2.96\%$
test_parallel 2.0240s 1.9393s 0.5156 Ops/s 0.5168 Ops/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-True-True-True-True] 0.1956ms 43.3692μs 23.0578 KOps/s 22.8300 KOps/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[True-True-True-True-False] 61.2320μs 24.3025μs 41.1479 KOps/s 40.8817 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[True-True-True-False-True] 60.4220μs 24.5569μs 40.7217 KOps/s 40.6849 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[True-True-True-False-False] 38.9510μs 13.5015μs 74.0661 KOps/s 74.1726 KOps/s $\color{#d91a1a}-0.14\%$
test_step_mdp_speed[True-True-False-True-True] 85.2020μs 47.4357μs 21.0812 KOps/s 21.1438 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[True-True-False-True-False] 61.5020μs 27.3129μs 36.6127 KOps/s 36.5435 KOps/s $\color{#35bf28}+0.19\%$
test_step_mdp_speed[True-True-False-False-True] 0.4524ms 27.7850μs 35.9906 KOps/s 36.0745 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-True-False-False-False] 45.1910μs 16.3174μs 61.2842 KOps/s 61.5395 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[True-False-True-True-True] 89.0820μs 49.3938μs 20.2455 KOps/s 19.6659 KOps/s $\color{#35bf28}+2.95\%$
test_step_mdp_speed[True-False-True-True-False] 66.5420μs 30.1364μs 33.1825 KOps/s 33.2361 KOps/s $\color{#d91a1a}-0.16\%$
test_step_mdp_speed[True-False-True-False-True] 62.5720μs 27.4366μs 36.4477 KOps/s 36.2102 KOps/s $\color{#35bf28}+0.66\%$
test_step_mdp_speed[True-False-True-False-False] 49.9010μs 16.4437μs 60.8134 KOps/s 61.6872 KOps/s $\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-False-False-True-True] 94.9520μs 52.5101μs 19.0440 KOps/s 19.0627 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-False-False-True-False] 71.5110μs 32.3985μs 30.8657 KOps/s 30.5393 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[True-False-False-False-True] 72.7420μs 29.8641μs 33.4850 KOps/s 33.1282 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[True-False-False-False-False] 49.5510μs 18.7594μs 53.3067 KOps/s 52.7993 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[False-True-True-True-True] 93.3520μs 50.0304μs 19.9878 KOps/s 20.2848 KOps/s $\color{#d91a1a}-1.46\%$
test_step_mdp_speed[False-True-True-True-False] 84.4930μs 29.9946μs 33.3393 KOps/s 33.4389 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[False-True-True-False-True] 77.4020μs 31.7037μs 31.5421 KOps/s 31.6146 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-True-True-False-False] 51.1810μs 18.0262μs 55.4748 KOps/s 55.5547 KOps/s $\color{#d91a1a}-0.14\%$
test_step_mdp_speed[False-True-False-True-True] 2.7609ms 53.0752μs 18.8412 KOps/s 18.9105 KOps/s $\color{#d91a1a}-0.37\%$
test_step_mdp_speed[False-True-False-True-False] 73.7320μs 33.1765μs 30.1418 KOps/s 30.4146 KOps/s $\color{#d91a1a}-0.90\%$
test_step_mdp_speed[False-True-False-False-True] 70.2120μs 33.5305μs 29.8236 KOps/s 29.8937 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-True-False-False-False] 53.3420μs 20.5375μs 48.6914 KOps/s 48.7118 KOps/s $\color{#d91a1a}-0.04\%$
test_step_mdp_speed[False-False-True-True-True] 0.1015ms 55.2817μs 18.0892 KOps/s 18.0656 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-False-True-True-False] 69.5610μs 35.1649μs 28.4375 KOps/s 27.7447 KOps/s $\color{#35bf28}+2.50\%$
test_step_mdp_speed[False-False-True-False-True] 70.7820μs 33.8317μs 29.5581 KOps/s 29.7201 KOps/s $\color{#d91a1a}-0.55\%$
test_step_mdp_speed[False-False-True-False-False] 63.1020μs 20.7141μs 48.2763 KOps/s 49.3581 KOps/s $\color{#d91a1a}-2.19\%$
test_step_mdp_speed[False-False-False-True-True] 92.3530μs 56.3600μs 17.7431 KOps/s 17.9291 KOps/s $\color{#d91a1a}-1.04\%$
test_step_mdp_speed[False-False-False-True-False] 77.6920μs 37.8961μs 26.3879 KOps/s 26.3026 KOps/s $\color{#35bf28}+0.32\%$
test_step_mdp_speed[False-False-False-False-True] 77.1220μs 36.1631μs 27.6525 KOps/s 28.2189 KOps/s $\color{#d91a1a}-2.01\%$
test_step_mdp_speed[False-False-False-False-False] 58.0910μs 23.0341μs 43.4139 KOps/s 43.0368 KOps/s $\color{#35bf28}+0.88\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8433s 0.7457s 1.3411 Ops/s 1.3457 Ops/s $\color{#d91a1a}-0.35\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7128s 0.6165s 1.6220 Ops/s 1.6308 Ops/s $\color{#d91a1a}-0.54\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7012s 1.6290s 0.6139 Ops/s 0.6182 Ops/s $\color{#d91a1a}-0.71\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4861s 1.4141s 0.7072 Ops/s 0.7130 Ops/s $\color{#d91a1a}-0.82\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9425s 1.8676s 0.5355 Ops/s 0.5362 Ops/s $\color{#d91a1a}-0.14\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7242s 1.6503s 0.6059 Ops/s 0.6082 Ops/s $\color{#d91a1a}-0.36\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7693s 4.5878s 0.2180 Ops/s 0.2186 Ops/s $\color{#d91a1a}-0.29\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6109s 4.3991s 0.2273 Ops/s 0.2303 Ops/s $\color{#d91a1a}-1.31\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0175s 1.9194s 0.5210 Ops/s 0.5125 Ops/s $\color{#35bf28}+1.65\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7370s 1.6452s 0.6078 Ops/s 0.6158 Ops/s $\color{#d91a1a}-1.29\%$
test_values[generalized_advantage_estimate-True-True] 19.9749ms 19.3409ms 51.7040 Ops/s 52.0872 Ops/s $\color{#d91a1a}-0.74\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1451s 3.8082ms 262.5938 Ops/s 263.4353 Ops/s $\color{#d91a1a}-0.32\%$
test_values[td0_return_estimate-False-False] 0.1063ms 81.1272μs 12.3263 KOps/s 12.4780 KOps/s $\color{#d91a1a}-1.22\%$
test_values[td1_return_estimate-False-False] 48.1218ms 46.4836ms 21.5130 Ops/s 21.5114 Ops/s $+0.01\%$
test_values[vec_td1_return_estimate-False-False] 1.2912ms 1.0673ms 936.9345 Ops/s 940.0740 Ops/s $\color{#d91a1a}-0.33\%$
test_values[td_lambda_return_estimate-True-False] 79.1476ms 76.3640ms 13.0952 Ops/s 13.2490 Ops/s $\color{#d91a1a}-1.16\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.2308ms 1.0576ms 945.5093 Ops/s 945.2617 Ops/s $\color{#35bf28}+0.03\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 20.0343ms 19.5702ms 51.0980 Ops/s 51.1526 Ops/s $\color{#d91a1a}-0.11\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9981ms 0.7361ms 1.3584 KOps/s 1.3773 KOps/s $\color{#d91a1a}-1.37\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7306ms 0.6529ms 1.5316 KOps/s 1.5282 KOps/s $\color{#35bf28}+0.22\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5145ms 1.4660ms 682.1065 Ops/s 683.6231 Ops/s $\color{#d91a1a}-0.22\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7314ms 0.6692ms 1.4944 KOps/s 1.4943 KOps/s $+0.01\%$
test_dqn_speed[False-None] 1.5892ms 1.4793ms 676.0129 Ops/s 671.9114 Ops/s $\color{#35bf28}+0.61\%$
test_dqn_speed[False-backward] 2.1882ms 2.0922ms 477.9611 Ops/s 474.6568 Ops/s $\color{#35bf28}+0.70\%$
test_dqn_speed[True-None] 0.6646ms 0.5611ms 1.7823 KOps/s 1.7803 KOps/s $\color{#35bf28}+0.11\%$
test_dqn_speed[True-backward] 1.2312ms 1.1984ms 834.4389 Ops/s 842.3035 Ops/s $\color{#d91a1a}-0.93\%$
test_dqn_speed[reduce-overhead-None] 0.6686ms 0.6002ms 1.6662 KOps/s 1.6693 KOps/s $\color{#d91a1a}-0.19\%$
test_ddpg_speed[False-None] 3.1758ms 2.8125ms 355.5525 Ops/s 355.3379 Ops/s $\color{#35bf28}+0.06\%$
test_ddpg_speed[False-backward] 4.4361ms 4.1305ms 242.0988 Ops/s 239.0913 Ops/s $\color{#35bf28}+1.26\%$
test_ddpg_speed[True-None] 1.4887ms 1.3222ms 756.3248 Ops/s 757.7114 Ops/s $\color{#d91a1a}-0.18\%$
test_ddpg_speed[True-backward] 2.6464ms 2.5236ms 396.2654 Ops/s 398.0890 Ops/s $\color{#d91a1a}-0.46\%$
test_ddpg_speed[reduce-overhead-None] 1.4504ms 1.3430ms 744.5878 Ops/s 740.7088 Ops/s $\color{#35bf28}+0.52\%$
test_sac_speed[False-None] 8.6759ms 8.0944ms 123.5422 Ops/s 122.8273 Ops/s $\color{#35bf28}+0.58\%$
test_sac_speed[False-backward] 11.7083ms 11.2345ms 89.0116 Ops/s 88.3556 Ops/s $\color{#35bf28}+0.74\%$
test_sac_speed[True-None] 1.9774ms 1.8387ms 543.8596 Ops/s 545.5991 Ops/s $\color{#d91a1a}-0.32\%$
test_sac_speed[True-backward] 3.8121ms 3.6148ms 276.6376 Ops/s 274.7630 Ops/s $\color{#35bf28}+0.68\%$
test_sac_speed[reduce-overhead-None] 18.2361ms 10.5248ms 95.0141 Ops/s 96.7909 Ops/s $\color{#d91a1a}-1.84\%$
test_redq_deprec_speed[False-None] 9.8145ms 9.3058ms 107.4597 Ops/s 109.1907 Ops/s $\color{#d91a1a}-1.59\%$
test_redq_deprec_speed[False-backward] 12.7681ms 12.3548ms 80.9400 Ops/s 80.8020 Ops/s $\color{#35bf28}+0.17\%$
test_redq_deprec_speed[True-None] 2.6594ms 2.5369ms 394.1754 Ops/s 395.8549 Ops/s $\color{#d91a1a}-0.42\%$
test_redq_deprec_speed[True-backward] 4.6756ms 4.3001ms 232.5550 Ops/s 229.2149 Ops/s $\color{#35bf28}+1.46\%$
test_redq_deprec_speed[reduce-overhead-None] 15.3100ms 9.4906ms 105.3679 Ops/s 89.3413 Ops/s $\textbf{\color{#35bf28}+17.94\%}$
test_td3_speed[False-None] 8.1062ms 7.9692ms 125.4834 Ops/s 124.0882 Ops/s $\color{#35bf28}+1.12\%$
test_td3_speed[False-backward] 10.8745ms 10.4267ms 95.9076 Ops/s 93.6889 Ops/s $\color{#35bf28}+2.37\%$
test_td3_speed[True-None] 1.7726ms 1.6954ms 589.8403 Ops/s 600.4412 Ops/s $\color{#d91a1a}-1.77\%$
test_td3_speed[True-backward] 3.4068ms 3.2940ms 303.5792 Ops/s 301.2653 Ops/s $\color{#35bf28}+0.77\%$
test_td3_speed[reduce-overhead-None] 54.1838ms 23.3875ms 42.7580 Ops/s 42.5285 Ops/s $\color{#35bf28}+0.54\%$
test_cql_speed[False-None] 17.6209ms 16.8732ms 59.2656 Ops/s 59.4103 Ops/s $\color{#d91a1a}-0.24\%$
test_cql_speed[False-backward] 22.6659ms 21.8898ms 45.6834 Ops/s 44.8188 Ops/s $\color{#35bf28}+1.93\%$
test_cql_speed[True-None] 3.4513ms 3.3462ms 298.8450 Ops/s 283.0212 Ops/s $\textbf{\color{#35bf28}+5.59\%}$
test_cql_speed[True-backward] 6.0451ms 5.6294ms 177.6385 Ops/s 175.1251 Ops/s $\color{#35bf28}+1.44\%$
test_cql_speed[reduce-overhead-None] 18.3345ms 11.5748ms 86.3944 Ops/s 88.7103 Ops/s $\color{#d91a1a}-2.61\%$
test_a2c_speed[False-None] 3.9374ms 3.1446ms 318.0013 Ops/s 314.3781 Ops/s $\color{#35bf28}+1.15\%$
test_a2c_speed[False-backward] 6.5960ms 6.1891ms 161.5745 Ops/s 161.5781 Ops/s $-0.00\%$
test_a2c_speed[True-None] 1.5561ms 1.3291ms 752.3610 Ops/s 753.6383 Ops/s $\color{#d91a1a}-0.17\%$
test_a2c_speed[True-backward] 3.1884ms 3.1059ms 321.9707 Ops/s 320.1535 Ops/s $\color{#35bf28}+0.57\%$
test_a2c_speed[reduce-overhead-None] 1.0618ms 0.9350ms 1.0695 KOps/s 1.0684 KOps/s $\color{#35bf28}+0.10\%$
test_ppo_speed[False-None] 3.7905ms 3.6832ms 271.5058 Ops/s 269.9444 Ops/s $\color{#35bf28}+0.58\%$
test_ppo_speed[False-backward] 7.3068ms 6.9100ms 144.7188 Ops/s 145.0350 Ops/s $\color{#d91a1a}-0.22\%$
test_ppo_speed[True-None] 1.7512ms 1.3851ms 721.9510 Ops/s 696.0739 Ops/s $\color{#35bf28}+3.72\%$
test_ppo_speed[True-backward] 3.2232ms 3.1286ms 319.6356 Ops/s 314.1696 Ops/s $\color{#35bf28}+1.74\%$
test_ppo_speed[reduce-overhead-None] 1.0749ms 0.9992ms 1.0008 KOps/s 983.0878 Ops/s $\color{#35bf28}+1.81\%$
test_reinforce_speed[False-None] 2.3341ms 2.2112ms 452.2369 Ops/s 451.4724 Ops/s $\color{#35bf28}+0.17\%$
test_reinforce_speed[False-backward] 3.2371ms 3.1740ms 315.0626 Ops/s 297.5054 Ops/s $\textbf{\color{#35bf28}+5.90\%}$
test_reinforce_speed[True-None] 1.3633ms 1.2554ms 796.5667 Ops/s 772.5642 Ops/s $\color{#35bf28}+3.11\%$
test_reinforce_speed[True-backward] 3.0769ms 2.9510ms 338.8719 Ops/s 320.1627 Ops/s $\textbf{\color{#35bf28}+5.84\%}$
test_reinforce_speed[reduce-overhead-None] 0.4610s 10.0107ms 99.8928 Ops/s 96.0405 Ops/s $\color{#35bf28}+4.01\%$
test_iql_speed[False-None] 9.8191ms 9.1834ms 108.8920 Ops/s 108.4149 Ops/s $\color{#35bf28}+0.44\%$
test_iql_speed[False-backward] 13.1555ms 12.7067ms 78.6984 Ops/s 75.6513 Ops/s $\color{#35bf28}+4.03\%$
test_iql_speed[True-None] 2.3161ms 2.2065ms 453.2062 Ops/s 448.7361 Ops/s $\color{#35bf28}+1.00\%$
test_iql_speed[True-backward] 4.9376ms 4.7875ms 208.8794 Ops/s 199.7905 Ops/s $\color{#35bf28}+4.55\%$
test_iql_speed[reduce-overhead-None] 0.7046s 12.2426ms 81.6820 Ops/s 100.4684 Ops/s $\textbf{\color{#d91a1a}-18.70\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1898ms 5.8201ms 171.8170 Ops/s 170.3949 Ops/s $\color{#35bf28}+0.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6975ms 0.3446ms 2.9017 KOps/s 3.3911 KOps/s $\textbf{\color{#d91a1a}-14.43\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7005ms 0.2552ms 3.9187 KOps/s 3.2355 KOps/s $\textbf{\color{#35bf28}+21.12\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9664ms 5.6266ms 177.7261 Ops/s 179.5062 Ops/s $\color{#d91a1a}-0.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5455ms 0.2954ms 3.3855 KOps/s 3.6921 KOps/s $\textbf{\color{#d91a1a}-8.30\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5710ms 0.2971ms 3.3664 KOps/s 3.0746 KOps/s $\textbf{\color{#35bf28}+9.49\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4900ms 1.2562ms 796.0234 Ops/s 831.0067 Ops/s $\color{#d91a1a}-4.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4359ms 1.1940ms 837.5069 Ops/s 774.2491 Ops/s $\textbf{\color{#35bf28}+8.17\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.1935ms 5.9116ms 169.1602 Ops/s 174.4881 Ops/s $\color{#d91a1a}-3.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2241ms 0.4795ms 2.0856 KOps/s 2.3579 KOps/s $\textbf{\color{#d91a1a}-11.55\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6653ms 0.3995ms 2.5034 KOps/s 2.2221 KOps/s $\textbf{\color{#35bf28}+12.66\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7803ms 5.6032ms 178.4684 Ops/s 177.3833 Ops/s $\color{#35bf28}+0.61\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9976ms 0.2734ms 3.6570 KOps/s 3.5672 KOps/s $\color{#35bf28}+2.52\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4698ms 0.2582ms 3.8723 KOps/s 3.1953 KOps/s $\textbf{\color{#35bf28}+21.19\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9952ms 5.6150ms 178.0956 Ops/s 178.4091 Ops/s $\color{#d91a1a}-0.18\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9476ms 0.3844ms 2.6011 KOps/s 3.6656 KOps/s $\textbf{\color{#d91a1a}-29.04\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6190ms 0.3711ms 2.6949 KOps/s 3.4868 KOps/s $\textbf{\color{#d91a1a}-22.71\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9047ms 5.7646ms 173.4731 Ops/s 175.0079 Ops/s $\color{#d91a1a}-0.88\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9118ms 0.4434ms 2.2555 KOps/s 2.2741 KOps/s $\color{#d91a1a}-0.82\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5911ms 0.4009ms 2.4943 KOps/s 2.4717 KOps/s $\color{#35bf28}+0.91\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3265ms 4.8997ms 204.0931 Ops/s 49.6041 Ops/s $\textbf{\color{#35bf28}+311.44\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.2866ms 2.0228ms 494.3526 Ops/s 512.1018 Ops/s $\color{#d91a1a}-3.47\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1393ms 0.9035ms 1.1068 KOps/s 781.4756 Ops/s $\textbf{\color{#35bf28}+41.63\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5834s 16.5901ms 60.2771 Ops/s 201.5808 Ops/s $\textbf{\color{#d91a1a}-70.10\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9346ms 1.7663ms 566.1553 Ops/s 543.6343 Ops/s $\color{#35bf28}+4.14\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1526ms 0.8944ms 1.1181 KOps/s 779.3225 Ops/s $\textbf{\color{#35bf28}+43.47\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.3644ms 5.1930ms 192.5665 Ops/s 196.0629 Ops/s $\color{#d91a1a}-1.78\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 13.0715ms 1.9877ms 503.0825 Ops/s 518.8001 Ops/s $\color{#d91a1a}-3.03\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.4003ms 1.2190ms 820.3607 Ops/s 894.4131 Ops/s $\textbf{\color{#d91a1a}-8.28\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 37.3049ms 33.5816ms 29.7782 Ops/s 29.0069 Ops/s $\color{#35bf28}+2.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.3616ms 17.5493ms 56.9824 Ops/s 57.3557 Ops/s $\color{#d91a1a}-0.65\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 38.2970ms 34.7771ms 28.7545 Ops/s 29.0290 Ops/s $\color{#d91a1a}-0.95\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.0766ms 17.4496ms 57.3080 Ops/s 56.3648 Ops/s $\color{#35bf28}+1.67\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 37.9983ms 36.4533ms 27.4324 Ops/s 27.5454 Ops/s $\color{#d91a1a}-0.41\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 19.9540ms 18.8533ms 53.0410 Ops/s 53.4229 Ops/s $\color{#d91a1a}-0.71\%$

@vmoens vmoens merged commit 0982fcf into main Jan 19, 2026
105 of 109 checks passed
@vmoens vmoens deleted the fix-gpu-bench branch January 19, 2026 16:18