
[Performance] Add _skip_maybe_reset flag to bypass auto-reset in step_and_maybe_reset #3560

Open
vmoens wants to merge 1 commit into gh/vmoens/241/base from gh/vmoens/241/head


Conversation

@vmoens
Collaborator

@vmoens vmoens commented Mar 23, 2026

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3560

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit aba1d20 with merge base 0a1aea6:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.9848μs 81.5420μs 12.2636 KOps/s 12.4927 KOps/s $\color{#d91a1a}-1.83\%$
test_tensor_to_bytestream_speed[torch.save] 0.1401ms 0.1391ms 7.1916 KOps/s 6.9324 KOps/s $\color{#35bf28}+3.74\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1233s 0.1231s 8.1217 Ops/s 8.1619 Ops/s $\color{#d91a1a}-0.49\%$
test_tensor_to_bytestream_speed[numpy] 2.4780μs 2.4713μs 404.6414 KOps/s 395.0210 KOps/s $\color{#35bf28}+2.44\%$
test_tensor_to_bytestream_speed[safetensors] 38.8348μs 38.6497μs 25.8734 KOps/s 25.9372 KOps/s $\color{#d91a1a}-0.25\%$
test_simple 0.5560s 0.5514s 1.8135 Ops/s 1.7220 Ops/s $\textbf{\color{#35bf28}+5.31\%}$
test_transformed 1.0960s 1.0943s 0.9138 Ops/s 0.8940 Ops/s $\color{#35bf28}+2.21\%$
test_serial 1.6881s 1.6840s 0.5938 Ops/s 0.5860 Ops/s $\color{#35bf28}+1.33\%$
test_parallel 1.1570s 1.0573s 0.9458 Ops/s 0.9508 Ops/s $\color{#d91a1a}-0.52\%$
test_step_mdp_speed[True-True-True-True-True] 0.1668ms 41.0679μs 24.3499 KOps/s 23.8629 KOps/s $\color{#35bf28}+2.04\%$
test_step_mdp_speed[True-True-True-True-False] 54.1110μs 23.6761μs 42.2368 KOps/s 43.3213 KOps/s $\color{#d91a1a}-2.50\%$
test_step_mdp_speed[True-True-True-False-True] 94.3420μs 23.9637μs 41.7299 KOps/s 42.6630 KOps/s $\color{#d91a1a}-2.19\%$
test_step_mdp_speed[True-True-True-False-False] 39.8210μs 13.1404μs 76.1014 KOps/s 78.7653 KOps/s $\color{#d91a1a}-3.38\%$
test_step_mdp_speed[True-True-False-True-True] 0.1179ms 46.0142μs 21.7324 KOps/s 22.8084 KOps/s $\color{#d91a1a}-4.72\%$
test_step_mdp_speed[True-True-False-True-False] 53.5110μs 26.3198μs 37.9942 KOps/s 39.7702 KOps/s $\color{#d91a1a}-4.47\%$
test_step_mdp_speed[True-True-False-False-True] 62.3820μs 26.7492μs 37.3842 KOps/s 37.9795 KOps/s $\color{#d91a1a}-1.57\%$
test_step_mdp_speed[True-True-False-False-False] 57.7810μs 15.9600μs 62.6567 KOps/s 65.0950 KOps/s $\color{#d91a1a}-3.75\%$
test_step_mdp_speed[True-False-True-True-True] 95.5320μs 48.7684μs 20.5051 KOps/s 21.5345 KOps/s $\color{#d91a1a}-4.78\%$
test_step_mdp_speed[True-False-True-True-False] 61.7310μs 29.3040μs 34.1250 KOps/s 35.6445 KOps/s $\color{#d91a1a}-4.26\%$
test_step_mdp_speed[True-False-True-False-True] 95.3920μs 26.9025μs 37.1713 KOps/s 38.8712 KOps/s $\color{#d91a1a}-4.37\%$
test_step_mdp_speed[True-False-True-False-False] 42.1710μs 15.8677μs 63.0209 KOps/s 64.8986 KOps/s $\color{#d91a1a}-2.89\%$
test_step_mdp_speed[True-False-False-True-True] 82.1910μs 51.0036μs 19.6065 KOps/s 20.3497 KOps/s $\color{#d91a1a}-3.65\%$
test_step_mdp_speed[True-False-False-True-False] 78.9010μs 32.0297μs 31.2211 KOps/s 32.7587 KOps/s $\color{#d91a1a}-4.69\%$
test_step_mdp_speed[True-False-False-False-True] 84.6110μs 29.4429μs 33.9640 KOps/s 35.5852 KOps/s $\color{#d91a1a}-4.56\%$
test_step_mdp_speed[True-False-False-False-False] 95.4810μs 18.5197μs 53.9964 KOps/s 55.7958 KOps/s $\color{#d91a1a}-3.22\%$
test_step_mdp_speed[False-True-True-True-True] 0.1045ms 48.0179μs 20.8256 KOps/s 21.4973 KOps/s $\color{#d91a1a}-3.12\%$
test_step_mdp_speed[False-True-True-True-False] 57.4510μs 29.1167μs 34.3445 KOps/s 35.8828 KOps/s $\color{#d91a1a}-4.29\%$
test_step_mdp_speed[False-True-True-False-True] 2.5584ms 31.5330μs 31.7128 KOps/s 33.7982 KOps/s $\textbf{\color{#d91a1a}-6.17\%}$
test_step_mdp_speed[False-True-True-False-False] 50.2210μs 17.9429μs 55.7323 KOps/s 58.5374 KOps/s $\color{#d91a1a}-4.79\%$
test_step_mdp_speed[False-True-False-True-True] 80.7520μs 50.9024μs 19.6454 KOps/s 21.1105 KOps/s $\textbf{\color{#d91a1a}-6.94\%}$
test_step_mdp_speed[False-True-False-True-False] 0.1074ms 32.0556μs 31.1958 KOps/s 33.0032 KOps/s $\textbf{\color{#d91a1a}-5.48\%}$
test_step_mdp_speed[False-True-False-False-True] 57.4910μs 34.0309μs 29.3851 KOps/s 31.7887 KOps/s $\textbf{\color{#d91a1a}-7.56\%}$
test_step_mdp_speed[False-True-False-False-False] 59.4410μs 20.1981μs 49.5096 KOps/s 51.3076 KOps/s $\color{#d91a1a}-3.50\%$
test_step_mdp_speed[False-False-True-True-True] 93.7020μs 53.8050μs 18.5856 KOps/s 19.3341 KOps/s $\color{#d91a1a}-3.87\%$
test_step_mdp_speed[False-False-True-True-False] 63.5110μs 34.8426μs 28.7005 KOps/s 29.8742 KOps/s $\color{#d91a1a}-3.93\%$
test_step_mdp_speed[False-False-True-False-True] 0.1137ms 33.4293μs 29.9139 KOps/s 31.5456 KOps/s $\textbf{\color{#d91a1a}-5.17\%}$
test_step_mdp_speed[False-False-True-False-False] 50.1910μs 20.1999μs 49.5053 KOps/s 50.9637 KOps/s $\color{#d91a1a}-2.86\%$
test_step_mdp_speed[False-False-False-True-True] 89.5210μs 56.3065μs 17.7600 KOps/s 18.6632 KOps/s $\color{#d91a1a}-4.84\%$
test_step_mdp_speed[False-False-False-True-False] 68.1710μs 37.4180μs 26.7251 KOps/s 28.0033 KOps/s $\color{#d91a1a}-4.56\%$
test_step_mdp_speed[False-False-False-False-True] 69.8510μs 35.5055μs 28.1647 KOps/s 29.3303 KOps/s $\color{#d91a1a}-3.97\%$
test_step_mdp_speed[False-False-False-False-False] 96.7010μs 23.1067μs 43.2774 KOps/s 46.0760 KOps/s $\textbf{\color{#d91a1a}-6.07\%}$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7144s 0.7113s 1.4059 Ops/s 1.3474 Ops/s $\color{#35bf28}+4.34\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7111s 0.6031s 1.6580 Ops/s 1.6475 Ops/s $\color{#35bf28}+0.64\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7266s 1.6274s 0.6145 Ops/s 0.6085 Ops/s $\color{#35bf28}+0.98\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4951s 1.4064s 0.7110 Ops/s 0.7013 Ops/s $\color{#35bf28}+1.38\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9670s 1.8779s 0.5325 Ops/s 0.5291 Ops/s $\color{#35bf28}+0.64\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7526s 1.6618s 0.6018 Ops/s 0.6014 Ops/s $\color{#35bf28}+0.05\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6615s 4.5738s 0.2186 Ops/s 0.2194 Ops/s $\color{#d91a1a}-0.33\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.3737s 4.2911s 0.2330 Ops/s 0.2279 Ops/s $\color{#35bf28}+2.24\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9439s 1.8727s 0.5340 Ops/s 0.5275 Ops/s $\color{#35bf28}+1.22\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.8007s 1.6373s 0.6108 Ops/s 0.6186 Ops/s $\color{#d91a1a}-1.27\%$
test_values[generalized_advantage_estimate-True-True] 10.4674ms 10.3071ms 97.0201 Ops/s 96.5607 Ops/s $\color{#35bf28}+0.48\%$
test_values[vec_generalized_advantage_estimate-True-True] 18.9397ms 17.9421ms 55.7350 Ops/s 55.2761 Ops/s $\color{#35bf28}+0.83\%$
test_values[td0_return_estimate-False-False] 0.2141ms 0.1258ms 7.9482 KOps/s 7.6070 KOps/s $\color{#35bf28}+4.49\%$
test_values[td1_return_estimate-False-False] 27.7205ms 27.3082ms 36.6190 Ops/s 36.0681 Ops/s $\color{#35bf28}+1.53\%$
test_values[vec_td1_return_estimate-False-False] 18.8061ms 18.2135ms 54.9044 Ops/s 54.0437 Ops/s $\color{#35bf28}+1.59\%$
test_values[td_lambda_return_estimate-True-False] 40.9556ms 40.7534ms 24.5378 Ops/s 24.3324 Ops/s $\color{#35bf28}+0.84\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.6741ms 18.2117ms 54.9096 Ops/s 54.1366 Ops/s $\color{#35bf28}+1.43\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.3428ms 9.2251ms 108.3999 Ops/s 109.8685 Ops/s $\color{#d91a1a}-1.34\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7454ms 1.5637ms 639.5011 Ops/s 626.1433 Ops/s $\color{#35bf28}+2.13\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5438ms 0.4233ms 2.3624 KOps/s 2.3415 KOps/s $\color{#35bf28}+0.89\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 30.4127ms 29.8131ms 33.5423 Ops/s 28.3638 Ops/s $\textbf{\color{#35bf28}+18.26\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9326ms 1.7932ms 557.6712 Ops/s 554.1874 Ops/s $\color{#35bf28}+0.63\%$
test_dqn_speed[False-None] 1.5829ms 1.4369ms 695.9655 Ops/s 701.3209 Ops/s $\color{#d91a1a}-0.76\%$
test_dqn_speed[False-backward] 2.0148ms 1.9523ms 512.2151 Ops/s 515.0989 Ops/s $\color{#d91a1a}-0.56\%$
test_dqn_speed[True-None] 0.7014ms 0.5490ms 1.8216 KOps/s 1.7412 KOps/s $\color{#35bf28}+4.61\%$
test_dqn_speed[True-backward] 1.0732ms 1.0171ms 983.1785 Ops/s 957.2890 Ops/s $\color{#35bf28}+2.70\%$
test_dqn_speed[reduce-overhead-None] 0.9403ms 0.5474ms 1.8268 KOps/s 1.7879 KOps/s $\color{#35bf28}+2.18\%$
test_ddpg_speed[False-None] 3.3145ms 2.9062ms 344.0876 Ops/s 348.9422 Ops/s $\color{#d91a1a}-1.39\%$
test_ddpg_speed[False-backward] 4.2374ms 4.1593ms 240.4271 Ops/s 243.1088 Ops/s $\color{#d91a1a}-1.10\%$
test_ddpg_speed[True-None] 1.8498ms 1.4542ms 687.6658 Ops/s 680.0374 Ops/s $\color{#35bf28}+1.12\%$
test_ddpg_speed[True-backward] 2.5453ms 2.4742ms 404.1640 Ops/s 391.5942 Ops/s $\color{#35bf28}+3.21\%$
test_ddpg_speed[reduce-overhead-None] 1.6135ms 1.4659ms 682.1592 Ops/s 685.5397 Ops/s $\color{#d91a1a}-0.49\%$
test_sac_speed[False-None] 8.9094ms 8.2628ms 121.0237 Ops/s 121.8032 Ops/s $\color{#d91a1a}-0.64\%$
test_sac_speed[False-backward] 11.8245ms 11.5578ms 86.5213 Ops/s 86.8328 Ops/s $\color{#d91a1a}-0.36\%$
test_sac_speed[True-None] 2.4471ms 2.2232ms 449.8012 Ops/s 445.8017 Ops/s $\color{#35bf28}+0.90\%$
test_sac_speed[True-backward] 4.7009ms 4.1856ms 238.9125 Ops/s 214.7470 Ops/s $\textbf{\color{#35bf28}+11.25\%}$
test_sac_speed[reduce-overhead-None] 2.6367ms 2.2177ms 450.9208 Ops/s 442.5775 Ops/s $\color{#35bf28}+1.89\%$
test_redq_speed[False-None] 11.6933ms 10.8466ms 92.1948 Ops/s 90.4609 Ops/s $\color{#35bf28}+1.92\%$
test_redq_speed[False-backward] 21.3497ms 18.7420ms 53.3561 Ops/s 53.5300 Ops/s $\color{#d91a1a}-0.32\%$
test_redq_speed[True-None] 5.0817ms 4.5852ms 218.0933 Ops/s 212.1106 Ops/s $\color{#35bf28}+2.82\%$
test_redq_speed[reduce-overhead-None] 4.9035ms 4.5527ms 219.6493 Ops/s 221.0881 Ops/s $\color{#d91a1a}-0.65\%$
test_redq_deprec_speed[False-None] 11.9812ms 11.4227ms 87.5447 Ops/s 85.8973 Ops/s $\color{#35bf28}+1.92\%$
test_redq_deprec_speed[False-backward] 17.8752ms 16.3492ms 61.1650 Ops/s 59.5804 Ops/s $\color{#35bf28}+2.66\%$
test_redq_deprec_speed[True-None] 4.0977ms 3.7510ms 266.5939 Ops/s 263.4999 Ops/s $\color{#35bf28}+1.17\%$
test_redq_deprec_speed[True-backward] 8.0061ms 7.7145ms 129.6265 Ops/s 123.9994 Ops/s $\color{#35bf28}+4.54\%$
test_redq_deprec_speed[reduce-overhead-None] 4.1342ms 3.7016ms 270.1550 Ops/s 260.3523 Ops/s $\color{#35bf28}+3.77\%$
test_td3_speed[False-None] 8.3259ms 8.2243ms 121.5902 Ops/s 121.1529 Ops/s $\color{#35bf28}+0.36\%$
test_td3_speed[False-backward] 11.3617ms 11.1098ms 90.0105 Ops/s 90.2568 Ops/s $\color{#d91a1a}-0.27\%$
test_td3_speed[True-None] 1.9236ms 1.8802ms 531.8462 Ops/s 532.5908 Ops/s $\color{#d91a1a}-0.14\%$
test_td3_speed[True-backward] 3.8861ms 3.6760ms 272.0321 Ops/s 272.0295 Ops/s $+0.00\%$
test_td3_speed[reduce-overhead-None] 1.8975ms 1.8453ms 541.9123 Ops/s 539.6771 Ops/s $\color{#35bf28}+0.41\%$
test_cql_speed[False-None] 31.9625ms 27.5571ms 36.2883 Ops/s 37.1357 Ops/s $\color{#d91a1a}-2.28\%$
test_cql_speed[False-backward] 37.6211ms 36.6963ms 27.2507 Ops/s 27.1589 Ops/s $\color{#35bf28}+0.34\%$
test_cql_speed[True-None] 13.6737ms 13.1021ms 76.3234 Ops/s 77.6418 Ops/s $\color{#d91a1a}-1.70\%$
test_cql_speed[True-backward] 18.9705ms 18.6165ms 53.7157 Ops/s 55.2597 Ops/s $\color{#d91a1a}-2.79\%$
test_cql_speed[reduce-overhead-None] 17.0186ms 13.0982ms 76.3463 Ops/s 77.5513 Ops/s $\color{#d91a1a}-1.55\%$
test_a2c_speed[False-None] 6.0164ms 5.6227ms 177.8504 Ops/s 178.3288 Ops/s $\color{#d91a1a}-0.27\%$
test_a2c_speed[False-backward] 12.8185ms 12.2104ms 81.8973 Ops/s 81.9290 Ops/s $\color{#d91a1a}-0.04\%$
test_a2c_speed[True-None] 4.4442ms 3.9079ms 255.8922 Ops/s 259.1394 Ops/s $\color{#d91a1a}-1.25\%$
test_a2c_speed[True-backward] 8.9413ms 8.7363ms 114.4655 Ops/s 113.7364 Ops/s $\color{#35bf28}+0.64\%$
test_a2c_speed[reduce-overhead-None] 4.2894ms 3.8950ms 256.7398 Ops/s 256.7407 Ops/s $-0.00\%$
test_ppo_speed[False-None] 6.4214ms 6.0484ms 165.3331 Ops/s 161.2031 Ops/s $\color{#35bf28}+2.56\%$
test_ppo_speed[False-backward] 13.2319ms 12.9075ms 77.4745 Ops/s 75.9554 Ops/s $\color{#35bf28}+2.00\%$
test_ppo_speed[True-None] 3.9904ms 3.8168ms 262.0019 Ops/s 260.6291 Ops/s $\color{#35bf28}+0.53\%$
test_ppo_speed[True-backward] 8.9585ms 8.7252ms 114.6099 Ops/s 110.8398 Ops/s $\color{#35bf28}+3.40\%$
test_ppo_speed[reduce-overhead-None] 4.0682ms 3.7828ms 264.3517 Ops/s 264.0851 Ops/s $\color{#35bf28}+0.10\%$
test_reinforce_speed[False-None] 4.9789ms 4.7155ms 212.0666 Ops/s 214.8739 Ops/s $\color{#d91a1a}-1.31\%$
test_reinforce_speed[False-backward] 7.9383ms 7.6556ms 130.6226 Ops/s 131.7556 Ops/s $\color{#d91a1a}-0.86\%$
test_reinforce_speed[True-None] 3.4493ms 3.0200ms 331.1229 Ops/s 333.7912 Ops/s $\color{#d91a1a}-0.80\%$
test_reinforce_speed[True-backward] 8.1997ms 7.9728ms 125.4271 Ops/s 119.2332 Ops/s $\textbf{\color{#35bf28}+5.19\%}$
test_reinforce_speed[reduce-overhead-None] 3.2070ms 2.9943ms 333.9625 Ops/s 333.6264 Ops/s $\color{#35bf28}+0.10\%$
test_iql_speed[False-None] 21.2184ms 20.8190ms 48.0330 Ops/s 47.6478 Ops/s $\color{#35bf28}+0.81\%$
test_iql_speed[False-backward] 36.2802ms 31.9557ms 31.2933 Ops/s 31.2621 Ops/s $\color{#35bf28}+0.10\%$
test_iql_speed[True-None] 9.2320ms 8.8373ms 113.1569 Ops/s 99.2283 Ops/s $\textbf{\color{#35bf28}+14.04\%}$
test_iql_speed[True-backward] 17.6912ms 17.2408ms 58.0019 Ops/s 55.8637 Ops/s $\color{#35bf28}+3.83\%$
test_iql_speed[reduce-overhead-None] 9.1750ms 8.8747ms 112.6799 Ops/s 108.2409 Ops/s $\color{#35bf28}+4.10\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2578ms 6.0229ms 166.0335 Ops/s 165.2549 Ops/s $\color{#35bf28}+0.47\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.1724ms 0.3658ms 2.7341 KOps/s 2.5307 KOps/s $\textbf{\color{#35bf28}+8.04\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5974ms 0.3520ms 2.8408 KOps/s 2.9705 KOps/s $\color{#d91a1a}-4.37\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0904ms 5.8078ms 172.1820 Ops/s 171.1770 Ops/s $\color{#35bf28}+0.59\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1304ms 0.3399ms 2.9418 KOps/s 2.6606 KOps/s $\textbf{\color{#35bf28}+10.57\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6638ms 0.3256ms 3.0710 KOps/s 2.7385 KOps/s $\textbf{\color{#35bf28}+12.14\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7794ms 1.4466ms 691.2899 Ops/s 679.3898 Ops/s $\color{#35bf28}+1.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6393ms 1.3635ms 733.3823 Ops/s 716.1992 Ops/s $\color{#35bf28}+2.40\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.5029ms 6.0851ms 164.3362 Ops/s 165.6255 Ops/s $\color{#d91a1a}-0.78\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9257ms 0.5288ms 1.8911 KOps/s 1.9487 KOps/s $\color{#d91a1a}-2.95\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7776ms 0.5110ms 1.9571 KOps/s 2.0805 KOps/s $\textbf{\color{#d91a1a}-5.93\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9999ms 5.8432ms 171.1383 Ops/s 169.0870 Ops/s $\color{#35bf28}+1.21\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.8632ms 0.3423ms 2.9218 KOps/s 3.1961 KOps/s $\textbf{\color{#d91a1a}-8.58\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6005ms 0.3652ms 2.7385 KOps/s 2.9074 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0549ms 5.7823ms 172.9401 Ops/s 170.8234 Ops/s $\color{#35bf28}+1.24\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0208ms 0.3411ms 2.9314 KOps/s 3.0477 KOps/s $\color{#d91a1a}-3.82\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5535ms 0.3418ms 2.9261 KOps/s 2.8930 KOps/s $\color{#35bf28}+1.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0886ms 5.9456ms 168.1904 Ops/s 165.5206 Ops/s $\color{#35bf28}+1.61\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7258ms 0.5258ms 1.9020 KOps/s 2.1277 KOps/s $\textbf{\color{#d91a1a}-10.61\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 7.2177ms 0.5251ms 1.9045 KOps/s 2.3035 KOps/s $\textbf{\color{#d91a1a}-17.32\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.6660ms 5.2917ms 188.9740 Ops/s 192.9879 Ops/s $\color{#d91a1a}-2.08\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.1195ms 2.0229ms 494.3427 Ops/s 478.0683 Ops/s $\color{#35bf28}+3.40\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.4925ms 0.9765ms 1.0240 KOps/s 766.4525 Ops/s $\textbf{\color{#35bf28}+33.61\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6494s 18.4391ms 54.2326 Ops/s 36.3413 Ops/s $\textbf{\color{#35bf28}+49.23\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.9765ms 1.8922ms 528.4747 Ops/s 511.2541 Ops/s $\color{#35bf28}+3.37\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 9.5423ms 1.2803ms 781.0863 Ops/s 806.5900 Ops/s $\color{#d91a1a}-3.16\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.2639ms 5.5171ms 181.2550 Ops/s 188.6284 Ops/s $\color{#d91a1a}-3.91\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.0546ms 2.0448ms 489.0495 Ops/s 526.2388 Ops/s $\textbf{\color{#d91a1a}-7.07\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3317ms 1.0572ms 945.9272 Ops/s 925.3513 Ops/s $\color{#35bf28}+2.22\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 41.2534ms 38.1778ms 26.1932 Ops/s 25.3792 Ops/s $\color{#35bf28}+3.21\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.8374ms 18.3237ms 54.5741 Ops/s 53.4679 Ops/s $\color{#35bf28}+2.07\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 43.9768ms 39.1689ms 25.5305 Ops/s 24.4870 Ops/s $\color{#35bf28}+4.26\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.8826ms 18.6117ms 53.7296 Ops/s 52.4370 Ops/s $\color{#35bf28}+2.47\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.4471ms 40.9722ms 24.4068 Ops/s 23.2249 Ops/s $\textbf{\color{#35bf28}+5.09\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.1408ms 20.1070ms 49.7338 Ops/s 48.7621 Ops/s $\color{#35bf28}+1.99\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8749ms 0.2209ms 4.5275 KOps/s 4.2790 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.8136ms 1.5371ms 650.5760 Ops/s 634.9699 Ops/s $\color{#35bf28}+2.46\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.9052ms 2.5062ms 399.0176 Ops/s 374.8167 Ops/s $\textbf{\color{#35bf28}+6.46\%}$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.5662ms 3.1644ms 316.0115 Ops/s 308.6064 Ops/s $\color{#35bf28}+2.40\%$
test_storage_write_contiguous[50-img_shape0-small] 0.4154ms 0.1368ms 7.3074 KOps/s 7.1290 KOps/s $\color{#35bf28}+2.50\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3230ms 0.1861ms 5.3720 KOps/s 5.2573 KOps/s $\color{#35bf28}+2.18\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.2781ms 1.9115ms 523.1390 Ops/s 526.2274 Ops/s $\color{#d91a1a}-0.59\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.7387ms 1.4248ms 701.8670 Ops/s 705.7421 Ops/s $\color{#d91a1a}-0.55\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3358ms 1.1285ms 886.1029 Ops/s 884.0787 Ops/s $\color{#35bf28}+0.23\%$
test_collector_stack_then_write[100-img_shape1-atari] 4.1143ms 3.6716ms 272.3608 Ops/s 269.2155 Ops/s $\color{#35bf28}+1.17\%$
test_collector_stack_then_write[100-img_shape2-large_img] 12.0197ms 6.1059ms 163.7763 Ops/s 166.4702 Ops/s $\color{#d91a1a}-1.62\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.4204ms 7.4633ms 133.9895 Ops/s 133.6489 Ops/s $\color{#35bf28}+0.25\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4483ms 0.2760ms 3.6226 KOps/s 3.4691 KOps/s $\color{#35bf28}+4.42\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.8600ms 1.6398ms 609.8471 Ops/s 601.3924 Ops/s $\color{#35bf28}+1.41\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 3.2262ms 2.6334ms 379.7422 Ops/s 359.8658 Ops/s $\textbf{\color{#35bf28}+5.52\%}$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.8062ms 3.3929ms 294.7369 Ops/s 289.6547 Ops/s $\color{#35bf28}+1.75\%$
test_collector_without_rb[100-img_shape0-atari] 34.5450ms 33.9971ms 29.4143 Ops/s 29.2369 Ops/s $\color{#35bf28}+0.61\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.0800ms 66.5561ms 15.0249 Ops/s 14.7586 Ops/s $\color{#35bf28}+1.80\%$
test_collector_with_rb[100-img_shape0-atari] 39.3009ms 38.6187ms 25.8942 Ops/s 25.8331 Ops/s $\color{#35bf28}+0.24\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.9006ms 75.5987ms 13.2277 Ops/s 13.0317 Ops/s $\color{#35bf28}+1.50\%$
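
For readers checking the numbers by hand: the Ops column appears to be the reciprocal of Mean, and the Change column appears to be the relative difference between Ops and Ops on Repo HEAD. The bolded rows line up with the Improved and Worsened counts in the summary (14 and 12 for the CPU run) if a row only counts once its change exceeds roughly 5% in either direction. That threshold is inferred from the table, not stated by the bot. A minimal sketch under those assumptions, with hypothetical helper names:

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean wall-clock time per call."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR against repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row. The 5% default is an assumption inferred from the bolded rows."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "neutral"


# Rows copied from the CPU table: (name, Ops, Ops on Repo HEAD), both in Ops/s.
rows = [
    ("test_simple", 1.8135, 1.7220),
    ("test_transformed", 0.9138, 0.8940),
    (
        "test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]",
        1904.5,  # 1.9045 KOps/s
        2303.5,  # 2.3035 KOps/s
    ),
]

for name, ops_pr, ops_head in rows:
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the table values are already rounded, a recomputed percentage can differ from the reported one in the last digit.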

@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 83.1385μs 81.5006μs 12.2699 KOps/s 12.4701 KOps/s $\color{#d91a1a}-1.61\%$
test_tensor_to_bytestream_speed[torch.save] 0.1428ms 0.1419ms 7.0464 KOps/s 7.1215 KOps/s $\color{#d91a1a}-1.05\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1204s 0.1199s 8.3400 Ops/s 8.0446 Ops/s $\color{#35bf28}+3.67\%$
test_tensor_to_bytestream_speed[numpy] 2.5350μs 2.5228μs 396.3775 KOps/s 407.6290 KOps/s $\color{#d91a1a}-2.76\%$
test_tensor_to_bytestream_speed[safetensors] 38.4013μs 37.4170μs 26.7258 KOps/s 26.4363 KOps/s $\color{#35bf28}+1.10\%$
test_simple 0.8159s 0.8050s 1.2422 Ops/s 1.2233 Ops/s $\color{#35bf28}+1.55\%$
test_transformed 1.4178s 1.4004s 0.7141 Ops/s 0.7079 Ops/s $\color{#35bf28}+0.88\%$
test_serial 2.3380s 2.3324s 0.4287 Ops/s 0.4238 Ops/s $\color{#35bf28}+1.17\%$
test_parallel 1.9186s 1.8541s 0.5394 Ops/s 0.5459 Ops/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[True-True-True-True-True] 0.2405ms 42.6652μs 23.4383 KOps/s 23.6662 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[True-True-True-True-False] 43.4410μs 23.1734μs 43.1530 KOps/s 43.7711 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[True-True-True-False-True] 53.9910μs 23.9510μs 41.7520 KOps/s 41.3991 KOps/s $\color{#35bf28}+0.85\%$
test_step_mdp_speed[True-True-True-False-False] 43.4200μs 12.8826μs 77.6241 KOps/s 78.6405 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-True-False-True-True] 93.9320μs 45.5505μs 21.9537 KOps/s 22.6274 KOps/s $\color{#d91a1a}-2.98\%$
test_step_mdp_speed[True-True-False-True-False] 55.7010μs 25.9906μs 38.4755 KOps/s 40.1029 KOps/s $\color{#d91a1a}-4.06\%$
test_step_mdp_speed[True-True-False-False-True] 62.0010μs 27.0411μs 36.9807 KOps/s 38.9001 KOps/s $\color{#d91a1a}-4.93\%$
test_step_mdp_speed[True-True-False-False-False] 39.7100μs 15.5282μs 64.3989 KOps/s 65.7513 KOps/s $\color{#d91a1a}-2.06\%$
test_step_mdp_speed[True-False-True-True-True] 92.9120μs 46.8787μs 21.3317 KOps/s 21.5121 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[True-False-True-True-False] 59.0810μs 28.0440μs 35.6582 KOps/s 35.3078 KOps/s $\color{#35bf28}+0.99\%$
test_step_mdp_speed[True-False-True-False-True] 50.5210μs 26.3241μs 37.9880 KOps/s 38.8483 KOps/s $\color{#d91a1a}-2.21\%$
test_step_mdp_speed[True-False-True-False-False] 47.1500μs 15.3711μs 65.0570 KOps/s 65.6824 KOps/s $\color{#d91a1a}-0.95\%$
test_step_mdp_speed[True-False-False-True-True] 0.1081ms 49.6751μs 20.1308 KOps/s 20.4770 KOps/s $\color{#d91a1a}-1.69\%$
test_step_mdp_speed[True-False-False-True-False] 80.5620μs 30.6268μs 32.6511 KOps/s 32.6996 KOps/s $\color{#d91a1a}-0.15\%$
test_step_mdp_speed[True-False-False-False-True] 64.7710μs 28.4440μs 35.1567 KOps/s 35.1510 KOps/s $\color{#35bf28}+0.02\%$
test_step_mdp_speed[True-False-False-False-False] 47.1910μs 17.7238μs 56.4212 KOps/s 55.6720 KOps/s $\color{#35bf28}+1.35\%$
test_step_mdp_speed[False-True-True-True-True] 80.2610μs 46.9304μs 21.3082 KOps/s 21.3538 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-True-False] 57.4710μs 28.2787μs 35.3623 KOps/s 35.4964 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[False-True-True-False-True] 2.4382ms 30.8866μs 32.3765 KOps/s 33.7529 KOps/s $\color{#d91a1a}-4.08\%$
test_step_mdp_speed[False-True-True-False-False] 58.9710μs 17.0091μs 58.7919 KOps/s 59.0824 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[False-True-False-True-True] 88.8010μs 50.3180μs 19.8736 KOps/s 20.6255 KOps/s $\color{#d91a1a}-3.65\%$
test_step_mdp_speed[False-True-False-True-False] 57.3310μs 30.4860μs 32.8019 KOps/s 33.2495 KOps/s $\color{#d91a1a}-1.35\%$
test_step_mdp_speed[False-True-False-False-True] 56.8810μs 31.9479μs 31.3010 KOps/s 31.2330 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[False-True-False-False-False] 55.3710μs 19.5731μs 51.0907 KOps/s 51.7866 KOps/s $\color{#d91a1a}-1.34\%$
test_step_mdp_speed[False-False-True-True-True] 82.3210μs 51.9445μs 19.2513 KOps/s 19.3672 KOps/s $\color{#d91a1a}-0.60\%$
test_step_mdp_speed[False-False-True-True-False] 58.7410μs 33.3013μs 30.0289 KOps/s 30.5781 KOps/s $\color{#d91a1a}-1.80\%$
test_step_mdp_speed[False-False-True-False-True] 73.5320μs 32.3517μs 30.9103 KOps/s 31.3128 KOps/s $\color{#d91a1a}-1.29\%$
test_step_mdp_speed[False-False-True-False-False] 46.8610μs 19.7817μs 50.5518 KOps/s 51.6601 KOps/s $\color{#d91a1a}-2.15\%$
test_step_mdp_speed[False-False-False-True-True] 88.1720μs 53.1943μs 18.7990 KOps/s 18.6970 KOps/s $\color{#35bf28}+0.55\%$
test_step_mdp_speed[False-False-False-True-False] 61.4510μs 35.5049μs 28.1651 KOps/s 28.1255 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[False-False-False-False-True] 80.0210μs 33.7517μs 29.6281 KOps/s 29.5959 KOps/s $\color{#35bf28}+0.11\%$
test_step_mdp_speed[False-False-False-False-False] 86.2610μs 21.7087μs 46.0645 KOps/s 45.5640 KOps/s $\color{#35bf28}+1.10\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7557s 0.7454s 1.3415 Ops/s 1.3360 Ops/s $\color{#35bf28}+0.41\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7370s 0.6345s 1.5760 Ops/s 1.6295 Ops/s $\color{#d91a1a}-3.29\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7874s 1.6882s 0.5923 Ops/s 0.6060 Ops/s $\color{#d91a1a}-2.25\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5574s 1.4656s 0.6823 Ops/s 0.6988 Ops/s $\color{#d91a1a}-2.35\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0297s 1.9301s 0.5181 Ops/s 0.5202 Ops/s $\color{#d91a1a}-0.39\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7586s 1.6778s 0.5960 Ops/s 0.5963 Ops/s $\color{#d91a1a}-0.04\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7772s 4.6279s 0.2161 Ops/s 0.2160 Ops/s $\color{#35bf28}+0.05\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5482s 4.4649s 0.2240 Ops/s 0.2256 Ops/s $\color{#d91a1a}-0.72\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9563s 1.8831s 0.5310 Ops/s 0.5330 Ops/s $\color{#d91a1a}-0.36\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6996s 1.5996s 0.6251 Ops/s 0.6240 Ops/s $\color{#35bf28}+0.19\%$
test_values[generalized_advantage_estimate-True-True] 21.3194ms 20.8821ms 47.8879 Ops/s 48.7839 Ops/s $\color{#d91a1a}-1.84\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1281s 3.4900ms 286.5335 Ops/s 281.6171 Ops/s $\color{#35bf28}+1.75\%$
test_values[td0_return_estimate-False-False] 0.1100ms 83.7689μs 11.9376 KOps/s 12.1238 KOps/s $\color{#d91a1a}-1.54\%$
test_values[td1_return_estimate-False-False] 52.5008ms 49.9453ms 20.0219 Ops/s 20.7409 Ops/s $\color{#d91a1a}-3.47\%$
test_values[vec_td1_return_estimate-False-False] 1.3698ms 1.1004ms 908.7562 Ops/s 914.8068 Ops/s $\color{#d91a1a}-0.66\%$
test_values[td_lambda_return_estimate-True-False] 85.7212ms 81.9522ms 12.2022 Ops/s 12.6101 Ops/s $\color{#d91a1a}-3.23\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3574ms 1.0960ms 912.3918 Ops/s 915.1423 Ops/s $\color{#d91a1a}-0.30\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.2060ms 21.0217ms 47.5698 Ops/s 49.3973 Ops/s $\color{#d91a1a}-3.70\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0822ms 0.7678ms 1.3025 KOps/s 1.3102 KOps/s $\color{#d91a1a}-0.59\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7835ms 0.6871ms 1.4553 KOps/s 1.4163 KOps/s $\color{#35bf28}+2.75\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5506ms 1.4987ms 667.2422 Ops/s 664.1985 Ops/s $\color{#35bf28}+0.46\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7925ms 0.7042ms 1.4200 KOps/s 1.4357 KOps/s $\color{#d91a1a}-1.10\%$
test_dqn_speed[False-None] 1.7665ms 1.6170ms 618.4409 Ops/s 627.8133 Ops/s $\color{#d91a1a}-1.49\%$
test_dqn_speed[False-backward] 2.3488ms 2.2581ms 442.8540 Ops/s 445.6435 Ops/s $\color{#d91a1a}-0.63\%$
test_dqn_speed[True-None] 0.6826ms 0.5931ms 1.6861 KOps/s 1.6944 KOps/s $\color{#d91a1a}-0.49\%$
test_dqn_speed[True-backward] 1.2862ms 1.2332ms 810.9074 Ops/s 796.9822 Ops/s $\color{#35bf28}+1.75\%$
test_dqn_speed[reduce-overhead-None] 0.6671ms 0.6050ms 1.6528 KOps/s 1.5847 KOps/s $\color{#35bf28}+4.29\%$
test_ddpg_speed[False-None] 3.4684ms 3.0885ms 323.7845 Ops/s 332.0451 Ops/s $\color{#d91a1a}-2.49\%$
test_ddpg_speed[False-backward] 4.8804ms 4.4635ms 224.0413 Ops/s 225.1448 Ops/s $\color{#d91a1a}-0.49\%$
test_ddpg_speed[True-None] 1.5220ms 1.3802ms 724.5573 Ops/s 732.2351 Ops/s $\color{#d91a1a}-1.05\%$
test_ddpg_speed[True-backward] 2.5579ms 2.5154ms 397.5493 Ops/s 390.0002 Ops/s $\color{#35bf28}+1.94\%$
test_ddpg_speed[reduce-overhead-None] 1.4475ms 1.3781ms 725.6331 Ops/s 713.9245 Ops/s $\color{#35bf28}+1.64\%$
test_sac_speed[False-None] 8.9701ms 8.5860ms 116.4692 Ops/s 117.4381 Ops/s $\color{#d91a1a}-0.82\%$
test_sac_speed[False-backward] 12.3433ms 11.8999ms 84.0343 Ops/s 84.4302 Ops/s $\color{#d91a1a}-0.47\%$
test_sac_speed[True-None] 2.0765ms 1.8800ms 531.9170 Ops/s 533.2166 Ops/s $\color{#d91a1a}-0.24\%$
test_sac_speed[True-backward] 3.7303ms 3.6613ms 273.1240 Ops/s 271.6886 Ops/s $\color{#35bf28}+0.53\%$
test_sac_speed[reduce-overhead-None] 16.7088ms 10.1502ms 98.5207 Ops/s 98.3303 Ops/s $\color{#35bf28}+0.19\%$
test_redq_deprec_speed[False-None] 10.5161ms 9.6137ms 104.0180 Ops/s 103.6348 Ops/s $\color{#35bf28}+0.37\%$
test_redq_deprec_speed[False-backward] 14.1706ms 13.0716ms 76.5017 Ops/s 76.6229 Ops/s $\color{#d91a1a}-0.16\%$
test_redq_deprec_speed[True-None] 2.6619ms 2.5728ms 388.6848 Ops/s 383.0442 Ops/s $\color{#35bf28}+1.47\%$
test_redq_deprec_speed[True-backward] 4.5713ms 4.2715ms 234.1124 Ops/s 233.6874 Ops/s $\color{#35bf28}+0.18\%$
test_redq_deprec_speed[reduce-overhead-None] 14.6136ms 9.6570ms 103.5523 Ops/s 103.6972 Ops/s $\color{#d91a1a}-0.14\%$
test_td3_speed[False-None] 8.5438ms 8.4237ms 118.7121 Ops/s 119.0333 Ops/s $\color{#d91a1a}-0.27\%$
test_td3_speed[False-backward] 11.5493ms 11.1071ms 90.0326 Ops/s 90.9403 Ops/s $\color{#d91a1a}-1.00\%$
test_td3_speed[True-None] 1.7362ms 1.6621ms 601.6371 Ops/s 576.2446 Ops/s $\color{#35bf28}+4.41\%$
test_td3_speed[True-backward] 3.2502ms 3.1661ms 315.8418 Ops/s 314.6931 Ops/s $\color{#35bf28}+0.37\%$
test_td3_speed[reduce-overhead-None] 84.2311ms 25.7979ms 38.7629 Ops/s 37.8999 Ops/s $\color{#35bf28}+2.28\%$
test_cql_speed[False-None] 18.2195ms 17.9318ms 55.7668 Ops/s 54.3315 Ops/s $\color{#35bf28}+2.64\%$
test_cql_speed[False-backward] 23.7387ms 23.2958ms 42.9262 Ops/s 42.3853 Ops/s $\color{#35bf28}+1.28\%$
test_cql_speed[True-None] 3.5404ms 3.3376ms 299.6126 Ops/s 298.9628 Ops/s $\color{#35bf28}+0.22\%$
test_cql_speed[True-backward] 5.8488ms 5.4646ms 182.9975 Ops/s 181.7242 Ops/s $\color{#35bf28}+0.70\%$
test_cql_speed[reduce-overhead-None] 19.0456ms 11.9025ms 84.0159 Ops/s 82.6856 Ops/s $\color{#35bf28}+1.61\%$
test_a2c_speed[False-None] 3.4710ms 3.3847ms 295.4498 Ops/s 296.1130 Ops/s $\color{#d91a1a}-0.22\%$
test_a2c_speed[False-backward] 6.8545ms 6.3635ms 157.1457 Ops/s 152.3281 Ops/s $\color{#35bf28}+3.16\%$
test_a2c_speed[True-None] 1.6089ms 1.4033ms 712.6160 Ops/s 709.7726 Ops/s $\color{#35bf28}+0.40\%$
test_a2c_speed[True-backward] 3.1286ms 3.0325ms 329.7627 Ops/s 308.2206 Ops/s $\textbf{\color{#35bf28}+6.99\%}$
test_a2c_speed[reduce-overhead-None] 1.1567ms 1.0509ms 951.6033 Ops/s 957.1123 Ops/s $\color{#d91a1a}-0.58\%$
test_ppo_speed[False-None] 4.4034ms 4.0615ms 246.2153 Ops/s 250.1731 Ops/s $\color{#d91a1a}-1.58\%$
test_ppo_speed[False-backward] 7.7548ms 7.3245ms 136.5281 Ops/s 134.7924 Ops/s $\color{#35bf28}+1.29\%$
test_ppo_speed[True-None] 1.9517ms 1.5227ms 656.7477 Ops/s 654.3025 Ops/s $\color{#35bf28}+0.37\%$
test_ppo_speed[True-backward] 3.2274ms 3.1833ms 314.1397 Ops/s 295.7119 Ops/s $\textbf{\color{#35bf28}+6.23\%}$
test_ppo_speed[reduce-overhead-None] 1.2804ms 1.1105ms 900.4734 Ops/s 890.6279 Ops/s $\color{#35bf28}+1.11\%$
test_reinforce_speed[False-None] 2.7494ms 2.4552ms 407.2998 Ops/s 417.0241 Ops/s $\color{#d91a1a}-2.33\%$
test_reinforce_speed[False-backward] 4.0659ms 3.5035ms 285.4304 Ops/s 292.4985 Ops/s $\color{#d91a1a}-2.42\%$
test_reinforce_speed[True-None] 1.5634ms 1.3867ms 721.1116 Ops/s 733.2089 Ops/s $\color{#d91a1a}-1.65\%$
test_reinforce_speed[True-backward] 3.5980ms 3.0796ms 324.7163 Ops/s 324.3300 Ops/s $\color{#35bf28}+0.12\%$
test_reinforce_speed[reduce-overhead-None] 0.6670s 10.5176ms 95.0791 Ops/s 112.8671 Ops/s $\textbf{\color{#d91a1a}-15.76\%}$
test_iql_speed[False-None] 10.4424ms 9.8705ms 101.3121 Ops/s 102.1841 Ops/s $\color{#d91a1a}-0.85\%$
test_iql_speed[False-backward] 14.1699ms 13.7012ms 72.9862 Ops/s 73.9143 Ops/s $\color{#d91a1a}-1.26\%$
test_iql_speed[True-None] 2.4800ms 2.2733ms 439.8882 Ops/s 433.5256 Ops/s $\color{#35bf28}+1.47\%$
test_iql_speed[True-backward] 5.3056ms 4.8360ms 206.7835 Ops/s 199.5077 Ops/s $\color{#35bf28}+3.65\%$
test_iql_speed[reduce-overhead-None] 16.3138ms 10.0324ms 99.6772 Ops/s 100.1626 Ops/s $\color{#d91a1a}-0.48\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1486ms 5.9330ms 168.5500 Ops/s 166.2497 Ops/s $\color{#35bf28}+1.38\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8002ms 0.3789ms 2.6389 KOps/s 2.6215 KOps/s $\color{#35bf28}+0.66\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7305ms 0.3627ms 2.7569 KOps/s 2.6787 KOps/s $\color{#35bf28}+2.92\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9668ms 5.7642ms 173.4842 Ops/s 172.6990 Ops/s $\color{#35bf28}+0.45\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.3081ms 0.3732ms 2.6796 KOps/s 2.8816 KOps/s $\textbf{\color{#d91a1a}-7.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6503ms 0.3539ms 2.8260 KOps/s 2.9859 KOps/s $\textbf{\color{#d91a1a}-5.35\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5872ms 1.3100ms 763.3433 Ops/s 678.8185 Ops/s $\textbf{\color{#35bf28}+12.45\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5260ms 1.2263ms 815.4289 Ops/s 758.8821 Ops/s $\textbf{\color{#35bf28}+7.45\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0264ms 5.9003ms 169.4823 Ops/s 167.7356 Ops/s $\color{#35bf28}+1.04\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9233ms 0.4922ms 2.0315 KOps/s 2.1371 KOps/s $\color{#d91a1a}-4.94\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6638ms 0.4379ms 2.2836 KOps/s 1.9808 KOps/s $\textbf{\color{#35bf28}+15.29\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9611ms 5.7985ms 172.4585 Ops/s 171.1523 Ops/s $\color{#35bf28}+0.76\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0442ms 0.3416ms 2.9274 KOps/s 2.8018 KOps/s $\color{#35bf28}+4.48\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5425ms 0.3149ms 3.1754 KOps/s 2.7668 KOps/s $\textbf{\color{#35bf28}+14.77\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0027ms 5.7356ms 174.3487 Ops/s 174.5731 Ops/s $\color{#d91a1a}-0.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1850ms 0.3974ms 2.5162 KOps/s 3.1921 KOps/s $\textbf{\color{#d91a1a}-21.17\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6101ms 0.3895ms 2.5673 KOps/s 3.2238 KOps/s $\textbf{\color{#d91a1a}-20.36\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1479ms 5.9713ms 167.4684 Ops/s 167.5928 Ops/s $\color{#d91a1a}-0.07\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9884ms 0.4648ms 2.1515 KOps/s 1.8650 KOps/s $\textbf{\color{#35bf28}+15.36\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6903ms 0.4784ms 2.0905 KOps/s 1.9527 KOps/s $\textbf{\color{#35bf28}+7.06\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.5426ms 5.0581ms 197.7026 Ops/s 36.0925 Ops/s $\textbf{\color{#35bf28}+447.77\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.2154ms 2.0833ms 480.0180 Ops/s 511.3197 Ops/s $\textbf{\color{#d91a1a}-6.12\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.5302ms 1.0379ms 963.4990 Ops/s 727.8069 Ops/s $\textbf{\color{#35bf28}+32.38\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.8783ms 5.0990ms 196.1152 Ops/s 194.9859 Ops/s $\color{#35bf28}+0.58\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.0978ms 1.9056ms 524.7657 Ops/s 503.6732 Ops/s $\color{#35bf28}+4.19\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.4125ms 1.0051ms 994.9074 Ops/s 716.3584 Ops/s $\textbf{\color{#35bf28}+38.88\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.7128s 19.4882ms 51.3131 Ops/s 187.5825 Ops/s $\textbf{\color{#d91a1a}-72.65\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 12.7513ms 2.2342ms 447.5923 Ops/s 464.2981 Ops/s $\color{#d91a1a}-3.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.4684ms 1.1779ms 848.9349 Ops/s 841.0059 Ops/s $\color{#35bf28}+0.94\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 41.4882ms 39.5045ms 25.3136 Ops/s 25.9579 Ops/s $\color{#d91a1a}-2.48\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9928ms 18.5344ms 53.9538 Ops/s 55.1862 Ops/s $\color{#d91a1a}-2.23\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.2446ms 40.3051ms 24.8108 Ops/s 24.9445 Ops/s $\color{#d91a1a}-0.54\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.2331ms 18.6531ms 53.6105 Ops/s 53.5958 Ops/s $\color{#35bf28}+0.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 43.5454ms 41.6466ms 24.0115 Ops/s 23.8775 Ops/s $\color{#35bf28}+0.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.2802ms 20.1447ms 49.6409 Ops/s 48.9858 Ops/s $\color{#35bf28}+1.34\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9379ms 0.2321ms 4.3092 KOps/s 4.3137 KOps/s $\color{#d91a1a}-0.10\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.9009ms 1.4357ms 696.5212 Ops/s 710.2852 Ops/s $\color{#d91a1a}-1.94\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6906ms 2.4771ms 403.6923 Ops/s 412.0793 Ops/s $\color{#d91a1a}-2.04\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.4146ms 3.1563ms 316.8250 Ops/s 340.6237 Ops/s $\textbf{\color{#d91a1a}-6.99\%}$
test_storage_write_contiguous[50-img_shape0-small] 0.3370ms 0.1670ms 5.9864 KOps/s 6.1036 KOps/s $\color{#d91a1a}-1.92\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3726ms 0.2319ms 4.3129 KOps/s 3.5055 KOps/s $\textbf{\color{#35bf28}+23.03\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1013ms 1.9142ms 522.4037 Ops/s 528.8546 Ops/s $\color{#d91a1a}-1.22\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6678ms 1.4590ms 685.3954 Ops/s 735.5493 Ops/s $\textbf{\color{#d91a1a}-6.82\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.3330ms 1.1538ms 866.7148 Ops/s 858.0837 Ops/s $\color{#35bf28}+1.01\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.6857ms 3.7817ms 264.4336 Ops/s 270.7174 Ops/s $\color{#d91a1a}-2.32\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.3811ms 5.8600ms 170.6473 Ops/s 164.1216 Ops/s $\color{#35bf28}+3.98\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.6926ms 7.1713ms 139.4441 Ops/s 137.7755 Ops/s $\color{#35bf28}+1.21\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.7245ms 0.2852ms 3.5069 KOps/s 3.5808 KOps/s $\color{#d91a1a}-2.06\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.9999ms 1.5466ms 646.5817 Ops/s 640.6801 Ops/s $\color{#35bf28}+0.92\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 3.0157ms 2.4260ms 412.2053 Ops/s 382.2278 Ops/s $\textbf{\color{#35bf28}+7.84\%}$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.6543ms 3.2764ms 305.2103 Ops/s 308.2788 Ops/s $\color{#d91a1a}-1.00\%$
test_collector_without_rb[100-img_shape0-atari] 34.4610ms 33.7208ms 29.6553 Ops/s 29.9863 Ops/s $\color{#d91a1a}-1.10\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.7161ms 65.5094ms 15.2650 Ops/s 15.2307 Ops/s $\color{#35bf28}+0.23\%$
test_collector_with_rb[100-img_shape0-atari] 38.8090ms 38.0419ms 26.2868 Ops/s 26.0193 Ops/s $\color{#35bf28}+1.03\%$
test_collector_with_rb[200-img_shape1-large_batch] 75.5724ms 74.4618ms 13.4297 Ops/s 13.4693 Ops/s $\color{#d91a1a}-0.29\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.8173ms 58.3348ms 17.1424 Ops/s 17.8625 Ops/s $\color{#d91a1a}-4.03\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1169s 0.1156s 8.6510 Ops/s 8.9043 Ops/s $\color{#d91a1a}-2.84\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 60.5829ms 59.8044ms 16.7212 Ops/s 17.0941 Ops/s $\color{#d91a1a}-2.18\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1203s 0.1188s 8.4167 Ops/s 8.6533 Ops/s $\color{#d91a1a}-2.73\%$

Labels

CLA Signed, Performance
