
[Performance] Add out= parameter to _StepMDP for output buffer reuse #3561

Open
vmoens wants to merge 1 commit into gh/vmoens/242/base from gh/vmoens/242/head

vmoens (Collaborator) commented Mar 23, 2026

Stack from ghstack (oldest at bottom):

_StepMDP.__call__ now accepts an optional out parameter. When provided,
the output TensorDict is reused instead of allocating a new one each call.
This enables callers (collectors, rollout loops) to pre-allocate a buffer
and avoid per-step TensorDict creation overhead.
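
The buffer-reuse pattern described above can be sketched in plain Python. This is only an illustration: dicts stand in for TensorDicts, and `step_mdp_sketch` and its "next" key layout are hypothetical names chosen for the example, not the torchrl implementation.

```python
from typing import Optional


def step_mdp_sketch(data: dict, out: Optional[dict] = None) -> dict:
    """Promote the entries stored under "next" to the root of the result.

    Hypothetical stand-in for a step function with an ``out=`` argument:
    when ``out`` is given it is written in place and returned, otherwise
    a fresh dict is allocated on every call.
    """
    if out is None:
        out = {}  # allocation path: one new container per step
    for key, value in data["next"].items():
        out[key] = value  # overwrite in place, no new container
    return out


# A rollout-style loop that allocates the buffer once and reuses it.
buffer: dict = {}
for t in range(3):
    data = {"obs": t, "next": {"obs": t + 1}}
    result = step_mdp_sketch(data, out=buffer)
    assert result is buffer  # same object on every iteration
```

In this sketch the caller owns the buffer, so anything that needs the values of an earlier step has to copy them before the next call overwrites them.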

Also fixes _exclude return type annotation and ensures it returns the
pre-provided out buffer even when no new keys are set.
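
The return-value guarantee can be illustrated with a small sketch. The PR page does not show the body of _exclude, so `exclude_sketch` below is a hypothetical helper on plain dicts that only demonstrates the stated behaviour: a caller-provided buffer is returned even if no key is written into it.

```python
from typing import Iterable, Optional


def exclude_sketch(
    data: dict, excluded: Iterable[str], out: Optional[dict] = None
) -> dict:
    """Copy every entry of ``data`` whose key is not excluded into ``out``."""
    if out is None:
        out = {}
    skip = set(excluded)
    for key, value in data.items():
        if key not in skip:
            out[key] = value
    # Returned unconditionally: the caller gets its buffer back even when
    # the loop above wrote nothing into it.
    return out


# Every key is excluded, so nothing is written, yet the buffer is returned.
buf = {"stale": 0}
assert exclude_sketch({"a": 1}, ["a"], out=buf) is buf
```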

Made-with: Cursor

[ghstack-poisoned]

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3561

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 25ab629 with merge base 0a1aea6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 79.8752μs 78.6439μs 12.7155 KOps/s 12.7280 KOps/s $\color{#d91a1a}-0.10\%$
test_tensor_to_bytestream_speed[torch.save] 0.1419ms 0.1410ms 7.0928 KOps/s 7.2902 KOps/s $\color{#d91a1a}-2.71\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1141s 0.1135s 8.8134 Ops/s 8.9491 Ops/s $\color{#d91a1a}-1.52\%$
test_tensor_to_bytestream_speed[numpy] 2.5025μs 2.4899μs 401.6302 KOps/s 410.7562 KOps/s $\color{#d91a1a}-2.22\%$
test_tensor_to_bytestream_speed[safetensors] 38.7851μs 38.5081μs 25.9686 KOps/s 27.6605 KOps/s $\textbf{\color{#d91a1a}-6.12\%}$
test_simple 0.5435s 0.5400s 1.8518 Ops/s 1.7530 Ops/s $\textbf{\color{#35bf28}+5.64\%}$
test_transformed 1.0840s 1.0747s 0.9305 Ops/s 0.9127 Ops/s $\color{#35bf28}+1.95\%$
test_serial 1.6791s 1.6669s 0.5999 Ops/s 0.5937 Ops/s $\color{#35bf28}+1.04\%$
test_parallel 1.1383s 1.0361s 0.9652 Ops/s 0.9653 Ops/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[True-True-True-True-True] 0.2459ms 41.9468μs 23.8397 KOps/s 24.3065 KOps/s $\color{#d91a1a}-1.92\%$
test_step_mdp_speed[True-True-True-True-False] 54.8710μs 22.8858μs 43.6952 KOps/s 44.2194 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[True-True-True-False-True] 56.5600μs 23.4365μs 42.6685 KOps/s 42.9172 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[True-True-True-False-False] 41.4800μs 12.7016μs 78.7305 KOps/s 79.7327 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[True-True-False-True-True] 73.4210μs 44.1110μs 22.6701 KOps/s 22.6301 KOps/s $\color{#35bf28}+0.18\%$
test_step_mdp_speed[True-True-False-True-False] 54.7710μs 24.9724μs 40.0442 KOps/s 39.6918 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[True-True-False-False-True] 52.9800μs 25.9803μs 38.4908 KOps/s 39.0207 KOps/s $\color{#d91a1a}-1.36\%$
test_step_mdp_speed[True-True-False-False-False] 39.9100μs 15.2963μs 65.3752 KOps/s 65.3078 KOps/s $\color{#35bf28}+0.10\%$
test_step_mdp_speed[True-False-True-True-True] 89.2810μs 46.6604μs 21.4315 KOps/s 21.4295 KOps/s $+0.01\%$
test_step_mdp_speed[True-False-True-True-False] 58.6000μs 27.4822μs 36.3872 KOps/s 35.9326 KOps/s $\color{#35bf28}+1.27\%$
test_step_mdp_speed[True-False-True-False-True] 55.2610μs 26.6663μs 37.5006 KOps/s 39.1142 KOps/s $\color{#d91a1a}-4.13\%$
test_step_mdp_speed[True-False-True-False-False] 42.9410μs 15.1011μs 66.2205 KOps/s 65.7985 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-False-False-True-True] 77.7110μs 48.9897μs 20.4124 KOps/s 20.4399 KOps/s $\color{#d91a1a}-0.13\%$
test_step_mdp_speed[True-False-False-True-False] 60.6010μs 29.8330μs 33.5200 KOps/s 33.0023 KOps/s $\color{#35bf28}+1.57\%$
test_step_mdp_speed[True-False-False-False-True] 60.0710μs 28.6909μs 34.8543 KOps/s 35.3569 KOps/s $\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-False-False-False-False] 43.7910μs 17.7795μs 56.2444 KOps/s 57.6078 KOps/s $\color{#d91a1a}-2.37\%$
test_step_mdp_speed[False-True-True-True-True] 0.1007ms 46.2003μs 21.6449 KOps/s 21.6965 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-True-True-True-False] 61.8100μs 27.5213μs 36.3354 KOps/s 36.8202 KOps/s $\color{#d91a1a}-1.32\%$
test_step_mdp_speed[False-True-True-False-True] 2.4307ms 30.0212μs 33.3098 KOps/s 33.5609 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[False-True-True-False-False] 49.4610μs 16.8811μs 59.2378 KOps/s 59.7867 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[False-True-False-True-True] 83.4510μs 48.6854μs 20.5400 KOps/s 20.3904 KOps/s $\color{#35bf28}+0.73\%$
test_step_mdp_speed[False-True-False-True-False] 59.2410μs 30.0120μs 33.3200 KOps/s 32.8179 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[False-True-False-False-True] 69.1910μs 31.7131μs 31.5328 KOps/s 31.6589 KOps/s $\color{#d91a1a}-0.40\%$
test_step_mdp_speed[False-True-False-False-False] 58.7010μs 19.3797μs 51.6005 KOps/s 51.5515 KOps/s $\color{#35bf28}+0.10\%$
test_step_mdp_speed[False-False-True-True-True] 91.1620μs 51.6341μs 19.3670 KOps/s 19.8596 KOps/s $\color{#d91a1a}-2.48\%$
test_step_mdp_speed[False-False-True-True-False] 65.9610μs 33.2897μs 30.0393 KOps/s 30.9154 KOps/s $\color{#d91a1a}-2.83\%$
test_step_mdp_speed[False-False-True-False-True] 58.9300μs 32.1524μs 31.1019 KOps/s 31.3379 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[False-False-True-False-False] 57.1010μs 19.2172μs 52.0367 KOps/s 51.3743 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[False-False-False-True-True] 87.3010μs 53.6802μs 18.6288 KOps/s 19.0522 KOps/s $\color{#d91a1a}-2.22\%$
test_step_mdp_speed[False-False-False-True-False] 65.5210μs 36.0162μs 27.7653 KOps/s 28.7360 KOps/s $\color{#d91a1a}-3.38\%$
test_step_mdp_speed[False-False-False-False-True] 67.7110μs 33.6136μs 29.7499 KOps/s 29.9209 KOps/s $\color{#d91a1a}-0.57\%$
test_step_mdp_speed[False-False-False-False-False] 52.9000μs 21.5252μs 46.4572 KOps/s 46.7138 KOps/s $\color{#d91a1a}-0.55\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7216s 0.7129s 1.4027 Ops/s 1.3649 Ops/s $\color{#35bf28}+2.77\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6969s 0.5957s 1.6787 Ops/s 1.6607 Ops/s $\color{#35bf28}+1.08\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6996s 1.6204s 0.6171 Ops/s 0.6156 Ops/s $\color{#35bf28}+0.26\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4748s 1.3882s 0.7204 Ops/s 0.7108 Ops/s $\color{#35bf28}+1.35\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9536s 1.8621s 0.5370 Ops/s 0.5311 Ops/s $\color{#35bf28}+1.11\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7287s 1.6447s 0.6080 Ops/s 0.6047 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6457s 4.5403s 0.2202 Ops/s 0.2171 Ops/s $\color{#35bf28}+1.44\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6478s 4.4614s 0.2241 Ops/s 0.2261 Ops/s $\color{#d91a1a}-0.87\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9151s 1.8405s 0.5433 Ops/s 0.5315 Ops/s $\color{#35bf28}+2.23\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7311s 1.6174s 0.6183 Ops/s 0.6369 Ops/s $\color{#d91a1a}-2.93\%$
test_values[generalized_advantage_estimate-True-True] 11.0760ms 10.1437ms 98.5831 Ops/s 101.0909 Ops/s $\color{#d91a1a}-2.48\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.7675ms 17.9777ms 55.6244 Ops/s 56.0261 Ops/s $\color{#d91a1a}-0.72\%$
test_values[td0_return_estimate-False-False] 0.2053ms 0.1396ms 7.1650 KOps/s 8.0350 KOps/s $\textbf{\color{#d91a1a}-10.83\%}$
test_values[td1_return_estimate-False-False] 27.1020ms 26.7452ms 37.3899 Ops/s 37.5606 Ops/s $\color{#d91a1a}-0.45\%$
test_values[vec_td1_return_estimate-False-False] 18.5596ms 18.0316ms 55.4582 Ops/s 55.4986 Ops/s $\color{#d91a1a}-0.07\%$
test_values[td_lambda_return_estimate-True-False] 40.3722ms 39.6507ms 25.2202 Ops/s 24.9965 Ops/s $\color{#35bf28}+0.90\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.6595ms 17.9621ms 55.6729 Ops/s 56.0655 Ops/s $\color{#d91a1a}-0.70\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.9240ms 8.7899ms 113.7667 Ops/s 114.0728 Ops/s $\color{#d91a1a}-0.27\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7036ms 1.5068ms 663.6583 Ops/s 646.6056 Ops/s $\color{#35bf28}+2.64\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5188ms 0.4148ms 2.4107 KOps/s 2.3335 KOps/s $\color{#35bf28}+3.31\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 38.1771ms 35.6664ms 28.0376 Ops/s 28.9325 Ops/s $\color{#d91a1a}-3.09\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9103ms 1.7766ms 562.8725 Ops/s 566.7262 Ops/s $\color{#d91a1a}-0.68\%$
test_dqn_speed[False-None] 1.4696ms 1.3750ms 727.2937 Ops/s 720.9457 Ops/s $\color{#35bf28}+0.88\%$
test_dqn_speed[False-backward] 1.9880ms 1.8869ms 529.9633 Ops/s 520.7588 Ops/s $\color{#35bf28}+1.77\%$
test_dqn_speed[True-None] 0.7064ms 0.5542ms 1.8045 KOps/s 1.8333 KOps/s $\color{#d91a1a}-1.57\%$
test_dqn_speed[True-backward] 1.2144ms 0.9930ms 1.0070 KOps/s 984.3845 Ops/s $\color{#35bf28}+2.30\%$
test_dqn_speed[reduce-overhead-None] 0.7841ms 0.5311ms 1.8828 KOps/s 1.8531 KOps/s $\color{#35bf28}+1.60\%$
test_ddpg_speed[False-None] 3.3503ms 2.8296ms 353.4102 Ops/s 358.1395 Ops/s $\color{#d91a1a}-1.32\%$
test_ddpg_speed[False-backward] 4.3116ms 4.0729ms 245.5276 Ops/s 247.4715 Ops/s $\color{#d91a1a}-0.79\%$
test_ddpg_speed[True-None] 2.0871ms 1.4278ms 700.3589 Ops/s 697.7319 Ops/s $\color{#35bf28}+0.38\%$
test_ddpg_speed[True-backward] 2.4168ms 2.3752ms 421.0200 Ops/s 406.2668 Ops/s $\color{#35bf28}+3.63\%$
test_ddpg_speed[reduce-overhead-None] 1.7805ms 1.3910ms 718.9057 Ops/s 703.0539 Ops/s $\color{#35bf28}+2.25\%$
test_sac_speed[False-None] 8.6309ms 7.9966ms 125.0528 Ops/s 125.2696 Ops/s $\color{#d91a1a}-0.17\%$
test_sac_speed[False-backward] 11.6983ms 11.1156ms 89.9640 Ops/s 88.7671 Ops/s $\color{#35bf28}+1.35\%$
test_sac_speed[True-None] 2.4560ms 2.1372ms 467.9076 Ops/s 460.1716 Ops/s $\color{#35bf28}+1.68\%$
test_sac_speed[True-backward] 4.0586ms 3.9731ms 251.6936 Ops/s 213.5678 Ops/s $\textbf{\color{#35bf28}+17.85\%}$
test_sac_speed[reduce-overhead-None] 2.5742ms 2.1248ms 470.6312 Ops/s 466.8412 Ops/s $\color{#35bf28}+0.81\%$
test_redq_speed[False-None] 14.4316ms 10.4524ms 95.6720 Ops/s 95.3862 Ops/s $\color{#35bf28}+0.30\%$
test_redq_speed[False-backward] 22.6107ms 17.9135ms 55.8237 Ops/s 55.6466 Ops/s $\color{#35bf28}+0.32\%$
test_redq_speed[True-None] 4.6665ms 4.3196ms 231.5048 Ops/s 220.7558 Ops/s $\color{#35bf28}+4.87\%$
test_redq_speed[reduce-overhead-None] 4.7185ms 4.3252ms 231.2048 Ops/s 231.3630 Ops/s $\color{#d91a1a}-0.07\%$
test_redq_deprec_speed[False-None] 11.9154ms 10.8736ms 91.9659 Ops/s 91.3037 Ops/s $\color{#35bf28}+0.73\%$
test_redq_deprec_speed[False-backward] 16.2317ms 15.5719ms 64.2183 Ops/s 63.9449 Ops/s $\color{#35bf28}+0.43\%$
test_redq_deprec_speed[True-None] 3.9874ms 3.4902ms 286.5157 Ops/s 268.7167 Ops/s $\textbf{\color{#35bf28}+6.62\%}$
test_redq_deprec_speed[True-backward] 7.1781ms 6.9817ms 143.2317 Ops/s 126.2311 Ops/s $\textbf{\color{#35bf28}+13.47\%}$
test_redq_deprec_speed[reduce-overhead-None] 3.6777ms 3.4777ms 287.5458 Ops/s 271.0572 Ops/s $\textbf{\color{#35bf28}+6.08\%}$
test_td3_speed[False-None] 8.0889ms 7.9350ms 126.0240 Ops/s 123.8964 Ops/s $\color{#35bf28}+1.72\%$
test_td3_speed[False-backward] 11.4356ms 10.8401ms 92.2499 Ops/s 91.5463 Ops/s $\color{#35bf28}+0.77\%$
test_td3_speed[True-None] 1.8745ms 1.7799ms 561.8336 Ops/s 556.8255 Ops/s $\color{#35bf28}+0.90\%$
test_td3_speed[True-backward] 3.7202ms 3.5106ms 284.8538 Ops/s 246.1045 Ops/s $\textbf{\color{#35bf28}+15.75\%}$
test_td3_speed[reduce-overhead-None] 1.8615ms 1.7664ms 566.1169 Ops/s 566.9728 Ops/s $\color{#d91a1a}-0.15\%$
test_cql_speed[False-None] 28.4744ms 26.0053ms 38.4538 Ops/s 38.8467 Ops/s $\color{#d91a1a}-1.01\%$
test_cql_speed[False-backward] 35.8859ms 35.0606ms 28.5220 Ops/s 28.6702 Ops/s $\color{#d91a1a}-0.52\%$
test_cql_speed[True-None] 12.4634ms 12.1380ms 82.3858 Ops/s 82.3784 Ops/s $+0.01\%$
test_cql_speed[True-backward] 17.8846ms 17.4900ms 57.1756 Ops/s 57.9686 Ops/s $\color{#d91a1a}-1.37\%$
test_cql_speed[reduce-overhead-None] 12.5748ms 12.0719ms 82.8372 Ops/s 81.6441 Ops/s $\color{#35bf28}+1.46\%$
test_a2c_speed[False-None] 5.6357ms 5.3664ms 186.3436 Ops/s 185.7596 Ops/s $\color{#35bf28}+0.31\%$
test_a2c_speed[False-backward] 12.1337ms 11.6380ms 85.9254 Ops/s 86.5593 Ops/s $\color{#d91a1a}-0.73\%$
test_a2c_speed[True-None] 3.9293ms 3.7302ms 268.0847 Ops/s 266.5511 Ops/s $\color{#35bf28}+0.58\%$
test_a2c_speed[True-backward] 8.6795ms 8.4682ms 118.0893 Ops/s 116.7657 Ops/s $\color{#35bf28}+1.13\%$
test_a2c_speed[reduce-overhead-None] 4.1514ms 3.7209ms 268.7546 Ops/s 264.5665 Ops/s $\color{#35bf28}+1.58\%$
test_ppo_speed[False-None] 6.3477ms 5.8902ms 169.7747 Ops/s 165.8569 Ops/s $\color{#35bf28}+2.36\%$
test_ppo_speed[False-backward] 12.8943ms 12.4744ms 80.1641 Ops/s 79.7430 Ops/s $\color{#35bf28}+0.53\%$
test_ppo_speed[True-None] 3.8156ms 3.6452ms 274.3364 Ops/s 271.7299 Ops/s $\color{#35bf28}+0.96\%$
test_ppo_speed[True-backward] 8.5729ms 8.3817ms 119.3076 Ops/s 117.3006 Ops/s $\color{#35bf28}+1.71\%$
test_ppo_speed[reduce-overhead-None] 4.1404ms 3.6499ms 273.9809 Ops/s 270.7897 Ops/s $\color{#35bf28}+1.18\%$
test_reinforce_speed[False-None] 4.7890ms 4.5163ms 221.4201 Ops/s 217.9198 Ops/s $\color{#35bf28}+1.61\%$
test_reinforce_speed[False-backward] 7.4699ms 7.2801ms 137.3614 Ops/s 134.6479 Ops/s $\color{#35bf28}+2.02\%$
test_reinforce_speed[True-None] 3.0252ms 2.8708ms 348.3384 Ops/s 336.3573 Ops/s $\color{#35bf28}+3.56\%$
test_reinforce_speed[True-backward] 7.8749ms 7.6118ms 131.3750 Ops/s 128.7701 Ops/s $\color{#35bf28}+2.02\%$
test_reinforce_speed[reduce-overhead-None] 3.1436ms 2.8457ms 351.4055 Ops/s 348.0734 Ops/s $\color{#35bf28}+0.96\%$
test_iql_speed[False-None] 20.6643ms 19.7728ms 50.5746 Ops/s 49.2549 Ops/s $\color{#35bf28}+2.68\%$
test_iql_speed[False-backward] 30.5998ms 29.9584ms 33.3796 Ops/s 32.5966 Ops/s $\color{#35bf28}+2.40\%$
test_iql_speed[True-None] 8.9292ms 8.3732ms 119.4282 Ops/s 118.4554 Ops/s $\color{#35bf28}+0.82\%$
test_iql_speed[True-backward] 16.7099ms 16.3787ms 61.0549 Ops/s 61.1542 Ops/s $\color{#d91a1a}-0.16\%$
test_iql_speed[reduce-overhead-None] 8.8921ms 8.4168ms 118.8101 Ops/s 117.3418 Ops/s $\color{#35bf28}+1.25\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0307ms 5.8831ms 169.9796 Ops/s 170.6088 Ops/s $\color{#d91a1a}-0.37\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0493ms 0.4098ms 2.4403 KOps/s 3.3689 KOps/s $\textbf{\color{#d91a1a}-27.56\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7738ms 0.3894ms 2.5680 KOps/s 3.6271 KOps/s $\textbf{\color{#d91a1a}-29.20\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8449ms 5.6448ms 177.1557 Ops/s 176.0157 Ops/s $\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.4995ms 0.3959ms 2.5257 KOps/s 3.0575 KOps/s $\textbf{\color{#d91a1a}-17.39\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6002ms 0.3812ms 2.6231 KOps/s 3.0626 KOps/s $\textbf{\color{#d91a1a}-14.35\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 7.6662ms 1.5053ms 664.3338 Ops/s 781.1236 Ops/s $\textbf{\color{#d91a1a}-14.95\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.7018ms 1.4072ms 710.6497 Ops/s 844.2770 Ops/s $\textbf{\color{#d91a1a}-15.83\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.9375ms 5.8552ms 170.7889 Ops/s 171.3029 Ops/s $\color{#d91a1a}-0.30\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2297ms 0.5518ms 1.8123 KOps/s 2.0994 KOps/s $\textbf{\color{#d91a1a}-13.67\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8661ms 0.5323ms 1.8788 KOps/s 2.2872 KOps/s $\textbf{\color{#d91a1a}-17.86\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0062ms 5.6313ms 177.5778 Ops/s 175.2823 Ops/s $\color{#35bf28}+1.31\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0989ms 0.2811ms 3.5580 KOps/s 2.8748 KOps/s $\textbf{\color{#35bf28}+23.77\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6849ms 0.2618ms 3.8197 KOps/s 3.0140 KOps/s $\textbf{\color{#35bf28}+26.73\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8003ms 5.5862ms 179.0127 Ops/s 177.5743 Ops/s $\color{#35bf28}+0.81\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8045ms 0.2805ms 3.5650 KOps/s 3.1086 KOps/s $\textbf{\color{#35bf28}+14.68\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4753ms 0.2600ms 3.8460 KOps/s 3.3414 KOps/s $\textbf{\color{#35bf28}+15.10\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8873ms 5.7360ms 174.3365 Ops/s 171.6097 Ops/s $\color{#35bf28}+1.59\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3551ms 0.4894ms 2.0431 KOps/s 1.9360 KOps/s $\textbf{\color{#35bf28}+5.53\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7300ms 0.5174ms 1.9329 KOps/s 2.1934 KOps/s $\textbf{\color{#d91a1a}-11.88\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3784ms 4.9375ms 202.5299 Ops/s 197.7105 Ops/s $\color{#35bf28}+2.44\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.0596ms 1.9728ms 506.9010 Ops/s 468.7439 Ops/s $\textbf{\color{#35bf28}+8.14\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.2636ms 0.9077ms 1.1016 KOps/s 1.1043 KOps/s $\color{#d91a1a}-0.24\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6488s 17.9017ms 55.8607 Ops/s 38.1170 Ops/s $\textbf{\color{#35bf28}+46.55\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 12.7425ms 1.9381ms 515.9775 Ops/s 547.7474 Ops/s $\textbf{\color{#d91a1a}-5.80\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.0825ms 1.1855ms 843.5564 Ops/s 752.9170 Ops/s $\textbf{\color{#35bf28}+12.04\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.7308ms 5.1607ms 193.7730 Ops/s 192.6829 Ops/s $\color{#35bf28}+0.57\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 5.7648ms 1.9222ms 520.2285 Ops/s 489.0429 Ops/s $\textbf{\color{#35bf28}+6.38\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.1927ms 1.0352ms 966.0147 Ops/s 939.5700 Ops/s $\color{#35bf28}+2.81\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.1698ms 37.3343ms 26.7850 Ops/s 26.1420 Ops/s $\color{#35bf28}+2.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.9265ms 17.7964ms 56.1911 Ops/s 55.6880 Ops/s $\color{#35bf28}+0.90\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 43.0282ms 38.6966ms 25.8421 Ops/s 25.3296 Ops/s $\color{#35bf28}+2.02\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.4218ms 18.0740ms 55.3282 Ops/s 32.3032 Ops/s $\textbf{\color{#35bf28}+71.28\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.5559ms 40.1089ms 24.9321 Ops/s 23.9116 Ops/s $\color{#35bf28}+4.27\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.4558ms 19.4758ms 51.3457 Ops/s 50.4919 Ops/s $\color{#35bf28}+1.69\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8345ms 0.2169ms 4.6095 KOps/s 4.3346 KOps/s $\textbf{\color{#35bf28}+6.34\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.6879ms 1.3760ms 726.7412 Ops/s 712.1419 Ops/s $\color{#35bf28}+2.05\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7469ms 2.3689ms 422.1331 Ops/s 432.7927 Ops/s $\color{#d91a1a}-2.46\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.0503ms 2.8551ms 350.2447 Ops/s 341.6514 Ops/s $\color{#35bf28}+2.52\%$
test_storage_write_contiguous[50-img_shape0-small] 0.4672ms 0.1340ms 7.4638 KOps/s 7.4796 KOps/s $\color{#d91a1a}-0.21\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3379ms 0.1837ms 5.4436 KOps/s 5.2025 KOps/s $\color{#35bf28}+4.63\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.2013ms 1.7595ms 568.3362 Ops/s 563.7643 Ops/s $\color{#35bf28}+0.81\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4519ms 1.2895ms 775.4954 Ops/s 768.0659 Ops/s $\color{#35bf28}+0.97\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2596ms 1.0939ms 914.1980 Ops/s 911.3042 Ops/s $\color{#35bf28}+0.32\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.6256ms 3.4421ms 290.5168 Ops/s 287.3819 Ops/s $\color{#35bf28}+1.09\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.0010ms 5.7701ms 173.3084 Ops/s 175.5667 Ops/s $\color{#d91a1a}-1.29\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.3081ms 7.0767ms 141.3087 Ops/s 138.4352 Ops/s $\color{#35bf28}+2.08\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4487ms 0.2818ms 3.5491 KOps/s 3.4469 KOps/s $\color{#35bf28}+2.97\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6734ms 1.4695ms 680.5260 Ops/s 656.6624 Ops/s $\color{#35bf28}+3.63\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7176ms 2.4893ms 401.7258 Ops/s 409.3907 Ops/s $\color{#d91a1a}-1.87\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.1941ms 3.0536ms 327.4871 Ops/s 318.0906 Ops/s $\color{#35bf28}+2.95\%$
test_collector_without_rb[100-img_shape0-atari] 33.1210ms 32.1798ms 31.0754 Ops/s 30.7102 Ops/s $\color{#35bf28}+1.19\%$
test_collector_without_rb[200-img_shape1-large_batch] 0.6458s 0.1002s 9.9752 Ops/s 15.6426 Ops/s $\textbf{\color{#d91a1a}-36.23\%}$
test_collector_with_rb[100-img_shape0-atari] 37.7109ms 37.1889ms 26.8897 Ops/s 26.8459 Ops/s $\color{#35bf28}+0.16\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.1472ms 73.2302ms 13.6556 Ops/s 13.6888 Ops/s $\color{#d91a1a}-0.24\%$
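
The Change column appears to be the relative difference between the Ops and Ops on Repo HEAD columns. The sketch below recomputes it for three rows of the CPU table; `to_ops_per_s` and `relative_change` are hypothetical helper names, and because the table prints rounded values, a recomputed percentage may differ from the printed one in the last digit.

```python
# Scale factors for the two throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_s(value: float, unit: str) -> float:
    """Convert a table entry such as (1.0070, "KOps/s") to plain Ops/s."""
    return value * UNIT_SCALE[unit]


def relative_change(ops: float, ops_unit: str, head: float, head_unit: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    new = to_ops_per_s(ops, ops_unit)
    base = to_ops_per_s(head, head_unit)
    return (new / base - 1.0) * 100.0


# (name, Ops, unit, Ops on Repo HEAD, unit), copied from the CPU table.
rows = [
    ("test_tensor_to_bytestream_speed[pickle]", 12.7155, "KOps/s", 12.7280, "KOps/s"),
    ("test_simple", 1.8518, "Ops/s", 1.7530, "Ops/s"),
    ("test_dqn_speed[True-backward]", 1.0070, "KOps/s", 984.3845, "Ops/s"),
]

for name, ops, ops_unit, head, head_unit in rows:
    print(f"{name}: {relative_change(ops, ops_unit, head, head_unit):+.2f}%")
```

In the CPU table, the rows whose change is rendered in bold all exceed 5% in magnitude, and their counts (17 positive, 13 negative) match the Improved and Worsened totals in the summary line. This suggests, without confirming, that the summary counts only changes beyond roughly ±5%.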

@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 82.2061μs 81.3948μs 12.2858 KOps/s 12.2276 KOps/s $\color{#35bf28}+0.48\%$
test_tensor_to_bytestream_speed[torch.save] 0.1506ms 0.1487ms 6.7248 KOps/s 7.0173 KOps/s $\color{#d91a1a}-4.17\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1188s 0.1185s 8.4361 Ops/s 8.2795 Ops/s $\color{#35bf28}+1.89\%$
test_tensor_to_bytestream_speed[numpy] 2.5125μs 2.5084μs 398.6571 KOps/s 395.6212 KOps/s $\color{#35bf28}+0.77\%$
test_tensor_to_bytestream_speed[safetensors] 37.1004μs 36.8819μs 27.1135 KOps/s 27.0615 KOps/s $\color{#35bf28}+0.19\%$
test_simple 0.8132s 0.8002s 1.2497 Ops/s 1.2090 Ops/s $\color{#35bf28}+3.37\%$
test_transformed 1.3998s 1.3987s 0.7149 Ops/s 0.7080 Ops/s $\color{#35bf28}+0.99\%$
test_serial 2.4574s 2.3700s 0.4219 Ops/s 0.4192 Ops/s $\color{#35bf28}+0.66\%$
test_parallel 1.9333s 1.8792s 0.5321 Ops/s 0.5487 Ops/s $\color{#d91a1a}-3.01\%$
test_step_mdp_speed[True-True-True-True-True] 0.1688ms 42.0165μs 23.8002 KOps/s 23.6964 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[True-True-True-True-False] 57.8740μs 22.9129μs 43.6435 KOps/s 43.0469 KOps/s $\color{#35bf28}+1.39\%$
test_step_mdp_speed[True-True-True-False-True] 59.8130μs 23.1203μs 43.2520 KOps/s 42.8717 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[True-True-True-False-False] 42.1330μs 12.7176μs 78.6310 KOps/s 78.0322 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-True-False-True-True] 78.2850μs 44.2376μs 22.6052 KOps/s 22.1760 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[True-True-False-True-False] 58.6640μs 25.5491μs 39.1403 KOps/s 38.8396 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-True-False-False-True] 54.0530μs 25.7454μs 38.8419 KOps/s 38.2686 KOps/s $\color{#35bf28}+1.50\%$
test_step_mdp_speed[True-True-False-False-False] 52.8230μs 15.3020μs 65.3511 KOps/s 64.3435 KOps/s $\color{#35bf28}+1.57\%$
test_step_mdp_speed[True-False-True-True-True] 79.7550μs 46.0832μs 21.6999 KOps/s 21.5996 KOps/s $\color{#35bf28}+0.46\%$
test_step_mdp_speed[True-False-True-True-False] 58.7840μs 28.2841μs 35.3555 KOps/s 35.6327 KOps/s $\color{#d91a1a}-0.78\%$
test_step_mdp_speed[True-False-True-False-True] 59.2930μs 25.5939μs 39.0717 KOps/s 37.9379 KOps/s $\color{#35bf28}+2.99\%$
test_step_mdp_speed[True-False-True-False-False] 47.0730μs 15.2494μs 65.5762 KOps/s 65.2903 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[True-False-False-True-True] 79.6450μs 49.1063μs 20.3640 KOps/s 20.4559 KOps/s $\color{#d91a1a}-0.45\%$
test_step_mdp_speed[True-False-False-True-False] 63.1640μs 30.2491μs 33.0589 KOps/s 32.1612 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[True-False-False-False-True] 59.2040μs 28.3910μs 35.2225 KOps/s 34.8156 KOps/s $\color{#35bf28}+1.17\%$
test_step_mdp_speed[True-False-False-False-False] 47.3130μs 17.6171μs 56.7632 KOps/s 55.8781 KOps/s $\color{#35bf28}+1.58\%$
test_step_mdp_speed[False-True-True-True-True] 81.9240μs 46.9618μs 21.2939 KOps/s 21.1186 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[False-True-True-True-False] 63.3230μs 27.8784μs 35.8701 KOps/s 35.2236 KOps/s $\color{#35bf28}+1.84\%$
test_step_mdp_speed[False-True-True-False-True] 2.5122ms 29.7716μs 33.5891 KOps/s 33.1040 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[False-True-True-False-False] 48.4630μs 17.1018μs 58.4733 KOps/s 58.7743 KOps/s $\color{#d91a1a}-0.51\%$
test_step_mdp_speed[False-True-False-True-True] 79.7940μs 49.0506μs 20.3871 KOps/s 20.2514 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[False-True-False-True-False] 64.1740μs 30.6387μs 32.6384 KOps/s 33.1177 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[False-True-False-False-True] 61.6640μs 31.7842μs 31.4622 KOps/s 31.2170 KOps/s $\color{#35bf28}+0.79\%$
test_step_mdp_speed[False-True-False-False-False] 60.3040μs 19.3638μs 51.6427 KOps/s 52.2289 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[False-False-True-True-True] 84.9650μs 51.6157μs 19.3739 KOps/s 19.1052 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-False-True-True-False] 64.5240μs 33.6370μs 29.7292 KOps/s 30.4645 KOps/s $\color{#d91a1a}-2.41\%$
test_step_mdp_speed[False-False-True-False-True] 71.1540μs 32.0997μs 31.1529 KOps/s 30.9651 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[False-False-True-False-False] 49.8930μs 19.3643μs 51.6415 KOps/s 51.8894 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[False-False-False-True-True] 91.2550μs 55.1574μs 18.1299 KOps/s 18.6104 KOps/s $\color{#d91a1a}-2.58\%$
test_step_mdp_speed[False-False-False-True-False] 78.6450μs 36.0440μs 27.7439 KOps/s 28.2113 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[False-False-False-False-True] 0.1047ms 34.2862μs 29.1662 KOps/s 29.0991 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[False-False-False-False-False] 50.2530μs 22.0058μs 45.4425 KOps/s 45.6486 KOps/s $\color{#d91a1a}-0.45\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8527s 0.7514s 1.3309 Ops/s 1.3405 Ops/s $\color{#d91a1a}-0.72\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7103s 0.6084s 1.6438 Ops/s 1.6404 Ops/s $\color{#35bf28}+0.20\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7432s 1.6553s 0.6041 Ops/s 0.6055 Ops/s $\color{#d91a1a}-0.23\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5092s 1.4264s 0.7011 Ops/s 0.7006 Ops/s $\color{#35bf28}+0.06\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9813s 1.8968s 0.5272 Ops/s 0.5274 Ops/s $\color{#d91a1a}-0.04\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7579s 1.6787s 0.5957 Ops/s 0.5987 Ops/s $\color{#d91a1a}-0.51\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7145s 4.5968s 0.2175 Ops/s 0.2159 Ops/s $\color{#35bf28}+0.74\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4923s 4.4425s 0.2251 Ops/s 0.2251 Ops/s $-0.01\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9674s 1.8967s 0.5272 Ops/s 0.5292 Ops/s $\color{#d91a1a}-0.36\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7285s 1.6273s 0.6145 Ops/s 0.6258 Ops/s $\color{#d91a1a}-1.80\%$
test_values[generalized_advantage_estimate-True-True] 23.1400ms 21.2675ms 47.0200 Ops/s 46.7034 Ops/s $\color{#35bf28}+0.68\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1337s 3.6082ms 277.1493 Ops/s 280.2886 Ops/s $\color{#d91a1a}-1.12\%$
test_values[td0_return_estimate-False-False] 0.1084ms 84.6432μs 11.8143 KOps/s 11.8446 KOps/s $\color{#d91a1a}-0.26\%$
test_values[td1_return_estimate-False-False] 50.9713ms 49.8219ms 20.0715 Ops/s 20.1528 Ops/s $\color{#d91a1a}-0.40\%$
test_values[vec_td1_return_estimate-False-False] 1.3829ms 1.1152ms 896.6628 Ops/s 907.0700 Ops/s $\color{#d91a1a}-1.15\%$
test_values[td_lambda_return_estimate-True-False] 84.1277ms 81.9853ms 12.1973 Ops/s 12.3752 Ops/s $\color{#d91a1a}-1.44\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3377ms 1.1006ms 908.6202 Ops/s 909.8099 Ops/s $\color{#d91a1a}-0.13\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.2771ms 21.1250ms 47.3373 Ops/s 47.3050 Ops/s $\color{#35bf28}+0.07\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0496ms 0.7731ms 1.2935 KOps/s 1.2938 KOps/s $\color{#d91a1a}-0.03\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8390ms 0.6916ms 1.4460 KOps/s 1.4441 KOps/s $\color{#35bf28}+0.13\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5354ms 1.5044ms 664.6974 Ops/s 665.7213 Ops/s $\color{#d91a1a}-0.15\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7852ms 0.7081ms 1.4123 KOps/s 1.4130 KOps/s $\color{#d91a1a}-0.05\%$
test_dqn_speed[False-None] 1.7124ms 1.6258ms 615.0795 Ops/s 621.5379 Ops/s $\color{#d91a1a}-1.04\%$
test_dqn_speed[False-backward] 2.6423ms 2.2752ms 439.5141 Ops/s 439.6583 Ops/s $\color{#d91a1a}-0.03\%$
test_dqn_speed[True-None] 0.7681ms 0.5925ms 1.6879 KOps/s 1.6765 KOps/s $\color{#35bf28}+0.68\%$
test_dqn_speed[True-backward] 1.3881ms 1.2438ms 803.9649 Ops/s 809.9802 Ops/s $\color{#d91a1a}-0.74\%$
test_dqn_speed[reduce-overhead-None] 0.7677ms 0.6365ms 1.5711 KOps/s 1.5894 KOps/s $\color{#d91a1a}-1.15\%$
test_ddpg_speed[False-None] 3.4718ms 3.1772ms 314.7428 Ops/s 326.7278 Ops/s $\color{#d91a1a}-3.67\%$
test_ddpg_speed[False-backward] 5.0646ms 4.6026ms 217.2676 Ops/s 222.2544 Ops/s $\color{#d91a1a}-2.24\%$
test_ddpg_speed[True-None] 1.6105ms 1.4069ms 710.8025 Ops/s 732.1398 Ops/s $\color{#d91a1a}-2.91\%$
test_ddpg_speed[True-backward] 2.6384ms 2.5527ms 391.7399 Ops/s 391.7254 Ops/s $+0.00\%$
test_ddpg_speed[reduce-overhead-None] 1.6330ms 1.4127ms 707.8465 Ops/s 714.1584 Ops/s $\color{#d91a1a}-0.88\%$
test_sac_speed[False-None] 8.9446ms 8.5939ms 116.3614 Ops/s 116.0574 Ops/s $\color{#35bf28}+0.26\%$
test_sac_speed[False-backward] 12.2541ms 11.8827ms 84.1561 Ops/s 84.0896 Ops/s $\color{#35bf28}+0.08\%$
test_sac_speed[True-None] 2.2441ms 1.8948ms 527.7499 Ops/s 534.1525 Ops/s $\color{#d91a1a}-1.20\%$
test_sac_speed[True-backward] 3.8486ms 3.7457ms 266.9703 Ops/s 271.9216 Ops/s $\color{#d91a1a}-1.82\%$
test_sac_speed[reduce-overhead-None] 16.6397ms 10.1162ms 98.8511 Ops/s 98.7586 Ops/s $\color{#35bf28}+0.09\%$
test_redq_deprec_speed[False-None] 10.5189ms 9.6618ms 103.5005 Ops/s 103.6336 Ops/s $\color{#d91a1a}-0.13\%$
test_redq_deprec_speed[False-backward] 13.6767ms 13.1281ms 76.1725 Ops/s 76.4335 Ops/s $\color{#d91a1a}-0.34\%$
test_redq_deprec_speed[True-None] 2.8770ms 2.6321ms 379.9308 Ops/s 381.4294 Ops/s $\color{#d91a1a}-0.39\%$
test_redq_deprec_speed[True-backward] 4.7404ms 4.3007ms 232.5223 Ops/s 229.2037 Ops/s $\color{#35bf28}+1.45\%$
test_redq_deprec_speed[reduce-overhead-None] 14.5449ms 9.6486ms 103.6421 Ops/s 103.0209 Ops/s $\color{#35bf28}+0.60\%$
test_td3_speed[False-None] 8.6628ms 8.4954ms 117.7105 Ops/s 117.8534 Ops/s $\color{#d91a1a}-0.12\%$
test_td3_speed[False-backward] 12.0296ms 11.1770ms 89.4693 Ops/s 89.7715 Ops/s $\color{#d91a1a}-0.34\%$
test_td3_speed[True-None] 1.7568ms 1.7207ms 581.1545 Ops/s 607.5491 Ops/s $\color{#d91a1a}-4.34\%$
test_td3_speed[True-backward] 3.2956ms 3.2006ms 312.4406 Ops/s 310.9313 Ops/s $\color{#35bf28}+0.49\%$
test_td3_speed[reduce-overhead-None] 98.6499ms 25.9290ms 38.5669 Ops/s 38.1199 Ops/s $\color{#35bf28}+1.17\%$
test_cql_speed[False-None] 18.2812ms 17.9987ms 55.5595 Ops/s 55.6628 Ops/s $\color{#d91a1a}-0.19\%$
test_cql_speed[False-backward] 24.4449ms 23.8655ms 41.9015 Ops/s 42.0936 Ops/s $\color{#d91a1a}-0.46\%$
test_cql_speed[True-None] 3.4891ms 3.3650ms 297.1789 Ops/s 298.9559 Ops/s $\color{#d91a1a}-0.59\%$
test_cql_speed[True-backward] 5.8022ms 5.6194ms 177.9535 Ops/s 175.4869 Ops/s $\color{#35bf28}+1.41\%$
test_cql_speed[reduce-overhead-None] 17.9892ms 11.9073ms 83.9819 Ops/s 83.1174 Ops/s $\color{#35bf28}+1.04\%$
test_a2c_speed[False-None] 3.5866ms 3.4054ms 293.6556 Ops/s 291.2374 Ops/s $\color{#35bf28}+0.83\%$
test_a2c_speed[False-backward] 7.3889ms 6.6500ms 150.3764 Ops/s 149.5127 Ops/s $\color{#35bf28}+0.58\%$
test_a2c_speed[True-None] 1.5435ms 1.3825ms 723.3134 Ops/s 697.8725 Ops/s $\color{#35bf28}+3.65\%$
test_a2c_speed[True-backward] 3.2817ms 3.2303ms 309.5673 Ops/s 321.9697 Ops/s $\color{#d91a1a}-3.85\%$
test_a2c_speed[reduce-overhead-None] 1.2047ms 1.0543ms 948.5247 Ops/s 944.4109 Ops/s $\color{#35bf28}+0.44\%$
test_ppo_speed[False-None] 4.1914ms 4.0769ms 245.2830 Ops/s 237.7262 Ops/s $\color{#35bf28}+3.18\%$
test_ppo_speed[False-backward] 7.9712ms 7.5877ms 131.7921 Ops/s 135.3200 Ops/s $\color{#d91a1a}-2.61\%$
test_ppo_speed[True-None] 1.6346ms 1.5312ms 653.0878 Ops/s 653.8251 Ops/s $\color{#d91a1a}-0.11\%$
test_ppo_speed[True-backward] 3.4717ms 3.4261ms 291.8779 Ops/s 310.5376 Ops/s $\textbf{\color{#d91a1a}-6.01\%}$
test_ppo_speed[reduce-overhead-None] 1.1779ms 1.1029ms 906.7205 Ops/s 900.8899 Ops/s $\color{#35bf28}+0.65\%$
test_reinforce_speed[False-None] 3.2088ms 2.4615ms 406.2490 Ops/s 410.2081 Ops/s $\color{#d91a1a}-0.97\%$
test_reinforce_speed[False-backward] 3.7660ms 3.6105ms 276.9692 Ops/s 276.4654 Ops/s $\color{#35bf28}+0.18\%$
test_reinforce_speed[True-None] 1.5068ms 1.3885ms 720.2065 Ops/s 735.6074 Ops/s $\color{#d91a1a}-2.09\%$
test_reinforce_speed[True-backward] 3.3597ms 3.2113ms 311.4044 Ops/s 323.8252 Ops/s $\color{#d91a1a}-3.84\%$
test_reinforce_speed[reduce-overhead-None] 15.8549ms 8.8956ms 112.4156 Ops/s 112.1263 Ops/s $\color{#35bf28}+0.26\%$
test_iql_speed[False-None] 10.4712ms 9.8516ms 101.5060 Ops/s 101.5894 Ops/s $\color{#d91a1a}-0.08\%$
test_iql_speed[False-backward] 14.7354ms 13.9830ms 71.5155 Ops/s 72.9793 Ops/s $\color{#d91a1a}-2.01\%$
test_iql_speed[True-None] 2.4591ms 2.2963ms 435.4738 Ops/s 434.0339 Ops/s $\color{#35bf28}+0.33\%$
test_iql_speed[True-backward] 5.4792ms 5.0681ms 197.3122 Ops/s 204.2989 Ops/s $\color{#d91a1a}-3.42\%$
test_iql_speed[reduce-overhead-None] 16.1719ms 10.0347ms 99.6542 Ops/s 99.2410 Ops/s $\color{#35bf28}+0.42\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3358ms 5.9394ms 168.3662 Ops/s 167.4721 Ops/s $\color{#35bf28}+0.53\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7269ms 0.3535ms 2.8289 KOps/s 2.7398 KOps/s $\color{#35bf28}+3.25\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5595ms 0.3095ms 3.2313 KOps/s 2.8734 KOps/s $\textbf{\color{#35bf28}+12.45\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1228ms 5.7385ms 174.2604 Ops/s 172.4803 Ops/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9937ms 0.3508ms 2.8508 KOps/s 3.2107 KOps/s $\textbf{\color{#d91a1a}-11.21\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6096ms 0.3015ms 3.3166 KOps/s 3.1084 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5588ms 1.3294ms 752.2010 Ops/s 779.6286 Ops/s $\color{#d91a1a}-3.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5536ms 1.2408ms 805.9571 Ops/s 831.3161 Ops/s $\color{#d91a1a}-3.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.2056ms 6.0293ms 165.8563 Ops/s 166.6352 Ops/s $\color{#d91a1a}-0.47\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2495ms 0.4422ms 2.2613 KOps/s 2.2782 KOps/s $\color{#d91a1a}-0.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6489ms 0.4258ms 2.3483 KOps/s 2.3466 KOps/s $\color{#35bf28}+0.07\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9107ms 5.8232ms 171.7282 Ops/s 171.9553 Ops/s $\color{#d91a1a}-0.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9083ms 0.2903ms 3.4443 KOps/s 2.9381 KOps/s $\textbf{\color{#35bf28}+17.23\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4747ms 0.2762ms 3.6210 KOps/s 2.6644 KOps/s $\textbf{\color{#35bf28}+35.90\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0220ms 5.7361ms 174.3338 Ops/s 173.8428 Ops/s $\color{#35bf28}+0.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6716ms 0.3211ms 3.1144 KOps/s 3.3974 KOps/s $\textbf{\color{#d91a1a}-8.33\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5389ms 0.3142ms 3.1827 KOps/s 3.3446 KOps/s $\color{#d91a1a}-4.84\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.9060ms 5.9217ms 168.8695 Ops/s 166.0937 Ops/s $\color{#35bf28}+1.67\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7794ms 0.4790ms 2.0875 KOps/s 2.2445 KOps/s $\textbf{\color{#d91a1a}-6.99\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8650ms 0.4831ms 2.0700 KOps/s 2.3560 KOps/s $\textbf{\color{#d91a1a}-12.14\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.9478s 23.9421ms 41.7674 Ops/s 194.8081 Ops/s $\textbf{\color{#d91a1a}-78.56\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9898ms 1.9329ms 517.3542 Ops/s 552.7010 Ops/s $\textbf{\color{#d91a1a}-6.40\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.3539ms 1.3064ms 765.4332 Ops/s 805.3738 Ops/s $\color{#d91a1a}-4.96\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5465ms 5.0412ms 198.3665 Ops/s 160.5633 Ops/s $\textbf{\color{#35bf28}+23.54\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9159ms 1.8486ms 540.9478 Ops/s 477.7089 Ops/s $\textbf{\color{#35bf28}+13.24\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.3134ms 1.2703ms 787.2191 Ops/s 704.6573 Ops/s $\textbf{\color{#35bf28}+11.72\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.6584s 18.3065ms 54.6253 Ops/s 185.4539 Ops/s $\textbf{\color{#d91a1a}-70.55\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 6.2367ms 2.1421ms 466.8310 Ops/s 467.6496 Ops/s $\color{#d91a1a}-0.18\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.3919ms 1.1848ms 844.0400 Ops/s 55.4431 Ops/s $\textbf{\color{#35bf28}+1422.35\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 41.0489ms 38.9263ms 25.6896 Ops/s 25.7023 Ops/s $\color{#d91a1a}-0.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.7881ms 18.1774ms 55.0135 Ops/s 54.4410 Ops/s $\color{#35bf28}+1.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.5686ms 40.1481ms 24.9078 Ops/s 24.3933 Ops/s $\color{#35bf28}+2.11\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.9574ms 18.7929ms 53.2117 Ops/s 53.2255 Ops/s $\color{#d91a1a}-0.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.5462ms 42.4405ms 23.5624 Ops/s 23.6306 Ops/s $\color{#d91a1a}-0.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.7735ms 20.1644ms 49.5923 Ops/s 49.3862 Ops/s $\color{#35bf28}+0.42\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8743ms 0.2301ms 4.3461 KOps/s 4.5899 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.7055ms 1.3671ms 731.4710 Ops/s 737.7719 Ops/s $\color{#d91a1a}-0.85\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7545ms 2.3102ms 432.8572 Ops/s 432.4771 Ops/s $\color{#35bf28}+0.09\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1109ms 2.9199ms 342.4726 Ops/s 347.3819 Ops/s $\color{#d91a1a}-1.41\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5358ms 0.1701ms 5.8805 KOps/s 6.0187 KOps/s $\color{#d91a1a}-2.30\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3931ms 0.2329ms 4.2930 KOps/s 4.2681 KOps/s $\color{#35bf28}+0.58\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9500ms 1.7779ms 562.4643 Ops/s 543.3703 Ops/s $\color{#35bf28}+3.51\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5857ms 1.3812ms 723.9959 Ops/s 717.5430 Ops/s $\color{#35bf28}+0.90\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3175ms 1.1548ms 865.9871 Ops/s 870.0505 Ops/s $\color{#d91a1a}-0.47\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8167ms 3.6197ms 276.2695 Ops/s 278.8514 Ops/s $\color{#d91a1a}-0.93\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.9553ms 5.6735ms 176.2594 Ops/s 172.6535 Ops/s $\color{#35bf28}+2.09\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.1259ms 6.9307ms 144.2848 Ops/s 141.5608 Ops/s $\color{#35bf28}+1.92\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4404ms 0.2777ms 3.6009 KOps/s 3.6454 KOps/s $\color{#d91a1a}-1.22\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7215ms 1.5561ms 642.6174 Ops/s 656.0202 Ops/s $\color{#d91a1a}-2.04\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7135ms 2.4281ms 411.8408 Ops/s 409.3859 Ops/s $\color{#35bf28}+0.60\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3980ms 3.1191ms 320.6074 Ops/s 323.8943 Ops/s $\color{#d91a1a}-1.01\%$
test_collector_without_rb[100-img_shape0-atari] 34.4576ms 33.4915ms 29.8584 Ops/s 30.2327 Ops/s $\color{#d91a1a}-1.24\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.7240ms 65.6280ms 15.2374 Ops/s 15.3268 Ops/s $\color{#d91a1a}-0.58\%$
test_collector_with_rb[100-img_shape0-atari] 38.8018ms 37.8771ms 26.4012 Ops/s 26.6543 Ops/s $\color{#d91a1a}-0.95\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.0888ms 75.1861ms 13.3003 Ops/s 13.4810 Ops/s $\color{#d91a1a}-1.34\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.2142ms 57.2455ms 17.4686 Ops/s 17.7656 Ops/s $\color{#d91a1a}-1.67\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1155s 0.1126s 8.8824 Ops/s 8.9986 Ops/s $\color{#d91a1a}-1.29\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 59.8349ms 58.2246ms 17.1749 Ops/s 17.1434 Ops/s $\color{#35bf28}+0.18\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1195s 0.1161s 8.6098 Ops/s 8.7055 Ops/s $\color{#d91a1a}-1.10\%$

Labels: CLA Signed, Performance
