[Performance] Add fast path for step() and TransformedEnv._step() when _trust_step_output is set #3565

Open
vmoens wants to merge 1 commit into gh/vmoens/246/base from gh/vmoens/246/head

@vmoens (Collaborator) commented Mar 23, 2026

[ghstack-poisoned]
@pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3565

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 2 Cancelled Jobs

As of commit c3891d4 with merge base 0a1aea6:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Mar 23, 2026
…n _trust_step_output is set

When _trust_step_output is True, EnvBase.step() skips _assert_tensordict_shape,
partial_steps handling, next_preset logic, and _step_proc_data. Similarly,
TransformedEnv._step() skips partial_steps, next_preset, and _complete_done.
This eliminates all per-step Python validation overhead for well-behaved envs.

Made-with: Cursor
ghstack-source-id: 52ff860
Pull-Request: #3565
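
As a rough illustration of the pattern the commit message describes, here is a minimal sketch of a step method that bypasses validation when a trust flag is set. It is not the TorchRL implementation: `ToyEnv`, `_validate`, and the dict-based data layout are hypothetical stand-ins, and only the flag name `_trust_step_output` comes from the PR.

```python
from typing import Any, Dict


class ToyEnv:
    """Hypothetical toy environment; not TorchRL's EnvBase."""

    def __init__(self, trust_step_output: bool = False) -> None:
        # The flag name mirrors the PR title; the rest is illustrative.
        self._trust_step_output = trust_step_output
        self.validation_calls = 0  # counts how often the slow path ran

    def _step(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Env-specific dynamics: a trivial deterministic transition.
        return {"observation": data["action"] + 1, "reward": 1.0, "done": False}

    def _validate(self, data: Dict[str, Any], out: Dict[str, Any]) -> None:
        # Stand-in for per-step checks (shape assertions, done completion, ...).
        self.validation_calls += 1
        if "action" not in data:
            raise KeyError("input is missing 'action'")
        missing = {"observation", "reward", "done"} - out.keys()
        if missing:
            raise ValueError(f"step output is missing keys: {sorted(missing)}")

    def step(self, data: Dict[str, Any]) -> Dict[str, Any]:
        if self._trust_step_output:
            # Fast path: assume _step is well-behaved and skip every check.
            return {**data, "next": self._step(data)}
        # Slow path: run the step, then validate input and output.
        out = self._step(data)
        self._validate(data, out)
        return {**data, "next": out}


slow = ToyEnv()
fast = ToyEnv(trust_step_output=True)
print(slow.step({"action": 1}) == fast.step({"action": 1}))
print(slow.validation_calls, fast.validation_calls)
```

Both paths return the same result for a well-behaved `_step`; the trade-off is that a misbehaving environment is no longer caught at the step boundary when the flag is set.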
github-actions bot added the Performance and Transforms labels Mar 23, 2026
meta-cla bot added the CLA Signed label Mar 23, 2026
@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}23$. Worsened: $\large\color{#d91a1a}14$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_tensor_to_bytestream_speed[pickle] 79.0360μs 77.2956μs 12.9373 KOps/s 12.8615 KOps/s $\color{#35bf28}+0.59\%$
test_tensor_to_bytestream_speed[torch.save] 0.1357ms 0.1353ms 7.3896 KOps/s 7.2603 KOps/s $\color{#35bf28}+1.78\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1024s 0.1022s 9.7841 Ops/s 8.8862 Ops/s $\textbf{\color{#35bf28}+10.11\%}$
test_tensor_to_bytestream_speed[numpy] 2.4645μs 2.4601μs 406.4833 KOps/s 381.8186 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_tensor_to_bytestream_speed[safetensors] 35.8222μs 35.5701μs 28.1135 KOps/s 27.9989 KOps/s $\color{#35bf28}+0.41\%$
test_simple 0.5319s 0.5310s 1.8832 Ops/s 1.7772 Ops/s $\textbf{\color{#35bf28}+5.96\%}$
test_transformed 1.0635s 1.0627s 0.9410 Ops/s 0.9078 Ops/s $\color{#35bf28}+3.66\%$
test_serial 1.6498s 1.6454s 0.6078 Ops/s 0.5914 Ops/s $\color{#35bf28}+2.76\%$
test_parallel 1.1341s 1.0227s 0.9778 Ops/s 0.9673 Ops/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[True-True-True-True-True] 0.1467ms 39.5398μs 25.2909 KOps/s 24.6533 KOps/s $\color{#35bf28}+2.59\%$
test_step_mdp_speed[True-True-True-True-False] 45.0810μs 22.4717μs 44.5005 KOps/s 45.2086 KOps/s $\color{#d91a1a}-1.57\%$
test_step_mdp_speed[True-True-True-False-True] 59.0120μs 23.1111μs 43.2693 KOps/s 43.9038 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[True-True-True-False-False] 41.2520μs 12.4785μs 80.1376 KOps/s 80.9800 KOps/s $\color{#d91a1a}-1.04\%$
test_step_mdp_speed[True-True-False-True-True] 73.4830μs 43.7955μs 22.8334 KOps/s 23.2810 KOps/s $\color{#d91a1a}-1.92\%$
test_step_mdp_speed[True-True-False-True-False] 53.3320μs 25.0948μs 39.8488 KOps/s 40.8125 KOps/s $\color{#d91a1a}-2.36\%$
test_step_mdp_speed[True-True-False-False-True] 57.5620μs 25.5842μs 39.0866 KOps/s 40.9111 KOps/s $\color{#d91a1a}-4.46\%$
test_step_mdp_speed[True-True-False-False-False] 41.8420μs 15.2432μs 65.6032 KOps/s 68.2603 KOps/s $\color{#d91a1a}-3.89\%$
test_step_mdp_speed[True-False-True-True-True] 79.7630μs 45.8779μs 21.7970 KOps/s 22.2015 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[True-False-True-True-False] 0.1028ms 27.2738μs 36.6653 KOps/s 36.8175 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[True-False-True-False-True] 47.7520μs 25.2481μs 39.6069 KOps/s 40.8739 KOps/s $\color{#d91a1a}-3.10\%$
test_step_mdp_speed[True-False-True-False-False] 43.4610μs 15.1835μs 65.8610 KOps/s 66.8377 KOps/s $\color{#d91a1a}-1.46\%$
test_step_mdp_speed[True-False-False-True-True] 81.3740μs 47.9988μs 20.8338 KOps/s 20.8920 KOps/s $\color{#d91a1a}-0.28\%$
test_step_mdp_speed[True-False-False-True-False] 57.0020μs 29.1633μs 34.2897 KOps/s 33.6786 KOps/s $\color{#35bf28}+1.81\%$
test_step_mdp_speed[True-False-False-False-True] 92.3030μs 27.2242μs 36.7320 KOps/s 36.1776 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[True-False-False-False-False] 0.1093ms 17.1022μs 58.4720 KOps/s 58.4498 KOps/s $\color{#35bf28}+0.04\%$
test_step_mdp_speed[False-True-True-True-True] 78.5420μs 45.0750μs 22.1853 KOps/s 21.9388 KOps/s $\color{#35bf28}+1.12\%$
test_step_mdp_speed[False-True-True-True-False] 58.8720μs 26.9821μs 37.0616 KOps/s 36.9611 KOps/s $\color{#35bf28}+0.27\%$
test_step_mdp_speed[False-True-True-False-True] 2.7233ms 30.6886μs 32.5854 KOps/s 34.6601 KOps/s $\textbf{\color{#d91a1a}-5.99\%}$
test_step_mdp_speed[False-True-True-False-False] 48.3420μs 16.5662μs 60.3640 KOps/s 60.8288 KOps/s $\color{#d91a1a}-0.76\%$
test_step_mdp_speed[False-True-False-True-True] 83.5830μs 47.1559μs 21.2062 KOps/s 20.7307 KOps/s $\color{#35bf28}+2.29\%$
test_step_mdp_speed[False-True-False-True-False] 57.9830μs 29.1496μs 34.3058 KOps/s 33.8518 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[False-True-False-False-True] 60.9620μs 31.0605μs 32.1952 KOps/s 32.6126 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[False-True-False-False-False] 42.7820μs 18.6025μs 53.7561 KOps/s 54.6484 KOps/s $\color{#d91a1a}-1.63\%$
test_step_mdp_speed[False-False-True-True-True] 80.2440μs 50.1449μs 19.9422 KOps/s 20.1660 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[False-False-True-True-False] 0.1050ms 31.9546μs 31.2944 KOps/s 31.2553 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-False-True-False-True] 0.1510ms 31.0005μs 32.2576 KOps/s 32.0109 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[False-False-True-False-False] 47.2220μs 18.9433μs 52.7892 KOps/s 53.2686 KOps/s $\color{#d91a1a}-0.90\%$
test_step_mdp_speed[False-False-False-True-True] 84.0440μs 51.9639μs 19.2441 KOps/s 19.2911 KOps/s $\color{#d91a1a}-0.24\%$
test_step_mdp_speed[False-False-False-True-False] 69.4430μs 34.5214μs 28.9675 KOps/s 29.0335 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-False-False-False-True] 62.3520μs 32.8375μs 30.4530 KOps/s 30.4102 KOps/s $\color{#35bf28}+0.14\%$
test_step_mdp_speed[False-False-False-False-False] 48.6220μs 21.3172μs 46.9104 KOps/s 47.7683 KOps/s $\color{#d91a1a}-1.80\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7099s 0.7066s 1.4153 Ops/s 1.3611 Ops/s $\color{#35bf28}+3.98\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7050s 0.5984s 1.6711 Ops/s 1.6686 Ops/s $\color{#35bf28}+0.15\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7004s 1.6118s 0.6204 Ops/s 0.6170 Ops/s $\color{#35bf28}+0.55\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4811s 1.3931s 0.7178 Ops/s 0.7124 Ops/s $\color{#35bf28}+0.76\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9296s 1.8436s 0.5424 Ops/s 0.5371 Ops/s $\color{#35bf28}+0.98\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7112s 1.6257s 0.6151 Ops/s 0.6059 Ops/s $\color{#35bf28}+1.53\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6208s 4.5175s 0.2214 Ops/s 0.2197 Ops/s $\color{#35bf28}+0.76\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4245s 4.3391s 0.2305 Ops/s 0.2238 Ops/s $\color{#35bf28}+3.00\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.8934s 1.8258s 0.5477 Ops/s 0.5437 Ops/s $\color{#35bf28}+0.73\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6555s 1.5760s 0.6345 Ops/s 0.6443 Ops/s $\color{#d91a1a}-1.51\%$
test_values[generalized_advantage_estimate-True-True] 10.2780ms 9.8836ms 101.1777 Ops/s 98.1313 Ops/s $\color{#35bf28}+3.10\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.9772ms 18.0985ms 55.2531 Ops/s 84.7634 Ops/s $\textbf{\color{#d91a1a}-34.81\%}$
test_values[td0_return_estimate-False-False] 0.2240ms 0.1200ms 8.3355 KOps/s 7.5522 KOps/s $\textbf{\color{#35bf28}+10.37\%}$
test_values[td1_return_estimate-False-False] 26.7133ms 25.9376ms 38.5541 Ops/s 36.3435 Ops/s $\textbf{\color{#35bf28}+6.08\%}$
test_values[vec_td1_return_estimate-False-False] 18.9337ms 18.1818ms 55.0000 Ops/s 84.4852 Ops/s $\textbf{\color{#d91a1a}-34.90\%}$
test_values[td_lambda_return_estimate-True-False] 40.1972ms 38.7847ms 25.7834 Ops/s 24.6701 Ops/s $\color{#35bf28}+4.51\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.6113ms 18.1629ms 55.0572 Ops/s 84.1066 Ops/s $\textbf{\color{#d91a1a}-34.54\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.1029ms 8.6240ms 115.9560 Ops/s 111.0967 Ops/s $\color{#35bf28}+4.37\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7749ms 1.5067ms 663.7153 Ops/s 640.5747 Ops/s $\color{#35bf28}+3.61\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8705ms 0.4091ms 2.4441 KOps/s 2.3810 KOps/s $\color{#35bf28}+2.65\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.9033ms 35.2492ms 28.3694 Ops/s 41.9416 Ops/s $\textbf{\color{#d91a1a}-32.36\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.2656ms 1.8088ms 552.8576 Ops/s 557.5894 Ops/s $\color{#d91a1a}-0.85\%$
test_dqn_speed[False-None] 1.8571ms 1.3881ms 720.4040 Ops/s 716.0646 Ops/s $\color{#35bf28}+0.61\%$
test_dqn_speed[False-backward] 2.1058ms 1.9084ms 523.9926 Ops/s 527.8297 Ops/s $\color{#d91a1a}-0.73\%$
test_dqn_speed[True-None] 0.7242ms 0.5595ms 1.7872 KOps/s 1.8238 KOps/s $\color{#d91a1a}-2.01\%$
test_dqn_speed[True-backward] 1.0645ms 1.0133ms 986.9081 Ops/s 820.8176 Ops/s $\textbf{\color{#35bf28}+20.23\%}$
test_dqn_speed[reduce-overhead-None] 0.9926ms 0.5422ms 1.8443 KOps/s 1.7479 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_ddpg_speed[False-None] 3.2383ms 2.8137ms 355.4064 Ops/s 352.1908 Ops/s $\color{#35bf28}+0.91\%$
test_ddpg_speed[False-backward] 4.1772ms 4.0236ms 248.5366 Ops/s 242.5218 Ops/s $\color{#35bf28}+2.48\%$
test_ddpg_speed[True-None] 1.5307ms 1.4476ms 690.7880 Ops/s 667.4503 Ops/s $\color{#35bf28}+3.50\%$
test_ddpg_speed[True-backward] 2.7617ms 2.4566ms 407.0587 Ops/s 376.7459 Ops/s $\textbf{\color{#35bf28}+8.05\%}$
test_ddpg_speed[reduce-overhead-None] 1.9526ms 1.4322ms 698.2137 Ops/s 682.1779 Ops/s $\color{#35bf28}+2.35\%$
test_sac_speed[False-None] 8.4961ms 7.9805ms 125.3058 Ops/s 124.4125 Ops/s $\color{#35bf28}+0.72\%$
test_sac_speed[False-backward] 11.7745ms 11.2205ms 89.1229 Ops/s 89.0867 Ops/s $\color{#35bf28}+0.04\%$
test_sac_speed[True-None] 2.4549ms 2.2354ms 447.3495 Ops/s 451.2892 Ops/s $\color{#d91a1a}-0.87\%$
test_sac_speed[True-backward] 4.7733ms 4.2494ms 235.3282 Ops/s 212.6774 Ops/s $\textbf{\color{#35bf28}+10.65\%}$
test_sac_speed[reduce-overhead-None] 2.7743ms 2.2274ms 448.9477 Ops/s 451.5284 Ops/s $\color{#d91a1a}-0.57\%$
test_redq_speed[False-None] 15.9048ms 10.9110ms 91.6506 Ops/s 93.7430 Ops/s $\color{#d91a1a}-2.23\%$
test_redq_speed[False-backward] 23.5952ms 18.9969ms 52.6401 Ops/s 54.7113 Ops/s $\color{#d91a1a}-3.79\%$
test_redq_speed[True-None] 5.0493ms 4.6277ms 216.0921 Ops/s 208.4760 Ops/s $\color{#35bf28}+3.65\%$
test_redq_speed[reduce-overhead-None] 5.1256ms 4.5732ms 218.6641 Ops/s 221.9872 Ops/s $\color{#d91a1a}-1.50\%$
test_redq_deprec_speed[False-None] 12.2940ms 11.4741ms 87.1525 Ops/s 88.4656 Ops/s $\color{#d91a1a}-1.48\%$
test_redq_deprec_speed[False-backward] 17.2423ms 16.5560ms 60.4009 Ops/s 61.3603 Ops/s $\color{#d91a1a}-1.56\%$
test_redq_deprec_speed[True-None] 4.0576ms 3.7702ms 265.2405 Ops/s 257.2156 Ops/s $\color{#35bf28}+3.12\%$
test_redq_deprec_speed[True-backward] 8.2329ms 7.6946ms 129.9620 Ops/s 121.0842 Ops/s $\textbf{\color{#35bf28}+7.33\%}$
test_redq_deprec_speed[reduce-overhead-None] 4.0614ms 3.7399ms 267.3838 Ops/s 260.2001 Ops/s $\color{#35bf28}+2.76\%$
test_td3_speed[False-None] 8.1089ms 7.9747ms 125.3966 Ops/s 125.3695 Ops/s $\color{#35bf28}+0.02\%$
test_td3_speed[False-backward] 11.1768ms 10.7942ms 92.6422 Ops/s 92.6161 Ops/s $\color{#35bf28}+0.03\%$
test_td3_speed[True-None] 1.9629ms 1.8861ms 530.1829 Ops/s 529.9087 Ops/s $\color{#35bf28}+0.05\%$
test_td3_speed[True-backward] 4.9094ms 4.1119ms 243.1970 Ops/s 268.0417 Ops/s $\textbf{\color{#d91a1a}-9.27\%}$
test_td3_speed[reduce-overhead-None] 1.9302ms 1.8512ms 540.1906 Ops/s 537.1889 Ops/s $\color{#35bf28}+0.56\%$
test_cql_speed[False-None] 30.3039ms 26.8536ms 37.2390 Ops/s 37.7056 Ops/s $\color{#d91a1a}-1.24\%$
test_cql_speed[False-backward] 37.5937ms 36.1350ms 27.6740 Ops/s 27.7327 Ops/s $\color{#d91a1a}-0.21\%$
test_cql_speed[True-None] 16.3288ms 13.2788ms 75.3078 Ops/s 78.2286 Ops/s $\color{#d91a1a}-3.73\%$
test_cql_speed[True-backward] 19.1662ms 18.6327ms 53.6690 Ops/s 55.2737 Ops/s $\color{#d91a1a}-2.90\%$
test_cql_speed[reduce-overhead-None] 13.5138ms 13.0048ms 76.8944 Ops/s 68.5498 Ops/s $\textbf{\color{#35bf28}+12.17\%}$
test_a2c_speed[False-None] 5.9799ms 5.5604ms 179.8425 Ops/s 182.1897 Ops/s $\color{#d91a1a}-1.29\%$
test_a2c_speed[False-backward] 12.6981ms 12.1679ms 82.1833 Ops/s 82.1485 Ops/s $\color{#35bf28}+0.04\%$
test_a2c_speed[True-None] 4.4424ms 3.9015ms 256.3129 Ops/s 251.7031 Ops/s $\color{#35bf28}+1.83\%$
test_a2c_speed[True-backward] 9.2606ms 9.0350ms 110.6802 Ops/s 112.7947 Ops/s $\color{#d91a1a}-1.87\%$
test_a2c_speed[reduce-overhead-None] 4.4932ms 3.8989ms 256.4815 Ops/s 257.7449 Ops/s $\color{#d91a1a}-0.49\%$
test_ppo_speed[False-None] 6.7859ms 6.0687ms 164.7792 Ops/s 165.7256 Ops/s $\color{#d91a1a}-0.57\%$
test_ppo_speed[False-backward] 13.5409ms 12.9402ms 77.2787 Ops/s 77.9799 Ops/s $\color{#d91a1a}-0.90\%$
test_ppo_speed[True-None] 4.3131ms 3.8266ms 261.3264 Ops/s 258.8925 Ops/s $\color{#35bf28}+0.94\%$
test_ppo_speed[True-backward] 9.1559ms 8.7773ms 113.9309 Ops/s 110.1940 Ops/s $\color{#35bf28}+3.39\%$
test_ppo_speed[reduce-overhead-None] 4.3433ms 3.8031ms 262.9404 Ops/s 264.0853 Ops/s $\color{#d91a1a}-0.43\%$
test_reinforce_speed[False-None] 5.0265ms 4.6935ms 213.0622 Ops/s 212.9458 Ops/s $\color{#35bf28}+0.05\%$
test_reinforce_speed[False-backward] 7.9582ms 7.5897ms 131.7578 Ops/s 132.1637 Ops/s $\color{#d91a1a}-0.31\%$
test_reinforce_speed[True-None] 3.5295ms 3.0216ms 330.9495 Ops/s 321.0837 Ops/s $\color{#35bf28}+3.07\%$
test_reinforce_speed[True-backward] 8.3867ms 8.0594ms 124.0781 Ops/s 123.5754 Ops/s $\color{#35bf28}+0.41\%$
test_reinforce_speed[reduce-overhead-None] 3.4990ms 2.9856ms 334.9427 Ops/s 328.6310 Ops/s $\color{#35bf28}+1.92\%$
test_iql_speed[False-None] 22.2322ms 20.6115ms 48.5166 Ops/s 48.5434 Ops/s $\color{#d91a1a}-0.06\%$
test_iql_speed[False-backward] 33.4416ms 31.4700ms 31.7763 Ops/s 31.7928 Ops/s $\color{#d91a1a}-0.05\%$
test_iql_speed[True-None] 9.5120ms 8.8427ms 113.0875 Ops/s 113.7191 Ops/s $\color{#d91a1a}-0.56\%$
test_iql_speed[True-backward] 18.1484ms 17.3439ms 57.6573 Ops/s 58.1855 Ops/s $\color{#d91a1a}-0.91\%$
test_iql_speed[reduce-overhead-None] 9.3146ms 8.8395ms 113.1281 Ops/s 109.9138 Ops/s $\color{#35bf28}+2.92\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0643ms 5.8261ms 171.6408 Ops/s 171.4327 Ops/s $\color{#35bf28}+0.12\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.1105ms 0.3371ms 2.9666 KOps/s 3.3417 KOps/s $\textbf{\color{#d91a1a}-11.23\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6901ms 0.3724ms 2.6850 KOps/s 3.5088 KOps/s $\textbf{\color{#d91a1a}-23.48\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1293ms 5.5972ms 178.6596 Ops/s 175.6584 Ops/s $\color{#35bf28}+1.71\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9356ms 0.3140ms 3.1846 KOps/s 3.3783 KOps/s $\textbf{\color{#d91a1a}-5.73\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6068ms 0.3116ms 3.2090 KOps/s 3.5963 KOps/s $\textbf{\color{#d91a1a}-10.77\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7389ms 1.2821ms 779.9815 Ops/s 734.5809 Ops/s $\textbf{\color{#35bf28}+6.18\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6657ms 1.1970ms 835.4051 Ops/s 793.2646 Ops/s $\textbf{\color{#35bf28}+5.31\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.0588ms 5.8635ms 170.5470 Ops/s 172.0745 Ops/s $\color{#d91a1a}-0.89\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8654ms 0.4628ms 2.1606 KOps/s 2.1526 KOps/s $\color{#35bf28}+0.37\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.0325ms 0.4613ms 2.1679 KOps/s 2.2857 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1012ms 5.6726ms 176.2855 Ops/s 176.7177 Ops/s $\color{#d91a1a}-0.24\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0508ms 0.3004ms 3.3293 KOps/s 2.7503 KOps/s $\textbf{\color{#35bf28}+21.05\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5799ms 0.2724ms 3.6711 KOps/s 3.4341 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1686ms 5.6031ms 178.4712 Ops/s 178.2515 Ops/s $\color{#35bf28}+0.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0399ms 0.3318ms 3.0139 KOps/s 3.2985 KOps/s $\textbf{\color{#d91a1a}-8.63\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8652ms 0.3367ms 2.9698 KOps/s 2.8068 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2518ms 5.7737ms 173.1993 Ops/s 170.5598 Ops/s $\color{#35bf28}+1.55\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7230ms 0.4976ms 2.0097 KOps/s 1.8926 KOps/s $\textbf{\color{#35bf28}+6.19\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6537ms 0.4711ms 2.1226 KOps/s 1.8964 KOps/s $\textbf{\color{#35bf28}+11.92\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.6355ms 4.9671ms 201.3233 Ops/s 197.9394 Ops/s $\color{#35bf28}+1.71\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.0310ms 2.0351ms 491.3874 Ops/s 478.8809 Ops/s $\color{#35bf28}+2.61\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.1189ms 1.0897ms 917.6674 Ops/s 776.3732 Ops/s $\textbf{\color{#35bf28}+18.20\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6723s 18.3946ms 54.3637 Ops/s 37.4812 Ops/s $\textbf{\color{#35bf28}+45.04\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9035ms 1.7905ms 558.4962 Ops/s 559.4803 Ops/s $\color{#d91a1a}-0.18\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.1024ms 0.8881ms 1.1260 KOps/s 1.0579 KOps/s $\textbf{\color{#35bf28}+6.43\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.4163ms 5.2181ms 191.6417 Ops/s 191.4330 Ops/s $\color{#35bf28}+0.11\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.3107ms 2.0899ms 478.4842 Ops/s 520.5824 Ops/s $\textbf{\color{#d91a1a}-8.09\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 10.3368ms 1.3769ms 726.2874 Ops/s 736.6754 Ops/s $\color{#d91a1a}-1.41\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.9317ms 38.2444ms 26.1476 Ops/s 25.4429 Ops/s $\color{#35bf28}+2.77\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.7396ms 18.0504ms 55.4006 Ops/s 53.1700 Ops/s $\color{#35bf28}+4.20\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 43.0846ms 39.3239ms 25.4298 Ops/s 24.3591 Ops/s $\color{#35bf28}+4.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.7028ms 18.1748ms 55.0212 Ops/s 52.4037 Ops/s $\color{#35bf28}+4.99\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.1047ms 40.2821ms 24.8249 Ops/s 24.0650 Ops/s $\color{#35bf28}+3.16\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.5331ms 19.5522ms 51.1453 Ops/s 49.8611 Ops/s $\color{#35bf28}+2.58\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8445ms 0.2147ms 4.6572 KOps/s 4.3734 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.9268ms 1.4721ms 679.2825 Ops/s 665.4939 Ops/s $\color{#35bf28}+2.07\%$
test_storage_write_lazystack[100-img_shape2-large_img] 3.1519ms 2.4442ms 409.1343 Ops/s 409.1744 Ops/s $-0.01\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.5426ms 2.9650ms 337.2688 Ops/s 320.6740 Ops/s $\textbf{\color{#35bf28}+5.17\%}$
test_storage_write_contiguous[50-img_shape0-small] 0.6085ms 0.1328ms 7.5326 KOps/s 7.5464 KOps/s $\color{#d91a1a}-0.18\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3259ms 0.1884ms 5.3082 KOps/s 5.1894 KOps/s $\color{#35bf28}+2.29\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.0614ms 1.8332ms 545.4980 Ops/s 534.7015 Ops/s $\color{#35bf28}+2.02\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6132ms 1.3638ms 733.2247 Ops/s 722.9541 Ops/s $\color{#35bf28}+1.42\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2455ms 1.0811ms 924.9761 Ops/s 914.4627 Ops/s $\color{#35bf28}+1.15\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8615ms 3.4912ms 286.4354 Ops/s 280.4766 Ops/s $\color{#35bf28}+2.12\%$
test_collector_stack_then_write[100-img_shape2-large_img] 6.2534ms 5.8792ms 170.0899 Ops/s 170.5760 Ops/s $\color{#d91a1a}-0.28\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.7612ms 7.3180ms 136.6494 Ops/s 139.7142 Ops/s $\color{#d91a1a}-2.19\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4364ms 0.2715ms 3.6836 KOps/s 3.5986 KOps/s $\color{#35bf28}+2.36\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 2.0086ms 1.5536ms 643.6631 Ops/s 628.4667 Ops/s $\color{#35bf28}+2.42\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 3.0763ms 2.5319ms 394.9541 Ops/s 390.6502 Ops/s $\color{#35bf28}+1.10\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5670ms 3.1614ms 316.3180 Ops/s 302.3802 Ops/s $\color{#35bf28}+4.61\%$
test_collector_without_rb[100-img_shape0-atari] 32.5709ms 31.5606ms 31.6850 Ops/s 31.0117 Ops/s $\color{#35bf28}+2.17\%$
test_collector_without_rb[200-img_shape1-large_batch] 0.6553s 98.9675ms 10.1043 Ops/s 15.6920 Ops/s $\textbf{\color{#d91a1a}-35.61\%}$
test_collector_with_rb[100-img_shape0-atari] 37.2861ms 36.5556ms 27.3556 Ops/s 27.1435 Ops/s $\color{#35bf28}+0.78\%$
test_collector_with_rb[200-img_shape1-large_batch] 72.2230ms 71.4820ms 13.9895 Ops/s 13.9051 Ops/s $\color{#35bf28}+0.61\%$
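
The Change column appears to be the relative difference between the PR's throughput and the throughput on repo HEAD. The sketch below reproduces two rows under that assumption. The formula is inferred from the numbers in the table, not taken from the benchmark tooling, and `percent_change` is a hypothetical helper name. Both inputs must use the same unit (Ops/s or KOps/s).

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) copied from two rows of the table above.
rows = {
    "test_tensor_to_bytestream_speed[pickle]": (12.9373, 12.8615),  # KOps/s
    "test_values[vec_generalized_advantage_estimate-True-True]": (55.2531, 84.7634),  # Ops/s
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops_pr, ops_head):+.2f}%")
```

A positive value means the PR run completed more operations per second than HEAD. Large swings in benchmarks that do not touch the env step path (for example the `vec_*` value estimators) may reflect runner noise rather than this change.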
