[Performance] Add fast path for step() and TransformedEnv._step() when _trust_step_output is set #3565
Open
vmoens wants to merge 1 commit into gh/vmoens/246/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3565
Note: Links to docs will display an error until the docs builds have been completed.
❌ 9 New Failures, 2 Cancelled Jobs
As of commit c3891d4 with merge base 0a1aea6:
NEW FAILURES - The following jobs have failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
vmoens added a commit that referenced this pull request on Mar 23, 2026
…n _trust_step_output is set

When _trust_step_output is True, EnvBase.step() skips _assert_tensordict_shape, partial_steps handling, next_preset logic, and _step_proc_data. Similarly, TransformedEnv._step() skips partial_steps, next_preset, and _complete_done. This eliminates all per-step Python validation overhead for well-behaved envs.

Made-with: Cursor
ghstack-source-id: 52ff860
Pull-Request: #3565
This was referenced Mar 23, 2026
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 79.0360μs | 77.2956μs | 12.9373 KOps/s | 12.8615 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1357ms | 0.1353ms | 7.3896 KOps/s | 7.2603 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1024s | 0.1022s | 9.7841 Ops/s | 8.8862 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.4645μs | 2.4601μs | 406.4833 KOps/s | 381.8186 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 35.8222μs | 35.5701μs | 28.1135 KOps/s | 27.9989 KOps/s | |
| test_simple | 0.5319s | 0.5310s | 1.8832 Ops/s | 1.7772 Ops/s | |
| test_transformed | 1.0635s | 1.0627s | 0.9410 Ops/s | 0.9078 Ops/s | |
| test_serial | 1.6498s | 1.6454s | 0.6078 Ops/s | 0.5914 Ops/s | |
| test_parallel | 1.1341s | 1.0227s | 0.9778 Ops/s | 0.9673 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.1467ms | 39.5398μs | 25.2909 KOps/s | 24.6533 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 45.0810μs | 22.4717μs | 44.5005 KOps/s | 45.2086 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 59.0120μs | 23.1111μs | 43.2693 KOps/s | 43.9038 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 41.2520μs | 12.4785μs | 80.1376 KOps/s | 80.9800 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 73.4830μs | 43.7955μs | 22.8334 KOps/s | 23.2810 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 53.3320μs | 25.0948μs | 39.8488 KOps/s | 40.8125 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 57.5620μs | 25.5842μs | 39.0866 KOps/s | 40.9111 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 41.8420μs | 15.2432μs | 65.6032 KOps/s | 68.2603 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 79.7630μs | 45.8779μs | 21.7970 KOps/s | 22.2015 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 0.1028ms | 27.2738μs | 36.6653 KOps/s | 36.8175 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 47.7520μs | 25.2481μs | 39.6069 KOps/s | 40.8739 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 43.4610μs | 15.1835μs | 65.8610 KOps/s | 66.8377 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 81.3740μs | 47.9988μs | 20.8338 KOps/s | 20.8920 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 57.0020μs | 29.1633μs | 34.2897 KOps/s | 33.6786 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 92.3030μs | 27.2242μs | 36.7320 KOps/s | 36.1776 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 0.1093ms | 17.1022μs | 58.4720 KOps/s | 58.4498 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 78.5420μs | 45.0750μs | 22.1853 KOps/s | 21.9388 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 58.8720μs | 26.9821μs | 37.0616 KOps/s | 36.9611 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.7233ms | 30.6886μs | 32.5854 KOps/s | 34.6601 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 48.3420μs | 16.5662μs | 60.3640 KOps/s | 60.8288 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 83.5830μs | 47.1559μs | 21.2062 KOps/s | 20.7307 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 57.9830μs | 29.1496μs | 34.3058 KOps/s | 33.8518 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 60.9620μs | 31.0605μs | 32.1952 KOps/s | 32.6126 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 42.7820μs | 18.6025μs | 53.7561 KOps/s | 54.6484 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 80.2440μs | 50.1449μs | 19.9422 KOps/s | 20.1660 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 0.1050ms | 31.9546μs | 31.2944 KOps/s | 31.2553 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 0.1510ms | 31.0005μs | 32.2576 KOps/s | 32.0109 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 47.2220μs | 18.9433μs | 52.7892 KOps/s | 53.2686 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 84.0440μs | 51.9639μs | 19.2441 KOps/s | 19.2911 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 69.4430μs | 34.5214μs | 28.9675 KOps/s | 29.0335 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 62.3520μs | 32.8375μs | 30.4530 KOps/s | 30.4102 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 48.6220μs | 21.3172μs | 46.9104 KOps/s | 47.7683 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7099s | 0.7066s | 1.4153 Ops/s | 1.3611 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7050s | 0.5984s | 1.6711 Ops/s | 1.6686 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7004s | 1.6118s | 0.6204 Ops/s | 0.6170 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4811s | 1.3931s | 0.7178 Ops/s | 0.7124 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9296s | 1.8436s | 0.5424 Ops/s | 0.5371 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7112s | 1.6257s | 0.6151 Ops/s | 0.6059 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6208s | 4.5175s | 0.2214 Ops/s | 0.2197 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.4245s | 4.3391s | 0.2305 Ops/s | 0.2238 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.8934s | 1.8258s | 0.5477 Ops/s | 0.5437 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6555s | 1.5760s | 0.6345 Ops/s | 0.6443 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 10.2780ms | 9.8836ms | 101.1777 Ops/s | 98.1313 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 20.9772ms | 18.0985ms | 55.2531 Ops/s | 84.7634 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2240ms | 0.1200ms | 8.3355 KOps/s | 7.5522 KOps/s | |
| test_values[td1_return_estimate-False-False] | 26.7133ms | 25.9376ms | 38.5541 Ops/s | 36.3435 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 18.9337ms | 18.1818ms | 55.0000 Ops/s | 84.4852 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 40.1972ms | 38.7847ms | 25.7834 Ops/s | 24.6701 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.6113ms | 18.1629ms | 55.0572 Ops/s | 84.1066 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.1029ms | 8.6240ms | 115.9560 Ops/s | 111.0967 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7749ms | 1.5067ms | 663.7153 Ops/s | 640.5747 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8705ms | 0.4091ms | 2.4441 KOps/s | 2.3810 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 35.9033ms | 35.2492ms | 28.3694 Ops/s | 41.9416 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.2656ms | 1.8088ms | 552.8576 Ops/s | 557.5894 Ops/s | |
| test_dqn_speed[False-None] | 1.8571ms | 1.3881ms | 720.4040 Ops/s | 716.0646 Ops/s | |
| test_dqn_speed[False-backward] | 2.1058ms | 1.9084ms | 523.9926 Ops/s | 527.8297 Ops/s | |
| test_dqn_speed[True-None] | 0.7242ms | 0.5595ms | 1.7872 KOps/s | 1.8238 KOps/s | |
| test_dqn_speed[True-backward] | 1.0645ms | 1.0133ms | 986.9081 Ops/s | 820.8176 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.9926ms | 0.5422ms | 1.8443 KOps/s | 1.7479 KOps/s | |
| test_ddpg_speed[False-None] | 3.2383ms | 2.8137ms | 355.4064 Ops/s | 352.1908 Ops/s | |
| test_ddpg_speed[False-backward] | 4.1772ms | 4.0236ms | 248.5366 Ops/s | 242.5218 Ops/s | |
| test_ddpg_speed[True-None] | 1.5307ms | 1.4476ms | 690.7880 Ops/s | 667.4503 Ops/s | |
| test_ddpg_speed[True-backward] | 2.7617ms | 2.4566ms | 407.0587 Ops/s | 376.7459 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.9526ms | 1.4322ms | 698.2137 Ops/s | 682.1779 Ops/s | |
| test_sac_speed[False-None] | 8.4961ms | 7.9805ms | 125.3058 Ops/s | 124.4125 Ops/s | |
| test_sac_speed[False-backward] | 11.7745ms | 11.2205ms | 89.1229 Ops/s | 89.0867 Ops/s | |
| test_sac_speed[True-None] | 2.4549ms | 2.2354ms | 447.3495 Ops/s | 451.2892 Ops/s | |
| test_sac_speed[True-backward] | 4.7733ms | 4.2494ms | 235.3282 Ops/s | 212.6774 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.7743ms | 2.2274ms | 448.9477 Ops/s | 451.5284 Ops/s | |
| test_redq_speed[False-None] | 15.9048ms | 10.9110ms | 91.6506 Ops/s | 93.7430 Ops/s | |
| test_redq_speed[False-backward] | 23.5952ms | 18.9969ms | 52.6401 Ops/s | 54.7113 Ops/s | |
| test_redq_speed[True-None] | 5.0493ms | 4.6277ms | 216.0921 Ops/s | 208.4760 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 5.1256ms | 4.5732ms | 218.6641 Ops/s | 221.9872 Ops/s | |
| test_redq_deprec_speed[False-None] | 12.2940ms | 11.4741ms | 87.1525 Ops/s | 88.4656 Ops/s | |
| test_redq_deprec_speed[False-backward] | 17.2423ms | 16.5560ms | 60.4009 Ops/s | 61.3603 Ops/s | |
| test_redq_deprec_speed[True-None] | 4.0576ms | 3.7702ms | 265.2405 Ops/s | 257.2156 Ops/s | |
| test_redq_deprec_speed[True-backward] | 8.2329ms | 7.6946ms | 129.9620 Ops/s | 121.0842 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 4.0614ms | 3.7399ms | 267.3838 Ops/s | 260.2001 Ops/s | |
| test_td3_speed[False-None] | 8.1089ms | 7.9747ms | 125.3966 Ops/s | 125.3695 Ops/s | |
| test_td3_speed[False-backward] | 11.1768ms | 10.7942ms | 92.6422 Ops/s | 92.6161 Ops/s | |
| test_td3_speed[True-None] | 1.9629ms | 1.8861ms | 530.1829 Ops/s | 529.9087 Ops/s | |
| test_td3_speed[True-backward] | 4.9094ms | 4.1119ms | 243.1970 Ops/s | 268.0417 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.9302ms | 1.8512ms | 540.1906 Ops/s | 537.1889 Ops/s | |
| test_cql_speed[False-None] | 30.3039ms | 26.8536ms | 37.2390 Ops/s | 37.7056 Ops/s | |
| test_cql_speed[False-backward] | 37.5937ms | 36.1350ms | 27.6740 Ops/s | 27.7327 Ops/s | |
| test_cql_speed[True-None] | 16.3288ms | 13.2788ms | 75.3078 Ops/s | 78.2286 Ops/s | |
| test_cql_speed[True-backward] | 19.1662ms | 18.6327ms | 53.6690 Ops/s | 55.2737 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 13.5138ms | 13.0048ms | 76.8944 Ops/s | 68.5498 Ops/s | |
| test_a2c_speed[False-None] | 5.9799ms | 5.5604ms | 179.8425 Ops/s | 182.1897 Ops/s | |
| test_a2c_speed[False-backward] | 12.6981ms | 12.1679ms | 82.1833 Ops/s | 82.1485 Ops/s | |
| test_a2c_speed[True-None] | 4.4424ms | 3.9015ms | 256.3129 Ops/s | 251.7031 Ops/s | |
| test_a2c_speed[True-backward] | 9.2606ms | 9.0350ms | 110.6802 Ops/s | 112.7947 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 4.4932ms | 3.8989ms | 256.4815 Ops/s | 257.7449 Ops/s | |
| test_ppo_speed[False-None] | 6.7859ms | 6.0687ms | 164.7792 Ops/s | 165.7256 Ops/s | |
| test_ppo_speed[False-backward] | 13.5409ms | 12.9402ms | 77.2787 Ops/s | 77.9799 Ops/s | |
| test_ppo_speed[True-None] | 4.3131ms | 3.8266ms | 261.3264 Ops/s | 258.8925 Ops/s | |
| test_ppo_speed[True-backward] | 9.1559ms | 8.7773ms | 113.9309 Ops/s | 110.1940 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 4.3433ms | 3.8031ms | 262.9404 Ops/s | 264.0853 Ops/s | |
| test_reinforce_speed[False-None] | 5.0265ms | 4.6935ms | 213.0622 Ops/s | 212.9458 Ops/s | |
| test_reinforce_speed[False-backward] | 7.9582ms | 7.5897ms | 131.7578 Ops/s | 132.1637 Ops/s | |
| test_reinforce_speed[True-None] | 3.5295ms | 3.0216ms | 330.9495 Ops/s | 321.0837 Ops/s | |
| test_reinforce_speed[True-backward] | 8.3867ms | 8.0594ms | 124.0781 Ops/s | 123.5754 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.4990ms | 2.9856ms | 334.9427 Ops/s | 328.6310 Ops/s | |
| test_iql_speed[False-None] | 22.2322ms | 20.6115ms | 48.5166 Ops/s | 48.5434 Ops/s | |
| test_iql_speed[False-backward] | 33.4416ms | 31.4700ms | 31.7763 Ops/s | 31.7928 Ops/s | |
| test_iql_speed[True-None] | 9.5120ms | 8.8427ms | 113.0875 Ops/s | 113.7191 Ops/s | |
| test_iql_speed[True-backward] | 18.1484ms | 17.3439ms | 57.6573 Ops/s | 58.1855 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 9.3146ms | 8.8395ms | 113.1281 Ops/s | 109.9138 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0643ms | 5.8261ms | 171.6408 Ops/s | 171.4327 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.1105ms | 0.3371ms | 2.9666 KOps/s | 3.3417 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6901ms | 0.3724ms | 2.6850 KOps/s | 3.5088 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1293ms | 5.5972ms | 178.6596 Ops/s | 175.6584 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9356ms | 0.3140ms | 3.1846 KOps/s | 3.3783 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6068ms | 0.3116ms | 3.2090 KOps/s | 3.5963 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7389ms | 1.2821ms | 779.9815 Ops/s | 734.5809 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6657ms | 1.1970ms | 835.4051 Ops/s | 793.2646 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.0588ms | 5.8635ms | 170.5470 Ops/s | 172.0745 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8654ms | 0.4628ms | 2.1606 KOps/s | 2.1526 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 1.0325ms | 0.4613ms | 2.1679 KOps/s | 2.2857 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1012ms | 5.6726ms | 176.2855 Ops/s | 176.7177 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0508ms | 0.3004ms | 3.3293 KOps/s | 2.7503 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5799ms | 0.2724ms | 3.6711 KOps/s | 3.4341 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1686ms | 5.6031ms | 178.4712 Ops/s | 178.2515 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0399ms | 0.3318ms | 3.0139 KOps/s | 3.2985 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8652ms | 0.3367ms | 2.9698 KOps/s | 2.8068 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.2518ms | 5.7737ms | 173.1993 Ops/s | 170.5598 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7230ms | 0.4976ms | 2.0097 KOps/s | 1.8926 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6537ms | 0.4711ms | 2.1226 KOps/s | 1.8964 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.6355ms | 4.9671ms | 201.3233 Ops/s | 197.9394 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 4.0310ms | 2.0351ms | 491.3874 Ops/s | 478.8809 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.1189ms | 1.0897ms | 917.6674 Ops/s | 776.3732 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.6723s | 18.3946ms | 54.3637 Ops/s | 37.4812 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9035ms | 1.7905ms | 558.4962 Ops/s | 559.4803 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.1024ms | 0.8881ms | 1.1260 KOps/s | 1.0579 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 9.4163ms | 5.2181ms | 191.6417 Ops/s | 191.4330 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.3107ms | 2.0899ms | 478.4842 Ops/s | 520.5824 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 10.3368ms | 1.3769ms | 726.2874 Ops/s | 736.6754 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 42.9317ms | 38.2444ms | 26.1476 Ops/s | 25.4429 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.7396ms | 18.0504ms | 55.4006 Ops/s | 53.1700 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 43.0846ms | 39.3239ms | 25.4298 Ops/s | 24.3591 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.7028ms | 18.1748ms | 55.0212 Ops/s | 52.4037 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 42.1047ms | 40.2821ms | 24.8249 Ops/s | 24.0650 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.5331ms | 19.5522ms | 51.1453 Ops/s | 49.8611 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8445ms | 0.2147ms | 4.6572 KOps/s | 4.3734 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.9268ms | 1.4721ms | 679.2825 Ops/s | 665.4939 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 3.1519ms | 2.4442ms | 409.1343 Ops/s | 409.1744 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.5426ms | 2.9650ms | 337.2688 Ops/s | 320.6740 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.6085ms | 0.1328ms | 7.5326 KOps/s | 7.5464 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3259ms | 0.1884ms | 5.3082 KOps/s | 5.1894 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.0614ms | 1.8332ms | 545.4980 Ops/s | 534.7015 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.6132ms | 1.3638ms | 733.2247 Ops/s | 722.9541 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2455ms | 1.0811ms | 924.9761 Ops/s | 914.4627 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.8615ms | 3.4912ms | 286.4354 Ops/s | 280.4766 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 6.2534ms | 5.8792ms | 170.0899 Ops/s | 170.5760 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.7612ms | 7.3180ms | 136.6494 Ops/s | 139.7142 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4364ms | 0.2715ms | 3.6836 KOps/s | 3.5986 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 2.0086ms | 1.5536ms | 643.6631 Ops/s | 628.4667 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 3.0763ms | 2.5319ms | 394.9541 Ops/s | 390.6502 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.5670ms | 3.1614ms | 316.3180 Ops/s | 302.3802 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 32.5709ms | 31.5606ms | 31.6850 Ops/s | 31.0117 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 0.6553s | 98.9675ms | 10.1043 Ops/s | 15.6920 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 37.2861ms | 36.5556ms | 27.3556 Ops/s | 27.1435 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 72.2230ms | 71.4820ms | 13.9895 Ops/s | 13.9051 Ops/s |
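The Change column is empty in the table above, so the relative difference between the Ops and Ops on Repo HEAD columns has to be worked out by hand. A minimal sketch of that arithmetic follows. The helper names `parse_ops` and `relative_change` are hypothetical and not part of the benchmark tooling, and the sketch assumes only the two units that appear in the table (Ops/s and KOps/s).

```python
# Hypothetical helpers for reading the benchmark table; not part of the CI tooling.
# Only the two units that appear in the table are handled.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell: str) -> float:
    """Convert a table cell such as '12.9373 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percent change of the PR throughput relative to repo HEAD (positive = faster)."""
    pr = parse_ops(ops_pr)
    head = parse_ops(ops_head)
    return (pr - head) / head * 100.0


# Cells copied from the test_simple and test_transformed rows.
rows = {
    "test_simple": ("1.8832 Ops/s", "1.7772 Ops/s"),
    "test_transformed": ("0.9410 Ops/s", "0.9078 Ops/s"),
}
for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

Single-run differences of a few percent can fall within benchmark noise, so numbers computed this way are best read as indicative rather than conclusive.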
Stack from ghstack (oldest at bottom):
When _trust_step_output is True, EnvBase.step() skips _assert_tensordict_shape,
partial_steps handling, next_preset logic, and _step_proc_data. Similarly,
TransformedEnv._step() skips partial_steps, next_preset, and _complete_done.
This eliminates all per-step Python validation overhead for well-behaved envs.
Made-with: Cursor
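To illustrate the control flow described above, here is a minimal, self-contained sketch of a trust-flag fast path. It is not the TorchRL implementation: `ToyEnv`, its `trust_step_output` field, and the `_check_input` / `_check_output` helpers are hypothetical stand-ins for `EnvBase`, `_trust_step_output`, and the validation steps named in the description, and plain dicts stand in for tensordicts.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical toy environment; a stand-in, not the TorchRL EnvBase API."""

    trust_step_output: bool = False  # plays the role of _trust_step_output
    checks_run: int = 0  # counts how many validation helpers were executed

    def _step(self, data: dict) -> dict:
        # Backend step: a trivial deterministic transition.
        return {
            "observation": data["observation"] + data["action"],
            "reward": 1.0,
            "done": False,
        }

    def _check_input(self, data: dict) -> None:
        # Stand-in for input validation such as shape assertions.
        self.checks_run += 1
        if "observation" not in data or "action" not in data:
            raise KeyError("step() needs 'observation' and 'action' entries")

    def _check_output(self, out: dict) -> dict:
        # Stand-in for output post-processing and validation.
        self.checks_run += 1
        missing = {"observation", "reward", "done"} - set(out)
        if missing:
            raise KeyError(f"_step() output is missing {sorted(missing)}")
        return out

    def step(self, data: dict) -> dict:
        if self.trust_step_output:
            # Fast path: call the backend directly and skip every check.
            return {**data, "next": self._step(data)}
        # Default path: validate the input, step, then validate the output.
        self._check_input(data)
        return {**data, "next": self._check_output(self._step(data))}
```

Both paths return the same result for a well-behaved `_step()`; the trade-off is that a malformed input or output is no longer caught at the `step()` boundary once the flag is set.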