[BugFix] Fix CUDA graph capture for Bounded spec projection #3453
Merged
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3453
Note: Links to docs will display an error until the docs builds have been completed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 80.5809μs | 78.9921μs | 12.6595 KOps/s | 11.6371 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1377ms | 0.1370ms | 7.2997 KOps/s | 6.7917 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1079s | 0.1076s | 9.2924 Ops/s | 9.4897 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.6082μs | 2.5970μs | 385.0640 KOps/s | 402.0182 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.3257μs | 37.1357μs | 26.9283 KOps/s | 25.0581 KOps/s | |
| test_simple | 0.5470s | 0.5455s | 1.8333 Ops/s | 1.7477 Ops/s | |
| test_transformed | 1.1264s | 1.1256s | 0.8884 Ops/s | 0.8631 Ops/s | |
| test_serial | 1.7303s | 1.7279s | 0.5787 Ops/s | 0.5726 Ops/s | |
| test_parallel | 1.2177s | 1.1348s | 0.8812 Ops/s | 0.8920 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.3638ms | 44.9211μs | 22.2612 KOps/s | 21.8800 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 71.3710μs | 25.2363μs | 39.6254 KOps/s | 38.9279 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 57.6810μs | 25.3270μs | 39.4835 KOps/s | 38.2800 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 45.0600μs | 14.0030μs | 71.4134 KOps/s | 69.6196 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 81.9510μs | 48.1505μs | 20.7682 KOps/s | 20.0827 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 51.0710μs | 27.9832μs | 35.7358 KOps/s | 34.8855 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 59.5310μs | 27.7796μs | 35.9976 KOps/s | 34.4416 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 42.3010μs | 16.8897μs | 59.2076 KOps/s | 58.5434 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 86.5910μs | 51.1771μs | 19.5400 KOps/s | 19.3611 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 60.8110μs | 31.2678μs | 31.9817 KOps/s | 31.5930 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 61.5910μs | 27.7601μs | 36.0229 KOps/s | 34.7790 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 63.2010μs | 16.8616μs | 59.3065 KOps/s | 59.1894 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 85.2010μs | 53.0959μs | 18.8338 KOps/s | 18.4784 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 80.4300μs | 33.6475μs | 29.7199 KOps/s | 29.4746 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 61.5410μs | 30.1534μs | 33.1637 KOps/s | 31.8420 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 51.8210μs | 19.4856μs | 51.3200 KOps/s | 50.7352 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 82.5310μs | 51.6291μs | 19.3689 KOps/s | 19.4976 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 58.5610μs | 30.8902μs | 32.3727 KOps/s | 31.9485 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.2935ms | 31.7173μs | 31.5285 KOps/s | 30.2624 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 45.1000μs | 18.2347μs | 54.8404 KOps/s | 53.5381 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.1263ms | 53.6207μs | 18.6495 KOps/s | 18.9634 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 58.2400μs | 32.9109μs | 30.3851 KOps/s | 29.2680 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 73.8610μs | 33.6910μs | 29.6815 KOps/s | 28.1893 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 92.4410μs | 20.3448μs | 49.1525 KOps/s | 46.6230 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 92.4610μs | 55.5075μs | 18.0156 KOps/s | 17.5472 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 70.5800μs | 36.0511μs | 27.7384 KOps/s | 26.7778 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 65.3910μs | 33.6804μs | 29.6908 KOps/s | 28.4676 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 49.7400μs | 20.6319μs | 48.4687 KOps/s | 46.7144 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.1076ms | 58.2924μs | 17.1549 KOps/s | 16.5785 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 69.1510μs | 37.9169μs | 26.3735 KOps/s | 25.3213 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 68.3410μs | 35.7408μs | 27.9793 KOps/s | 26.5293 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 62.3500μs | 22.9963μs | 43.4853 KOps/s | 41.6360 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8653s | 0.7717s | 1.2959 Ops/s | 1.2920 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7297s | 0.6331s | 1.5796 Ops/s | 1.5701 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7687s | 1.6824s | 0.5944 Ops/s | 0.5923 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5337s | 1.4535s | 0.6880 Ops/s | 0.6807 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9948s | 1.9180s | 0.5214 Ops/s | 0.5139 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7765s | 1.7001s | 0.5882 Ops/s | 0.5811 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7832s | 4.6828s | 0.2135 Ops/s | 0.2169 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5785s | 4.4187s | 0.2263 Ops/s | 0.2259 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0394s | 1.9920s | 0.5020 Ops/s | 0.5069 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7496s | 1.6717s | 0.5982 Ops/s | 0.5856 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 10.0069ms | 9.7818ms | 102.2309 Ops/s | 99.8056 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 17.4525ms | 11.5988ms | 86.2158 Ops/s | 90.6036 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2027ms | 0.1188ms | 8.4164 KOps/s | 7.7133 KOps/s | |
| test_values[td1_return_estimate-False-False] | 29.1659ms | 27.5651ms | 36.2777 Ops/s | 35.9825 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 11.3724ms | 11.0814ms | 90.2413 Ops/s | 90.6618 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 42.5650ms | 40.4529ms | 24.7201 Ops/s | 24.1942 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 17.4115ms | 11.2180ms | 89.1422 Ops/s | 90.9569 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.7156ms | 8.5973ms | 116.3162 Ops/s | 113.4057 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7357ms | 1.5159ms | 659.6680 Ops/s | 659.6805 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4768ms | 0.4212ms | 2.3743 KOps/s | 2.4072 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 29.8366ms | 29.1868ms | 34.2621 Ops/s | 38.8551 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.7832ms | 1.6897ms | 591.8383 Ops/s | 585.7160 Ops/s | |
| test_dqn_speed[False-None] | 1.5261ms | 1.4033ms | 712.6258 Ops/s | 700.1961 Ops/s | |
| test_dqn_speed[False-backward] | 2.0243ms | 1.9063ms | 524.5833 Ops/s | 502.4356 Ops/s | |
| test_dqn_speed[True-None] | 0.6522ms | 0.5460ms | 1.8315 KOps/s | 1.7760 KOps/s | |
| test_dqn_speed[True-backward] | 1.0309ms | 0.9882ms | 1.0119 KOps/s | 981.7778 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6811ms | 0.5347ms | 1.8702 KOps/s | 1.8259 KOps/s | |
| test_ddpg_speed[False-None] | 3.2028ms | 2.8192ms | 354.7090 Ops/s | 345.8403 Ops/s | |
| test_ddpg_speed[False-backward] | 4.1077ms | 4.0413ms | 247.4448 Ops/s | 241.5704 Ops/s | |
| test_ddpg_speed[True-None] | 1.8113ms | 1.3945ms | 717.0798 Ops/s | 705.3566 Ops/s | |
| test_ddpg_speed[True-backward] | 2.7729ms | 2.3655ms | 422.7481 Ops/s | 406.0860 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.8270ms | 1.3850ms | 722.0054 Ops/s | 703.1804 Ops/s | |
| test_sac_speed[False-None] | 8.5666ms | 8.0122ms | 124.8090 Ops/s | 124.3772 Ops/s | |
| test_sac_speed[False-backward] | 11.7921ms | 11.2914ms | 88.5631 Ops/s | 87.9772 Ops/s | |
| test_sac_speed[True-None] | 2.4605ms | 2.1354ms | 468.3048 Ops/s | 453.9554 Ops/s | |
| test_sac_speed[True-backward] | 4.2184ms | 4.0200ms | 248.7568 Ops/s | 242.9608 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.5454ms | 2.1224ms | 471.1677 Ops/s | 459.9326 Ops/s | |
| test_redq_speed[False-None] | 11.3200ms | 10.3051ms | 97.0393 Ops/s | 95.8953 Ops/s | |
| test_redq_speed[False-backward] | 18.8371ms | 17.7598ms | 56.3070 Ops/s | 55.8332 Ops/s | |
| test_redq_speed[True-None] | 5.9899ms | 4.5258ms | 220.9543 Ops/s | 223.4449 Ops/s | |
| test_redq_speed[True-backward] | 9.6941ms | 9.4166ms | 106.1958 Ops/s | 101.2607 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.7732ms | 4.3286ms | 231.0217 Ops/s | 223.4898 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.0585ms | 10.6407ms | 93.9787 Ops/s | 91.4058 Ops/s | |
| test_redq_deprec_speed[False-backward] | 15.6256ms | 15.2090ms | 65.7507 Ops/s | 64.4627 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.9245ms | 3.5460ms | 282.0086 Ops/s | 263.0305 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.5444ms | 7.2500ms | 137.9305 Ops/s | 135.7290 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.8558ms | 3.5211ms | 283.9993 Ops/s | 268.2141 Ops/s | |
| test_td3_speed[False-None] | 8.1069ms | 7.9362ms | 126.0046 Ops/s | 123.5921 Ops/s | |
| test_td3_speed[False-backward] | 11.2379ms | 10.7550ms | 92.9803 Ops/s | 91.7588 Ops/s | |
| test_td3_speed[True-None] | 1.8810ms | 1.8167ms | 550.4364 Ops/s | 541.3208 Ops/s | |
| test_td3_speed[True-backward] | 3.6713ms | 3.5404ms | 282.4523 Ops/s | 239.6334 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.7863ms | 1.7605ms | 568.0276 Ops/s | 538.3055 Ops/s | |
| test_cql_speed[False-None] | 28.3830ms | 25.7850ms | 38.7823 Ops/s | 38.8273 Ops/s | |
| test_cql_speed[False-backward] | 38.1860ms | 34.8535ms | 28.6916 Ops/s | 27.4262 Ops/s | |
| test_cql_speed[True-None] | 12.3737ms | 12.0845ms | 82.7507 Ops/s | 81.5096 Ops/s | |
| test_cql_speed[True-backward] | 17.9461ms | 17.3390ms | 57.6734 Ops/s | 56.5462 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.4868ms | 12.0715ms | 82.8398 Ops/s | 65.7883 Ops/s | |
| test_a2c_speed[False-None] | 5.6003ms | 5.3459ms | 187.0577 Ops/s | 181.0824 Ops/s | |
| test_a2c_speed[False-backward] | 11.9137ms | 11.6834ms | 85.5913 Ops/s | 84.2902 Ops/s | |
| test_a2c_speed[True-None] | 3.8251ms | 3.7070ms | 269.7605 Ops/s | 268.4232 Ops/s | |
| test_a2c_speed[True-backward] | 8.6862ms | 8.4762ms | 117.9780 Ops/s | 115.8618 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 3.8454ms | 3.6824ms | 271.5625 Ops/s | 269.6632 Ops/s | |
| test_ppo_speed[False-None] | 6.0020ms | 5.8032ms | 172.3197 Ops/s | 166.5460 Ops/s | |
| test_ppo_speed[False-backward] | 12.5083ms | 12.2108ms | 81.8948 Ops/s | 80.4794 Ops/s | |
| test_ppo_speed[True-None] | 3.7870ms | 3.6190ms | 276.3198 Ops/s | 271.3793 Ops/s | |
| test_ppo_speed[True-backward] | 8.5983ms | 8.3234ms | 120.1434 Ops/s | 113.6067 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 3.7278ms | 3.5815ms | 279.2121 Ops/s | 275.0553 Ops/s | |
| test_reinforce_speed[False-None] | 4.8203ms | 4.4613ms | 224.1520 Ops/s | 217.2272 Ops/s | |
| test_reinforce_speed[False-backward] | 7.3808ms | 7.1732ms | 139.4085 Ops/s | 135.3076 Ops/s | |
| test_reinforce_speed[True-None] | 2.9905ms | 2.8710ms | 348.3145 Ops/s | 334.9090 Ops/s | |
| test_reinforce_speed[True-backward] | 7.7496ms | 7.5476ms | 132.4930 Ops/s | 129.0363 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.0804ms | 2.8378ms | 352.3818 Ops/s | 346.7866 Ops/s | |
| test_iql_speed[False-None] | 24.3619ms | 19.8706ms | 50.3257 Ops/s | 48.5029 Ops/s | |
| test_iql_speed[False-backward] | 35.0545ms | 29.9515ms | 33.3873 Ops/s | 32.6657 Ops/s | |
| test_iql_speed[True-None] | 8.6605ms | 8.3890ms | 119.2038 Ops/s | 114.6129 Ops/s | |
| test_iql_speed[True-backward] | 16.6562ms | 16.2960ms | 61.3648 Ops/s | 60.2092 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.6370ms | 8.4542ms | 118.2838 Ops/s | 116.3588 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2812ms | 6.0572ms | 165.0936 Ops/s | 160.4248 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.9934ms | 0.3730ms | 2.6810 KOps/s | 3.3451 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5618ms | 0.3530ms | 2.8327 KOps/s | 3.6748 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.3041ms | 5.8275ms | 171.6008 Ops/s | 165.9663 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7252ms | 0.3670ms | 2.7245 KOps/s | 3.3849 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5524ms | 0.3380ms | 2.9590 KOps/s | 3.3680 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.8245ms | 1.4729ms | 678.9137 Ops/s | 768.0558 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6308ms | 1.3467ms | 742.5386 Ops/s | 842.1466 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.3469ms | 6.0829ms | 164.3963 Ops/s | 161.1935 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.1588ms | 0.4781ms | 2.0918 KOps/s | 2.1941 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7001ms | 0.4464ms | 2.2401 KOps/s | 2.2873 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0156ms | 5.8367ms | 171.3309 Ops/s | 165.6921 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9161ms | 0.3831ms | 2.6101 KOps/s | 3.4619 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5846ms | 0.3698ms | 2.7044 KOps/s | 3.4652 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0238ms | 5.7777ms | 173.0795 Ops/s | 165.9388 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6974ms | 0.3223ms | 3.1025 KOps/s | 3.0490 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.4782ms | 0.2595ms | 3.8534 KOps/s | 3.4277 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.3832ms | 5.9526ms | 167.9947 Ops/s | 161.5447 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.4433ms | 0.4908ms | 2.0377 KOps/s | 2.0570 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7373ms | 0.4701ms | 2.1271 KOps/s | 2.2204 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.4872ms | 5.0233ms | 199.0738 Ops/s | 51.6161 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.2882ms | 1.8763ms | 532.9567 Ops/s | 506.7679 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 10.4253ms | 1.2399ms | 806.4983 Ops/s | 836.3193 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 7.4608ms | 5.0504ms | 198.0055 Ops/s | 198.0838 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.8857ms | 1.7349ms | 576.3884 Ops/s | 483.9157 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.0572ms | 0.8666ms | 1.1540 KOps/s | 897.0640 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5488s | 16.2108ms | 61.6874 Ops/s | 191.0213 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.1800ms | 1.8853ms | 530.4325 Ops/s | 507.6167 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.3652ms | 1.0572ms | 945.9093 Ops/s | 989.1316 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 38.2750ms | 35.8165ms | 27.9201 Ops/s | 28.0391 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.8035ms | 17.8224ms | 56.1090 Ops/s | 55.1490 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.3777ms | 36.8985ms | 27.1013 Ops/s | 26.5672 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.2045ms | 18.6084ms | 53.7391 Ops/s | 53.7863 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 41.7219ms | 39.5666ms | 25.2738 Ops/s | 25.2211 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.9014ms | 20.4003ms | 49.0189 Ops/s | 50.4907 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8987ms | 0.2259ms | 4.4267 KOps/s | 4.3286 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.5550ms | 1.3839ms | 722.5846 Ops/s | 726.1910 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.5208ms | 2.3660ms | 422.6620 Ops/s | 420.4097 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0401ms | 2.8521ms | 350.6217 Ops/s | 347.8934 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2232ms | 0.1315ms | 7.6036 KOps/s | 7.4847 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3445ms | 0.1788ms | 5.5940 KOps/s | 5.4920 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.8992ms | 1.7356ms | 576.1811 Ops/s | 582.3595 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4572ms | 1.2826ms | 779.6718 Ops/s | 791.5148 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.3026ms | 1.1129ms | 898.5173 Ops/s | 896.9222 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.6531ms | 3.5304ms | 283.2562 Ops/s | 273.0237 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 5.9710ms | 5.4534ms | 183.3719 Ops/s | 173.7980 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.2855ms | 6.9891ms | 143.0796 Ops/s | 146.2636 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4439ms | 0.2738ms | 3.6521 KOps/s | 3.4781 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6742ms | 1.5001ms | 666.6434 Ops/s | 656.3810 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.7745ms | 2.4858ms | 402.2885 Ops/s | 398.0225 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3991ms | 3.0791ms | 324.7697 Ops/s | 315.8878 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 34.3655ms | 33.5346ms | 29.8199 Ops/s | 29.5949 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 66.4780ms | 66.1934ms | 15.1072 Ops/s | 14.9139 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 0.5823s | 58.9650ms | 16.9592 Ops/s | 25.7437 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 76.5379ms | 75.4594ms | 13.2522 Ops/s | 13.1631 Ops/s |

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 83.8439μs | 81.1762μs | 12.3189 KOps/s | 11.5663 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1423ms | 0.1415ms | 7.0696 KOps/s | 6.7212 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1110s | 0.1103s | 9.0702 Ops/s | 8.7079 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.6482μs | 2.6439μs | 378.2309 KOps/s | 365.4122 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 37.3413μs | 37.0379μs | 26.9994 KOps/s | 25.3141 KOps/s | |
| test_simple | 0.8226s | 0.8130s | 1.2300 Ops/s | 1.1828 Ops/s | |
| test_transformed | 1.5661s | 1.4751s | 0.6779 Ops/s | 0.6741 Ops/s | |
| test_serial | 2.4597s | 2.3694s | 0.4221 Ops/s | 0.4143 Ops/s | |
| test_parallel | 2.0827s | 1.9680s | 0.5081 Ops/s | 0.4883 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.1992ms | 46.5797μs | 21.4686 KOps/s | 21.8135 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 61.9810μs | 26.5108μs | 37.7205 KOps/s | 39.3866 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 0.1281ms | 25.3309μs | 39.4775 KOps/s | 39.3049 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 54.3710μs | 14.3841μs | 69.5210 KOps/s | 71.4162 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 0.1098ms | 50.1952μs | 19.9222 KOps/s | 20.4486 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 54.0300μs | 28.9053μs | 34.5957 KOps/s | 35.2578 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 81.3710μs | 28.4015μs | 35.2094 KOps/s | 34.9884 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 47.1310μs | 17.4332μs | 57.3618 KOps/s | 58.7148 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 81.8310μs | 52.9587μs | 18.8826 KOps/s | 19.0766 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 63.3210μs | 32.2318μs | 31.0252 KOps/s | 31.6307 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 58.8100μs | 28.1733μs | 35.4946 KOps/s | 35.2835 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 71.8410μs | 17.2251μs | 58.0548 KOps/s | 57.9122 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 96.8820μs | 54.5887μs | 18.3188 KOps/s | 18.1864 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 80.0610μs | 35.3174μs | 28.3146 KOps/s | 29.1347 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 81.6910μs | 31.5103μs | 31.7356 KOps/s | 32.1958 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 71.0010μs | 20.6166μs | 48.5045 KOps/s | 50.8459 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 81.5210μs | 53.4163μs | 18.7209 KOps/s | 19.1310 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 62.9710μs | 32.8047μs | 30.4835 KOps/s | 31.9907 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.2786ms | 32.6194μs | 30.6566 KOps/s | 30.0929 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 48.6510μs | 18.9717μs | 52.7100 KOps/s | 52.1717 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.1198ms | 54.9282μs | 18.2056 KOps/s | 18.0987 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 84.6610μs | 34.8193μs | 28.7197 KOps/s | 28.9214 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 0.1015ms | 35.4178μs | 28.2344 KOps/s | 28.1843 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 61.6710μs | 21.7937μs | 45.8847 KOps/s | 45.8678 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 0.1010ms | 58.4514μs | 17.1082 KOps/s | 17.4935 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 76.6710μs | 38.2471μs | 26.1458 KOps/s | 26.5462 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 90.7710μs | 34.6955μs | 28.8222 KOps/s | 27.6802 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 53.7700μs | 21.6123μs | 46.2699 KOps/s | 46.8705 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.4932ms | 59.7051μs | 16.7490 KOps/s | 16.9028 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 77.9710μs | 40.1223μs | 24.9238 KOps/s | 25.2091 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 0.4577ms | 37.7144μs | 26.5150 KOps/s | 26.8653 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 0.4642ms | 24.4620μs | 40.8797 KOps/s | 41.4279 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8739s | 0.7834s | 1.2766 Ops/s | 1.2792 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7363s | 0.6401s | 1.5622 Ops/s | 1.5517 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7768s | 1.6958s | 0.5897 Ops/s | 0.5880 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5425s | 1.4694s | 0.6805 Ops/s | 0.6761 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0200s | 1.9426s | 0.5148 Ops/s | 0.5121 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.8051s | 1.7213s | 0.5810 Ops/s | 0.5781 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7060s | 4.6312s | 0.2159 Ops/s | 0.2117 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5680s | 4.4963s | 0.2224 Ops/s | 0.2238 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0602s | 1.9862s | 0.5035 Ops/s | 0.5038 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7688s | 1.6806s | 0.5950 Ops/s | 0.5846 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 21.5569ms | 21.0887ms | 47.4188 Ops/s | 47.1892 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1328s | 3.5916ms | 278.4245 Ops/s | 259.0226 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1124ms | 85.6226μs | 11.6792 KOps/s | 11.6824 KOps/s | |
| test_values[td1_return_estimate-False-False] | 50.4791ms | 49.7804ms | 20.0882 Ops/s | 19.9915 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3089ms | 1.1045ms | 905.3724 Ops/s | 900.5397 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 82.7304ms | 81.7123ms | 12.2381 Ops/s | 12.1019 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2784ms | 1.1020ms | 907.4202 Ops/s | 904.0775 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 21.5184ms | 21.3186ms | 46.9074 Ops/s | 46.6643 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0419ms | 0.7766ms | 1.2877 KOps/s | 1.2852 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7653ms | 0.6950ms | 1.4388 KOps/s | 1.3559 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5604ms | 1.5070ms | 663.5836 Ops/s | 659.6050 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7777ms | 0.7115ms | 1.4054 KOps/s | 1.3865 KOps/s | |
| test_dqn_speed[False-None] | 1.6347ms | 1.5655ms | 638.7631 Ops/s | 629.4472 Ops/s | |
| test_dqn_speed[False-backward] | 2.5265ms | 2.2237ms | 449.7103 Ops/s | 441.0059 Ops/s | |
| test_dqn_speed[True-None] | 0.6893ms | 0.5654ms | 1.7686 KOps/s | 1.7271 KOps/s | |
| test_dqn_speed[True-backward] | 1.2656ms | 1.2182ms | 820.8593 Ops/s | 817.0723 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6657ms | 0.5921ms | 1.6890 KOps/s | 1.6224 KOps/s | |
| test_ddpg_speed[False-None] | 3.3333ms | 2.9611ms | 337.7180 Ops/s | 330.5641 Ops/s | |
| test_ddpg_speed[False-backward] | 4.8256ms | 4.4219ms | 226.1449 Ops/s | 222.8785 Ops/s | |
| test_ddpg_speed[True-None] | 1.4326ms | 1.3378ms | 747.4867 Ops/s | 745.0031 Ops/s | |
| test_ddpg_speed[True-backward] | 2.6401ms | 2.5573ms | 391.0376 Ops/s | 389.3255 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4715ms | 1.3665ms | 731.7876 Ops/s | 728.4990 Ops/s | |
| test_sac_speed[False-None] | 9.9836ms | 8.6625ms | 115.4406 Ops/s | 115.5112 Ops/s | |
| test_sac_speed[False-backward] | 12.2043ms | 11.6584ms | 85.7749 Ops/s | 83.0591 Ops/s | |
| test_sac_speed[True-None] | 2.0584ms | 1.8439ms | 542.3392 Ops/s | 537.5706 Ops/s | |
| test_sac_speed[True-backward] | 3.5643ms | 3.4861ms | 286.8499 Ops/s | 271.1755 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 20.0095ms | 11.1976ms | 89.3048 Ops/s | 89.2234 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.0381ms | 9.5150ms | 105.0976 Ops/s | 104.2512 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.2718ms | 12.7031ms | 78.7210 Ops/s | 76.0140 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.7070ms | 2.5628ms | 390.1981 Ops/s | 383.8010 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.2899ms | 4.1626ms | 240.2335 Ops/s | 225.9285 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 16.4456ms | 10.1015ms | 98.9954 Ops/s | 99.5453 Ops/s | |
| test_td3_speed[False-None] | 8.6773ms | 8.4453ms | 118.4089 Ops/s | 111.8853 Ops/s | |
| test_td3_speed[False-backward] | 11.3444ms | 10.8824ms | 91.8913 Ops/s | 89.8804 Ops/s | |
| test_td3_speed[True-None] | 1.6834ms | 1.6511ms | 605.6514 Ops/s | 591.9918 Ops/s | |
| test_td3_speed[True-backward] | 3.2144ms | 3.1487ms | 317.5925 Ops/s | 297.9734 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 73.8719ms | 25.4426ms | 39.3041 Ops/s | 39.1117 Ops/s | |
| test_cql_speed[False-None] | 18.3187ms | 17.7283ms | 56.4071 Ops/s | 56.0474 Ops/s | |
| test_cql_speed[False-backward] | 24.0049ms | 23.1262ms | 43.2411 Ops/s | 42.2238 Ops/s | |
| test_cql_speed[True-None] | 3.4022ms | 3.2847ms | 304.4439 Ops/s | 281.9197 Ops/s | |
| test_cql_speed[True-backward] | 5.9864ms | 5.3843ms | 185.7245 Ops/s | 174.3002 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 19.4274ms | 12.0839ms | 82.7548 Ops/s | 83.2283 Ops/s | |
| test_a2c_speed[False-None] | 4.0207ms | 3.3729ms | 296.4771 Ops/s | 297.7744 Ops/s | |
| test_a2c_speed[False-backward] | 6.8347ms | 6.3870ms | 156.5683 Ops/s | 150.6744 Ops/s | |
| test_a2c_speed[True-None] | 1.4639ms | 1.3458ms | 743.0430 Ops/s | 729.7385 Ops/s | |
| test_a2c_speed[True-backward] | 3.4883ms | 3.1834ms | 314.1265 Ops/s | 311.7182 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.1787ms | 1.0055ms | 994.4896 Ops/s | 994.8987 Ops/s | |
| test_ppo_speed[False-None] | 4.0942ms | 3.9480ms | 253.2955 Ops/s | 251.7072 Ops/s | |
| test_ppo_speed[False-backward] | 7.9923ms | 7.3911ms | 135.2979 Ops/s | 138.5974 Ops/s | |
| test_ppo_speed[True-None] | 1.5776ms | 1.4531ms | 688.2011 Ops/s | 687.1273 Ops/s | |
| test_ppo_speed[True-backward] | 3.3208ms | 3.2852ms | 304.3984 Ops/s | 298.1566 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.1980ms | 1.0546ms | 948.2294 Ops/s | 909.4386 Ops/s | |
| test_reinforce_speed[False-None] | 2.5740ms | 2.3434ms | 426.7338 Ops/s | 420.1233 Ops/s | |
| test_reinforce_speed[False-backward] | 3.8486ms | 3.4031ms | 293.8454 Ops/s | 282.2635 Ops/s | |
| test_reinforce_speed[True-None] | 1.4141ms | 1.3199ms | 757.6539 Ops/s | 770.5362 Ops/s | |
| test_reinforce_speed[True-backward] | 2.9805ms | 2.9327ms | 340.9860 Ops/s | 331.0514 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 0.4529s | 10.4586ms | 95.6150 Ops/s | 104.0287 Ops/s | |
| test_iql_speed[False-None] | 9.9704ms | 9.6652ms | 103.4636 Ops/s | 103.0250 Ops/s | |
| test_iql_speed[False-backward] | 13.9227ms | 13.4837ms | 74.1635 Ops/s | 74.2332 Ops/s | |
| test_iql_speed[True-None] | 2.3601ms | 2.2153ms | 451.3968 Ops/s | 447.2570 Ops/s | |
| test_iql_speed[True-backward] | 5.0471ms | 4.7660ms | 209.8203 Ops/s | 200.4161 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 18.3044ms | 10.7188ms | 93.2939 Ops/s | 94.5986 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.3897ms | 6.1399ms | 162.8699 Ops/s | 159.8555 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8874ms | 0.3818ms | 2.6193 KOps/s | 3.1647 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6379ms | 0.3692ms | 2.7087 KOps/s | 3.3158 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.3004ms | 6.0133ms | 166.2982 Ops/s | 166.0873 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.3832ms | 0.3187ms | 3.1380 KOps/s | 2.8614 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7169ms | 0.2946ms | 3.3945 KOps/s | 3.2311 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7522ms | 1.4380ms | 695.3886 Ops/s | 707.6580 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4332ms | 1.2179ms | 821.1017 Ops/s | 731.5519 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.4089ms | 6.1994ms | 161.3060 Ops/s | 161.5305 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9332ms | 0.4755ms | 2.1032 KOps/s | 2.2899 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7953ms | 0.5260ms | 1.9013 KOps/s | 2.1942 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2132ms | 6.0526ms | 165.2173 Ops/s | 165.7334 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8876ms | 0.3216ms | 3.1094 KOps/s | 3.1321 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5244ms | 0.3166ms | 3.1587 KOps/s | 2.8617 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.3058ms | 5.9935ms | 166.8480 Ops/s | 168.1736 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.8377ms | 0.3167ms | 3.1578 KOps/s | 2.7970 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5753ms | 0.3212ms | 3.1136 KOps/s | 3.1344 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.3354ms | 6.1976ms | 161.3538 Ops/s | 161.3699 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.9631ms | 0.4661ms | 2.1455 KOps/s | 2.1897 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7029ms | 0.4892ms | 2.0442 KOps/s | 2.0041 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.6500s | 18.1184ms | 55.1925 Ops/s | 191.4180 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.3424ms | 2.0434ms | 489.3779 Ops/s | 495.3038 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 9.4998ms | 1.3066ms | 765.3587 Ops/s | 993.2289 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 7.1461ms | 5.2460ms | 190.6200 Ops/s | 190.5996 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9628ms | 1.8132ms | 551.5059 Ops/s | 498.7986 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.1093ms | 0.9266ms | 1.0792 KOps/s | 777.6168 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.6007s | 17.3173ms | 57.7458 Ops/s | 48.1891 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.2563ms | 1.9569ms | 511.0154 Ops/s | 455.9667 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.2058ms | 1.1473ms | 871.6345 Ops/s | 876.3190 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 39.4911ms | 37.0136ms | 27.0171 Ops/s | 26.3768 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.6466ms | 18.2195ms | 54.8864 Ops/s | 54.5077 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 40.8604ms | 37.6575ms | 26.5552 Ops/s | 25.9542 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.7622ms | 18.6228ms | 53.6976 Ops/s | 53.0594 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 42.4893ms | 39.9210ms | 25.0495 Ops/s | 24.7443 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 22.0778ms | 20.8876ms | 47.8753 Ops/s | 48.7144 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.9271ms | 0.2373ms | 4.2148 KOps/s | 4.4948 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6624ms | 1.4179ms | 705.2898 Ops/s | 691.4757 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7204ms | 2.2947ms | 435.7922 Ops/s | 432.1262 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1208ms | 2.9520ms | 338.7558 Ops/s | 337.1903 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.5015ms | 0.1514ms | 6.6054 KOps/s | 6.4509 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3807ms | 0.2087ms | 4.7915 KOps/s | 4.4692 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.0897ms | 1.8167ms | 550.4414 Ops/s | 557.3680 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5532ms | 1.3418ms | 745.2599 Ops/s | 776.6210 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.3742ms | 1.1788ms | 848.3427 Ops/s | 856.6189 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.8650ms | 3.6995ms | 270.3038 Ops/s | 274.1923 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.3122ms | 5.7723ms | 173.2400 Ops/s | 172.1791 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.3110ms | 7.1071ms | 140.7035 Ops/s | 133.3587 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4622ms | 0.2759ms | 3.6248 KOps/s | 3.4394 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6829ms | 1.4843ms | 673.7138 Ops/s | 636.2263 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.9868ms | 2.4061ms | 415.6152 Ops/s | 415.5374 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3923ms | 3.1736ms | 315.1027 Ops/s | 313.3269 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 35.3399ms | 34.8313ms | 28.7098 Ops/s | 28.6965 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 69.1895ms | 68.2090ms | 14.6608 Ops/s | 14.2748 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 40.9820ms | 39.4420ms | 25.3537 Ops/s | 25.0612 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 78.5831ms | 77.7333ms | 12.8645 Ops/s | 12.9219 Ops/s | |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 60.2530ms | 58.4997ms | 17.0941 Ops/s | 17.2069 Ops/s | |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1183s | 0.1155s | 8.6561 Ops/s | 8.5729 Ops/s | |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 61.5590ms | 59.8362ms | 16.7123 Ops/s | 16.4630 Ops/s | |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1226s | 0.1204s | 8.3027 Ops/s | 8.3909 Ops/s |
Cache device-specific bounds tensors in Bounded._get_space_bounds() to avoid .to(device) calls during CUDA graph capture.

Previously, Bounded._project() and Bounded.is_in() called .to(device) on low/high bounds during every forward pass. This created DeviceCopy operations that are incompatible with CUDA graph capture, causing:
- "operation not permitted when stream is capturing" errors
- Graph partitioning warnings reducing performance

The fix adds lazy per-device caching: during warmup, the cache is populated with .to() results. During capture and replay, cached tensors are returned directly, avoiding the problematic device copies.

The cache is excluded from serialization via __getstate__ to avoid pickling CUDA tensors, which could cause issues when loading on different devices or machines.

Co-authored-by: Cursor <cursoragent@cursor.com>
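The caching pattern described in the commit message can be sketched in plain Python. This is an illustration under stated assumptions, not the torchrl implementation: `FakeTensor`, `BoundedSketch`, and the `_bounds_cache` attribute are hypothetical names, and `FakeTensor` only mimics a tensor's `.to(device)` so that device copies can be counted without a GPU. Only `_get_space_bounds` and `__getstate__` are names taken from the commit message.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only counts device copies."""

    copies = 0  # class-wide count of .to() calls that changed device

    def __init__(self, values, device="cpu"):
        self.values = list(values)
        self.device = device

    def to(self, device):
        if device == self.device:
            return self  # same device: no copy needed
        FakeTensor.copies += 1
        return FakeTensor(self.values, device)


class BoundedSketch:
    """Illustrative only; not the torchrl Bounded class."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self._bounds_cache = {}  # assumed attribute name: device -> (low, high)

    def _get_space_bounds(self, device):
        cached = self._bounds_cache.get(device)
        if cached is None:
            # First call for this device (warmup): pay for the copies once.
            cached = (self.low.to(device), self.high.to(device))
            self._bounds_cache[device] = cached
        return cached

    def __getstate__(self):
        # pickle calls this hook; drop the cache so device tensors are not serialized.
        state = self.__dict__.copy()
        state["_bounds_cache"] = {}
        return state


spec = BoundedSketch(FakeTensor([-1.0]), FakeTensor([1.0]))
spec._get_space_bounds("cuda:0")  # warmup populates the cache
copies_after_warmup = FakeTensor.copies
spec._get_space_bounds("cuda:0")  # later calls reuse the cached pair
copies_after_reuse = FakeTensor.copies
state = spec.__getstate__()  # what pickle would serialize
```

After warmup, repeated lookups for the same device trigger no further `.to()` calls, which is the property that matters during capture and replay. The serialized state carries an empty cache, so it is rebuilt lazily on whatever device the object is next used with.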
3926f01 to 06b14cd
Summary

Fix CUDA graph capture compatibility for the `Bounded` spec by caching device-specific bounds tensors.

Problem:
- `Bounded._project()` and `Bounded.is_in()` called `.to(device)` on `low`/`high` bounds during every forward pass
- This created `DeviceCopy` operations incompatible with CUDA graph capture

Solution:
- A `_get_space_bounds(device)` helper that caches `(low, high)` tensors per device

Changes
- `Bounded._get_space_bounds()` method added for lazy per-device bounds caching
- `Bounded._project()` updated to use cached bounds
- `Bounded.is_in()` updated to use cached bounds

Test plan
- Verified on cluster with CUDA graph capture
- Also tested with Dreamer collector policy compilation with `cudagraphs=True`

Made with Cursor
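The scripts used in the test plan are not shown here. As a rough sketch of a comparable check, assuming only public PyTorch APIs, the snippet below moves the bounds to the target device once before capture, captures a projection with `torch.cuda.CUDAGraph`, and replays it. The `project` function is a hypothetical stand-in for `Bounded._project()`, and the code falls back to eager CPU execution when no GPU is available.

```python
import torch


def project(x, low, high):
    # Hypothetical stand-in for a bounded projection: clamp x into [low, high].
    # The bounds are expected to already live on x's device.
    return torch.maximum(torch.minimum(x, high), low)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the bounds once, before any capture, so no device copy happens inside the graph.
low = torch.tensor([-1.0]).to(device)
high = torch.tensor([1.0]).to(device)

static_x = torch.zeros(4, device=device)          # static input buffer reused on replay
new_values = torch.tensor([-2.0, -0.5, 0.5, 2.0], device=device)

if device.type == "cuda":
    # Warm up on a side stream before capturing.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            project(static_x, low, high)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = project(static_x, low, high)

    static_x.copy_(new_values)   # refill the static input in place
    graph.replay()               # rerun the captured kernels
    result = static_out.clone()
else:
    # No GPU: run the same projection eagerly.
    static_x.copy_(new_values)
    result = project(static_x, low, high)
```

Calling `.to(device)` on a CPU tensor inside the `torch.cuda.graph` block is the kind of operation the PR description reports as failing during capture, which is why the bounds are prepared beforehand.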