
[Feature] AsyncBatchedCollector: coordinator loop and direct submission mode #3499

Open
vmoens wants to merge 6 commits into gh/vmoens/241/base from gh/vmoens/241/head

vmoens (Collaborator) commented Feb 12, 2026

Stack from ghstack (oldest at bottom):

Rewrite the AsyncBatchedCollector to use a coordinator thread that
pipelines env stepping and batched inference without a global sync
barrier. Add a direct=True mode where each env thread submits
directly to the InferenceServer, eliminating the coordinator thread
and its serialization overhead.
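The direct submission idea can be illustrated with the standard library alone. This is a toy sketch under stated assumptions, not the PR's implementation: `ToyInferenceServer`, `submit`, `env_thread` and `collect_direct` are hypothetical names, the "environment" is an integer counter and the "policy" always returns 1. Each env thread submits its own observation and blocks on a `Future`, while a single server thread batches whatever requests are pending at that moment. In coordinator mode an extra thread would sit between the env threads and the server; that hop is what `direct=True` removes.

```python
import queue
import threading
from concurrent.futures import Future


class ToyInferenceServer:
    """Hypothetical stand-in for a batching inference server (not torchrl's API).

    Callers submit one observation and get a Future back; one worker thread
    drains the pending requests, runs the policy once on the whole batch,
    and resolves each Future with its action.
    """

    def __init__(self, policy, max_batch_size=8, wait_s=0.001):
        self.policy = policy
        self.max_batch_size = max_batch_size
        self.wait_s = wait_s
        self.batch_sizes = []  # how many requests each policy call served
        self._requests = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def submit(self, obs):
        fut = Future()
        self._requests.put((obs, fut))
        return fut

    def _serve(self):
        while not self._stop.is_set():
            try:
                batch = [self._requests.get(timeout=0.05)]
            except queue.Empty:
                continue  # idle: loop back and re-check the stop flag
            # Gather more pending requests, waiting at most wait_s for each one.
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self._requests.get(timeout=self.wait_s))
                except queue.Empty:
                    break
            actions = self.policy([obs for obs, _ in batch])  # one batched call
            self.batch_sizes.append(len(batch))
            for (_, fut), action in zip(batch, actions):
                fut.set_result(action)

    def shutdown(self):
        self._stop.set()
        self._worker.join()


def env_thread(env_id, server, n_steps, results):
    """Direct mode: each env thread talks to the server itself, no coordinator."""
    obs = 0  # toy environment state
    for _ in range(n_steps):
        action = server.submit(obs).result()  # blocks only this env thread
        obs += action  # toy env.step
    results[env_id] = obs


def collect_direct(n_envs=8, n_steps=100):
    server = ToyInferenceServer(policy=lambda batch: [1] * len(batch))
    results = {}
    threads = [
        threading.Thread(target=env_thread, args=(i, server, n_steps, results))
        for i in range(n_envs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return results, server.batch_sizes


if __name__ == "__main__":
    results, batch_sizes = collect_direct()
    print(results)
    print(f"{len(batch_sizes)} policy calls, "
          f"mean batch size {sum(batch_sizes) / len(batch_sizes):.1f}")
```

Because no thread waits for all envs before running inference, a slow env only delays its own next action; the batch size simply adapts to however many requests are pending.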

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
AsyncBatchedCollector direct: 3183 fps (+72% vs coordinator)
AsyncBatchedCollector threading: 1850 fps (coordinator mode)
AsyncBatchedCollector mp: 1042 fps (coordinator mode)
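The relative gains implied by these throughput numbers can be checked with a few lines of arithmetic (a quick sanity check, not part of the PR's benchmark script):

```python
# Throughput figures copied from the benchmark results above (frames per second).
fps = {"direct": 3183, "threading": 1850, "mp": 1042}


def speedup_pct(new_fps, base_fps):
    """Relative throughput gain of new_fps over base_fps, in percent."""
    return 100.0 * (new_fps / base_fps - 1.0)


print(f"direct vs threading coordinator: {speedup_pct(fps['direct'], fps['threading']):+.0f}%")
print(f"threading vs mp coordinator: {speedup_pct(fps['threading'], fps['mp']):+.0f}%")
```

This reproduces the +72% quoted for direct mode and shows that, on this setup, the threading coordinator is itself about 78% faster than the multiprocessing one.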

Co-authored-by: Cursor <cursoragent@cursor.com>

[ghstack-poisoned]
pytorch-bot bot commented Feb 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3499

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit c583dcd with merge base 266e4aa:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Feb 12, 2026
…on mode

Rewrite the AsyncBatchedCollector to use a coordinator thread that
pipelines env stepping and batched inference without a global sync
barrier. Add a `direct=True` mode where each env thread submits
directly to the InferenceServer, eliminating the coordinator thread
and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
  AsyncBatchedCollector direct:    3183 fps (+72% vs coordinator)
  AsyncBatchedCollector threading: 1850 fps (coordinator mode)
  AsyncBatchedCollector mp:        1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: 225d2a4
Pull-Request: #3499
github-actions bot added the Feature (New feature) label Feb 12, 2026
github-actions bot added the Benchmarks (rl/benchmark changes) and Collectors labels and removed the Feature (New feature) label Feb 12, 2026
meta-cla bot added the CLA Signed label Feb 12, 2026 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
github-actions bot added the Feature (New feature) label Feb 12, 2026
github-actions bot (Contributor) commented Feb 12, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 173. Improved: 18. Worsened: 6.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 85.4355μs 83.9417μs 11.9130 KOps/s 12.4012 KOps/s -3.94%
test_tensor_to_bytestream_speed[torch.save] 0.1443ms 0.1439ms 6.9514 KOps/s 7.1860 KOps/s -3.27%
test_tensor_to_bytestream_speed[untyped_storage] 0.1017s 0.1013s 9.8689 Ops/s 9.7064 Ops/s +1.67%
test_tensor_to_bytestream_speed[numpy] 2.5678μs 2.5599μs 390.6396 KOps/s 403.8303 KOps/s -3.27%
test_tensor_to_bytestream_speed[safetensors] 39.0869μs 38.8879μs 25.7150 KOps/s 26.9108 KOps/s -4.44%
test_simple 0.5428s 0.5413s 1.8473 Ops/s 1.7838 Ops/s +3.56%
test_transformed 1.0765s 1.0751s 0.9301 Ops/s 0.9125 Ops/s +1.93%
test_serial 1.6451s 1.6434s 0.6085 Ops/s 0.6038 Ops/s +0.78%
test_parallel 1.1265s 1.0257s 0.9749 Ops/s 0.9832 Ops/s -0.84%
test_step_mdp_speed[True-True-True-True-True] 0.1679ms 41.5736μs 24.0537 KOps/s 24.6411 KOps/s -2.38%
test_step_mdp_speed[True-True-True-True-False] 63.2410μs 23.5299μs 42.4991 KOps/s 42.4314 KOps/s +0.16%
test_step_mdp_speed[True-True-True-False-True] 59.7810μs 23.6467μs 42.2892 KOps/s 42.9463 KOps/s -1.53%
test_step_mdp_speed[True-True-True-False-False] 49.6410μs 12.9889μs 76.9889 KOps/s 78.4521 KOps/s -1.87%
test_step_mdp_speed[True-True-False-True-True] 79.3420μs 44.6939μs 22.3744 KOps/s 22.8478 KOps/s -2.07%
test_step_mdp_speed[True-True-False-True-False] 63.1820μs 26.0605μs 38.3722 KOps/s 39.3679 KOps/s -2.53%
test_step_mdp_speed[True-True-False-False-True] 61.4410μs 26.0096μs 38.4473 KOps/s 40.3306 KOps/s -4.67%
test_step_mdp_speed[True-True-False-False-False] 48.9500μs 15.4105μs 64.8909 KOps/s 65.5748 KOps/s -1.04%
test_step_mdp_speed[True-False-True-True-True] 80.8710μs 47.8732μs 20.8885 KOps/s 21.6170 KOps/s -3.37%
test_step_mdp_speed[True-False-True-True-False] 64.8010μs 29.2892μs 34.1423 KOps/s 35.1547 KOps/s -2.88%
test_step_mdp_speed[True-False-True-False-True] 56.9910μs 26.0726μs 38.3544 KOps/s 39.1060 KOps/s -1.92%
test_step_mdp_speed[True-False-True-False-False] 41.0310μs 15.7919μs 63.3236 KOps/s 64.4999 KOps/s -1.82%
test_step_mdp_speed[True-False-False-True-True] 82.5520μs 49.7043μs 20.1190 KOps/s 20.6984 KOps/s -2.80%
test_step_mdp_speed[True-False-False-True-False] 66.3610μs 31.3011μs 31.9478 KOps/s 32.4991 KOps/s -1.70%
test_step_mdp_speed[True-False-False-False-True] 62.7810μs 28.4167μs 35.1905 KOps/s 35.4090 KOps/s -0.62%
test_step_mdp_speed[True-False-False-False-False] 42.0500μs 18.2045μs 54.9315 KOps/s 56.1476 KOps/s -2.17%
test_step_mdp_speed[False-True-True-True-True] 78.9610μs 47.2680μs 21.1560 KOps/s 21.1728 KOps/s -0.08%
test_step_mdp_speed[False-True-True-True-False] 46.4310μs 28.6657μs 34.8849 KOps/s 35.6420 KOps/s -2.12%
test_step_mdp_speed[False-True-True-False-True] 2.5331ms 30.4827μs 32.8055 KOps/s 32.6281 KOps/s +0.54%
test_step_mdp_speed[False-True-True-False-False] 46.3110μs 17.4470μs 57.3165 KOps/s 57.7083 KOps/s -0.68%
test_step_mdp_speed[False-True-False-True-True] 81.2910μs 49.6059μs 20.1589 KOps/s 19.5078 KOps/s +3.34%
test_step_mdp_speed[False-True-False-True-False] 60.6310μs 31.2928μs 31.9562 KOps/s 31.6995 KOps/s +0.81%
test_step_mdp_speed[False-True-False-False-True] 58.9910μs 32.3176μs 30.9429 KOps/s 30.6625 KOps/s +0.91%
test_step_mdp_speed[False-True-False-False-False] 54.3110μs 19.7764μs 50.5652 KOps/s 50.1203 KOps/s +0.89%
test_step_mdp_speed[False-False-True-True-True] 94.9310μs 52.9113μs 18.8996 KOps/s 19.1889 KOps/s -1.51%
test_step_mdp_speed[False-False-True-True-False] 61.8410μs 34.3334μs 29.1262 KOps/s 29.1031 KOps/s +0.08%
test_step_mdp_speed[False-False-True-False-True] 78.2910μs 32.4154μs 30.8496 KOps/s 30.6955 KOps/s +0.50%
test_step_mdp_speed[False-False-True-False-False] 53.6900μs 20.2274μs 49.4379 KOps/s 49.5752 KOps/s -0.28%
test_step_mdp_speed[False-False-False-True-True] 90.0320μs 54.5743μs 18.3236 KOps/s 18.2620 KOps/s +0.34%
test_step_mdp_speed[False-False-False-True-False] 0.1100ms 35.9868μs 27.7879 KOps/s 27.2942 KOps/s +1.81%
test_step_mdp_speed[False-False-False-False-True] 67.6810μs 34.6701μs 28.8433 KOps/s 28.9698 KOps/s -0.44%
test_step_mdp_speed[False-False-False-False-False] 53.7910μs 22.1387μs 45.1698 KOps/s 44.8807 KOps/s +0.64%
test_non_tensor_env_rollout_speed[1000-single-True] 0.8436s 0.7416s 1.3484 Ops/s 1.3558 Ops/s -0.55%
test_non_tensor_env_rollout_speed[1000-single-False] 0.6932s 0.6090s 1.6420 Ops/s 1.6617 Ops/s -1.19%
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7661s 1.6731s 0.5977 Ops/s 0.6139 Ops/s -2.64%
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4822s 1.4067s 0.7109 Ops/s 0.7132 Ops/s -0.33%
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9435s 1.8626s 0.5369 Ops/s 0.5350 Ops/s +0.36%
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7533s 1.6611s 0.6020 Ops/s 0.6077 Ops/s -0.94%
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6698s 4.6021s 0.2173 Ops/s 0.2170 Ops/s +0.12%
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4581s 4.3620s 0.2293 Ops/s 0.2264 Ops/s +1.24%
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9338s 1.8449s 0.5420 Ops/s 0.5341 Ops/s +1.49%
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6276s 1.5501s 0.6451 Ops/s 0.6390 Ops/s +0.95%
test_values[generalized_advantage_estimate-True-True] 10.0011ms 9.7916ms 102.1281 Ops/s 102.6640 Ops/s -0.52%
test_values[vec_generalized_advantage_estimate-True-True] 20.1155ms 17.4168ms 57.4157 Ops/s 56.2353 Ops/s +2.10%
test_values[td0_return_estimate-False-False] 0.2075ms 0.1254ms 7.9713 KOps/s 4.7963 KOps/s +66.20%
test_values[td1_return_estimate-False-False] 26.7421ms 26.3658ms 37.9279 Ops/s 38.4612 Ops/s -1.39%
test_values[vec_td1_return_estimate-False-False] 20.5360ms 17.7245ms 56.4191 Ops/s 56.1317 Ops/s +0.51%
test_values[td_lambda_return_estimate-True-False] 39.6530ms 38.9424ms 25.6789 Ops/s 26.0006 Ops/s -1.24%
test_values[vec_td_lambda_return_estimate-True-False] 18.4655ms 17.5660ms 56.9281 Ops/s 55.8712 Ops/s +1.89%
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.6859ms 8.6264ms 115.9230 Ops/s 116.3894 Ops/s -0.40%
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7215ms 1.5087ms 662.8021 Ops/s 661.7838 Ops/s +0.15%
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4881ms 0.4053ms 2.4674 KOps/s 2.4899 KOps/s -0.90%
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.1992ms 34.5617ms 28.9338 Ops/s 28.7967 Ops/s +0.48%
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8853ms 1.7202ms 581.3412 Ops/s 585.5632 Ops/s -0.72%
test_dqn_speed[False-None] 1.4967ms 1.3611ms 734.6944 Ops/s 731.0159 Ops/s +0.50%
test_dqn_speed[False-backward] 1.9664ms 1.8620ms 537.0648 Ops/s 541.9404 Ops/s -0.90%
test_dqn_speed[True-None] 0.6844ms 0.5424ms 1.8435 KOps/s 1.8134 KOps/s +1.66%
test_dqn_speed[True-backward] 1.1274ms 0.9854ms 1.0148 KOps/s 849.4576 Ops/s +19.46%
test_dqn_speed[reduce-overhead-None] 0.5798ms 0.5273ms 1.8965 KOps/s 1.8534 KOps/s +2.33%
test_ddpg_speed[False-None] 3.1046ms 2.7804ms 359.6616 Ops/s 352.6151 Ops/s +2.00%
test_ddpg_speed[False-backward] 4.0538ms 3.9363ms 254.0487 Ops/s 254.2734 Ops/s -0.09%
test_ddpg_speed[True-None] 1.5716ms 1.3964ms 716.1406 Ops/s 720.5199 Ops/s -0.61%
test_ddpg_speed[True-backward] 2.3883ms 2.3459ms 426.2803 Ops/s 366.3585 Ops/s +16.36%
test_ddpg_speed[reduce-overhead-None] 1.5002ms 1.3814ms 723.8966 Ops/s 700.3871 Ops/s +3.36%
test_sac_speed[False-None] 8.3609ms 7.7292ms 129.3799 Ops/s 129.5013 Ops/s -0.09%
test_sac_speed[False-backward] 11.1905ms 10.8433ms 92.2232 Ops/s 92.9572 Ops/s -0.79%
test_sac_speed[True-None] 2.3166ms 2.1374ms 467.8517 Ops/s 452.4028 Ops/s +3.41%
test_sac_speed[True-backward] 4.0785ms 3.9659ms 252.1482 Ops/s 235.5251 Ops/s +7.06%
test_sac_speed[reduce-overhead-None] 2.2938ms 2.1287ms 469.7811 Ops/s 475.5219 Ops/s -1.21%
test_redq_speed[False-None] 13.5202ms 10.2690ms 97.3807 Ops/s 100.3654 Ops/s -2.97%
test_redq_speed[False-backward] 18.7753ms 17.5491ms 56.9829 Ops/s 59.4519 Ops/s -4.15%
test_redq_speed[True-None] 4.7231ms 4.3835ms 228.1287 Ops/s 226.9820 Ops/s +0.51%
test_redq_speed[True-backward] 10.0445ms 9.7723ms 102.3298 Ops/s 103.5432 Ops/s -1.17%
test_redq_speed[reduce-overhead-None] 4.8742ms 4.3858ms 228.0063 Ops/s 223.5392 Ops/s +2.00%
test_redq_deprec_speed[False-None] 11.3546ms 10.7311ms 93.1873 Ops/s 93.6663 Ops/s -0.51%
test_redq_deprec_speed[False-backward] 16.1891ms 15.4068ms 64.9063 Ops/s 66.2743 Ops/s -2.06%
test_redq_deprec_speed[True-None] 3.8098ms 3.6277ms 275.6598 Ops/s 270.9693 Ops/s +1.73%
test_redq_deprec_speed[True-backward] 7.5851ms 7.3709ms 135.6691 Ops/s 136.9931 Ops/s -0.97%
test_redq_deprec_speed[reduce-overhead-None] 3.8498ms 3.5441ms 282.1577 Ops/s 283.6548 Ops/s -0.53%
test_td3_speed[False-None] 7.9697ms 7.7779ms 128.5695 Ops/s 128.4114 Ops/s +0.12%
test_td3_speed[False-backward] 10.8510ms 10.5028ms 95.2131 Ops/s 94.8757 Ops/s +0.36%
test_td3_speed[True-None] 2.2732ms 1.8403ms 543.3861 Ops/s 542.4061 Ops/s +0.18%
test_td3_speed[True-backward] 3.8484ms 3.6264ms 275.7579 Ops/s 246.7815 Ops/s +11.74%
test_td3_speed[reduce-overhead-None] 1.8439ms 1.7826ms 560.9874 Ops/s 549.6944 Ops/s +2.05%
test_cql_speed[False-None] 29.2419ms 25.9021ms 38.6069 Ops/s 39.2085 Ops/s -1.53%
test_cql_speed[False-backward] 41.3921ms 35.5981ms 28.0914 Ops/s 28.8870 Ops/s -2.75%
test_cql_speed[True-None] 12.8935ms 12.3717ms 80.8295 Ops/s 82.0127 Ops/s -1.44%
test_cql_speed[True-backward] 18.5168ms 18.1509ms 55.0938 Ops/s 54.1750 Ops/s +1.70%
test_cql_speed[reduce-overhead-None] 12.7091ms 12.3760ms 80.8017 Ops/s 80.5922 Ops/s +0.26%
test_a2c_speed[False-None] 5.6249ms 5.3467ms 187.0301 Ops/s 187.5239 Ops/s -0.26%
test_a2c_speed[False-backward] 12.1369ms 11.5859ms 86.3119 Ops/s 86.4450 Ops/s -0.15%
test_a2c_speed[True-None] 4.1721ms 3.7192ms 268.8732 Ops/s 271.3160 Ops/s -0.90%
test_a2c_speed[True-backward] 9.1001ms 8.5640ms 116.7682 Ops/s 106.5249 Ops/s +9.62%
test_a2c_speed[reduce-overhead-None] 4.1763ms 3.7042ms 269.9619 Ops/s 269.5602 Ops/s +0.15%
test_ppo_speed[False-None] 6.2403ms 5.8635ms 170.5463 Ops/s 172.4129 Ops/s -1.08%
test_ppo_speed[False-backward] 12.6487ms 12.2446ms 81.6686 Ops/s 82.2948 Ops/s -0.76%
test_ppo_speed[True-None] 3.9716ms 3.6038ms 277.4881 Ops/s 278.3743 Ops/s -0.32%
test_ppo_speed[True-backward] 8.5365ms 8.3648ms 119.5487 Ops/s 120.1945 Ops/s -0.54%
test_ppo_speed[reduce-overhead-None] 3.7535ms 3.6183ms 276.3762 Ops/s 276.8814 Ops/s -0.18%
test_reinforce_speed[False-None] 4.9189ms 4.5114ms 221.6605 Ops/s 226.2260 Ops/s -2.02%
test_reinforce_speed[False-backward] 7.4412ms 7.2149ms 138.6019 Ops/s 139.7823 Ops/s -0.84%
test_reinforce_speed[True-None] 3.4590ms 2.9003ms 344.7940 Ops/s 336.9547 Ops/s +2.33%
test_reinforce_speed[True-backward] 8.2408ms 7.7120ms 129.6676 Ops/s 132.5259 Ops/s -2.16%
test_reinforce_speed[reduce-overhead-None] 3.2251ms 2.8614ms 349.4770 Ops/s 354.4638 Ops/s -1.41%
test_iql_speed[False-None] 20.3158ms 19.6724ms 50.8326 Ops/s 50.9146 Ops/s -0.16%
test_iql_speed[False-backward] 30.9451ms 29.7813ms 33.5781 Ops/s 33.5752 Ops/s +0.01%
test_iql_speed[True-None] 8.9041ms 8.5168ms 117.4157 Ops/s 117.9447 Ops/s -0.45%
test_iql_speed[True-backward] 16.9866ms 16.5553ms 60.4036 Ops/s 58.8125 Ops/s +2.71%
test_iql_speed[reduce-overhead-None] 9.0711ms 8.5627ms 116.7857 Ops/s 117.3078 Ops/s -0.45%
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.5044ms 6.0522ms 165.2279 Ops/s 164.1403 Ops/s +0.66%
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.1670ms 0.2807ms 3.5626 KOps/s 3.3658 KOps/s +5.85%
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5237ms 0.3120ms 3.2049 KOps/s 3.8117 KOps/s -15.92%
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2291ms 5.8928ms 169.6984 Ops/s 171.2132 Ops/s -0.88%
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7945ms 0.3024ms 3.3064 KOps/s 3.5550 KOps/s -6.99%
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6317ms 0.2993ms 3.3406 KOps/s 3.9043 KOps/s -14.44%
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6533ms 1.2559ms 796.2576 Ops/s 804.0953 Ops/s -0.97%
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5566ms 1.1667ms 857.1273 Ops/s 868.3154 Ops/s -1.29%
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.4585ms 6.1287ms 163.1671 Ops/s 167.0838 Ops/s -2.34%
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0487ms 0.4263ms 2.3459 KOps/s 2.2328 KOps/s +5.07%
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8596ms 0.5353ms 1.8682 KOps/s 2.4346 KOps/s -23.26%
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2993ms 5.8757ms 170.1931 Ops/s 173.0149 Ops/s -1.63%
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.7038ms 0.3084ms 3.2427 KOps/s 2.7054 KOps/s +19.86%
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5759ms 0.3238ms 3.0885 KOps/s 2.7833 KOps/s +10.97%
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4909ms 5.8655ms 170.4888 Ops/s 171.4604 Ops/s -0.57%
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0356ms 0.3083ms 3.2440 KOps/s 2.7358 KOps/s +18.58%
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7012ms 0.2872ms 3.4816 KOps/s 2.8971 KOps/s +20.18%
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1998ms 6.0488ms 165.3213 Ops/s 166.8789 Ops/s -0.93%
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9232ms 0.5069ms 1.9727 KOps/s 1.9693 KOps/s +0.17%
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6856ms 0.4980ms 2.0082 KOps/s 2.0563 KOps/s -2.34%
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.6325ms 5.0919ms 196.3915 Ops/s 199.6703 Ops/s -1.64%
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.8394ms 2.1272ms 470.0997 Ops/s 452.0521 Ops/s +3.99%
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.0999ms 1.0985ms 910.3000 Ops/s 1.1540 KOps/s -21.12%
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5574s 16.2351ms 61.5950 Ops/s 58.0587 Ops/s +6.09%
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9556ms 1.7654ms 566.4516 Ops/s 513.5848 Ops/s +10.29%
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.0742ms 0.9076ms 1.1018 KOps/s 794.5530 Ops/s +38.67%
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 9.5366ms 5.3837ms 185.7467 Ops/s 190.4632 Ops/s -2.48%
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.3258ms 2.0564ms 486.2798 Ops/s 493.0474 Ops/s -1.37%
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.2197ms 1.1651ms 858.2984 Ops/s 942.8862 Ops/s -8.97%
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 40.2344ms 35.9140ms 27.8443 Ops/s 28.0547 Ops/s -0.75%
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.1506ms 17.8059ms 56.1610 Ops/s 55.5942 Ops/s +1.02%
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 40.9200ms 36.9788ms 27.0425 Ops/s 26.8316 Ops/s +0.79%
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.6681ms 18.1378ms 55.1333 Ops/s 54.5772 Ops/s +1.02%
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.2425ms 38.5927ms 25.9116 Ops/s 25.3430 Ops/s +2.24%
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.6189ms 20.1565ms 49.6117 Ops/s 49.0555 Ops/s +1.13%
test_storage_write_lazystack[50-img_shape0-small] 0.8713ms 0.2237ms 4.4698 KOps/s 2.3341 KOps/s +91.50%
test_storage_write_lazystack[100-img_shape1-atari] 1.8481ms 1.3802ms 724.5442 Ops/s 709.6567 Ops/s +2.10%
test_storage_write_lazystack[100-img_shape2-large_img] 2.7119ms 2.2625ms 441.9856 Ops/s 420.1617 Ops/s +5.19%
test_storage_write_lazystack[200-img_shape3-large_batch] 3.3935ms 2.9008ms 344.7295 Ops/s 341.3319 Ops/s +1.00%
test_storage_write_contiguous[50-img_shape0-small] 0.2487ms 0.1319ms 7.5824 KOps/s 7.6276 KOps/s -0.59%
test_storage_write_contiguous[100-img_shape1-atari] 0.3468ms 0.1762ms 5.6749 KOps/s 5.3287 KOps/s +6.50%
test_storage_write_contiguous[100-img_shape2-large_img] 2.0623ms 1.7614ms 567.7395 Ops/s 558.0763 Ops/s +1.73%
test_storage_write_contiguous[200-img_shape3-large_batch] 1.3961ms 1.2800ms 781.2352 Ops/s 777.4343 Ops/s +0.49%
test_collector_stack_then_write[50-img_shape0-small] 1.5153ms 1.1137ms 897.9403 Ops/s 893.6280 Ops/s +0.48%
test_collector_stack_then_write[100-img_shape1-atari] 4.0130ms 3.5400ms 282.4870 Ops/s 280.5371 Ops/s +0.70%
test_collector_stack_then_write[100-img_shape2-large_img] 6.5416ms 5.5577ms 179.9293 Ops/s 180.9658 Ops/s -0.57%
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4078ms 6.9119ms 144.6778 Ops/s 143.4382 Ops/s +0.86%
test_collector_lazystack_then_write[50-img_shape0-small] 0.7170ms 0.2778ms 3.6003 KOps/s 3.6253 KOps/s -0.69%
test_collector_lazystack_then_write[100-img_shape1-atari] 1.9424ms 1.4868ms 672.5713 Ops/s 657.9243 Ops/s +2.23%
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.5302ms 2.3929ms 417.9048 Ops/s 402.1758 Ops/s +3.91%
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.2960ms 3.1194ms 320.5794 Ops/s 317.2372 Ops/s +1.05%
test_collector_without_rb[100-img_shape0-atari] 33.6044ms 32.4504ms 30.8163 Ops/s 30.7904 Ops/s +0.08%
test_collector_without_rb[200-img_shape1-large_batch] 65.3389ms 63.6581ms 15.7089 Ops/s 15.6162 Ops/s +0.59%
test_collector_with_rb[100-img_shape0-atari] 37.9688ms 37.0339ms 27.0023 Ops/s 27.0913 Ops/s -0.33%
test_collector_with_rb[200-img_shape1-large_batch] 72.3797ms 71.8261ms 13.9225 Ops/s 13.8838 Ops/s +0.28%
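The Change column appears to be the relative difference between Ops and Ops on Repo HEAD once both are in the same unit, and the "Improved: 18 / Worsened: 6" summary matches exactly the rows whose change exceeds 5% in magnitude. Both points are inferred from the numbers in the table, not from the bot's source; `to_ops_per_s`, `change_pct` and `classify` are hypothetical helpers, and the 5% threshold is an assumption. A sketch reproducing a few rows:

```python
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}  # throughput units used in the table


def to_ops_per_s(value, unit):
    return value * UNIT_SCALE[unit]


def change_pct(ops, ops_head):
    """Percent change of this run's throughput vs. repo HEAD (higher is better)."""
    return 100.0 * (ops / ops_head - 1.0)


def classify(pct, threshold=5.0):
    """Assumed rule: only moves beyond +/- threshold percent are counted."""
    if pct > threshold:
        return "improved"
    if pct < -threshold:
        return "worsened"
    return "unchanged"


rows = [  # (name, Ops, Ops on Repo HEAD), copied from the CPU table
    ("test_values[td0_return_estimate-False-False]", (7.9713, "KOps/s"), (4.7963, "KOps/s")),
    ("test_dqn_speed[True-backward]", (1.0148, "KOps/s"), (849.4576, "Ops/s")),
    ("test_tensor_to_bytestream_speed[pickle]", (11.9130, "KOps/s"), (12.4012, "KOps/s")),
    ("test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]",
     (1.8682, "KOps/s"), (2.4346, "KOps/s")),
]

for name, new, head in rows:
    pct = change_pct(to_ops_per_s(*new), to_ops_per_s(*head))
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Normalising units first matters for rows such as `test_dqn_speed[True-backward]`, where this run is reported in KOps/s and HEAD in Ops/s.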

github-actions bot (Contributor) commented Feb 12, 2026

Result of GPU Benchmark Tests

Name Max Mean Ops
test_tensor_to_bytestream_speed[pickle] 83.5064μs 81.3028μs 12.2997 KOps/s
test_tensor_to_bytestream_speed[torch.save] 0.1417ms 0.1384ms 7.2230 KOps/s
test_tensor_to_bytestream_speed[untyped_storage] 0.1066s 0.1059s 9.4425 Ops/s
test_tensor_to_bytestream_speed[numpy] 2.5027μs 2.4955μs 400.7239 KOps/s
test_tensor_to_bytestream_speed[safetensors] 35.9728μs 35.7298μs 27.9878 KOps/s
test_simple 0.7897s 0.7884s 1.2684 Ops/s
test_transformed 1.3750s 1.3733s 0.7282 Ops/s
test_serial 2.4073s 2.3157s 0.4318 Ops/s
test_parallel 1.8984s 1.8222s 0.5488 Ops/s
test_step_mdp_speed[True-True-True-True-True] 0.2398ms 42.1777μs 23.7092 KOps/s
test_step_mdp_speed[True-True-True-True-False] 55.1910μs 24.7630μs 40.3828 KOps/s
test_step_mdp_speed[True-True-True-False-True] 60.3410μs 24.2732μs 41.1976 KOps/s
test_step_mdp_speed[True-True-True-False-False] 36.9210μs 13.2711μs 75.3519 KOps/s
test_step_mdp_speed[True-True-False-True-True] 84.0910μs 46.9425μs 21.3026 KOps/s
test_step_mdp_speed[True-True-False-True-False] 58.5710μs 26.6966μs 37.4579 KOps/s
test_step_mdp_speed[True-True-False-False-True] 62.0720μs 26.6078μs 37.5830 KOps/s
test_step_mdp_speed[True-True-False-False-False] 42.3810μs 15.8443μs 63.1142 KOps/s
test_step_mdp_speed[True-False-True-True-True] 91.9220μs 48.0312μs 20.8198 KOps/s
test_step_mdp_speed[True-False-True-True-False] 70.7110μs 29.4416μs 33.9655 KOps/s
test_step_mdp_speed[True-False-True-False-True] 62.7210μs 26.4236μs 37.8450 KOps/s
test_step_mdp_speed[True-False-True-False-False] 45.2210μs 16.0690μs 62.2316 KOps/s
test_step_mdp_speed[True-False-False-True-True] 94.3410μs 50.5189μs 19.7946 KOps/s
test_step_mdp_speed[True-False-False-True-False] 62.9310μs 31.7091μs 31.5367 KOps/s
test_step_mdp_speed[True-False-False-False-True] 68.2620μs 28.7948μs 34.7285 KOps/s
test_step_mdp_speed[True-False-False-False-False] 44.4700μs 18.4163μs 54.2997 KOps/s
test_step_mdp_speed[False-True-True-True-True] 83.5120μs 48.5976μs 20.5771 KOps/s
test_step_mdp_speed[False-True-True-True-False] 65.9220μs 29.4148μs 33.9964 KOps/s
test_step_mdp_speed[False-True-True-False-True] 2.4673ms 30.6199μs 32.6585 KOps/s
test_step_mdp_speed[False-True-True-False-False] 52.8410μs 17.8561μs 56.0031 KOps/s
test_step_mdp_speed[False-True-False-True-True] 0.1009ms 51.7710μs 19.3158 KOps/s
test_step_mdp_speed[False-True-False-True-False] 79.3420μs 31.8065μs 31.4401 KOps/s
test_step_mdp_speed[False-True-False-False-True] 69.1610μs 32.8569μs 30.4350 KOps/s
test_step_mdp_speed[False-True-False-False-False] 51.2310μs 20.3767μs 49.0757 KOps/s
test_step_mdp_speed[False-False-True-True-True] 98.3520μs 53.1222μs 18.8245 KOps/s
test_step_mdp_speed[False-False-True-True-False] 69.0920μs 34.5810μs 28.9176 KOps/s
test_step_mdp_speed[False-False-True-False-True] 68.9820μs 32.3965μs 30.8675 KOps/s
test_step_mdp_speed[False-False-True-False-False] 46.0610μs 19.8513μs 50.3745 KOps/s
test_step_mdp_speed[False-False-False-True-True] 84.8310μs 54.8433μs 18.2338 KOps/s
test_step_mdp_speed[False-False-False-True-False] 73.3320μs 36.9397μs 27.0711 KOps/s
test_step_mdp_speed[False-False-False-False-True] 79.2410μs 34.1336μs 29.2966 KOps/s
test_step_mdp_speed[False-False-False-False-False] 63.6020μs 22.0564μs 45.3384 KOps/s
test_non_tensor_env_rollout_speed[1000-single-True] 0.8347s 0.7355s 1.3596 Ops/s
test_non_tensor_env_rollout_speed[1000-single-False] 0.6971s 0.6006s 1.6649 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6842s 1.6074s 0.6221 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4672s 1.3867s 0.7211 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9333s 1.8508s 0.5403 Ops/s
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7127s 1.6344s 0.6118 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6809s 4.5832s 0.2182 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5440s 4.3770s 0.2285 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9094s 1.8370s 0.5444 Ops/s
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6324s 1.5454s 0.6471 Ops/s
test_values[generalized_advantage_estimate-True-True] 21.5107ms 21.1289ms 47.3286 Ops/s
test_values[vec_generalized_advantage_estimate-True-True] 0.1333s 3.5989ms 277.8593 Ops/s
test_values[td0_return_estimate-False-False] 0.1089ms 84.9912μs 11.7659 KOps/s
test_values[td1_return_estimate-False-False] 50.8436ms 50.3876ms 19.8461 Ops/s
test_values[vec_td1_return_estimate-False-False] 1.3496ms 1.1095ms 901.2894 Ops/s
test_values[td_lambda_return_estimate-True-False] 82.8291ms 82.3472ms 12.1437 Ops/s
test_values[vec_td_lambda_return_estimate-True-False] 1.3454ms 1.1077ms 902.8079 Ops/s
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.6026ms 21.3754ms 46.7827 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0464ms 0.7757ms 1.2892 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7411ms 0.6949ms 1.4391 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6122ms 1.5067ms 663.6850 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7533ms 0.7122ms 1.4041 KOps/s
test_dqn_speed[False-None] 1.6757ms 1.5330ms 652.3245 Ops/s
test_dqn_speed[False-backward] 2.5595ms 2.1947ms 455.6499 Ops/s
test_dqn_speed[True-None] 0.7598ms 0.5766ms 1.7342 KOps/s
test_dqn_speed[True-backward] 1.2418ms 1.2045ms 830.2414 Ops/s
test_dqn_speed[reduce-overhead-None] 0.7423ms 0.5825ms 1.7168 KOps/s
test_ddpg_speed[False-None] 3.2669ms 2.8950ms 345.4261 Ops/s
test_ddpg_speed[False-backward] 4.8392ms 4.3138ms 231.8136 Ops/s
test_ddpg_speed[True-None] 1.4262ms 1.3100ms 763.3498 Ops/s
test_ddpg_speed[True-backward] 2.5894ms 2.5039ms 399.3784 Ops/s
test_ddpg_speed[reduce-overhead-None] 1.6051ms 1.3376ms 747.6051 Ops/s
test_sac_speed[False-None] 8.6640ms 8.2297ms 121.5111 Ops/s
test_sac_speed[False-backward] 12.1112ms 11.5823ms 86.3390 Ops/s
test_sac_speed[True-None] 1.9528ms 1.8072ms 553.3383 Ops/s
test_sac_speed[True-backward] 3.6724ms 3.5695ms 280.1520 Ops/s
test_sac_speed[reduce-overhead-None] 19.1455ms 10.8713ms 91.9851 Ops/s
test_redq_deprec_speed[False-None] 9.9984ms 9.3098ms 107.4140 Ops/s
test_redq_deprec_speed[False-backward] 12.7612ms 12.5449ms 79.7135 Ops/s
test_redq_deprec_speed[True-None] 2.6954ms 2.5506ms 392.0695 Ops/s
test_redq_deprec_speed[True-backward] 4.6127ms 4.1755ms 239.4944 Ops/s
test_redq_deprec_speed[reduce-overhead-None] 15.8644ms 9.7373ms 102.6977 Ops/s
test_td3_speed[False-None] 8.4182ms 8.2065ms 121.8540 Ops/s
test_td3_speed[False-backward] 11.3659ms 10.6663ms 93.7532 Ops/s
test_td3_speed[True-None] 1.7203ms 1.6485ms 606.6169 Ops/s
test_td3_speed[True-backward] 3.2239ms 3.1070ms 321.8585 Ops/s
test_td3_speed[reduce-overhead-None] 46.5535ms 23.5681ms 42.4303 Ops/s
test_cql_speed[False-None] 17.5033ms 17.2180ms 58.0787 Ops/s
test_cql_speed[False-backward] 23.1175ms 22.6188ms 44.2110 Ops/s
test_cql_speed[True-None] 3.3188ms 3.2530ms 307.4058 Ops/s
test_cql_speed[True-backward] 5.8265ms 5.3803ms 185.8630 Ops/s
test_cql_speed[reduce-overhead-None] 18.7821ms 11.8336ms 84.5048 Ops/s
test_a2c_speed[False-None] 4.0591ms 3.2587ms 306.8704 Ops/s
test_a2c_speed[False-backward] 6.7370ms 6.2345ms 160.3972 Ops/s
test_a2c_speed[True-None] 1.4051ms 1.3335ms 749.9327 Ops/s
test_a2c_speed[True-backward] 3.0158ms 2.9601ms 337.8303 Ops/s
test_a2c_speed[reduce-overhead-None] 1.0611ms 0.9704ms 1.0305 KOps/s
test_ppo_speed[False-None] 4.0856ms 3.8855ms 257.3654 Ops/s
test_ppo_speed[False-backward] 7.5020ms 7.0622ms 141.5994 Ops/s
test_ppo_speed[True-None] 1.4639ms 1.3826ms 723.2694 Ops/s
test_ppo_speed[True-backward] 3.1923ms 3.1028ms 322.2870 Ops/s
test_ppo_speed[reduce-overhead-None] 1.1424ms 1.0339ms 967.2381 Ops/s
test_reinforce_speed[False-None] 2.4148ms 2.2812ms 438.3576 Ops/s
test_reinforce_speed[False-backward] 3.3636ms 3.3175ms 301.4314 Ops/s
test_reinforce_speed[True-None] 1.4411ms 1.2733ms 785.3892 Ops/s
test_reinforce_speed[True-backward] 2.9928ms 2.8981ms 345.0551 Ops/s
test_reinforce_speed[reduce-overhead-None] 17.2523ms 9.3489ms 106.9644 Ops/s
test_iql_speed[False-None] 10.6924ms 9.4356ms 105.9812 Ops/s
test_iql_speed[False-backward] 13.9284ms 13.4345ms 74.4354 Ops/s
test_iql_speed[True-None] 2.3013ms 2.1583ms 463.3223 Ops/s
test_iql_speed[True-backward] 5.0323ms 4.8396ms 206.6294 Ops/s
test_iql_speed[reduce-overhead-None] 17.6554ms 10.4620ms 95.5842 Ops/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.1999ms 5.8157ms 171.9470 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1980ms 0.3409ms 2.9335 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6161ms 0.3279ms 3.0494 KOps/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.6788ms 5.4770ms 182.5813 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.4318ms 0.2711ms 3.6885 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4808ms 0.2523ms 3.9631 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4825ms 1.2623ms 792.1927 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4363ms 1.1900ms 840.3486 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.3256ms 5.8452ms 171.0803 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.5319ms 0.4887ms 2.0460 KOps/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8341ms 0.4467ms 2.2385 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.6551ms 5.5331ms 180.7294 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8069s 1.3414ms 745.4776 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5972ms 0.3509ms 2.8495 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8905ms 5.6525ms 176.9136 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0511ms 0.2940ms 3.4009 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5305ms 0.3063ms 3.2643 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0910ms 5.8014ms 172.3711 Ops/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3592ms 0.4250ms 2.3528 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6593ms 0.4540ms 2.2025 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4327ms 4.9947ms 200.2132 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.1882ms 2.0725ms 482.5158 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 2.5652ms 0.9702ms 1.0308 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5848s 16.6971ms 59.8907 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.7581ms 1.9280ms 518.6794 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 9.4944ms 1.2776ms 782.7132 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.1811ms 5.2267ms 191.3256 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.3664ms 2.0581ms 485.8908 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3786ms 1.0870ms 919.9928 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 39.3543ms 36.0516ms 27.7380 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.3705ms 18.3632ms 54.4568 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 41.8324ms 37.8371ms 26.4291 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.6705ms 18.8473ms 53.0580 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 41.8724ms 39.7249ms 25.1732 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.7391ms 20.2564ms 49.3671 Ops/s
test_storage_write_lazystack[50-img_shape0-small] 0.8952ms 0.2254ms 4.4357 KOps/s
test_storage_write_lazystack[100-img_shape1-atari] 1.8480ms 1.4164ms 706.0161 Ops/s
test_storage_write_lazystack[100-img_shape2-large_img] 2.7079ms 2.2861ms 437.4175 Ops/s
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1556ms 2.9246ms 341.9280 Ops/s
test_storage_write_contiguous[50-img_shape0-small] 0.2865ms 0.1602ms 6.2435 KOps/s
test_storage_write_contiguous[100-img_shape1-atari] 0.5118ms 0.2424ms 4.1262 KOps/s
test_storage_write_contiguous[100-img_shape2-large_img] 1.9983ms 1.8206ms 549.2638 Ops/s
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5208ms 1.3930ms 717.8660 Ops/s
test_collector_stack_then_write[50-img_shape0-small] 1.3827ms 1.1199ms 892.9414 Ops/s
test_collector_stack_then_write[100-img_shape1-atari] 7.5444ms 3.6882ms 271.1368 Ops/s
test_collector_stack_then_write[100-img_shape2-large_img] 11.1201ms 5.6869ms 175.8437 Ops/s
test_collector_stack_then_write[200-img_shape3-large_batch] 15.0042ms 7.0086ms 142.6817 Ops/s
test_collector_lazystack_then_write[50-img_shape0-small] 0.4419ms 0.2753ms 3.6329 KOps/s
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7119ms 1.5223ms 656.9179 Ops/s
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6433ms 2.4380ms 410.1680 Ops/s
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3120ms 3.1193ms 320.5829 Ops/s
test_collector_without_rb[100-img_shape0-atari] 34.0447ms 33.1633ms 30.1538 Ops/s
test_collector_without_rb[200-img_shape1-large_batch] 65.3519ms 64.9141ms 15.4050 Ops/s
test_collector_with_rb[100-img_shape0-atari] 38.1093ms 37.3234ms 26.7928 Ops/s
test_collector_with_rb[200-img_shape1-large_batch] 74.3413ms 73.6662ms 13.5748 Ops/s
test_collector_without_rb_cuda[100-img_shape0-atari] 56.3078ms 55.8787ms 17.8959 Ops/s
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1118s 0.1114s 8.9751 Ops/s
test_collector_with_rb_cuda[100-img_shape0-atari] 58.1228ms 57.7792ms 17.3073 Ops/s
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1158s 0.1153s 8.6717 Ops/s
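The rows above have no column header in this excerpt. The numbers suggest that the last column is the reciprocal of the second one, read as a per-call time. "KOps/s" would then mean thousands of calls per second. Calling the second column a mean is an assumption, since nothing here labels it. The snippet below checks the reciprocal relationship on a few rows copied verbatim from the table.

```python
# Rows copied verbatim from the benchmark table: (name, second column in ms, last column in Ops/s).
# KOps/s values are converted to Ops/s by multiplying by 1000; 0.1114 s is written as 111.4 ms.
ROWS = [
    ("test_dqn_speed[False-backward]", 2.1947, 455.6499),
    ("test_dqn_speed[True-None]", 0.5766, 1.7342 * 1000),
    ("test_sac_speed[True-None]", 1.8072, 553.3383),
    ("test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]", 1.3414, 745.4776),
    ("test_collector_without_rb_cuda[200-img_shape1-large_batch]", 111.4, 8.9751),
]


def ops_from_time_ms(time_ms: float) -> float:
    """Calls per second implied by a per-call time given in milliseconds."""
    return 1000.0 / time_ms


max_rel_err = 0.0
for name, time_ms, reported_ops in ROWS:
    predicted = ops_from_time_ms(time_ms)
    rel_err = abs(predicted - reported_ops) / reported_ops
    max_rel_err = max(max_rel_err, rel_err)
    print(f"{name}: 1000 / {time_ms} ms = {predicted:.2f} Ops/s (reported {reported_ops:.2f})")

# The small mismatches are consistent with the table rounding each figure to a few digits.
print(f"largest relative mismatch: {max_rel_err:.1e}")
```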

vmoens added a commit that referenced this pull request Feb 12, 2026
…on mode

Rewrite the AsyncBatchedCollector to use a coordinator thread that
pipelines env stepping and batched inference without a global sync
barrier. Add a `direct=True` mode where each env thread submits
directly to the InferenceServer, eliminating the coordinator thread
and its serialization overhead.

Benchmark results (8 mock pixel envs, Nature-CNN, CPU):
  AsyncBatchedCollector direct:    3183 fps (+72% vs coordinator)
  AsyncBatchedCollector threading: 1850 fps (coordinator mode)
  AsyncBatchedCollector mp:        1042 fps (coordinator mode)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: c4d370a
Pull-Request: #3499
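The commit message compares two designs. In the first, a coordinator thread gathers observations from the env threads, runs batched inference, and hands the actions back. In the second, `direct=True`, each env thread submits its own request to the InferenceServer.

The sketch below illustrates only the direct-submission idea, using the Python standard library. `ToyInferenceServer`, `env_worker`, and the integer "environment" are hypothetical stand-ins. They are not TorchRL's `InferenceServer` or `AsyncBatchedCollector` API. A background thread drains the waiting requests, up to a maximum batch size, and makes one batched policy call. It then resolves each caller's `Future`, so an env thread blocks only on its own action.

```python
import queue
import threading
from concurrent.futures import Future


class ToyInferenceServer:
    """Hypothetical batching server (not TorchRL's API): many threads submit,
    one worker thread runs the policy on whole batches."""

    def __init__(self, policy_fn, max_batch_size=8, poll_timeout=0.001):
        self._policy_fn = policy_fn          # maps a list of observations to a list of actions
        self._max_batch_size = max_batch_size
        self._poll_timeout = poll_timeout
        self._requests = queue.Queue()       # (observation, Future) pairs
        self._stop = threading.Event()
        self.batch_sizes = []                # recorded for inspection only
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def submit(self, obs):
        """Called directly from an env thread; returns a Future for the action."""
        fut = Future()
        self._requests.put((obs, fut))
        return fut

    def _serve(self):
        while not self._stop.is_set():
            try:
                batch = [self._requests.get(timeout=self._poll_timeout)]
            except queue.Empty:
                continue
            # Greedily add requests that are already waiting, up to the batch limit.
            while len(batch) < self._max_batch_size:
                try:
                    batch.append(self._requests.get_nowait())
                except queue.Empty:
                    break
            self.batch_sizes.append(len(batch))
            actions = self._policy_fn([obs for obs, _ in batch])  # one batched call
            for (_, fut), action in zip(batch, actions):
                fut.set_result(action)

    def shutdown(self):
        self._stop.set()
        self._worker.join()


def env_worker(env_id, server, num_steps, out):
    """Toy env loop (hypothetical): the observation is an int, stepping adds the action."""
    obs = env_id * 100                        # toy reset
    for _ in range(num_steps):
        action = server.submit(obs).result()  # blocks this env thread only
        obs = obs + action                    # toy step
        out.append((env_id, obs))             # list.append is atomic in CPython


NUM_ENVS, NUM_STEPS = 8, 50
results = []
server = ToyInferenceServer(policy_fn=lambda obs_batch: [1] * len(obs_batch),
                            max_batch_size=NUM_ENVS)
threads = [threading.Thread(target=env_worker, args=(i, server, NUM_STEPS, results))
           for i in range(NUM_ENVS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
server.shutdown()
print(len(results), "transitions in", len(server.batch_sizes), "batched policy calls")
```

In this toy, batches form from whichever env threads happen to be waiting at the same moment. No coordinator thread sits between the envs and the server. The toy does not attempt to reproduce the reported fps numbers.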
vmoens added a commit that referenced this pull request Feb 13, 2026
ghstack-source-id: 784abfc
Pull-Request: #3499
Co-authored-by: Cursor <cursoragent@cursor.com>
vmoens added a commit that referenced this pull request Feb 14, 2026
ghstack-source-id: 270d3e3
Pull-Request: #3499
Co-authored-by: Cursor <cursoragent@cursor.com>
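The relative speedups quoted in the commit messages follow directly from the fps figures. The "+72% vs coordinator" claim appears to use the threading coordinator (1850 fps) as its baseline. That reading is an inference from the numbers: 1850 fps is the baseline that yields 72%, whereas against the mp coordinator the gain would be about 205%. These figures come from 8 mock pixel envs on CPU, so they may not carry over to other setups.

```python
# Throughput figures (frames per second) copied from the commit message:
# 8 mock pixel envs, Nature-CNN policy, CPU.
FPS = {
    "direct": 3183,
    "threading_coordinator": 1850,
    "mp_coordinator": 1042,
}


def relative_gain_pct(new_fps: float, baseline_fps: float) -> float:
    """Percentage throughput improvement of `new_fps` over `baseline_fps`."""
    return (new_fps / baseline_fps - 1.0) * 100.0


direct_vs_threading = relative_gain_pct(FPS["direct"], FPS["threading_coordinator"])
direct_vs_mp = relative_gain_pct(FPS["direct"], FPS["mp_coordinator"])
threading_vs_mp = relative_gain_pct(FPS["threading_coordinator"], FPS["mp_coordinator"])

print(f"direct vs threading coordinator: +{direct_vs_threading:.0f}%")  # +72%
print(f"direct vs mp coordinator:        +{direct_vs_mp:.0f}%")         # +205%
print(f"threading vs mp coordinator:     +{threading_vs_mp:.0f}%")      # +78%
```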

Labels

Benchmarks, CLA Signed, Collectors, Examples, Feature
