[Feature] Auto-batching inference server: Monarch transport #3496
Open
vmoens wants to merge 4 commits into gh/vmoens/238/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3496
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures
As of commit afcd83c with merge base 266e4aa:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 79.1073μs | 78.3351μs | 12.7657 KOps/s | 12.5799 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1364ms | 0.1355ms | 7.3814 KOps/s | 7.2570 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1004s | 99.9200ms | 10.0080 Ops/s | 9.9547 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.3939μs | 2.3827μs | 419.6911 KOps/s | 419.5212 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 38.6377μs | 38.3305μs | 26.0889 KOps/s | 26.0055 KOps/s | |
| test_simple | 0.5293s | 0.5282s | 1.8934 Ops/s | 1.8025 Ops/s | |
| test_transformed | 1.0581s | 1.0545s | 0.9483 Ops/s | 0.9347 Ops/s | |
| test_serial | 1.6240s | 1.6124s | 0.6202 Ops/s | 0.6104 Ops/s | |
| test_parallel | 1.1037s | 1.0127s | 0.9875 Ops/s | 0.9852 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.1764ms | 41.6180μs | 24.0281 KOps/s | 24.3212 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 55.5710μs | 22.8869μs | 43.6930 KOps/s | 43.1974 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 56.2610μs | 23.6663μs | 42.2542 KOps/s | 42.3721 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 42.7800μs | 13.0366μs | 76.7070 KOps/s | 77.0425 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 77.6310μs | 44.9967μs | 22.2238 KOps/s | 22.6882 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 50.9510μs | 25.4987μs | 39.2177 KOps/s | 38.9354 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 61.7410μs | 26.6905μs | 37.4665 KOps/s | 38.8234 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 47.2410μs | 15.6826μs | 63.7648 KOps/s | 65.1403 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 85.2310μs | 47.7273μs | 20.9524 KOps/s | 21.3317 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 59.3210μs | 28.6064μs | 34.9572 KOps/s | 34.9750 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 56.4310μs | 26.4875μs | 37.7537 KOps/s | 39.1539 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 47.0610μs | 15.3436μs | 65.1737 KOps/s | 65.2486 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 85.8410μs | 50.3587μs | 19.8575 KOps/s | 20.2698 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 62.0510μs | 30.9334μs | 32.3276 KOps/s | 32.6436 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 62.8110μs | 28.8080μs | 34.7125 KOps/s | 36.0489 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 47.5310μs | 18.0708μs | 55.3378 KOps/s | 55.8231 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 85.0710μs | 46.9523μs | 21.2982 KOps/s | 21.6781 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 56.9810μs | 28.2852μs | 35.3542 KOps/s | 34.5516 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.4604ms | 30.3240μs | 32.9772 KOps/s | 33.2902 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 47.3600μs | 17.2058μs | 58.1198 KOps/s | 57.1094 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 94.8810μs | 49.5527μs | 20.1806 KOps/s | 20.2716 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 61.6600μs | 30.7238μs | 32.5481 KOps/s | 32.5573 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 65.0010μs | 32.0272μs | 31.2235 KOps/s | 31.2457 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 57.9810μs | 19.5352μs | 51.1896 KOps/s | 50.9237 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 88.5020μs | 52.2940μs | 19.1226 KOps/s | 19.2926 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 82.8710μs | 33.2392μs | 30.0850 KOps/s | 29.6607 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 67.0310μs | 32.4154μs | 30.8495 KOps/s | 31.1540 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 46.8210μs | 19.6756μs | 50.8243 KOps/s | 50.8378 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 87.8610μs | 54.2650μs | 18.4281 KOps/s | 18.7530 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 65.8310μs | 35.9172μs | 27.8418 KOps/s | 28.1103 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 74.1410μs | 33.9388μs | 29.4648 KOps/s | 29.3852 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 66.2710μs | 21.4776μs | 46.5602 KOps/s | 45.5494 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7047s | 0.7018s | 1.4250 Ops/s | 1.3866 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.6886s | 0.5922s | 1.6885 Ops/s | 1.7031 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.6775s | 1.5980s | 0.6258 Ops/s | 0.6271 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4546s | 1.3742s | 0.7277 Ops/s | 0.7265 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9131s | 1.8313s | 0.5460 Ops/s | 0.5475 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.6901s | 1.6092s | 0.6214 Ops/s | 0.6231 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6711s | 4.5208s | 0.2212 Ops/s | 0.2218 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.3604s | 4.2957s | 0.2328 Ops/s | 0.2307 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.8940s | 1.8212s | 0.5491 Ops/s | 0.5250 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6232s | 1.5406s | 0.6491 Ops/s | 0.6542 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 9.7803ms | 9.6008ms | 104.1575 Ops/s | 104.3828 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.4372ms | 17.5167ms | 57.0883 Ops/s | 54.6831 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2414ms | 0.1281ms | 7.8051 KOps/s | 7.7865 KOps/s | |
| test_values[td1_return_estimate-False-False] | 26.1156ms | 25.8645ms | 38.6631 Ops/s | 38.8286 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 18.3005ms | 17.6524ms | 56.6495 Ops/s | 53.6661 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 40.7508ms | 38.4963ms | 25.9765 Ops/s | 26.2079 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.8348ms | 17.6460ms | 56.6702 Ops/s | 54.1704 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.5086ms | 8.4436ms | 118.4333 Ops/s | 117.7006 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.6813ms | 1.4884ms | 671.8814 Ops/s | 655.7469 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5646ms | 0.3965ms | 2.5219 KOps/s | 2.5162 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 35.3274ms | 34.8063ms | 28.7304 Ops/s | 28.8324 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.9628ms | 1.7201ms | 581.3575 Ops/s | 581.7121 Ops/s | |
| test_dqn_speed[False-None] | 1.7383ms | 1.3438ms | 744.1352 Ops/s | 740.4481 Ops/s | |
| test_dqn_speed[False-backward] | 1.8814ms | 1.8419ms | 542.9229 Ops/s | 538.9086 Ops/s | |
| test_dqn_speed[True-None] | 0.7314ms | 0.5883ms | 1.6998 KOps/s | 1.8295 KOps/s | |
| test_dqn_speed[True-backward] | 1.0468ms | 0.9904ms | 1.0097 KOps/s | 843.6236 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.9471ms | 0.5329ms | 1.8766 KOps/s | 1.7728 KOps/s | |
| test_ddpg_speed[False-None] | 3.2432ms | 2.7757ms | 360.2672 Ops/s | 359.3202 Ops/s | |
| test_ddpg_speed[False-backward] | 4.1784ms | 3.9814ms | 251.1677 Ops/s | 254.2580 Ops/s | |
| test_ddpg_speed[True-None] | 1.7903ms | 1.3878ms | 720.5470 Ops/s | 702.1077 Ops/s | |
| test_ddpg_speed[True-backward] | 2.3921ms | 2.3454ms | 426.3754 Ops/s | 383.6957 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 2.1108ms | 1.4012ms | 713.6991 Ops/s | 730.9596 Ops/s | |
| test_sac_speed[False-None] | 8.5964ms | 7.7079ms | 129.7371 Ops/s | 131.2115 Ops/s | |
| test_sac_speed[False-backward] | 11.0227ms | 10.8008ms | 92.5856 Ops/s | 92.8045 Ops/s | |
| test_sac_speed[True-None] | 2.5009ms | 2.1165ms | 472.4797 Ops/s | 463.0426 Ops/s | |
| test_sac_speed[True-backward] | 4.1342ms | 4.0090ms | 249.4372 Ops/s | 228.0679 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.5169ms | 2.1241ms | 470.7896 Ops/s | 465.1369 Ops/s | |
| test_redq_speed[False-None] | 13.4611ms | 10.4351ms | 95.8308 Ops/s | 98.1135 Ops/s | |
| test_redq_speed[False-backward] | 18.3752ms | 17.5385ms | 57.0175 Ops/s | 58.6387 Ops/s | |
| test_redq_speed[True-None] | 4.8199ms | 4.4147ms | 226.5167 Ops/s | 218.1683 Ops/s | |
| test_redq_speed[True-backward] | 10.3064ms | 9.6446ms | 103.6849 Ops/s | 107.5496 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.8135ms | 4.4094ms | 226.7899 Ops/s | 227.8350 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.1514ms | 10.6562ms | 93.8417 Ops/s | 92.3321 Ops/s | |
| test_redq_deprec_speed[False-backward] | 15.8239ms | 15.3271ms | 65.2437 Ops/s | 63.6433 Ops/s | |
| test_redq_deprec_speed[True-None] | 3.9934ms | 3.6188ms | 276.3324 Ops/s | 278.0852 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.6345ms | 7.3436ms | 136.1734 Ops/s | 140.1071 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.9504ms | 3.5208ms | 284.0250 Ops/s | 277.6237 Ops/s | |
| test_td3_speed[False-None] | 7.8209ms | 7.6793ms | 130.2196 Ops/s | 129.9196 Ops/s | |
| test_td3_speed[False-backward] | 10.9868ms | 10.5169ms | 95.0855 Ops/s | 94.8861 Ops/s | |
| test_td3_speed[True-None] | 1.9145ms | 1.8449ms | 542.0257 Ops/s | 542.5123 Ops/s | |
| test_td3_speed[True-backward] | 3.7088ms | 3.5940ms | 278.2425 Ops/s | 276.2918 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.8244ms | 1.7882ms | 559.2253 Ops/s | 562.2002 Ops/s | |
| test_cql_speed[False-None] | 27.7039ms | 25.2477ms | 39.6076 Ops/s | 40.5776 Ops/s | |
| test_cql_speed[False-backward] | 39.6198ms | 35.1964ms | 28.4120 Ops/s | 29.4069 Ops/s | |
| test_cql_speed[True-None] | 12.8122ms | 12.0514ms | 82.9782 Ops/s | 86.0700 Ops/s | |
| test_cql_speed[True-backward] | 17.8688ms | 17.4689ms | 57.2446 Ops/s | 54.4752 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.6690ms | 12.2093ms | 81.9049 Ops/s | 80.9236 Ops/s | |
| test_a2c_speed[False-None] | 7.3104ms | 5.3841ms | 185.7327 Ops/s | 187.0257 Ops/s | |
| test_a2c_speed[False-backward] | 12.1733ms | 11.7252ms | 85.2862 Ops/s | 84.7869 Ops/s | |
| test_a2c_speed[True-None] | 4.0268ms | 3.6597ms | 273.2427 Ops/s | 260.3005 Ops/s | |
| test_a2c_speed[True-backward] | 9.0432ms | 8.5824ms | 116.5175 Ops/s | 114.7628 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 4.0061ms | 3.6665ms | 272.7419 Ops/s | 266.4232 Ops/s | |
| test_ppo_speed[False-None] | 6.2808ms | 5.8019ms | 172.3564 Ops/s | 168.6916 Ops/s | |
| test_ppo_speed[False-backward] | 12.9735ms | 12.2308ms | 81.7605 Ops/s | 80.5185 Ops/s | |
| test_ppo_speed[True-None] | 5.0627ms | 3.6848ms | 271.3833 Ops/s | 267.1032 Ops/s | |
| test_ppo_speed[True-backward] | 8.9277ms | 8.5187ms | 117.3888 Ops/s | 114.8908 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 4.0028ms | 3.5980ms | 277.9319 Ops/s | 274.0400 Ops/s | |
| test_reinforce_speed[False-None] | 4.8474ms | 4.5047ms | 221.9921 Ops/s | 213.7225 Ops/s | |
| test_reinforce_speed[False-backward] | 7.6607ms | 7.3510ms | 136.0355 Ops/s | 135.1959 Ops/s | |
| test_reinforce_speed[True-None] | 3.3457ms | 2.9036ms | 344.3953 Ops/s | 343.5182 Ops/s | |
| test_reinforce_speed[True-backward] | 8.0202ms | 7.7457ms | 129.1041 Ops/s | 124.4511 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.2478ms | 2.8831ms | 346.8504 Ops/s | 349.7981 Ops/s | |
| test_iql_speed[False-None] | 24.4922ms | 19.7970ms | 50.5126 Ops/s | 50.0389 Ops/s | |
| test_iql_speed[False-backward] | 36.7798ms | 30.5457ms | 32.7378 Ops/s | 32.7472 Ops/s | |
| test_iql_speed[True-None] | 11.0912ms | 8.5166ms | 117.4179 Ops/s | 116.2120 Ops/s | |
| test_iql_speed[True-backward] | 16.8223ms | 16.4734ms | 60.7039 Ops/s | 60.3185 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.7676ms | 8.5123ms | 117.4771 Ops/s | 115.8096 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2254ms | 5.8200ms | 171.8227 Ops/s | 169.3567 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.1413ms | 0.3055ms | 3.2733 KOps/s | 3.1952 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7973ms | 0.3314ms | 3.0171 KOps/s | 3.2470 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.2080ms | 5.6711ms | 176.3328 Ops/s | 177.2553 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.6565ms | 0.2810ms | 3.5583 KOps/s | 3.0208 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8572ms | 0.3265ms | 3.0632 KOps/s | 3.2946 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6288ms | 1.2079ms | 827.8955 Ops/s | 777.2609 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.3290ms | 1.1312ms | 883.9804 Ops/s | 823.7465 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.1630ms | 5.9727ms | 167.4284 Ops/s | 172.2936 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.7697ms | 0.4266ms | 2.3438 KOps/s | 2.0227 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8804ms | 0.4404ms | 2.2709 KOps/s | 2.1116 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1274ms | 5.6989ms | 175.4734 Ops/s | 175.9968 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.7651ms | 0.3456ms | 2.8935 KOps/s | 2.8238 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5927ms | 0.3252ms | 3.0746 KOps/s | 2.9453 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.2721ms | 5.6846ms | 175.9150 Ops/s | 178.1180 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0830ms | 0.3421ms | 2.9234 KOps/s | 2.8407 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5305ms | 0.3296ms | 3.0341 KOps/s | 3.0048 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.3108ms | 5.8447ms | 171.0942 Ops/s | 173.0704 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8982ms | 0.4552ms | 2.1966 KOps/s | 2.1271 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8624ms | 0.4429ms | 2.2579 KOps/s | 2.1151 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.3317ms | 4.9190ms | 203.2933 Ops/s | 58.2498 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 3.5374ms | 1.9249ms | 519.4992 Ops/s | 518.5107 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.2155ms | 1.1402ms | 877.0580 Ops/s | 1.1548 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5480s | 15.8220ms | 63.2029 Ops/s | 197.7694 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 8.2273ms | 1.8713ms | 534.3889 Ops/s | 546.6288 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 10.3433ms | 1.1905ms | 839.9835 Ops/s | 1.1479 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 8.1628ms | 5.1540ms | 194.0250 Ops/s | 193.1750 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 10.0636ms | 2.0350ms | 491.3889 Ops/s | 526.9559 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.6901ms | 1.0547ms | 948.1580 Ops/s | 945.2307 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 36.9332ms | 34.8268ms | 28.7135 Ops/s | 27.7726 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 18.8214ms | 17.3486ms | 57.6416 Ops/s | 33.7095 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 39.7906ms | 35.9939ms | 27.7825 Ops/s | 27.0374 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.3915ms | 17.8108ms | 56.1456 Ops/s | 53.9865 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 41.1202ms | 37.9408ms | 26.3568 Ops/s | 25.8977 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.9063ms | 19.1113ms | 52.3251 Ops/s | 48.8668 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8570ms | 0.2132ms | 4.6912 KOps/s | 4.4900 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6392ms | 1.3626ms | 733.9021 Ops/s | 715.2888 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.5899ms | 2.3823ms | 419.7605 Ops/s | 435.7678 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.3548ms | 2.8991ms | 344.9290 Ops/s | 342.9313 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2499ms | 0.1303ms | 7.6738 KOps/s | 7.7549 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.6572ms | 0.1847ms | 5.4139 KOps/s | 5.6820 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9063ms | 1.7947ms | 557.1954 Ops/s | 572.8000 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4412ms | 1.3045ms | 766.5770 Ops/s | 763.9825 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.5235ms | 1.0947ms | 913.4836 Ops/s | 925.0883 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 4.1252ms | 3.5406ms | 282.4344 Ops/s | 287.7729 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 5.9526ms | 5.5733ms | 179.4273 Ops/s | 179.1551 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.2342ms | 6.8587ms | 145.7997 Ops/s | 147.2607 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.7201ms | 0.2883ms | 3.4686 KOps/s | 3.7144 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 2.0195ms | 1.5203ms | 657.7824 Ops/s | 661.0849 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.9942ms | 2.5462ms | 392.7378 Ops/s | 413.5260 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.5947ms | 3.1613ms | 316.3238 Ops/s | 319.6581 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 33.4064ms | 32.6009ms | 30.6740 Ops/s | 31.0374 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 64.9357ms | 64.2082ms | 15.5743 Ops/s | 15.7920 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.4280ms | 37.3986ms | 26.7390 Ops/s | 27.3890 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 73.8607ms | 72.9566ms | 13.7068 Ops/s | 13.8644 Ops/s | |
This was referenced Feb 12, 2026
Stack from ghstack (oldest at bottom):
Adds MonarchTransport for distributed inference on GPU clusters using
Monarch's actor model and RDMA channels. Monarch is imported lazily
at instantiation time.
Co-authored-by: Cursor <cursoragent@cursor.com>
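
For readers unfamiliar with the lazy-import pattern mentioned in the description, here is a minimal sketch of the general idea. It is not the code from this PR: `LazyBackendTransport`, its `backend_module` argument, and the `backend` property are hypothetical names chosen for illustration. The point is only that the optional dependency is imported inside `__init__`, so importing the surrounding package does not require that dependency to be installed.

```python
import importlib
from types import ModuleType


class LazyBackendTransport:
    """Hypothetical transport that defers importing its backend.

    Illustrative only: this is not the MonarchTransport class from the PR.
    """

    def __init__(self, backend_module: str) -> None:
        # Import at instantiation time rather than at module import time,
        # so users who never build this transport do not need the package.
        try:
            self._backend: ModuleType = importlib.import_module(backend_module)
        except ImportError as err:
            raise ImportError(
                f"{type(self).__name__} requires the optional package "
                f"'{backend_module}', which could not be imported."
            ) from err

    @property
    def backend(self) -> ModuleType:
        # Expose the imported module for later use by the transport.
        return self._backend
```

In this sketch a standard-library module can stand in for the optional dependency: `LazyBackendTransport("json")` succeeds, while passing the name of a package that is not installed raises `ImportError` only at construction, with a message naming the missing package.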