
[Refactor] Performance utilities#3320

Merged: vmoens merged 2 commits into gh/vmoens/188/base from gh/vmoens/188/head on Jan 13, 2026

[ghstack-poisoned]
pytorch-bot (bot) commented on Jan 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3320

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Unrelated Failures

As of commit 926dac4 with merge base 0a98e17:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions (bot) commented on Jan 12, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total benchmarks: 164. Improved by more than 5%: 20. Worsened by more than 5%: 11.

| Benchmark | Max time | Mean time | Ops (this PR) | Ops (repo HEAD) | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.3998μs | 80.7119μs | 12.3897 KOps/s | 12.3976 KOps/s | -0.06% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1396ms | 0.1387ms | 7.2123 KOps/s | 7.1671 KOps/s | +0.63% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1188s | 0.1186s | 8.4349 Ops/s | 8.5853 Ops/s | -1.75% |
| test_tensor_to_bytestream_speed[numpy] | 2.6888μs | 2.6855μs | 372.3742 KOps/s | 378.4649 KOps/s | -1.61% |
| test_tensor_to_bytestream_speed[safetensors] | 37.1866μs | 37.0464μs | 26.9932 KOps/s | 26.9254 KOps/s | +0.25% |
| test_simple | 0.5418s | 0.5409s | 1.8486 Ops/s | 1.7608 Ops/s | +4.99% |
| test_transformed | 1.1158s | 1.1146s | 0.8972 Ops/s | 0.8715 Ops/s | +2.95% |
| test_serial | 1.6572s | 1.6488s | 0.6065 Ops/s | 0.5941 Ops/s | +2.09% |
| test_parallel | 1.2423s | 1.1450s | 0.8733 Ops/s | 0.8866 Ops/s | -1.49% |
| test_step_mdp_speed[True-True-True-True-True] | 0.3209ms | 44.3201μs | 22.5631 KOps/s | 23.5277 KOps/s | -4.10% |
| test_step_mdp_speed[True-True-True-True-False] | 56.5810μs | 25.0298μs | 39.9524 KOps/s | 40.5372 KOps/s | -1.44% |
| test_step_mdp_speed[True-True-True-False-True] | 60.0010μs | 24.6007μs | 40.6493 KOps/s | 40.5654 KOps/s | +0.21% |
| test_step_mdp_speed[True-True-True-False-False] | 39.6410μs | 13.7121μs | 72.9281 KOps/s | 74.5226 KOps/s | -2.14% |
| test_step_mdp_speed[True-True-False-True-True] | 85.6910μs | 46.5254μs | 21.4936 KOps/s | 21.5263 KOps/s | -0.15% |
| test_step_mdp_speed[True-True-False-True-False] | 56.4300μs | 27.2556μs | 36.6898 KOps/s | 37.5801 KOps/s | -2.37% |
| test_step_mdp_speed[True-True-False-False-True] | 82.8310μs | 27.4682μs | 36.4057 KOps/s | 37.1710 KOps/s | -2.06% |
| test_step_mdp_speed[True-True-False-False-False] | 45.3410μs | 16.4278μs | 60.8723 KOps/s | 60.8518 KOps/s | +0.03% |
| test_step_mdp_speed[True-False-True-True-True] | 90.6310μs | 50.1061μs | 19.9576 KOps/s | 20.5682 KOps/s | -2.97% |
| test_step_mdp_speed[True-False-True-True-False] | 64.8400μs | 30.6145μs | 32.6643 KOps/s | 33.8441 KOps/s | -3.49% |
| test_step_mdp_speed[True-False-True-False-True] | 97.4520μs | 27.0112μs | 37.0217 KOps/s | 36.8654 KOps/s | +0.42% |
| test_step_mdp_speed[True-False-True-False-False] | 45.4710μs | 16.2402μs | 61.5756 KOps/s | 62.4051 KOps/s | -1.33% |
| test_step_mdp_speed[True-False-False-True-True] | 79.2510μs | 52.2111μs | 19.1530 KOps/s | 19.1841 KOps/s | -0.16% |
| test_step_mdp_speed[True-False-False-True-False] | 72.8810μs | 32.9412μs | 30.3571 KOps/s | 30.8741 KOps/s | -1.67% |
| test_step_mdp_speed[True-False-False-False-True] | 60.6300μs | 30.2107μs | 33.1009 KOps/s | 33.9812 KOps/s | -2.59% |
| test_step_mdp_speed[True-False-False-False-False] | 47.5200μs | 19.1471μs | 52.2272 KOps/s | 53.0142 KOps/s | -1.48% |
| test_step_mdp_speed[False-True-True-True-True] | 0.1220ms | 49.9273μs | 20.0291 KOps/s | 20.3854 KOps/s | -1.75% |
| test_step_mdp_speed[False-True-True-True-False] | 55.6810μs | 30.9549μs | 32.3051 KOps/s | 33.4464 KOps/s | -3.41% |
| test_step_mdp_speed[False-True-True-False-True] | 2.3372ms | 32.2246μs | 31.0322 KOps/s | 31.7350 KOps/s | -2.21% |
| test_step_mdp_speed[False-True-True-False-False] | 52.7610μs | 18.2632μs | 54.7550 KOps/s | 55.8131 KOps/s | -1.90% |
| test_step_mdp_speed[False-True-False-True-True] | 86.1810μs | 52.6987μs | 18.9758 KOps/s | 19.3154 KOps/s | -1.76% |
| test_step_mdp_speed[False-True-False-True-False] | 70.6110μs | 33.4725μs | 29.8752 KOps/s | 30.6429 KOps/s | -2.51% |
| test_step_mdp_speed[False-True-False-False-True] | 64.5410μs | 33.9923μs | 29.4185 KOps/s | 30.0240 KOps/s | -2.02% |
| test_step_mdp_speed[False-True-False-False-False] | 50.7300μs | 21.3677μs | 46.7997 KOps/s | 49.0167 KOps/s | -4.52% |
| test_step_mdp_speed[False-False-True-True-True] | 85.0410μs | 56.6378μs | 17.6561 KOps/s | 18.2825 KOps/s | -3.43% |
| test_step_mdp_speed[False-False-True-True-False] | 60.3210μs | 35.8896μs | 27.8632 KOps/s | 28.3161 KOps/s | -1.60% |
| test_step_mdp_speed[False-False-True-False-True] | 0.1012ms | 34.1032μs | 29.3227 KOps/s | 29.9973 KOps/s | -2.25% |
| test_step_mdp_speed[False-False-True-False-False] | 48.6810μs | 20.8672μs | 47.9221 KOps/s | 48.7076 KOps/s | -1.61% |
| test_step_mdp_speed[False-False-False-True-True] | 92.0610μs | 58.0981μs | 17.2123 KOps/s | 17.7952 KOps/s | -3.28% |
| test_step_mdp_speed[False-False-False-True-False] | 68.0510μs | 38.4236μs | 26.0257 KOps/s | 27.1302 KOps/s | -4.07% |
| test_step_mdp_speed[False-False-False-False-True] | 71.9110μs | 35.8513μs | 27.8930 KOps/s | 28.1335 KOps/s | -0.85% |
| test_step_mdp_speed[False-False-False-False-False] | 52.0000μs | 23.4369μs | 42.6678 KOps/s | 44.0202 KOps/s | -3.07% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8657s | 0.7639s | 1.3091 Ops/s | 1.3015 Ops/s | +0.58% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7288s | 0.6268s | 1.5955 Ops/s | 1.5826 Ops/s | +0.81% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7669s | 1.6808s | 0.5950 Ops/s | 0.5966 Ops/s | -0.28% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5357s | 1.4528s | 0.6883 Ops/s | 0.6889 Ops/s | -0.08% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0079s | 1.9255s | 0.5193 Ops/s | 0.5182 Ops/s | +0.21% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7868s | 1.7006s | 0.5880 Ops/s | 0.5884 Ops/s | -0.07% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.8061s | 4.7173s | 0.2120 Ops/s | 0.2150 Ops/s | -1.41% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5587s | 4.4958s | 0.2224 Ops/s | 0.2242 Ops/s | -0.80% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0521s | 1.9880s | 0.5030 Ops/s | 0.5040 Ops/s | -0.19% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7400s | 1.6557s | 0.6040 Ops/s | 0.5979 Ops/s | +1.02% |
| test_values[generalized_advantage_estimate-True-True] | 10.4507ms | 10.2663ms | 97.4059 Ops/s | 97.2665 Ops/s | +0.14% |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.7592ms | 17.6215ms | 56.7487 Ops/s | 56.6764 Ops/s | +0.13% |
| test_values[td0_return_estimate-False-False] | 0.2164ms | 0.1206ms | 8.2924 KOps/s | 7.8887 KOps/s | **+5.12%** |
| test_values[td1_return_estimate-False-False] | 28.1090ms | 27.7781ms | 35.9996 Ops/s | 35.4640 Ops/s | +1.51% |
| test_values[vec_td1_return_estimate-False-False] | 18.1673ms | 17.7251ms | 56.4172 Ops/s | 56.4920 Ops/s | -0.13% |
| test_values[td_lambda_return_estimate-True-False] | 41.6248ms | 40.8812ms | 24.4611 Ops/s | 24.0543 Ops/s | +1.69% |
| test_values[vec_td_lambda_return_estimate-True-False] | 20.4553ms | 17.8019ms | 56.1739 Ops/s | 56.4292 Ops/s | -0.45% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.2480ms | 9.0929ms | 109.9755 Ops/s | 109.3085 Ops/s | +0.61% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.9145ms | 1.5177ms | 658.8770 Ops/s | 676.8375 Ops/s | -2.65% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5368ms | 0.4178ms | 2.3935 KOps/s | 2.4007 KOps/s | -0.30% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 34.6222ms | 31.4116ms | 31.8354 Ops/s | 28.6495 Ops/s | **+11.12%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.8456ms | 1.7159ms | 582.7843 Ops/s | 590.4866 Ops/s | -1.30% |
| test_dqn_speed[False-None] | 1.9182ms | 1.4051ms | 711.6813 Ops/s | 711.1330 Ops/s | +0.08% |
| test_dqn_speed[False-backward] | 2.1214ms | 1.9325ms | 517.4536 Ops/s | 500.2203 Ops/s | +3.45% |
| test_dqn_speed[True-None] | 1.1490ms | 0.5456ms | 1.8328 KOps/s | 1.7877 KOps/s | +2.52% |
| test_dqn_speed[True-backward] | 1.0332ms | 0.9902ms | 1.0099 KOps/s | 843.3162 Ops/s | **+19.75%** |
| test_dqn_speed[reduce-overhead-None] | 0.6465ms | 0.5251ms | 1.9045 KOps/s | 1.8018 KOps/s | **+5.70%** |
| test_dqn_speed[reduce-overhead-backward] | 1.0155ms | 0.9759ms | 1.0247 KOps/s | 878.8017 Ops/s | **+16.60%** |
| test_ddpg_speed[False-None] | 3.1767ms | 2.8571ms | 350.0078 Ops/s | 331.5620 Ops/s | **+5.56%** |
| test_ddpg_speed[False-backward] | 4.2395ms | 4.1034ms | 243.7026 Ops/s | 243.0667 Ops/s | +0.26% |
| test_ddpg_speed[True-None] | 1.5521ms | 1.3994ms | 714.5796 Ops/s | 698.4030 Ops/s | +2.32% |
| test_ddpg_speed[True-backward] | 2.4566ms | 2.3928ms | 417.9172 Ops/s | 353.4455 Ops/s | **+18.24%** |
| test_ddpg_speed[reduce-overhead-None] | 1.4956ms | 1.3866ms | 721.1873 Ops/s | 714.1905 Ops/s | +0.98% |
| test_ddpg_speed[reduce-overhead-backward] | 2.4313ms | 2.3898ms | 418.4476 Ops/s | 419.0134 Ops/s | -0.14% |
| test_sac_speed[False-None] | 8.5123ms | 8.0136ms | 124.7877 Ops/s | 123.3251 Ops/s | +1.19% |
| test_sac_speed[False-backward] | 11.7790ms | 11.3553ms | 88.0643 Ops/s | 88.0626 Ops/s | +0.00% |
| test_sac_speed[True-None] | 2.4508ms | 2.1608ms | 462.7896 Ops/s | 456.1971 Ops/s | +1.45% |
| test_sac_speed[True-backward] | 4.2249ms | 4.0576ms | 246.4523 Ops/s | 245.5508 Ops/s | +0.37% |
| test_sac_speed[reduce-overhead-None] | 2.3546ms | 2.1522ms | 464.6414 Ops/s | 425.0837 Ops/s | **+9.31%** |
| test_sac_speed[reduce-overhead-backward] | 4.4754ms | 4.1610ms | 240.3255 Ops/s | 220.3503 Ops/s | **+9.07%** |
| test_redq_speed[False-None] | 11.2639ms | 10.5250ms | 95.0115 Ops/s | 96.3274 Ops/s | -1.37% |
| test_redq_speed[False-backward] | 18.8164ms | 18.1390ms | 55.1298 Ops/s | 55.8919 Ops/s | -1.36% |
| test_redq_speed[True-None] | 4.6159ms | 4.4382ms | 225.3141 Ops/s | 225.8924 Ops/s | -0.26% |
| test_redq_speed[True-backward] | 11.8503ms | 9.9791ms | 100.2092 Ops/s | 92.3118 Ops/s | **+8.56%** |
| test_redq_speed[reduce-overhead-None] | 4.7517ms | 4.5359ms | 220.4630 Ops/s | 212.2312 Ops/s | +3.88% |
| test_redq_speed[reduce-overhead-backward] | 10.3479ms | 10.0838ms | 99.1690 Ops/s | 101.3318 Ops/s | -2.13% |
| test_redq_deprec_speed[False-None] | 11.8510ms | 11.2396ms | 88.9713 Ops/s | 88.1382 Ops/s | +0.95% |
| test_redq_deprec_speed[False-backward] | 16.3224ms | 15.9821ms | 62.5698 Ops/s | 61.2142 Ops/s | +2.21% |
| test_redq_deprec_speed[True-None] | 4.1675ms | 3.7370ms | 267.5913 Ops/s | 263.8562 Ops/s | +1.42% |
| test_redq_deprec_speed[True-backward] | 7.9879ms | 7.8174ms | 127.9190 Ops/s | 120.2256 Ops/s | **+6.40%** |
| test_redq_deprec_speed[reduce-overhead-None] | 3.8553ms | 3.6665ms | 272.7412 Ops/s | 273.7302 Ops/s | -0.36% |
| test_redq_deprec_speed[reduce-overhead-backward] | 7.9648ms | 7.7910ms | 128.3527 Ops/s | 121.5050 Ops/s | **+5.64%** |
| test_td3_speed[False-None] | 8.1778ms | 8.0849ms | 123.6871 Ops/s | 124.7789 Ops/s | -0.88% |
| test_td3_speed[False-backward] | 11.6269ms | 11.0621ms | 90.3984 Ops/s | 91.7031 Ops/s | -1.42% |
| test_td3_speed[True-None] | 1.9058ms | 1.8396ms | 543.6035 Ops/s | 536.9130 Ops/s | +1.25% |
| test_td3_speed[True-backward] | 3.9591ms | 3.7245ms | 268.4957 Ops/s | 272.5270 Ops/s | -1.48% |
| test_td3_speed[reduce-overhead-None] | 1.8962ms | 1.8374ms | 544.2371 Ops/s | 552.0264 Ops/s | -1.41% |
| test_td3_speed[reduce-overhead-backward] | 5.6340ms | 4.3071ms | 232.1764 Ops/s | 269.3156 Ops/s | **-13.79%** |
| test_cql_speed[False-None] | 27.0988ms | 26.1206ms | 38.2840 Ops/s | 38.2944 Ops/s | -0.03% |
| test_cql_speed[False-backward] | 36.6217ms | 35.5724ms | 28.1117 Ops/s | 28.2160 Ops/s | -0.37% |
| test_cql_speed[True-None] | 13.0244ms | 12.5264ms | 79.8315 Ops/s | 80.7336 Ops/s | -1.12% |
| test_cql_speed[True-backward] | 19.0430ms | 18.5471ms | 53.9168 Ops/s | 54.5592 Ops/s | -1.18% |
| test_cql_speed[reduce-overhead-None] | 12.9150ms | 12.5544ms | 79.6534 Ops/s | 77.7660 Ops/s | +2.43% |
| test_cql_speed[reduce-overhead-backward] | 19.0179ms | 18.5705ms | 53.8488 Ops/s | 56.5904 Ops/s | -4.84% |
| test_a2c_speed[False-None] | 5.7443ms | 5.4203ms | 184.4924 Ops/s | 187.7211 Ops/s | -1.72% |
| test_a2c_speed[False-backward] | 12.2866ms | 11.8489ms | 84.3962 Ops/s | 82.9226 Ops/s | +1.78% |
| test_a2c_speed[True-None] | 3.9742ms | 3.6981ms | 270.4114 Ops/s | 267.5523 Ops/s | +1.07% |
| test_a2c_speed[True-backward] | 8.9293ms | 8.7453ms | 114.3465 Ops/s | 114.5303 Ops/s | -0.16% |
| test_a2c_speed[reduce-overhead-None] | 4.0840ms | 3.7347ms | 267.7581 Ops/s | 268.3482 Ops/s | -0.22% |
| test_a2c_speed[reduce-overhead-backward] | 9.0334ms | 8.8876ms | 112.5166 Ops/s | 113.1642 Ops/s | -0.57% |
| test_ppo_speed[False-None] | 6.3229ms | 5.9903ms | 166.9358 Ops/s | 168.7394 Ops/s | -1.07% |
| test_ppo_speed[False-backward] | 13.2490ms | 12.7767ms | 78.2676 Ops/s | 80.4197 Ops/s | -2.68% |
| test_ppo_speed[True-None] | 3.7065ms | 3.6238ms | 275.9508 Ops/s | 274.9283 Ops/s | +0.37% |
| test_ppo_speed[True-backward] | 8.9242ms | 8.4619ms | 118.1770 Ops/s | 118.6082 Ops/s | -0.36% |
| test_ppo_speed[reduce-overhead-None] | 3.7504ms | 3.5849ms | 278.9451 Ops/s | 278.2564 Ops/s | +0.25% |
| test_ppo_speed[reduce-overhead-backward] | 9.2970ms | 8.7920ms | 113.7400 Ops/s | 111.5358 Ops/s | +1.98% |
| test_reinforce_speed[False-None] | 4.9815ms | 4.6101ms | 216.9143 Ops/s | 219.2405 Ops/s | -1.06% |
| test_reinforce_speed[False-backward] | 7.7202ms | 7.4592ms | 134.0630 Ops/s | 134.7744 Ops/s | -0.53% |
| test_reinforce_speed[True-None] | 3.3165ms | 2.8912ms | 345.8747 Ops/s | 343.3559 Ops/s | +0.73% |
| test_reinforce_speed[True-backward] | 8.0164ms | 7.8171ms | 127.9242 Ops/s | 129.8158 Ops/s | -1.46% |
| test_reinforce_speed[reduce-overhead-None] | 3.3337ms | 2.8782ms | 347.4426 Ops/s | 347.6856 Ops/s | -0.07% |
| test_reinforce_speed[reduce-overhead-backward] | 8.3561ms | 8.0481ms | 124.2533 Ops/s | 99.2389 Ops/s | **+25.21%** |
| test_iql_speed[False-None] | 26.1678ms | 20.3514ms | 49.1367 Ops/s | 49.6937 Ops/s | -1.12% |
| test_iql_speed[False-backward] | 31.4239ms | 30.7702ms | 32.4990 Ops/s | 32.6483 Ops/s | -0.46% |
| test_iql_speed[True-None] | 8.8526ms | 8.5603ms | 116.8183 Ops/s | 115.6474 Ops/s | +1.01% |
| test_iql_speed[True-backward] | 17.5451ms | 16.9876ms | 58.8666 Ops/s | 57.9373 Ops/s | +1.60% |
| test_iql_speed[reduce-overhead-None] | 8.8111ms | 8.6011ms | 116.2640 Ops/s | 115.6198 Ops/s | +0.56% |
| test_iql_speed[reduce-overhead-backward] | 17.8009ms | 17.3255ms | 57.7184 Ops/s | 56.9355 Ops/s | +1.38% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 8.3302ms | 6.0320ms | 165.7813 Ops/s | 167.4444 Ops/s | -0.99% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.5973ms | 0.3219ms | 3.1069 KOps/s | 3.4420 KOps/s | **-9.74%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5296ms | 0.2822ms | 3.5435 KOps/s | 3.7493 KOps/s | **-5.49%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9322ms | 5.7189ms | 174.8587 Ops/s | 175.6892 Ops/s | -0.47% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1126ms | 0.2854ms | 3.5040 KOps/s | 2.9315 KOps/s | **+19.53%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5014ms | 0.2878ms | 3.4750 KOps/s | 3.1511 KOps/s | **+10.28%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5032ms | 1.2773ms | 782.8941 Ops/s | 719.9661 Ops/s | **+8.74%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4301ms | 1.1886ms | 841.3073 Ops/s | 781.4188 Ops/s | **+7.66%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.6951ms | 5.9718ms | 167.4535 Ops/s | 171.0290 Ops/s | -2.09% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7963ms | 0.4926ms | 2.0301 KOps/s | 2.0807 KOps/s | -2.43% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7662ms | 0.4697ms | 2.1292 KOps/s | 2.1255 KOps/s | +0.17% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0674ms | 5.7693ms | 173.3303 Ops/s | 177.3880 Ops/s | -2.29% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9061ms | 0.2895ms | 3.4540 KOps/s | 3.3578 KOps/s | +2.87% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5530ms | 0.3057ms | 3.2711 KOps/s | 3.7865 KOps/s | **-13.61%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.9531ms | 5.7169ms | 174.9214 Ops/s | 178.8839 Ops/s | -2.22% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0573ms | 0.3712ms | 2.6938 KOps/s | 3.1910 KOps/s | **-15.58%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8204ms | 0.3610ms | 2.7700 KOps/s | 3.2991 KOps/s | **-16.04%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.9740ms | 5.8838ms | 169.9568 Ops/s | 172.4568 Ops/s | -1.45% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.6007ms | 0.5205ms | 1.9211 KOps/s | 2.2963 KOps/s | **-16.34%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6956ms | 0.5058ms | 1.9772 KOps/s | 2.4163 KOps/s | **-18.17%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.4612ms | 5.0151ms | 199.3982 Ops/s | 199.7477 Ops/s | -0.17% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 7.3227ms | 2.3347ms | 428.3225 Ops/s | 472.3508 Ops/s | **-9.32%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 8.3750ms | 1.2028ms | 831.3810 Ops/s | 792.2535 Ops/s | +4.94% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.6436s | 17.8485ms | 56.0271 Ops/s | 197.7946 Ops/s | **-71.67%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 10.1497ms | 1.9901ms | 502.4794 Ops/s | 470.1617 Ops/s | **+6.87%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.6667ms | 1.1921ms | 838.8473 Ops/s | 774.4533 Ops/s | **+8.31%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.9734ms | 5.2409ms | 190.8086 Ops/s | 53.7223 Ops/s | **+255.18%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.2360ms | 2.2480ms | 444.8309 Ops/s | 430.1583 Ops/s | +3.41% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 7.1981ms | 1.3594ms | 735.6032 Ops/s | 963.4529 Ops/s | **-23.65%** |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 37.6394ms | 33.9519ms | 29.4535 Ops/s | 29.1045 Ops/s | +1.20% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.3461ms | 17.6624ms | 56.6175 Ops/s | 55.6383 Ops/s | +1.76% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 37.2828ms | 35.0386ms | 28.5400 Ops/s | 28.2506 Ops/s | +1.02% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.5214ms | 17.9039ms | 55.8538 Ops/s | 53.9950 Ops/s | +3.44% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 38.5744ms | 36.7144ms | 27.2372 Ops/s | 26.7399 Ops/s | +1.86% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.8727ms | 19.4055ms | 51.5317 Ops/s | 50.4024 Ops/s | +2.24% |
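To make the relationship between the columns explicit, the sketch below recomputes a few Change values from the two throughput columns. It assumes Change = Ops / Ops on HEAD - 1, that Ops is simply 1 / Mean time, and that the bot counts an entry as improved or worsened when the change exceeds 5% in magnitude. All three rules are inferred from the numbers in the table, not taken from the benchmark bot's documentation.

```python
def change_percent(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR vs. repo HEAD, in percent.

    Assumption inferred from the table: Change = Ops / Ops_on_HEAD - 1.
    Both arguments must use the same unit (convert KOps/s to Ops/s first).
    """
    return (ops_pr / ops_head - 1.0) * 100.0


def ops_from_mean(mean_seconds: float) -> float:
    """Throughput (calls per second) implied by a mean time per call."""
    return 1.0 / mean_seconds


def is_flagged(change: float, threshold: float = 5.0) -> bool:
    """Assumed rule for the counted (bold) entries: |change| above 5 percent."""
    return abs(change) > threshold


# (Ops on this PR, Ops on repo HEAD), copied from the table and normalized to Ops/s.
rows = {
    "test_rb_populate[...PrioritizedReplayBuffer-ListStorage-None-400]": (190.8086, 53.7223),
    "test_rb_populate[...ListStorage-SamplerWithoutReplacement-400]": (56.0271, 197.7946),
    "test_dqn_speed[True-backward]": (1.0099e3, 843.3162),  # 1.0099 KOps/s
    "test_simple": (1.8486, 1.7608),
}

for name, (pr, head) in rows.items():
    c = change_percent(pr, head)
    print(f"{name}: {c:+.2f}% counted={is_flagged(c)}")
```

Note that `test_simple` lands at +4.99% and is therefore not counted, which is consistent with 20 improved and 11 worsened out of 164 benchmarks.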

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 13, 2026
Enhance `timeit` in `torchrl/_utils.py` to support `start()` and `elapsed()`.
Add `set_profiling_enabled` and optimize `_maybe_record_function`.

ghstack-source-id: 31cdce7
Pull-Request: #3320
vmoens merged commit 926dac4 into gh/vmoens/188/base on Jan 13, 2026
101 of 106 checks passed
vmoens deleted the gh/vmoens/188/head branch on January 13, 2026 03:45

Labels: CLA Signed
