[Performance] Faster RNNs #1732
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1732
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (8 Unrelated Failures)
As of commit 383372c with merge base 0906206:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 64.1659ms | 63.0740ms | 15.8544 Ops/s | 14.6836 Ops/s | |
test_sync | 47.2133ms | 40.1436ms | 24.9105 Ops/s | 27.1948 Ops/s | |
test_async | 74.9472ms | 33.8204ms | 29.5680 Ops/s | 29.6069 Ops/s | |
test_simple | 0.4891s | 0.4330s | 2.3094 Ops/s | 2.2391 Ops/s | |
test_transformed | 0.6503s | 0.5972s | 1.6746 Ops/s | 1.6348 Ops/s | |
test_serial | 1.3511s | 1.3062s | 0.7656 Ops/s | 0.7272 Ops/s | |
test_parallel | 1.3570s | 1.3144s | 0.7608 Ops/s | 0.7606 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1428ms | 21.8280μs | 45.8127 KOps/s | 45.9460 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 38.1110μs | 13.1503μs | 76.0437 KOps/s | 76.1189 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 28.8640μs | 12.7018μs | 78.7290 KOps/s | 77.8758 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 42.8600μs | 7.6481μs | 130.7508 KOps/s | 128.9362 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 66.5140μs | 23.0509μs | 43.3823 KOps/s | 43.2860 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 40.5960μs | 14.5870μs | 68.5541 KOps/s | 68.5451 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 59.4710μs | 14.0399μs | 71.2254 KOps/s | 70.4017 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 27.9220μs | 9.0281μs | 110.7654 KOps/s | 108.9540 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 58.6190μs | 24.8414μs | 40.2554 KOps/s | 40.4187 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 40.3460μs | 16.0488μs | 62.3099 KOps/s | 62.0565 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 52.3380μs | 14.0572μs | 71.1381 KOps/s | 70.9842 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 27.3210μs | 9.1000μs | 109.8898 KOps/s | 110.0616 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 72.5140μs | 26.0208μs | 38.4308 KOps/s | 38.4953 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 62.3360μs | 17.2837μs | 57.8578 KOps/s | 57.1183 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 37.6000μs | 15.2465μs | 65.5890 KOps/s | 65.1209 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 48.7210μs | 10.2202μs | 97.8452 KOps/s | 95.7264 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 71.0920μs | 24.7482μs | 40.4070 KOps/s | 40.7186 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 34.9960μs | 15.9040μs | 62.8773 KOps/s | 61.6696 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 52.5370μs | 16.4908μs | 60.6400 KOps/s | 60.3425 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 28.4730μs | 10.3532μs | 96.5888 KOps/s | 95.3432 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 58.1480μs | 25.9707μs | 38.5049 KOps/s | 38.9946 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 36.8880μs | 17.2250μs | 58.0550 KOps/s | 57.8578 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 55.2190μs | 17.6616μs | 56.6200 KOps/s | 56.9846 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 29.4750μs | 11.5327μs | 86.7098 KOps/s | 86.4967 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 68.1860μs | 27.2904μs | 36.6430 KOps/s | 36.8233 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 70.6840μs | 18.2187μs | 54.8887 KOps/s | 53.3471 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 36.9990μs | 17.6423μs | 56.6819 KOps/s | 56.4148 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 57.1670μs | 11.4624μs | 87.2419 KOps/s | 86.4171 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 65.4310μs | 28.3160μs | 35.3158 KOps/s | 35.6526 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 62.6970μs | 19.8041μs | 50.4946 KOps/s | 50.3332 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 62.7060μs | 18.9183μs | 52.8589 KOps/s | 53.7593 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 34.6140μs | 12.7367μs | 78.5130 KOps/s | 78.5891 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 12.8939ms | 11.8846ms | 84.1426 Ops/s | 83.9162 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 35.3123ms | 27.7774ms | 36.0005 Ops/s | 38.2026 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.2531ms | 0.1756ms | 5.6936 KOps/s | 5.5336 KOps/s | |
test_values[td1_return_estimate-False-False] | 25.8484ms | 25.3479ms | 39.4511 Ops/s | 39.7437 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 35.3737ms | 27.7551ms | 36.0295 Ops/s | 37.5707 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 35.8830ms | 35.3378ms | 28.2983 Ops/s | 28.4351 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 35.7831ms | 27.8661ms | 35.8859 Ops/s | 37.4642 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.9521ms | 8.0259ms | 124.5968 Ops/s | 125.8645 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 10.2126ms | 1.8926ms | 528.3609 Ops/s | 519.6546 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5332ms | 0.4244ms | 2.3562 KOps/s | 2.3105 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 43.9820ms | 38.3013ms | 26.1088 Ops/s | 26.8299 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.5404ms | 2.6219ms | 381.4086 Ops/s | 381.9022 Ops/s | |
test_dqn_speed | 10.2216ms | 1.6258ms | 615.0898 Ops/s | 604.6451 Ops/s | |
test_ddpg_speed | 12.1207ms | 3.6455ms | 274.3082 Ops/s | 274.6707 Ops/s | |
test_sac_speed | 78.0308ms | 10.9177ms | 91.5940 Ops/s | 97.5800 Ops/s | |
test_redq_speed | 27.5107ms | 19.0123ms | 52.5977 Ops/s | 52.1183 Ops/s | |
test_redq_deprec_speed | 23.4420ms | 15.0264ms | 66.5496 Ops/s | 65.8915 Ops/s | |
test_td3_speed | 18.0218ms | 10.4729ms | 95.4845 Ops/s | 94.3396 Ops/s | |
test_cql_speed | 47.3083ms | 38.6661ms | 25.8625 Ops/s | 25.3669 Ops/s | |
test_a2c_speed | 16.2878ms | 8.1006ms | 123.4472 Ops/s | 72.4170 Ops/s | |
test_ppo_speed | 17.0793ms | 8.3906ms | 119.1808 Ops/s | 117.5203 Ops/s | |
test_reinforce_speed | 15.9125ms | 7.1545ms | 139.7719 Ops/s | 139.3745 Ops/s | |
test_iql_speed | 42.7150ms | 34.1893ms | 29.2489 Ops/s | 29.0171 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.9773ms | 1.8568ms | 538.5752 Ops/s | 493.2184 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.1009s | 2.1576ms | 463.4781 Ops/s | 506.2262 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.2240ms | 1.9824ms | 504.4361 Ops/s | 503.3468 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.3932ms | 1.8665ms | 535.7528 Ops/s | 465.9432 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 98.1648ms | 2.1441ms | 466.3896 Ops/s | 500.7125 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 2.9999ms | 1.9666ms | 508.5025 Ops/s | 508.6317 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.3913ms | 1.8504ms | 540.4356 Ops/s | 545.7313 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 99.8966ms | 2.1920ms | 456.2040 Ops/s | 505.9534 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2.9439ms | 1.9805ms | 504.9262 Ops/s | 508.3101 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.5308ms | 1.8642ms | 536.4214 Ops/s | 542.8105 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.1044s | 2.1984ms | 454.8701 Ops/s | 491.8336 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 2.9102ms | 1.9770ms | 505.8130 Ops/s | 507.9670 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.3991ms | 1.8600ms | 537.6389 Ops/s | 530.9514 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.1055s | 2.2338ms | 447.6629 Ops/s | 505.3037 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.3809ms | 1.9824ms | 504.4270 Ops/s | 503.1591 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.3928ms | 1.8849ms | 530.5182 Ops/s | 541.2887 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.1047s | 2.1633ms | 462.2481 Ops/s | 511.2612 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2.7457ms | 1.9586ms | 510.5582 Ops/s | 509.2546 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1473s | 16.7387ms | 59.7416 Ops/s | 57.9970 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1030s | 15.8159ms | 63.2274 Ops/s | 63.6203 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1001s | 15.7780ms | 63.3793 Ops/s | 63.7730 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1001s | 15.8805ms | 62.9704 Ops/s | 71.6455 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 99.6975ms | 15.7293ms | 63.5755 Ops/s | 62.7885 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 98.2683ms | 15.6716ms | 63.8096 Ops/s | 63.1373 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1024s | 15.7388ms | 63.5374 Ops/s | 63.2792 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1027s | 17.7215ms | 56.4286 Ops/s | 63.3636 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1018s | 15.8206ms | 63.2089 Ops/s | 63.2628 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1286s | 0.1276s | 7.8387 Ops/s | 7.9352 Ops/s | |
test_sync | 0.1035s | 0.1025s | 9.7521 Ops/s | 9.4995 Ops/s | |
test_async | 0.2747s | 0.1013s | 9.8718 Ops/s | 9.9094 Ops/s | |
test_single_pixels | 0.1357s | 0.1353s | 7.3914 Ops/s | 6.7296 Ops/s | |
test_sync_pixels | 96.2275ms | 95.3724ms | 10.4852 Ops/s | 10.4999 Ops/s | |
test_async_pixels | 0.2498s | 92.1901ms | 10.8471 Ops/s | 10.8136 Ops/s | |
test_simple | 0.9928s | 0.9382s | 1.0659 Ops/s | 1.0887 Ops/s | |
test_transformed | 1.2452s | 1.1831s | 0.8452 Ops/s | 0.8514 Ops/s | |
test_serial | 2.7244s | 2.6217s | 0.3814 Ops/s | 0.3773 Ops/s | |
test_parallel | 2.6005s | 2.5174s | 0.3972 Ops/s | 0.3975 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1066ms | 33.6404μs | 29.7262 KOps/s | 30.3768 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 51.6210μs | 19.6560μs | 50.8752 KOps/s | 51.1062 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 47.8810μs | 19.2425μs | 51.9683 KOps/s | 52.8304 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 38.5410μs | 11.3846μs | 87.8381 KOps/s | 88.1763 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 57.6420μs | 35.1438μs | 28.4545 KOps/s | 28.6411 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 44.9500μs | 21.6892μs | 46.1059 KOps/s | 46.0833 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 50.6210μs | 21.6389μs | 46.2130 KOps/s | 47.9023 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 33.4400μs | 13.3467μs | 74.9246 KOps/s | 74.9402 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 69.4200μs | 37.6978μs | 26.5267 KOps/s | 26.7853 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 45.6810μs | 23.8423μs | 41.9423 KOps/s | 42.3346 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 71.5300μs | 21.1906μs | 47.1906 KOps/s | 47.2039 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 44.6510μs | 13.4178μs | 74.5279 KOps/s | 74.4363 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 69.6610μs | 39.2170μs | 25.4992 KOps/s | 25.7909 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 62.2810μs | 25.6749μs | 38.9486 KOps/s | 39.3988 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 57.6800μs | 23.0967μs | 43.2963 KOps/s | 43.7986 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 42.3210μs | 15.2396μs | 65.6186 KOps/s | 66.2093 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 68.4210μs | 37.6604μs | 26.5531 KOps/s | 27.1070 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 53.8200μs | 23.7149μs | 42.1676 KOps/s | 42.8198 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 62.1800μs | 25.6522μs | 38.9830 KOps/s | 40.6276 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 38.7000μs | 15.2191μs | 65.7067 KOps/s | 66.4633 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 80.3110μs | 39.0703μs | 25.5949 KOps/s | 26.0996 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 56.9910μs | 25.7080μs | 38.8984 KOps/s | 39.2218 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 52.7310μs | 26.9675μs | 37.0817 KOps/s | 37.7768 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 44.0900μs | 16.9347μs | 59.0502 KOps/s | 59.1566 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 68.2720μs | 40.8300μs | 24.4918 KOps/s | 24.8342 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 58.1400μs | 27.2023μs | 36.7616 KOps/s | 36.4576 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 51.7600μs | 27.1397μs | 36.8464 KOps/s | 37.8655 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 44.1800μs | 16.9282μs | 59.0732 KOps/s | 59.6708 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 76.3800μs | 42.1745μs | 23.7110 KOps/s | 23.8361 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 55.7110μs | 29.2995μs | 34.1303 KOps/s | 34.6178 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 58.3110μs | 28.4964μs | 35.0922 KOps/s | 36.3273 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 43.1100μs | 18.9800μs | 52.6870 KOps/s | 53.5311 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 27.0852ms | 26.6389ms | 37.5390 Ops/s | 39.1844 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 0.1003s | 3.6073ms | 277.2148 Ops/s | 91.0625 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.1431ms | 69.0716μs | 14.4777 KOps/s | 14.7815 KOps/s | |
test_values[td1_return_estimate-False-False] | 58.2017ms | 57.7733ms | 17.3090 Ops/s | 17.8141 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 2.0505ms | 1.8057ms | 553.7981 Ops/s | 556.0273 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 95.3040ms | 93.3819ms | 10.7087 Ops/s | 11.0316 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 2.0857ms | 1.8038ms | 554.3923 Ops/s | 552.0846 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.6183ms | 25.3779ms | 39.4043 Ops/s | 39.1497 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.8993ms | 0.7475ms | 1.3378 KOps/s | 1.3256 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7588ms | 0.6974ms | 1.4339 KOps/s | 1.4443 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5506ms | 1.4946ms | 669.0898 Ops/s | 673.6982 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9890ms | 0.7183ms | 1.3922 KOps/s | 1.3950 KOps/s | |
test_dqn_speed | 8.0848ms | 1.5029ms | 665.3943 Ops/s | 612.0605 Ops/s | |
test_ddpg_speed | 4.4461ms | 3.3759ms | 296.2137 Ops/s | 294.9319 Ops/s | |
test_sac_speed | 10.4121ms | 9.5279ms | 104.9551 Ops/s | 104.9302 Ops/s | |
test_redq_speed | 18.4687ms | 16.9026ms | 59.1625 Ops/s | 59.8014 Ops/s | |
test_redq_deprec_speed | 13.9432ms | 13.0756ms | 76.4782 Ops/s | 75.6222 Ops/s | |
test_td3_speed | 19.2978ms | 9.7598ms | 102.4607 Ops/s | 102.7311 Ops/s | |
test_cql_speed | 32.5304ms | 31.4896ms | 31.7566 Ops/s | 31.0982 Ops/s | |
test_a2c_speed | 8.2376ms | 7.1167ms | 140.5143 Ops/s | 138.2081 Ops/s | |
test_ppo_speed | 8.4743ms | 7.4999ms | 133.3355 Ops/s | 136.2593 Ops/s | |
test_reinforce_speed | 7.3595ms | 6.1454ms | 162.7234 Ops/s | 162.2473 Ops/s | |
test_iql_speed | 28.5740ms | 27.3240ms | 36.5978 Ops/s | 36.5266 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.9214ms | 2.4887ms | 401.8229 Ops/s | 398.8449 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 4.1663ms | 2.6940ms | 371.1988 Ops/s | 336.0483 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.7173ms | 2.6592ms | 376.0521 Ops/s | 374.1238 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1486ms | 2.5072ms | 398.8556 Ops/s | 397.6370 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.6310ms | 2.6726ms | 374.1740 Ops/s | 333.1903 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.9533ms | 2.6917ms | 371.5122 Ops/s | 374.5692 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1317ms | 2.5140ms | 397.7662 Ops/s | 398.7505 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 3.8354ms | 2.6796ms | 373.1902 Ops/s | 374.0578 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.2158ms | 2.6928ms | 371.3653 Ops/s | 372.8537 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.2418ms | 2.5084ms | 398.6537 Ops/s | 397.3889 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.7603ms | 2.6939ms | 371.2152 Ops/s | 372.9937 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.5342ms | 2.6973ms | 370.7471 Ops/s | 372.8397 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1893ms | 2.5171ms | 397.2884 Ops/s | 395.7339 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.8802ms | 2.6910ms | 371.6108 Ops/s | 372.9256 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.7666ms | 2.6952ms | 371.0277 Ops/s | 372.9339 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.2146ms | 2.5249ms | 396.0539 Ops/s | 400.0923 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 3.8503ms | 2.6997ms | 370.4099 Ops/s | 371.8926 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.8711ms | 2.7008ms | 370.2548 Ops/s | 372.7423 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1945s | 18.7925ms | 53.2128 Ops/s | 54.3001 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1224s | 15.1014ms | 66.2191 Ops/s | 58.7659 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1183s | 17.0335ms | 58.7080 Ops/s | 58.9240 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1200s | 17.1001ms | 58.4791 Ops/s | 58.3218 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1195s | 17.1781ms | 58.2135 Ops/s | 66.9865 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1191s | 17.1885ms | 58.1785 Ops/s | 58.3511 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1188s | 17.1288ms | 58.3812 Ops/s | 58.6521 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1201s | 17.1114ms | 58.4405 Ops/s | 58.5974 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1184s | 17.0642ms | 58.6022 Ops/s | 58.3415 Ops/s |
@@ -1342,7 +1346,7 @@ def forward(self, tensordict: TensorDictBase):
 # if splits is not None:
 #     value = torch.nn.utils.rnn.pack_padded_sequence(value, splits, batch_first=True)
 if is_init.any() and hidden is not None:
-    hidden[is_init] = 0
+    hidden = torch.where(is_init, 0, hidden)
@albertbou92 this too
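The change from an in-place masked write to `torch.where` in the hunk above can be illustrated with a small standalone comparison. This is a minimal sketch, not the PR's code: the tensor shapes, the helper names `reset_hidden_inplace` and `reset_hidden_where`, and the explicit reshaping of the mask are assumptions made for the example.

```python
import torch


def reset_hidden_inplace(hidden: torch.Tensor, is_init: torch.Tensor) -> torch.Tensor:
    # Masked in-place write (the style of the removed line), done on a copy
    # so the caller's tensor is left untouched.
    hidden = hidden.clone()
    hidden[is_init] = 0
    return hidden


def reset_hidden_where(hidden: torch.Tensor, is_init: torch.Tensor) -> torch.Tensor:
    # Out-of-place version (the style of the added line).
    # Assumption: is_init has shape [batch]; reshape it so that it
    # broadcasts against hidden of shape [batch, ...].
    mask = is_init.view(-1, *([1] * (hidden.dim() - 1)))
    return torch.where(mask, torch.zeros_like(hidden), hidden)


hidden = torch.randn(4, 2, 8)  # hypothetical [batch, layers, features]
is_init = torch.tensor([True, False, False, True])

out_inplace = reset_hidden_inplace(hidden, is_init)
out_where = reset_hidden_where(hidden, is_init)
same = torch.equal(out_inplace, out_where)
```

Both helpers zero the hidden state of the entries flagged by `is_init` and leave the others unchanged. The `torch.where` form never mutates its input, which is presumably why it suits a PR focused on `torch.compile` and `vmap`.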
 prob = dist.probs
-log_prob = torch.log(torch.where(prob == 0, 1e-8, prob))
+log_prob = prob.clamp_min(torch.finfo(prob.dtype).resolution)
@vmoens Where did the log go?
You're right!
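For reference, the point raised in this exchange is that the clamped probabilities still need to go through `torch.log`. A minimal sketch of one way to combine the two, assuming a floating-point `prob` tensor (the helper name `safe_log_prob` is hypothetical, and this is not necessarily the fix that was merged):

```python
import torch


def safe_log_prob(prob: torch.Tensor) -> torch.Tensor:
    # Clamp to the dtype's resolution first so that log(0) = -inf cannot occur,
    # then take the log, which the added line in the diff omitted.
    eps = torch.finfo(prob.dtype).resolution
    return torch.log(prob.clamp_min(eps))


prob = torch.tensor([0.0, 0.25, 0.75])
log_prob = safe_log_prob(prob)
```

Unlike the original `torch.where(prob == 0, 1e-8, prob)`, clamping also raises tiny non-zero probabilities below the resolution, so the two variants can differ slightly for very small inputs.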
The following code snippet tests lstm with compile and vmap (results below)
## Results of forward calls:
## Results of forward + backward calls:
In short: compile does a good job of reducing the (otherwise very high) compute time of the LSTM.
The backward pass benefits from it too (5x slower with compile vs. 15x without). For vmap calls, compile is of little help (it isn't clear why), whether compile wraps vmap or the other way around.