[Refactor] Change objectives parameter/buffer/target logic #1424

Merged
merged 8 commits into from
Aug 11, 2023


vmoens (Contributor) commented Jul 28, 2023

Description

Aims to solve #1407.

facebook-github-bot added the CLA Signed label Jul 28, 2023
vmoens added the bug label Jul 28, 2023
github-actions bot commented Jul 31, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}0$. Worsened: $\large\color{#d91a1a}5$.


| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_single | 0.1190s | 0.1181s | 8.4672 Ops/s | 8.5671 Ops/s | $\color{#d91a1a}-1.17\%$ |
| test_sync | 0.1241s | 65.7682ms | 15.2049 Ops/s | 15.2249 Ops/s | $\color{#d91a1a}-0.13\%$ |
| test_async | 0.1879s | 61.3348ms | 16.3039 Ops/s | 15.6618 Ops/s | $\color{#35bf28}+4.10\%$ |
| test_simple | 0.5717s | 0.5204s | 1.9215 Ops/s | 1.8969 Ops/s | $\color{#35bf28}+1.30\%$ |
| test_transformed | 1.3711s | 1.3310s | 0.7513 Ops/s | 0.7460 Ops/s | $\color{#35bf28}+0.70\%$ |
| test_serial | 1.6700s | 1.6661s | 0.6002 Ops/s | 0.5965 Ops/s | $\color{#35bf28}+0.61\%$ |
| test_parallel | 1.8732s | 1.5691s | 0.6373 Ops/s | 0.6553 Ops/s | $\color{#d91a1a}-2.75\%$ |
| test_step_mdp_speed[True-True-True-True-True] | 4.4538ms | 42.8241μs | 23.3513 KOps/s | 23.6261 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_step_mdp_speed[True-True-True-True-False] | 45.9000μs | 24.1028μs | 41.4889 KOps/s | 41.5822 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_step_mdp_speed[True-True-True-False-True] | 87.4000μs | 29.8768μs | 33.4708 KOps/s | 33.5009 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_step_mdp_speed[True-True-True-False-False] | 44.2000μs | 16.5531μs | 60.4116 KOps/s | 60.6526 KOps/s | $\color{#d91a1a}-0.40\%$ |
| test_step_mdp_speed[True-True-False-True-True] | 0.1039ms | 44.2310μs | 22.6086 KOps/s | 22.9220 KOps/s | $\color{#d91a1a}-1.37\%$ |
| test_step_mdp_speed[True-True-False-True-False] | 88.6990μs | 26.0943μs | 38.3225 KOps/s | 38.4858 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_step_mdp_speed[True-True-False-False-True] | 60.6000μs | 31.7422μs | 31.5038 KOps/s | 31.4848 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_step_mdp_speed[True-True-False-False-False] | 0.2296ms | 18.5953μs | 53.7772 KOps/s | 53.9366 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_step_mdp_speed[True-False-True-True-True] | 61.9000μs | 46.2926μs | 21.6017 KOps/s | 21.9266 KOps/s | $\color{#d91a1a}-1.48\%$ |
| test_step_mdp_speed[True-False-True-True-False] | 79.7000μs | 27.6936μs | 36.1095 KOps/s | 36.2650 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_step_mdp_speed[True-False-True-False-True] | 57.4000μs | 31.6362μs | 31.6093 KOps/s | 31.6505 KOps/s | $\color{#d91a1a}-0.13\%$ |
| test_step_mdp_speed[True-False-True-False-False] | 62.3000μs | 18.4133μs | 54.3085 KOps/s | 54.6899 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_step_mdp_speed[True-False-False-True-True] | 0.1063ms | 48.2704μs | 20.7166 KOps/s | 21.1168 KOps/s | $\color{#d91a1a}-1.90\%$ |
| test_step_mdp_speed[True-False-False-True-False] | 52.4990μs | 29.6320μs | 33.7473 KOps/s | 33.9072 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_step_mdp_speed[True-False-False-False-True] | 85.6000μs | 33.3282μs | 30.0046 KOps/s | 30.1834 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_step_mdp_speed[True-False-False-False-False] | 42.2000μs | 20.2245μs | 49.4450 KOps/s | 50.1540 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_step_mdp_speed[False-True-True-True-True] | 69.8000μs | 46.5575μs | 21.4788 KOps/s | 22.1300 KOps/s | $\color{#d91a1a}-2.94\%$ |
| test_step_mdp_speed[False-True-True-True-False] | 86.1000μs | 28.0250μs | 35.6824 KOps/s | 36.0203 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_step_mdp_speed[False-True-True-False-True] | 87.8990μs | 37.0893μs | 26.9620 KOps/s | 27.2153 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_step_mdp_speed[False-True-True-False-False] | 80.6000μs | 20.5079μs | 48.7616 KOps/s | 48.5657 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_step_mdp_speed[False-True-False-True-True] | 74.0000μs | 47.9369μs | 20.8607 KOps/s | 21.1656 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_step_mdp_speed[False-True-False-True-False] | 82.1990μs | 29.6627μs | 33.7124 KOps/s | 34.0559 KOps/s | $\color{#d91a1a}-1.01\%$ |
| test_step_mdp_speed[False-True-False-False-True] | 91.4000μs | 38.8008μs | 25.7726 KOps/s | 26.1488 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_step_mdp_speed[False-True-False-False-False] | 70.3000μs | 22.4897μs | 44.4648 KOps/s | 44.8245 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_step_mdp_speed[False-False-True-True-True] | 0.1066ms | 49.9203μs | 20.0319 KOps/s | 20.6704 KOps/s | $\color{#d91a1a}-3.09\%$ |
| test_step_mdp_speed[False-False-True-True-False] | 86.5000μs | 31.4357μs | 31.8110 KOps/s | 32.4569 KOps/s | $\color{#d91a1a}-1.99\%$ |
| test_step_mdp_speed[False-False-True-False-True] | 77.1000μs | 38.9099μs | 25.7004 KOps/s | 26.1568 KOps/s | $\color{#d91a1a}-1.74\%$ |
| test_step_mdp_speed[False-False-True-False-False] | 78.8990μs | 21.8772μs | 45.7097 KOps/s | 44.9628 KOps/s | $\color{#35bf28}+1.66\%$ |
| test_step_mdp_speed[False-False-False-True-True] | 74.5000μs | 51.6765μs | 19.3512 KOps/s | 19.9635 KOps/s | $\color{#d91a1a}-3.07\%$ |
| test_step_mdp_speed[False-False-False-True-False] | 62.0000μs | 33.3879μs | 29.9509 KOps/s | 30.5135 KOps/s | $\color{#d91a1a}-1.84\%$ |
| test_step_mdp_speed[False-False-False-False-True] | 1.6125ms | 40.5470μs | 24.6628 KOps/s | 25.4130 KOps/s | $\color{#d91a1a}-2.95\%$ |
| test_step_mdp_speed[False-False-False-False-False] | 2.6162ms | 24.1739μs | 41.3670 KOps/s | 41.8021 KOps/s | $\color{#d91a1a}-1.04\%$ |
| test_values[generalized_advantage_estimate-True-True] | 14.4536ms | 13.3998ms | 74.6278 Ops/s | 73.0237 Ops/s | $\color{#35bf28}+2.20\%$ |
| test_values[vec_generalized_advantage_estimate-True-True] | 56.9013ms | 50.7920ms | 19.6881 Ops/s | 19.6013 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_values[td0_return_estimate-False-False] | 0.3326ms | 0.1881ms | 5.3155 KOps/s | 5.6036 KOps/s | $\textbf{\color{#d91a1a}-5.14\%}$ |
| test_values[td1_return_estimate-False-False] | 13.3764ms | 13.0898ms | 76.3951 Ops/s | 75.9270 Ops/s | $\color{#35bf28}+0.62\%$ |
| test_values[vec_td1_return_estimate-False-False] | 56.4833ms | 50.8821ms | 19.6533 Ops/s | 19.7190 Ops/s | $\color{#d91a1a}-0.33\%$ |
| test_values[td_lambda_return_estimate-True-False] | 32.5225ms | 31.9197ms | 31.3286 Ops/s | 31.3666 Ops/s | $\color{#d91a1a}-0.12\%$ |
| test_values[vec_td_lambda_return_estimate-True-False] | 75.6917ms | 51.5745ms | 19.3894 Ops/s | 19.8133 Ops/s | $\color{#d91a1a}-2.14\%$ |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 12.1501ms | 11.9789ms | 83.4803 Ops/s | 83.0232 Ops/s | $\color{#35bf28}+0.55\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.6603ms | 2.3646ms | 422.8965 Ops/s | 421.4769 Ops/s | $\color{#35bf28}+0.34\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5329ms | 0.3908ms | 2.5588 KOps/s | 2.5967 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 53.1319ms | 51.6148ms | 19.3743 Ops/s | 19.6662 Ops/s | $\color{#d91a1a}-1.48\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 8.3476ms | 3.7540ms | 266.3791 Ops/s | 267.5043 Ops/s | $\color{#d91a1a}-0.42\%$ |
| test_dqn_speed | 6.3153ms | 1.7289ms | 578.3901 Ops/s | 811.7444 Ops/s | $\textbf{\color{#d91a1a}-28.75\%}$ |
| test_ddpg_speed | 7.9415ms | 2.4661ms | 405.5026 Ops/s | 414.3943 Ops/s | $\color{#d91a1a}-2.15\%$ |
| test_sac_speed | 12.4068ms | 7.8308ms | 127.7004 Ops/s | 124.3120 Ops/s | $\color{#35bf28}+2.73\%$ |
| test_redq_speed | 19.4275ms | 14.7907ms | 67.6102 Ops/s | 72.7231 Ops/s | $\textbf{\color{#d91a1a}-7.03\%}$ |
| test_redq_deprec_speed | 16.9634ms | 12.2807ms | 81.4289 Ops/s | 83.2011 Ops/s | $\color{#d91a1a}-2.13\%$ |
| test_td3_speed | 11.1853ms | 9.5043ms | 105.2153 Ops/s | 110.4251 Ops/s | $\color{#d91a1a}-4.72\%$ |
| test_cql_speed | 33.0496ms | 26.6870ms | 37.4714 Ops/s | 39.6174 Ops/s | $\textbf{\color{#d91a1a}-5.42\%}$ |
| test_a2c_speed | 10.1046ms | 5.4685ms | 182.8670 Ops/s | 192.3820 Ops/s | $\color{#d91a1a}-4.95\%$ |
| test_ppo_speed | 10.6969ms | 5.7322ms | 174.4535 Ops/s | 177.3914 Ops/s | $\color{#d91a1a}-1.66\%$ |
| test_reinforce_speed | 9.1887ms | 4.1715ms | 239.7194 Ops/s | 246.4547 Ops/s | $\color{#d91a1a}-2.73\%$ |
| test_iql_speed | 26.9902ms | 21.5153ms | 46.4785 Ops/s | 48.6907 Ops/s | $\color{#d91a1a}-4.54\%$ |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0377ms | 2.5327ms | 394.8317 Ops/s | 394.8279 Ops/s | $+0.00\%$ |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 4.1143ms | 2.6552ms | 376.6223 Ops/s | 374.3416 Ops/s | $\color{#35bf28}+0.61\%$ |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 5.5271ms | 2.6558ms | 376.5301 Ops/s | 369.9187 Ops/s | $\color{#35bf28}+1.79\%$ |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.5362ms | 2.5448ms | 392.9640 Ops/s | 400.1014 Ops/s | $\color{#d91a1a}-1.78\%$ |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 4.9878ms | 2.6561ms | 376.4920 Ops/s | 371.8375 Ops/s | $\color{#35bf28}+1.25\%$ |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 5.7776ms | 2.6718ms | 374.2838 Ops/s | 372.9101 Ops/s | $\color{#35bf28}+0.37\%$ |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.2423ms | 2.4544ms | 407.4304 Ops/s | 390.7475 Ops/s | $\color{#35bf28}+4.27\%$ |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 4.4004ms | 2.6483ms | 377.5968 Ops/s | 371.0392 Ops/s | $\color{#35bf28}+1.77\%$ |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.1151s | 3.0129ms | 331.9083 Ops/s | 374.6575 Ops/s | $\textbf{\color{#d91a1a}-11.41\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.6302ms | 2.5843ms | 386.9471 Ops/s | 390.8564 Ops/s | $\color{#d91a1a}-1.00\%$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 4.6984ms | 2.6838ms | 372.6123 Ops/s | 371.5439 Ops/s | $\color{#35bf28}+0.29\%$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 5.2595ms | 2.6691ms | 374.6643 Ops/s | 373.0071 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.2624ms | 2.5263ms | 395.8402 Ops/s | 390.8729 Ops/s | $\color{#35bf28}+1.27\%$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 4.1958ms | 2.6458ms | 377.9587 Ops/s | 377.2186 Ops/s | $\color{#35bf28}+0.20\%$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.3768ms | 2.6394ms | 378.8780 Ops/s | 366.7975 Ops/s | $\color{#35bf28}+3.29\%$ |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.3785ms | 2.5086ms | 398.6271 Ops/s | 386.7797 Ops/s | $\color{#35bf28}+3.06\%$ |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 5.2533ms | 2.6382ms | 379.0401 Ops/s | 368.4975 Ops/s | $\color{#35bf28}+2.86\%$ |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 5.2360ms | 2.6792ms | 373.2446 Ops/s | 364.1810 Ops/s | $\color{#35bf28}+2.49\%$ |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1786s | 26.9511ms | 37.1042 Ops/s | 36.9411 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1134s | 25.4625ms | 39.2735 Ops/s | 37.6251 Ops/s | $\color{#35bf28}+4.38\%$ |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1147s | 23.7579ms | 42.0912 Ops/s | 40.7962 Ops/s | $\color{#35bf28}+3.17\%$ |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1126s | 25.5397ms | 39.1547 Ops/s | 37.7320 Ops/s | $\color{#35bf28}+3.77\%$ |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1142s | 23.6573ms | 42.2702 Ops/s | 40.8544 Ops/s | $\color{#35bf28}+3.47\%$ |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1152s | 25.6398ms | 39.0019 Ops/s | 37.5785 Ops/s | $\color{#35bf28}+3.79\%$ |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1130s | 23.7068ms | 42.1819 Ops/s | 40.8767 Ops/s | $\color{#35bf28}+3.19\%$ |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1140s | 25.5031ms | 39.2110 Ops/s | 37.9976 Ops/s | $\color{#35bf28}+3.19\%$ |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1130s | 23.7573ms | 42.0924 Ops/s | 41.3723 Ops/s | $\color{#35bf28}+1.74\%$ |
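The Change column in the benchmark table is consistent with the relative throughput difference (Ops - Ops on Repo HEAD) / Ops on Repo HEAD. For example, test_single gives 8.4672 / 8.5671 - 1 ≈ -1.17%, and Ops is the inverse of Mean. The sketch below recomputes a few rows and then classifies them.

The 5% threshold in the sketch is an assumption, not something the bot documents. It is inferred from two facts: exactly the five bold rows fall at or below -5%, and no row improves by 5% or more, which matches "Improved: 0, Worsened: 5".

```python
# Minimal sketch of the benchmark-summary arithmetic (assumptions noted inline).
from typing import List, Tuple

# Assumption: a row counts as improved/worsened only when |Change| >= 5%.
# This threshold is inferred from the table, not taken from the bot's source.
THRESHOLD_PCT = 5.0

# (benchmark name, throughput on this PR, throughput on repo HEAD).
# Both throughput columns use the same unit (Ops/s or KOps/s) within a row,
# so the ratio between them is unit-free.
ROWS: List[Tuple[str, float, float]] = [
    ("test_single", 8.4672, 8.5671),
    ("test_values[td0_return_estimate-False-False]", 5.3155, 5.6036),
    ("test_dqn_speed", 578.3901, 811.7444),
    ("test_sac_speed", 127.7004, 124.3120),
    ("test_a2c_speed", 182.8670, 192.3820),
    ("test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]", 331.9083, 374.6575),
    ("test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]", 39.2735, 37.6251),
]


def change_pct(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change in percent; negative means slower than HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def summarize(rows: List[Tuple[str, float, float]], threshold: float = THRESHOLD_PCT):
    """Split rows into (improved, worsened) lists using the assumed threshold."""
    improved, worsened = [], []
    for name, ops_pr, ops_head in rows:
        change = change_pct(ops_pr, ops_head)
        if change >= threshold:
            improved.append(name)
        elif change <= -threshold:
            worsened.append(name)
    return improved, worsened


for name, ops_pr, ops_head in ROWS:
    print(f"{name}: {change_pct(ops_pr, ops_head):+.2f}%")

improved, worsened = summarize(ROWS)
print(len(improved), len(worsened))  # 0 3
```

With the assumed threshold, test_a2c_speed (-4.95%) and the best improvement in the table (+4.38%) both stay unflagged. This matches the bot's summary line.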

vmoens (Contributor, Author) commented Aug 9, 2023

The last outstanding issues with this PR are:

  • the dtype and device casting;
  • the call to `loss_module.requires_grad_(True)`, which would also set `requires_grad=True` on the target params.
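The `requires_grad_` problem can be reproduced in plain PyTorch. `nn.Module.requires_grad_` iterates over `self.parameters()`, so a target stored as a frozen `nn.Parameter` is switched back on, while buffers are left untouched.

This is a minimal sketch. `ToyLoss`, `target_param` and `target_buffer` are hypothetical names, not TorchRL's loss API. The buffer is shown only for contrast; it is not necessarily the fix this PR adopted.

```python
import torch
from torch import nn


class ToyLoss(nn.Module):
    """Hypothetical loss module holding frozen copies of the actor weight."""

    def __init__(self):
        super().__init__()
        self.actor = nn.Linear(2, 2)
        frozen = self.actor.weight.detach().clone()
        # Target stored as a frozen nn.Parameter ...
        self.target_param = nn.Parameter(frozen.clone(), requires_grad=False)
        # ... versus the same values stored as a buffer (not a parameter).
        self.register_buffer("target_buffer", frozen.clone())


loss = ToyLoss()
loss.requires_grad_(True)  # only touches loss.parameters(), never buffers

print(loss.target_param.requires_grad)   # True: the "frozen" target is trainable again
print(loss.target_buffer.requires_grad)  # False: buffers are not affected
```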

We need target params that are not properties (for repurposing modules within an algorithm).
We could take the data of the param and register it in a different `nn.Parameter`, but then calling `.to()` with a new device or dtype would unbind the param from its target. Overriding `module.to` would not solve the issue when the module is registered within another one, since a parent module's `.to()` does not call `.to()` on its submodules. The previous solution, based on a pre-get hook, did not work well either and was somewhat brittle.
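Both casting pitfalls in the comment above can be reproduced in plain PyTorch. `ToyLoss`, `TrackedChild` and `target_weight` are hypothetical names, not TorchRL classes. The sketch shows two things:

1. A target `nn.Parameter` built from another param's `.data` shares its storage only until the module is cast. After the cast, each param owns a separate tensor.
2. A parent's `.to()` reaches submodules through the internal `nn.Module._apply`, so an overridden `.to()` on a child is never called.

It does not show how the PR eventually resolved either issue.

```python
import torch
from torch import nn


class ToyLoss(nn.Module):
    """Hypothetical loss: a target param wraps the same storage as the actor weight."""

    def __init__(self):
        super().__init__()
        self.actor = nn.Linear(2, 2)
        # Register the *data* of the online param in a separate nn.Parameter.
        self.target_weight = nn.Parameter(self.actor.weight.data, requires_grad=False)


class TrackedChild(nn.Module):
    """Hypothetical submodule that overrides .to() to keep extra state in sync."""

    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(2, 2)
        self.to_was_called = False

    def to(self, *args, **kwargs):
        self.to_was_called = True
        return super().to(*args, **kwargs)


# 1) Casting unbinds the target from the online param.
loss = ToyLoss()
shared_before = loss.target_weight.data_ptr() == loss.actor.weight.data_ptr()
loss.to(torch.float64)  # each param is converted into its own new tensor
shared_after = loss.target_weight.data_ptr() == loss.actor.weight.data_ptr()
print(shared_before, shared_after)  # True False

# 2) The parent's .to() converts the child via _apply, bypassing TrackedChild.to.
parent = nn.Sequential(TrackedChild())
parent.to(torch.float64)
print(parent[0].to_was_called)     # False
print(parent[0].lin.weight.dtype)  # torch.float64
```

After step 1, an in-place update of `loss.actor.weight` (for example, an optimizer step) no longer reaches `loss.target_weight`, which is the "unbinding" described above.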

vmoens marked this pull request as ready for review August 10, 2023 12:59
vmoens merged commit c12d8bc into main Aug 11, 2023
49 of 54 checks passed
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023
vmoens deleted the refactor_param_losses branch August 7, 2024 15:27
Labels: bug, CLA Signed
Development

Successfully merging this pull request may close these issues.

[BUG] The actor_params attribute is lost when passing a Loss module to a multiprocessing process
3 participants