[BugFix] Fix reward sum within parallel envs #1454

Merged 3 commits on Aug 30, 2023

Conversation

vmoens
Contributor

@vmoens vmoens commented Aug 9, 2023

Description

Fixes #1453
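
For context, a transform such as RewardSum keeps a running total of the reward collected during each episode. Under a batched environment such as ParallelEnv, that total has to be tracked separately for every sub-environment and reset when that sub-environment finishes an episode. The following is a minimal pure-Python sketch of that idea only, under stated assumptions: `running_episode_reward` and its list-based inputs are hypothetical names chosen for illustration, and this is neither TorchRL's implementation nor the change made in this PR.

```python
from typing import List, Sequence


def running_episode_reward(
    rewards: Sequence[Sequence[float]],
    dones: Sequence[Sequence[bool]],
) -> List[List[float]]:
    """Hypothetical helper: per-environment running episode reward.

    rewards[t][i] and dones[t][i] hold the reward and done flag of
    sub-environment i at step t. The running total of an environment
    is reset to zero after a step where its done flag is True.
    """
    num_envs = len(rewards[0]) if rewards else 0
    totals = [0.0] * num_envs  # one accumulator per sub-environment
    history: List[List[float]] = []
    for step_rewards, step_dones in zip(rewards, dones):
        row = []
        for i in range(num_envs):
            totals[i] += step_rewards[i]
            row.append(totals[i])
            if step_dones[i]:
                # The episode ended in env i only: reset its total.
                totals[i] = 0.0
        history.append(row)
    return history


# Two sub-environments, three steps; env 0 finishes an episode at step 1.
rewards = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
dones = [[False, False], [True, False], [False, False]]
print(running_episode_reward(rewards, dones))
# → [[1.0, 2.0], [2.0, 4.0], [1.0, 6.0]]
```

The point of the sketch is that each sub-environment needs its own accumulator and its own reset, so an episode ending in one worker must not affect the totals of the others.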

@facebook-github-bot added the CLA Signed label Aug 9, 2023
@vmoens vmoens marked this pull request as ready for review August 9, 2023 12:17
@github-actions

github-actions bot commented Aug 9, 2023

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: 13. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1187s | 0.1181s | 8.4662 Ops/s | 8.4917 Ops/s | -0.30% |
| test_sync | 0.1232s | 65.9804ms | 15.1560 Ops/s | 14.6966 Ops/s | +3.13% |
| test_async | 0.1782s | 61.2057ms | 16.3383 Ops/s | 16.2376 Ops/s | +0.62% |
| test_simple | 0.5840s | 0.5278s | 1.8947 Ops/s | 1.8573 Ops/s | +2.01% |
| test_transformed | 1.3627s | 1.3220s | 0.7564 Ops/s | 0.7384 Ops/s | +2.44% |
| test_serial | 1.6806s | 1.6400s | 0.6097 Ops/s | 0.5890 Ops/s | +3.52% |
| test_parallel | 1.7371s | 1.4892s | 0.6715 Ops/s | 0.6889 Ops/s | -2.53% |
| test_step_mdp_speed[True-True-True-True-True] | 0.3417ms | 41.9488μs | 23.8386 KOps/s | 23.2842 KOps/s | +2.38% |
| test_step_mdp_speed[True-True-True-True-False] | 92.4000μs | 23.9487μs | 41.7559 KOps/s | 40.9901 KOps/s | +1.87% |
| test_step_mdp_speed[True-True-True-False-True] | 54.3000μs | 29.0353μs | 34.4408 KOps/s | 33.7143 KOps/s | +2.15% |
| test_step_mdp_speed[True-True-True-False-False] | 34.2000μs | 16.3468μs | 61.1740 KOps/s | 59.1834 KOps/s | +3.36% |
| test_step_mdp_speed[True-True-False-True-True] | 66.8000μs | 43.3320μs | 23.0776 KOps/s | 22.3314 KOps/s | +3.34% |
| test_step_mdp_speed[True-True-False-True-False] | 48.1000μs | 25.7455μs | 38.8417 KOps/s | 37.7635 KOps/s | +2.86% |
| test_step_mdp_speed[True-True-False-False-True] | 78.5000μs | 31.3653μs | 31.8824 KOps/s | 31.6230 KOps/s | +0.82% |
| test_step_mdp_speed[True-True-False-False-False] | 71.3990μs | 18.3653μs | 54.4505 KOps/s | 53.6252 KOps/s | +1.54% |
| test_step_mdp_speed[True-False-True-True-True] | 85.1000μs | 45.5888μs | 21.9352 KOps/s | 21.0858 KOps/s | +4.03% |
| test_step_mdp_speed[True-False-True-True-False] | 0.1808ms | 27.5003μs | 36.3632 KOps/s | 35.2333 KOps/s | +3.21% |
| test_step_mdp_speed[True-False-True-False-True] | 52.9000μs | 30.8654μs | 32.3987 KOps/s | 31.4311 KOps/s | +3.08% |
| test_step_mdp_speed[True-False-True-False-False] | 49.8000μs | 18.2504μs | 54.7933 KOps/s | 53.6000 KOps/s | +2.23% |
| test_step_mdp_speed[True-False-False-True-True] | 0.2154ms | 46.9304μs | 21.3081 KOps/s | 20.3611 KOps/s | +4.65% |
| test_step_mdp_speed[True-False-False-True-False] | 50.4000μs | 28.8974μs | 34.6052 KOps/s | 33.1031 KOps/s | +4.54% |
| test_step_mdp_speed[True-False-False-False-True] | 77.5000μs | 32.3668μs | 30.8959 KOps/s | 29.8479 KOps/s | +3.51% |
| test_step_mdp_speed[True-False-False-False-False] | 1.7450ms | 20.1915μs | 49.5258 KOps/s | 48.9124 KOps/s | +1.25% |
| test_step_mdp_speed[False-True-True-True-True] | 62.7000μs | 45.2768μs | 22.0864 KOps/s | 21.2660 KOps/s | +3.86% |
| test_step_mdp_speed[False-True-True-True-False] | 73.2000μs | 27.5787μs | 36.2599 KOps/s | 35.1213 KOps/s | +3.24% |
| test_step_mdp_speed[False-True-True-False-True] | 60.0000μs | 35.6254μs | 28.0698 KOps/s | 26.8566 KOps/s | +4.52% |
| test_step_mdp_speed[False-True-True-False-False] | 64.7000μs | 20.8645μs | 47.9283 KOps/s | 47.6794 KOps/s | +0.52% |
| test_step_mdp_speed[False-True-False-True-True] | 74.8000μs | 46.7144μs | 21.4067 KOps/s | 20.6727 KOps/s | +3.55% |
| test_step_mdp_speed[False-True-False-True-False] | 57.7000μs | 29.2292μs | 34.2124 KOps/s | 33.1761 KOps/s | +3.12% |
| test_step_mdp_speed[False-True-False-False-True] | 86.4000μs | 37.3229μs | 26.7932 KOps/s | 25.7547 KOps/s | +4.03% |
| test_step_mdp_speed[False-True-False-False-False] | 0.1353ms | 21.9363μs | 45.5865 KOps/s | 43.7678 KOps/s | +4.16% |
| test_step_mdp_speed[False-False-True-True-True] | 77.8000μs | 48.6579μs | 20.5517 KOps/s | 19.7860 KOps/s | +3.87% |
| test_step_mdp_speed[False-False-True-True-False] | 55.9010μs | 31.0734μs | 32.1819 KOps/s | 30.7067 KOps/s | +4.80% |
| test_step_mdp_speed[False-False-True-False-True] | 61.4000μs | 37.4335μs | 26.7140 KOps/s | 25.8353 KOps/s | +3.40% |
| test_step_mdp_speed[False-False-True-False-False] | 72.9000μs | 21.9582μs | 45.5412 KOps/s | 43.7955 KOps/s | +3.99% |
| test_step_mdp_speed[False-False-False-True-True] | 2.0420ms | 50.4116μs | 19.8367 KOps/s | 19.1992 KOps/s | +3.32% |
| test_step_mdp_speed[False-False-False-True-False] | 74.3010μs | 32.5262μs | 30.7445 KOps/s | 29.3083 KOps/s | +4.90% |
| test_step_mdp_speed[False-False-False-False-True] | 78.7000μs | 38.8093μs | 25.7671 KOps/s | 24.9180 KOps/s | +3.41% |
| test_step_mdp_speed[False-False-False-False-False] | 45.9010μs | 23.5089μs | 42.5371 KOps/s | 41.0821 KOps/s | +3.54% |
| test_values[generalized_advantage_estimate-True-True] | 14.7766ms | 13.3994ms | 74.6300 Ops/s | 73.9753 Ops/s | +0.89% |
| test_values[vec_generalized_advantage_estimate-True-True] | 56.2641ms | 50.4043ms | 19.8396 Ops/s | 19.4426 Ops/s | +2.04% |
| test_values[td0_return_estimate-False-False] | 0.5622ms | 0.1992ms | 5.0199 KOps/s | 4.4960 KOps/s | **+11.65%** |
| test_values[td1_return_estimate-False-False] | 13.4107ms | 13.1775ms | 75.8868 Ops/s | 73.0290 Ops/s | +3.91% |
| test_values[vec_td1_return_estimate-False-False] | 55.9084ms | 50.5512ms | 19.7819 Ops/s | 19.7221 Ops/s | +0.30% |
| test_values[td_lambda_return_estimate-True-False] | 34.2593ms | 31.7868ms | 31.4596 Ops/s | 31.0166 Ops/s | +1.43% |
| test_values[vec_td_lambda_return_estimate-True-False] | 56.5529ms | 51.5904ms | 19.3835 Ops/s | 19.7846 Ops/s | -2.03% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 12.9894ms | 12.0165ms | 83.2189 Ops/s | 81.3979 Ops/s | +2.24% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 3.1648ms | 2.3907ms | 418.2893 Ops/s | 385.3225 Ops/s | **+8.56%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 4.8243ms | 0.4119ms | 2.4277 KOps/s | 2.4140 KOps/s | +0.57% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 57.6616ms | 52.6833ms | 18.9814 Ops/s | 19.4273 Ops/s | -2.30% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 4.1776ms | 3.7811ms | 264.4758 Ops/s | 265.3367 Ops/s | -0.32% |
| test_dqn_speed | 6.5966ms | 1.7374ms | 575.5831 Ops/s | 576.9624 Ops/s | -0.24% |
| test_ddpg_speed | 7.4951ms | 2.4671ms | 405.3290 Ops/s | 403.9577 Ops/s | +0.34% |
| test_sac_speed | 12.3025ms | 7.7134ms | 129.6451 Ops/s | 128.6577 Ops/s | +0.77% |
| test_redq_speed | 19.6086ms | 15.5083ms | 64.4817 Ops/s | 65.6992 Ops/s | -1.85% |
| test_redq_deprec_speed | 17.8464ms | 12.6545ms | 79.0233 Ops/s | 80.3229 Ops/s | -1.62% |
| test_td3_speed | 10.7144ms | 9.5769ms | 104.4181 Ops/s | 104.1795 Ops/s | +0.23% |
| test_cql_speed | 39.2337ms | 29.2928ms | 34.1380 Ops/s | 39.7049 Ops/s | **-14.02%** |
| test_a2c_speed | 10.2868ms | 5.2879ms | 189.1124 Ops/s | 189.7434 Ops/s | -0.33% |
| test_ppo_speed | 14.8883ms | 5.7549ms | 173.7655 Ops/s | 173.7969 Ops/s | -0.02% |
| test_reinforce_speed | 8.9519ms | 4.1314ms | 242.0488 Ops/s | 244.2003 Ops/s | -0.88% |
| test_iql_speed | 25.7726ms | 20.9834ms | 47.6567 Ops/s | 45.4287 Ops/s | +4.90% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0218ms | 2.4917ms | 401.3271 Ops/s | 413.2824 Ops/s | -2.89% |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 4.2661ms | 2.6169ms | 382.1360 Ops/s | 388.1360 Ops/s | -1.55% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1070s | 2.8202ms | 354.5820 Ops/s | 366.0240 Ops/s | -3.13% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1679ms | 2.4319ms | 411.2077 Ops/s | 329.6348 Ops/s | **+24.75%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.7614ms | 2.4900ms | 401.5987 Ops/s | 359.1965 Ops/s | **+11.80%** |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.0341ms | 2.5112ms | 398.2085 Ops/s | 360.6777 Ops/s | **+10.41%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.8632ms | 2.3477ms | 425.9409 Ops/s | 381.0714 Ops/s | **+11.77%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 3.6314ms | 2.5590ms | 390.7798 Ops/s | 369.6513 Ops/s | **+5.72%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2.5290ms | 2.3608ms | 423.5822 Ops/s | 373.7505 Ops/s | **+13.33%** |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0717ms | 2.3423ms | 426.9329 Ops/s | 410.5260 Ops/s | +4.00% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.1002s | 2.7863ms | 358.8958 Ops/s | 370.2692 Ops/s | -3.07% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.9479ms | 2.5273ms | 395.6767 Ops/s | 369.2233 Ops/s | **+7.16%** |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.9742ms | 2.5770ms | 388.0538 Ops/s | 405.8804 Ops/s | -4.39% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 4.7281ms | 2.6856ms | 372.3622 Ops/s | 385.0157 Ops/s | -3.29% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.1243ms | 2.7011ms | 370.2133 Ops/s | 385.2467 Ops/s | -3.90% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.3702ms | 2.6084ms | 383.3779 Ops/s | 407.4434 Ops/s | **-5.91%** |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 4.1245ms | 2.6359ms | 379.3736 Ops/s | 388.3527 Ops/s | -2.31% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.7200ms | 2.5666ms | 389.6244 Ops/s | 359.6003 Ops/s | **+8.35%** |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1809s | 26.3173ms | 37.9978 Ops/s | 36.6085 Ops/s | +3.80% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1072s | 22.9400ms | 43.5920 Ops/s | 38.8907 Ops/s | **+12.09%** |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1070s | 24.7535ms | 40.3984 Ops/s | 42.2116 Ops/s | -4.30% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1106s | 23.1160ms | 43.2600 Ops/s | 38.5408 Ops/s | **+12.24%** |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1083s | 24.8719ms | 40.2061 Ops/s | 41.1872 Ops/s | -2.38% |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1170s | 23.5993ms | 42.3742 Ops/s | 42.1946 Ops/s | +0.43% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1186s | 25.8829ms | 38.6356 Ops/s | 42.2681 Ops/s | **-8.59%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1182s | 23.7273ms | 42.1456 Ops/s | 39.6871 Ops/s | **+6.19%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1169s | 25.5321ms | 39.1665 Ops/s | 42.2982 Ops/s | **-7.40%** |
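
The Change column above appears to be the relative difference between the Ops value measured on this branch and the Ops value on repo HEAD, and Ops appears to be the reciprocal of the mean run time. The sketch below checks that reading against a few rows copied from the table; the function names are hypothetical and this is not the benchmark bot's own code.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time given in seconds."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of `ops` with respect to `ops_head`, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the table above.
rows = {
    "test_single": (8.4662, 8.4917),
    "test_sync": (15.1560, 14.6966),
    "test_cql_speed": (34.1380, 39.7049),
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops, ops_head):+.2f}%")
# test_single: -0.30%
# test_sync: +3.13%
# test_cql_speed: -14.02%
```

A positive value therefore means more operations per second than HEAD. With a mean of 0.1181s, `ops_per_second` gives roughly 8.47 Ops/s, which agrees with the `test_single` row up to rounding of the reported mean.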

@vmoens added the bug label Aug 30, 2023
Contributor

@matteobettini matteobettini left a comment

LGTM

@vmoens vmoens merged commit 530efa8 into main Aug 30, 2023
49 of 54 checks passed
osalpekar pushed a commit to osalpekar/rl that referenced this pull request Aug 30, 2023
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023
@vmoens vmoens deleted the fix_reward_sum_parallel branch August 7, 2024 15:27
Labels
bug, CLA Signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] RewardSum not working with ParallelEnv
3 participants