@vmoens vmoens commented Mar 6, 2024

This PR:

  • Allows users to stack data (and not cat) in MultiSyncDataCollector (@matteobettini)
  • Documents how batches are collected across collectors
  • Documents how collectors interact with replay buffers
  • Solves a bug for multidim buffers when ndim > 2
  • Makes sure that the trajectories in MultiSyncDataCollector are unique and consecutive
  • Makes sure that the trajectories in MultiaSyncDataCollector are unique and consecutive
  • Masks and squashes preempted batches in MultiSyncDataCollector

TODO:

  • Add the stack_result keyword in docstrings
  • Make stack_result raise an exception in MultiaSyncDataCollector
  • Resolve the cat dim issue for preemption (cat along dim=-1) and document it
  • Check that things work without doing cat in place and cloning

Check the updated doc below in collectors to learn more

pytorch-bot bot commented Mar 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1994

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 13 New Failures

As of commit c742a43 with merge base 4bce371 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 6, 2024

github-actions bot commented Mar 6, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 4. Worsened: 5.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_single 53.1089ms 52.4619ms 19.0615 Ops/s 17.8592 Ops/s $\textbf{\color{#35bf28}+6.73\%}$
test_sync 35.5930ms 29.7961ms 33.5614 Ops/s 30.5180 Ops/s $\textbf{\color{#35bf28}+9.97\%}$
test_async 48.0734ms 27.8519ms 35.9041 Ops/s 35.5460 Ops/s $\color{#35bf28}+1.01\%$
test_simple 0.3818s 0.3279s 3.0499 Ops/s 3.0865 Ops/s $\color{#d91a1a}-1.19\%$
test_transformed 0.5020s 0.4594s 2.1768 Ops/s 2.1565 Ops/s $\color{#35bf28}+0.94\%$
test_serial 1.1867s 1.1606s 0.8616 Ops/s 0.8428 Ops/s $\color{#35bf28}+2.23\%$
test_parallel 1.0493s 1.0099s 0.9902 Ops/s 0.9702 Ops/s $\color{#35bf28}+2.07\%$
test_step_mdp_speed[True-True-True-True-True] 0.1250ms 21.3903μs 46.7502 KOps/s 46.7949 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-True-True-True-False] 35.2950μs 13.0374μs 76.7025 KOps/s 76.8782 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-True-True-False-True] 42.2580μs 12.3168μs 81.1900 KOps/s 80.0651 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[True-True-True-False-False] 26.4190μs 7.5499μs 132.4519 KOps/s 129.5662 KOps/s $\color{#35bf28}+2.23\%$
test_step_mdp_speed[True-True-False-True-True] 74.0880μs 22.3446μs 44.7536 KOps/s 44.1395 KOps/s $\color{#35bf28}+1.39\%$
test_step_mdp_speed[True-True-False-True-False] 40.1950μs 14.2987μs 69.9366 KOps/s 69.4999 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[True-True-False-False-True] 38.1520μs 13.6421μs 73.3025 KOps/s 73.1831 KOps/s $\color{#35bf28}+0.16\%$
test_step_mdp_speed[True-True-False-False-False] 32.5910μs 8.6569μs 115.5152 KOps/s 112.7608 KOps/s $\color{#35bf28}+2.44\%$
test_step_mdp_speed[True-False-True-True-True] 57.0970μs 23.5144μs 42.5271 KOps/s 41.9789 KOps/s $\color{#35bf28}+1.31\%$
test_step_mdp_speed[True-False-True-True-False] 50.8550μs 15.3461μs 65.1630 KOps/s 63.6732 KOps/s $\color{#35bf28}+2.34\%$
test_step_mdp_speed[True-False-True-False-True] 34.3850μs 13.5242μs 73.9413 KOps/s 72.0770 KOps/s $\color{#35bf28}+2.59\%$
test_step_mdp_speed[True-False-True-False-False] 33.3020μs 8.7365μs 114.4623 KOps/s 113.3148 KOps/s $\color{#35bf28}+1.01\%$
test_step_mdp_speed[True-False-False-True-True] 52.3570μs 24.8933μs 40.1714 KOps/s 39.5818 KOps/s $\color{#35bf28}+1.49\%$
test_step_mdp_speed[True-False-False-True-False] 37.2990μs 16.7168μs 59.8199 KOps/s 60.3541 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[True-False-False-False-True] 50.5350μs 14.5713μs 68.6279 KOps/s 67.2846 KOps/s $\color{#35bf28}+2.00\%$
test_step_mdp_speed[True-False-False-False-False] 26.2490μs 9.6708μs 103.4043 KOps/s 99.4530 KOps/s $\color{#35bf28}+3.97\%$
test_step_mdp_speed[False-True-True-True-True] 53.1390μs 23.6391μs 42.3028 KOps/s 41.1045 KOps/s $\color{#35bf28}+2.92\%$
test_step_mdp_speed[False-True-True-True-False] 36.0170μs 15.5179μs 64.4416 KOps/s 63.7756 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[False-True-True-False-True] 42.8390μs 15.5322μs 64.3823 KOps/s 61.4712 KOps/s $\color{#35bf28}+4.74\%$
test_step_mdp_speed[False-True-True-False-False] 50.5540μs 9.8581μs 101.4390 KOps/s 99.9243 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[False-True-False-True-True] 36.6280μs 25.2565μs 39.5938 KOps/s 38.7349 KOps/s $\color{#35bf28}+2.22\%$
test_step_mdp_speed[False-True-False-True-False] 39.2430μs 16.8569μs 59.3230 KOps/s 59.9915 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[False-True-False-False-True] 45.7250μs 16.8425μs 59.3737 KOps/s 58.6620 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[False-True-False-False-False] 35.4870μs 11.1355μs 89.8032 KOps/s 88.5966 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[False-False-True-True-True] 54.7720μs 26.1864μs 38.1877 KOps/s 37.5347 KOps/s $\color{#35bf28}+1.74\%$
test_step_mdp_speed[False-False-True-True-False] 41.4570μs 17.9696μs 55.6497 KOps/s 55.3146 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[False-False-True-False-True] 45.6250μs 16.6239μs 60.1545 KOps/s 58.4985 KOps/s $\color{#35bf28}+2.83\%$
test_step_mdp_speed[False-False-True-False-False] 32.2400μs 11.1159μs 89.9610 KOps/s 88.1230 KOps/s $\color{#35bf28}+2.09\%$
test_step_mdp_speed[False-False-False-True-True] 54.9230μs 27.3030μs 36.6260 KOps/s 36.2383 KOps/s $\color{#35bf28}+1.07\%$
test_step_mdp_speed[False-False-False-True-False] 60.4120μs 19.2345μs 51.9900 KOps/s 52.4694 KOps/s $\color{#d91a1a}-0.91\%$
test_step_mdp_speed[False-False-False-False-True] 57.1770μs 17.6069μs 56.7959 KOps/s 54.4672 KOps/s $\color{#35bf28}+4.28\%$
test_step_mdp_speed[False-False-False-False-False] 38.3810μs 12.1714μs 82.1599 KOps/s 81.0185 KOps/s $\color{#35bf28}+1.41\%$
test_values[generalized_advantage_estimate-True-True] 9.8043ms 9.0875ms 110.0414 Ops/s 107.2400 Ops/s $\color{#35bf28}+2.61\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.5890ms 35.0775ms 28.5083 Ops/s 28.7461 Ops/s $\color{#d91a1a}-0.83\%$
test_values[td0_return_estimate-False-False] 0.2203ms 0.1611ms 6.2061 KOps/s 6.2029 KOps/s $\color{#35bf28}+0.05\%$
test_values[td1_return_estimate-False-False] 25.7561ms 22.7775ms 43.9030 Ops/s 43.0699 Ops/s $\color{#35bf28}+1.93\%$
test_values[vec_td1_return_estimate-False-False] 38.1532ms 35.2206ms 28.3925 Ops/s 28.6193 Ops/s $\color{#d91a1a}-0.79\%$
test_values[td_lambda_return_estimate-True-False] 35.0813ms 32.7377ms 30.5458 Ops/s 30.0173 Ops/s $\color{#35bf28}+1.76\%$
test_values[vec_td_lambda_return_estimate-True-False] 36.8560ms 35.2269ms 28.3874 Ops/s 28.6598 Ops/s $\color{#d91a1a}-0.95\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.6559ms 8.0781ms 123.7914 Ops/s 121.1603 Ops/s $\color{#35bf28}+2.17\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.4300ms 1.9107ms 523.3781 Ops/s 529.2330 Ops/s $\color{#d91a1a}-1.11\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4744ms 0.3450ms 2.8986 KOps/s 2.8970 KOps/s $\color{#35bf28}+0.05\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 46.1720ms 44.9930ms 22.2257 Ops/s 22.8807 Ops/s $\color{#d91a1a}-2.86\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.6552ms 3.0170ms 331.4513 Ops/s 332.6410 Ops/s $\color{#d91a1a}-0.36\%$
test_dqn_speed 1.5418ms 1.3210ms 757.0176 Ops/s 759.5988 Ops/s $\color{#d91a1a}-0.34\%$
test_ddpg_speed 3.4279ms 2.6397ms 378.8267 Ops/s 381.2410 Ops/s $\color{#d91a1a}-0.63\%$
test_sac_speed 8.3517ms 8.0126ms 124.8033 Ops/s 125.0693 Ops/s $\color{#d91a1a}-0.21\%$
test_redq_speed 14.8748ms 12.8585ms 77.7696 Ops/s 77.6354 Ops/s $\color{#35bf28}+0.17\%$
test_redq_deprec_speed 13.1945ms 12.7046ms 78.7114 Ops/s 78.2568 Ops/s $\color{#35bf28}+0.58\%$
test_td3_speed 8.1084ms 7.9417ms 125.9182 Ops/s 124.5643 Ops/s $\color{#35bf28}+1.09\%$
test_cql_speed 0.1139s 38.9495ms 25.6743 Ops/s 27.7055 Ops/s $\textbf{\color{#d91a1a}-7.33\%}$
test_a2c_speed 8.5951ms 7.2642ms 137.6606 Ops/s 138.8972 Ops/s $\color{#d91a1a}-0.89\%$
test_ppo_speed 8.1655ms 7.5537ms 132.3863 Ops/s 133.3126 Ops/s $\color{#d91a1a}-0.69\%$
test_reinforce_speed 7.3639ms 6.4781ms 154.3657 Ops/s 155.0059 Ops/s $\color{#d91a1a}-0.41\%$
test_iql_speed 32.5968ms 32.0920ms 31.1604 Ops/s 31.3865 Ops/s $\color{#d91a1a}-0.72\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.1668ms 2.0510ms 487.5612 Ops/s 478.3535 Ops/s $\color{#35bf28}+1.92\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6458ms 0.4924ms 2.0307 KOps/s 2.0466 KOps/s $\color{#d91a1a}-0.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7082ms 0.4665ms 2.1436 KOps/s 2.1760 KOps/s $\color{#d91a1a}-1.49\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.9805ms 2.0553ms 486.5561 Ops/s 491.5167 Ops/s $\color{#d91a1a}-1.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8988ms 0.4852ms 2.0612 KOps/s 2.0990 KOps/s $\color{#d91a1a}-1.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6528ms 0.4577ms 2.1851 KOps/s 2.1917 KOps/s $\color{#d91a1a}-0.30\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5383ms 1.2749ms 784.3799 Ops/s 792.6227 Ops/s $\color{#d91a1a}-1.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6343ms 1.2081ms 827.7760 Ops/s 834.7936 Ops/s $\color{#d91a1a}-0.84\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.4567ms 2.1776ms 459.2168 Ops/s 452.2404 Ops/s $\color{#35bf28}+1.54\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9387ms 0.6012ms 1.6634 KOps/s 1.6702 KOps/s $\color{#d91a1a}-0.40\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 96.2632ms 0.6454ms 1.5493 KOps/s 1.7655 KOps/s $\textbf{\color{#d91a1a}-12.24\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.2259ms 2.0476ms 488.3712 Ops/s 484.7456 Ops/s $\color{#35bf28}+0.75\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6462ms 0.4926ms 2.0300 KOps/s 2.0431 KOps/s $\color{#d91a1a}-0.64\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3.7594ms 0.4710ms 2.1230 KOps/s 2.1677 KOps/s $\color{#d91a1a}-2.06\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.2010ms 2.0844ms 479.7431 Ops/s 478.5863 Ops/s $\color{#35bf28}+0.24\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.5640ms 0.4834ms 2.0687 KOps/s 2.0792 KOps/s $\color{#d91a1a}-0.50\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.5818ms 0.4670ms 2.1413 KOps/s 2.1529 KOps/s $\color{#d91a1a}-0.54\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.2854ms 2.1726ms 460.2683 Ops/s 453.1496 Ops/s $\color{#35bf28}+1.57\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1800ms 0.6076ms 1.6458 KOps/s 1.6660 KOps/s $\color{#d91a1a}-1.21\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7689ms 0.5763ms 1.7351 KOps/s 1.7422 KOps/s $\color{#d91a1a}-0.41\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1024s 7.3330ms 136.3692 Ops/s 144.2807 Ops/s $\textbf{\color{#d91a1a}-5.48\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 13.9910ms 11.9698ms 83.5438 Ops/s 83.6562 Ops/s $\color{#d91a1a}-0.13\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.5816ms 1.0728ms 932.1422 Ops/s 993.7897 Ops/s $\textbf{\color{#d91a1a}-6.20\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 85.9577ms 5.1925ms 192.5868 Ops/s 149.4518 Ops/s $\textbf{\color{#35bf28}+28.86\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 93.1944ms 13.5613ms 73.7390 Ops/s 83.6510 Ops/s $\textbf{\color{#d91a1a}-11.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.5437ms 1.0427ms 959.0321 Ops/s 944.3283 Ops/s $\color{#35bf28}+1.56\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 88.8976ms 5.6054ms 178.3986 Ops/s 183.2893 Ops/s $\color{#d91a1a}-2.67\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 14.7068ms 12.3973ms 80.6627 Ops/s 71.8944 Ops/s $\textbf{\color{#35bf28}+12.20\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.8868ms 1.3606ms 734.9822 Ops/s 717.0357 Ops/s $\color{#35bf28}+2.50\%$

github-actions bot commented Mar 6, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 4. Worsened: 2.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_single 99.5076ms 98.5673ms 10.1454 Ops/s 9.3467 Ops/s $\textbf{\color{#35bf28}+8.54\%}$
test_sync 91.5732ms 87.6182ms 11.4131 Ops/s 11.6097 Ops/s $\color{#d91a1a}-1.69\%$
test_async 0.1747s 87.6835ms 11.4047 Ops/s 11.5133 Ops/s $\color{#d91a1a}-0.94\%$
test_single_pixels 0.1094s 0.1090s 9.1743 Ops/s 9.0476 Ops/s $\color{#35bf28}+1.40\%$
test_sync_pixels 68.2034ms 65.8578ms 15.1842 Ops/s 15.1073 Ops/s $\color{#35bf28}+0.51\%$
test_async_pixels 0.1218s 55.3405ms 18.0700 Ops/s 18.0921 Ops/s $\color{#d91a1a}-0.12\%$
test_simple 0.6423s 0.6417s 1.5584 Ops/s 1.4795 Ops/s $\textbf{\color{#35bf28}+5.34\%}$
test_transformed 0.8439s 0.8428s 1.1865 Ops/s 1.1405 Ops/s $\color{#35bf28}+4.04\%$
test_serial 2.0915s 2.0288s 0.4929 Ops/s 0.4724 Ops/s $\color{#35bf28}+4.35\%$
test_parallel 1.8423s 1.7811s 0.5615 Ops/s 0.5512 Ops/s $\color{#35bf28}+1.86\%$
test_step_mdp_speed[True-True-True-True-True] 88.6910μs 33.5437μs 29.8119 KOps/s 30.3453 KOps/s $\color{#d91a1a}-1.76\%$
test_step_mdp_speed[True-True-True-True-False] 43.9100μs 19.5849μs 51.0596 KOps/s 50.1738 KOps/s $\color{#35bf28}+1.77\%$
test_step_mdp_speed[True-True-True-False-True] 35.8900μs 18.6420μs 53.6424 KOps/s 52.5745 KOps/s $\color{#35bf28}+2.03\%$
test_step_mdp_speed[True-True-True-False-False] 42.6010μs 11.2057μs 89.2401 KOps/s 87.4749 KOps/s $\color{#35bf28}+2.02\%$
test_step_mdp_speed[True-True-False-True-True] 54.5510μs 34.3886μs 29.0794 KOps/s 28.3625 KOps/s $\color{#35bf28}+2.53\%$
test_step_mdp_speed[True-True-False-True-False] 42.0000μs 21.3755μs 46.7826 KOps/s 45.7955 KOps/s $\color{#35bf28}+2.16\%$
test_step_mdp_speed[True-True-False-False-True] 36.9910μs 20.2928μs 49.2786 KOps/s 47.9571 KOps/s $\color{#35bf28}+2.76\%$
test_step_mdp_speed[True-True-False-False-False] 37.6100μs 13.0823μs 76.4390 KOps/s 74.6943 KOps/s $\color{#35bf28}+2.34\%$
test_step_mdp_speed[True-False-True-True-True] 90.0610μs 36.2543μs 27.5829 KOps/s 26.8850 KOps/s $\color{#35bf28}+2.60\%$
test_step_mdp_speed[True-False-True-True-False] 42.2400μs 23.3749μs 42.7810 KOps/s 41.6122 KOps/s $\color{#35bf28}+2.81\%$
test_step_mdp_speed[True-False-True-False-True] 47.5500μs 20.4923μs 48.7989 KOps/s 47.8742 KOps/s $\color{#35bf28}+1.93\%$
test_step_mdp_speed[True-False-True-False-False] 30.6300μs 13.0634μs 76.5500 KOps/s 74.9468 KOps/s $\color{#35bf28}+2.14\%$
test_step_mdp_speed[True-False-False-True-True] 0.1067ms 37.4070μs 26.7330 KOps/s 25.9027 KOps/s $\color{#35bf28}+3.21\%$
test_step_mdp_speed[True-False-False-True-False] 91.7510μs 25.1786μs 39.7163 KOps/s 38.8560 KOps/s $\color{#35bf28}+2.21\%$
test_step_mdp_speed[True-False-False-False-True] 38.8610μs 22.1574μs 45.1316 KOps/s 44.6352 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-False-False-False] 49.2200μs 14.9155μs 67.0446 KOps/s 66.6733 KOps/s $\color{#35bf28}+0.56\%$
test_step_mdp_speed[False-True-True-True-True] 53.0900μs 36.3837μs 27.4849 KOps/s 27.2041 KOps/s $\color{#35bf28}+1.03\%$
test_step_mdp_speed[False-True-True-True-False] 66.1100μs 23.4607μs 42.6244 KOps/s 41.8236 KOps/s $\color{#35bf28}+1.91\%$
test_step_mdp_speed[False-True-True-False-True] 42.1310μs 24.2554μs 41.2279 KOps/s 40.6812 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[False-True-True-False-False] 40.0510μs 14.9457μs 66.9088 KOps/s 66.1627 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[False-True-False-True-True] 59.5210μs 37.8985μs 26.3863 KOps/s 25.7345 KOps/s $\color{#35bf28}+2.53\%$
test_step_mdp_speed[False-True-False-True-False] 41.1310μs 25.0729μs 39.8837 KOps/s 38.2875 KOps/s $\color{#35bf28}+4.17\%$
test_step_mdp_speed[False-True-False-False-True] 41.2710μs 25.7931μs 38.7700 KOps/s 37.7910 KOps/s $\color{#35bf28}+2.59\%$
test_step_mdp_speed[False-True-False-False-False] 33.3300μs 16.6132μs 60.1931 KOps/s 58.6200 KOps/s $\color{#35bf28}+2.68\%$
test_step_mdp_speed[False-False-True-True-True] 65.7110μs 40.2592μs 24.8390 KOps/s 24.7378 KOps/s $\color{#35bf28}+0.41\%$
test_step_mdp_speed[False-False-True-True-False] 53.7410μs 26.9172μs 37.1509 KOps/s 36.2873 KOps/s $\color{#35bf28}+2.38\%$
test_step_mdp_speed[False-False-True-False-True] 92.2810μs 25.8202μs 38.7294 KOps/s 37.9929 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[False-False-True-False-False] 33.8210μs 16.5543μs 60.4071 KOps/s 59.1055 KOps/s $\color{#35bf28}+2.20\%$
test_step_mdp_speed[False-False-False-True-True] 58.4100μs 40.6822μs 24.5808 KOps/s 23.6408 KOps/s $\color{#35bf28}+3.98\%$
test_step_mdp_speed[False-False-False-True-False] 51.0000μs 28.4764μs 35.1168 KOps/s 33.9722 KOps/s $\color{#35bf28}+3.37\%$
test_step_mdp_speed[False-False-False-False-True] 46.3910μs 27.3161μs 36.6085 KOps/s 35.3213 KOps/s $\color{#35bf28}+3.64\%$
test_step_mdp_speed[False-False-False-False-False] 34.2400μs 18.5225μs 53.9883 KOps/s 53.4061 KOps/s $\color{#35bf28}+1.09\%$
test_values[generalized_advantage_estimate-True-True] 22.9465ms 22.6116ms 44.2251 Ops/s 43.0435 Ops/s $\color{#35bf28}+2.75\%$
test_values[vec_generalized_advantage_estimate-True-True] 83.7874ms 3.2079ms 311.7277 Ops/s 304.1930 Ops/s $\color{#35bf28}+2.48\%$
test_values[td0_return_estimate-False-False] 86.2010μs 60.7794μs 16.4529 KOps/s 15.7910 KOps/s $\color{#35bf28}+4.19\%$
test_values[td1_return_estimate-False-False] 49.2283ms 48.6248ms 20.5656 Ops/s 19.9354 Ops/s $\color{#35bf28}+3.16\%$
test_values[vec_td1_return_estimate-False-False] 2.0336ms 1.7116ms 584.2618 Ops/s 579.7887 Ops/s $\color{#35bf28}+0.77\%$
test_values[td_lambda_return_estimate-True-False] 82.9479ms 81.2637ms 12.3056 Ops/s 12.4587 Ops/s $\color{#d91a1a}-1.23\%$
test_values[vec_td_lambda_return_estimate-True-False] 2.0806ms 1.7208ms 581.1133 Ops/s 578.5999 Ops/s $\color{#35bf28}+0.43\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.9709ms 22.1209ms 45.2062 Ops/s 46.5883 Ops/s $\color{#d91a1a}-2.97\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8616ms 0.6521ms 1.5334 KOps/s 1.4902 KOps/s $\color{#35bf28}+2.90\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6793ms 0.6076ms 1.6458 KOps/s 1.6110 KOps/s $\color{#35bf28}+2.16\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5060ms 1.4184ms 705.0085 Ops/s 700.4333 Ops/s $\color{#35bf28}+0.65\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9378ms 0.6199ms 1.6131 KOps/s 1.5415 KOps/s $\color{#35bf28}+4.64\%$
test_dqn_speed 8.0564ms 1.4167ms 705.8470 Ops/s 710.0952 Ops/s $\color{#d91a1a}-0.60\%$
test_ddpg_speed 2.8392ms 2.6670ms 374.9578 Ops/s 371.4999 Ops/s $\color{#35bf28}+0.93\%$
test_sac_speed 8.4142ms 7.7995ms 128.2140 Ops/s 126.8708 Ops/s $\color{#35bf28}+1.06\%$
test_redq_speed 11.7780ms 10.4265ms 95.9091 Ops/s 94.9090 Ops/s $\color{#35bf28}+1.05\%$
test_redq_deprec_speed 11.7469ms 10.9656ms 91.1946 Ops/s 88.6271 Ops/s $\color{#35bf28}+2.90\%$
test_td3_speed 8.0616ms 7.7992ms 128.2178 Ops/s 128.2228 Ops/s $-0.00\%$
test_cql_speed 26.0683ms 25.2926ms 39.5372 Ops/s 39.1353 Ops/s $\color{#35bf28}+1.03\%$
test_a2c_speed 6.0259ms 5.6002ms 178.5643 Ops/s 181.6196 Ops/s $\color{#d91a1a}-1.68\%$
test_ppo_speed 6.2873ms 5.9761ms 167.3345 Ops/s 170.2161 Ops/s $\color{#d91a1a}-1.69\%$
test_reinforce_speed 5.4095ms 4.5331ms 220.6016 Ops/s 219.2982 Ops/s $\color{#35bf28}+0.59\%$
test_iql_speed 20.4522ms 19.6837ms 50.8035 Ops/s 50.3173 Ops/s $\color{#35bf28}+0.97\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.1392ms 2.8778ms 347.4923 Ops/s 347.7342 Ops/s $\color{#d91a1a}-0.07\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.2408ms 0.5462ms 1.8309 KOps/s 1.8050 KOps/s $\color{#35bf28}+1.43\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7325ms 0.5258ms 1.9017 KOps/s 1.8570 KOps/s $\color{#35bf28}+2.41\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.0943ms 2.8991ms 344.9310 Ops/s 342.8397 Ops/s $\color{#35bf28}+0.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.3471ms 0.5426ms 1.8428 KOps/s 1.8268 KOps/s $\color{#35bf28}+0.88\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8125ms 0.5237ms 1.9097 KOps/s 1.8965 KOps/s $\color{#35bf28}+0.69\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 3.9098ms 1.4612ms 684.3508 Ops/s 657.9487 Ops/s $\color{#35bf28}+4.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6256ms 1.4152ms 706.6100 Ops/s 687.4975 Ops/s $\color{#35bf28}+2.78\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.1949ms 3.0009ms 333.2360 Ops/s 333.9926 Ops/s $\color{#d91a1a}-0.23\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9300ms 0.6681ms 1.4968 KOps/s 1.4744 KOps/s $\color{#35bf28}+1.52\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 4.4668ms 0.6539ms 1.5292 KOps/s 1.5445 KOps/s $\color{#d91a1a}-0.99\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.9668ms 2.8590ms 349.7754 Ops/s 346.3722 Ops/s $\color{#35bf28}+0.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7978ms 0.5430ms 1.8415 KOps/s 1.7937 KOps/s $\color{#35bf28}+2.66\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 4.2654ms 0.5363ms 1.8645 KOps/s 1.8448 KOps/s $\color{#35bf28}+1.07\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.1183ms 2.9096ms 343.6874 Ops/s 342.0067 Ops/s $\color{#35bf28}+0.49\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.2687ms 0.5423ms 1.8441 KOps/s 1.8367 KOps/s $\color{#35bf28}+0.41\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6665ms 0.5254ms 1.9032 KOps/s 1.8860 KOps/s $\color{#35bf28}+0.91\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.1426ms 3.0017ms 333.1396 Ops/s 330.9529 Ops/s $\color{#35bf28}+0.66\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8267ms 0.6713ms 1.4897 KOps/s 1.4702 KOps/s $\color{#35bf28}+1.33\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 4.6274ms 0.6588ms 1.5178 KOps/s 1.5137 KOps/s $\color{#35bf28}+0.27\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1148s 8.9723ms 111.4541 Ops/s 114.9524 Ops/s $\color{#d91a1a}-3.04\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 16.3224ms 14.0385ms 71.2328 Ops/s 69.6984 Ops/s $\color{#35bf28}+2.20\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1576ms 1.0645ms 939.4067 Ops/s 880.1653 Ops/s $\textbf{\color{#35bf28}+6.73\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1117s 8.8931ms 112.4473 Ops/s 148.3898 Ops/s $\textbf{\color{#d91a1a}-24.22\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 16.2913ms 14.0095ms 71.3802 Ops/s 70.0752 Ops/s $\color{#35bf28}+1.86\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.3436ms 1.1956ms 836.3700 Ops/s 869.4117 Ops/s $\color{#d91a1a}-3.80\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1096s 7.1626ms 139.6148 Ops/s 107.1246 Ops/s $\textbf{\color{#35bf28}+30.33\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 16.6940ms 14.4449ms 69.2284 Ops/s 68.7534 Ops/s $\color{#35bf28}+0.69\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.8783ms 1.6527ms 605.0576 Ops/s 693.1828 Ops/s $\textbf{\color{#d91a1a}-12.71\%}$

@vmoens vmoens added bug Something isn't working enhancement New feature or request labels Mar 6, 2024

vmoens commented Mar 7, 2024

@albertbou92 I also took this opportunity to refactor preemption. You can now stack (with padding) or cat (with masking) the values coming from the collectors.
I noticed that we preempt based on the proportion of collectors that have reached the end, not on the proportion of frames collected. IMO the latter would be more sensible.
We could use an mp.Value() shared between processes to track how many frames have been collected, and break as soon as it goes above the threshold. Would you like to give it a shot once this PR lands? No worries if you don't have the time!
I suspect it could bring a decent speed-up over what we're doing.

In a nutshell:

# in SyncDataCollector.rollout
def rollout(self):
    ...
    if self.interruptor is not None:
        # _collected_frames would be the counter shared between processes (e.g. an mp.Value)
        self._collected_frames += frame_count

# in MultiSyncDataCollector.iterator
def iterator(self):
    if self.preemptive_threshold and ...:
        # stop once the shared count exceeds the threshold fraction of the batch
        if self._collected_frames > self.frames_per_batch * self.preemptive_threshold:
            break  # or similar
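
A minimal sketch of the shared-counter idea above, using only the standard library. The `worker`, `should_preempt` and `run_demo` names, the stop event, and the default numbers are assumptions made for illustration; this is not TorchRL's collector code.

```python
import multiprocessing as mp
import time


def worker(counter, stop_event, frames_per_step, max_steps):
    """Hypothetical worker loop: adds its frames to a counter shared by all workers."""
    for _ in range(max_steps):
        if stop_event.is_set():
            break  # the main process asked every worker to stop early
        # ... one step of data collection would happen here ...
        with counter.get_lock():  # `+=` on a shared Value needs the lock to be safe
            counter.value += frames_per_step


def should_preempt(counter, frames_per_batch, preemptive_threshold):
    """True once the globally collected frames exceed the threshold share of a batch."""
    return counter.value > frames_per_batch * preemptive_threshold


def run_demo(num_workers=4, frames_per_batch=4000, preemptive_threshold=0.5):
    """Start the workers and interrupt them once enough frames were collected."""
    counter = mp.Value("i", 0)  # shared integer, starts at 0
    stop_event = mp.Event()
    steps_per_worker = frames_per_batch // num_workers
    procs = [
        mp.Process(target=worker, args=(counter, stop_event, 1, steps_per_worker))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    while any(p.is_alive() for p in procs):
        if should_preempt(counter, frames_per_batch, preemptive_threshold):
            stop_event.set()
            break
        time.sleep(0.001)
    for p in procs:
        p.join()
    return counter.value
```

On platforms that spawn rather than fork new processes, `run_demo()` would need to be called under an `if __name__ == "__main__":` guard.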

albertbou92 commented Mar 8, 2024

Yes, the preemption was based on the idea that all workers collect a fixed number of frames and cannot communicate during collection (this was also the assumption in DDPPO). But if we can have a shared global value that tracks how many frames have been collected globally without impacting speed, that would be much better. Yes, I can give it a shot!

@vmoens vmoens marked this pull request as ready for review March 18, 2024 14:05

vmoens commented Mar 19, 2024

@matteobettini This PR addresses an issue that you pointed out a long time ago regarding the stacking of results in MultiSync.
Do you think it solves it adequately?

The idea is to build the collector with cat_results="stack" if you want to stack, or cat_results=-1 if you want to cat along the time dimension. cat_results=None results in cat_results=0 plus a warning, since this is the previous behaviour but we want to discourage it.
We recommend "stack" > -1 > 0.

Sorry it took so long to solve this!
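
The defaulting rule described above could look roughly like the sketch below. `resolve_cat_results` is a hypothetical helper written for illustration, and the warning category and message are assumptions; the actual collector may handle this differently.

```python
import warnings


def resolve_cat_results(cat_results=None):
    """Hypothetical helper: normalise the `cat_results` argument.

    None falls back to 0 (the previous behaviour) and emits a warning;
    "stack" or an integer dimension is returned unchanged.
    """
    if cat_results is None:
        warnings.warn(
            "cat_results was not specified; falling back to cat_results=0. "
            'Consider passing cat_results="stack" or cat_results=-1 instead.',
            category=UserWarning,  # assumed category, chosen for the sketch
            stacklevel=2,
        )
        return 0
    if cat_results == "stack":
        return cat_results
    # bool is a subclass of int, so reject it explicitly
    if isinstance(cat_results, int) and not isinstance(cat_results, bool):
        return cat_results
    raise ValueError(f'cat_results must be "stack" or an int, got {cat_results!r}')
```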

@matteobettini
Contributor

Amazing!

So cat_results="stack" is now exactly like a single collector with the number of collectors B prepended to the batch size, which feels so smooth!

This is what I wanted, yes!

+--------------------+---------------------+-------------+---------------+------------------------------+
| Single env         | [T]                 | `[B, T]`    | `[B*(T//B)]`  | [T]                          |
+--------------------+---------------------+-------------+---------------+------------------------------+
| Batched env (n=P)  | [P, T]              | `[B, P, T]` | `[B * P, T]`  | [P, T]                       |
Contributor

@matteobettini matteobettini Mar 19, 2024

[B * P, T], this is given that cat_results=0

My personal preference, to avoid having cat_results be both int and str, is to have 2 args:

  • cat_results true or false
  • collectors_dim dimension where to cat or stack (or similar nicer name)

Collaborator Author

But then what is collector_dim=0 and cat_results=False? That should not be allowed, so we would need a complicated doc that lists the available configs.
If we allow them, we also need to test every single combination...

Collaborator Author

Note that the goal is to get rid of this arg in v0.6 or v0.7, so adding 2 args is twice as bad as adding 1, given that we want to break things in the long run.

Contributor

collector_dim=0 and cat_results=False

isn't this what happens currently when cat_results="stack"?

Anyway, yes, whichever you prefer.

Collaborator Author

Yes, sorry, I meant collector_dim=-1 or 1 or anything that isn't 0.

You can't stack along an arbitrary dim (and should not be able to), that's what I mean. Stack should be along dim 0.

Contributor

I don't quite see this: in the example table we are referring to, in the row for stacking in MultiSync, I can see all of:

  • B, P, T (collector_dim=0 or -3, cat_results=False) (currently the only available option)
  • P, B, T (collector_dim=1 or -2, cat_results=False)
  • P, T, B (collector_dim=2 or -1, cat_results=False)

Collaborator Author

There is no cat_results=False; it can only be a string ("stack") or an int:
cat_results="stack" => stack B collectors along 0 => [B, P, T]
cat_results=0 => cat B collectors along 0 => [B * P, T]
cat_results=-1 => cat B collectors along -1 => [P, B * T]
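
To make the three cases concrete, here is a small pure-Python sketch that computes the resulting batch shape. `multisync_batch_shape` is a hypothetical helper, not part of TorchRL; it only mirrors the shape arithmetic of stacking or concatenating B results that each have shape [P, T]. For the -1 case, concatenating along the last dimension gives [P, B * T].

```python
def multisync_batch_shape(collector_shape, num_collectors, cat_results):
    """Shape obtained by combining `num_collectors` results of shape `collector_shape`.

    cat_results="stack" adds a new leading dimension of size `num_collectors`;
    an integer concatenates along that dimension, multiplying its size.
    """
    shape = list(collector_shape)
    if cat_results == "stack":
        return [num_collectors] + shape
    if isinstance(cat_results, bool) or not isinstance(cat_results, int):
        raise ValueError(f'cat_results must be "stack" or an int, got {cat_results!r}')
    if not -len(shape) <= cat_results < len(shape):
        raise IndexError(f"dim {cat_results} is out of range for shape {shape}")
    shape[cat_results] *= num_collectors  # negative dims index from the end
    return shape


B, P, T = 3, 4, 5  # collectors, batched envs per collector, time steps
print(multisync_batch_shape([P, T], B, "stack"))  # [3, 4, 5]  i.e. [B, P, T]
print(multisync_batch_shape([P, T], B, 0))        # [12, 5]    i.e. [B * P, T]
print(multisync_batch_shape([P, T], B, -1))       # [4, 15]    i.e. [P, B * T]
```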

Collaborator Author

I think I didn't get what you were trying to say though

Contributor

I know, that is the point of this thread, I am discussing why for me it makes sense to separate them.

Collaborator Author

P, B, T (collector_dim=1 or -2, cat_results=False)
P, T, B (collector_dim=2 or -1, cat_results=False)

I don't think we should support these. If you want that you can transpose your resulting tensordict. Implementing it would require a lot of expensive tests for a feature we will deprecate in 2 releases.

Vincent Moens added 2 commits March 19, 2024 16:29
Vincent Moens and others added 7 commits March 19, 2024 16:32
@vmoens vmoens merged commit f6fbc44 into main Mar 19, 2024
@vmoens vmoens deleted the collector-shapes branch March 19, 2024 21:48