[LLM] Add Countdown numbers-game environment #3571

Closed
vmoens wants to merge 1 commit into gh/vmoens/252/base from gh/vmoens/252/head

@vmoens
Collaborator

@vmoens vmoens commented Mar 26, 2026

Stack from ghstack (oldest at bottom):

Add CountdownEnv and CountdownRewardParser for the Countdown numbers
game, a popular lightweight problem for GRPO training.

Key features:

  • Procedural problem generation (no external dataset required)
  • Validates arithmetic expressions: correct operators, each source
    number used at most once, evaluates to target
  • Same 0/0.1/1.0 reward convention as GSM8K and MATH parsers
  • Configurable problem difficulty (num_count, max_number, max_target)
  • Includes unit tests and documentation
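
The rules listed above can be sketched as follows. This is a minimal illustration, not the code added by this PR: the helper names `make_problem`, `evaluate` and `reward` are hypothetical, the generator's left-to-right folding strategy is an assumption, and the meaning of the 0.1 tier (a well-formed expression that misses the target) is assumed rather than taken from the GSM8K or MATH parsers.

```python
import ast
import operator
import random
from collections import Counter

# Allowed binary operators (hypothetical helper table, not the PR's API).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def make_problem(num_count=4, max_number=25, max_target=100, rng=None):
    """Draw source numbers and a target that is reachable from them."""
    rng = rng or random.Random()
    while True:
        numbers = [rng.randint(1, max_number) for _ in range(num_count)]
        value = numbers[0]
        for n in numbers[1:]:
            # Fold the numbers left to right with a random operator.
            value = rng.choice([operator.add, operator.sub, operator.mul])(value, n)
        if 1 <= value <= max_target:
            return numbers, value


def evaluate(expr, numbers):
    """Return the value of expr, or None if it breaks the rules."""
    try:
        tree = ast.parse(expr.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return None
    available = Counter(numbers)

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            # Each source number may be consumed at most once.
            available[node.value] -= 1
            if available[node.value] < 0:
                raise ValueError("number reused or not in the source set")
            return node.value
        raise ValueError("unsupported syntax")

    try:
        return walk(tree.body)
    except (ValueError, ZeroDivisionError):
        return None


def reward(expr, numbers, target):
    """0.0 for an invalid expression, 0.1 for a valid miss, 1.0 for a hit."""
    value = evaluate(expr, numbers)
    if value is None:
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.1
```

With source numbers `[25, 5, 4, 7]` and target 80, `reward("(25 - 5) * 4", ...)` gives 1.0, `reward("25 + 5", ...)` gives 0.1, and `reward("5 * 5 * 4", ...)` gives 0.0 because 5 is used twice.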

Made-with: Cursor
Pull-Request: #3545

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Mar 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3571

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Contributor

⚠️ PR Title Label Error

Unknown or invalid prefix [LLM].

Current title: [LLM] Add Countdown numbers-game environment

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
|---|---|---|
| `[BugFix]` | BugFix | `[BugFix] Fix memory leak in collector` |
| `[Feature]` | Feature | `[Feature] Add new optimizer` |
| `[Doc]` or `[Docs]` | Documentation | `[Doc] Update installation guide` |
| `[Refactor]` | Refactoring | `[Refactor] Clean up module imports` |
| `[CI]` | CI | `[CI] Fix workflow permissions` |
| `[Test]` or `[Tests]` | Tests | `[Tests] Add unit tests for buffer` |
| `[Environment]` or `[Environments]` | Environments | `[Environments] Add Gymnasium support` |
| `[Data]` | Data | `[Data] Fix replay buffer sampling` |
| `[Performance]` or `[Perf]` | Performance | `[Performance] Optimize tensor ops` |
| `[BC-Breaking]` | bc breaking | `[BC-Breaking] Remove deprecated API` |
| `[Deprecation]` | Deprecation | `[Deprecation] Mark old function` |
| `[Quality]` | Quality | `[Quality] Fix typos and add codespell` |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
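
A rough sketch of the kind of check that produces this error is shown below. The prefix set is copied from the table above; the function name `check_title` and the regular expression are assumptions for illustration, not the repository's actual workflow code.

```python
import re

# Prefixes copied from the table above; the matching logic is an assumption.
SUPPORTED_PREFIXES = {
    "[BugFix]", "[Feature]", "[Doc]", "[Docs]", "[Refactor]", "[CI]",
    "[Test]", "[Tests]", "[Environment]", "[Environments]", "[Data]",
    "[Performance]", "[Perf]", "[BC-Breaking]", "[Deprecation]", "[Quality]",
}


def check_title(title):
    """Return the leading bracketed prefix if it is supported, else None."""
    match = re.match(r"\[[^\]]+\]", title)
    if match and match.group(0) in SUPPORTED_PREFIXES:
        return match.group(0)  # comparison is case-sensitive
    return None


print(check_title("[LLM] Add Countdown numbers-game environment"))          # None
print(check_title("[Environment] Add Countdown numbers-game environment"))  # [Environment]
```

Under this sketch, the current title is rejected because `[LLM]` is not in the supported set, while the same title with a prefix such as `[Environment]` or `[Feature]` would pass.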

@vmoens vmoens closed this Mar 26, 2026
@vmoens vmoens deleted the gh/vmoens/252/head branch March 26, 2026 16:55
vmoens added a commit that referenced this pull request Mar 26, 2026
Add CountdownEnv and CountdownRewardParser for the Countdown numbers
game, a popular lightweight problem for GRPO training.

Key features:
- Procedural problem generation (no external dataset required)
- Validates arithmetic expressions: correct operators, each source
  number used at most once, evaluates to target
- Same 0/0.1/1.0 reward convention as GSM8K and MATH parsers
- Configurable problem difficulty (num_count, max_number, max_target)
- Includes unit tests and documentation

Made-with: Cursor
ghstack-source-id: 1cc93cb
Pull-Request: #3545

ghstack-source-id: 1cc93cb
Pull Request resolved: #3571
@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}8$.

Expand to view detailed results
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 83.6562μs | 82.8049μs | 12.0766 KOps/s | 11.8322 KOps/s | $\color{#35bf28}+2.07\%$ |
| test_tensor_to_bytestream_speed[torch.save] | 0.1438ms | 0.1428ms | 7.0039 KOps/s | 6.9985 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1240s | 0.1230s | 8.1274 Ops/s | 8.0545 Ops/s | $\color{#35bf28}+0.90\%$ |
| test_tensor_to_bytestream_speed[numpy] | 2.5665μs | 2.5587μs | 390.8248 KOps/s | 376.1038 KOps/s | $\color{#35bf28}+3.91\%$ |
| test_tensor_to_bytestream_speed[safetensors] | 38.6829μs | 38.6447μs | 25.8767 KOps/s | 25.7214 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_simple | 0.7949s | 0.7932s | 1.2607 Ops/s | 1.2223 Ops/s | $\color{#35bf28}+3.14\%$ |
| test_transformed | 1.3949s | 1.3930s | 0.7179 Ops/s | 0.7057 Ops/s | $\color{#35bf28}+1.73\%$ |
| test_serial | 2.3533s | 2.3479s | 0.4259 Ops/s | 0.4224 Ops/s | $\color{#35bf28}+0.84\%$ |
| test_parallel | 1.9183s | 1.8735s | 0.5338 Ops/s | 0.5430 Ops/s | $\color{#d91a1a}-1.71\%$ |
| test_step_mdp_speed[True-True-True-True-True] | 0.3225ms | 41.1695μs | 24.2898 KOps/s | 23.7215 KOps/s | $\color{#35bf28}+2.40\%$ |
| test_step_mdp_speed[True-True-True-True-False] | 0.4357ms | 23.3452μs | 42.8353 KOps/s | 43.2215 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_step_mdp_speed[True-True-True-False-True] | 0.4437ms | 24.0829μs | 41.5233 KOps/s | 42.2287 KOps/s | $\color{#d91a1a}-1.67\%$ |
| test_step_mdp_speed[True-True-True-False-False] | 42.3110μs | 12.9362μs | 77.3026 KOps/s | 76.6090 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_step_mdp_speed[True-True-False-True-True] | 0.4723ms | 44.9461μs | 22.2489 KOps/s | 22.4023 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_step_mdp_speed[True-True-False-True-False] | 0.4729ms | 25.8127μs | 38.7407 KOps/s | 39.3359 KOps/s | $\color{#d91a1a}-1.51\%$ |
| test_step_mdp_speed[True-True-False-False-True] | 60.9810μs | 26.6031μs | 37.5896 KOps/s | 38.1703 KOps/s | $\color{#d91a1a}-1.52\%$ |
| test_step_mdp_speed[True-True-False-False-False] | 0.4636ms | 15.5426μs | 64.3395 KOps/s | 63.9217 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_step_mdp_speed[True-False-True-True-True] | 0.4761ms | 48.0181μs | 20.8255 KOps/s | 21.1779 KOps/s | $\color{#d91a1a}-1.66\%$ |
| test_step_mdp_speed[True-False-True-True-False] | 78.1120μs | 28.4278μs | 35.1768 KOps/s | 35.2038 KOps/s | $\color{#d91a1a}-0.08\%$ |
| test_step_mdp_speed[True-False-True-False-True] | 59.5610μs | 26.0575μs | 38.3767 KOps/s | 37.0109 KOps/s | $\color{#35bf28}+3.69\%$ |
| test_step_mdp_speed[True-False-True-False-False] | 0.4372ms | 15.7731μs | 63.3989 KOps/s | 63.5602 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_step_mdp_speed[True-False-False-True-True] | 0.4725ms | 50.7006μs | 19.7236 KOps/s | 20.1613 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_step_mdp_speed[True-False-False-True-False] | 63.5220μs | 30.7129μs | 32.5597 KOps/s | 32.8159 KOps/s | $\color{#d91a1a}-0.78\%$ |
| test_step_mdp_speed[True-False-False-False-True] | 0.4687ms | 28.4403μs | 35.1614 KOps/s | 34.4016 KOps/s | $\color{#35bf28}+2.21\%$ |
| test_step_mdp_speed[True-False-False-False-False] | 0.4501ms | 18.0264μs | 55.4741 KOps/s | 54.9423 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_step_mdp_speed[False-True-True-True-True] | 94.7620μs | 46.9038μs | 21.3202 KOps/s | 20.9879 KOps/s | $\color{#35bf28}+1.58\%$ |
| test_step_mdp_speed[False-True-True-True-False] | 0.4562ms | 27.8239μs | 35.9404 KOps/s | 35.3740 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_step_mdp_speed[False-True-True-False-True] | 2.4258ms | 30.3070μs | 32.9957 KOps/s | 33.1373 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_step_mdp_speed[False-True-True-False-False] | 0.4388ms | 17.0604μs | 58.6151 KOps/s | 57.8886 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_step_mdp_speed[False-True-False-True-True] | 91.9820μs | 49.9223μs | 20.0311 KOps/s | 19.7052 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_step_mdp_speed[False-True-False-True-False] | 0.4502ms | 30.8202μs | 32.4462 KOps/s | 32.2292 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_step_mdp_speed[False-True-False-False-True] | 0.4616ms | 32.9594μs | 30.3403 KOps/s | 31.3904 KOps/s | $\color{#d91a1a}-3.35\%$ |
| test_step_mdp_speed[False-True-False-False-False] | 0.4381ms | 19.6921μs | 50.7819 KOps/s | 51.2957 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_step_mdp_speed[False-False-True-True-True] | 88.5120μs | 53.3137μs | 18.7569 KOps/s | 18.9863 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_step_mdp_speed[False-False-True-True-False] | 0.4631ms | 34.1396μs | 29.2915 KOps/s | 30.1314 KOps/s | $\color{#d91a1a}-2.79\%$ |
| test_step_mdp_speed[False-False-True-False-True] | 0.4650ms | 32.8559μs | 30.4359 KOps/s | 30.5092 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_step_mdp_speed[False-False-True-False-False] | 0.4517ms | 19.8526μs | 50.3713 KOps/s | 50.7474 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_step_mdp_speed[False-False-False-True-True] | 97.5720μs | 55.1260μs | 18.1403 KOps/s | 18.1238 KOps/s | $\color{#35bf28}+0.09\%$ |
| test_step_mdp_speed[False-False-False-True-False] | 0.4670ms | 36.9278μs | 27.0799 KOps/s | 27.8148 KOps/s | $\color{#d91a1a}-2.64\%$ |
| test_step_mdp_speed[False-False-False-False-True] | 0.4672ms | 34.3894μs | 29.0787 KOps/s | 29.2528 KOps/s | $\color{#d91a1a}-0.60\%$ |
| test_step_mdp_speed[False-False-False-False-False] | 0.4493ms | 22.3141μs | 44.8148 KOps/s | 45.5645 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7388s | 0.7336s | 1.3632 Ops/s | 1.3196 Ops/s | $\color{#35bf28}+3.31\%$ |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7168s | 0.6146s | 1.6271 Ops/s | 1.6214 Ops/s | $\color{#35bf28}+0.36\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7487s | 1.6591s | 0.6027 Ops/s | 0.5986 Ops/s | $\color{#35bf28}+0.69\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5214s | 1.4409s | 0.6940 Ops/s | 0.6871 Ops/s | $\color{#35bf28}+1.00\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9995s | 1.9242s | 0.5197 Ops/s | 0.5159 Ops/s | $\color{#35bf28}+0.74\%$ |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7819s | 1.7050s | 0.5865 Ops/s | 0.5878 Ops/s | $\color{#d91a1a}-0.23\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6760s | 4.6085s | 0.2170 Ops/s | 0.2152 Ops/s | $\color{#35bf28}+0.82\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5613s | 4.4379s | 0.2253 Ops/s | 0.2241 Ops/s | $\color{#35bf28}+0.54\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9688s | 1.8884s | 0.5295 Ops/s | 0.5199 Ops/s | $\color{#35bf28}+1.85\%$ |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7054s | 1.6008s | 0.6247 Ops/s | 0.6125 Ops/s | $\color{#35bf28}+1.99\%$ |
| test_values[generalized_advantage_estimate-True-True] | 21.8538ms | 21.3973ms | 46.7350 Ops/s | 45.6840 Ops/s | $\color{#35bf28}+2.30\%$ |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1539s | 4.0058ms | 249.6355 Ops/s | 280.6252 Ops/s | $\textbf{\color{#d91a1a}-11.04\%}$ |
| test_values[td0_return_estimate-False-False] | 0.1101ms | 86.0552μs | 11.6205 KOps/s | 11.6192 KOps/s | $\color{#35bf28}+0.01\%$ |
| test_values[td1_return_estimate-False-False] | 51.0513ms | 50.6614ms | 19.7389 Ops/s | 19.7111 Ops/s | $\color{#35bf28}+0.14\%$ |
| test_values[vec_td1_return_estimate-False-False] | 1.3544ms | 1.0996ms | 909.4151 Ops/s | 901.3318 Ops/s | $\color{#35bf28}+0.90\%$ |
| test_values[td_lambda_return_estimate-True-False] | 83.2278ms | 82.5523ms | 12.1135 Ops/s | 12.0944 Ops/s | $\color{#35bf28}+0.16\%$ |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3168ms | 1.0990ms | 909.8961 Ops/s | 911.0187 Ops/s | $\color{#d91a1a}-0.12\%$ |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 23.2524ms | 23.0240ms | 43.4329 Ops/s | 45.6705 Ops/s | $\color{#d91a1a}-4.90\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0997ms | 0.8124ms | 1.2309 KOps/s | 1.2909 KOps/s | $\color{#d91a1a}-4.65\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7385ms | 0.6874ms | 1.4548 KOps/s | 1.4506 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5829ms | 1.5115ms | 661.5840 Ops/s | 663.8857 Ops/s | $\color{#d91a1a}-0.35\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7669ms | 0.7106ms | 1.4073 KOps/s | 1.4189 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_dqn_speed[False-None] | 1.7250ms | 1.6100ms | 621.1069 Ops/s | 617.7673 Ops/s | $\color{#35bf28}+0.54\%$ |
| test_dqn_speed[False-backward] | 2.7146ms | 2.3347ms | 428.3167 Ops/s | 437.0294 Ops/s | $\color{#d91a1a}-1.99\%$ |
| test_dqn_speed[True-None] | 1.0990ms | 0.5983ms | 1.6714 KOps/s | 1.6855 KOps/s | $\color{#d91a1a}-0.84\%$ |
| test_dqn_speed[True-backward] | 1.3196ms | 1.2411ms | 805.7197 Ops/s | 795.4935 Ops/s | $\color{#35bf28}+1.29\%$ |
| test_dqn_speed[reduce-overhead-None] | 0.7305ms | 0.6247ms | 1.6007 KOps/s | 1.6101 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_ddpg_speed[False-None] | 3.4929ms | 3.0762ms | 325.0736 Ops/s | 328.2684 Ops/s | $\color{#d91a1a}-0.97\%$ |
| test_ddpg_speed[False-backward] | 4.8267ms | 4.5064ms | 221.9077 Ops/s | 222.3119 Ops/s | $\color{#d91a1a}-0.18\%$ |
| test_ddpg_speed[True-None] | 1.4502ms | 1.3625ms | 733.9239 Ops/s | 720.3126 Ops/s | $\color{#35bf28}+1.89\%$ |
| test_ddpg_speed[True-backward] | 2.7011ms | 2.5417ms | 393.4302 Ops/s | 394.2024 Ops/s | $\color{#d91a1a}-0.20\%$ |
| test_ddpg_speed[reduce-overhead-None] | 1.5355ms | 1.3957ms | 716.5027 Ops/s | 718.7049 Ops/s | $\color{#d91a1a}-0.31\%$ |
| test_sac_speed[False-None] | 9.1294ms | 8.7153ms | 114.7414 Ops/s | 115.8428 Ops/s | $\color{#d91a1a}-0.95\%$ |
| test_sac_speed[False-backward] | 12.8729ms | 12.0199ms | 83.1952 Ops/s | 83.2983 Ops/s | $\color{#d91a1a}-0.12\%$ |
| test_sac_speed[True-None] | 2.1871ms | 1.8784ms | 532.3665 Ops/s | 534.2507 Ops/s | $\color{#d91a1a}-0.35\%$ |
| test_sac_speed[True-backward] | 3.7552ms | 3.6637ms | 272.9504 Ops/s | 272.3681 Ops/s | $\color{#35bf28}+0.21\%$ |
| test_sac_speed[reduce-overhead-None] | 17.0128ms | 10.3458ms | 96.6576 Ops/s | 82.4685 Ops/s | $\textbf{\color{#35bf28}+17.21\%}$ |
| test_redq_deprec_speed[False-None] | 10.6678ms | 9.7145ms | 102.9387 Ops/s | 101.7628 Ops/s | $\color{#35bf28}+1.16\%$ |
| test_redq_deprec_speed[False-backward] | 14.5175ms | 13.1799ms | 75.8731 Ops/s | 75.3662 Ops/s | $\color{#35bf28}+0.67\%$ |
| test_redq_deprec_speed[True-None] | 2.6815ms | 2.5890ms | 386.2498 Ops/s | 377.4202 Ops/s | $\color{#35bf28}+2.34\%$ |
| test_redq_deprec_speed[True-backward] | 4.8543ms | 4.3111ms | 231.9586 Ops/s | 240.8377 Ops/s | $\color{#d91a1a}-3.69\%$ |
| test_redq_deprec_speed[reduce-overhead-None] | 14.9076ms | 9.7026ms | 103.0648 Ops/s | 102.6180 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_td3_speed[False-None] | 8.8783ms | 8.5946ms | 116.3524 Ops/s | 117.2944 Ops/s | $\color{#d91a1a}-0.80\%$ |
| test_td3_speed[False-backward] | 11.7005ms | 11.2907ms | 88.5688 Ops/s | 90.6503 Ops/s | $\color{#d91a1a}-2.30\%$ |
| test_td3_speed[True-None] | 1.6584ms | 1.6358ms | 611.3119 Ops/s | 610.1236 Ops/s | $\color{#35bf28}+0.19\%$ |
| test_td3_speed[True-backward] | 3.2095ms | 3.1744ms | 315.0160 Ops/s | 323.5701 Ops/s | $\color{#d91a1a}-2.64\%$ |
| test_td3_speed[reduce-overhead-None] | 50.3455ms | 25.9361ms | 38.5562 Ops/s | 38.7469 Ops/s | $\color{#d91a1a}-0.49\%$ |
| test_cql_speed[False-None] | 18.5019ms | 18.0353ms | 55.4469 Ops/s | 55.5054 Ops/s | $\color{#d91a1a}-0.11\%$ |
| test_cql_speed[False-backward] | 24.2648ms | 23.8285ms | 41.9666 Ops/s | 42.6045 Ops/s | $\color{#d91a1a}-1.50\%$ |
| test_cql_speed[True-None] | 3.4103ms | 3.3216ms | 301.0557 Ops/s | 303.1724 Ops/s | $\color{#d91a1a}-0.70\%$ |
| test_cql_speed[True-backward] | 6.0370ms | 5.6085ms | 178.2994 Ops/s | 181.0487 Ops/s | $\color{#d91a1a}-1.52\%$ |
| test_cql_speed[reduce-overhead-None] | 19.0677ms | 12.0912ms | 82.7045 Ops/s | 83.0378 Ops/s | $\color{#d91a1a}-0.40\%$ |
| test_a2c_speed[False-None] | 3.6549ms | 3.4504ms | 289.8177 Ops/s | 292.1392 Ops/s | $\color{#d91a1a}-0.79\%$ |
| test_a2c_speed[False-backward] | 8.0116ms | 6.7327ms | 148.5285 Ops/s | 152.2775 Ops/s | $\color{#d91a1a}-2.46\%$ |
| test_a2c_speed[True-None] | 1.4651ms | 1.4031ms | 712.7120 Ops/s | 720.6560 Ops/s | $\color{#d91a1a}-1.10\%$ |
| test_a2c_speed[True-backward] | 3.5765ms | 3.1676ms | 315.6918 Ops/s | 317.1351 Ops/s | $\color{#d91a1a}-0.46\%$ |
| test_a2c_speed[reduce-overhead-None] | 1.0933ms | 1.0392ms | 962.2685 Ops/s | 944.6868 Ops/s | $\color{#35bf28}+1.86\%$ |
| test_ppo_speed[False-None] | 4.2032ms | 4.1045ms | 243.6334 Ops/s | 242.0179 Ops/s | $\color{#35bf28}+0.67\%$ |
| test_ppo_speed[False-backward] | 7.9778ms | 7.6032ms | 131.5233 Ops/s | 132.6254 Ops/s | $\color{#d91a1a}-0.83\%$ |
| test_ppo_speed[True-None] | 1.6126ms | 1.5290ms | 654.0317 Ops/s | 656.3433 Ops/s | $\color{#d91a1a}-0.35\%$ |
| test_ppo_speed[True-backward] | 3.4291ms | 3.3748ms | 296.3105 Ops/s | 293.7646 Ops/s | $\color{#35bf28}+0.87\%$ |
| test_ppo_speed[reduce-overhead-None] | 1.3009ms | 1.1087ms | 901.9679 Ops/s | 888.8316 Ops/s | $\color{#35bf28}+1.48\%$ |
| test_reinforce_speed[False-None] | 2.6562ms | 2.4538ms | 407.5301 Ops/s | 403.7609 Ops/s | $\color{#35bf28}+0.93\%$ |
| test_reinforce_speed[False-backward] | 4.0337ms | 3.6442ms | 274.4054 Ops/s | 273.1000 Ops/s | $\color{#35bf28}+0.48\%$ |
| test_reinforce_speed[True-None] | 1.4841ms | 1.3556ms | 737.7071 Ops/s | 730.7799 Ops/s | $\color{#35bf28}+0.95\%$ |
| test_reinforce_speed[True-backward] | 3.3418ms | 3.2149ms | 311.0475 Ops/s | 308.8894 Ops/s | $\color{#35bf28}+0.70\%$ |
| test_reinforce_speed[reduce-overhead-None] | 16.1019ms | 8.9166ms | 112.1499 Ops/s | 111.3198 Ops/s | $\color{#35bf28}+0.75\%$ |
| test_iql_speed[False-None] | 10.4379ms | 10.0031ms | 99.9686 Ops/s | 100.5270 Ops/s | $\color{#d91a1a}-0.56\%$ |
| test_iql_speed[False-backward] | 14.8362ms | 14.1186ms | 70.8287 Ops/s | 71.0008 Ops/s | $\color{#d91a1a}-0.24\%$ |
| test_iql_speed[True-None] | 2.4142ms | 2.2831ms | 438.0008 Ops/s | 435.5284 Ops/s | $\color{#35bf28}+0.57\%$ |
| test_iql_speed[True-backward] | 5.3709ms | 4.9963ms | 200.1465 Ops/s | 205.2995 Ops/s | $\color{#d91a1a}-2.51\%$ |
| test_iql_speed[reduce-overhead-None] | 16.7861ms | 10.2475ms | 97.5849 Ops/s | 98.6757 Ops/s | $\color{#d91a1a}-1.11\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.4138ms | 6.0299ms | 165.8397 Ops/s | 166.1867 Ops/s | $\color{#d91a1a}-0.21\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8274ms | 0.3017ms | 3.3145 KOps/s | 2.9706 KOps/s | $\textbf{\color{#35bf28}+11.58\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5590ms | 0.3043ms | 3.2857 KOps/s | 3.2502 KOps/s | $\color{#35bf28}+1.09\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1577ms | 5.8568ms | 170.7420 Ops/s | 171.5756 Ops/s | $\color{#d91a1a}-0.49\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.7730ms | 0.2882ms | 3.4694 KOps/s | 2.9736 KOps/s | $\textbf{\color{#35bf28}+16.67\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5952ms | 0.2938ms | 3.4039 KOps/s | 3.2124 KOps/s | $\textbf{\color{#35bf28}+5.96\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5070ms | 1.2868ms | 777.1262 Ops/s | 645.8797 Ops/s | $\textbf{\color{#35bf28}+20.32\%}$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4282ms | 1.2008ms | 832.7599 Ops/s | 678.4482 Ops/s | $\textbf{\color{#35bf28}+22.74\%}$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.5359ms | 6.1467ms | 162.6878 Ops/s | 166.3872 Ops/s | $\color{#d91a1a}-2.22\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.0480ms | 0.4837ms | 2.0675 KOps/s | 1.9761 KOps/s | $\color{#35bf28}+4.62\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7881ms | 0.4831ms | 2.0700 KOps/s | 2.3494 KOps/s | $\textbf{\color{#d91a1a}-11.89\%}$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9768ms | 5.8979ms | 169.5518 Ops/s | 169.6675 Ops/s | $\color{#d91a1a}-0.07\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.9097ms | 0.3144ms | 3.1809 KOps/s | 3.3942 KOps/s | $\textbf{\color{#d91a1a}-6.28\%}$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5785ms | 0.3304ms | 3.0269 KOps/s | 3.6774 KOps/s | $\textbf{\color{#d91a1a}-17.69\%}$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0756ms | 5.8100ms | 172.1170 Ops/s | 173.0460 Ops/s | $\color{#d91a1a}-0.54\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.0512ms | 0.3521ms | 2.8400 KOps/s | 2.7850 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5424ms | 0.3295ms | 3.0353 KOps/s | 2.9320 KOps/s | $\color{#35bf28}+3.53\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 8.1911ms | 6.0253ms | 165.9672 Ops/s | 167.4008 Ops/s | $\color{#d91a1a}-0.86\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.2332ms | 0.4899ms | 2.0413 KOps/s | 2.0592 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6147ms | 0.4216ms | 2.3718 KOps/s | 2.2309 KOps/s | $\textbf{\color{#35bf28}+6.32\%}$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.9634s | 24.2849ms | 41.1779 Ops/s | 35.3321 Ops/s | $\textbf{\color{#35bf28}+16.55\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 11.6245ms | 2.0938ms | 477.5980 Ops/s | 536.3769 Ops/s | $\textbf{\color{#d91a1a}-10.96\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.1246ms | 0.9774ms | 1.0231 KOps/s | 1.0246 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 10.0310ms | 5.1411ms | 194.5101 Ops/s | 192.2135 Ops/s | $\color{#35bf28}+1.19\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9080ms | 1.8240ms | 548.2481 Ops/s | 488.8971 Ops/s | $\textbf{\color{#35bf28}+12.14\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.1424ms | 0.9761ms | 1.0244 KOps/s | 1.0204 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 9.2799ms | 5.3137ms | 188.1932 Ops/s | 186.4900 Ops/s | $\color{#35bf28}+0.91\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.4161ms | 2.1686ms | 461.1345 Ops/s | 497.1771 Ops/s | $\textbf{\color{#d91a1a}-7.25\%}$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 10.4125ms | 1.3399ms | 746.3201 Ops/s | 820.6258 Ops/s | $\textbf{\color{#d91a1a}-9.05\%}$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 42.2279ms | 39.6608ms | 25.2138 Ops/s | 25.2843 Ops/s | $\color{#d91a1a}-0.28\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.0365ms | 18.6050ms | 53.7491 Ops/s | 53.9001 Ops/s | $\color{#d91a1a}-0.28\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 43.6685ms | 40.2521ms | 24.8434 Ops/s | 24.6274 Ops/s | $\color{#35bf28}+0.88\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.9495ms | 18.5922ms | 53.7860 Ops/s | 52.5462 Ops/s | $\color{#35bf28}+2.36\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 44.5241ms | 42.1466ms | 23.7267 Ops/s | 23.4191 Ops/s | $\color{#35bf28}+1.31\%$ |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.2175ms | 20.1079ms | 49.7318 Ops/s | 48.6039 Ops/s | $\color{#35bf28}+2.32\%$ |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8806ms | 0.2313ms | 4.3228 KOps/s | 4.2623 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.6853ms | 1.3650ms | 732.5795 Ops/s | 701.4381 Ops/s | $\color{#35bf28}+4.44\%$ |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.7129ms | 2.2725ms | 440.0453 Ops/s | 433.1233 Ops/s | $\color{#35bf28}+1.60\%$ |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0397ms | 2.9114ms | 343.4803 Ops/s | 336.7704 Ops/s | $\color{#35bf28}+1.99\%$ |
| test_storage_write_contiguous[50-img_shape0-small] | 0.5495ms | 0.1663ms | 6.0116 KOps/s | 5.9930 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.4030ms | 0.2385ms | 4.1922 KOps/s | 4.1464 KOps/s | $\color{#35bf28}+1.11\%$ |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9865ms | 1.8487ms | 540.9118 Ops/s | 586.4496 Ops/s | $\textbf{\color{#d91a1a}-7.77\%}$ |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5401ms | 1.3720ms | 728.8592 Ops/s | 736.9873 Ops/s | $\color{#d91a1a}-1.10\%$ |
| test_collector_stack_then_write[50-img_shape0-small] | 1.3310ms | 1.1570ms | 864.2739 Ops/s | 858.3650 Ops/s | $\color{#35bf28}+0.69\%$ |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.9501ms | 3.5892ms | 278.6173 Ops/s | 273.4177 Ops/s | $\color{#35bf28}+1.90\%$ |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.2145ms | 5.8618ms | 170.5962 Ops/s | 171.8305 Ops/s | $\color{#d91a1a}-0.72\%$ |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.4371ms | 7.0455ms | 141.9340 Ops/s | 140.1679 Ops/s | $\color{#35bf28}+1.26\%$ |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4422ms | 0.2787ms | 3.5877 KOps/s | 3.4620 KOps/s | $\color{#35bf28}+3.63\%$ |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6000ms | 1.4585ms | 685.6344 Ops/s | 650.5613 Ops/s | $\textbf{\color{#35bf28}+5.39\%}$ |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.8308ms | 2.4007ms | 416.5414 Ops/s | 408.8597 Ops/s | $\color{#35bf28}+1.88\%$ |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.4449ms | 3.1296ms | 319.5323 Ops/s | 313.4996 Ops/s | $\color{#35bf28}+1.92\%$ |
| test_collector_without_rb[100-img_shape0-atari] | 34.7663ms | 33.4464ms | 29.8986 Ops/s | 29.1393 Ops/s | $\color{#35bf28}+2.61\%$ |
| test_collector_without_rb[200-img_shape1-large_batch] | 66.0463ms | 65.7630ms | 15.2061 Ops/s | 14.7296 Ops/s | $\color{#35bf28}+3.23\%$ |
| test_collector_with_rb[100-img_shape0-atari] | 38.7329ms | 38.0663ms | 26.2700 Ops/s | 25.7327 Ops/s | $\color{#35bf28}+2.09\%$ |
| test_collector_with_rb[200-img_shape1-large_batch] | 75.2361ms | 74.5165ms | 13.4198 Ops/s | 13.2452 Ops/s | $\color{#35bf28}+1.32\%$ |
| test_collector_without_rb_cuda[100-img_shape0-atari] | 56.3901ms | 56.2898ms | 17.7652 Ops/s | 17.1379 Ops/s | $\color{#35bf28}+3.66\%$ |
| test_collector_without_rb_cuda[200-img_shape1-large_batch] | 0.1161s | 0.1127s | 8.8741 Ops/s | 8.6789 Ops/s | $\color{#35bf28}+2.25\%$ |
| test_collector_with_rb_cuda[100-img_shape0-atari] | 58.7277ms | 58.3260ms | 17.1450 Ops/s | 16.4423 Ops/s | $\color{#35bf28}+4.27\%$ |
| test_collector_with_rb_cuda[200-img_shape1-large_batch] | 0.1160s | 0.1156s | 8.6540 Ops/s | 8.3931 Ops/s | $\color{#35bf28}+3.11\%$ |
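
The Change column appears to be the relative difference between the Ops measured on this PR and the Ops on Repo HEAD. The sketch below checks that reading against two rows of the table; the function name `relative_change` is hypothetical and the formula is inferred from the numbers, not taken from the benchmark workflow.

```python
def relative_change(ops, ops_on_head):
    """Percent change of the PR's throughput relative to the repo HEAD."""
    return (ops / ops_on_head - 1.0) * 100.0


# Values copied from the table above (both columns of a row share a unit).
print(f"{relative_change(12.0766, 11.8322):+.2f}%")  # test_tensor_to_bytestream_speed[pickle]: +2.07%
print(f"{relative_change(96.6576, 82.4685):+.2f}%")  # test_sac_speed[reduce-overhead-None]: +17.21%
```

Both results match the reported values, so a positive Change means higher throughput on this PR than on HEAD.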

Labels

- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- Documentation: Improvements or additions to documentation
- llm/: LLM-related PR, triggers LLM CI tests