Support dp_size in replay buffer #93

DNXie · 2025-08-28T21:15:37Z

Added a dp_size dimension to replay buffer sampling to enable data parallelism.
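
To make the shape of the result concrete, here is a minimal sketch of sampling with a dp_size dimension. It is not the code from this PR: the function name `sample_dp_batches`, its parameters, and the choice to return `None` when the buffer is too small are all assumptions made for illustration.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_dp_batches(
    buffer: Sequence[T],
    batch_size: int,
    dp_size: int,
    rng: Optional[random.Random] = None,
) -> Optional[list[list[T]]]:
    """Draw dp_size disjoint batches of batch_size items each (hypothetical helper).

    Returns None when the buffer holds fewer than batch_size * dp_size items.
    """
    rng = rng or random.Random()
    total = batch_size * dp_size
    if total > len(buffer):
        return None
    # Sample indices without replacement and keep them in the drawn (unsorted) order.
    indices = rng.sample(range(len(buffer)), k=total)
    # Split the flat sample into one batch per data-parallel rank.
    return [
        [buffer[i] for i in indices[rank * batch_size : (rank + 1) * batch_size]]
        for rank in range(dp_size)
    ]
```

With `batch_size=4` and `dp_size=2`, the result is a list of two lists of four items, so each data-parallel rank can take the batch at its own index.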

Updated apps/grpo/main.py accordingly.

Test:

pytest tests/unit_tests/test_replay_buffer.py

 7 passed, 7 warnings in 23.35s

python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

All services initialized successfully!
Starting GRPO training loops...
Generated 10 rollouts w/ average reward 0.275
Completed 10 training steps
Latest loss: 0.01575610041618347
Generated 20 rollouts w/ average reward 0.05
Completed 20 training steps
Latest loss: 0.0045821815729141235

pbontrager

Looks good. Just one thing needs fixing: the sampler should not return sorted samples.

pbontrager · 2025-09-10T17:05:27Z

src/forge/actors/replay_buffer.py

We don't want to return a sorted sample here as that reduces variability in the sample. You need to get the index of the sorted array and then probably do this as a nested for loop to be easier to read.

    batch = []
    for rank in range(self.dp_size):
        local_batch = []
        for i in range(bsz):
            e = sampled_episodes[sort_order[rank * bsz + i]]
            local_batch.append(e)
        batch.append(local_batch)

Thanks for pointing out this issue. I have updated this part. Please review.
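
One way to address the concern above is sketched below, assuming the buffer removes sampled episodes by index, which is a common reason to sort in the first place. The idea is to pop in descending index order so removals do not shift the remaining indices, then rebuild the result in the original random draw order. The helper name `pop_dp_batches` and its signature are hypothetical, not the code merged in this PR.

```python
import random
from typing import Optional, TypeVar

T = TypeVar("T")


def pop_dp_batches(
    buffer: list[T],
    batch_size: int,
    dp_size: int,
    rng: Optional[random.Random] = None,
) -> Optional[list[list[T]]]:
    """Remove and return dp_size batches without sorting the output (hypothetical helper)."""
    rng = rng or random.Random()
    total = batch_size * dp_size
    if total > len(buffer):
        return None
    sampled = rng.sample(range(len(buffer)), k=total)
    # Pop from the highest index down so earlier pops do not shift later ones.
    popped = {i: buffer.pop(i) for i in sorted(sampled, reverse=True)}
    # Restore the random draw order; the sorted order was only needed for removal.
    episodes = [popped[i] for i in sampled]
    # Split into one batch per data-parallel rank.
    return [
        episodes[rank * batch_size : (rank + 1) * batch_size]
        for rank in range(dp_size)
    ]
```

Keeping the draw order means each rank's batch is a random subset of the sample rather than a contiguous slice of sorted indices.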

pbontrager

This looks good now. There are a few more small things to fix, but I'll pre-approve it.

tests/unit_tests/test_replay_buffer.py

pbontrager · 2025-09-10T18:18:24Z

src/forge/actors/replay_buffer.py

This is cleaner but is it moving the data twice? It's probably fine.

I have updated the code to make it more efficient.

apps/grpo/main.py

add dp size in replay buffer

494654b

DNXie requested review from joecummings and pbontrager August 28, 2025 21:15

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 28, 2025

DNXie and others added 7 commits August 28, 2025 14:15

update metric name

88f0672

fix test

4647052

fix replay buffer tests

f55b4a4

fix inconsistencies

98649c1

Merge branch 'main' into replay_buffer_dp_size

0b008ac

update config.

4a64f9a

fix lint

3cb5d23

pbontrager reviewed Sep 10, 2025


DNXie added 2 commits September 10, 2025 10:20

updated sampling logic to not return sorted samples

1371400

fix lint

9357cab

DNXie requested a review from pbontrager September 10, 2025 17:53

pbontrager approved these changes Sep 10, 2025


DNXie added 3 commits September 10, 2025 11:39

make sampling more efficient

16a1b97

add test case

7605132

add dprank to trainer

2a4a7ac

DNXie merged commit 029b1bc into meta-pytorch:main Sep 10, 2025
5 checks passed

DNXie deleted the replay_buffer_dp_size branch September 10, 2025 19:13
