
Conversation

joecummings
Member

@joecummings joecummings commented Sep 15, 2025

What does this PR do?

  1. This PR integrates torchtitan as the backend for our trainer actor and our reference model actor. Crucially, this gives us the ability to run with multiple model parallelisms, which is needed for multi-node setups.
  2. This PR also incorporates some syntactic sugar in the form of hf://, which allows users to specify a model from the Hugging Face Hub; Forge will automatically either download it if it does not exist locally or find it in the cache, and point the trainer / reference model to that directory.

How was this PR tested?

This PR was tested for "runnability" using the following combinations of parallelisms on a single node.

| Trainer | Reference | Policy | Tested |
| ------- | --------- | ------ | ------ |
| single  | single    | single | ✓ |
| single  | single    | TP 2   | ✓ |
| single  | single    | TP 4   | ✓ |
| DP 2    | DP 2      | single | ✓ |
| DP 2    | DP 2      | TP 2   | ✓ |
| TP 2    | TP 2      | TP 2   | ✓ |
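
For reference, a hypothetical sketch of the shape such a config might take. The key names here (`parallelism`, `data_parallel_shard_degree`, `tensor_parallel_degree`) are assumptions modeled on torchtitan's JobConfig and may not match the actual Forge YAML schema:

```yaml
# Hypothetical sketch -- key names are assumptions, not the real schema.
trainer:
  parallelism:
    data_parallel_shard_degree: 2   # "DP 2" rows above
    tensor_parallel_degree: 1
ref_model:
  parallelism:
    tensor_parallel_degree: 2       # "TP 2" rows above
```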

In addition, I added a unit test for the hf:// config specification.

FAQs

  1. **Why are the titan trainer and reference model slower than the Hugging Face trainer and reference model?** Presumably this is because a) we have to run in fp32 for now (see Update titan weights to load in bfloat16 #166 for updates on changing this) and b) we now have to convert back to the Hugging Face format before pushing weights.
  2. **Why does loss parallel not work?** Loss parallel is only guaranteed to work OOTB with PyTorch's cross-entropy loss because PyTorch distributed does automatic sharding logic for the underlying aten ops. Since we write our own GRPO loss, we don't get this for free. A potential follow-up would be to enable this through our own resharding logic in Forge, or to upstream a more general fix to PyTorch for the underlying aten ops we need.
  3. **How does the user know which knobs from Titan they can play with?** Currently, there are no docs available for ForgeEngine, ForgeTrainer, etc. This is a huge UX risk for Forge because users will have to navigate the torchtitan codebase themselves to figure this out. cc @mjtrm
  4. **Why do we push compute logprobs onto the controller GPU?** I agree this is not ideal. In an effort to keep the ReferenceModel idempotent, I changed it so that it just returns logits. In addition, this lets us reuse the compute_logprobs found in grpo/main.py. However, this is incredibly slow and puts a lot more computation on rank 0 than I would prefer. cc @Jack-Khuu we should perhaps change this s.t. we create a ReferenceLogprobs actor that does both the logits and logprobs calculation. Less idempotent, but more efficient.
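
The logprob step discussed in FAQ 4 (gathering per-token log-probabilities from the logits the ReferenceModel returns) can be sketched in pure Python. The function name mirrors the compute_logprobs mentioned above, but this body is an illustrative reimplementation, not the actual grpo/main.py code, which would operate on batched torch tensors on-device:

```python
import math


def compute_logprobs(logits: list[list[float]], token_ids: list[int]) -> list[float]:
    """Return log P(token_ids[t] | context) for each position t.

    `logits` holds one vocabulary-sized score vector per position;
    `token_ids` picks one vocabulary index per position. This is the
    log-softmax + gather step, written in pure Python for illustration.
    """
    out = []
    for scores, tok in zip(logits, token_ids):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        out.append(scores[tok] - log_z)  # log-softmax gathered at the token
    return out
```

Running this on the controller means the full logits tensor must travel to rank 0 first, which is exactly the extra transfer and compute the FAQ flags; fusing the gather into the ReferenceModel actor avoids shipping vocabulary-sized tensors around.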

Is this blocked by anything?

YES: meta-pytorch/torchstore#32 cc @LucasLLC

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 15, 2025
@pradeepfn
Contributor

Awesome results!

@joecummings would you be able to copy/paste the configs we have to change for DP/TP? Thanks.

@vidhyav
Contributor

vidhyav commented Sep 16, 2025

Curious, what data did you test this with?

@joecummings
Member Author

Curious, what data did you test this with?

Everything is with the GSM8K dataset.

@joecummings joecummings changed the title [WIP] GRPO Titan RL Trainer GRPO Titan RL Trainer Sep 17, 2025
@joecummings joecummings marked this pull request as ready for review September 18, 2025 18:49
Contributor

@allenwang28 allenwang28 left a comment


Great! I think there are areas we can improve on, but they're way out of scope for this PR; I'll note a few more of them down.

Contributor


Not for this PR, but I would prefer this conversion to be a classmethod on `Episode`. cc @Jack-Khuu

Contributor


Keep this; the logging will work if you call `super().__init__()` in `__post_init__()`.

Member Author


I'm confused though: why do we need this if there's already a logger defined on the Actor?

Contributor


Not for this PR, but it seems really fragile for `sample` to be returning something defined by a function in the app, with no standard interface.

cc @Jack-Khuu - can we keep a note of this?

@joecummings joecummings merged commit d55de5b into meta-pytorch:main Sep 18, 2025
5 checks passed
@joecummings joecummings deleted the titan-rl-trainer branch September 18, 2025 21:21