Add `Efficient Online Training with GRPO and vLLM in TRL` recipe #334

sergiopaniego · 2025-10-01T16:10:04Z

What does this PR do?

Adds the `Efficient Online Training with GRPO and vLLM in TRL` recipe to showcase the online training possibilities in TRL.
This recipe is a modification of `Post training an LLM for reasoning with GRPO in TRL`, and I aim to include it in the vLLM docs here.

Who can review?

Feel free to tag members/contributors who may be interested in your PR.

review-notebook-app · 2025-10-01T16:10:09Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

HuggingFaceDocBuilderDev · 2025-10-01T16:15:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergiopaniego · 2025-10-02T16:28:30Z

@qgallouedec, in case you want to take a look. I still need to run the full training to get the final results, but the key takeaways are already visible.

sergiopaniego · 2025-10-03T12:50:00Z

Recipe ready for review, now with training results added 😃

review-notebook-app · 2025-10-06T10:22:46Z

View / edit / reply to this conversation on ReviewNB

merveenoyan commented on 2025-10-06T10:22:45Z
----------------------------------------------------------------

I'd frame this sequentially, as follows:

1. State the problem: GRPO and online DPO require running inference during training (define the "online" part first).
2. Solution: this is very inefficient unless we use something like vLLM, which provides efficient inference, and TRL supports colocating it with training.
3. Point out that you can use this notebook for any online method.

(i.e., reverse the current order)

review-notebook-app · 2025-10-06T10:22:47Z

merveenoyan commented on 2025-10-06T10:22:46Z
----------------------------------------------------------------

Do we need to state the differences between PPO and GRPO in this notebook? IMO, let's only define GRPO to keep the focus on vLLM + online methods; otherwise it could be confusing. People who want to learn more can check a guide that goes through the differences.

review-notebook-app · 2025-10-06T10:22:47Z

merveenoyan commented on 2025-10-06T10:22:47Z
----------------------------------------------------------------

nice!

review-notebook-app · 2025-10-06T10:22:48Z

merveenoyan commented on 2025-10-06T10:22:47Z
----------------------------------------------------------------

A blank line between the imports and the function would be more readable (same goes for above).

review-notebook-app · 2025-10-06T10:22:49Z

merveenoyan commented on 2025-10-06T10:22:48Z
----------------------------------------------------------------

Actually, it would be cool for people to push their Trackio logs to the Hub and view them there instead of in the notebook. It also helps with growth there.

review-notebook-app · 2025-10-06T10:22:49Z

merveenoyan commented on 2025-10-06T10:22:49Z
----------------------------------------------------------------

Some logs are unnecessary. Is there any way we can reduce the verbosity here?

review-notebook-app · 2025-10-06T10:22:50Z

merveenoyan commented on 2025-10-06T10:22:50Z
----------------------------------------------------------------

We could have them push to the Hub automatically and check the results there.

review-notebook-app · 2025-10-06T10:22:51Z

merveenoyan commented on 2025-10-06T10:22:50Z
----------------------------------------------------------------

super cool!

merveenoyan

very nice! feel free to re-ping me! 🫡

review-notebook-app · 2025-10-07T08:53:13Z

merveenoyan commented on 2025-10-07T08:53:12Z
----------------------------------------------------------------

nice!

merveenoyan

let's go!

notebooks/en/index.md

Add Efficient Online Training with GRPO and vLLM in TRL recipe

5926c13

Updated notebook

463f240

sergiopaniego marked this pull request as ready for review October 2, 2025 16:25

sergiopaniego requested review from merveenoyan and stevhliu October 2, 2025 16:26

sergiopaniego added 2 commits October 3, 2025 13:04

Notebook updated

4c8c219

Updated notebook

f6da4fc

merveenoyan reviewed Oct 6, 2025

Updated based on review

faa973d

merveenoyan approved these changes Oct 7, 2025

notebooks/en/index.md (conversation resolved)

merveenoyan merged commit 4d89ce5 into main Oct 7, 2025
2 checks passed

merveenoyan deleted the vllm_notebook branch October 7, 2025 09:07

4 participants