@sergiopaniego
Member

What does this PR do?

Adds the "Efficient Online Training with GRPO and vLLM in TRL" recipe to showcase the online training possibilities in TRL.
This recipe is a modification of "Post training an LLM for reasoning with GRPO in TRL", and I aim to include it in the vLLM docs here.

Who can review?

Feel free to tag members/contributors who may be interested in your PR.

@review-notebook-app
Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sergiopaniego marked this pull request as ready for review October 2, 2025 16:25
@sergiopaniego
Member Author

@qgallouedec, in case you want to take a look. I still need to run the full training to get the final results, but the key takeaways are already visible.

@sergiopaniego
Member Author

sergiopaniego commented Oct 3, 2025

Recipe ready for review, now with training results added 😃

review-notebook-app bot commented Oct 6, 2025

merveenoyan commented on 2025-10-06T10:22:45Z
----------------------------------------------------------------

I think I'd like to frame this as follows sequentially:

  1. state the problem: online methods like GRPO and DPO require running inference during training (define the "online" part first)
  2. solution: this is super inefficient unless we use something like vLLM, which provides efficient inference and can be colocated with training in TRL
  3. you can use this notebook for any online method

(reverse the order)


merveenoyan commented on 2025-10-06T10:22:46Z
----------------------------------------------------------------

do we need to state the differences between PPO and GRPO in this notebook? imo let's only define GRPO to keep the focus on vLLM + online methods. it would be confusing otherwise. if people want to learn more about it they can check a guide that goes through the differences


merveenoyan commented on 2025-10-06T10:22:47Z
----------------------------------------------------------------

nice!


merveenoyan commented on 2025-10-06T10:22:47Z
----------------------------------------------------------------

a blank line between the import and the function would be more readable (same goes above)


merveenoyan commented on 2025-10-06T10:22:48Z
----------------------------------------------------------------

actually it would be cool for people to push their trackio logs to the Hub and see them there instead of in the notebook. also helps with growth there


merveenoyan commented on 2025-10-06T10:22:49Z
----------------------------------------------------------------

some logs are unnecessary, any way we can change verbosity here?


merveenoyan commented on 2025-10-06T10:22:50Z
----------------------------------------------------------------

we could have the logs pushed to the Hub automatically and let them check there


merveenoyan commented on 2025-10-06T10:22:50Z
----------------------------------------------------------------

super cool!


Collaborator

@merveenoyan left a comment

very nice! feel free to re-ping me! 🫡

merveenoyan commented on 2025-10-07T08:53:12Z
----------------------------------------------------------------

nice!


Collaborator

@merveenoyan left a comment

let's go!

@merveenoyan merged commit 4d89ce5 into main Oct 7, 2025
2 checks passed
@merveenoyan deleted the vllm_notebook branch October 7, 2025 09:07