Add Efficient Online Training with GRPO and vLLM in TRL recipe
#334
|
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
@qgallouedec, in case you want to take a look. I still need to run the full training to get the final results, but the key takeaways are already visible. |
|
Recipe ready for review, now with training results added 😃 |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:45Z I think I'd like to frame this as follows sequentially:
(reverse the order) |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:46Z do we need to state differences between PPO and GRPO in this notebook? imo let's only define GRPO to keep the focus in vLLM + online methods. it would be confusing otherwise. if people want to learn more about it they can check a guide that goes through the differences |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:47Z nice! |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:47Z small newline between import and the function is more readable (same goes above) |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:48Z actually would be cool for people to push their trackio logs to Hub and see it there instead of in notebook. also helps with growth there |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:49Z some logs are unnecessary, any way we can change verbosity here? |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:50Z we could let them push to Hub automatically and let them check there |
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-06T10:22:50Z super cool! |
very nice! feel free to re-ping me! 🫡
|
View / edit / reply to this conversation on ReviewNB merveenoyan commented on 2025-10-07T08:53:12Z nice! |
let's go!
What does this PR do?
Add the "Efficient Online Training with GRPO and vLLM in TRL" recipe to showcase online training possibilities in TRL. This recipe is a modification of "Post training an LLM for reasoning with GRPO in TRL", and I aim to include it in the vLLM docs here
Who can review?
Feel free to tag members/contributors who may be interested in your PR.