GRPO Titan RL Trainer #161
Awesome results! @joecummings, would you be able to copy/paste the configs we have to change for DP/TP? Thanks.
Curious, what data did you test this with?
Everything is with the GSM8K dataset.
Great! I think there are areas we can improve on, but they are way out of scope for this PR; I'll note a few more of them down.
apps/grpo/main.py
Outdated
not for this PR, I would prefer this conversion to be a classmethod on Episode cc @Jack-Khuu
src/forge/actors/trainer.py
Outdated
keep this, the logging will work if you call super().__init__() in __post_init__()
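For anyone following along, here is a minimal sketch of the pattern being suggested. The Actor and Trainer classes below are hypothetical stand-ins, not the real forge classes; the sketch assumes only that the base class sets up a logger in its __init__ and that the subclass is a dataclass, whose generated __init__ does not call the parent's.

```python
import logging
from dataclasses import dataclass


class Actor:
    """Hypothetical stand-in for a base actor that owns a logger."""

    def __init__(self) -> None:
        self.logger = logging.getLogger(type(self).__name__)


@dataclass
class Trainer(Actor):
    """Hypothetical dataclass actor; the field name is illustrative only."""

    learning_rate: float = 1e-5

    def __post_init__(self) -> None:
        # The dataclass-generated __init__ never calls Actor.__init__,
        # so without this call self.logger would not exist.
        super().__init__()


trainer = Trainer()
trainer.logger.info("logger inherited from the base class")
```

If the base class already defines the logger this way, a second logger on the subclass would be redundant, which seems to be the point of the follow-up question.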
I'm confused tho - why do we need this if there's already a logger defined on the Actor?
src/forge/actors/replay_buffer.py
Outdated
not for this PR, but it seems really fragile for sample to be returning something defined by a function in the app, with no standard interface.
cc @Jack-Khuu - can we keep a note of this?
What does this PR do?
hf:// in the config allows users to specify a model from the Hugging Face Hub; the model is automatically downloaded if it does not exist locally, or found in the cache, and the trainer / reference model is pointed to that directory.
How was this PR tested?
This PR was tested for "runnability" using the following combinations of parallelisms on a single node.
In addition, I incorporated a unit test for the config hf:// specification.
FAQs
Is this blocked by anything?
YES: meta-pytorch/torchstore#32 cc @LucasLLC
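For context on the hf:// specification described under "What does this PR do?", here is a rough sketch of how such a resolver could work. This is not the PR's implementation: the helper names parse_hf_spec and resolve_model_path are hypothetical, and the sketch assumes the huggingface_hub package, whose snapshot_download reuses files already in the local cache and downloads whatever is missing.

```python
from typing import Optional

HF_PREFIX = "hf://"


def parse_hf_spec(spec: str) -> Optional[str]:
    """Return the Hub repo id for an hf:// spec, or None for a plain path."""
    if not spec.startswith(HF_PREFIX):
        return None
    repo_id = spec[len(HF_PREFIX):]
    if not repo_id:
        raise ValueError("hf:// spec is missing a repo id")
    return repo_id


def resolve_model_path(spec: str) -> str:
    """Map a config value to a local directory (hypothetical helper)."""
    repo_id = parse_hf_spec(spec)
    if repo_id is None:
        # Not an hf:// spec, so treat it as a local path and pass it through.
        return spec
    # Imported lazily so plain local paths work without the package installed.
    from huggingface_hub import snapshot_download

    # Returns the local snapshot directory, downloading only missing files.
    return snapshot_download(repo_id=repo_id)
```

The trainer and the reference model would then both be pointed at the directory returned by resolve_model_path.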