[Cleanup] [Policy] Ensure Policy is documented and matches vLLM #382
Conversation
  trace_headers=None,
  priority=priority,
- data_parallel_rank=None,
+ data_parallel_rank=None,  # We do not support DP
Is it possible someone tries to set DP through the vLLM args? In other words, is there any chance this fails silently, since it's a comment instead of an error?
Haha there's a lot of that b/c we choose not to have a shim layer between what we support and the span of config options offered through vLLM.
I can try to call out the egregious cases like this one though.
ok fair haha
I think this isn't as important for today; maybe we can cover ourselves by just documenting that this design choice broadly implies some things won't work / are experimental...
> we choose not to have a shim layer between what we support and the span of config options
This is the right move for simplifying things; I'd be shocked if we don't end up talking about adding a shim back by EOY haha
I think it'd be easy to add an assert in the post init
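A minimal sketch of what that could look like; `PolicyConfig`, `engine_args`, and `data_parallel_size` here are illustrative stand-ins, not the actual Policy config fields:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: fail fast on unsupported vLLM options instead of
# leaving only a comment. Field names are illustrative.
@dataclass
class PolicyConfig:
    engine_args: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject DP loudly rather than silently ignoring it.
        if self.engine_args.get("data_parallel_size", 1) != 1:
            raise ValueError(
                "Policy does not support vLLM data parallelism; "
                "unset data_parallel_size in the engine args."
            )
```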
If grpo.main works, ship it
Co-authored-by: Jack-Khuu <jack.khuu.7@gmail.com>
Left a number of context and args comments, but they should all be relatively small.
      TORCHSTORE_USE_RDMA.get_value() == 0
  )  # torchstore currently only accepts 0 or 1
- # Gets set up by setup
+ # Remaining variables are initialized in self.setup()
This shouldn't be necessary. Everything up here is a config-like object: dataclasses, direct values, or callables. You shouldn't have to call out that variables are created in setup, since that's the pattern everywhere. Also, I'd be fine with removing lora_request if we don't support it now.
Yeah fair, just didn't know if you had left this in here for readability at a glance.
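For reference, a minimal sketch of the pattern being described; the class and field names (`Worker`, `model`, `use_rdma`) are hypothetical, not the actual PolicyWorker definition:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    # Config-like fields only: direct values, dataclasses, or callables.
    model: str = "Qwen/Qwen3-1.7B"
    use_rdma: bool = False

    def setup(self):
        # Runtime state is created here; under this convention nothing
        # above needs a "gets set up by setup" comment.
        self.engine = object()  # stand-in for the real engine handle
```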
  prompt_str, request = self.processor.process_inputs(
      request_id=request_id,
-     prompt=prompt_dict,
+     prompt={"prompt": prompt},
This is probably fine, but it takes us a bit further from vLLM.
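For context, vLLM accepts either a plain string or a `TextPrompt`-style dict at its input layer, so the dict form stays within vLLM's documented prompt types; a small sketch, assuming a recent vLLM:

```python
from vllm.inputs import TextPrompt

# Both forms are valid vLLM prompt inputs; the dict mirrors the
# TextPrompt TypedDict that vLLM defines.
as_str = "What is 2 + 2?"
as_dict: TextPrompt = {"prompt": "What is 2 + 2?"}
```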
@dataclass
class PolicyWorker(ForgeActor):
    """Mirrors a vLLM GPUWorker
We could make this match the parent worker so that we can just pipe the args/kwargs through in launch and then build the vLLM config here as well.
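A rough sketch of that shape, using vLLM's `AsyncEngineArgs`; the `launch` classmethod and its wiring here are hypothetical, not the existing ForgeActor API:

```python
from vllm.engine.arg_utils import AsyncEngineArgs

# Hypothetical sketch: forward args/kwargs untouched through launch and
# build the vLLM config inside the worker, mirroring the parent worker.
class PolicyWorker:
    @classmethod
    def launch(cls, *args, **kwargs):
        return cls(*args, **kwargs)

    def __init__(self, *args, **kwargs):
        # EngineArgs -> engine config happens here, not in the parent.
        self.vllm_config = AsyncEngineArgs(*args, **kwargs).create_engine_config()
```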
Context
(See specific changes & comments in the PR files themselves)
How do you know I didn't break things?
python tests/integration_tests/test_vllm_policy_correctness.py
Results:
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
Results: https://wandb.ai/jcummings/grpo-training/runs/kt1m006u?nw=nwuserjcummings