-
Notifications
You must be signed in to change notification settings - Fork 50
Add Structured Output Support to Policy #46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, otherwise looks good
| sampling_params = None | ||
| if guided_decoding: | ||
| # Add config for structured output | ||
| vllm_args = await policy_actor.get_vllm_args.choose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noob q: why are we running .choose here? It seems to me the more idiomatic thing to do here would be .call but maybe I'm missing the point (also I assume calling all actors is not a bottleneck here anyways as this just returns an instance variable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use choose when you just want one of the actors to do something. Basically like "if rank == 0: do"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding call also triggers the entire Mesh, while like @pbontrager mentioned choose and call_one are more direct (or in the case of no arg it just picks a random)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| sampling_params = None | ||
| if guided_decoding: | ||
| # Add config for structured output | ||
| vllm_args = await policy_actor.get_vllm_args.choose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use choose when you just want one of the actors to do something. Basically like "if rank == 0: do"
| print("Model running") | ||
|
|
||
| prompt = "Tell me a joke" | ||
| prompt = "What is 3+5?" if guided_decoding else "Tell me a joke" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you spin these things off as proper tests at some point? @ebsmothers is running something like this too big for our current test setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can throw them on the H100s and it should be fine once we get things more fleshed out
| router = await router_mesh.spawn("policy_router", PolicyRouter, policy=policy_actor) | ||
|
|
||
| sampling_params = None | ||
| if guided_decoding: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to think of how we want to do this from the config in the future. Maybe some sampling param builders that the user can call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kinda down to just lean into the vllm convention to start (Read from generation.json to populate, then let users override the kwargs)
This is probably something we ask for upstream
Add structured output (aka guided decoding) example to policy
python src/forge/actors/policy.py(With structured output toggled)