@Jack-Khuu (Contributor)

Adds a structured output (a.k.a. guided decoding) example to the policy.

Output of python src/forge/actors/policy.py with structured output toggled:

User: What is 3+5?
Assistant: Positive
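
The PR text does not show which constraint produces "Positive", so the following is only a toy sketch of one common form of guided decoding: restricting the answer to a fixed set of choices. The scores, the `guided_choice` helper, and the choice set {"Positive", "Negative"} are all assumptions made for illustration; none of them come from the PR or from vLLM.

```python
import math


def guided_choice(scores, allowed):
    """Pick the best-scoring candidate after masking out disallowed ones."""
    masked = {
        text: (score if text in allowed else -math.inf)
        for text, score in scores.items()
    }
    return max(masked, key=masked.get)


# Hypothetical model scores for the prompt "What is 3+5?".
scores = {"8": 5.0, "Positive": 0.3, "Negative": 0.1}

# Without a constraint, the highest-scoring answer wins.
unconstrained = max(scores, key=scores.get)

# With a choice constraint, only the allowed answers compete.
constrained = guided_choice(scores, {"Positive", "Negative"})
```

Under these made-up scores the unconstrained pick is "8", while the constrained pick has to come from the allowed set, which would explain an answer like "Positive" to an arithmetic question.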

@meta-cla (bot) added the CLA Signed label Aug 13, 2025
@ebsmothers (Contributor) left a comment

One question, otherwise looks good

sampling_params = None
if guided_decoding:
    # Add config for structured output
    vllm_args = await policy_actor.get_vllm_args.choose()
Contributor

noob q: why are we running .choose here? It seems to me the more idiomatic thing to do here would be .call but maybe I'm missing the point (also I assume calling all actors is not a bottleneck here anyways as this just returns an instance variable)

Contributor

You use choose when you just want one of the actors to do something. Basically like "if rank == 0: do"

Contributor Author

From my understanding call also triggers the entire Mesh, while like @pbontrager mentioned choose and call_one are more direct (or in the case of no arg it just picks a random)
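
The distinction discussed in this thread can be sketched with a toy mesh. This is not the real actor framework: `ToyEndpoint` and `ToyPolicy` are hypothetical stand-ins written only to illustrate the semantics described above, where `call` fans out to every actor and `choose` routes to a single, arbitrarily picked one.

```python
import asyncio
import random


class ToyEndpoint:
    """Hypothetical stand-in for an actor-mesh endpoint (illustration only)."""

    def __init__(self, actors, method_name):
        self._actors = actors
        self._method_name = method_name

    async def call(self, *args):
        # Fan out to every actor in the mesh and gather all results.
        methods = [getattr(actor, self._method_name) for actor in self._actors]
        return await asyncio.gather(*(method(*args) for method in methods))

    async def choose(self, *args):
        # Route to one arbitrarily picked actor and return only its result.
        actor = random.choice(self._actors)
        return await getattr(actor, self._method_name)(*args)


class ToyPolicy:
    """Hypothetical actor that just returns an instance variable."""

    def __init__(self, rank):
        self.rank = rank
        self.calls = 0
        self.vllm_args = {"model": "toy-model"}

    async def get_vllm_args(self):
        self.calls += 1
        return self.vllm_args


async def demo():
    actors = [ToyPolicy(rank) for rank in range(4)]
    endpoint = ToyEndpoint(actors, "get_vllm_args")

    one = await endpoint.choose()  # exactly one actor does the work
    touched_by_choose = sum(actor.calls for actor in actors)

    everyone = await endpoint.call()  # all four actors do the work
    touched_total = sum(actor.calls for actor in actors)
    return one, touched_by_choose, len(everyone), touched_total


result = asyncio.run(demo())
```

In this sketch `choose` returns a single value and touches one actor, while `call` returns one value per actor. When every actor would return the same instance variable, asking just one of them is enough.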

@pbontrager (Contributor) left a comment

Thanks!

  print("Model running")

- prompt = "Tell me a joke"
+ prompt = "What is 3+5?" if guided_decoding else "Tell me a joke"
Contributor

Can you spin these things off as proper tests at some point? @ebsmothers is running something like this too big for our current test setup?

Contributor Author

We can throw them on the H100s and it should be fine once we get things more fleshed out

router = await router_mesh.spawn("policy_router", PolicyRouter, policy=policy_actor)

sampling_params = None
if guided_decoding:
Contributor

We need to think of how we want to do this from the config in the future. Maybe some sampling param builders that the user can call?

Contributor Author

I'm kinda down to just lean into the vllm convention to start (Read from generation.json to populate, then let users override the kwargs)

This is probably something we ask for upstream
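
A minimal sketch of the "read defaults from the model's generation config, then let users override the kwargs" idea from this thread. Everything here is hypothetical: `ToySamplingParams` and `build_sampling_params` are illustrative names, not forge or vLLM APIs, and the config is passed as a JSON string rather than read from a file (Hugging Face models usually ship this file as generation_config.json, as far as I know).

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ToySamplingParams:
    """Hypothetical, trimmed-down stand-in for a sampling-params object."""

    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 16
    guided_choice: Optional[List[str]] = None


def build_sampling_params(generation_config_json, **overrides):
    """Start from the config's defaults, then apply user overrides."""
    known = {f.name for f in fields(ToySamplingParams)}

    # Keep only the config keys this toy params object understands.
    config = json.loads(generation_config_json)
    values = {key: value for key, value in config.items() if key in known}

    # Reject misspelled or unsupported overrides instead of ignoring them.
    unknown = set(overrides) - known
    if unknown:
        raise TypeError(f"unknown sampling params: {sorted(unknown)}")

    values.update(overrides)
    return ToySamplingParams(**values)


# Example config contents (made up for illustration).
config_text = '{"temperature": 0.6, "top_p": 0.9, "do_sample": true}'
params = build_sampling_params(
    config_text,
    max_tokens=64,
    guided_choice=["Positive", "Negative"],
)
```

Here the temperature and top_p come from the config, `do_sample` is dropped because the toy object has no such field, and the caller's `max_tokens` and `guided_choice` take precedence over the defaults.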

@Jack-Khuu merged commit d7f0c35 into main Aug 13, 2025
1 check passed
@Jack-Khuu deleted the guided-decoding branch August 13, 2025 18:13
photomz pushed a commit to photomz/forge that referenced this pull request Oct 25, 2025