@Jack-Khuu Jack-Khuu commented Sep 19, 2025

LLMJudges and RewardModels (LLMs fine-tuned for evals) can both be used as "Verifiers" or "Graders".

This PR creates a Judge class that helps manage the pre/post-processing they may require.

Judges take as input prompts + responses generated from a model and return the evaluated quality of those samples. The results can then be used to decide which responses to use (e.g. returned to the user, or as a training metric).
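The verifier contract described above can be sketched as follows; `Verifier` and `KeywordVerifier` are hypothetical illustrations of the (prompt, responses) -> scores shape, not this PR's actual API:

```python
from typing import Protocol


class Verifier(Protocol):
    """Anything that scores candidate responses for a prompt."""

    def evaluate(self, prompt: str, responses: list[str]) -> list[float]:
        ...


class KeywordVerifier:
    """Toy stand-in for an LLMJudge/RewardModel: scores by keyword overlap."""

    def __init__(self, keywords: set[str]):
        self.keywords = {k.lower() for k in keywords}

    def evaluate(self, prompt: str, responses: list[str]) -> list[float]:
        # Fraction of words in each response that hit a keyword.
        return [
            sum(w.lower() in self.keywords for w in r.split()) / max(len(r.split()), 1)
            for r in responses
        ]


verifier = KeywordVerifier({"Tokyo"})
scores = verifier.evaluate("What is the capital of Japan?", ["Aardvark", "Durian", "Tokyo"])
best = max(range(len(scores)), key=scores.__getitem__)  # index of the winning response
```

A real judge would call a fine-tuned model here; the point is only that downstream code consumes a per-response score list and picks from it.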


Testing in progress

Outdated PR: #167

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot), Sep 19, 2025
print(f"Responses: {responses}\n")

try:
async with policy.session():
Contributor: Why do you need this?

Jack-Khuu (author): We don't; will remove.


print("Spawning service...")
policy = await Policy.options(**cfg.services.policy).as_service(**cfg.policy)
evaluate = GenerativeJudge(
Contributor: Could you help me understand why we can't actor'ify the generative judge? The policy/service config could be passed to a GenerativeJudgeActor, and it could figure out the actor creation, session semantics (if that is needed), etc.

Contributor: I think Jack's intention here is that the generators can also be used as the judge (e.g. how it's done here).

But intuitively I feel like they should be kept separate. A judge-specific actor and a reward-model-specific actor make sense too, with all of the boilerplate kept with those implementations.

Jack-Khuu (author), Sep 23, 2025:

> Could you help me understand why we can't actor'ify the generative judge?
> generators can also be used as the judge

The idea behind taking in a hydrated generator was to make it easy to use an existing policy (or policy version) as a discriminator.

That said, we absolutely can make this an actor and push the setup inside. I wanted to avoid some of the boilerplate, but if we're fine with it, I'll send up a JudgeActor and a RewardModelActor.

Contributor:

> enable uses of an existing policy (or policy version) as a discriminator

Is this something that's common in the literature? If it's not, I'd rather wait and see until it's requested.

> I'll send up a JudgeActor, RewardModelActor

I think JudgeActor and RewardModelActor are reasonable. I also think it's reasonable if you want to do another PR first that renames Policy to VLLM / VLLMWorker, etc.

@allenwang28 left a comment:

Prompt: What is the capital of Japan?
Responses: ['Aardvark', 'Durian', 'Tokyo']

Generation Results:
================================================================================
Sample 1
Evaluation: 3
--------------------------------------------------------------------------------
Sample 2
Evaluation: 3
--------------------------------------------------------------------------------
Sample 3
Evaluation: 3
--------------------------------------------------------------------------------
Sample 4
Evaluation: 3
--------------------------------------------------------------------------------

lol is this working correctly?

Note: This is not a "good" prompt setup, it just demonstrates how to make one
"""

def _wrapper(prompt: str, responses: list[str]) -> str:
Contributor: Hmm, this is a good start, and I think we can improve it as well. IIUC, for LLMs as verifiers we have two tracks:

  1. Reward models
  2. LLM as a judge

What seems to differ is the prompt you input and whether you have to massage the outputs. Is there a way we can minimize the user code to focus on just that?

Jack-Khuu (author):

> Is there a way we can minimize the user code to focus on just that?

Agreed that the wrapper seems funky, since I wanted to test generic models. In practice a user would just pass a (str, list[str]) -> str to the constructor.

We could bake in "LLM as a judge" as the default, e.g. Judge(model_name, policy_config), and reduce the scope of the class.
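A rough sketch of that constructor shape; `GenerativeJudge`, `prompt_wrapper`, and `output_postprocessor` follow the names in this PR's discussion, but the generator here is a plain `str -> str` stub standing in for the real Policy service:

```python
from typing import Callable


class GenerativeJudge:
    """Sketch: wraps a generator with prompt building and output massaging."""

    def __init__(
        self,
        generate: Callable[[str], str],
        prompt_wrapper: Callable[[str, list[str]], str],
        output_postprocessor: Callable[[str], str] = str.strip,
    ):
        self.generate = generate
        self.prompt_wrapper = prompt_wrapper
        self.output_postprocessor = output_postprocessor

    def evaluate(self, prompt: str, responses: list[str]) -> str:
        judge_prompt = self.prompt_wrapper(prompt, responses)
        raw = self.generate(judge_prompt)
        return self.output_postprocessor(raw)


def simple_wrapper(prompt: str, responses: list[str]) -> str:
    # A default "LLM as a judge" prompt; not a tuned prompt, just the shape.
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(responses))
    return (
        "Pick the response that best answers the question; reply with its number only.\n"
        f"Question: {prompt}\nResponses:\n{numbered}"
    )


def stub_llm(judge_prompt: str) -> str:
    return " 3 "  # a real judge would generate this via the Policy service


judge = GenerativeJudge(stub_llm, simple_wrapper)
verdict = judge.evaluate("What is the capital of Japan?", ["Aardvark", "Durian", "Tokyo"])
```

User code then reduces to the two callables that actually differ between reward models and LLM-as-judge: the prompt wrapper and the output postprocessor.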



Jack-Khuu commented Sep 23, 2025:

> lol is this working correctly?

I wrote this prompt from the deep archives of my mind and I'm also shocked that the prompting worked.

@Jack-Khuu changed the title from "Creates GenerativeJudge as an interface for LLM Judges" to "[WIP] Creates GenerativeJudge as an interface for LLM Judges", Sep 24, 2025
cls,
prompt_wrapper=prompt_wrapper,
output_postprocessor=output_postprocessor,
generator=policy,
Jack-Khuu (author): Running into a pickling issue when passing ServiceInterfaces around:

  File "/home/jackkhuu/forge/src/forge/actors/generative_judge.py", line 51, in launch
    llm_judge = await judge_procs.spawn(
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/proc_mesh.py", line 254, in spawn
    return self._spawn_nonblocking(name, Class, *args, **kwargs)
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/proc_mesh.py", line 366, in _spawn_nonblocking
    return self._spawn_nonblocking_on(self._proc_mesh, name, Class, *args, **kwargs)
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/proc_mesh.py", line 386, in _spawn_nonblocking_on
    service = ActorMesh._create(
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/actor_mesh.py", line 1048, in _create
    send(ep, (mesh._class, proc_mesh, controller_controller, *args), kwargs)
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/actor_mesh.py", line 603, in send
    endpoint._send(args, kwargs, port, selection)
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/actor_mesh.py", line 465, in _send
    objects, bytes = flatten((args, kwargs), _is_ref_or_mailbox)
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/pickle.py", line 73, in flatten
    pickler.dump(obj)
  File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1303, in dump
    return super().dump(obj)
TypeError: cannot pickle '_asyncio.Future' object
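The failure is reproducible with any object that holds live asyncio state, independent of Monarch; a minimal illustration (the `is_picklable` helper and the config dict are made up for the sketch):

```python
import asyncio
import pickle


def is_picklable(obj) -> bool:
    """Check whether an object can be serialized for cross-process messaging."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


loop = asyncio.new_event_loop()
fut = loop.create_future()  # live event-loop state, like a hydrated service handle
plain_config = {"model": "judge", "procs": 1}  # plain data pickles fine

fut_ok = is_picklable(fut)            # False: "cannot pickle '_asyncio.Future' object"
config_ok = is_picklable(plain_config)  # True
loop.close()
```

This is why actor messages usually carry plain config (names, endpoints, counts) and let the receiving actor hydrate its own service handle, rather than shipping the handle itself.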

@Jack-Khuu changed the title from "[WIP] Creates GenerativeJudge as an interface for LLM Judges" to "[WIP] Creates Judges as a wrapper on Policy", Oct 4, 2025