[WIP] Creates Judges as a wrapper on Policy #202
Conversation
apps/vllm/judge.py
Outdated
```python
print(f"Responses: {responses}\n")

try:
    async with policy.session():
```
why do you need this?
We don't, will remove
apps/vllm/judge.py
Outdated
```python
print("Spawning service...")
policy = await Policy.options(**cfg.services.policy).as_service(**cfg.policy)
evaluate = GenerativeJudge(
```
Could you help me understand why we can't actor'ify the generative judge? The policy/service config can be passed to the GenerativeJudgeActor, and it can figure out the actor creation, session semantics (if that is needed), etc.
I think Jack's intention here is that the generators can also be used as the judge (e.g. how it's done here), but intuitively I feel like they should be kept separate. A judge-specific actor and a reward-model-specific actor make sense too, with all of the boilerplate kept with those implementations.
> Could you help me understand why we can't actor'ify the generative judge?

> generators can also be used as the judge
The idea behind taking in a hydrated generator was to make it easy to use an existing policy (or policy version) as a discriminator.

That said, we absolutely can make this an actor and push the setup inside. I wanted to avoid some of the boilerplate, but if we're fine with it, I'll send up a JudgeActor and RewardModelActor.
> enable uses of an existing policy (or policy version) as a discriminator.

Is this something that's common in the literature? If it's not, I'd want to wait and see until it's requested.
> I'll send up a JudgeActor, RewardModelActor

I think `JudgeActor` and `RewardModelActor` are reasonable. I also think it's reasonable to do another PR before this one that renames Policy to VLLM / VLLMWorker etc.
```
Prompt: What is the capital of Japan?
Responses: ['Aardvark', 'Durian', 'Tokyo']

Generation Results:
================================================================================
Sample 1
Evaluation: 3
--------------------------------------------------------------------------------
Sample 2
Evaluation: 3
--------------------------------------------------------------------------------
Sample 3
Evaluation: 3
--------------------------------------------------------------------------------
Sample 4
Evaluation: 3
--------------------------------------------------------------------------------
```
lol is this working correctly?
apps/vllm/judge.py
Outdated
```python
Note: This is not a "good" prompt setup, it just demonstrates how to make one
"""

def _wrapper(prompt: str, responses: list[str]) -> str:
```
Hmm, this is a good start and I think we can improve it as well. IIUC, for LLMs as verifiers we have two tracks:
- Reward models
- LLM as a judge

What seems to differ is the prompt you input and whether or not you have to massage the outputs. Is there a way we can minimize the user code to focus on just that?
> Is there a way we can minimize the user code to focus on just that?

Agreed that the wrapper seems funky; I wanted to test generic models. In practice a user would just pass a `(str, list[str]) -> str` callable to the constructor.

We can bake in a default of "LLM as a judge", e.g. `Judge(model_name, policy_config)`, and reduce the scope of the class?
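As a concrete sketch of that `(str, list[str]) -> str` contract, a default LLM-as-a-judge prompt wrapper could look like the following. The function name and prompt wording are mine, not from the PR:

```python
def llm_judge_prompt(prompt: str, responses: list[str]) -> str:
    """Format a judging prompt asking the model to pick the best response.

    This is only an illustrative default; users can pass any callable
    with the same (str, list[str]) -> str signature.
    """
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(responses))
    return (
        f"Question: {prompt}\n"
        f"Candidate answers:\n{numbered}\n"
        "Reply with only the number of the best answer."
    )
```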
apps/vllm/judge.py
Outdated
```python
print("Spawning service...")
policy = await Policy.options(**cfg.services.policy).as_service(**cfg.policy)
evaluate = GenerativeJudge(
```
> I think Jack's intention here is that the generators can also be used as the judge (e.g. how it's done here), but intuitively I feel like they should be kept separate. A judge-specific actor and a reward-model-specific actor make sense too, with all of the boilerplate kept with those implementations.

I wrote this prompt from the deep archives of my mind and I'm also shocked that the prompting worked.
src/forge/actors/generative_judge.py
Outdated
```python
cls,
prompt_wrapper=prompt_wrapper,
output_postprocessor=output_postprocessor,
generator=policy,
```
Running into a pickling issue when passing around ServiceInterfaces:
```
File "/home/jackkhuu/forge/src/forge/actors/generative_judge.py", line 51, in launch
    llm_judge = await judge_procs.spawn(
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/proc_mesh.py", line 254, in spawn
    return self._spawn_nonblocking(name, Class, *args, **kwargs)
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/proc_mesh.py", line 366, in _spawn_nonblocking
    return self._spawn_nonblocking_on(self._proc_mesh, name, Class, *args, **kwargs)
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/proc_mesh.py", line 386, in _spawn_nonblocking_on
    service = ActorMesh._create(
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/actor_mesh.py", line 1048, in _create
    send(ep, (mesh._class, proc_mesh, controller_controller, *args), kwargs)
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/actor_mesh.py", line 603, in send
    endpoint._send(args, kwargs, port, selection)
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/actor_mesh.py", line 465, in _send
    objects, bytes = flatten((args, kwargs), _is_ref_or_mailbox)
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/monarch/_src/actor/pickle.py", line 73, in flatten
    pickler.dump(obj)
File "/home/jackkhuu/.fbpkg_conda_envs/forge-a7401c7/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1303, in dump
    return super().dump(obj)
TypeError: cannot pickle '_asyncio.Future' object
```
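The root cause is reproducible in isolation: a live `asyncio` future (which a hydrated service handle can hold internally) cannot cross a pickle boundary, which is why passing plain config instead of the hydrated handle sidesteps the error. A minimal repro, independent of Forge or Monarch:

```python
import asyncio
import pickle


def can_pickle_future() -> bool:
    """Try to pickle a live asyncio Future; return whether it succeeded."""
    loop = asyncio.new_event_loop()
    try:
        fut = loop.create_future()
        try:
            # Raises TypeError: cannot pickle '_asyncio.Future' object
            pickle.dumps(fut)
            return True
        except TypeError:
            return False
    finally:
        loop.close()


print(can_pickle_future())  # False
```

Anything spawned on a remote proc mesh therefore needs its arguments to be plain data (configs, strings, dicts), with live handles reconstructed on the receiving side.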
LLM Judges and Reward Models (LLMs fine-tuned for evals) can both be used as "Verifiers" or "Graders".

This PR creates a `Judge` class which helps manage the pre/post processing that may be required. Judges take as input (prompts + responses) generated from a model, and return the evaluated quality of those samples. Results can then be used to make decisions on which responses to utilize (e.g. to-user or as a training metric).
Testing in progress
Outdated PR: #167