feat(judges): Add harness context to judge API by dcramer · Pull Request #46 · getsentry/vitest-evals

dcramer · 2026-05-03T21:37:09Z

Give every judge the same JudgeContext populated from the configured harness. LLM-backed judges now reuse the suite model seam through the required harness.prompt(...) method, while adapter-specific runtime objects stay scoped to app execution internals such as tools and events.

Single Judge API

Automatic judges, explicit toSatisfyJudge(...) calls, and built-in deterministic judges all receive the same normalized context. The harness-specific judge context and judge-facing runtime object are gone, so custom judges read run data, metadata, tool calls, and the configured harness from one place.
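As an illustration of "one place", here is a minimal sketch of a deterministic custom judge that reads tool calls and metadata from a single context object. The shapes (ToolCall, Harness, JudgeContext, JudgeResult) and the calledTool helper are hypothetical and only mirror the fields this description names. The real vitest-evals types and judge signature may differ.

```typescript
// Hypothetical shapes for illustration; the real vitest-evals types may differ.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface Harness {
  // The suite's model seam; LLM-backed judges would prompt through this.
  prompt(text: string): Promise<string>;
}

// One normalized context, shared by automatic, explicit, and built-in judges.
interface JudgeContext {
  input: string;
  output: string;
  metadata: Record<string, unknown>;
  toolCalls: ToolCall[];
  harness: Harness;
}

interface JudgeResult {
  score: number; // 1 = pass, 0 = fail in this sketch
  rationale: string;
}

// Hypothetical deterministic judge: everything it needs comes from the shared
// context, with no separate harness-specific or runtime object to consult.
function calledTool(toolName: string): (ctx: JudgeContext) => JudgeResult {
  return (ctx) => {
    const hit = ctx.toolCalls.some((call) => call.name === toolName);
    const caseId = String(ctx.metadata["caseId"] ?? "unknown");
    return {
      score: hit ? 1 : 0,
      rationale: hit
        ? `${toolName} was called`
        : `${toolName} was not called for case ${caseId}`,
    };
  };
}
```

Because the judge depends only on the context, the same function could in principle back an automatic judge or an explicit toSatisfyJudge(...) call. How it is registered with the library is not shown here.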

Implicit Matcher Context

Fixture-backed runs register their run, session, and output objects so matcher calls can infer input, metadata, tool calls, and harness without repetitive options. Exact registered objects win over the latest-run fallback, which keeps expect(result.output).toSatisfyJudge(...) concise without hardcoding one narrow case.
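The lookup rule (an exact registered object wins, otherwise use the latest run) can be sketched as an identity-keyed registry. This is a hypothetical model of the behavior described above, not the vitest-evals source. RunContext and RunContextRegistry are illustrative names only.

```typescript
// Hypothetical sketch of the lookup rule; not the vitest-evals implementation.
interface RunContext {
  input: string;
  metadata: Record<string, unknown>;
  toolCalls: string[];
}

class RunContextRegistry {
  // Exact objects (run, session, output) mapped to the run that produced them.
  private readonly byObject = new WeakMap<object, RunContext>();
  // Most recently registered run, used when no exact match exists.
  private latest: RunContext | undefined;

  register(ctx: RunContext, ...objects: object[]): void {
    for (const obj of objects) {
      this.byObject.set(obj, ctx);
    }
    this.latest = ctx;
  }

  // Identity match wins; anything else (including primitives) falls back
  // to the latest registered run.
  resolve(actual: unknown): RunContext | undefined {
    if (typeof actual === "object" && actual !== null) {
      const exact = this.byObject.get(actual);
      if (exact !== undefined) {
        return exact;
      }
    }
    return this.latest;
  }
}
```

With this rule, asserting on the output of an earlier run still resolves to that run's context even after a later run has registered. A WeakMap keys by identity and does not keep outputs alive after a test drops them. Primitive outputs such as plain strings cannot be WeakMap keys, so in this sketch they always take the fallback path.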

Required Harness Prompt

Harness.prompt is required across the root type and first-party harness constructors. Rubric and factuality judges call harness.prompt(...); harness.run(...) remains the explicit escape hatch for intentionally running the application again.
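Here is a minimal sketch of the split between prompt and run. A rubric-style judge grades through harness.prompt(...) and never re-executes the app. The Harness and JudgeContext shapes, the rubricJudge helper, and the PASS/FAIL reply convention are all assumptions made for illustration. They are not the first-party rubric judge.

```typescript
// Hypothetical shapes; assumptions for illustration only.
interface Harness {
  run(input: string): Promise<string>; // escape hatch: runs the application again
  prompt(text: string): Promise<string>; // required: the suite's model seam
}

interface JudgeContext {
  input: string;
  output: string;
  harness: Harness;
}

interface JudgeResult {
  score: number;
  rationale: string;
}

// Hypothetical rubric judge: grades via harness.prompt and never calls harness.run.
function rubricJudge(rubric: string): (ctx: JudgeContext) => Promise<JudgeResult> {
  return async (ctx) => {
    const reply = await ctx.harness.prompt(
      [
        `Rubric: ${rubric}`,
        `Input: ${ctx.input}`,
        `Output: ${ctx.output}`,
        "Reply with PASS or FAIL followed by a one-line reason.",
      ].join("\n"),
    );
    // Assumed reply convention: a leading PASS means the rubric was met.
    const pass = reply.trim().toUpperCase().startsWith("PASS");
    return { score: pass ? 1 : 0, rationale: reply.trim() };
  };
}
```

Because prompt is not optional in this sketch, a harness object literal that omits it fails to type-check. This mirrors the intent of making the capability mandatory rather than discovering its absence at judge time.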

API Policy

Add a small API design policy that captures this lesson: prefer one shared contextual API, keep owned capabilities mandatory, put each capability on the object that owns its configuration, and avoid parallel public objects with overlapping lifecycle names, such as harness and runtime.

Fixes #45

Pass configured harness context into automatic and explicit judge calls so rubric judges can reuse the suite prompt seam without duplicating provider setup. Register fixture run context for matcher assertions, including raw output and session objects, while keeping explicit matcher overrides available for manual values. Make harness prompt configuration required and keep judge prompting on context.harness.prompt(...) so the API does not split judge capabilities across harness and runtime objects.

Fixes GH-45

Co-Authored-By: OpenAI Codex <codex@openai.com>

dcramer force-pushed the codex/judge-harness-context branch 2 times, most recently from bc202a9 to 7897a6d, May 3, 2026 21:56

dcramer force-pushed the codex/judge-harness-context branch from 7897a6d to df19255, May 3, 2026 22:09

dcramer marked this pull request as ready for review May 3, 2026 22:22

dcramer merged commit 760ea18 into main May 3, 2026
8 checks passed

dcramer deleted the codex/judge-harness-context branch May 3, 2026 22:23

dcramer mentioned this pull request May 4, 2026

Add a first-party harness-backed RubricJudge #47

Closed

dcramer merged 1 commit into main from codex/judge-harness-context
