build(evals): Upgrade vitest-evals to 0.9.0-beta.6#332

Merged
dcramer merged 7 commits into main from jr/upgrade-vitest-evals-beta4 on May 18, 2026

@sentry-junior sentry-junior Bot commented May 18, 2026

Upgrade Warden evals from vitest-evals 0.9.0-beta.3 to 0.9.0-beta.5 and add the matching @vitest-evals/github-reporter package. The eval workflow now publishes reporter-generated summaries and annotations, while keeping Warden’s aggregate baseline check as the gating signal.

Harness API Migration

The eval and verification harnesses now use createJudge(...), typed JudgeContext inputs, typed harness output, and the normalized session shape expected by the beta. The judge tests call .assess(...) against the updated context.

GitHub Reporter

The workflow now runs vitest-evals-github-report against eval-results.json after pnpm evals. This replaces the hand-written summary script and the old JUnit annotation action.

Manual Eval Trigger

The workflow no longer treats the generic evals label as a manual trigger. Maintainers should add run-evals to a same-repository PR when they want CI evals without touching eval files.
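The trigger rule described above can be sketched as a small predicate. This is only an illustration of the rule, not the workflow's actual condition: `isManualEvalTrigger` and `PullRequestLike` are hypothetical names, and the sketch assumes the standard `pull_request` webhook fields `labels[].name`, `head.repo.full_name`, and `base.repo.full_name`.

```typescript
// Minimal slice of a pull_request payload; only the fields this sketch reads.
interface PullRequestLike {
  labels: { name: string }[];
  head: { repo: { full_name: string } | null };
  base: { repo: { full_name: string } };
}

// Label named in the PR description. The generic "evals" label is
// deliberately not treated as a trigger.
const MANUAL_LABEL = "run-evals";

// Hypothetical helper: true only for a same-repository PR carrying the label.
function isManualEvalTrigger(pr: PullRequestLike): boolean {
  const sameRepo = pr.head.repo?.full_name === pr.base.repo.full_name;
  const labeled = pr.labels.some((label) => label.name === MANUAL_LABEL);
  return sameRepo && labeled;
}
```

Under these assumptions, a fork PR with the `run-evals` label and a same-repository PR with only the `evals` label would both be rejected.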

Adapt to breaking API changes in vitest-evals beta.4:

- namedJudge → createJudge (returns Judge object with .assess())
- JudgeContext.inputValue → JudgeContext.input (typed TInput)
- JudgeContext generic order: <TInput, TOutput, TMetadata, THarness>
- Harness generic order: <TInput, TOutput, TMetadata>
- Remove Harness.prompt (no longer part of the interface)
- Remove NormalizedSession.outputText (dropped upstream)
- Update tests: judge(ctx) → judge.assess(ctx), fix context shape
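The renames in this list can be pictured with a small, self-contained sketch. The types below are local stand-ins written for illustration, not the vitest-evals definitions. Only the names `createJudge`, `JudgeContext`, `input`, `.assess()`, and the generic order come from the list above; the factory arguments, the `output`, `metadata`, and `harness` fields, and the `JudgeResult` shape are assumptions.

```typescript
// Local stand-ins for illustration only; the real vitest-evals types may differ.
interface JudgeContext<TInput, TOutput, TMetadata = unknown, THarness = unknown> {
  input: TInput; // previously `inputValue`, per the list above
  output: TOutput; // assumed field name
  metadata?: TMetadata; // assumed field name
  harness?: THarness; // assumed field name
}

interface JudgeResult {
  score: number; // assumed result shape
}

interface Judge<TInput, TOutput> {
  name: string;
  assess(ctx: JudgeContext<TInput, TOutput>): Promise<JudgeResult>;
}

// Hypothetical factory: returns an object with .assess() instead of a bare
// callable, which is the shape change the migration describes.
function createJudge<TInput, TOutput>(
  name: string,
  fn: (ctx: JudgeContext<TInput, TOutput>) => JudgeResult | Promise<JudgeResult>,
): Judge<TInput, TOutput> {
  return { name, assess: async (ctx) => fn(ctx) };
}

// Toy judge: scores 1 when the output mentions the input text.
const containsInput = createJudge<string, string>("contains-input", (ctx) => ({
  score: ctx.output.includes(ctx.input) ? 1 : 0,
}));

// Before the migration a test would call `await judge(ctx)`;
// afterwards it calls `await judge.assess(ctx)`.
async function demo(): Promise<number> {
  const result = await containsInput.assess({
    input: "SQL injection",
    output: "Found SQL injection in query",
  });
  return result.score;
}
```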

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Prepare for replacing the manual CI reporting steps with the new
vitest-evals GitHub reporter CLI.

- Add @vitest-evals/github-reporter devDependency (pending npm publish)
- Drop JUnit reporter config from vitest.config.ts (no longer needed)

The workflow changes (.github/workflows/evals.yml) must be applied
separately due to GitHub App permissions. See PR description for the
diff.

Depends on: getsentry/vitest-evals#58

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2a6ca13.

Comment thread packages/evals/vitest.config.ts

Update the eval runner packages to 0.9.0-beta.5 so the PR uses the release that includes the GitHub reporter package.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

@dcramer changed the title from "chore(evals): upgrade vitest-evals to 0.9.0-beta.4" to "build(evals): Upgrade vitest-evals to 0.9.0-beta.5" on May 18, 2026

Use only the canonical run-evals label for manually triggering Warden evals on pull requests.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Replace the hand-written eval summary and JUnit annotation path with the vitest-evals GitHub reporter CLI.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

dcramer commented May 18, 2026

Going to get beta.6 out first with the new github action

@dcramer changed the title from "build(evals): Upgrade vitest-evals to 0.9.0-beta.5" to "build(evals): Upgrade vitest-evals to 0.9.0-beta.6" on May 18, 2026

Update the eval package to vitest-evals 0.9.0-beta.6 and use the native GitHub Action for CI report publishing.

Adjust usage metadata for the beta.6 usage summary contract and refresh eval docs to match the reporter path.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Include Vitest task locations in eval JSON so the beta.6 reporter can emit workflow annotations for failed evals.
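To show why a file and line are needed for annotations, here is a minimal sketch that turns one failed eval into a GitHub Actions `::error` workflow command. How the beta.6 reporter actually builds its annotations is not shown in this PR; the `FailedEval` shape, the helper names, and the example path are hypothetical, and only the `::error file=...,line=...,col=...::message` command format and its escaping rules are taken from GitHub's workflow command conventions.

```typescript
// Hypothetical record for a failed eval, carrying the task location.
interface FailedEval {
  file: string;
  line: number;
  column: number;
  message: string;
}

// Escape the message part of a workflow command.
function escapeData(value: string): string {
  return value.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values additionally escape ':' and ',' because those characters
// delimit the command's key=value pairs.
function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build an `::error` command that GitHub renders as an inline annotation.
function toErrorAnnotation(failure: FailedEval): string {
  const props = `file=${escapeProperty(failure.file)},line=${failure.line},col=${failure.column}`;
  return `::error ${props}::${escapeData(failure.message)}`;
}
```

Without a location in the JSON results, a command like this could only attach a message to the run as a whole, not to the eval's source line.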

Force JavaScript actions to run on Node 24 in the eval workflow to avoid the Node 20 runtime path.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>
@dcramer merged commit 9547a0e into main on May 18, 2026
20 checks passed
@dcramer deleted the jr/upgrade-vitest-evals-beta4 branch on May 18, 2026 at 21:33