build(evals): Upgrade vitest-evals to 0.9.0-beta.6 by sentry-junior[bot] · Pull Request #332 · getsentry/warden

sentry-junior · 2026-05-18T05:27:14Z

Upgrade Warden evals from vitest-evals 0.9.0-beta.3 to 0.9.0-beta.5 and add the matching @vitest-evals/github-reporter package. The eval workflow now publishes reporter-generated summaries and annotations, while keeping Warden’s aggregate baseline check as the gating signal.

Harness API Migration

The eval and verification harnesses now use createJudge(...), typed JudgeContext inputs, typed harness output, and the normalized session shape expected by the beta. The judge tests call .assess(...) against the updated context.
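The shape of this migration can be sketched without the library itself. The types below are local stand-ins that only mirror the names used in this PR (`createJudge`, `JudgeContext`, `.assess(...)`, and the `inputValue` → `input` rename from the commit message); they are not the vitest-evals exports, and the real factory may take different arguments and is likely asynchronous. The `FindingInput` type and the `mentions-finding` judge are hypothetical.

```typescript
// Local stand-ins only: these mirror the shapes named in this PR,
// not the actual vitest-evals exports, whose signatures may differ.
interface JudgeContext<TInput, TOutput, TMetadata = unknown, THarness = unknown> {
  input: TInput; // previously `inputValue`, per the commit message
  output: TOutput;
  metadata?: TMetadata;
  harness?: THarness;
}

interface JudgeResult {
  score: number;
  rationale?: string;
}

interface Judge<TInput, TOutput> {
  judgeName: string;
  assess(ctx: JudgeContext<TInput, TOutput>): JudgeResult;
}

// Hypothetical factory sharing its name with the beta API. Kept synchronous
// here for brevity; the real createJudge may differ.
function createJudge<TInput, TOutput>(
  judgeName: string,
  assess: (ctx: JudgeContext<TInput, TOutput>) => JudgeResult,
): Judge<TInput, TOutput> {
  return { judgeName, assess };
}

// Hypothetical input type for an illustrative judge.
interface FindingInput {
  expectedFinding: string;
}

// Scores 1 when the output text mentions the expected finding, else 0.
const mentionsFinding = createJudge<FindingInput, string>(
  "mentions-finding",
  (ctx) => {
    const hit = ctx.output.includes(ctx.input.expectedFinding);
    return { score: hit ? 1 : 0, rationale: hit ? "found" : "missing" };
  },
);

// Tests call the method on the judge object instead of invoking it directly.
const judgeResult = mentionsFinding.assess({
  input: { expectedFinding: "SQL injection" },
  output: "Possible SQL injection in query builder",
});
console.log(judgeResult.score); // 1
```

Returning an object with an `assess` method, rather than a bare function, is what lets the tests move from `judge(ctx)` to `judge.assess(ctx)`.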

GitHub Reporter

The workflow now runs vitest-evals-github-report against eval-results.json after pnpm evals. This replaces the hand-written summary script and the old JUnit annotation action.

Manual Eval Trigger

The workflow no longer treats the generic evals label as a manual trigger. Maintainers should add run-evals to a same-repository PR when they want CI evals without touching eval files.
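As a rough restatement of that rule, the trigger decision might look like the function below. This is only a sketch: the real condition lives in the workflow file, the `PullRequestInfo` fields are hypothetical, and treating "eval files" as paths under `packages/evals/` is an assumption.

```typescript
// Hypothetical restatement of the trigger rule. The real condition lives in
// the workflow file, and these field names are illustrative only.
interface PullRequestInfo {
  labels: string[];
  headRepo: string; // e.g. "getsentry/warden"
  baseRepo: string;
  changedFiles: string[];
}

const MANUAL_LABEL = "run-evals";
// Assumption: "eval files" means paths under packages/evals/.
const EVAL_PATH_PREFIX = "packages/evals/";

function shouldRunEvals(pr: PullRequestInfo): boolean {
  // Evals run when the PR touches eval files...
  const touchesEvalFiles = pr.changedFiles.some((file) =>
    file.startsWith(EVAL_PATH_PREFIX),
  );
  // ...or when a same-repository PR carries the canonical label.
  const sameRepo = pr.headRepo === pr.baseRepo;
  const manuallyRequested = sameRepo && pr.labels.includes(MANUAL_LABEL);
  return touchesEvalFiles || manuallyRequested;
}
```

Under this sketch, a PR labeled only `evals` that changes no eval files does not run evals, and neither does a fork PR labeled `run-evals`.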

Adapt to breaking API changes in vitest-evals beta.4:

- namedJudge → createJudge (returns Judge object with .assess())
- JudgeContext.inputValue → JudgeContext.input (typed TInput)
- JudgeContext generic order: <TInput, TOutput, TMetadata, THarness>
- Harness generic order: <TInput, TOutput, TMetadata>
- Remove Harness.prompt (no longer part of the interface)
- Remove NormalizedSession.outputText (dropped upstream)
- Update tests: judge(ctx) → judge.assess(ctx), fix context shape

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Prepare for replacing the manual CI reporting steps with the new vitest-evals GitHub reporter CLI.

- Add @vitest-evals/github-reporter devDependency (pending npm publish)
- Drop JUnit reporter config from vitest.config.ts (no longer needed)

The workflow changes (.github/workflows/evals.yml) must be applied separately due to GitHub App permissions. See PR description for the diff.

Depends on: getsentry/vitest-evals#58

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2a6ca13.

Update the eval runner packages to 0.9.0-beta.5 so the PR uses the release that includes the GitHub reporter package. Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Use only the canonical run-evals label for manually triggering Warden evals on pull requests. Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Replace the hand-written eval summary and JUnit annotation path with the vitest-evals GitHub reporter CLI. Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

dcramer · 2026-05-18T17:42:12Z

Going to get beta.6 out first with the new GitHub Action.

Update the eval package to vitest-evals 0.9.0-beta.6 and use the native GitHub Action for CI report publishing. Adjust usage metadata for the beta.6 usage summary contract and refresh eval docs to match the reporter path. Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Include Vitest task locations in eval JSON so the beta.6 reporter can emit workflow annotations for failed evals. Force JavaScript actions to run on Node 24 in the eval workflow to avoid the Node 20 runtime path. Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>
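To show why task locations matter for annotations, here is a minimal sketch that turns failed entries into GitHub workflow `::error` commands. It is not the beta.6 reporter's implementation: the entry shape is an assumption modeled on Vitest's JSON reporter output, and `toAnnotations` is a hypothetical helper. Without a `location`, an annotation can only point at the file as a whole.

```typescript
// Assumed entry shape, modeled on Vitest's JSON reporter output. The fields
// the beta.6 reporter actually reads are not shown in this PR.
interface EvalAssertion {
  fullName: string;
  status: string; // "passed", "failed", ...
  failureMessages: string[];
  location?: { line: number; column: number };
}

interface EvalFileResult {
  name: string; // path of the eval file
  assertionResults: EvalAssertion[];
}

// Escaping rules for GitHub workflow command messages and property values.
function escapeData(text: string): string {
  return text.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

function escapeProperty(text: string): string {
  return escapeData(text).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Hypothetical helper: one `::error` command per failed eval.
function toAnnotations(files: EvalFileResult[]): string[] {
  const commands: string[] = [];
  for (const file of files) {
    for (const assertion of file.assertionResults) {
      if (assertion.status !== "failed") continue;
      const props = [`file=${escapeProperty(file.name)}`];
      // The line and column come from the task location, when present.
      if (assertion.location) {
        props.push(`line=${assertion.location.line}`, `col=${assertion.location.column}`);
      }
      props.push(`title=${escapeProperty(assertion.fullName)}`);
      const message = assertion.failureMessages.join("\n") || "Eval failed";
      commands.push(`::error ${props.join(",")}::${escapeData(message)}`);
    }
  }
  return commands;
}
```

Printing each returned string to stdout inside a workflow step is what surfaces it as an annotation on the run.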

vercel Bot deployed to Preview May 18, 2026 05:27

vercel Bot had a problem deploying to Preview May 18, 2026 05:46 Failure

cursor Bot reviewed May 18, 2026

Comment thread packages/evals/vitest.config.ts

build(evals): Bump vitest-evals to beta.5

7e5568a

vercel Bot deployed to Preview May 18, 2026 15:36

dcramer changed the title ~~chore(evals): upgrade vitest-evals to 0.9.0-beta.4~~ build(evals): Upgrade vitest-evals to 0.9.0-beta.5 May 18, 2026

dcramer added the run-evals label May 18, 2026

ci(evals): Require run-evals label

9aa6bff

vercel Bot deployed to Preview May 18, 2026 15:41

ci(evals): Use vitest-evals GitHub reporter

0c81af1

vercel Bot deployed to Preview May 18, 2026 15:51

dcramer changed the title ~~build(evals): Upgrade vitest-evals to 0.9.0-beta.5~~ build(evals): Upgrade vitest-evals to 0.9.0-beta.6 May 18, 2026

vercel Bot deployed to Preview May 18, 2026 20:57

vercel Bot deployed to Preview May 18, 2026 21:12

dcramer merged commit 9547a0e into main May 18, 2026
20 checks passed

dcramer deleted the jr/upgrade-vitest-evals-beta4 branch May 18, 2026 21:33

dcramer merged 7 commits into main from jr/upgrade-vitest-evals-beta4
