build(evals): Upgrade vitest-evals to 0.9.0-beta.6 #332
Merged
Adapt to breaking API changes in vitest-evals beta.4:

- namedJudge → createJudge (returns Judge object with .assess())
- JudgeContext.inputValue → JudgeContext.input (typed TInput)
- JudgeContext generic order: <TInput, TOutput, TMetadata, THarness>
- Harness generic order: <TInput, TOutput, TMetadata>
- Remove Harness.prompt (no longer part of the interface)
- Remove NormalizedSession.outputText (dropped upstream)
- Update tests: judge(ctx) → judge.assess(ctx), fix context shape

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
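To illustrate the call-shape change listed in this commit (a judge object with `.assess()` and a typed `input` field in place of `inputValue`), here is a minimal self-contained sketch. Everything in it is a local stand-in: `JudgeContextSketch`, `JudgeSketch`, `createJudgeSketch`, the `output` field, and the `{ score }` result are assumptions made for illustration, not the vitest-evals types or signatures. Only the generic order `<TInput, TOutput, TMetadata, THarness>`, the `input` field name, and the `.assess(ctx)` call come from the commit message. The sketch is kept synchronous for brevity.

```typescript
// Local stand-in types; NOT imported from vitest-evals.
// Generic order mirrors the commit message: <TInput, TOutput, TMetadata, THarness>.
interface JudgeContextSketch<TInput, TOutput, TMetadata = unknown, THarness = unknown> {
  input: TInput; // named `inputValue` before the upgrade, per the commit message
  output: TOutput; // assumed field name
  metadata?: TMetadata;
  harness?: THarness;
}

// Assumed result shape, for illustration only.
interface JudgeResultSketch {
  score: number;
}

interface JudgeSketch<TInput, TOutput, TMetadata = unknown, THarness = unknown> {
  name: string;
  assess(ctx: JudgeContextSketch<TInput, TOutput, TMetadata, THarness>): JudgeResultSketch;
}

// Hypothetical factory: wraps a scoring function in an object exposing .assess().
function createJudgeSketch<TInput, TOutput, TMetadata = unknown, THarness = unknown>(
  name: string,
  assess: (ctx: JudgeContextSketch<TInput, TOutput, TMetadata, THarness>) => JudgeResultSketch,
): JudgeSketch<TInput, TOutput, TMetadata, THarness> {
  return { name, assess };
}

// Hypothetical input type for an eval case.
type FindingInput = { prompt: string; expectedKeyword: string };

// A judge that scores 1 when the output mentions the expected keyword, else 0.
const keywordJudge = createJudgeSketch<FindingInput, string>("keyword-present", (ctx) => ({
  score: ctx.output.includes(ctx.input.expectedKeyword) ? 1 : 0,
}));

// Before the upgrade a test would call judge(ctx); afterwards it calls judge.assess(ctx).
const result = keywordJudge.assess({
  input: { prompt: "Review this diff", expectedKeyword: "SQL injection" },
  output: "Possible SQL injection in the query builder",
});
console.log(keywordJudge.name, result.score);
```

The point of the object form is that tests address the judge by method rather than calling it directly, so a test that previously passed a context with `inputValue` needs both the call site and the context shape updated.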
Prepare for replacing the manual CI reporting steps with the new vitest-evals GitHub reporter CLI.

- Add @vitest-evals/github-reporter devDependency (pending npm publish)
- Drop JUnit reporter config from vitest.config.ts (no longer needed)

The workflow changes (.github/workflows/evals.yml) must be applied separately due to GitHub App permissions. See PR description for the diff.

Depends on: getsentry/vitest-evals#58

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2a6ca13.
Update the eval runner packages to 0.9.0-beta.5 so the PR uses the release that includes the GitHub reporter package.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Use only the canonical run-evals label for manually triggering Warden evals on pull requests.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Replace the hand-written eval summary and JUnit annotation path with the vitest-evals GitHub reporter CLI.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>
Going to get beta.6 out first with the new GitHub Action.
Update the eval package to vitest-evals 0.9.0-beta.6 and use the native GitHub Action for CI report publishing. Adjust usage metadata for the beta.6 usage summary contract and refresh eval docs to match the reporter path.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Include Vitest task locations in eval JSON so the beta.6 reporter can emit workflow annotations for failed evals. Force JavaScript actions to run on Node 24 in the eval workflow to avoid the Node 20 runtime path.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

Upgrade Warden evals from `vitest-evals` `0.9.0-beta.3` to `0.9.0-beta.5` and add the matching `@vitest-evals/github-reporter` package. The eval workflow now publishes reporter-generated summaries and annotations, while keeping Warden’s aggregate baseline check as the gating signal.

Harness API Migration

The eval and verification harnesses now use `createJudge(...)`, typed `JudgeContext` inputs, typed harness output, and the normalized session shape expected by the beta. The judge tests call `.assess(...)` against the updated context.

GitHub Reporter

The workflow now runs `vitest-evals-github-report` against `eval-results.json` after `pnpm evals`. This replaces the hand-written summary script and the old JUnit annotation action.

Manual Eval Trigger

The workflow no longer treats the generic `evals` label as a manual trigger. Maintainers should add `run-evals` to a same-repository PR when they want CI evals without touching eval files.
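The manual trigger rule described above can be sketched as a small predicate. This is an illustration only, not the workflow's actual condition: the `PullRequestInfo` shape, the function name `isManualEvalTrigger`, and the example repository names are hypothetical. The two facts taken from the description are that only the `run-evals` label counts and that the PR must come from the same repository.

```typescript
// Hypothetical view of the pull request fields the rule depends on.
interface PullRequestInfo {
  labels: string[];
  headRepo: string; // repository the PR branch lives in
  baseRepo: string; // repository the PR targets
}

// Only the canonical label counts; the generic "evals" label no longer does.
const MANUAL_TRIGGER_LABEL = "run-evals";

// Returns true when a maintainer has manually requested CI evals on a same-repository PR.
function isManualEvalTrigger(pr: PullRequestInfo): boolean {
  const sameRepository = pr.headRepo === pr.baseRepo;
  return sameRepository && pr.labels.includes(MANUAL_TRIGGER_LABEL);
}

// Example repository names are placeholders.
const labeledSameRepo: PullRequestInfo = {
  labels: ["run-evals"],
  headRepo: "example/upstream",
  baseRepo: "example/upstream",
};
const genericLabel: PullRequestInfo = {
  labels: ["evals"],
  headRepo: "example/upstream",
  baseRepo: "example/upstream",
};
const labeledFork: PullRequestInfo = {
  labels: ["run-evals"],
  headRepo: "someone/fork",
  baseRepo: "example/upstream",
};

console.log(isManualEvalTrigger(labeledSameRepo)); // true
console.log(isManualEvalTrigger(genericLabel)); // false
console.log(isManualEvalTrigger(labeledFork)); // false
```

The sketch covers only the manual path; PRs that touch eval files are triggered separately and are not modeled here.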