
feat(eval): add verbose CLI reporting and CI failure annotations #107

Merged
thegovind merged 1 commit into main from improve-eval-reporting
Feb 7, 2026


@thegovind
Collaborator

Summary

  • tests/harness/runner.ts: Add detailed verbose output for scenario evaluations and ralph loop results — prints pattern checks, acceptance criteria matches, individual findings with severity/suggestion/code-snippet, and per-skill failure summaries in the CLI. Enrich the markdown report's Failed Scenarios section with severity labels, suggestions, and matched/incorrect acceptance-criteria sections.
  • .github/workflows/skill-evaluation.yml: Add a new step that emits ::error:: annotations for every failed scenario (surfacing top-3 errors inline in the Actions UI). Set artifact retention-days: 7 and if-no-files-found: warn on the results upload step.
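As a rough illustration of the verbose output described for tests/harness/runner.ts, the sketch below renders findings with severity, suggestion, and code snippet, plus a per-skill failure summary. The `Finding` shape and both function names are hypothetical; the actual types and layout in the runner may differ.

```typescript
// Hypothetical shape; the real types in tests/harness/runner.ts may differ.
type Severity = "error" | "warning" | "info";

interface Finding {
  severity: Severity;
  message: string;
  suggestion?: string;
  codeSnippet?: string;
}

// Render one finding as indented lines for verbose CLI output.
function formatFinding(f: Finding): string[] {
  const lines = [`  [${f.severity.toUpperCase()}] ${f.message}`];
  if (f.suggestion) lines.push(`    suggestion: ${f.suggestion}`);
  if (f.codeSnippet) {
    // Prefix each snippet row so it stands out from the surrounding text.
    for (const row of f.codeSnippet.split("\n")) lines.push(`    | ${row}`);
  }
  return lines;
}

// Summarise one skill: a header with counts, followed by every finding.
function formatSkillFailures(skill: string, findings: Finding[]): string {
  const errors = findings.filter((f) => f.severity === "error").length;
  const out = [`${skill}: ${findings.length} finding(s), ${errors} error(s)`];
  for (const f of findings) out.push(...formatFinding(f));
  return out.join("\n");
}
```

Returning strings rather than printing directly keeps the formatting testable; a caller could pass the result to `console.log` only when a verbose flag is set.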

The goal is to make evaluation failures visible directly in the CLI and in the GitHub Actions summary without needing to download artifacts.
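For readers unfamiliar with the annotation mechanism, here is a minimal sketch of how a line in GitHub's `::error` workflow-command format could be built for a failed scenario, capped at the top three errors. The workflow step in this PR lives in YAML and may be implemented quite differently; the `FailedScenario` shape, the function names, and the "(+N more)" suffix are assumptions made for illustration.

```typescript
// Hypothetical input shape, not taken from the PR.
interface FailedScenario {
  skill: string;
  scenario: string;
  errors: string[];
}

// Workflow-command messages must percent-encode %, CR and LF.
function escapeData(s: string): string {
  return s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values (such as title) additionally encode ':' and ','.
function escapeProperty(s: string): string {
  return escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build one annotation line carrying at most `maxErrors` error messages.
function toErrorAnnotation(f: FailedScenario, maxErrors = 3): string {
  const shown = f.errors.slice(0, maxErrors);
  const hidden = f.errors.length - shown.length;
  // The "(+N more)" suffix is an illustrative addition.
  const message = shown.join("\n") + (hidden > 0 ? `\n(+${hidden} more)` : "");
  const title = `${f.skill} / ${f.scenario} failed`;
  return `::error title=${escapeProperty(title)}::${escapeData(message)}`;
}
```

Writing the returned line to stdout inside an Actions step is what makes the annotation appear in the run UI; outside Actions it is just plain text.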

Testing

  • pnpm typecheck — passed locally.
  • Note: LSP diagnostics could not run because typescript-language-server is not installed in this environment.
  • No runtime tests were executed; changes are additive console output and workflow steps.

- Print scenario pattern checks, acceptance criteria, and findings in verbose mode
- Add per-skill failure summaries with error details
- Show ralph loop iteration score trails
- Enrich markdown report with severity labels, suggestions, and matched sections
- Add GitHub Actions error annotations for failed scenarios
- Set artifact retention to 7 days with if-no-files-found warning

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@thegovind thegovind merged commit af29b8b into main Feb 7, 2026
2 checks passed
@thegovind thegovind deleted the improve-eval-reporting branch February 7, 2026 00:42
