fix: mock engine echoes file content for CI evals (#227) #228

Merged
github-actions[bot] merged 3 commits into main from squad/227-fix-mock-eval-ci on Apr 28, 2026

@spboyer (Member) commented on Apr 28, 2026

Closes #227

Summary

The "Run Waza Evaluation" CI job runs examples/code-explainer/eval.yaml with the mock engine. Previously the mock returned only "Mock response for: <prompt>" plus a file count, so _output_contains expectations that check for file content (e.g., "async", "fetch", "recursive") failed every time: 0/4 tasks passed.
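To see why every task failed, here is a minimal Go sketch of a substring check in the spirit of `_output_contains`. The function `outputContains`, the prompt text, and the sample file content are hypothetical illustrations, not waza's grader code. Only the "Mock response for:" prefix and the `fetch_user.js` file name come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// outputContains is a hypothetical stand-in for an _output_contains-style
// check: it passes only if every expected substring appears in the output.
func outputContains(output string, expected []string) bool {
	for _, want := range expected {
		if !strings.Contains(output, want) {
			return false
		}
	}
	return true
}

func main() {
	expected := []string{"async", "fetch"}

	// Old mock shape: the prompt echo plus a file count, with no file content.
	oldOutput := "Mock response for: Explain this code\nFiles: 1"

	// New mock shape (illustrative only): also echoes a preview of the file.
	newOutput := "Mock response for: Explain this code\n" +
		"File: fetch_user.js\n" +
		"async function getUser(id) { return fetch(`/users/${id}`); }"

	fmt.Println("old passes:", outputContains(oldOutput, expected))
	fmt.Println("new passes:", outputContains(newOutput, expected))
}
```

Because the old response never contained any file text, no content-based expectation could match, regardless of which file the task pointed at.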

What Changed

  • internal/execution/engine.go — Added TaskName and TaskDescription fields to ExecutionRequest so task metadata flows to engines.
  • internal/execution/mock.go — Mock response now echoes: task name/description, context metadata key/value pairs, file paths, and up to 1KB of file content per resource.
  • internal/orchestration/runner.go — Populates the new TaskName/TaskDescription fields from TestCase.
  • internal/execution/mock_engine_test.go — Updated the existing test for the new format; added TestMockEngine_Execute_IncludesResourceContent and TestMockEngine_Execute_TruncatesLargeContent.

Testing

  • All unit tests pass (go test ./...)
  • go vet ./... clean
  • examples/code-explainer/eval.yaml now passes 4/4 tasks (was 0/4)
  • No changes to example files; the fix is entirely in the engine code, not in the eval definitions

Copilot AI added 2 commits April 28, 2026 15:12
…work in CI

The waza-eval.yml CI job runs examples/code-explainer/eval.yaml with the mock
engine. The mock previously returned only "Mock response for: <prompt>" + a
file count, so realistic _output_contains expectations against file contents
(e.g., "async", "fetch" for fetch_user.js) failed every time.

Now the mock includes task metadata (name, description), context values, file
paths, and a 1KB content preview per resource. This lets evals validate the
full pipeline (discovery → execution → grading) in CI without requiring a
real model.

Closes #227

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions[bot] enabled auto-merge (squash) on April 28, 2026 19:14
The mock engine, runner, and graders all directly affect eval execution.
Without this, fixes like #227 wouldn't run the eval workflow on the PR
that introduced them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions[bot] merged commit dfec036 into main on Apr 28, 2026
7 checks passed
github-actions[bot] pushed a commit that referenced this pull request on Apr 28, 2026
- Add .agent.md coverage to quick-start.mdx, getting-started.mdx,
  docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
- Add custom-agent, required-skills-demo, rubrics to examples/README.md
- Update mock engine description in docs/INTEGRATION-TESTING.md and
  eval-yaml.mdx to reflect #228 file content echo behavior
- No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer mentioned this pull request on Apr 28, 2026 (5 tasks)

Development

Successfully merging this pull request may close these issues.

bug: Waza Evaluation CI fails on main — code-explainer mock eval returns 0% pass rate

2 participants