fix: mock engine echoes file content for CI evals (#227) by spboyer · Pull Request #228 · microsoft/waza

spboyer · 2026-04-28T19:13:43Z

Closes #227

Summary

The Run Waza Evaluation CI job runs examples/code-explainer/eval.yaml with the mock engine. The mock previously returned only "Mock response for: <prompt>" plus a file count, so _output_contains expectations against file contents (e.g., "async", "fetch", "recursive") failed every time, and 0/4 tasks passed.
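The failure mode can be illustrated with a minimal sketch. This is not the waza code: `oldMockResponse` and `missingSubstrings` are hypothetical helpers, and the exact wording of the file-count suffix is an assumption. It only shows why a substring check against file contents cannot succeed when the response carries nothing but the prompt and a count.

```go
package main

import (
	"fmt"
	"strings"
)

// oldMockResponse imitates the pre-fix format described above: the prompt
// plus a file count. The suffix wording is an assumption for illustration.
func oldMockResponse(prompt string, fileCount int) string {
	return fmt.Sprintf("Mock response for: %s (%d files)", prompt, fileCount)
}

// missingSubstrings is a hypothetical stand-in for an output-contains check.
// It returns every expected substring that does not occur in output.
func missingSubstrings(output string, expected []string) []string {
	var missing []string
	for _, want := range expected {
		if !strings.Contains(output, want) {
			missing = append(missing, want)
		}
	}
	return missing
}

func main() {
	out := oldMockResponse("Explain this code", 1)
	// The file's content never reaches the response, so expectations
	// written against that content are all reported as missing.
	fmt.Println(missingSubstrings(out, []string{"async", "fetch"}))
}
```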

What Changed

internal/execution/engine.go — Added TaskName and TaskDescription fields to ExecutionRequest so task metadata flows to engines.
internal/execution/mock.go — Mock response now echoes: task name/description, context metadata key/values, file paths, and up to 1KB of file content per resource.
internal/orchestration/runner.go — Populates the new TaskName/TaskDescription fields from TestCase.
internal/execution/mock_engine_test.go — Updated existing test for new format; added TestMockEngine_Execute_IncludesResourceContent and TestMockEngine_Execute_TruncatesLargeContent.
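The new response shape could be sketched roughly as follows. Only the TaskName and TaskDescription field names come from this PR; the other fields, the `Resource` type, the `preview` and `buildMockResponse` helpers, the output labels, and the truncation marker are assumptions made for illustration, and "1KB" is taken to mean 1024 bytes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// maxPreviewBytes caps the echoed content per resource (assumed 1024 bytes).
const maxPreviewBytes = 1024

// Resource is a simplified, hypothetical stand-in for a file passed to the engine.
type Resource struct {
	Path    string
	Content string
}

// ExecutionRequest is a simplified stand-in. TaskName and TaskDescription
// are named in the PR; the remaining fields are assumptions.
type ExecutionRequest struct {
	TaskName        string
	TaskDescription string
	Prompt          string
	Context         map[string]string
	Resources       []Resource
}

// preview returns content unchanged when it fits the cap, otherwise the
// first maxPreviewBytes bytes plus a marker. Byte slicing may split a
// multi-byte UTF-8 character; a real implementation may want to handle that.
func preview(content string) string {
	if len(content) <= maxPreviewBytes {
		return content
	}
	return content[:maxPreviewBytes] + "\n[truncated]"
}

// buildMockResponse echoes task metadata, context key/values, file paths,
// and a bounded content preview for each resource.
func buildMockResponse(req ExecutionRequest) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Mock response for: %s\n", req.Prompt)
	fmt.Fprintf(&b, "Task: %s\n", req.TaskName)
	fmt.Fprintf(&b, "Description: %s\n", req.TaskDescription)

	// Sort keys so the output is deterministic across runs.
	keys := make([]string, 0, len(req.Context))
	for k := range req.Context {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Fprintf(&b, "Context %s: %s\n", k, req.Context[k])
	}

	for _, r := range req.Resources {
		fmt.Fprintf(&b, "File: %s\n%s\n", r.Path, preview(r.Content))
	}
	return b.String()
}

func main() {
	fmt.Print(buildMockResponse(ExecutionRequest{
		TaskName:  "explain-fetch-user",
		Prompt:    "Explain this code",
		Context:   map[string]string{"language": "javascript"},
		Resources: []Resource{{Path: "fetch_user.js", Content: "async function fetchUser() {}"}},
	}))
}
```

Because the file content is echoed into the response, a substring expectation such as "async" can match without a real model, while the cap keeps large files from inflating the output.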

Testing

All unit tests pass (go test ./...)
go vet ./... clean
examples/code-explainer/eval.yaml now passes 4/4 tasks (was 0/4)
No changes to example files — only the mock engine was modified

…work in CI

The waza-eval.yml CI job runs examples/code-explainer/eval.yaml with the mock engine. The mock previously returned only "Mock response for: <prompt>" + a file count, so realistic _output_contains expectations against file contents (e.g., "async", "fetch" for fetch_user.js) failed every time.

Now the mock includes task metadata (name, description), context values, file paths, and a 1KB content preview per resource. This lets evals validate the full pipeline (discovery → execution → grading) in CI without requiring a real model.

Closes #227

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


The mock engine, runner, and graders all directly affect eval execution. Without this, fixes like #227 wouldn't run the eval workflow on the PR that introduced them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add .agent.md coverage to quick-start.mdx, getting-started.mdx, docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
- Add custom-agent, required-skills-demo, rubrics to examples/README.md
- Update mock engine description in docs/INTEGRATION-TESTING.md and eval-yaml.mdx to reflect #228 file content echo behavior
- No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI added 2 commits April 28, 2026 15:12

docs: update linus history and decision for mock engine change

7a74771

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot enabled auto-merge (squash) April 28, 2026 19:14

github-actions Bot merged commit dfec036 into main Apr 28, 2026
7 checks passed

spboyer mentioned this pull request Apr 28, 2026

docs: cross-reference audit for recent renames and feature additions #230

Merged

spboyer mentioned this pull request Apr 28, 2026

Release v0.31.0 #231

Merged

5 tasks

github-actions[bot] merged 3 commits into main from squad/227-fix-mock-eval-ci
