fix: mock engine echoes file content for CI evals (#227) #228
Merged
…work in CI
The waza-eval.yml CI job runs examples/code-explainer/eval.yaml with the mock engine. The mock previously returned only "Mock response for: <prompt>" plus a file count, so realistic _output_contains expectations against file contents (e.g., "async", "fetch" for fetch_user.js) failed every time.
Now the mock includes task metadata (name, description), context values, file paths, and a 1KB content preview per resource. This lets evals validate the full pipeline (discovery → execution → grading) in CI without requiring a real model.
Closes #227
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The mock engine, runner, and graders all directly affect eval execution. Without this, fixes like #227 wouldn't run the eval workflow on the PR that introduced them.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request on Apr 28, 2026
- Add .agent.md coverage to quick-start.mdx, getting-started.mdx, docs/GETTING-STARTED.md, docs/GUIDE.md, docs/TUTORIAL.md for #226
- Add custom-agent, required-skills-demo, rubrics to examples/README.md
- Update mock engine description in docs/INTEGRATION-TESTING.md and eval-yaml.mdx to reflect #228 file content echo behavior
- No stale BenchmarkSpec/TestRunner refs found (#222 rename was thorough)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #227
Summary
The Run Waza Evaluation CI job runs examples/code-explainer/eval.yaml with the mock engine. The mock previously returned only "Mock response for: <prompt>" plus a file count, so _output_contains expectations against file contents (e.g., "async", "fetch", "recursive") failed every time: 0/4 tasks passed.
What Changed
- internal/execution/engine.go: Added TaskName and TaskDescription fields to ExecutionRequest so task metadata flows to engines.
- internal/execution/mock.go: Mock response now echoes task name/description, context metadata key/values, file paths, and up to 1KB of file content per resource.
- internal/orchestration/runner.go: Populates the new TaskName/TaskDescription fields from TestCase.
- internal/execution/mock_engine_test.go: Updated existing test for the new format; added TestMockEngine_Execute_IncludesResourceContent and TestMockEngine_Execute_TruncatesLargeContent.
Testing
- go test ./... passes
- go vet ./... clean
- examples/code-explainer/eval.yaml now passes 4/4 tasks (was 0/4)