feat: Add eval scaffolding command (`waza eval new`) #94
spboyer wants to merge 3 commits into microsoft:main
Should this go in
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #94 +/- ##
=======================================
Coverage ? 72.42%
=======================================
Files ? 129
Lines ? 14440
Branches ? 0
=======================================
Hits ? 10458
Misses ? 3210
Partials ? 772
Flags with carried forward coverage won't be shown.
Pull request overview
Adds a new waza eval new <skill-name> CLI subcommand to scaffold an evaluation suite from a skill’s SKILL.md frontmatter, plus docs/tests to make it discoverable and verifiable.
Changes:
- Introduce a `waza eval new` command that parses `SKILL.md` triggers and generates `eval.yaml` plus starter trigger/anti-trigger task YAMLs.
- Add unit tests validating scaffold generation, custom output path behavior, and missing-`SKILL.md` error handling.
- Update the CLI docs, README, and command metadata expectations to include the new `eval` top-level command.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| site/src/content/docs/reference/cli.mdx | Documents the new waza eval new command in the CLI reference. |
| cmd/waza/root.go | Registers the new eval command in the root CLI. |
| cmd/waza/cmd_metadata_test.go | Updates metadata test to expect the new top-level eval command. |
| cmd/waza/cmd_eval_test.go | Adds tests covering scaffold generation, --output, and error cases. |
| cmd/waza/cmd_eval.go | Implements waza eval new scaffold generation logic. |
| README.md | Adds usage docs for waza eval new. |
| .squad/log/2026-03-05T00-36-issue-assignment-pipeline.md | Adds squad session log (non-functional change). |
| .squad/log/2026-03-05T00-26-rusty-token-diff-design.md | Adds squad session log (non-functional change). |
| .squad/decisions.md | Records squad decisions (non-functional change). |
Comments suppressed due to low confidence (2)
cmd/waza/cmd_eval.go:74
- When `--output` is not provided, the default output path is hard-coded to `evals/<skill-name>/eval.yaml`. If the project sets a custom `paths.evals` in `.waza.yaml`, this will write scaffolding into the wrong directory. It would be more consistent with other commands to derive the default evals directory from project config/workspace detection.
if outputPath == "" {
outputPath = filepath.Join("evals", skillName, "eval.yaml")
}
tasksDir := filepath.Join(filepath.Dir(outputPath), "tasks")
site/src/content/docs/reference/cli.mdx:202
- The flag docs list `--output` without indicating that it takes a path argument. For consistency with the README and the actual flag help, consider documenting it as `--output <path>`.
| Flag | Description |
|------|-------------|
| `--output` | Custom path for generated `eval.yaml` |
cmd/waza/cmd_eval_test.go (outdated)
origDir, err := os.Getwd()
require.NoError(t, err)
require.NoError(t, os.Chdir(dir))
t.Cleanup(func() { _ = os.Chdir(origDir) })
The tests manually manage `os.Getwd`/`os.Chdir` plus cleanup. The rest of the `cmd/waza` tests typically use `t.Chdir(dir)` (available since Go 1.24), which is simpler and restores the working directory automatically, so no cleanup path can be missed. Consider switching to `t.Chdir` for consistency and maintainability.
Suggested change:
t.Chdir(dir)
spboyer left a comment
LGTM — Rusty. waza eval new is a clean scaffolding command. Good use of SKILL.md frontmatter parsing for positive/negative trigger generation. extractKeywords with stop words is smart. Tests validate generated YAML through validation.ValidateEvalBytes/ValidateTaskBytes — nice. README + CLI reference updated. Ship it. (Self-authored PR — cannot self-approve via API.)
Force-pushed from 79893e4 to 02fb261.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from f016c5f to c00d9c3.
- `evals/[skill-name]/eval.yaml`
- `evals/[skill-name]/tasks/positive-trigger-1.yaml`
- `evals/[skill-name]/tasks/positive-trigger-2.yaml`
- `evals/[skill-name]/tasks/negative-trigger-1.yaml`

### Flags

| Flag | Description |
|------|-------------|
| `--output` | Custom path for generated `eval.yaml` |
The docs contain two issues that will likely render incorrectly or be confusing: (1) the paths use `[skill-name]` while the rest of the CLI docs use `<skill-name>`, and (2) the flags table rows start with `||`, which Markdown interprets as an extra empty column. Consider switching to `evals/<skill-name>/...` and using single leading pipes (`| Flag | Description |`). It would also help to note that when `--output` is set, task YAMLs are generated in a sibling `tasks/` directory next to the provided eval path.
Suggested change:

- `evals/<skill-name>/eval.yaml`
- `evals/<skill-name>/tasks/positive-trigger-1.yaml`
- `evals/<skill-name>/tasks/positive-trigger-2.yaml`
- `evals/<skill-name>/tasks/negative-trigger-1.yaml`

### Flags

| Flag | Description |
|------|-------------|
| `--output` | Custom path for generated `eval.yaml`; task YAMLs are created in a sibling `tasks/` directory next to this file. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #83
Working as Linus (Backend Developer)
⚠️ This task was flagged as "needs review" — please have a squad member review before merging.