feat: Add eval scaffolding command (waza eval new) by spboyer · Pull Request #94 · microsoft/waza

spboyer · 2026-03-05T02:23:40Z

Closes #83

Working as Linus (Backend Developer)
⚠️ This task was flagged as "needs review" — please have a squad member review before merging.

chlowell · 2026-03-05T02:26:17Z

Should this go in init or new instead of a new verb?

codecov-commenter · 2026-03-05T02:27:54Z

Codecov Report

❌ Patch coverage is 88.29787% with 22 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@a75477e). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| cmd/waza/cmd_eval.go | 88.23% | 14 Missing and 8 partials ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #94   +/-   ##
=======================================
  Coverage        ?   72.42%           
=======================================
  Files           ?      129           
  Lines           ?    14440           
  Branches        ?        0           
=======================================
  Hits            ?    10458           
  Misses          ?     3210           
  Partials        ?      772

| Flag | Coverage Δ |
|------|------------|
| go-implementation | `72.42% <88.29%> (?)` |

Copilot

Pull request overview

Adds a new waza eval new <skill-name> CLI subcommand to scaffold an evaluation suite from a skill’s SKILL.md frontmatter, plus docs/tests to make it discoverable and verifiable.

Changes:

Introduce waza eval new command that parses SKILL.md triggers and generates eval.yaml + starter trigger/anti-trigger task YAMLs.
Add unit tests validating scaffold generation, custom output path behavior, and missing SKILL.md error handling.
Update CLI docs + README + command metadata expectations to include the new eval top-level command.
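The frontmatter step in the first bullet can be sketched roughly as follows. This is not the PR's code: `extractFrontmatter` and `frontmatterValue` are hypothetical helpers, the `description` key is only an example, and a real implementation would more likely hand the extracted block to a YAML parser than read `key: value` lines by hand.

```go
package main

import "strings"

// extractFrontmatter returns the text between a leading "---" line and the
// next "---" line. It reports false when the document has no complete
// frontmatter block. (Hypothetical helper, not taken from the PR.)
func extractFrontmatter(doc string) (string, bool) {
	lines := strings.Split(strings.ReplaceAll(doc, "\r\n", "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", false
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return strings.Join(lines[1:i], "\n"), true
		}
	}
	return "", false // opening delimiter without a closing one
}

// frontmatterValue returns the value of a top-level "key: value" line.
// It ignores indented (nested) lines and does not handle multi-line values.
func frontmatterValue(fm, key string) (string, bool) {
	for _, line := range strings.Split(fm, "\n") {
		if strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") {
			continue
		}
		k, v, ok := strings.Cut(line, ":")
		if ok && strings.TrimSpace(k) == key {
			return strings.TrimSpace(v), true
		}
	}
	return "", false
}
```

A caller would read `SKILL.md`, pass its contents to `extractFrontmatter`, and return an error when the second result is false, which is the case the missing-frontmatter tests would need to cover.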

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
|------|-------------|
| site/src/content/docs/reference/cli.mdx | Documents the new `waza eval new` command in the CLI reference. |
| cmd/waza/root.go | Registers the new `eval` command in the root CLI. |
| cmd/waza/cmd_metadata_test.go | Updates metadata test to expect the new top-level `eval` command. |
| cmd/waza/cmd_eval_test.go | Adds tests covering scaffold generation, `--output`, and error cases. |
| cmd/waza/cmd_eval.go | Implements `waza eval new` scaffold generation logic. |
| README.md | Adds usage docs for `waza eval new`. |
| .squad/log/2026-03-05T00-36-issue-assignment-pipeline.md | Adds squad session log (non-functional change). |
| .squad/log/2026-03-05T00-26-rusty-token-diff-design.md | Adds squad session log (non-functional change). |
| .squad/decisions.md | Records squad decisions (non-functional change). |

Comments suppressed due to low confidence (2)

cmd/waza/cmd_eval.go:74

When --output is not provided, the default output path is hard-coded to evals/<skill-name>/eval.yaml. If the project uses a custom paths.evals in .waza.yaml, this will write scaffolding into the wrong directory. It would be more consistent with other commands to derive the default evals directory from project config/workspace detection.

	if outputPath == "" {
		outputPath = filepath.Join("evals", skillName, "eval.yaml")
	}
	tasksDir := filepath.Join(filepath.Dir(outputPath), "tasks")

site/src/content/docs/reference/cli.mdx:202

Flag docs list --output without indicating it takes a path argument. For consistency with the README and the actual flag help, consider documenting it as --output <path>.

| Flag | Description |
|------|-------------|
| `--output` | Custom path for generated `eval.yaml` |

Copilot · 2026-03-05T02:29:30Z

cmd/waza/cmd_eval_test.go

+	origDir, err := os.Getwd()
+	require.NoError(t, err)
+	require.NoError(t, os.Chdir(dir))
+	t.Cleanup(func() { _ = os.Chdir(origDir) })


Tests manually manage os.Getwd/os.Chdir + cleanup. The rest of the cmd/waza tests typically use t.Chdir(dir) (available in this Go version) which is simpler and avoids missing cleanup paths. Consider switching to t.Chdir for consistency and maintainability.

Suggested change

-	origDir, err := os.Getwd()
-	require.NoError(t, err)
-	require.NoError(t, os.Chdir(dir))
-	t.Cleanup(func() { _ = os.Chdir(origDir) })
+	t.Chdir(dir)

spboyer

LGTM — Rusty. waza eval new is a clean scaffolding command. Good use of SKILL.md frontmatter parsing for positive/negative trigger generation. extractKeywords with stop words is smart. Tests validate generated YAML through validation.ValidateEvalBytes/ValidateTaskBytes — nice. README + CLI reference updated. Ship it. (Self-authored PR — cannot self-approve via API.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Copilot · 2026-03-05T18:52:06Z

site/src/content/docs/reference/cli.mdx

+- `evals/[skill-name]/eval.yaml`
+- `evals/[skill-name]/tasks/positive-trigger-1.yaml`
+- `evals/[skill-name]/tasks/positive-trigger-2.yaml`
+- `evals/[skill-name]/tasks/negative-trigger-1.yaml`
+
+### Flags
+
+| Flag | Description |
+|------|-------------|
+| `--output` | Custom path for generated `eval.yaml` |


The docs contain two issues that will likely render incorrectly / be confusing: (1) the paths use [skill-name] while the rest of the CLI docs use <skill-name>, and (2) the flags table rows start with ||, which Markdown interprets as an extra empty column. Consider switching to evals/<skill-name>/... and using single leading pipes (| Flag | Description |). Also, it would help to note that when --output is set, task YAMLs are generated under the sibling tasks/ directory next to the provided eval path.

Suggested change

-- `evals/[skill-name]/eval.yaml`
-- `evals/[skill-name]/tasks/positive-trigger-1.yaml`
-- `evals/[skill-name]/tasks/positive-trigger-2.yaml`
-- `evals/[skill-name]/tasks/negative-trigger-1.yaml`
-
-### Flags
-
-| Flag | Description |
-|------|-------------|
-| `--output` | Custom path for generated `eval.yaml` |
+- `evals/<skill-name>/eval.yaml`
+- `evals/<skill-name>/tasks/positive-trigger-1.yaml`
+- `evals/<skill-name>/tasks/positive-trigger-2.yaml`
+- `evals/<skill-name>/tasks/negative-trigger-1.yaml`
+
+### Flags
+
+| Flag | Description |
+|------|-------------|
+| `--output` | Custom path for generated `eval.yaml`; task YAMLs are created in a sibling `tasks/` directory next to this file. |
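To make the documented layout concrete, here is a small sketch that lists the files a scaffold run would produce for a given eval path. The `scaffoldFiles` helper and its count parameters are hypothetical. The file names follow the pattern shown in the docs above, with two positive and one negative trigger as the documented example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// scaffoldFiles lists the files generated for evalPath: the eval file itself
// plus numbered trigger tasks in a sibling "tasks" directory.
// (Hypothetical helper, not taken from the PR.)
func scaffoldFiles(evalPath string, positives, negatives int) []string {
	tasksDir := filepath.Join(filepath.Dir(evalPath), "tasks")
	files := []string{evalPath}
	for i := 1; i <= positives; i++ {
		files = append(files, filepath.Join(tasksDir, fmt.Sprintf("positive-trigger-%d.yaml", i)))
	}
	for i := 1; i <= negatives; i++ {
		files = append(files, filepath.Join(tasksDir, fmt.Sprintf("negative-trigger-%d.yaml", i)))
	}
	return files
}
```

With `--output custom/dir/eval.yaml`, the same function yields task paths under `custom/dir/tasks/`, which is the behavior the suggested docs wording describes.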

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer requested review from chlowell and richardpark-msft as code owners March 5, 2026 02:23

Copilot AI review requested due to automatic review settings March 5, 2026 02:23

spboyer self-assigned this Mar 5, 2026

github-actions bot enabled auto-merge (squash) March 5, 2026 02:24

Copilot started reviewing on behalf of spboyer March 5, 2026 02:24

Copilot AI reviewed Mar 5, 2026

spboyer commented Mar 5, 2026

spboyer force-pushed the squad/83-eval-new branch from 79893e4 to 02fb261 Compare March 5, 2026 17:12

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #94

f016c5f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 17:38

spboyer and others added 2 commits March 5, 2026 12:46

feat: add eval scaffolding command microsoft#83

3f612b1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address review feedback on PR microsoft#94

c00d9c3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/83-eval-new branch from f016c5f to c00d9c3 Compare March 5, 2026 17:46

Copilot AI reviewed Mar 5, 2026

fix: address PR microsoft#94 eval scaffold review comments

c28fc26

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer wants to merge 3 commits into microsoft:main from spboyer:squad/83-eval-new

4 participants
