feat: Add eval coverage grid generator by spboyer · Pull Request #92 · microsoft/waza

spboyer · 2026-03-05T01:56:26Z

Closes #82

codecov-commenter · 2026-03-05T02:00:11Z

Codecov Report

❌ Patch coverage is 66.11570% with 82 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@a75477e). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
cmd/waza/cmd_coverage.go	65.97%	62 Missing and 20 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #92   +/-   ##
=======================================
  Coverage        ?   72.11%           
=======================================
  Files           ?      129           
  Lines           ?    14494           
  Branches        ?        0           
=======================================
  Hits            ?    10452           
  Misses          ?     3258           
  Partials        ?      784

Flag	Coverage Δ
go-implementation	`72.11% <66.11%> (?)`


Copilot

Pull request overview

Adds a new waza coverage CLI command to generate an eval-coverage “grid” across discovered skills, with supporting docs and tests.

Changes:

Introduces waza coverage [root] with text, markdown, and json output.
Implements skill/eval discovery and a coverage classification (none/partial/full) based on tasks + grader types.
Updates CLI reference docs and README, and adds unit tests for report building + markdown rendering.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
site/src/content/docs/reference/cli.mdx	Documents the new `waza coverage` command, flags, and examples.
cmd/waza/root.go	Registers the new `coverage` subcommand on the root CLI.
cmd/waza/cmd_coverage.go	Implements discovery, report generation, and output renderers (text/markdown/json).
cmd/waza/cmd_coverage_test.go	Adds unit tests for report classification, markdown output, and command registration.
README.md	Adds `waza coverage` usage example and CLI reference entry.
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md	Adds team process log (non-functional).
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md	Adds team process log (non-functional).
.squad/decisions.md	Records team workflow/design decisions (non-functional).

Copilot · 2026-03-05T02:01:33Z

cmd/waza/cmd_coverage.go

+	for _, evalPath := range evalPaths {
+		spec, parseErr := parseEvalSpec(evalPath)
+		if parseErr != nil {
+			continue
+		}


buildCoverageReport silently ignores eval.yaml parsing failures (parseEvalSpec errors are dropped with continue). This can misreport skills as uncovered/partial when the eval file exists but is malformed or uses unsupported fields. Consider returning an error (or at least collecting these parse errors and surfacing a warning/summary to stderr and/or in the JSON report) so coverage results remain trustworthy.
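
One way to act on this is to record each failure rather than dropping it. The sketch below is illustrative only: `collectSpecs`, `parseFailure`, and `errMalformed` are hypothetical names, and the `parse` callback stands in for `parseEvalSpec`, whose real signature is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errMalformed is a hypothetical error used only to drive the example.
var errMalformed = errors.New("malformed eval spec")

// parseFailure records an eval file that could not be parsed.
type parseFailure struct {
	Path string
	Err  error
}

// collectSpecs calls parse for every path. Failures are kept in a slice
// instead of being dropped, so the caller can surface them.
func collectSpecs(paths []string, parse func(string) error) (parsed int, failures []parseFailure) {
	for _, p := range paths {
		if err := parse(p); err != nil {
			failures = append(failures, parseFailure{Path: p, Err: err})
			continue
		}
		parsed++
	}
	return parsed, failures
}

func main() {
	// Stand-in parser: rejects one specific path.
	parse := func(p string) error {
		if p == "skills/b/eval.yaml" {
			return errMalformed
		}
		return nil
	}
	parsed, failures := collectSpecs([]string{"skills/a/eval.yaml", "skills/b/eval.yaml"}, parse)
	fmt.Printf("parsed %d eval file(s)\n", parsed)
	for _, f := range failures {
		fmt.Fprintf(os.Stderr, "warning: skipped %s: %v\n", f.Path, f.Err)
	}
}
```

The same slice could also be attached to the JSON report so that machine consumers see which files were skipped.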

Copilot · 2026-03-05T02:01:33Z

cmd/waza/cmd_coverage.go

+			if d.Name() == "eval.yaml" {
+				absPath, _ := filepath.Abs(path)
+				candidates[absPath] = struct{}{}
+			}
+			return nil


This command only discovers eval.yaml files, but other parts of the repo treat both eval.yaml and eval.yml as valid (e.g., init path detection). To avoid undercounting coverage, include eval.yml in discovery (both the directory walk and the per-skill candidate list).
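
A minimal sketch of the filename check, assuming a small helper (the name `isEvalFile` is hypothetical) that both the directory walk and the per-skill candidate list could share:

```go
package main

import "fmt"

// isEvalFile reports whether name is one of the accepted eval spec filenames.
func isEvalFile(name string) bool {
	switch name {
	case "eval.yaml", "eval.yml":
		return true
	}
	return false
}

func main() {
	for _, n := range []string{"eval.yaml", "eval.yml", "evals.yaml"} {
		fmt.Printf("%s: %t\n", n, isEvalFile(n))
	}
}
```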

Copilot · 2026-03-05T02:01:33Z

cmd/waza/cmd_coverage.go

+	cmd.Flags().StringVarP(&outputFormat, "format", "f", "text", "Output format: text, markdown, or json")
+	cmd.Flags().StringArrayVar(&discoverPaths, "discover", nil, "Additional directories to scan for skills/evals (repeatable)")
+	return cmd


--discover is defined here as a repeatable string array of additional search roots, but waza run --discover already exists as a boolean flag with different semantics. Reusing the same flag name with a different type/meaning across commands is likely to confuse users and makes CLI behavior less consistent; consider renaming this flag (e.g., --path/--search-root) or aligning semantics with the existing --discover behavior.
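
For illustration, here is what a repeatable `--path` flag could look like. The command itself uses `cmd.Flags().StringArrayVar`; this sketch swaps in the standard library `flag` package so it runs without extra dependencies, and `pathList` and `parseCoverageArgs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// pathList implements flag.Value so the flag can be given more than once.
type pathList []string

func (p *pathList) String() string { return strings.Join(*p, ",") }

// Set appends each occurrence of the flag instead of overwriting.
func (p *pathList) Set(v string) error {
	*p = append(*p, v)
	return nil
}

// parseCoverageArgs parses a hypothetical `coverage` argument list.
func parseCoverageArgs(args []string) (format string, paths []string, err error) {
	fs := flag.NewFlagSet("coverage", flag.ContinueOnError)
	var pl pathList
	fs.StringVar(&format, "format", "text", "output format: text, markdown, or json")
	fs.Var(&pl, "path", "additional directory to scan (repeatable)")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	return format, pl, nil
}

func main() {
	format, paths, err := parseCoverageArgs([]string{"--format", "json", "--path", "skills", "--path", "extra"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(format, paths)
}
```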

spboyer

Verified by Rusty (Opus 4.6) — LGTM ✅

Clean eval coverage grid generator:

New `waza coverage` command with text/markdown/json output
Smart skill/eval discovery with deduplication, hidden dir skipping
Coverage classification (Full/Partial/None) is conservative and correct
Tests cover no-eval, partial/full, markdown rendering, root command integration
Docs updated: README, CLI reference
CI green on ubuntu + windows + lint

Minor: `evalSpecLite.Tasks` is `[]string` while real eval YAML has structured task objects, which means the task count defaults to 0, showing Partial instead of Full for real evals. Conservative for a reporting tool. Worth fixing to `[]any` in a follow-up.
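
The type mismatch can be sketched with `encoding/json` as a stand-in, since the standard library has no YAML decoder. This is an analogy under that assumption: the exact behaviour of the YAML library used by waza may differ, and `liteStrings`, `liteAny`, and both helper functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// liteStrings mirrors a lightweight spec whose tasks are plain strings.
type liteStrings struct {
	Tasks []string `json:"tasks"`
}

// liteAny accepts any task shape, which is enough for counting.
type liteAny struct {
	Tasks []any `json:"tasks"`
}

// strictDecodeFails reports whether decoding into []string rejects the document.
func strictDecodeFails(doc []byte) bool {
	var s liteStrings
	return json.Unmarshal(doc, &s) != nil
}

// countTasksLoose decodes into []any and returns the number of tasks.
func countTasksLoose(doc []byte) (int, error) {
	var s liteAny
	if err := json.Unmarshal(doc, &s); err != nil {
		return 0, err
	}
	return len(s.Tasks), nil
}

func main() {
	// Structured task objects, as a real eval spec might contain.
	doc := []byte(`{"tasks":[{"id":"a"},{"id":"b"}]}`)
	fmt.Println("[]string decode fails:", strictDecodeFails(doc))
	n, err := countTasksLoose(doc)
	fmt.Println("[]any task count:", n, "err:", err)
}
```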

Note: Can't self-approve via API. Setting auto-merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Copilot · 2026-03-05T18:47:30Z

README.md

+| Flag | Short | Description |
+|------|-------|-------------|
+| `--format <fmt>` | `-f` | Output format: `text`, `markdown`, or `json` (default: `text`) |
+| `--discover <dir>` | | Additional directory to scan for skills/evals (repeatable) |


The README documents a --discover flag, but the implementation registers --path (repeatable). Please align the docs with the actual CLI (either rename the flag in code or update the README entry to --path <dir>).

Suggested change

-| `--discover <dir>` | | Additional directory to scan for skills/evals (repeatable) |
+| `--path <dir>` | | Additional directory to scan for skills/evals (repeatable) |

Copilot · 2026-03-05T18:47:30Z

cmd/waza/cmd_coverage.go

+			absPath, _ := filepath.Abs(path)
+			if _, ok := seenPaths[absPath]; ok {
+				return nil
+			}
+			seenPaths[absPath] = struct{}{}


These filepath.Abs(...) calls ignore errors (_ :=). Since discovery correctness depends on the resulting canonical path, handle the error (and either return it or fall back to filepath.Clean(path)), or remove the Abs call if inputs are already absolute (they appear to be, given WalkDir roots are built from absRoot).
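
A minimal sketch of the fallback option, assuming a helper that produces the dedup key (`canonicalPath` and `dedupePaths` are hypothetical names, not functions from the PR):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// canonicalPath returns an absolute, cleaned form of p for use as a dedup key.
// filepath.Abs can fail when the working directory cannot be determined; in
// that case the cleaned input is used instead of an empty string.
func canonicalPath(p string) string {
	abs, err := filepath.Abs(p)
	if err != nil {
		return filepath.Clean(p)
	}
	return abs
}

// dedupePaths keeps the first occurrence of each canonical path.
func dedupePaths(paths []string) []string {
	seen := make(map[string]struct{}, len(paths))
	var out []string
	for _, p := range paths {
		key := canonicalPath(p)
		if _, ok := seen[key]; ok {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, key)
	}
	return out
}

func main() {
	got := dedupePaths([]string{
		"skills/a/eval.yaml",
		"skills/./a/eval.yaml",
		"skills/b/../a/eval.yaml",
	})
	fmt.Println(len(got), "unique path(s)")
}
```

Without the fallback, a failed `Abs` returns an empty string, so every affected path would collapse onto the same map key.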

Copilot · 2026-03-05T18:47:31Z

cmd/waza/cmd_coverage.go

+				absPath, _ := filepath.Abs(path)
+				candidates[absPath] = struct{}{}


These filepath.Abs(...) calls ignore errors (_ :=). Since discovery correctness depends on the resulting canonical path, handle the error (and either return it or fall back to filepath.Clean(path)), or remove the Abs call if inputs are already absolute (they appear to be, given WalkDir roots are built from absRoot).

Copilot · 2026-03-05T18:47:31Z

cmd/waza/cmd_coverage.go

+				absPath, _ := filepath.Abs(p)
+				candidates[absPath] = struct{}{}


These filepath.Abs(...) calls ignore errors (_ :=). Since discovery correctness depends on the resulting canonical path, handle the error (and either return it or fall back to filepath.Clean(path)), or remove the Abs call if inputs are already absolute (they appear to be, given WalkDir roots are built from absRoot).

Copilot · 2026-03-05T18:47:31Z

cmd/waza/cmd_coverage.go

+		}
+		err := filepath.WalkDir(sr, func(path string, d fs.DirEntry, err error) error {
+			if err != nil {
+				return nil


The WalkDir callbacks swallow filesystem traversal errors by returning nil. This can silently produce incomplete/incorrect coverage results (e.g., permission errors causing missed skills/evals) with no signal to the user. Prefer returning the err (possibly wrapped with the path) or collecting and reporting traversal failures similarly to parseFailures.

Suggested change

-return nil
+return fmt.Errorf("error walking %s: %w", path, err)
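
In context, the suggestion could look like the following self-contained sketch. `findEvalFiles` is a hypothetical function, not the PR's discovery code; it only shows the callback returning the wrapped error so that `filepath.WalkDir` stops and reports it.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// findEvalFiles walks root and returns every eval.yaml or eval.yml below it.
// Traversal errors are wrapped with the offending path and returned, so the
// caller never receives a silently incomplete result.
func findEvalFiles(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			// d may be nil here, so check err before touching it.
			return fmt.Errorf("error walking %s: %w", path, err)
		}
		if !d.IsDir() && (d.Name() == "eval.yaml" || d.Name() == "eval.yml") {
			found = append(found, path)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}
	return found, nil
}

func main() {
	files, err := findEvalFiles(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "coverage:", err)
		os.Exit(1)
	}
	fmt.Printf("found %d eval file(s)\n", len(files))
}
```

If aborting on the first unreadable directory is too strict, the callback could instead append the error to a slice and return `nil`, in the same spirit as collecting parse failures.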

Copilot · 2026-03-05T18:47:31Z

cmd/waza/cmd_coverage.go

+		}
+		if err := filepath.WalkDir(evalRoot, func(path string, d fs.DirEntry, err error) error {
+			if err != nil {
+				return nil


The WalkDir callbacks swallow filesystem traversal errors by returning nil. This can silently produce incomplete/incorrect coverage results (e.g., permission errors causing missed skills/evals) with no signal to the user. Prefer returning the err (possibly wrapped with the path) or collecting and reporting traversal failures similarly to parseFailures.

Suggested change

-return nil
+return fmt.Errorf("error walking %s: %w", path, err)

Copilot · 2026-03-05T18:47:32Z

cmd/waza/cmd_coverage.go

+			switch outputFormat {
+			case "text":
+				renderCoverageText(cmd.OutOrStdout(), report)
+			case "markdown":
+				renderCoverageMarkdown(cmd.OutOrStdout(), report)
+			case "json":
+				if err := renderCoverageJSON(cmd.OutOrStdout(), report); err != nil {
+					return err
+				}
+			default:
+				return fmt.Errorf("unsupported format %q: must be text, markdown, or json", outputFormat)
+			}


There are tests for report building and markdown rendering, but no tests for (a) JSON rendering output shape/indentation and (b) unsupported --format error behavior. Adding targeted tests for these paths would help prevent regressions in this new user-facing command.
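
The unsupported-format path is easier to test if the check lives in a small function. A sketch under that assumption (`validateFormat` is a hypothetical helper; the error text is copied from the `default` branch above):

```go
package main

import "fmt"

// validateFormat returns nil for a supported output format and the same
// error message the command's default branch produces otherwise.
func validateFormat(format string) error {
	switch format {
	case "text", "markdown", "json":
		return nil
	default:
		return fmt.Errorf("unsupported format %q: must be text, markdown, or json", format)
	}
}

func main() {
	for _, f := range []string{"text", "markdown", "json", "yaml"} {
		fmt.Printf("%s: %v\n", f, validateFormat(f))
	}
}
```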

Copilot · 2026-03-05T18:47:32Z

cmd/waza/cmd_coverage.go

+func renderCoverageJSON(w io.Writer, report *coverageReport) error {
+	enc := json.NewEncoder(w)
+	enc.SetIndent("", "  ")
+	return enc.Encode(report)
+}


There are tests for report building and markdown rendering, but no tests for (a) JSON rendering output shape/indentation and (b) unsupported --format error behavior. Adding targeted tests for these paths would help prevent regressions in this new user-facing command.
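
A sketch of what a JSON shape check might assert. The `report` and `skillRow` types and their field names are hypothetical stand-ins for `coverageReport`, whose fields are not shown in this thread; only the encoder setup matches the snippet above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// skillRow and report are hypothetical stand-ins for the real report types.
type skillRow struct {
	Name     string `json:"name"`
	Coverage string `json:"coverage"`
}

type report struct {
	Skills []skillRow `json:"skills"`
}

// renderJSON mirrors the encoder setup under review: two-space indent,
// no prefix, and the trailing newline that Encode always writes.
func renderJSON(w io.Writer, r *report) error {
	enc := json.NewEncoder(w)
	enc.SetIndent("", "  ")
	return enc.Encode(r)
}

// renderToString captures the rendered output so a test can compare it.
func renderToString(r *report) (string, error) {
	var buf bytes.Buffer
	if err := renderJSON(&buf, r); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderToString(&report{Skills: []skillRow{{Name: "demo", Coverage: "full"}}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Comparing the full string pins both the key names and the indentation, which is the regression the comment is worried about.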

Copilot · 2026-03-05T18:47:32Z

site/src/content/docs/reference/cli.mdx

+
+| Flag | Description |
+|------|-------------|
+| `--format` | Output format: `text` (default), `markdown`, `json` |


The CLI docs list --format but omit the -f shorthand that’s supported by the command. Consider updating the flag table to include the short form for consistency with other CLI docs and to match the actual UX.

Suggested change

-| `--format` | Output format: `text` (default), `markdown`, `json` |
+| `-f, --format` | Output format: `text` (default), `markdown`, `json` |

Copilot AI review requested due to automatic review settings March 5, 2026 01:56

spboyer requested review from chlowell and richardpark-msft as code owners March 5, 2026 01:56

spboyer self-assigned this Mar 5, 2026

github-actions bot enabled auto-merge (squash) March 5, 2026 01:57

Copilot started reviewing on behalf of spboyer March 5, 2026 01:57 View session

Copilot AI reviewed Mar 5, 2026

spboyer commented Mar 5, 2026

spboyer force-pushed the squad/82-eval-coverage-grid branch from 40b1f4c to e1365f2 Compare March 5, 2026 17:12

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #92

a6ca52d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 17:42

spboyer and others added 2 commits March 5, 2026 12:46

feat: add eval coverage grid generator microsoft#82

c6ab59b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address review feedback on PR microsoft#92

f3aa0c4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/82-eval-coverage-grid branch from a6ca52d to f3aa0c4 Compare March 5, 2026 17:46

Copilot AI reviewed Mar 5, 2026


3 participants
