fix(audit): key_findings error count uses len(errors) not metrics.ErrorCount by Copilot · Pull Request #32633 · github/gh-aw

Copilot · 2026-05-16T14:24:36Z

key_findings for failed workflows always reported "failed with 0 error(s)" when metrics.ErrorCount was 0, even when the errors slice was populated — the two sources were computed independently and could diverge.

Changes

audit_report_analysis.go: In generateFindings, derive the error count from len(errors) (the ground-truth slice included in audit output) instead of metrics.ErrorCount. Falls back to metrics.ErrorCount only when errors is empty (preserving existing behavior for runs where log details were unavailable but a summary count exists).

// Before
desc = fmt.Sprintf("Workflow '%s' failed with %d error(s)", run.WorkflowName, metrics.ErrorCount)

// After
errorCount := len(errors)
if errorCount == 0 {
    errorCount = metrics.ErrorCount
}
desc = fmt.Sprintf("Workflow '%s' failed with %d error(s)", run.WorkflowName, errorCount)

audit_report_test.go: Adds a regression test for the exact bug scenario — metrics.ErrorCount == 0 with a non-empty errors slice — asserting the description reflects the actual count.
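For readers who want to try the rule outside the repository, the two changes above can be sketched as a standalone program. This is a minimal sketch, not the code in pkg/cli: `ErrorInfo` and `MetricsData` are reduced to the fields mentioned in this PR, and `resolveErrorCount` and `failureDescription` are hypothetical helper names introduced only for illustration.

```go
package main

import "fmt"

// ErrorInfo and MetricsData are simplified stand-ins for the real types;
// only the fields referenced in this PR are included (assumption).
type ErrorInfo struct {
	Type    string
	Message string
}

type MetricsData struct {
	ErrorCount int
}

// resolveErrorCount is a hypothetical helper: it prefers the length of the
// errors slice and falls back to the summary count only when the slice is empty.
func resolveErrorCount(errs []ErrorInfo, metrics MetricsData) int {
	if n := len(errs); n > 0 {
		return n
	}
	return metrics.ErrorCount
}

// failureDescription (also hypothetical) formats the finding text using the
// same format string as the PR.
func failureDescription(workflowName string, errs []ErrorInfo, metrics MetricsData) string {
	return fmt.Sprintf("Workflow '%s' failed with %d error(s)",
		workflowName, resolveErrorCount(errs, metrics))
}

func main() {
	twoErrors := []ErrorInfo{
		{Type: "step_failure", Message: "err 1"},
		{Type: "step_failure", Message: "err 2"},
	}

	// Bug scenario: the summary count is 0 but two errors were collected.
	fmt.Println(failureDescription("ci", twoErrors, MetricsData{ErrorCount: 0}))

	// Fallback scenario: no error details, but a summary count exists.
	fmt.Println(failureDescription("ci", nil, MetricsData{ErrorCount: 1}))
}
```

The first call reports 2 error(s) and the second reports 1 error(s), matching the regression case and the preserved fallback case respectively.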

When a workflow fails with errors in the errors slice, the description previously used metrics.ErrorCount (which could be 0 even when errors were present), producing the contradictory "failed with 0 error(s)".

Fix: prefer len(errors) as the ground-truth count; fall back to metrics.ErrorCount only when no individual error details were collected.

Adds a regression test covering the exact scenario from the bug report (metrics.ErrorCount==0 but errors slice has 2 entries).

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Fixes a bug where the audit report's key findings would show "failed with 0 error(s)" when metrics.ErrorCount was stale/zero but the errors slice contained actual error entries. The fix prefers the length of the errors slice as ground truth, falling back to metrics.ErrorCount only when no error details are available.

Changes:

Derive error count from len(errors) in generateFindings, falling back to metrics.ErrorCount when empty.
Add regression test covering the divergence scenario (zero metrics count, populated errors slice).

| File | Description |
| --- | --- |
| pkg/cli/audit_report_analysis.go | Uses `len(errors)` as the primary source for the error count in the failure finding description. |
| pkg/cli/audit_report_test.go | Adds a regression test asserting the description reflects the actual errors slice size. |

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — this is a bug fix with a regression test, the classic case for both skills.

Key Themes

Root cause addressed correctly: metrics.ErrorCount and len(errors) are computed independently and can diverge; using the slice length as ground truth (with a fallback) is the right fix.
Regression test present: The new test case faithfully reproduces the exact bug scenario. The existing fallback test (metrics.ErrorCount: 1, empty errors) is also preserved and still passes.
Minor gap: One divergence sub-case is untested — when both len(errors) > 0 and metrics.ErrorCount > 0 but they differ. See inline comment.

Positive Highlights

✅ PR description includes a clear before/after snippet — easy to understand at a glance
✅ The fallback preserves existing behaviour for runs where log details were unavailable
✅ Test assertions use both Contains and NotContains to pin the exact symptom, not just "something changed"
✅ Minimal diff — only the line that needed changing, plus the regression test

Verdict

Approving. The one missing test case (see inline) is a nice-to-have rather than a blocker.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 3.9M

github-actions · 2026-05-16T14:46:57Z

+				assert.NotContains(t, finding.Description, "0 error(s)",
+					"Description must not show 0 errors when the errors slice is non-empty")
+			},
+		},


[/tdd] The new test covers the key scenario (ErrorCount == 0, non-empty errors), but there is a third case not yet covered: len(errors) > 0 and metrics.ErrorCount > 0 with different values (e.g. 3 actual errors vs a stale ErrorCount: 5). The fix always prefers len(errors), but without a test the priority rule is invisible to future maintainers.

Consider adding:

{
	name:    "len(errors) beats non-zero metrics.ErrorCount when both present",
	metrics: MetricsData{ErrorCount: 5},
	errors: []ErrorInfo{
		{Type: "step_failure", Message: "err 1"},
		{Type: "step_failure", Message: "err 2"},
		{Type: "step_failure", Message: "err 3"},
	},
	// assert description contains "3 error(s)" and not "5 error(s)"
}

This locks down the priority contract for the nonzero-vs-different-nonzero divergence case too.

github-actions · 2026-05-16T14:47:46Z

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

✅ Excellent test quality

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (25 test lines / 9 production lines = 2.8×) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| `TestGenerateFindings`, row: "failed workflow uses actual error count not stale metrics.ErrorCount" | `pkg/cli/audit_report_test.go:185` | ✅ Design | None; excellent regression coverage |

Analysis

This PR adds a single new table row to the existing TestGenerateFindings table-driven test. The new case is a direct regression test for the bug being fixed:

Sets metrics.ErrorCount = 0 (stale/wrong) but populates the errors slice with 2 real entries
Asserts that the finding description contains "2 error(s)" (from len(errors))
Asserts it does not contain "0 error(s)" (the old broken output)

This perfectly covers the behavioral contract: "when errors are present, the error count in the findings description must reflect the actual number of errors, not a potentially stale metrics field."

Checklist:

✅ //go:build !integration build tag present on line 1
✅ No mock libraries used
✅ All assertions include descriptive messages
✅ Table-driven test pattern (preferred codebase style)
✅ Both positive (Contains "2 error(s)") and negative (NotContains "0 error(s)") assertions

Score note: The 10-point inflation penalty applies mechanically (25 test lines added vs. 9 production lines = 2.8× ratio), but this is expected and justified — regression tests for small fixes routinely require more setup code than the fix itself.

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). The added test is a high-quality regression test that directly enforces the behavioral contract fixed by this PR.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

Language Support

Tests analyzed:

🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
🟨 JavaScript (*.test.cjs, *.test.js): 0 tests

References: §25964660443

🧪 Test quality analysis by Test Quality Sentinel · ● 8M · ◷

Initial plan

0af724b

Copilot AI assigned Copilot and pelikhan May 16, 2026

Copilot started work on behalf of pelikhan May 16, 2026 14:29

Copilot AI linked an issue May 16, 2026 that may be closed by this pull request

[cli-tools-test] bug: audit key_finding incorrectly reports "0 error(s)" for failed workflows with errors #32579

Closed

Copilot AI changed the title ~~[WIP] Fix audit key_finding report for failed workflows~~ fix(audit): key_findings error count uses len(errors) not metrics.ErrorCount May 16, 2026

Copilot finished work on behalf of pelikhan May 16, 2026 14:38

Copilot AI requested a review from pelikhan May 16, 2026 14:38

pelikhan marked this pull request as ready for review May 16, 2026 14:39

Copilot AI review requested due to automatic review settings May 16, 2026 14:39

Copilot started reviewing on behalf of pelikhan May 16, 2026 14:40

Copilot AI reviewed May 16, 2026

pelikhan merged commit 0ad5bdd into main May 16, 2026
42 of 46 checks passed

pelikhan deleted the copilot/cli-tools-test-fix-audit-key-finding branch May 16, 2026 14:41

github-actions Bot mentioned this pull request May 16, 2026

[aw] No-Op Runs #32279

Open

github-actions Bot approved these changes May 16, 2026

pelikhan merged 2 commits into main from copilot/cli-tools-test-fix-audit-key-finding

3 participants
