fix(audit): key_findings error count uses len(errors) not metrics.ErrorCount#32633
Conversation
When a workflow fails with errors in the errors slice, the description previously used metrics.ErrorCount (which could be 0 even when errors were present), producing the contradictory "failed with 0 error(s)". Fix: prefer len(errors) as the ground-truth count; fall back to metrics.ErrorCount only when no individual error details were collected. Adds a regression test covering the exact scenario from the bug report (metrics.ErrorCount==0 but errors slice has 2 entries). Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes a bug where the audit report's key findings would show "failed with 0 error(s)" when metrics.ErrorCount was stale/zero but the errors slice contained actual error entries. The fix prefers the length of the errors slice as ground truth, falling back to metrics.ErrorCount only when no error details are available.
Changes:
- Derive error count from
len(errors)ingenerateFindings, falling back tometrics.ErrorCountwhen empty. - Add regression test covering the divergence scenario (zero metrics count, populated errors slice).
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/audit_report_analysis.go | Uses len(errors) as the primary source for the error count in the failure finding description. |
| pkg/cli/audit_report_test.go | Adds a regression test asserting the description reflects the actual errors slice size. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 0
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — this is a bug fix with a regression test, the classic case for both skills.
Key Themes
- Root cause addressed correctly:
metrics.ErrorCountandlen(errors)are computed independently and can diverge; using the slice length as ground truth (with a fallback) is the right fix. - Regression test present: The new test case faithfully reproduces the exact bug scenario. The existing fallback test (
metrics.ErrorCount: 1, empty errors) is also preserved and still passes. - Minor gap: One divergence sub-case is untested — when both
len(errors) > 0andmetrics.ErrorCount > 0but they differ. See inline comment.
Positive Highlights
- ✅ PR description includes a clear before/after snippet — easy to understand at a glance
- ✅ The fallback preserves existing behaviour for runs where log details were unavailable
- ✅ Test assertions use both
ContainsandNotContainsto pin the exact symptom, not just "something changed" - ✅ Minimal diff — only the line that needed changing, plus the regression test
Verdict
Approving. The one missing test case (see inline) is a nice-to-have rather than a blocker.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 3.9M
| assert.NotContains(t, finding.Description, "0 error(s)", | ||
| "Description must not show 0 errors when the errors slice is non-empty") | ||
| }, | ||
| }, |
There was a problem hiding this comment.
[/tdd] The new test covers the key scenario (ErrorCount == 0, non-empty errors), but there is a third case not yet covered: len(errors) > 0 and metrics.ErrorCount > 0 with different values (e.g. 3 actual errors vs a stale ErrorCount: 5). The fix always prefers len(errors), but without a test the priority rule is invisible to future maintainers.
Consider adding:
{
name: "len(errors) beats non-zero metrics.ErrorCount when both present",
metrics: MetricsData{ErrorCount: 5},
errors: []ErrorInfo{
{Type: "step_failure", Message: "err 1"},
{Type: "step_failure", Message: "err 2"},
{Type: "step_failure", Message: "err 3"},
},
// assert description contains "3 error(s)" and not "5 error(s)"
}This locks down the priority contract for the nonzero-vs-different-nonzero divergence case too.
🧪 Test Quality Sentinel ReportTest Quality Score: 90/100✅ Excellent test quality
Test Classification Details
AnalysisThis PR adds a single new table row to the existing
This perfectly covers the behavioral contract: "when errors are present, the error count in the findings description must reflect the actual number of errors, not a potentially stale metrics field." Checklist:
Score note: The 10-point inflation penalty applies mechanically (25 test lines added vs. 9 production lines = 2.8× ratio), but this is expected and justified — regression tests for small fixes routinely require more setup code than the fix itself. Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. Language SupportTests analyzed:
References: §25964660443
|
key_findingsfor failed workflows always reported"failed with 0 error(s)"whenmetrics.ErrorCountwas 0, even when theerrorsslice was populated — the two sources were computed independently and could diverge.Changes
audit_report_analysis.go: IngenerateFindings, derive the error count fromlen(errors)(the ground-truth slice included in audit output) instead ofmetrics.ErrorCount. Falls back tometrics.ErrorCountonly whenerrorsis empty (preserving existing behavior for runs where log details were unavailable but a summary count exists).audit_report_test.go: Adds a regression test for the exact bug scenario —metrics.ErrorCount == 0with a non-emptyerrorsslice — asserting the description reflects the actual count.