
fix(audit): key_findings error count uses len(errors) not metrics.ErrorCount #32633

Merged
pelikhan merged 2 commits into main from copilot/cli-tools-test-fix-audit-key-finding
May 16, 2026

Copilot AI commented May 16, 2026

key_findings for failed workflows always reported "failed with 0 error(s)" when metrics.ErrorCount was 0, even when the errors slice was populated — the two sources were computed independently and could diverge.

Changes

  • audit_report_analysis.go: In generateFindings, derive the error count from len(errors) (the ground-truth slice included in audit output) instead of metrics.ErrorCount. Falls back to metrics.ErrorCount only when errors is empty (preserving existing behavior for runs where log details were unavailable but a summary count exists).
// Before
desc = fmt.Sprintf("Workflow '%s' failed with %d error(s)", run.WorkflowName, metrics.ErrorCount)

// After
errorCount := len(errors)
if errorCount == 0 {
    errorCount = metrics.ErrorCount
}
desc = fmt.Sprintf("Workflow '%s' failed with %d error(s)", run.WorkflowName, errorCount)
  • audit_report_test.go: Adds a regression test for the exact bug scenario — metrics.ErrorCount == 0 with a non-empty errors slice — asserting the description reflects the actual count.
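The fallback rule described above can be tried in isolation. The following is a minimal, self-contained sketch, not the actual code in audit_report_analysis.go: the helper names `effectiveErrorCount` and `failureDescription` are hypothetical, and `ErrorInfo` and `MetricsData` are cut down to the fields that appear in this PR.

```go
package main

import "fmt"

// ErrorInfo and MetricsData are simplified stand-ins for the real types;
// only the fields mentioned in this PR are modeled here (assumption).
type ErrorInfo struct {
	Type    string
	Message string
}

type MetricsData struct {
	ErrorCount int
}

// effectiveErrorCount (hypothetical helper) prefers the number of collected
// error entries and falls back to the summary counter only when no entries
// were collected.
func effectiveErrorCount(errors []ErrorInfo, metrics MetricsData) int {
	if n := len(errors); n > 0 {
		return n
	}
	return metrics.ErrorCount
}

// failureDescription (hypothetical helper) formats the finding text using
// the same format string as the snippet in the PR description.
func failureDescription(workflowName string, errors []ErrorInfo, metrics MetricsData) string {
	return fmt.Sprintf("Workflow '%s' failed with %d error(s)",
		workflowName, effectiveErrorCount(errors, metrics))
}

func main() {
	errs := []ErrorInfo{
		{Type: "step_failure", Message: "err 1"},
		{Type: "step_failure", Message: "err 2"},
	}

	// Bug scenario: the summary counter is 0, but two errors were collected.
	fmt.Println(failureDescription("ci", errs, MetricsData{ErrorCount: 0}))
	// prints: Workflow 'ci' failed with 2 error(s)

	// Fallback scenario: no error details, but a summary count exists.
	fmt.Println(failureDescription("ci", nil, MetricsData{ErrorCount: 1}))
	// prints: Workflow 'ci' failed with 1 error(s)
}
```

Because `len` of a nil slice is 0, the fallback branch also covers callers that pass no slice at all.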

When a workflow fails with errors in the errors slice, the description
previously used metrics.ErrorCount (which could be 0 even when errors
were present), producing the contradictory "failed with 0 error(s)".

Fix: prefer len(errors) as the ground-truth count; fall back to
metrics.ErrorCount only when no individual error details were collected.

Adds a regression test covering the exact scenario from the bug report
(metrics.ErrorCount==0 but errors slice has 2 entries).

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix audit key_finding report for failed workflows" to "fix(audit): key_findings error count uses len(errors) not metrics.ErrorCount" May 16, 2026
Copilot AI requested a review from pelikhan May 16, 2026 14:38
@pelikhan pelikhan marked this pull request as ready for review May 16, 2026 14:39
Copilot AI review requested due to automatic review settings May 16, 2026 14:39

Copilot AI left a comment

Pull request overview

Fixes a bug where the audit report's key findings would show "failed with 0 error(s)" when metrics.ErrorCount was stale/zero but the errors slice contained actual error entries. The fix prefers the length of the errors slice as ground truth, falling back to metrics.ErrorCount only when no error details are available.

Changes:

  • Derive error count from len(errors) in generateFindings, falling back to metrics.ErrorCount when empty.
  • Add regression test covering the divergence scenario (zero metrics count, populated errors slice).
Summary per file:

| File | Description |
| --- | --- |
| pkg/cli/audit_report_analysis.go | Uses len(errors) as the primary source for the error count in the failure finding description. |
| pkg/cli/audit_report_test.go | Adds a regression test asserting the description reflects the actual errors slice size. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@pelikhan pelikhan merged commit 0ad5bdd into main May 16, 2026
42 of 46 checks passed
@pelikhan pelikhan deleted the copilot/cli-tools-test-fix-audit-key-finding branch May 16, 2026 14:41
@github-actions github-actions Bot mentioned this pull request May 16, 2026

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnose and /tdd — this is a bug fix with a regression test, the classic case for both skills.

Key Themes

  • Root cause addressed correctly: metrics.ErrorCount and len(errors) are computed independently and can diverge; using the slice length as ground truth (with a fallback) is the right fix.
  • Regression test present: The new test case faithfully reproduces the exact bug scenario. The existing fallback test (metrics.ErrorCount: 1, empty errors) is also preserved and still passes.
  • Minor gap: One divergence sub-case is untested — when both len(errors) > 0 and metrics.ErrorCount > 0 but they differ. See inline comment.

Positive Highlights

  • ✅ PR description includes a clear before/after snippet — easy to understand at a glance
  • ✅ The fallback preserves existing behaviour for runs where log details were unavailable
  • ✅ Test assertions use both Contains and NotContains to pin the exact symptom, not just "something changed"
  • ✅ Minimal diff — only the line that needed changing, plus the regression test

Verdict

Approving. The one missing test case (see inline) is a nice-to-have rather than a blocker.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 3.9M

assert.NotContains(t, finding.Description, "0 error(s)",
"Description must not show 0 errors when the errors slice is non-empty")
},
},

[/tdd] The new test covers the key scenario (ErrorCount == 0, non-empty errors), but there is a third case not yet covered: len(errors) > 0 and metrics.ErrorCount > 0 with different values (e.g. 3 actual errors vs a stale ErrorCount: 5). The fix always prefers len(errors), but without a test the priority rule is invisible to future maintainers.

Consider adding:

{
    name: "len(errors) beats non-zero metrics.ErrorCount when both present",
    metrics: MetricsData{ErrorCount: 5},
    errors: []ErrorInfo{
        {Type: "step_failure", Message: "err 1"},
        {Type: "step_failure", Message: "err 2"},
        {Type: "step_failure", Message: "err 3"},
    },
    // assert description contains "3 error(s)" and not "5 error(s)"
}

This locks down the priority contract for the nonzero-vs-different-nonzero divergence case too.
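To make the priority contract concrete, the three cases discussed in this review can be laid out as one table and checked together. This is a sketch under assumptions rather than the project's test: `describeFailure`, `priorityCase`, `priorityCases`, and `checkCase` are hypothetical names, the types are simplified stand-ins, and plain `strings.Contains` checks replace the assertion library used in the real test file.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the real types (assumption).
type ErrorInfo struct{ Type, Message string }
type MetricsData struct{ ErrorCount int }

// describeFailure mirrors the "After" logic from the PR description.
func describeFailure(name string, errors []ErrorInfo, metrics MetricsData) string {
	errorCount := len(errors)
	if errorCount == 0 {
		errorCount = metrics.ErrorCount
	}
	return fmt.Sprintf("Workflow '%s' failed with %d error(s)", name, errorCount)
}

// priorityCase is a hypothetical table row: want must appear in the
// description; notWant, when set, must not appear.
type priorityCase struct {
	name    string
	metrics MetricsData
	errors  []ErrorInfo
	want    string
	notWant string
}

func priorityCases() []priorityCase {
	step := func(msg string) ErrorInfo { return ErrorInfo{Type: "step_failure", Message: msg} }
	return []priorityCase{
		{
			name:    "zero metrics count, populated errors slice",
			metrics: MetricsData{ErrorCount: 0},
			errors:  []ErrorInfo{step("err 1"), step("err 2")},
			want:    "2 error(s)",
			notWant: "0 error(s)",
		},
		{
			name:    "empty errors slice falls back to metrics.ErrorCount",
			metrics: MetricsData{ErrorCount: 1},
			want:    "1 error(s)",
		},
		{
			name:    "len(errors) beats non-zero metrics.ErrorCount when both present",
			metrics: MetricsData{ErrorCount: 5},
			errors:  []ErrorInfo{step("err 1"), step("err 2"), step("err 3")},
			want:    "3 error(s)",
			notWant: "5 error(s)",
		},
	}
}

// checkCase returns nil when the description satisfies the row's expectations.
func checkCase(c priorityCase) error {
	desc := describeFailure("ci", c.errors, c.metrics)
	if !strings.Contains(desc, c.want) {
		return fmt.Errorf("%s: %q does not contain %q", c.name, desc, c.want)
	}
	if c.notWant != "" && strings.Contains(desc, c.notWant) {
		return fmt.Errorf("%s: %q must not contain %q", c.name, desc, c.notWant)
	}
	return nil
}

func main() {
	for _, c := range priorityCases() {
		if err := checkCase(c); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", c.name)
	}
}
```

In the real table-driven test, the same rows would presumably be expressed with the existing `Contains` and `NotContains` assertions shown in the diff context above.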

@github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

Excellent test quality

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (25 test lines / 9 production lines = 2.8×) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| TestGenerateFindings, row: "failed workflow uses actual error count not stale metrics.ErrorCount" | pkg/cli/audit_report_test.go:185 | ✅ Design | None; excellent regression coverage |

Analysis

This PR adds a single new table row to the existing TestGenerateFindings table-driven test. The new case is a direct regression test for the bug being fixed:

  • Sets metrics.ErrorCount = 0 (stale/wrong) but populates the errors slice with 2 real entries
  • Asserts that the finding description contains "2 error(s)" (from len(errors))
  • Asserts it does not contain "0 error(s)" (the old broken output)

This perfectly covers the behavioral contract: "when errors are present, the error count in the findings description must reflect the actual number of errors, not a potentially stale metrics field."

Checklist:

  • ✅ //go:build !integration build tag present on line 1
  • ✅ No mock libraries used
  • ✅ All assertions include descriptive messages
  • ✅ Table-driven test pattern (preferred codebase style)
  • ✅ Both positive (Contains "2 error(s)") and negative (NotContains "0 error(s)") assertions
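One detail worth keeping in mind if this table grows: a plain substring check for "0 error(s)" also matches inside "10 error(s)". That does not affect the two-entry case in this PR, but a row with ten or twenty errors would trip the negative assertion. The sketch below shows one possible way to anchor the check on the preceding word; `reportsCount` is a hypothetical helper, not something in the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// reportsCount (hypothetical helper) reports whether desc states exactly n
// errors. Anchoring on the preceding "with " keeps "0 error(s)" from
// matching inside "10 error(s)".
func reportsCount(desc string, n int) bool {
	return strings.Contains(desc, fmt.Sprintf("with %d error(s)", n))
}

func main() {
	desc := "Workflow 'ci' failed with 10 error(s)"

	// The unanchored substring check matches the tail of "10 error(s)".
	fmt.Println(strings.Contains(desc, "0 error(s)")) // prints: true

	// The anchored check distinguishes 0 from 10.
	fmt.Println(reportsCount(desc, 0))  // prints: false
	fmt.Println(reportsCount(desc, 10)) // prints: true
}
```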

Score note: The 10-point inflation penalty applies mechanically (25 test lines added vs. 9 production lines = 2.8× ratio), but this is expected and justified — regression tests for small fixes routinely require more setup code than the fix itself.


Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The added test is a high-quality regression test that directly enforces the behavioral contract fixed by this PR.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 1 test — unit (//go:build !integration)
  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests

References: §25964660443

🧪 Test quality analysis by Test Quality Sentinel · ● 8M

Development

Successfully merging this pull request may close these issues.

[cli-tools-test] bug: audit key_finding incorrectly reports "0 error(s)" for failed workflows with errors

3 participants