
feat(audit): upgrade audit diff with MCP tool invocations, token usage, and duration diffs #23118

Merged
pelikhan merged 2 commits into main from copilot/upgrade-audit-diff-subcommand
Mar 26, 2026


Copilot AI commented Mar 26, 2026

Previously, audit diff only compared firewall domain behavior. This PR upgrades it to also surface MCP tool invocation changes and run-level metrics when cached run summaries are available.

New diff sections

  • MCP tool invocations — new/removed/changed tools per server, with call count deltas and anomaly flags (new tools with errors, increased error rates)
  • Run metrics — token usage, duration, and turns comparison with percentage/absolute deltas
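A minimal Go sketch of the per-tool comparison described above, keyed by (server, tool) pair. The type and field names here are hypothetical and are not the PR's actual MCPToolDiffEntry or MCPToolsDiff definitions; the two anomaly rules shown are the ones named in this PR (a new tool that reports errors, and an error count that increased).

```go
package main

import "sort"

// toolKey identifies one MCP tool on one server (hypothetical sketch type).
type toolKey struct {
	Server string
	Tool   string
}

// toolStats holds per-run invocation counts for one tool (hypothetical).
type toolStats struct {
	CallCount  int
	ErrorCount int
}

// toolDiffEntry describes how one (server, tool) pair changed between runs.
type toolDiffEntry struct {
	Key         toolKey
	Status      string // "new", "removed", or "changed"
	Run1Calls   int
	Run2Calls   int
	IsAnomaly   bool
	AnomalyNote string
}

// diffMCPTools compares per-tool stats from run 1 and run 2. Tools whose
// call and error counts are identical in both runs are not reported.
func diffMCPTools(run1, run2 map[toolKey]toolStats) []toolDiffEntry {
	var entries []toolDiffEntry
	for k, s2 := range run2 {
		s1, seen := run1[k]
		switch {
		case !seen:
			e := toolDiffEntry{Key: k, Status: "new", Run2Calls: s2.CallCount}
			if s2.ErrorCount > 0 {
				e.IsAnomaly, e.AnomalyNote = true, "new tool with errors"
			}
			entries = append(entries, e)
		case s1 != s2:
			e := toolDiffEntry{Key: k, Status: "changed", Run1Calls: s1.CallCount, Run2Calls: s2.CallCount}
			if s2.ErrorCount > s1.ErrorCount {
				e.IsAnomaly, e.AnomalyNote = true, "error count increased"
			}
			entries = append(entries, e)
		}
	}
	for k, s1 := range run1 {
		if _, ok := run2[k]; !ok {
			entries = append(entries, toolDiffEntry{Key: k, Status: "removed", Run1Calls: s1.CallCount})
		}
	}
	// Sort by server, then tool, because Go map iteration order is random.
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Key.Server != entries[j].Key.Server {
			return entries[i].Key.Server < entries[j].Key.Server
		}
		return entries[i].Key.Tool < entries[j].Key.Tool
	})
	return entries
}
```

Sorting the result is a deliberate choice in this sketch: without it, the same two runs could render their tool changes in a different order each time.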

Data model

Introduces AuditDiff as the new top-level output, which wraps the existing FirewallDiff and adds two new sections:

type AuditDiff struct {
    Run1ID         int64           `json:"run1_id"`
    Run2ID         int64           `json:"run2_id"`
    FirewallDiff   *FirewallDiff   `json:"firewall_diff,omitempty"`
    MCPToolsDiff   *MCPToolsDiff   `json:"mcp_tools_diff,omitempty"`
    RunMetricsDiff *RunMetricsDiff `json:"run_metrics_diff,omitempty"`
}

MCPToolsDiff tracks new/removed/changed (server, tool) pairs with call counts and error counts. RunMetricsDiff is omitted when all values are zero (i.e., no cached summary available).

Loading strategy

loadFirewallAnalysisForRun is replaced by loadRunSummaryForDiff, which returns the full RunSummary from cache (giving MCP + metrics data) and falls back to a firewall-only partial summary on cache miss — preserving backward compatibility.
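A sketch of the cache-first strategy with a firewall-only fallback. The runSummary and firewallAnalysis types and the injected loader functions are hypothetical simplifications; the real loadRunSummaryForDiff signature is not shown in this description.

```go
package main

// firewallAnalysis and runSummary are hypothetical, trimmed-down stand-ins
// that keep only the fields needed to show the fallback.
type firewallAnalysis struct {
	AllowedDomains []string
	BlockedDomains []string
}

type runSummary struct {
	RunID      int64
	Firewall   *firewallAnalysis
	TokenUsage int // stays zero in a partial, firewall-only summary
}

// loadSummaryWithFallback returns the full cached summary when present;
// otherwise it builds a partial summary holding only firewall data.
// Both loaders are passed in so the sketch stays self-contained.
func loadSummaryWithFallback(
	runID int64,
	fromCache func(int64) (*runSummary, bool),
	loadFirewall func(int64) (*firewallAnalysis, error),
) (*runSummary, error) {
	if s, ok := fromCache(runID); ok {
		return s, nil
	}
	fw, err := loadFirewall(runID)
	if err != nil {
		return nil, err
	}
	return &runSummary{RunID: runID, Firewall: fw}, nil
}
```

Because the fallback leaves every metric at zero, a run compared after a cache miss naturally yields no run-metrics section, matching the omission rule described above.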

Output formats

All three output formats (pretty, markdown, JSON) render the new sections as sub-sections after the existing firewall diff. The pretty format also prints a single summary line that aggregates changes across all sections.
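The cross-section summary line can be sketched as below. The diffCounts type and the labels are assumptions; only the "Changes: " prefix and the " | " separator come from the renderer line quoted in the review comments.

```go
package main

import (
	"fmt"
	"strings"
)

// diffCounts is a hypothetical tally of changes in each diff section.
type diffCounts struct {
	NewDomains     int
	RemovedDomains int
	NewTools       int
	RemovedTools   int
	ChangedTools   int
}

// summaryLine builds one cross-section line such as
// "Changes: new domains: 2 | changed tools: 1", skipping zero counts.
// It returns "" when there is nothing to report.
func summaryLine(c diffCounts) string {
	var parts []string
	add := func(label string, n int) {
		if n > 0 {
			parts = append(parts, fmt.Sprintf("%s: %d", label, n))
		}
	}
	add("new domains", c.NewDomains)
	add("removed domains", c.RemovedDomains)
	add("new tools", c.NewTools)
	add("removed tools", c.RemovedTools)
	add("changed tools", c.ChangedTools)
	if len(parts) == 0 {
		return ""
	}
	return "Changes: " + strings.Join(parts, " | ")
}
```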

Copilot AI and others added 2 commits March 26, 2026 07:14
…usage, and time diffs

- Add AuditDiff top-level struct combining FirewallDiff, MCPToolsDiff, and RunMetricsDiff
- Add MCPToolDiffEntry, MCPToolsDiff, MCPToolsDiffSummary types for MCP tool comparison
- Add RunMetricsDiff type for token usage, duration, and turns comparison
- Add computeAuditDiff, computeMCPToolsDiff, computeRunMetricsDiff functions
- Add formatCountChange helper for absolute delta formatting
- Replace loadFirewallAnalysisForRun with loadRunSummaryForDiff (returns full RunSummary)
- Update render functions (JSON, markdown, pretty) for all new diff sections
- Add isEmptyMCPToolsDiff and isEmptyAuditDiff helpers
- Add 17 new tests covering all new compute functions and edge cases
- Update command help text to describe new diff capabilities

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d24c8923-28dd-4082-98b1-435af2aa7d2a

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review March 26, 2026 13:01
Copilot AI review requested due to automatic review settings March 26, 2026 13:01

Copilot AI left a comment


Pull request overview

Enhances audit diff to compare additional run behavior beyond firewall domains by introducing a unified AuditDiff output that can include MCP tool invocation changes and run-level metrics when cached summaries are available.

Changes:

  • Introduces AuditDiff and computes MCP tool usage diffs and run metrics diffs alongside the existing firewall diff.
  • Updates CLI command flow to load full cached RunSummary when possible and fall back to firewall-only analysis on cache miss.
  • Extends JSON/markdown/pretty renderers to display MCP tool changes and run metrics sections.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

  • pkg/cli/audit_diff.go: Adds new diff data models and computation logic for MCP tools and run metrics; replaces the firewall-only loader with loadRunSummaryForDiff.
  • pkg/cli/audit_diff_command.go: Updates command help text and switches runtime behavior to compute and render the new AuditDiff.
  • pkg/cli/audit_diff_render.go: Reworks renderers to output a full audit diff across firewall, MCP tools, and run metrics.
  • pkg/cli/audit_diff_test.go: Adds unit tests covering MCP tools diffing, run metrics diffs, audit diff composition, and JSON serialization.


Comment on lines +392 to +396
if s2.ErrorCount > s1.ErrorCount {
    entry.IsAnomaly = true
    entry.AnomalyNote = "error count increased"
    anomalyCount++
}

Copilot AI Mar 26, 2026


The PR description calls out anomaly flags for "increased error rates", but the current anomaly detection only checks whether ErrorCount increased (an absolute count). This misses cases where the error rate rises while the error count stays flat (because call volume dropped), and it flags cases where the error count rises but the rate improves (because call volume grew faster). Consider computing and comparing the error rate (ErrorCount/CallCount) when both call counts are > 0, and flagging anomalies based on rate deltas (in addition to the absolute error count, if desired).
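A minimal sketch of the rate-based check this comment proposes. The helper names and the absolute rate threshold are assumptions, not part of the merged PR.

```go
package main

// errorRate returns errCount/calls, or 0 when there were no calls.
func errorRate(errCount, calls int) float64 {
	if calls == 0 {
		return 0
	}
	return float64(errCount) / float64(calls)
}

// errorRateIncreased reports whether the error rate grew by more than
// threshold (an absolute difference in rate, e.g. 0.05 for 5 points).
// It compares only when both runs made at least one call.
func errorRateIncreased(err1, calls1, err2, calls2 int, threshold float64) bool {
	if calls1 == 0 || calls2 == 0 {
		return false
	}
	return errorRate(err2, calls2)-errorRate(err1, calls1) > threshold
}
```

With 2 errors in 10 calls followed by 2 errors in 4 calls, the count is flat but the rate rises from 0.2 to 0.5, so this check flags it; 3 errors in 40 calls after 2 in 10 raises the count but lowers the rate, so it does not.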

Copilot uses AI. Check for mistakes.
fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Changes: "+strings.Join(summaryParts, " | ")))
}
if anomalyCount > 0 {
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("⚠️ %d anomalies detected", anomalyCount)))

Copilot AI Mar 26, 2026


console.FormatWarningMessage already prefixes the message with a warning icon (see pkg/console/console.go:156-159). Including "⚠️" in the message results in a duplicated warning marker in the pretty output. Consider removing the emoji from the formatted string and passing only the plain text (e.g., "%d anomalies detected").

Suggested change
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("⚠️ %d anomalies detected", anomalyCount)))
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%d anomalies detected", anomalyCount)))

type AuditDiff struct {
    Run1ID       int64         `json:"run1_id"`
    Run2ID       int64         `json:"run2_id"`
    FirewallDiff *FirewallDiff `json:"firewall_diff,omitempty"`

Copilot AI Mar 26, 2026


AuditDiff.FirewallDiff is tagged with omitempty, but computeAuditDiff always assigns a non-nil FirewallDiff (even when there are no firewall changes). This means firewall_diff will always be serialized, defeating omitempty and making the JSON shape less consistent with the intent of optional sections. Either set FirewallDiff to nil when isEmptyDiff(...) is true, or remove omitempty if you want it always present.

Suggested change
FirewallDiff *FirewallDiff `json:"firewall_diff,omitempty"`
FirewallDiff *FirewallDiff `json:"firewall_diff"`
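A minimal sketch of the first option (setting the pointer to nil when the section is empty), using hypothetical trimmed-down types rather than the PR's FirewallDiff. The len check stands in for isEmptyDiff(...), whose exact signature isn't shown here. It relies on encoding/json omitting nil pointers under omitempty while still serializing a pointer to an empty struct as {}.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// firewallDiffSketch is a hypothetical, trimmed-down firewall diff.
type firewallDiffSketch struct {
	NewDomains []string `json:"new_domains,omitempty"`
}

type auditDiffSketch struct {
	Run1ID       int64               `json:"run1_id"`
	Run2ID       int64               `json:"run2_id"`
	FirewallDiff *firewallDiffSketch `json:"firewall_diff,omitempty"`
}

// buildAuditDiff nils out an empty firewall section so omitempty drops the key.
func buildAuditDiff(run1, run2 int64, fw *firewallDiffSketch) auditDiffSketch {
	if fw != nil && len(fw.NewDomains) == 0 {
		fw = nil
	}
	return auditDiffSketch{Run1ID: run1, Run2ID: run2, FirewallDiff: fw}
}

// toJSON marshals v, returning the error text instead of panicking.
func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return fmt.Sprintf("error: %v", err)
	}
	return string(b)
}
```

Without the nil assignment, the same empty section serializes as "firewall_diff":{}, which is the behavior the comment describes.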

@pelikhan pelikhan merged commit ff88a66 into main Mar 26, 2026
146 checks passed
@pelikhan pelikhan deleted the copilot/upgrade-audit-diff-subcommand branch March 26, 2026 13:08
github-actions bot added a commit that referenced this pull request Mar 26, 2026
Document the `gh aw audit diff <run-id-1> <run-id-2>` command, which
compares firewall behavior, MCP tool invocations, and run metrics between
two workflow runs. Added as part of coverage for #23118 (MCP tool/token/
duration diffs) and the original audit diff addition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>