feat(audit): upgrade `audit diff` with MCP tool invocations, token usage, and duration diffs by Copilot · Pull Request #23118 · github/gh-aw

Copilot · 2026-03-26T13:00:04Z

audit diff only compared firewall domain behavior. This upgrades it to also surface MCP tool invocation changes and run-level metrics when cached run summaries are available.

New diff sections

MCP tool invocations — new/removed/changed tools per server, with call count deltas and anomaly flags (new tools with errors, increased error rates)
Run metrics — token usage, duration, and turns comparison with percentage/absolute deltas

Data model

Introduces AuditDiff as the new top-level output, wrapping existing FirewallDiff plus two new sections:

type AuditDiff struct {
    Run1ID         int64           `json:"run1_id"`
    Run2ID         int64           `json:"run2_id"`
    FirewallDiff   *FirewallDiff   `json:"firewall_diff,omitempty"`
    MCPToolsDiff   *MCPToolsDiff   `json:"mcp_tools_diff,omitempty"`
    RunMetricsDiff *RunMetricsDiff `json:"run_metrics_diff,omitempty"`
}

MCPToolsDiff tracks new/removed/changed (server, tool) pairs with call counts and error counts. RunMetricsDiff is omitted when all values are zero (i.e., no cached summary available).

Loading strategy

loadFirewallAnalysisForRun is replaced by loadRunSummaryForDiff, which returns the full RunSummary from cache (giving MCP + metrics data) and falls back to a firewall-only partial summary on cache miss — preserving backward compatibility.

Output formats

All three output formats (pretty, markdown, JSON) render the new sections as sub-sections after the existing firewall diff. The pretty format aggregates a cross-section summary line.

…usage, and time diffs - Add AuditDiff top-level struct combining FirewallDiff, MCPToolsDiff, and RunMetricsDiff - Add MCPToolDiffEntry, MCPToolsDiff, MCPToolsDiffSummary types for MCP tool comparison - Add RunMetricsDiff type for token usage, duration, and turns comparison - Add computeAuditDiff, computeMCPToolsDiff, computeRunMetricsDiff functions - Add formatCountChange helper for absolute delta formatting - Replace loadFirewallAnalysisForRun with loadRunSummaryForDiff (returns full RunSummary) - Update render functions (JSON, markdown, pretty) for all new diff sections - Add isEmptyMCPToolsDiff and isEmptyAuditDiff helpers - Add 17 new tests covering all new compute functions and edge cases - Update command help text to describe new diff capabilities Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d24c8923-28dd-4082-98b1-435af2aa7d2a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d24c8923-28dd-4082-98b1-435af2aa7d2a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Enhances audit diff to compare additional run behavior beyond firewall domains by introducing a unified AuditDiff output that can include MCP tool invocation changes and run-level metrics when cached summaries are available.

Changes:

Introduces AuditDiff and computes MCP tool usage diffs and run metrics diffs alongside the existing firewall diff.
Updates CLI command flow to load full cached RunSummary when possible and fall back to firewall-only analysis on cache miss.
Extends JSON/markdown/pretty renderers to display MCP tool changes and run metrics sections.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
pkg/cli/audit_diff.go	Adds new diff data models and computation logic for MCP tools and run metrics; replaces firewall-only loader with `loadRunSummaryForDiff`.
pkg/cli/audit_diff_command.go	Updates command help text and switches runtime behavior to compute/render the new `AuditDiff`.
pkg/cli/audit_diff_render.go	Reworks renderers to output a full audit diff across firewall, MCP tools, and run metrics.
pkg/cli/audit_diff_test.go	Adds unit tests covering MCP tools diffing, run metrics diffs, audit diff composition, and JSON serialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T13:06:18Z

pkg/cli/audit_diff.go

+			if s2.ErrorCount > s1.ErrorCount {
+				entry.IsAnomaly = true
+				entry.AnomalyNote = "error count increased"
+				anomalyCount++
+			}


The PR description calls out anomaly flags for "increased error rates", but the current anomaly detection only checks ErrorCount increasing (absolute count). This can miss cases where call volume increases and the error rate increases even if error count is flat, or where error count increases but rate improves. Consider computing and comparing error rate (ErrorCount/CallCount) when both call counts are >0 and flagging anomalies based on rate deltas (in addition to absolute error count if desired).

Copilot · 2026-03-26T13:06:18Z

pkg/cli/audit_diff_render.go

+		fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Changes: "+strings.Join(summaryParts, " | ")))
+	}
+	if anomalyCount > 0 {
+		fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("⚠️  %d anomalies detected", anomalyCount)))


console.FormatWarningMessage already prefixes the message with a warning icon (see pkg/console/console.go:156-159). Including "⚠️" in the message results in a duplicated warning marker in the pretty output. Consider removing the emoji from the formatted string and just pass the plain text (e.g., " anomalies detected").

Suggested change

fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("⚠️ %d anomalies detected", anomalyCount)))

fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%d anomalies detected", anomalyCount)))

Copilot · 2026-03-26T13:06:18Z

pkg/cli/audit_diff.go

+type AuditDiff struct {
+	Run1ID         int64           `json:"run1_id"`
+	Run2ID         int64           `json:"run2_id"`
+	FirewallDiff   *FirewallDiff   `json:"firewall_diff,omitempty"`


AuditDiff.FirewallDiff is tagged with omitempty, but computeAuditDiff always assigns a non-nil FirewallDiff (even when there are no firewall changes). This means firewall_diff will always be serialized, defeating omitempty and making the JSON shape less consistent with the intent of optional sections. Either set FirewallDiff to nil when isEmptyDiff(...) is true, or remove omitempty if you want it always present.

Suggested change

FirewallDiff *FirewallDiff `json:"firewall_diff,omitempty"`

FirewallDiff *FirewallDiff `json:"firewall_diff"`

Document the `gh aw audit diff <run-id-1> <run-id-2>` command, which compares firewall behavior, MCP tool invocations, and run metrics between two workflow runs. Added as part of coverage for #23118 (MCP tool/token/ duration diffs) and the original audit diff addition. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI and others added 2 commits March 26, 2026 07:14

refactor: extract mcpToolKey helper for MCP tool map key construction

1acd46b

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d24c8923-28dd-4082-98b1-435af2aa7d2a Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Mar 26, 2026

Copilot created this pull request from a session on behalf of pelikhan March 26, 2026 13:00 View session

pelikhan marked this pull request as ready for review March 26, 2026 13:01

Copilot AI review requested due to automatic review settings March 26, 2026 13:01

Copilot started reviewing on behalf of pelikhan March 26, 2026 13:02 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

pelikhan merged commit ff88a66 into main Mar 26, 2026
146 checks passed

pelikhan deleted the copilot/upgrade-audit-diff-subcommand branch March 26, 2026 13:08

github-actions bot mentioned this pull request Mar 26, 2026

Smoke Test: Codex - 23595989086 #23122

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(audit): upgrade `audit diff` with MCP tool invocations, token usage, and duration diffs#23118

feat(audit): upgrade `audit diff` with MCP tool invocations, token usage, and duration diffs#23118
pelikhan merged 2 commits intomainfrom
copilot/upgrade-audit-diff-subcommand

Copilot AI commented Mar 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("⚠️ %d anomalies detected", anomalyCount)))
	fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("%d anomalies detected", anomalyCount)))

	FirewallDiff *FirewallDiff `json:"firewall_diff,omitempty"`
	FirewallDiff *FirewallDiff `json:"firewall_diff"`

Conversation

Copilot AI commented Mar 26, 2026

New diff sections

Data model

Loading strategy

Output formats

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants