feat: add tool calls diff and tokens-per-turn to audit diff mode by Copilot · Pull Request #28494 · github/gh-aw

Copilot · 2026-04-25T20:42:04Z

Agents investigating cost regressions in diff mode lacked the granular context needed to pinpoint why token usage changed. This adds precomputed per-turn token efficiency and a full engine-level tool call breakdown — including bash command detail — directly into the diff output.

New data in `RunMetricsDiff`

Tokens per turn — uses effective tokens from the firewall proxy when available, falls back to engine-level count:

Run1TokensPerTurn, Run2TokensPerTurn, TokensPerTurnChange
Shown as a row in the Run Metrics table in both pretty and markdown renderers
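
The fallback and the zero-turn guard can be sketched as follows. This is a minimal illustration, not the PR's code: `pickTokens`, `tokensPerTurn`, and `percentChange` are hypothetical names, and the percentage format is an assumption.

```go
package main

import "fmt"

// pickTokens prefers the effective token count (from the firewall proxy)
// and falls back to the engine-level count when none was recorded.
// Hypothetical helper; treating 0 as "not available" is an assumption.
func pickTokens(effective, engine int) int {
	if effective > 0 {
		return effective
	}
	return engine
}

// tokensPerTurn returns 0 when there are no turns, avoiding division by zero.
func tokensPerTurn(tokens, turns int) float64 {
	if turns <= 0 {
		return 0
	}
	return float64(tokens) / float64(turns)
}

// percentChange formats the relative change between two values. It returns
// an empty string when the baseline is zero, since no ratio is defined.
func percentChange(before, after float64) string {
	if before == 0 {
		return ""
	}
	return fmt.Sprintf("%+.1f%%", (after-before)/before*100)
}

func main() {
	run1 := tokensPerTurn(pickTokens(0, 12000), 6)      // engine-level fallback
	run2 := tokensPerTurn(pickTokens(30000, 28000), 10) // effective tokens win
	fmt.Printf("run1=%.0f run2=%.0f change=%s\n", run1, run2, percentChange(run1, run2))
}
```

Precomputing the ratio in the diff keeps the division and its guard in one place, so an agent reading the diff does not have to redo the arithmetic.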

Tool calls diff (ToolCallsDiff) — sourced from RunSummary.Metrics.ToolCalls (populated by engine log parsers):

AllTools — complete view of every tool type with run1/run2 call counts and change
NewTools / RemovedTools / ChangedTools — bucketed diffs
Per-entry MaxInputSize / MaxOutputSize for size context
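
The bucketing described above could look roughly like this. It is a sketch under stated assumptions: `diffEntry` and `diffToolCalls` are hypothetical simplifications that work on plain call counts, and the status strings follow the ones quoted elsewhere in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// diffEntry is a hypothetical, simplified diff row (not the PR's ToolCallDiffEntry).
type diffEntry struct {
	Name       string
	Run1, Run2 int
	Status     string // "new", "removed", "changed", or "unchanged"
	Change     string // signed delta; set only when Status is "changed"
}

// diffToolCalls compares per-tool call counts from two runs and returns one
// entry per tool name, sorted alphabetically.
func diffToolCalls(run1, run2 map[string]int) []diffEntry {
	names := make(map[string]struct{}, len(run1)+len(run2))
	for name := range run1 {
		names[name] = struct{}{}
	}
	for name := range run2 {
		names[name] = struct{}{}
	}
	sorted := make([]string, 0, len(names))
	for name := range names {
		sorted = append(sorted, name)
	}
	sort.Strings(sorted)

	entries := make([]diffEntry, 0, len(sorted))
	for _, name := range sorted {
		c1, in1 := run1[name]
		c2, in2 := run2[name]
		e := diffEntry{Name: name, Run1: c1, Run2: c2}
		switch {
		case !in1:
			e.Status = "new"
		case !in2:
			e.Status = "removed"
		case c1 != c2:
			e.Status = "changed"
			e.Change = fmt.Sprintf("%+d", c2-c1)
		default:
			e.Status = "unchanged"
		}
		entries = append(entries, e)
	}
	return entries
}

func main() {
	run1 := map[string]int{"bash": 5, "gh": 3, "grep": 2}
	run2 := map[string]int{"bash": 12, "gh": 3, "edit": 4}
	for _, e := range diffToolCalls(run1, run2) {
		fmt.Printf("%-5s %2d -> %2d  %-9s %s\n", e.Name, e.Run1, e.Run2, e.Status, e.Change)
	}
}
```

The bucketed views (new, removed, changed) can then be derived by filtering the full list on `Status`.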

Bash-specific breakdown (BashCommandsDiff):

Aggregates bash, Bash (Claude), and bash_* per-command entries (Codex)
Shows total bash call count delta and a per-command table with call counts + max input/output sizes
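
As an illustration of the name matching, a helper along these lines would cover the three naming schemes listed above. The PR's actual `isBashTool` may use different rules; `totalBashCalls` is a hypothetical name added for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// isBashTool reports whether a tool name denotes bash: the generic "bash",
// the capitalized "Bash" (Claude), or a per-command "bash_*" entry (Codex).
// Sketch only; the real matching rules are an assumption.
func isBashTool(name string) bool {
	return name == "bash" || name == "Bash" || strings.HasPrefix(name, "bash_")
}

// totalBashCalls sums call counts over the bash-like tool names only.
func totalBashCalls(calls map[string]int) int {
	total := 0
	for name, count := range calls {
		if isBashTool(name) {
			total += count
		}
	}
	return total
}

func main() {
	calls := map[string]int{"Bash": 2, "bash_git": 4, "bash_ls": 1, "edit": 3}
	fmt.Println(totalBashCalls(calls))
}
```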

New output sections (pretty + markdown)

#### Tool Call Breakdown
| Tool       | Run #100 | Run #200 | Change |
|------------|----------|----------|--------|
| bash       | 5        | 12       | +7     |
| edit       | 0        | 4        | —      |
| gh         | 3        | 3        | —      |

#### Bash Commands
Total bash calls: Run #100=5, Run #200=12 (+7)
| Command         | Run #100 | Run #200 | Change | Max Input | Max Output |
|-----------------|----------|----------|--------|-----------|------------|
| bash            | 5        | 12       | +7     | 800 / 950 | 300 / 600  |

Implementation notes

Bash tools are collected during the main iteration in computeToolCallsDiff, so computeBashCommandsDiff receives pre-filtered maps (no second traversal)
formatMaxSizeCell() helper extracted to deduplicate size formatting across pretty and markdown renderers
ToolCallInfo exported as a type alias in logs_models.go (alongside existing LogMetrics alias)
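
For illustration, a shared size-formatting helper might look like the sketch below. The signature and the empty-string return for missing sizes are assumptions; only the name `formatMaxSizeCell` and the `run1 / run2` cell shape come from this PR.

```go
package main

import "fmt"

// formatMaxSizeCell renders the max sizes of both runs as "run1 / run2".
// Returning "" when neither run recorded a size is an assumption; a caller
// would then substitute its own placeholder for the empty cell.
func formatMaxSizeCell(run1Size, run2Size int) string {
	if run1Size == 0 && run2Size == 0 {
		return ""
	}
	return fmt.Sprintf("%d / %d", run1Size, run2Size)
}

func main() {
	fmt.Println(formatMaxSizeCell(800, 950))
}
```

Keeping the formatting in one function means the pretty and markdown renderers cannot drift apart on how a size cell is written.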

- Add ToolCallDiffEntry, ToolCallsDiff, ToolCallsDiffSummary, BashCommandsDiff types
- Add tokens-per-turn fields (Run1TokensPerTurn, Run2TokensPerTurn, TokensPerTurnChange) to RunMetricsDiff
- Add ToolCallsDiff field to RunMetricsDiff for engine-level tool call analysis
- Implement computeToolCallsDiff() diffing LogMetrics.ToolCalls between runs
- Implement computeBashCommandsDiff() for bash-specific analysis (handles generic bash/Bash and per-command bash_* entries from Codex)
- Implement isBashTool() helper for bash tool name matching
- Update computeRunMetricsDiff() to compute tokens-per-turn and include tool calls diff
- Add renderToolCallsDiffPrettySection() and renderBashCommandsDiffPrettySection()
- Add renderToolCallsDiffMarkdownSection() and renderBashCommandsDiffMarkdownSection()
- Update Run Metrics table in both renderers to show Tokens/turn row
- Export ToolCallInfo type alias in pkg/cli/logs_models.go
- Add 15 new unit tests covering all new functionality

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/dbe42488-aa10-4336-bfeb-170f75f44adf
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

…raversal in bash diff

- Extract formatMaxSizeCell() helper to remove duplicated max-size formatting in pretty and markdown renderers
- Collect bash tools during main iteration in computeToolCallsDiff() so computeBashCommandsDiff() receives pre-filtered maps, avoiding a second traversal
- Update tests to pass pre-filtered bash tool maps to computeBashCommandsDiff() directly

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/dbe42488-aa10-4336-bfeb-170f75f44adf
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Adds more granular cost-regression context to audit diff by surfacing per-turn token efficiency and a detailed engine-level tool call breakdown (including bash command detail) in both pretty and markdown output.

Changes:

Extend RunMetricsDiff with tokens-per-turn metrics and a new ToolCallsDiff/BashCommandsDiff structure.
Compute tool call diffs (new/removed/changed/all) and bash command aggregation during diff generation.
Render new rows/sections in both markdown and pretty output; add comprehensive unit tests.

Summary per file:

| File | Description |
|------|-------------|
| pkg/cli/logs_models.go | Exposes `ToolCallInfo` as a CLI-level alias to the shared workflow type. |
| pkg/cli/audit_diff.go | Adds tokens-per-turn computation plus tool calls + bash command diff computation and new diff structs. |
| pkg/cli/audit_diff_render.go | Renders tokens/turn row and new tool-call + bash-command breakdown sections (pretty + markdown). |
| pkg/cli/audit_diff_test.go | Adds unit tests for tool-call diffs, bash-command diffs, and tokens-per-turn behavior. |

Copilot's findings

Comments suppressed due to low confidence (1)

pkg/cli/audit_diff.go:547

The early-return "no meaningful data" check ignores engine-level tool call metrics. If both runs have 0 tokens/duration/turns and no token/rate-limit details, but summary*.Metrics.ToolCalls is populated, this function returns nil and the new Tool Call Breakdown / Bash Commands sections can never render. Include tool-call presence in the meaningful-data check (e.g., hasToolCalls := (m1 != nil && len(m1.ToolCalls) > 0) || (m2 != nil && len(m2.ToolCalls) > 0)), or compute computeToolCallsDiff first and consider it in the condition.

	// Skip if there is no meaningful data
	hasTokenDetails := tu1 != nil || tu2 != nil
	hasRateLimitDetails := rl1 != nil || rl2 != nil
	if run1Tokens == 0 && run2Tokens == 0 && run1Duration == 0 && run2Duration == 0 && run1Turns == 0 && run2Turns == 0 && !hasTokenDetails && !hasRateLimitDetails {
		return nil
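
The extra condition suggested in this comment can be sketched in isolation as follows. `metrics` and `toolCall` are hypothetical stand-ins for the real types, and whether the check was adopted in the PR is not stated in this thread.

```go
package main

import "fmt"

// toolCall and metrics are hypothetical stand-ins for the real log metric types.
type toolCall struct {
	Name      string
	CallCount int
}

type metrics struct {
	ToolCalls []toolCall
}

// hasToolCalls reports whether either run recorded at least one tool call,
// so the diff is not skipped when only tool-call data is present.
func hasToolCalls(m1, m2 *metrics) bool {
	return (m1 != nil && len(m1.ToolCalls) > 0) || (m2 != nil && len(m2.ToolCalls) > 0)
}

func main() {
	var m1 *metrics // first run has no metrics at all
	m2 := &metrics{ToolCalls: []toolCall{{Name: "bash", CallCount: 3}}}
	fmt.Println(hasToolCalls(m1, m2))
}
```

The early return would then also require `!hasToolCalls(m1, m2)` before concluding there is no meaningful data.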

Files reviewed: 4/4 changed files
Comments generated: 2

Copilot · 2026-04-25T20:46:14Z

+	if m1 != nil {
+		for _, tc := range m1.ToolCalls {
+			run1Tools[tc.Name] = tc
+		}
+	}
+	if m2 != nil {
+		for _, tc := range m2.ToolCalls {
+			run2Tools[tc.Name] = tc


computeToolCallsDiff builds run1Tools/run2Tools by overwriting entries with the same tool name (run1Tools[tc.Name] = tc). In this codebase, LogMetrics.ToolCalls can contain duplicate tool names because metrics are appended across multiple log files (pkg/cli/logs_metrics.go:208-210), and other consumers aggregate duplicates (e.g., buildToolUsageInfo sums counts). This overwrite will undercount call totals and lose max-size info. When inserting into the map, aggregate: sum CallCount, and take the max of MaxInputSize/MaxOutputSize (and any other relevant fields) across duplicates.

Suggested change

-	if m1 != nil {
-		for _, tc := range m1.ToolCalls {
-			run1Tools[tc.Name] = tc
-		}
-	}
-	if m2 != nil {
-		for _, tc := range m2.ToolCalls {
-			run2Tools[tc.Name] = tc
+	aggregateToolCall := func(tools map[string]ToolCallInfo, tc ToolCallInfo) {
+		if existing, ok := tools[tc.Name]; ok {
+			existing.CallCount += tc.CallCount
+			if tc.MaxInputSize > existing.MaxInputSize {
+				existing.MaxInputSize = tc.MaxInputSize
+			}
+			if tc.MaxOutputSize > existing.MaxOutputSize {
+				existing.MaxOutputSize = tc.MaxOutputSize
+			}
+			tools[tc.Name] = existing
+			return
+		}
+		tools[tc.Name] = tc
+	}
+	if m1 != nil {
+		for _, tc := range m1.ToolCalls {
+			aggregateToolCall(run1Tools, tc)
+		}
+	}
+	if m2 != nil {
+		for _, tc := range m2.ToolCalls {
+			aggregateToolCall(run2Tools, tc)

Copilot · 2026-04-25T20:46:15Z

+		}
+		for _, entry := range diff.AllTools {
+			change := entry.CallCountChange
+			if change == "" && entry.Status == "unchanged" {


In the pretty renderer, new/removed tools currently render an empty "Change" cell because CallCountChange is only set for Status == "changed", and the code only substitutes "—" for Status == "unchanged". This leads to inconsistent output vs. the markdown renderer (which shows "—" whenever change is empty) and leaves a blank table cell in the console. Consider using the same rule as markdown here: if change == "" set it to "—" regardless of status.

Suggested change

-			if change == "" && entry.Status == "unchanged" {
+			if change == "" {

github-actions · 2026-04-25T20:47:41Z

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

✅ Excellent test quality

| Metric | Value |
|--------|-------|
| New/modified tests analyzed | 15 |
| ✅ Design tests (behavioral contracts) | 15 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 10 (67%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (test: 364 lines, prod: 433 lines, ratio 0.84) |
| 🚨 Coding-guideline violations | None |

Test Classification Details

| Test | File | Classification | Issues Detected |
|------|------|----------------|-----------------|
| `TestComputeToolCallsDiff_BothNil` | `pkg/cli/audit_diff_test.go` | ✅ Design | Edge case: nil inputs → nil output |
| `TestComputeToolCallsDiff_BothEmpty` | `pkg/cli/audit_diff_test.go` | ✅ Design | Edge case: empty metrics → nil output |
| `TestComputeToolCallsDiff_NewTools` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: new tools detection, counts, statuses, summary |
| `TestComputeToolCallsDiff_RemovedTools` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: removed tools detection and summary counts |
| `TestComputeToolCallsDiff_ChangedTools` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: changed tool detection, delta string (+7) |
| `TestComputeToolCallsDiff_AllToolsContainsEverything` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: AllTools aggregation across new/removed/changed |
| `TestComputeToolCallsDiff_SortedOutput` | `pkg/cli/audit_diff_test.go` | ✅ Design | Invariant: AllTools is sorted alphabetically |
| `TestComputeBashCommandsDiff_NoBash` | `pkg/cli/audit_diff_test.go` | ✅ Design | Edge case: empty maps → nil |
| `TestComputeBashCommandsDiff_GenericBash` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: generic bash tool diff and totals |
| `TestComputeBashCommandsDiff_PerCommandTracking` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: per-command status (new/removed/unchanged/changed) |
| `TestComputeBashCommandsDiff_BashCapitalized` | `pkg/cli/audit_diff_test.go` | ✅ Design | Edge case: capitalized "Bash" (Claude engine) is detected |
| `TestComputeRunMetricsDiff_TokensPerTurn` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: tokens/turn calculation and % change string |
| `TestComputeRunMetricsDiff_TokensPerTurnFromEffective` | `pkg/cli/audit_diff_test.go` | ✅ Design | Behavioral contract: prefers effective tokens when available |
| `TestComputeRunMetricsDiff_TokensPerTurnZeroTurns` | `pkg/cli/audit_diff_test.go` | ✅ Design | Edge case: zero-turn guard against division by zero |
| `TestComputeRunMetricsDiff_WithToolCallsDiff` | `pkg/cli/audit_diff_test.go` | ✅ Design | Integration: `computeRunMetricsDiff` wires ToolCallsDiff and BashDiff correctly |

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 15 tests verify behavioral contracts — observable outputs, edge cases (nil/empty/zero-division/capitalization), and sorting invariants. No mock libraries, no missing build tags, and all assertions carry descriptive messages.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §24940226955

🧪 Test quality analysis by Test Quality Sentinel · ● 626.2K · ◷

github-actions

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 15 tests verify behavioral contracts with good edge case coverage (nil inputs, empty metrics, zero-division guard, capitalization handling, sorting invariants).

github-actions · 2026-04-25T20:48:02Z

Commit pushed: 4fbf628

🏗️ ADR gate enforced by Design Decision Gate 🏗️

github-actions · 2026-04-25T20:48:03Z

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (797 new lines in pkg/) but does not have a linked Architecture Decision Record (ADR).

AI has analyzed the PR diff and generated a draft ADR to help you get started:

📄 Draft ADR: docs/adr/28494-tool-calls-and-tokens-per-turn-in-audit-diff.md

What to do next

Review the draft ADR committed to your branch — it was generated from the PR diff
Complete the missing sections — add context the AI couldn't infer, refine the decision rationale, and list real alternatives you considered
Commit the finalized ADR to docs/adr/ on your branch
Reference the ADR in this PR body by adding a line such as:

ADR: ADR-28494: Embed Tool-Call Breakdown and Tokens-per-Turn Metrics in Audit Diff Output

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say 'I'll deal with this later.' Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

Context — What is the problem? What forces are at play?
Decision — What did you decide? Why?
Alternatives Considered — What else could have been done?
Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 28494-tool-calls-and-tokens-per-turn-in-audit-diff.md for this PR).

🔒 This PR cannot merge until an ADR is linked in the PR body.

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 186.5K · ◷

pelikhan · 2026-04-25T20:48:12Z

@copilot review all comments

…ange cells

- In computeToolCallsDiff: aggregate duplicate tool entries (sum CallCount, take max of sizes) instead of overwriting, matching how other consumers handle metrics appended from multiple log files
- In renderToolCallsDiffPrettySection: always substitute "—" for empty change cells regardless of status, consistent with markdown renderer
- Add TestComputeToolCallsDiff_DuplicateToolNames test

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/60183031-f19f-4034-bf21-75f6983738ad
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-25T20:53:24Z

@copilot review all comments

Both review issues addressed in 874e7a8:

Duplicate tool names — computeToolCallsDiff now uses an aggregateToolCall helper that sums CallCount and takes the max of MaxInputSize, MaxOutputSize, and MaxDuration when the same tool name appears more than once, matching how other consumers handle metrics appended across log files. Added TestComputeToolCallsDiff_DuplicateToolNames to cover this.
Empty change cell in pretty renderer — the if change == "" guard in renderToolCallsDiffPrettySection no longer checks entry.Status == "unchanged", so new and removed tools also get "—" in the Change column, consistent with the markdown renderer.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/graphql
- Triggering command: /usr/bin/gh gh repo view --json owner,name --jq .owner.login + "/" + .name e/git (http block)
- Triggering command: /usr/bin/gh gh repo view owner/repo --no�� --noprofile (http block)
- Triggering command: /usr/bin/gh gh repo view owner/repo -ato�� -bool -buildtags /usr/bin/git -errorsas -ifaceassert -nilfunc git clon�� /tmp/TestParseDefaultBranchFromLsRemoteWithRealGitmain_branch4242789745/001 /tmp/TestParseDefaultBranchFromLsRemoteWithRealGitmain_branch4242789745/002/work /opt/hostedtoolcache/node/24.14.1/x64/bin/node ath ../../../.prgit (http block)
https://api.github.com/orgs/test-owner/actions/secrets
- Triggering command: /usr/bin/gh gh api /orgs/test-owner/actions/secrets --jq .secrets[].name (http block)
https://api.github.com/repos/actions/ai-inference/git/ref/tags/v1
- Triggering command: /usr/bin/gh gh api /repos/actions/ai-inference/git/ref/tags/v1 --jq [.object.sha, .object.type] | @tsv --show-toplevel git /usr/bin/git 11747470/.githubgit --global ache/uv/0.11.7/x--show-toplevel git rev-�� --show-toplevel git /usr/bin/git ithub/workflows --local 64/pkg/tool/linuinstall git (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v3
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v3 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v5
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq [.object.sha, .object.type] | @tsv get --local sh nore (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq [.object.sha, .object.type] | @tsv WorkflowFiles_SimpleWorkflow851111918/001 git /usr/bin/git ai-moderator.md credential.usern\n k/gh-aw/gh-aw/ac: git rev-�� --show-toplevel git /usr/bin/git te '../../../**/git gpg.program /usr/sbin/iptabl--show-toplevel git (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v5 --jq [.object.sha, .object.type] | @tsv --show-toplevel /opt/hostedtoolcache/go/1.25.8/x64/pkg/tool/linux_amd64/vet /usr/bin/git -bool -buildtags ache/node/24.14.--show-toplevel git rev-�� --show-toplevel ache/node/24.14.1/x64/bin/node /usr/bin/git b.actor }}, Repogit -tests /usr/bin/git git (http block)
https://api.github.com/repos/actions/checkout/git/ref/tags/v6
- Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv /tmp/TestHashConsistency_GoAndJavaScript2148129813/001/test-inlined-imports-enabled-with-body-cogit (http block)
https://api.github.com/repos/actions/github-script/git/ref/tags/v8
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v8 --jq [.object.sha, .object.type] | @tsv --show-toplevel git /usr/bin/git Onlymin-integritbash HEAD (http block)
https://api.github.com/repos/actions/github-script/git/ref/tags/v9
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv ath ../../../.pr**/*.json (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/github-script/git/ref/tags/v9 --jq [.object.sha, .object.type] | @tsv 835775250/001 835775250/002/work 64/bin/git (http block)
https://api.github.com/repos/actions/setup-go/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-go/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
https://api.github.com/repos/actions/setup-node/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv list --json /usr/bin/git --repo owner/repo (http block)
- Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
https://api.github.com/repos/actions/upload-artifact/git/ref/tags/v4
- Triggering command: /usr/bin/gh gh api /repos/actions/upload-artifact/git/ref/tags/v4 --jq [.object.sha, .object.type] | @tsv testing.testBinary=1 -extld=gcc /usr/bin/git h ../../../.pretgit (http block)
https://api.github.com/repos/github/gh-aw-actions/git/ref/tags/v0.1.2
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw-actions/git/ref/tags/v0.1.2 --jq [.object.sha, .object.type] | @tsv --show-toplevel (http block)
https://api.github.com/repos/github/gh-aw-actions/git/ref/tags/v1.0.0
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw-actions/git/ref/tags/v1.0.0 --jq [.object.sha, .object.type] | @tsv testing.testBinary=1 -extld=gcc /usr/bin/git h ../../../.pretgit (http block)
https://api.github.com/repos/github/gh-aw-actions/git/ref/tags/v1.2.3
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw-actions/git/ref/tags/v1.2.3 --jq [.object.sha, .object.type] | @tsv /v1.0.0 --initial-branch=develop sv h ../../../.pretgit (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/1/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/1/artifacts --jq .artifacts[].name user.name k/_temp/uv-python-dir/node (http block)
- Triggering command: /usr/bin/gh gh run download 1 --dir test-logs/run-1 committer.name ache/uv/0.11.7/x86_64/bash (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/12345/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/12345/artifacts --jq .artifacts[].name credential.username in/sh (http block)
- Triggering command: /usr/bin/gh gh run download 12345 --dir test-logs/run-12345 credential.helper rgo/bin/sh (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/12346/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/12346/artifacts --jq .artifacts[].name credential.username At,event,headBranch,headSha,displayTitle (http block)
- Triggering command: /usr/bin/gh gh run download 12346 --dir test-logs/run-12346 credential.helper .cfg (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/2/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/2/artifacts --jq .artifacts[].name efaultBranchFromLsRemoteWithRealGitmaster_branch2835775250/001' 1/x64/bin/node (http block)
- Triggering command: /usr/bin/gh gh run download 2 --dir test-logs/run-2 committer.name 1/x64/bin/node (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/3/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/3/artifacts --jq .artifacts[].name sRemoteWithRealGitcustom_branch1272897534/002/work .cfg (http block)
- Triggering command: /usr/bin/gh gh run download 3 --dir test-logs/run-3 committer.name bin/bash (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/4/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/4/artifacts --jq .artifacts[].name efaultBranchFromLsRemoteWithRealGitcustom_branch1272897534/001' .cfg (http block)
- Triggering command: /usr/bin/gh gh run download 4 --dir test-logs/run-4 committer.name k/_temp/uv-python-dir/bash (http block)
https://api.github.com/repos/github/gh-aw/actions/runs/5/artifacts
- Triggering command: /usr/bin/gh gh api --paginate repos/{owner}/{repo}/actions/runs/5/artifacts --jq .artifacts[].name pull.rebase 1/x64/bin/node (http block)
- Triggering command: /usr/bin/gh gh run download 5 --dir test-logs/run-5 committer.email nfig/composer/vendor/bin/bash (http block)
https://api.github.com/repos/github/gh-aw/actions/workflows
- Triggering command: /usr/bin/gh gh workflow list --json name,state,path 42789745/001' 42789745/001' _TOKEN"; }; f (http block)
- Triggering command: /usr/bin/gh gh run list --json databaseId,number,url,status,conclusion,workflowName,createdAt,startedAt,updatedAt,event,headBranch,headSha,displayTitle --workflow nonexistent-workflow-12345 --limit 100 (http block)
- Triggering command: /usr/bin/gh gh run list --json databaseId,number,url,status,conclusion,workflowName,createdAt,startedAt,updatedAt,event,headBranch,headSha,displayTitle --workflow nonexistent-workflow-12345 --limit 6 (http block)
https://api.github.com/repos/github/gh-aw/contents/.github/workflows/shared/reporting.md
- Triggering command: /tmp/go-build2494875849/b404/cli.test /tmp/go-build2494875849/b404/cli.test -test.testlogfile=/tmp/go-build2494875849/b404/testlog.txt -test.paniconexit0 -test.v=true -test.parallel=4 -test.timeout=10m0s -test.run=^Test -test.short=true (http block)
https://api.github.com/repos/github/gh-aw/git/ref/tags/v0.47.4
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v0.47.4 --jq [.object.sha, .object.type] | @tsv download 4 /usr/bin/git test-logs/run-4 committer.name k/_temp/uv-pytho--show-toplevel git rev-�� --show-toplevel git /usr/bin/git ExpressionCompills (http block)
https://api.github.com/repos/github/gh-aw/git/ref/tags/v1.0.0
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v1.0.0 --jq [.object.sha, .object.type] | @tsv */*.ts' '**/*.json' --ignore-path ../../../.pret.prettierignore (http block)
https://api.github.com/repos/github/gh-aw/git/ref/tags/v1.2.3
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v1.2.3 --jq [.object.sha, .object.type] | @tsv (http block)
https://api.github.com/repos/github/gh-aw/git/ref/tags/v2.0.0
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v2.0.0 --jq [.object.sha, .object.type] | @tsv json' --ignore-path ../../../.pr**/*.json (http block)
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v2.0.0 --jq [.object.sha, .object.type] | @tsv json' --ignore-p-errorsas (http block)
https://api.github.com/repos/github/gh-aw/git/ref/tags/v3.0.0
- Triggering command: /usr/bin/gh gh api /repos/github/gh-aw/git/ref/tags/v3.0.0 --jq [.object.sha, .object.type] | @tsv json' --ignore-path ../../../.pr**/*.json (http block)
https://api.github.com/repos/nonexistent/action/git/ref/tags/v999.999.999
- Triggering command: /usr/bin/gh gh api /repos/nonexistent/action/git/ref/tags/v999.999.999 --jq [.object.sha, .object.type] | @tsv hub/workflows ame tions/node_modules/.bin/sh (http block)
https://api.github.com/repos/nonexistent/repo/actions/runs/12345
- Triggering command: /usr/bin/gh gh run view 12345 --repo nonexistent/repo --json status,conclusion (http block)
https://api.github.com/repos/owner/repo/actions/workflows
- Triggering command: /usr/bin/gh gh workflow list --json name,state,path --repo owner/repo _TOKEN"; }; f (http block)
- Triggering command: /usr/bin/gh gh workflow list --json name,state,path --repo owner/repo (http block)
- Triggering command: /usr/bin/gh gh workflow list --repo owner/repo --json name,path,state /opt/hostedtoolcache/go/1.25.8/x64/pkg/tool/linux_amd64/vet l ../../../.prettirev-parse (http block)
https://api.github.com/repos/test-owner/test-repo/actions/secrets
- Triggering command: /usr/bin/gh gh api /repos/test-owner/test-repo/actions/secrets --jq .secrets[].name (http block)
https://api.github.com/repos/test/repo
- Triggering command: /usr/bin/gh gh api /repos/test/repo --jq .default_branch --bare --initial-branch=my-default /usr/bin/git (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

github-actions · 2026-04-25T21:02:51Z

✅ Smoke CI completed successfully!

github-actions · 2026-04-25T21:04:11Z

✅ smoke-ci: safeoutputs CLI comment + comment-memory run (24940602418)

Generated by Smoke CI for issue #28494 · ◷

github-actions · 2026-04-25T21:04:12Z

Comment Memory

CI lights the path
Green checks bloom at dawn
Quiet bots still sing

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Generated by Smoke CI for issue #28494 · ◷

- audit.md: document tokens-per-turn and tool calls/bash command breakdown in diff output (PR #28494)
- frontmatter.md: add Observability section documenting observability.otlp configuration, object-form OTLP headers (PR #28524), and gen_ai.* span attribute conventions replacing gh-aw.* (PR #28511)
- cli.md: update audit diff description to include new output fields (PR #28494)
- cache-memory.md: document cache path misconfiguration detection via missing_data signal (PR #28516)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI and others added 2 commits April 25, 2026 20:21

Copilot AI assigned Copilot and pelikhan Apr 25, 2026

Copilot created this pull request from a session on behalf of pelikhan April 25, 2026 20:42 View session

pelikhan marked this pull request as ready for review April 25, 2026 20:42

Copilot AI review requested due to automatic review settings April 25, 2026 20:42

Copilot started reviewing on behalf of pelikhan April 25, 2026 20:43 View session

Copilot AI reviewed Apr 25, 2026

github-actions Bot approved these changes Apr 25, 2026

docs(adr): add draft ADR-28494 for tool-call breakdown and tokens-per…

4fbf628

…-turn in audit diff Generated by the Design Decision Gate workflow for PR #28494. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot started work on behalf of pelikhan April 25, 2026 20:48 View session

Copilot finished work on behalf of pelikhan April 25, 2026 20:54

Copilot AI requested a review from pelikhan April 25, 2026 20:54

Merge branch 'main' into copilot/analyze-audit-diff-data

b6afadb

pelikhan merged commit 54d5a25 into main Apr 25, 2026
17 of 18 checks passed

pelikhan deleted the copilot/analyze-audit-diff-data branch April 25, 2026 21:02

This was referenced Apr 25, 2026

fix: add --skip-trust to Gemini CLI command to prevent yolo override in AWF sandbox #28496

Merged

Smoke Test: Codex - 24941211957 #28505

Closed

Smoke Test: Claude - 24941211951 #28508

Closed

github-actions Bot mentioned this pull request Apr 26, 2026

[docs] Update documentation for features from 2026-04-26 #28577

Merged

3 participants
