[cli-tools-test] Daily CLI Tools Test: Audit shows inconsistent metrics on repeated calls for same run #25985

Daily exploratory testing found that the `audit` tool returns inconsistent metrics for the same run when called multiple times.

### Reproduction Steps

1. Audit run `24326834856` using its run ID:
   ```
   audit(run_id_or_url: "24326834856")
   ```
2. Audit the same run using its URL:
   ```
   audit(run_id_or_url: "https://github.com/github/gh-aw/actions/runs/24326834856")
   ```

### Observed Inconsistency

| Metric | Call 1 (run ID) | Call 2 (URL) |
|--------|-----------------|--------------|
| `metrics.token_usage` | 381,270 | **4,714,624** (12× higher) |
| `metrics.turns` | 9 | **22** |
| `comparison.delta.turns.after` | 9 | 22 |
| `effective_tokens` | 423,687 | 423,687 ✅ same |
| `tokens_per_minute` | 82,288 | **1,017,544** |
| `firewall.total_requests` | 9 | 9 ✅ same |
| Cached `run_summary.json` size | 15,321 bytes | 15,561 bytes |

The firewall data (ground truth) consistently shows **9 requests and 381k tokens**, which matches the first call. The second audit returned inflated values (`4.71M tokens`, `22 turns`) that do not match the firewall records.
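As a quick sanity check on the table, the sketch below recomputes the ratios between the two calls. The numbers are copied from the table; the relationship `tokens_per_minute = token_usage / duration` is an assumption, not something confirmed by the `audit` implementation.

```python
# Values copied from the "Observed Inconsistency" table.
call_1 = {"token_usage": 381_270, "tokens_per_minute": 82_288, "turns": 9}
call_2 = {"token_usage": 4_714_624, "tokens_per_minute": 1_017_544, "turns": 22}

# How much larger the second call's figures are.
token_ratio = call_2["token_usage"] / call_1["token_usage"]
tpm_ratio = call_2["tokens_per_minute"] / call_1["tokens_per_minute"]

# Implied run duration in minutes, ASSUMING
# tokens_per_minute = token_usage / duration (not confirmed by the source).
implied_minutes_1 = call_1["token_usage"] / call_1["tokens_per_minute"]
implied_minutes_2 = call_2["token_usage"] / call_2["tokens_per_minute"]

print(f"token_usage ratio:       {token_ratio:.2f}x")
print(f"tokens_per_minute ratio: {tpm_ratio:.2f}x")
print(f"implied duration call 1: {implied_minutes_1:.3f} min")
print(f"implied duration call 2: {implied_minutes_2:.3f} min")
```

Both ratios come out at roughly 12.4×, and the implied duration is about 4.63 minutes in both calls. Under the stated assumption, this suggests `tokens_per_minute` is derived from `metrics.token_usage`, so only the token count (and `turns`) changed between calls, not the measured run duration.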

### Root Cause Hypothesis

The second audit likely fetched fresh data from GitHub APIs and overwrote the cached `run_summary.json` (its size grew from 15,321 to 15,561 bytes). The fresh data may aggregate token usage from a different source than the firewall log, which would explain the inconsistency.

This means the `token_usage` and `turns` fields are non-deterministic across audit calls for the same run.
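A regression check for this could compare selected fields across two audit results for the same run. The following is a minimal sketch: `get_path` and `diff_metrics` are hypothetical helpers, and the nested payload shape is assumed from the dotted field names in the table rather than taken from the actual `audit` output.

```python
from typing import Any


def get_path(payload: dict[str, Any], dotted: str) -> Any:
    """Look up a dotted path such as 'metrics.turns'; return None if absent."""
    node: Any = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def diff_metrics(
    first: dict[str, Any], second: dict[str, Any], paths: list[str]
) -> dict[str, tuple[Any, Any]]:
    """Return {path: (first_value, second_value)} for every path that differs."""
    mismatches: dict[str, tuple[Any, Any]] = {}
    for path in paths:
        a, b = get_path(first, path), get_path(second, path)
        if a != b:
            mismatches[path] = (a, b)
    return mismatches


# Hypothetical payloads: values from the table, nesting assumed.
by_run_id = {
    "metrics": {"token_usage": 381_270, "turns": 9},
    "firewall": {"total_requests": 9},
}
by_url = {
    "metrics": {"token_usage": 4_714_624, "turns": 22},
    "firewall": {"total_requests": 9},
}

mismatches = diff_metrics(
    by_run_id,
    by_url,
    ["metrics.token_usage", "metrics.turns", "firewall.total_requests"],
)
for path, (a, b) in mismatches.items():
    print(f"{path}: {a} != {b}")
```

With the values reported here, the check flags `metrics.token_usage` and `metrics.turns` and leaves `firewall.total_requests` alone. An empty result would be the expected outcome once repeated audits of the same run are deterministic.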

### Impact

- **Severity**: High
- **Frequency**: Reproducible. The first audit of a run may show different numbers than subsequent audits of the same run
- **User impact**: Engineers relying on audit reports for cost tracking or optimization may see misleading numbers

### Additional Context

- Run: [§24326834856](https://github.com/github/gh-aw/actions/runs/24326834856) (GPL Dependency Cleaner, success)
- Testing run: [§24327293076](https://github.com/github/gh-aw/actions/runs/24327293076)
- Date: 2026-04-13
- gh-aw version: `d1c210e` (v1.0.21)




> Generated by [Daily CLI Tools Exploratory Tester](https://github.com/github/gh-aw/actions/runs/24327293076/agentic_workflow) · ● 1.4M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-cli-tools-tester%22&type=issues)
> - [x] expires on Apr 20, 2026, 5:40 AM UTC
