Daily exploratory testing found that the audit tool returns inconsistent metrics for the same run when called multiple times.
Reproduction Steps
- Audit run 24326834856 using its run ID:
```
audit(run_id_or_url: "24326834856")
```
- Audit the same run using its URL:
```
audit(run_id_or_url: "https://github.com/github/gh-aw/actions/runs/24326834856")
```
Observed Inconsistency
| Metric | Call 1 (run ID) | Call 2 (URL) |
|---|---|---|
| metrics.token_usage | 381,270 | 4,714,624 (12× higher) |
| metrics.turns | 9 | 22 |
| comparison.delta.turns.after | 9 | 22 |
| effective_tokens | 423,687 | 423,687 ✅ same |
| tokens_per_minute | 82,288 | 1,017,544 |
| firewall.total_requests | 9 | 9 ✅ same |
| Cached run_summary.json size | 15,321 bytes | 15,561 bytes |
The firewall data (ground truth) consistently shows 9 requests and 381k tokens. The second audit returned inflated values (4.71M tokens, 22 turns) that do not match firewall records.
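The differing fields can be isolated mechanically. The following is a minimal Python sketch that uses only the values from the table; the flat dictionary layout and the `diff_metrics` helper are illustrative assumptions, not part of gh-aw or the audit tool's actual output format.

```python
# Values copied from the comparison table. The flat key layout is an
# assumption made for illustration, not the audit tool's real schema.
call_1 = {
    "metrics.token_usage": 381_270,
    "metrics.turns": 9,
    "effective_tokens": 423_687,
    "tokens_per_minute": 82_288,
    "firewall.total_requests": 9,
}
call_2 = {
    "metrics.token_usage": 4_714_624,
    "metrics.turns": 22,
    "effective_tokens": 423_687,
    "tokens_per_minute": 1_017_544,
    "firewall.total_requests": 9,
}


def diff_metrics(a: dict, b: dict) -> dict:
    """Return {field: (a_value, b_value, b/a ratio)} for fields that differ."""
    return {k: (a[k], b[k], b[k] / a[k]) for k in a if a[k] != b[k]}


changed = diff_metrics(call_1, call_2)
for field, (before, after, ratio) in changed.items():
    print(f"{field}: {before:,} -> {after:,} ({ratio:.2f}x)")
```

In this data, `metrics.token_usage` and `tokens_per_minute` are inflated by nearly the same factor, which would be consistent with `tokens_per_minute` being derived from `token_usage`, although the report does not confirm that.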
Root Cause Hypothesis
The second audit likely fetched fresh data from GitHub APIs and overwrote the cached run_summary.json (size increased from 15,321 → 15,561 bytes). The fresh data may aggregate token usage from a different source than the firewall log, causing the inconsistency.
This means the token_usage and turns fields are non-deterministic across audit calls for the same run.
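A regression check for this behavior could look roughly like the sketch below. It is written in Python under stated assumptions: `unstable_fields` is a hypothetical helper, and `make_fake_audit` is a stand-in that imitates the hypothesized cache overwrite (first call returns one set of values, later calls another). Neither reflects the real audit implementation or its result schema.

```python
from typing import Callable, Iterable


def unstable_fields(
    audit: Callable[[str], dict], refs: Iterable[str], fields: Iterable[str]
) -> dict:
    """Call `audit` once per ref and return the fields whose values differ."""
    results = [audit(ref) for ref in refs]
    report = {}
    for field in fields:
        values = [result.get(field) for result in results]
        if any(value != values[0] for value in values[1:]):
            report[field] = values
    return report


def make_fake_audit() -> Callable[[str], dict]:
    """Hypothetical stand-in: results change after the first call,
    mimicking a cached summary being overwritten by fresh data."""
    calls = {"count": 0}

    def fake_audit(ref: str) -> dict:
        calls["count"] += 1
        if calls["count"] == 1:
            return {"token_usage": 381_270, "turns": 9, "effective_tokens": 423_687}
        return {"token_usage": 4_714_624, "turns": 22, "effective_tokens": 423_687}

    return fake_audit


refs = [
    "24326834856",
    "https://github.com/github/gh-aw/actions/runs/24326834856",
]
report = unstable_fields(
    make_fake_audit(), refs, ["token_usage", "turns", "effective_tokens"]
)
print(report)
```

A check of this shape would pass only when every listed field is identical across calls, regardless of whether the run is referenced by ID or by URL.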
Impact
- Severity: High
- Frequency: Reproducible — the first audit of a run may show different numbers than subsequent audits
- User impact: Engineers relying on audit reports for cost tracking or optimization may see misleading numbers
Additional Context
- Run: §24326834856 (GPL Dependency Cleaner, success)
- Testing run: §24327293076
- Date: 2026-04-13
- gh-aw version: d1c210e (v1.0.21)
Generated by Daily CLI Tools Exploratory Tester · ● 1.4M · ◷