[cli-tools-test] Daily CLI Tools Test: Audit shows inconsistent metrics on repeated calls for same run #25985

@github-actions

Description

Daily exploratory testing found that the audit tool returns inconsistent metrics for the same run when called multiple times.

Reproduction Steps

  1. Audit run 24326834856 using its run ID:
    audit(run_id_or_url: "24326834856")

  2. Audit the same run using its URL:
    audit(run_id_or_url: "https://github.com/github/gh-aw/actions/runs/24326834856")
    

Observed Inconsistency

| Metric | Call 1 (run ID) | Call 2 (URL) |
|---|---|---|
| metrics.token_usage | 381,270 | 4,714,624 (12× higher) |
| metrics.turns | 9 | 22 |
| comparison.delta.turns.after | 9 | 22 |
| effective_tokens | 423,687 | 423,687 ✅ same |
| tokens_per_minute | 82,288 | 1,017,544 |
| firewall.total_requests | 9 | 9 ✅ same |
| Cached run_summary.json size | 15,321 bytes | 15,561 bytes |

The firewall data (ground truth) consistently shows 9 requests and 381k tokens. The second audit returned inflated values (4.71M tokens, 22 turns) that do not match firewall records.
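As a quick arithmetic check on the reported figures, the sketch below compares how far token_usage and tokens_per_minute moved between the two calls. It assumes (this is not confirmed by the report) that tokens_per_minute is derived as token_usage divided by run duration; under that assumption, both calls imply the same duration, which would suggest only the token count changed.

```python
# Values copied from the observed-inconsistency table in this report.
call_1 = {"token_usage": 381_270, "tokens_per_minute": 82_288}
call_2 = {"token_usage": 4_714_624, "tokens_per_minute": 1_017_544}

# How much each metric grew from call 1 to call 2.
token_ratio = call_2["token_usage"] / call_1["token_usage"]
rate_ratio = call_2["tokens_per_minute"] / call_1["tokens_per_minute"]

# Implied duration in minutes, ASSUMING tokens_per_minute = token_usage / duration.
duration_1 = call_1["token_usage"] / call_1["tokens_per_minute"]
duration_2 = call_2["token_usage"] / call_2["tokens_per_minute"]

print(f"token_usage ratio:       {token_ratio:.2f}x")
print(f"tokens_per_minute ratio: {rate_ratio:.2f}x")
print(f"implied duration, call 1: {duration_1:.3f} min")
print(f"implied duration, call 2: {duration_2:.3f} min")
```

If the two implied durations agree, the inflated tokens_per_minute is a downstream effect of the inflated token_usage rather than a separate discrepancy.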

Root Cause Hypothesis

The second audit likely fetched fresh data from GitHub APIs and overwrote the cached run_summary.json (size increased from 15,321 → 15,561 bytes). The fresh data may aggregate token usage from a different source than the firewall log, causing the inconsistency.
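One way to test the overwrite hypothesis is to fingerprint the cached file before and after the second audit call. The following is a minimal sketch only: the helper name `fingerprint` is hypothetical, and the demo uses a temporary stand-in file because the real cache location of run_summary.json is not stated in this report.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) of a file."""
    data = path.read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


# Demo with a stand-in file; the real cache path is an unknown here.
with tempfile.TemporaryDirectory() as tmp:
    summary = Path(tmp) / "run_summary.json"

    summary.write_text('{"metrics": {"token_usage": 381270}}')
    before = fingerprint(summary)  # take this before the second audit call

    # Stand-in for the second audit call rewriting the cache.
    summary.write_text('{"metrics": {"token_usage": 4714624}}')
    after = fingerprint(summary)   # take this after the second audit call

    rewritten = before != after
    print("cache rewritten:", rewritten)
    print("size before/after:", before[0], after[0])
```

A changed digest would confirm that the second call rewrote the cache; it would not by itself identify which data source produced the new values.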

This means the token_usage and turns fields are non-deterministic across audit calls for the same run.
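A regression check for this could compare two audit results for the same run and fail on any drift. The sketch below is illustrative: `flatten` and `diff_reports` are hypothetical helpers, and the nested report layout is an assumption inferred from the dotted metric names in the table, not the tool's documented output schema.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_reports(first: dict[str, Any], second: dict[str, Any]) -> dict[str, tuple]:
    """Return {dotted_key: (first_value, second_value)} for every differing field."""
    a, b = flatten(first), flatten(second)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Values from the table; the nesting is an assumed layout.
first = {
    "metrics": {"token_usage": 381_270, "turns": 9},
    "effective_tokens": 423_687,
    "firewall": {"total_requests": 9},
}
second = {
    "metrics": {"token_usage": 4_714_624, "turns": 22},
    "effective_tokens": 423_687,
    "firewall": {"total_requests": 9},
}

drift = diff_reports(first, second)
for key, (v1, v2) in drift.items():
    print(f"{key}: {v1} != {v2}")
```

An empty result from `diff_reports` is what a deterministic audit would produce for repeated calls on a completed run.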

Impact

  • Severity: High
  • Frequency: Reproducible. The first audit of a run may show different numbers than subsequent audits
  • User impact: Engineers relying on audit reports for cost tracking or optimization may see misleading numbers

Additional Context

  • Run: §24326834856 (GPL Dependency Cleaner, success)
  • Testing run: §24327293076
  • Date: 2026-04-13
  • gh-aw version: d1c210e (v1.0.21)

Generated by Daily CLI Tools Exploratory Tester · ● 1.4M ·

  • expires on Apr 20, 2026, 5:40 AM UTC
