feat: Reconcile per-run token usage against GitHub billing API for ground-truth cost attribution


## Summary

`gh aw logs --json` currently emits `estimated_cost` per run, and the AWF api-proxy captures `token-usage.jsonl` artifacts with input/output token counts. Both are useful, but for Copilot-backed runs the cost figure is a heuristic — Copilot doesn't expose billing-grade pricing, and the estimator has been observed to drift significantly from actual published rates (see #10300, where the estimate was off by ~220x for Claude Sonnet 4.5).

With Copilot moving to usage-based billing on June 1, 2026 (AI Credits, 1 credit = $0.01 USD), the gap between "what gh-aw reports" and "what GitHub actually charges" becomes a budgeting problem rather than a curiosity. Teams running agentic workflows across many repos need ground-truth per-workflow attribution to make decisions about model selection, scheduling, and which workflows are worth keeping agentic.

## Proposed feature

Add a reconciliation job (workflow or CLI subcommand — `gh aw billing reconcile`, perhaps) that:

1. **Pulls authoritative usage data** from the GitHub billing REST API:
   - `/orgs/{org}/settings/billing/usage` (org-level)
   - `/enterprises/{enterprise}/settings/billing/usage/summary` (enterprise-level, with `cost_center_id` filtering)
   - `/{scope}/settings/billing/premium_request/usage` for premium request consumption during the PRU-to-AI-Credits transition window
2. **Joins it against gh-aw's per-run telemetry** — the existing `token-usage.jsonl` artifacts plus `gh aw logs --json` output — on workflow run ID, repository, and time window.
3. **Emits a reconciled artifact** with per-run, per-workflow, per-repo AI Credit consumption attributed from billing-of-record, not estimated.

## Why this is the right shape

- **It side-steps the estimator-accuracy problem.** Rather than asking gh-aw to predict GitHub's billing, it asks gh-aw to read GitHub's billing and attribute it back. The API of record exists; we should use it.
- **It survives the June 1 transition.** Whether a run is billed under PRUs (legacy annual plans) or AI Credits (new usage-based), the billing API is the source of truth and gh-aw inherits the correctness for free.
- **It enables FinOps at scale.** Organizations running gh-aw across dozens of repos can finally answer "what did this workflow cost us last month, by repo, by model?" without manual spreadsheet work or trust in a heuristic.
- **It composes with existing primitives.** The Cost Tracker workflow in `githubnext/agentics` already posts per-run spend summaries from `token-usage.jsonl`. A reconciliation step would slot in naturally as a periodic reporting workflow.

## Acceptance criteria

- A documented CLI subcommand or sample workflow that takes a time window and an account scope (org / enterprise / cost center) and produces a reconciled JSON artifact.
- The artifact includes, per workflow run: run ID, repository, workflow name, agent/model, token counts (from `token-usage.jsonl`), and billed AI Credits (from billing API).
- Documentation in `reference/cost-management/` describing the reconciliation flow, required token scopes, and known limitations (e.g., billing API latency, scope of attribution when token counts can't be uniquely mapped to a billed line item).
- A note on how this complements — rather than replaces — the existing `estimated_cost` field, which remains useful for in-flight runs before billing data is available.

## Open questions

- Billing API latency: how soon after a run completes does its consumption appear in `/billing/usage`? This determines how "live" the reconciliation can be.
- Attribution granularity: the billing API aggregates by product and SKU; is per-workflow-run attribution achievable, or is the realistic floor per-repo-per-day?
- Token scope ergonomics: billing endpoints currently require classic PATs with specific permissions. Anything that can be done to make this less painful for org-wide deployments would help adoption.

## Context

- #10300 — estimator accuracy issue, ~220x drift observed
- `gh-aw-firewall` #1536 — original token-usage tracking implementation
- `gh-aw-firewall` #2262 — daily token usage reports showing `Estimated total cost: $0.00 (cost tracking not yet enabled)`
- GitHub blog, "Improving token efficiency in GitHub Agentic Workflows" — established pattern of auditor/optimizer workflows operating on token-usage artifacts
- GitHub blog, "GitHub Copilot is moving to usage-based billing" — June 1, 2026 transition


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Reconcile per-run token usage against GitHub billing API for ground-truth cost attribution #34981

Summary

Proposed feature

Why this is the right shape

Acceptance criteria

Open questions

Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat: Reconcile per-run token usage against GitHub billing API for ground-truth cost attribution #34981

Description

Summary

Proposed feature

Why this is the right shape

Acceptance criteria

Open questions

Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions