[token-consumption] Daily Token Consumption Report - 2026-06-02

### Executive Summary

Over the last 24h, agentic LLM telemetry for `github/gh-aw` recorded **151,320,255 total tokens** across **5,939 model calls**, while workflow lifecycle spans in the same window identified **142 active workflows**. However, **per-workflow token attribution is not currently possible** from this telemetry: token usage and workflow identity are emitted on two **disjoint span populations** that share no queryable join key. The headline number is therefore broken down by **engine** (the only attribution dimension with 100% coverage), not by workflow.

- **Confirmed token volume:** 151.3M tokens (148.4M input / 2.9M output) — input-dominated (98.1%).
- **Top engine:** `copilot` / `github_models` = 138.7M tokens (**91.7%** of all consumption).
- **Critical observability gap:** `gh-aw.workflow.name` is `null` on **100%** of token-bearing spans; spans carrying *both* workflow name and token data = **0**.
- **Companion checks:** `errors` dataset = 0 events; `logs severity:error` = 0 events (both explicitly empty for the window).

> Tooling note: `search_events` (AI/NL query) was **not available** in this MCP build (no embedded LLM provider). All queries used `list_events` with direct Sentry query syntax, as designed. `get_trace_details` was not available; trace continuity was verified via `list_events` filtered by `trace:<id>` instead.

### Key Metrics

| Metric | Value |
|---|---|
| Token-bearing events analyzed (`http.client` w/ `gen_ai.usage.*`) | 5,939 |
| Workflow-tagged lifecycle events (`span.op:gen_ai*`) | 3,667 |
| Events with token data | 5,939 |
| Events with **both** workflow name **and** token data | **0** |
| Total input tokens | 148,403,809 |
| Total output tokens | 2,916,446 |
| Total tokens | 151,320,255 |
| Unique workflows observed | 142 |
| Avg tokens / token-event | 25,479 |
| P95 tokens / token-event | 63,988 |
| Max single call | 136,526 |

### Token Consumption by Engine (best available attribution — 100% coverage)

Engine totals reconcile exactly to the 5,939-span / 151,320,255-token population (no unattributed remainder).

| Engine (`gen_ai.system`) | Calls | Input Tokens | Output Tokens | Total Tokens | Avg/Call | Share |
|---|---:|---:|---:|---:|---:|---:|
| `copilot` (github_models) | 3,389 | 137,318,203 | 1,376,446 | 138,694,649 | 40,925 | 91.7% |
| `openai` (codex) | 234 | 9,934,024 | 88,537 | 10,022,561 | 42,831 | 6.6% |
| `anthropic` (claude) | 2,316 | 1,151,582 | 1,451,463 | 2,603,045 | 1,124 | 1.7% |
| **Total** | **5,939** | **148,403,809** | **2,916,446** | **151,320,255** | **25,479** | **100%** |

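Notable: `anthropic` averages **1,124 tokens/call** vs **40,925** (copilot) and **42,831** (openai), a gap of roughly 36× to 38×. It is also the only engine whose output tokens (1,451,463) exceed its input tokens (1,151,582). Both observations are consistent with effective prompt caching or lean context on the Claude path versus full-context-per-call accounting on the Copilot and Codex paths, though the telemetry alone does not confirm the cause. If that reading holds, per-call context size on the Copilot path is the single largest lever for reducing total consumption.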
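
The reconciliation and the per-call averages can be re-derived from the table with a few lines of arithmetic. The sketch below hard-codes the per-engine figures from the table above; the function and variable names are illustrative, and `input_per_call` is a derived value that the table does not list.

```python
# Per-engine figures copied from the table above: (calls, input_tokens, output_tokens).
ENGINES = {
    "copilot": (3_389, 137_318_203, 1_376_446),
    "openai": (234, 9_934_024, 88_537),
    "anthropic": (2_316, 1_151_582, 1_451_463),
}


def summarize(engines):
    """Return the grand total and per-engine total, averages, and share."""
    grand_total = sum(inp + out for _, inp, out in engines.values())
    rows = {}
    for name, (calls, inp, out) in engines.items():
        total = inp + out
        rows[name] = {
            "total": total,
            "avg_per_call": round(total / calls),    # total tokens per call
            "input_per_call": round(inp / calls),    # input tokens only
            "share_pct": round(100 * total / grand_total, 1),
        }
    return grand_total, rows


grand_total, rows = summarize(ENGINES)
print("grand total:", grand_total)
for name, row in rows.items():
    print(name, row)
```

Separating `avg_per_call` from `input_per_call` shows that the 40,925 / 42,831 / 1,124 figures are total tokens per call, not input tokens per call.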

### Top 10 Workflows by Activity Volume (NOT token-attributed)

Because token counts cannot be tied to a workflow, the table below ranks workflows by **`gen_ai` lifecycle span count** (an activity proxy), not by tokens. Do **not** read these as token rankings.

| Workflow | `gen_ai` Events | Token Total |
|---|---:|---:|
| Smoke CI | 506 | N/A (unattributable) |
| PR Sous Chef | 182 | N/A (unattributable) |
| AI Moderator | 94 | N/A (unattributable) |
| Issue Monster | 68 | N/A (unattributable) |
| Chaos PR Bundle Fuzzer | 60 | N/A (unattributable) |
| Smoke Gemini | 58 | N/A (unattributable) |
| Smoke Copilot | 56 | N/A (unattributable) |
| Smoke Claude | 50 | N/A (unattributable) |
| Smoke Codex | 50 | N/A (unattributable) |
| PR Triage Agent | 48 | N/A (unattributable) |

<details>
<summary>Data Quality and Gaps</summary>

**Root cause — two disjoint span populations (confirmed, not inferred):**

1. **Token-bearing spans** — `span.op:http.client`, auto-instrumented by each engine's own OTel SDK (Copilot CLI, Codex/OpenAI, Claude). These carry `gen_ai.system` and `gen_ai.usage.input_tokens|output_tokens|total_tokens` as proper integers, but **`gh-aw.workflow.name`, `github.run_id`, `gen_ai.request.model`, and `transaction` are all `null`** (verified by group-by on each — every grouping collapses to a single `null` bucket of 5,939 spans).
2. **Workflow-identifying spans** — `span.op:gen_ai` (3,667 spans, 142 workflows), emitted by `actions/setup/js/send_otlp_span.cjs`. These carry `gh-aw.workflow.name` and `gen_ai.system`, but **token fields are `null`** across every workflow group.

**Direct proof of the gap:** `has:gh-aw.workflow.name has:gen_ai.usage.total_tokens` returns **count = 0**.

**Trace-level linkage exists but is not query-aggregatable.** Representative trace `97aba4bfb13844c3c3c5387ca88f5ad4` contains *both* the workflow-identified `gen_ai` spans (`Daily Community Attribution Updater`, system `github_models`) and the token-bearing `http.client` spans (system `copilot`, ~105k tokens each) under one `trace`. So tokens *could* be attributed by a per-trace self-join (trace → workflow from `gen_ai` spans, then sum `http.client` tokens per trace), but the MCP `list_events` aggregate cannot self-join across the two populations in a single query.
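
The per-trace self-join described above could be done client-side once both span populations are exported. The following is a minimal sketch, assuming each span is available as a plain dict keyed by attribute name; the dict layout, the function name, and the `(unattributed)` bucket are assumptions for illustration, not part of the MCP or Sentry API.

```python
from collections import defaultdict


def attribute_tokens_by_workflow(spans):
    """Sum token-bearing span totals per workflow by joining on trace ID.

    Each span is a dict with assumed keys: "trace", "span.op",
    "gh-aw.workflow.name", and "gen_ai.usage.total_tokens".
    """
    # Pass 1: learn which workflow owns each trace from the gen_ai spans.
    workflow_by_trace = {}
    for span in spans:
        name = span.get("gh-aw.workflow.name")
        if name is not None:
            workflow_by_trace.setdefault(span["trace"], name)

    # Pass 2: add token totals to the workflow that owns the span's trace.
    totals = defaultdict(int)
    for span in spans:
        tokens = span.get("gen_ai.usage.total_tokens")
        if tokens is None:
            continue
        workflow = workflow_by_trace.get(span["trace"], "(unattributed)")
        totals[workflow] += tokens
    return dict(totals)


# Synthetic spans for illustration only; not taken from the telemetry.
example_spans = [
    {"trace": "t1", "span.op": "gen_ai", "gh-aw.workflow.name": "Workflow A"},
    {"trace": "t1", "span.op": "http.client", "gen_ai.usage.total_tokens": 1000},
    {"trace": "t1", "span.op": "http.client", "gen_ai.usage.total_tokens": 500},
    {"trace": "t2", "span.op": "http.client", "gen_ai.usage.total_tokens": 70},
]
result = attribute_tokens_by_workflow(example_spans)
print(result)
```

Keeping an explicit `(unattributed)` bucket means tokens from traces without a workflow-identifying span stay visible instead of being silently dropped. The sketch also assumes one workflow per trace; if a trace ever carried two workflow names, the first one seen would win.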

**Emit-side semantics cross-check (`actions/setup/js/send_otlp_span.cjs`):**
- The gh-aw emitter *does* attach `gen_ai.usage.*` (integers), but only to the dedicated agent span (`gh-aw.<job>.agent`) or the agent conclusion span, sourced from `/tmp/gh-aw/agent_usage.json` (lines ~2087–2160). In the observed window **no gh-aw lifecycle span carries any token data** (group-by `span.op` on token spans yields only `http.client`), indicating that the `agent_usage.json` roll-up path is not populating token attributes in production.
- Minor naming inconsistency: the gh-aw emitter labels the Copilot engine `github_models` (per `ENGINE_TO_SYSTEM_MAP`), while the engine's own http.client instrumentation labels it `copilot`. Both refer to the same engine; cross-source joins must treat them as equivalent.
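
Any cross-source join would need to fold the two engine labels together before grouping. A minimal sketch follows; the alias table and function name are hypothetical, and only the `github_models` / `copilot` pairing comes from the observation above.

```python
# Alias table for engine labels. Only the github_models -> copilot pair is
# taken from the report; the table itself is a hypothetical helper.
SYSTEM_ALIASES = {"github_models": "copilot"}


def canonical_system(label):
    """Map an engine label from either span population to one canonical name."""
    if label is None:
        return None
    return SYSTEM_ALIASES.get(label, label)


print(canonical_system("github_models"))
print(canonical_system("anthropic"))
```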

**Other gaps:**
- `events_missing_workflow` (token spans): 5,939 / 5,939 = **100%**.
- `gen_ai.request.model` is `null` on all token spans — model-level cost breakdown is also unavailable.
- `gen_ai.request.tokens` exists but is typed as a **string** in the schema (numeric `sum()` rejected with HTTP 400); only `gen_ai.usage.*` are numerically aggregatable.
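
Until the string-typed field is fixed at the source, exported values could be coerced client-side before summing. This is a sketch under the assumption that raw values arrive as strings, integers, or `None`; the helper names are illustrative and this does not change what the server-side `sum()` accepts.

```python
def to_int_or_none(value):
    """Parse a token count that may arrive as an int or a numeric string."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; treat it as invalid here
    if isinstance(value, int):
        return value if value >= 0 else None
    if isinstance(value, str):
        try:
            parsed = int(value.strip())
        except ValueError:
            return None
        return parsed if parsed >= 0 else None
    return None


def sum_string_tokens(values):
    """Return (sum of parseable values, count of values that were skipped)."""
    parsed = [to_int_or_none(v) for v in values]
    valid = [p for p in parsed if p is not None]
    return sum(valid), len(parsed) - len(valid)


# Synthetic values for illustration only.
print(sum_string_tokens(["120", 30, None, "n/a", " 50 "]))
```

Returning the skipped count alongside the sum keeps unparseable values visible rather than letting them quietly lower the total.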

</details>

<details>
<summary>Confirmed vs. Observability-Gap Findings</summary>

- **Confirmed:** 151.3M tokens consumed in 24h; engine split copilot 91.7% / openai 6.6% / anthropic 1.7%; input-token dominance (98.1%); errors and error-logs both zero.
- **Observability gap (not a failure):** per-workflow and per-model token attribution unavailable due to disjoint spans. No token value was invented or estimated for any workflow.

</details>

### Recommendations

1. **Propagate gh-aw identity onto engine spans at the source (highest impact).** Pass `OTEL_RESOURCE_ATTRIBUTES=gh-aw.workflow.name=...,github.run_id=...` (and `service.version`) into the engine subprocess so its auto-instrumented `http.client` gen_ai spans inherit workflow/run identity. This bridges the two populations directly and makes `sum(gen_ai.usage.total_tokens)` group-by-workflow work in one query.
2. **Fix the `agent_usage.json` roll-up path** in `send_otlp_span.cjs` so the dedicated agent span (`gh-aw.<job>.agent`, for example `gh-aw.agent.agent`), which already carries `gh-aw.workflow.name`, actually emits `gen_ai.usage.*`. This provides a second, workflow-attributed token source independent of engine instrumentation.
3. **Attack the Copilot/OpenAI per-call input size.** copilot and openai average ~41k–43k total tokens/call (of which about 40.5k and 42.5k are input) vs ~1.1k total (about 0.5k input) for anthropic. Investigate prompt caching, context pruning, and tool-result truncation on the Copilot/Codex engines. Because Copilot alone accounts for 91.7% of tokens, even a small per-call reduction there dominates total cost.
4. **Normalize `gen_ai.request.tokens` to a numeric type** and populate `gen_ai.request.model` on token spans to unlock model-level cost reporting.
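
For recommendation 1, workflow names such as `Smoke CI` contain spaces, so the value passed in `OTEL_RESOURCE_ATTRIBUTES` needs percent-encoding. The sketch below shows one way to build the variable and merge it into a subprocess environment. The helper names are hypothetical. It is also an assumption that each engine's SDK honors `OTEL_RESOURCE_ATTRIBUTES` and that the backend exposes resource attributes as queryable span fields.

```python
import os
from urllib.parse import quote


def build_resource_attributes(attrs):
    """Encode key/value pairs as an OTEL_RESOURCE_ATTRIBUTES string."""
    # safe="" percent-encodes everything except unreserved characters, so
    # spaces, commas, and equals signs inside values cannot break parsing.
    pairs = (
        f"{quote(str(key), safe='')}={quote(str(value), safe='')}"
        for key, value in attrs.items()
    )
    return ",".join(pairs)


def engine_env(attrs, base_env=None):
    """Return a copy of the environment with the attributes appended."""
    env = dict(os.environ if base_env is None else base_env)
    encoded = build_resource_attributes(attrs)
    existing = env.get("OTEL_RESOURCE_ATTRIBUTES")
    env["OTEL_RESOURCE_ATTRIBUTES"] = f"{existing},{encoded}" if existing else encoded
    return env


# Illustrative values; the run ID is the one linked in the References.
attrs = {"gh-aw.workflow.name": "Smoke CI", "github.run_id": 26821258456}
encoded = build_resource_attributes(attrs)
print(encoded)
```

The resulting dict could then be passed as the `env` argument when launching the engine subprocess (for example with `subprocess.run(cmd, env=engine_env(attrs))`). Appending to any existing value rather than overwriting it preserves attributes such as `service.name` that may already be set.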

### References

- Engine token breakdown (Sentry): https://github.sentry.io/explore/traces/?query=has:gen_ai.usage.total_tokens&project=4511347087179777&mode=aggregate&statsPeriod=24h&table=span
- Verified representative trace: https://github.sentry.io/explore/traces/trace/97aba4bfb13844c3c3c5387ca88f5ad4
- Workflow run: [§26821258456](https://github.com/github/gh-aw/actions/runs/26821258456)







> Generated by [📊 Daily Token Consumption Report (Sentry OTel)](https://github.com/github/gh-aw/actions/runs/26821258456) · opus48 2.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-token-consumption-report%22&type=issues)
> - [x] expires on Jun 3, 2026, 1:08 PM UTC
