[token-consumption] Daily Token Consumption Report - 2026-06-02 #36445

@github-actions

Executive Summary

Over the last 24h, agentic LLM telemetry for github/gh-aw recorded 151,320,255 total tokens across 5,939 model calls spanning 142 active workflows. However, per-workflow token attribution is not currently possible from this telemetry: token usage and workflow identity are emitted on two disjoint span populations that share no queryable join key. The headline number is therefore reported by engine (the only attribution dimension with 100% coverage), not by workflow.

  • Confirmed token volume: 151.3M tokens (148.4M input / 2.9M output) — input-dominated (98.1%).
  • Top engine: copilot / github_models = 138.7M tokens (91.7% of all consumption).
  • Critical observability gap: gh-aw.workflow.name is null on 100% of token-bearing spans; spans carrying both workflow name and token data = 0.
  • Companion checks: errors dataset = 0 events; logs severity:error = 0 events (both explicitly empty for the window).

Tooling note: search_events (AI/NL query) was not available in this MCP build (no embedded LLM provider). All queries used list_events with direct Sentry query syntax, as designed. get_trace_details was not available; trace continuity was verified via list_events filtered by trace:<id> instead.

Key Metrics

| Metric | Value |
| --- | --- |
| Token-bearing events analyzed (http.client w/ gen_ai.usage.*) | 5,939 |
| Workflow-tagged lifecycle events (span.op:gen_ai*) | 3,667 |
| Events with token data | 5,939 |
| Events with both workflow name and token data | 0 |
| Total input tokens | 148,403,809 |
| Total output tokens | 2,916,446 |
| Total tokens | 151,320,255 |
| Unique workflows observed | 142 |
| Avg tokens / token-event | 25,479 |
| P95 tokens / token-event | 63,988 |
| Max single call (tokens) | 136,526 |

Token Consumption by Engine (best available attribution — 100% coverage)

Engine totals reconcile exactly to the 5,939-span / 151,320,255-token population (no unattributed remainder).

| Engine (gen_ai.system) | Calls | Input Tokens | Output Tokens | Total Tokens | Avg Tokens/Call | Share |
| --- | --- | --- | --- | --- | --- | --- |
| copilot (github_models) | 3,389 | 137,318,203 | 1,376,446 | 138,694,649 | 40,925 | 91.7% |
| openai (codex) | 234 | 9,934,024 | 88,537 | 10,022,561 | 42,831 | 6.6% |
| anthropic (claude) | 2,316 | 1,151,582 | 1,451,463 | 2,603,045 | 1,124 | 1.7% |
| Total | 5,939 | 148,403,809 | 2,916,446 | 151,320,255 | 25,479 | 100% |

Notable: anthropic averages 1,124 total tokens/call vs 40,925 (copilot) and 42,831 (openai), a ~36–38× gap. This is consistent with effective prompt caching or lean context on the Claude path versus full-context-per-call accounting on the Copilot and Codex paths. Per-call size on the Copilot path is the single largest lever for reducing total consumption.
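The reconciliation claim and the per-call averages can be re-derived from the table with a few lines of arithmetic. The sketch below is illustrative only: the per-engine figures are copied from the table above, and the function name `summarize` is a hypothetical helper, not part of any gh-aw tooling.

```python
# Per-engine figures copied from the engine table: (calls, input_tokens, output_tokens).
ENGINES = {
    "copilot": (3_389, 137_318_203, 1_376_446),
    "openai": (234, 9_934_024, 88_537),
    "anthropic": (2_316, 1_151_582, 1_451_463),
}


def summarize(engines):
    """Return the grand total plus per-engine total, average tokens/call, and share."""
    grand_total = sum(inp + out for _, inp, out in engines.values())
    rows = {}
    for name, (calls, inp, out) in engines.items():
        total = inp + out
        rows[name] = {
            "total": total,
            "avg_per_call": round(total / calls),
            "input_per_call": round(inp / calls),
            "share_pct": round(100 * total / grand_total, 1),
        }
    return grand_total, rows


grand_total, rows = summarize(ENGINES)
print(f"grand total: {grand_total:,}")
for name, row in rows.items():
    print(name, row)
```

Splitting out `input_per_call` makes it visible that the Avg/Call column counts input plus output tokens, which matters most for anthropic, where output tokens exceed input tokens.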

Top 10 Workflows by Activity Volume (NOT token-attributed)

Because token counts cannot be tied to a workflow, the table below ranks workflows by gen_ai lifecycle span count (an activity proxy), not by tokens. Do not read these as token rankings.

| Workflow | gen_ai Events | Token Total |
| --- | --- | --- |
| Smoke CI | 506 | N/A (unattributable) |
| PR Sous Chef | 182 | N/A (unattributable) |
| AI Moderator | 94 | N/A (unattributable) |
| Issue Monster | 68 | N/A (unattributable) |
| Chaos PR Bundle Fuzzer | 60 | N/A (unattributable) |
| Smoke Gemini | 58 | N/A (unattributable) |
| Smoke Copilot | 56 | N/A (unattributable) |
| Smoke Claude | 50 | N/A (unattributable) |
| Smoke Codex | 50 | N/A (unattributable) |
| PR Triage Agent | 48 | N/A (unattributable) |

Data Quality and Gaps

Root cause — two disjoint span populations (confirmed, not inferred):

  1. Token-bearing spans: span.op:http.client, auto-instrumented by each engine's own OTel SDK (Copilot CLI, Codex/OpenAI, Claude). These carry gen_ai.system and gen_ai.usage.input_tokens|output_tokens|total_tokens as proper integers, but gh-aw.workflow.name, github.run_id, gen_ai.request.model, and transaction are all null (verified by a group-by on each: every grouping collapses to a single null bucket of 5,939 spans).
  2. Workflow-identifying spans: span.op:gen_ai (3,667 spans, 142 workflows), emitted by actions/setup/js/send_otlp_span.cjs. These carry gh-aw.workflow.name and gen_ai.system, but token fields are null across every workflow group.

Direct proof of the gap: has:gh-aw.workflow.name has:gen_ai.usage.total_tokens returns count = 0.

Trace-level linkage exists but is not query-aggregatable. Representative trace 97aba4bfb13844c3c3c5387ca88f5ad4 contains both the workflow-identified gen_ai spans (Daily Community Attribution Updater, system github_models) and the token-bearing http.client spans (system copilot, ~105k tokens each) under one trace. So tokens could be attributed by a per-trace self-join (trace → workflow from gen_ai spans, then sum http.client tokens per trace), but the MCP list_events aggregate cannot self-join across the two populations in a single query.
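The per-trace self-join described above could be done client-side once both span populations are exported. The following is a minimal sketch under stated assumptions: each span is a plain dict, the trace id lives under a `trace` key, and attribute keys match the names used in this report. The function name, the row shape, and the sample values are all hypothetical and synthetic, not observed data.

```python
from collections import defaultdict


def attribute_tokens_by_workflow(workflow_spans, token_spans):
    """Join token-bearing spans to workflow names through their shared trace id."""
    # Step 1: trace id -> workflow name, taken from the gen_ai lifecycle spans.
    # If a trace somehow carries several workflow names, the first one seen wins.
    workflow_by_trace = {}
    for span in workflow_spans:
        name = span.get("gh-aw.workflow.name")
        if name:
            workflow_by_trace.setdefault(span["trace"], name)

    # Step 2: sum http.client token counts per workflow via the trace id.
    totals = defaultdict(int)
    for span in token_spans:
        workflow = workflow_by_trace.get(span["trace"], "(unattributed)")
        totals[workflow] += int(span.get("gen_ai.usage.total_tokens") or 0)
    return dict(totals)


# Synthetic rows for illustration only.
workflow_spans = [
    {"trace": "trace-a", "gh-aw.workflow.name": "Workflow A"},
    {"trace": "trace-b", "gh-aw.workflow.name": "Workflow B"},
]
token_spans = [
    {"trace": "trace-a", "gen_ai.usage.total_tokens": 1000},
    {"trace": "trace-a", "gen_ai.usage.total_tokens": 500},
    {"trace": "trace-b", "gen_ai.usage.total_tokens": 200},
    {"trace": "trace-c", "gen_ai.usage.total_tokens": 50},
]
result = attribute_tokens_by_workflow(workflow_spans, token_spans)
print(result)
```

Token spans whose trace has no workflow-identified span land in an explicit `(unattributed)` bucket, so the per-workflow sums still reconcile to the overall total.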

Emit-side semantics cross-check (actions/setup/js/send_otlp_span.cjs):

  • The gh-aw emitter does attach gen_ai.usage.* (integers), but only to the dedicated agent span (gh-aw.<job>.agent) or the agent conclusion span, sourced from /tmp/gh-aw/agent_usage.json (lines ~2087–2160). In the observed window no gh-aw lifecycle span carries any token data (a group-by on span.op over token spans yields only http.client), indicating that the agent_usage.json roll-up path is not populating token attributes in production.
  • Minor naming inconsistency: the gh-aw emitter labels the Copilot engine github_models (per ENGINE_TO_SYSTEM_MAP), while the engine's own http.client instrumentation labels it copilot. Both refer to the same engine; cross-source joins must treat them as equivalent.
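Any cross-source join would need to reconcile the two labels first. A minimal sketch, assuming the only alias that matters is the github_models/copilot pair named above; the `ENGINE_ALIASES` table and `canonical_engine` helper are hypothetical and unrelated to the emitter's own ENGINE_TO_SYSTEM_MAP:

```python
# Assumption: github_models -> copilot is the only alias, as stated in this report.
ENGINE_ALIASES = {"github_models": "copilot"}


def canonical_engine(gen_ai_system):
    """Map a gen_ai.system value from either span population to one engine label."""
    if gen_ai_system is None:
        return None
    label = gen_ai_system.strip().lower()
    return ENGINE_ALIASES.get(label, label)


print(canonical_engine("github_models"))
print(canonical_engine("copilot"))
```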

Other gaps:

  • events_missing_workflow (token spans): 5,939 / 5,939 = 100%.
  • gen_ai.request.model is null on all token spans — model-level cost breakdown is also unavailable.
  • gen_ai.request.tokens exists but is typed as a string in the schema (numeric sum() rejected with HTTP 400); only gen_ai.usage.* are numerically aggregatable.
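Until the field is retyped, string-typed token values could be summed client-side after export. This is a sketch only: the report does not say what the string values look like, so the helper below (a hypothetical `sum_string_tokens`) assumes plain integer strings and counts anything else as skipped rather than guessing.

```python
def sum_string_tokens(values):
    """Sum integer-like strings; return (total, skipped) so bad rows stay visible."""
    total = 0
    skipped = 0
    for raw in values:
        try:
            total += int(str(raw).strip())
        except (TypeError, ValueError):
            # Null, empty, or non-integer values are counted, not silently dropped.
            skipped += 1
    return total, skipped


# Synthetic values for illustration only.
print(sum_string_tokens(["1200", " 300 ", None, "n/a"]))
```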

Confirmed vs. Observability-Gap Findings

  • Confirmed: 151.3M tokens consumed in 24h; engine split copilot 91.7% / openai 6.6% / anthropic 1.7%; input-token dominance (98.1%); errors and error-logs both zero.
  • Observability gap (not a failure): per-workflow and per-model token attribution unavailable due to disjoint spans. No token value was invented or estimated for any workflow.

Recommendations

  1. Propagate gh-aw identity onto engine spans at the source (highest impact). Pass OTEL_RESOURCE_ATTRIBUTES=gh-aw.workflow.name=...,github.run_id=... (and service.version) into the engine subprocess so its auto-instrumented http.client spans inherit workflow/run identity. This bridges the two populations directly and makes sum(gen_ai.usage.total_tokens) grouped by workflow work in one query.
  2. Fix the agent_usage.json roll-up path in send_otlp_span.cjs so the gh-aw.agent.agent span (which already carries gh-aw.workflow.name) actually emits gen_ai.usage.*. This provides a second, workflow-attributed token source independent of engine instrumentation.
  3. Attack the Copilot/OpenAI per-call input size. copilot and openai average ~41k–43k total tokens/call, almost all of it input (~40.5k and ~42.5k input tokens/call), vs ~1.1k total (~0.5k input) for anthropic. Investigate prompt caching, context pruning, and tool-result truncation on the Copilot/Codex engines. Even a small reduction here dominates total cost, since Copilot alone is 91.7% of tokens.
  4. Normalize gen_ai.request.tokens to a numeric type and populate gen_ai.request.model on token spans to unlock model-level cost reporting.
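Recommendation 1 can be sketched as building the environment for the engine subprocess. OTEL_RESOURCE_ATTRIBUTES is the standard OpenTelemetry environment variable (comma-separated key=value pairs); the helper names below are hypothetical, and whether each engine's SDK honors the variable and surfaces resource attributes as queryable span fields is an assumption that would need verifying.

```python
import os
from urllib.parse import quote


def build_resource_attributes(attrs):
    """Render a dict as key=value pairs joined by commas, percent-encoding values."""
    # Workflow names contain spaces (e.g. "PR Sous Chef"), so values are encoded.
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in attrs.items())


def engine_env(workflow_name, run_id, base_env=None):
    """Return a copy of the environment with gh-aw identity appended (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    extra = build_resource_attributes(
        {"gh-aw.workflow.name": workflow_name, "github.run_id": run_id}
    )
    existing = env.get("OTEL_RESOURCE_ATTRIBUTES")
    # Append rather than overwrite so attributes set upstream are preserved.
    env["OTEL_RESOURCE_ATTRIBUTES"] = f"{existing},{extra}" if existing else extra
    return env


env = engine_env("PR Sous Chef", 123, base_env={})
print(env["OTEL_RESOURCE_ATTRIBUTES"])
# The result would then be passed as env=... to subprocess.run() when launching the engine.
```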

References

Generated by 📊 Daily Token Consumption Report (Sentry OTel) · opus48 2.2M ·

  • expires on Jun 3, 2026, 1:08 PM UTC
