Executive Summary
Over the last 24h, agentic LLM telemetry for github/gh-aw recorded 151,320,255 total tokens across 5,939 model calls spanning 142 active workflows. However, per-workflow token attribution is not currently possible from this telemetry: token usage and workflow identity are emitted on two disjoint span populations that share no queryable join key. The headline number is therefore reported by engine (the only attribution dimension with 100% coverage), not by workflow.
- Confirmed token volume: 151.3M tokens (148.4M input / 2.9M output) — input-dominated (98.1%).
- Top engine: copilot / github_models = 138.7M tokens (91.7% of all consumption).
- Critical observability gap: gh-aw.workflow.name is null on 100% of token-bearing spans; spans carrying both workflow name and token data = 0.
- Companion checks: errors dataset = 0 events; logs severity:error = 0 events (both explicitly empty for the window).
Tooling note: search_events (AI/NL query) was not available in this MCP build (no embedded LLM provider). All queries used list_events with direct Sentry query syntax, as designed. get_trace_details was not available; trace continuity was verified via list_events filtered by trace:<id> instead.
Key Metrics
| Metric | Value |
|---|---|
| Token-bearing events analyzed (http.client spans with gen_ai.usage.*) | 5,939 |
| Workflow-tagged lifecycle events (span.op:gen_ai*) | 3,667 |
| Events with token data | 5,939 |
| Events with both workflow name and token data | 0 |
| Total input tokens | 148,403,809 |
| Total output tokens | 2,916,446 |
| Total tokens | 151,320,255 |
| Unique workflows observed | 142 |
| Avg tokens / token-event | 25,479 |
| P95 tokens / token-event | 63,988 |
| Max single call (tokens) | 136,526 |
Token Consumption by Engine (best available attribution — 100% coverage)
Engine totals reconcile exactly to the 5,939-span / 151,320,255-token population (no unattributed remainder).
| Engine (gen_ai.system) | Calls | Input Tokens | Output Tokens | Total Tokens | Avg Tokens/Call | Share |
|---|---|---|---|---|---|---|
| copilot (github_models) | 3,389 | 137,318,203 | 1,376,446 | 138,694,649 | 40,925 | 91.7% |
| openai (codex) | 234 | 9,934,024 | 88,537 | 10,022,561 | 42,831 | 6.6% |
| anthropic (claude) | 2,316 | 1,151,582 | 1,451,463 | 2,603,045 | 1,124 | 1.7% |
| Total | 5,939 | 148,403,809 | 2,916,446 | 151,320,255 | 25,479 | 100% |
Notable: anthropic averages 1,124 tokens/call vs 40,925 (copilot) and 42,831 (openai), a roughly 36–38× gap. This is consistent with effective prompt caching or leaner context on the Claude path versus full-context-per-call accounting on the Copilot and Codex paths. Because Copilot alone accounts for 91.7% of tokens, per-call input size on the Copilot/Codex paths is the single largest lever for reducing total consumption.
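As a sanity check, the derived columns of the engine table (totals, per-call averages, shares) and the per-call gap can be recomputed from the raw call and token counts alone. A minimal Python sketch; the ENGINES dictionary and summarize() helper are illustrative names, not part of the report's tooling:

```python
# Rows copied from the engine table: engine -> (calls, input_tokens, output_tokens).
ENGINES = {
    "copilot": (3_389, 137_318_203, 1_376_446),
    "openai": (234, 9_934_024, 88_537),
    "anthropic": (2_316, 1_151_582, 1_451_463),
}


def summarize(engines):
    """Return per-engine derived columns plus the grand token total."""
    grand_total = sum(inp + out for _, inp, out in engines.values())
    rows = {}
    for name, (calls, inp, out) in engines.items():
        total = inp + out
        rows[name] = {
            "total": total,
            "avg_per_call": round(total / calls),
            "avg_input_per_call": round(inp / calls),
            "share_pct": round(100 * total / grand_total, 1),
        }
    return rows, grand_total


rows, grand_total = summarize(ENGINES)
total_calls = sum(calls for calls, _, _ in ENGINES.values())
# Ratio of average tokens per call, copilot vs anthropic.
gap = rows["copilot"]["avg_per_call"] / rows["anthropic"]["avg_per_call"]

print(grand_total, total_calls, round(grand_total / total_calls))
for name, row in rows.items():
    print(name, row)
print(f"copilot/anthropic tokens-per-call ratio: {gap:.1f}x")
```

On these inputs the anthropic input-only average is about 497 tokens/call, so most of its 1,124 tokens/call is output, while copilot and openai are each about 99% input. The copilot/anthropic ratio is about 36.4×; against openai it is about 38×.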
Top 10 Workflows by Activity Volume (NOT token-attributed)
Because token counts cannot be tied to a workflow, the table below ranks workflows by gen_ai lifecycle span count (an activity proxy), not by tokens. Do not read these as token rankings.
| Workflow | gen_ai Events | Token Total |
|---|---|---|
| Smoke CI | 506 | N/A (unattributable) |
| PR Sous Chef | 182 | N/A (unattributable) |
| AI Moderator | 94 | N/A (unattributable) |
| Issue Monster | 68 | N/A (unattributable) |
| Chaos PR Bundle Fuzzer | 60 | N/A (unattributable) |
| Smoke Gemini | 58 | N/A (unattributable) |
| Smoke Copilot | 56 | N/A (unattributable) |
| Smoke Claude | 50 | N/A (unattributable) |
| Smoke Codex | 50 | N/A (unattributable) |
| PR Triage Agent | 48 | N/A (unattributable) |
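The activity ranking amounts to a count of gen_ai lifecycle spans per workflow name; no token field is involved, which is why the token column is N/A. A hedged Python sketch over exported span dictionaries (the dict-per-span shape and rank_workflows_by_activity() are assumptions for illustration):

```python
from collections import Counter


def rank_workflows_by_activity(spans, top_n=10):
    """Rank workflows by gen_ai lifecycle span count (an activity proxy, not tokens)."""
    counts = Counter(
        span["gh-aw.workflow.name"]
        for span in spans
        if (span.get("span.op") or "").startswith("gen_ai")
        and span.get("gh-aw.workflow.name") is not None
    )
    return counts.most_common(top_n)


# Illustrative spans only; in practice these come from the span.op:gen_ai* query.
sample = (
    [{"span.op": "gen_ai", "gh-aw.workflow.name": "Smoke CI"}] * 3
    + [{"span.op": "gen_ai", "gh-aw.workflow.name": "PR Sous Chef"}] * 2
    # Token-bearing engine span: it has no workflow name, so it cannot contribute.
    + [{"span.op": "http.client", "gen_ai.usage.total_tokens": 105_000}]
)
print(rank_workflows_by_activity(sample))
```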
Data Quality and Gaps
Root cause — two disjoint span populations (confirmed, not inferred):
- Token-bearing spans: span.op:http.client, auto-instrumented by each engine's own OTel SDK (Copilot CLI, Codex/OpenAI, Claude). These carry gen_ai.system and gen_ai.usage.input_tokens|output_tokens|total_tokens as proper integers, but gh-aw.workflow.name, github.run_id, gen_ai.request.model, and transaction are all null (verified by a group-by on each: every grouping collapses to a single null bucket of 5,939 spans).
- Workflow-identifying spans: span.op:gen_ai (3,667 spans, 142 workflows), emitted by actions/setup/js/send_otlp_span.cjs. These carry gh-aw.workflow.name and gen_ai.system, but token fields are null across every workflow group.
Direct proof of the gap: has:gh-aw.workflow.name has:gen_ai.usage.total_tokens returns count = 0.
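The has: query is an intersection count over the two populations. The same check over exported spans can be sketched in Python, assuming each span is a dict of attribute names to values and that a missing or null attribute fails the test (an approximation of Sentry's has: semantics, not a reimplementation of it):

```python
WORKFLOW_KEY = "gh-aw.workflow.name"
TOKEN_KEY = "gen_ai.usage.total_tokens"


def has(span, key):
    """Approximates a has:<key> filter: attribute present and non-null."""
    return span.get(key) is not None


def population_overlap(spans):
    """Count token-bearing spans, workflow-tagged spans, and spans with both."""
    return {
        "token": sum(has(s, TOKEN_KEY) for s in spans),
        "workflow": sum(has(s, WORKFLOW_KEY) for s in spans),
        "both": sum(has(s, TOKEN_KEY) and has(s, WORKFLOW_KEY) for s in spans),
    }


# Illustrative spans shaped like the two populations described above.
spans = [
    {"span.op": "http.client", "gen_ai.system": "copilot", TOKEN_KEY: 105_000, WORKFLOW_KEY: None},
    {"span.op": "gen_ai", "gen_ai.system": "github_models", WORKFLOW_KEY: "Smoke CI", TOKEN_KEY: None},
]
print(population_overlap(spans))
```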
Trace-level linkage exists but is not query-aggregatable. Representative trace 97aba4bfb13844c3c3c5387ca88f5ad4 contains both the workflow-identified gen_ai spans (Daily Community Attribution Updater, system github_models) and the token-bearing http.client spans (system copilot, ~105k tokens each) under one trace. So tokens could be attributed by a per-trace self-join (trace → workflow from gen_ai spans, then sum http.client tokens per trace), but the MCP list_events aggregate cannot self-join across the two populations in a single query.
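The per-trace self-join is a two-pass operation that a single list_events aggregate cannot express, but it is straightforward client-side over exported spans. A minimal Python sketch, assuming each span dict carries a trace key and that each trace belongs to exactly one workflow (the first name seen wins); the function name and sample values are hypothetical:

```python
from collections import defaultdict

UNATTRIBUTED = "(unattributed)"


def attribute_tokens_by_trace(spans):
    """Per-trace self-join: map trace -> workflow, then sum token spans per workflow.

    Assumes each trace belongs to exactly one workflow; the first workflow name
    seen for a trace wins.
    """
    # Pass 1: learn each trace's workflow from the gh-aw lifecycle spans.
    workflow_by_trace = {}
    for span in spans:
        name = span.get("gh-aw.workflow.name")
        if name is not None:
            workflow_by_trace.setdefault(span["trace"], name)

    # Pass 2: route every token-bearing span to its trace's workflow.
    totals = defaultdict(int)
    for span in spans:
        tokens = span.get("gen_ai.usage.total_tokens")
        if tokens is None:
            continue
        workflow = workflow_by_trace.get(span["trace"], UNATTRIBUTED)
        totals[workflow] += int(tokens)
    return dict(totals)


# Illustrative values only (hypothetical trace ids and token counts).
spans = [
    {"trace": "trace-a", "span.op": "gen_ai", "gh-aw.workflow.name": "Daily Community Attribution Updater"},
    {"trace": "trace-a", "span.op": "http.client", "gen_ai.usage.total_tokens": 105_000},
    {"trace": "trace-a", "span.op": "http.client", "gen_ai.usage.total_tokens": 104_500},
    {"trace": "trace-b", "span.op": "http.client", "gen_ai.usage.total_tokens": 50_000},
]
print(attribute_tokens_by_trace(spans))
```

Keeping an explicit unattributed bucket means per-workflow totals still reconcile with the engine table even when some traces have no gh-aw lifecycle span. If a trace could carry several workflows, the first-wins rule would misattribute its tokens and would need replacing.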
Emit-side semantics cross-check (actions/setup/js/send_otlp_span.cjs):
- The gh-aw emitter does attach gen_ai.usage.* (integers), but only to the dedicated agent span (gh-aw.<job>.agent) or the agent conclusion span, sourced from /tmp/gh-aw/agent_usage.json (lines ~2087–2160). In the observed window no gh-aw lifecycle span carries any token data (a group-by on span.op over token spans yields only http.client), indicating that the agent_usage.json roll-up path is not populating token attributes in production.
- Minor naming inconsistency: the gh-aw emitter labels the Copilot engine github_models (per ENGINE_TO_SYSTEM_MAP), while the engine's own http.client instrumentation labels it copilot. Both refer to the same engine, so cross-source joins must treat them as equivalent.
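For any cross-source join, the github_models/copilot mismatch can be absorbed by normalizing gen_ai.system before comparing. A small Python sketch; SYSTEM_ALIASES contains only the one alias this report confirms, and any other engine-label differences would have to be added after checking real data:

```python
# Only the alias the report confirms: gh-aw's ENGINE_TO_SYSTEM_MAP emits
# "github_models" where the Copilot engine's own instrumentation emits "copilot".
SYSTEM_ALIASES = {"github_models": "copilot"}


def canonical_system(system):
    """Normalize a gen_ai.system value so both sources name an engine the same way."""
    value = (system or "").strip().lower()
    return SYSTEM_ALIASES.get(value, value)


def same_engine(a, b):
    """True when two gen_ai.system labels refer to the same engine."""
    return canonical_system(a) == canonical_system(b)


print(same_engine("github_models", "copilot"))  # the two labels for Copilot
print(canonical_system("anthropic"))            # unaliased values pass through
```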
Other gaps:
- events_missing_workflow (token spans): 5,939 / 5,939 = 100%.
- gen_ai.request.model is null on all token spans, so a model-level cost breakdown is also unavailable.
- gen_ai.request.tokens exists but is typed as a string in the schema (a numeric sum() is rejected with HTTP 400); only gen_ai.usage.* attributes are numerically aggregatable.
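Until gen_ai.request.tokens is emitted as a number, it can only be summed client-side after exporting the spans. A hedged Python sketch that coerces the string values and skips anything non-numeric; it assumes the strings hold base-10 integers, which this report does not confirm:

```python
TOKENS_KEY = "gen_ai.request.tokens"


def to_int_or_none(value):
    """Coerce a string-typed token attribute to int; return None if not numeric."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def sum_request_tokens(spans, key=TOKENS_KEY):
    """Client-side sum over a string-typed attribute, skipping non-numeric values."""
    total = 0
    for span in spans:
        parsed = to_int_or_none(span.get(key))
        if parsed is not None:
            total += parsed
    return total


# Illustrative values; the real string format of this attribute is not documented here.
spans = [{TOKENS_KEY: "100"}, {TOKENS_KEY: " 25 "}, {TOKENS_KEY: "n/a"}, {}]
print(sum_request_tokens(spans))
```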
Confirmed vs. Observability-Gap Findings
- Confirmed: 151.3M tokens consumed in 24h; engine split copilot 91.7% / openai 6.6% / anthropic 1.7%; input-token dominance (98.1%); errors and error-logs both zero.
- Observability gap (not a failure): per-workflow and per-model token attribution unavailable due to disjoint spans. No token value was invented or estimated for any workflow.
Recommendations
- Propagate gh-aw identity onto engine spans at the source (highest impact). Pass OTEL_RESOURCE_ATTRIBUTES=gh-aw.workflow.name=...,github.run_id=... (and service.version) into the engine subprocess so its auto-instrumented http.client gen_ai spans inherit workflow/run identity. This bridges the two populations directly and makes sum(gen_ai.usage.total_tokens) grouped by workflow work in a single query.
- Fix the agent_usage.json roll-up path in send_otlp_span.cjs so the gh-aw.agent.agent span (which already carries gh-aw.workflow.name) actually emits gen_ai.usage.*. This provides a second, workflow-attributed token source independent of engine instrumentation.
- Attack the Copilot/OpenAI per-call input size. copilot and openai average ~41k–43k tokens/call, about 99% of it input, vs ~1.1k tokens/call (~0.5k input) for anthropic. Investigate prompt caching, context pruning, and tool-result truncation on the Copilot/Codex engines: a small reduction here dominates total cost (Copilot alone is 91.7% of tokens).
- Normalize gen_ai.request.tokens to a numeric type and populate gen_ai.request.model on token spans to unlock model-level cost reporting.
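The first recommendation comes down to building the OTEL_RESOURCE_ATTRIBUTES value correctly before launching the engine subprocess. The OpenTelemetry SDK environment-variable specification uses comma-separated key=value pairs with percent-encoded values, which matters here because workflow names contain spaces (for example Smoke CI). A minimal Python sketch for illustration only (the gh-aw emitter itself is JavaScript); it assumes the engine's OTel SDK reads this standard variable, and engine_env() and the sample values are hypothetical:

```python
import os
from urllib.parse import quote


def otel_resource_attributes(attrs):
    """Format attributes as OTEL_RESOURCE_ATTRIBUTES: comma-separated key=value
    pairs, with values percent-encoded so spaces, commas and '=' survive parsing."""
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in attrs.items())


def engine_env(workflow_name, run_id, service_version, base=None):
    """Copy of the environment with gh-aw identity appended to the resource attributes."""
    env = dict(os.environ if base is None else base)
    extra = otel_resource_attributes({
        "gh-aw.workflow.name": workflow_name,
        "github.run_id": run_id,
        "service.version": service_version,
    })
    existing = env.get("OTEL_RESOURCE_ATTRIBUTES")
    # Append rather than overwrite, so attributes the runner already sets survive.
    env["OTEL_RESOURCE_ATTRIBUTES"] = f"{existing},{extra}" if existing else extra
    return env


# Hypothetical values; pass the result as env= to subprocess.run when launching the engine.
env = engine_env("Smoke CI", 123456789, "1.0.0", base={})
print(env["OTEL_RESOURCE_ATTRIBUTES"])
```

Whether these resource attributes then surface as queryable span attributes in Sentry should be confirmed on a test run before relying on the group-by.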
Generated by 📊 Daily Token Consumption Report (Sentry OTel) · opus48 2.2M