Executive Summary
Over the last 24 hours, agentic workflows in github/gh-aw consumed ~120.6M tokens across 5,588 model-call spans, with usage overwhelmingly input-dominated: ~117.8M input vs ~2.84M output tokens (a ~41:1 input:output ratio). Prompt/context size — not generation length — is the dominant cost driver.
Reliability is clean: companion checks on the errors and logs datasets both returned zero error events / error-level logs in the window.
The headline data-quality issue is structural, not a one-off: every token-bearing span carries null for gh-aw.workflow.name, transaction, and gen_ai.request.model. Token usage is emitted only on span.op:http.client spans (the instrumented model-API calls), while the workflow name lives on a separate gen_ai setup span in the same trace. Per-workflow attribution below was reconstructed by joining token spans to setup spans on trace — it is evidence-based but, by necessity, covers the heaviest traces rather than a complete per-workflow rollup.
Key Metrics
| Metric | Value |
| --- | --- |
| Events analyzed (model-call spans w/ token data) | 5,588 |
| Events with token data | 5,588 (100% of scope) |
| Total input tokens | 117,805,127 |
| Total output tokens | 2,836,882 |
| Total tokens | 120,642,009 |
| Unique workflows emitting telemetry | ≥ 50 (inventory query capped at 50) |
| Avg tokens/event | 21,615 |
| P95 tokens/event | 60,859 |
| Max tokens/event | 157,085 |
| Events missing workflow identifier | 5,588 (100%; see Data Quality) |
| Error events / error logs (24h) | 0 / 0 |
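As a quick consistency check on the headline figures, the short sketch below recomputes the total and the input:output ratio from the values in the table. The variable names are illustrative only; the numbers are copied from the table, not re-queried.

```python
# Headline figures copied from the Key Metrics table.
input_tokens = 117_805_127
output_tokens = 2_836_882
total_tokens = 120_642_009

# Input + output should reproduce the reported total exactly.
assert input_tokens + output_tokens == total_tokens

ratio = input_tokens / output_tokens        # input:output ratio
input_share = input_tokens / total_tokens   # fraction of all tokens that are input

print(f"input:output = {ratio:.1f}:1")
print(f"input share  = {input_share:.1%}")
```

Input tokens account for nearly all usage, which is why the recommendations focus on prompt and context size.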
Top Token Consumers (workflow rollup of the 12 heaviest traces)
Reconstructed via trace-join. Daily Ambient Context Optimizer recurs 3× in the heaviest traces, making it the clear top aggregate consumer; Smoke Codex recurs 2×. Counts are a lower bound (top-trace sample only), not full per-workflow totals.
| Workflow | Runs (traces) | Input Tokens | Output Tokens | Total Tokens |
| --- | --- | --- | --- | --- |
| Daily Ambient Context Optimizer | 3 | 7,378,432 | 69,272 | 7,447,704 |
| Smoke Codex | 2 | 5,063,812 | 29,600 | 5,093,412 |
| Daily CLI Tools Exploratory Tester | 1 | 2,546,859 | 17,135 | 2,563,994 |
| Ubuntu Actions Image Analyzer | 1 | 2,521,578 | 19,479 | 2,541,057 |
| Code Simplifier | 1 | 2,510,812 | 21,646 | 2,532,458 |
| Workflow Skill Extractor | 1 | 2,515,889 | 13,320 | 2,529,209 |
| Daily Firewall Logs Collector and Reporter | 1 | 2,473,342 | 37,006 | 2,510,348 |
| UK AI Operational Resilience | 1 | 2,400,766 | 48,018 | 2,448,784 |
| Package Specification Extractor | 1 | 1,929,515 | 10,195 | 1,939,710 |
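The rollup above can be reproduced from the per-trace rows listed in the next section. The sketch below is one way to do that grouping. The `TRACES` list and the `rollup` helper are illustrative names, and the tuples are copied from the per-trace table rather than fetched from Sentry.

```python
from collections import defaultdict

# (workflow, input_tokens, output_tokens) per trace, copied from the per-trace table.
TRACES = [
    ("Smoke Codex", 2_860_649, 18_025),
    ("Daily CLI Tools Exploratory Tester", 2_546_859, 17_135),
    ("Ubuntu Actions Image Analyzer", 2_521_578, 19_479),
    ("Code Simplifier", 2_510_812, 21_646),
    ("Workflow Skill Extractor", 2_515_889, 13_320),
    ("Daily Ambient Context Optimizer", 2_504_399, 22_853),
    ("Daily Firewall Logs Collector and Reporter", 2_473_342, 37_006),
    ("Daily Ambient Context Optimizer", 2_444_318, 20_148),
    ("Daily Ambient Context Optimizer", 2_429_715, 26_271),
    ("UK AI Operational Resilience", 2_400_766, 48_018),
    ("Smoke Codex", 2_203_163, 11_575),
    ("Package Specification Extractor", 1_929_515, 10_195),
]


def rollup(traces):
    """Sum runs, input, and output tokens per workflow, sorted by total descending."""
    acc = defaultdict(lambda: {"runs": 0, "input": 0, "output": 0})
    for workflow, inp, out in traces:
        row = acc[workflow]
        row["runs"] += 1
        row["input"] += inp
        row["output"] += out
    rows = [
        {"workflow": w, **r, "total": r["input"] + r["output"]}
        for w, r in acc.items()
    ]
    return sorted(rows, key=lambda r: r["total"], reverse=True)


for row in rollup(TRACES):
    print(f'{row["workflow"]}: {row["runs"]} run(s), {row["total"]:,} tokens')
```

Because only the 12 heaviest traces are included, the per-workflow sums remain lower bounds.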
Per-trace evidence (12 heaviest traces, fully confirmed)
Each row is one workflow run (trace), with token sums from the spans dataset (has:gen_ai.usage.total_tokens) and the workflow name resolved from that trace's setup span. The 12 traces total ~29.6M tokens (~24.5% of the 24h total); the remainder is spread across ~100+ lighter traces that were not individually attributed.
| Trace | Workflow | Calls | Input | Output | Total |
| --- | --- | --- | --- | --- | --- |
| 9dde8a13... | Smoke Codex | 46 | 2,860,649 | 18,025 | 2,878,674 |
| a97c3ad4... | Daily CLI Tools Exploratory Tester | 55 | 2,546,859 | 17,135 | 2,563,994 |
| 2172db96... | Ubuntu Actions Image Analyzer | 64 | 2,521,578 | 19,479 | 2,541,057 |
| 78992741... | Code Simplifier | 62 | 2,510,812 | 21,646 | 2,532,458 |
| e719065f... | Workflow Skill Extractor | 49 | 2,515,889 | 13,320 | 2,529,209 |
| 365b9906... | Daily Ambient Context Optimizer | 51 | 2,504,399 | 22,853 | 2,527,252 |
| 4927d2ab... | Daily Firewall Logs Collector and Reporter | 51 | 2,473,342 | 37,006 | 2,510,348 |
| 90dcd657... | Daily Ambient Context Optimizer | 60 | 2,444,318 | 20,148 | 2,464,466 |
| 42fbd029... | Daily Ambient Context Optimizer | 57 | 2,429,715 | 26,271 | 2,455,986 |
| c7e9cfec... | UK AI Operational Resilience | 48 | 2,400,766 | 48,018 | 2,448,784 |
| 0a1cbc61... | Smoke Codex | 36 | 2,203,163 | 11,575 | 2,214,738 |
| 4201497c... | Package Specification Extractor | 36 | 1,929,515 | 10,195 | 1,939,710 |
Verification: trace 9dde8a133d9e4fd64d365fac19169092 was confirmed end-to-end — it contains 46 token-bearing http.client spans plus a gen_ai setup span carrying gh-aw.workflow.name = "Smoke Codex", validating the trace-join continuity.
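The trace-join described above can be sketched as a two-pass grouping. This is a minimal illustration, assuming spans have been exported as flat dictionaries keyed by the attribute names used in this report (`trace`, `gh-aw.workflow.name`, `gen_ai.usage.total_tokens`). The `attribute_tokens` helper and the sample spans are hypothetical, and the sample values are synthetic, not taken from the dataset.

```python
from collections import defaultdict


def attribute_tokens(spans):
    """Join token-bearing spans to the workflow name carried by their trace's setup span."""
    # Pass 1: map each trace to the workflow name found on any span in that trace.
    workflow_by_trace = {}
    for span in spans:
        name = span.get("gh-aw.workflow.name")
        if name:
            workflow_by_trace[span["trace"]] = name

    # Pass 2: sum token usage per workflow, resolving the workflow through the trace.
    totals = defaultdict(int)
    for span in spans:
        tokens = span.get("gen_ai.usage.total_tokens")
        if tokens is not None:
            workflow = workflow_by_trace.get(span["trace"], "(unattributed)")
            totals[workflow] += tokens
    return dict(totals)


# Synthetic example: two token spans share a trace with a setup span, one does not.
sample = [
    {"trace": "t1", "gh-aw.workflow.name": "Workflow A"},
    {"trace": "t1", "gen_ai.usage.total_tokens": 100},
    {"trace": "t1", "gen_ai.usage.total_tokens": 200},
    {"trace": "t2", "gen_ai.usage.total_tokens": 50},
]
print(attribute_tokens(sample))
```

Token spans whose trace has no setup span fall into an explicit "(unattributed)" bucket, so gaps in coverage stay visible instead of being silently dropped.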
Workflow activity inventory (by run count, from setup spans)
Highest-frequency workflows by unique traces in 24h (these carry gh-aw.workflow.name but not token data — they cannot be summed for tokens directly):
- Smoke CI — 109 runs
- Auto-Triage Issues — 23
- AI Moderator — 21
- PR Sous Chef — 18
- Test Quality Sentinel — 15 · PR Code Quality Reviewer — 15 · Issue Monster — 15 · Matt Pocock Skills Reviewer — 15 · Design Decision Gate 🏗️ — 15
- Smoke Copilot — 9 · Daily Ambient Context Optimizer — 9
- (≥50 distinct workflows total; list capped at 50)
Note the divergence: high frequency (e.g. Smoke CI, 109 runs) does not imply high token consumption — the heaviest traces belong to lower-frequency, high-context workflows. Smoke CI's aggregate could not be confirmed because its per-run token spans are not among the heavy traces.
Data Quality and Gaps
- Workflow identifier missing on 100% of token spans. gen_ai.usage.* is emitted on span.op:http.client spans where gh-aw.workflow.name, transaction, and gen_ai.request.model are all null. Workflow attribution required a manual trace-join (group token spans by trace, then resolve the workflow from each trace's gen_ai setup span).
- search_events / Seer unavailable in this MCP build (no embedded LLM provider). All queries used list_events with direct Sentry query syntax against dataset:spans.
- Model + run_id not attributable on token spans: gen_ai.request.model is null and github.run_id returned null when grouping token/setup spans, so cost-by-model and run-level rollups are not possible from these spans today.
- Live-ingestion drift: the token-span count read as 5,581 then 5,588 across consecutive queries; headline uses the latest (5,588). Sums (~120.6M) are from the snapshot and may move slightly.
- Attribution coverage: only the 12 heaviest traces (~24.5% of total tokens) were individually attributed; full per-workflow totals would require resolving all ~100+ remaining traces.
- Token precedence: all usable records carried gen_ai.usage.*; no ai.*/usage.*/prompt_tokens aliases were observed, so no double-count risk applied.
Recommendations
- Propagate workflow context onto the model-call spans (highest leverage). In actions/setup/js/send_otlp_span.cjs, add gh-aw.workflow.name (and ideally gen_ai.request.model + github.run_id) as span attributes or resource attributes on the http.client spans that carry gen_ai.usage.*. This removes the need for trace-joins and unlocks one-query tokens-by-workflow and tokens-by-model reporting.
- Investigate Daily Ambient Context Optimizer, the top aggregate consumer among the attributed traces (≥7.4M tokens across 3 runs, ~2.48M/run). Audit how much repo/context it loads per run; trim or cache the ambient context to cut input tokens.
- Target the input side, not output. With a ~41:1 input:output ratio, savings come from prompt/context reduction: prune system prompts, scope file/context reads, and enable prompt caching for repeated context across the high-context workflows (Code Simplifier, Ubuntu Actions Image Analyzer, Workflow Skill Extractor, UK AI Operational Resilience).
- Right-size smoke tests. The two attributed Smoke Codex runs spent ~2.9M and ~2.2M tokens (~2.5M on average), and the larger one is the heaviest trace in the window. Confirm smoke workflows use minimal fixtures/prompts rather than full-context payloads.
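To make the first recommendation concrete, the sketch below shows the shape of the change: attaching workflow context as attributes on a span before export. It is written in Python purely for illustration. The real change would live in the JavaScript file named above, whose internals are not shown in this report. The `with_workflow_context` helper and the span dictionary layout are assumptions, modeled on the OTLP/JSON attribute encoding (`key` plus a typed `value`).

```python
def with_workflow_context(span, workflow_name, model=None, run_id=None):
    """Return a copy of an OTLP/JSON-style span dict with workflow context attributes added.

    Hypothetical helper: existing attributes are kept, and keys already present
    on the span are not overwritten.
    """
    extra = {
        "gh-aw.workflow.name": workflow_name,
        "gen_ai.request.model": model,
        "github.run_id": run_id,
    }
    attrs = list(span.get("attributes", []))
    existing = {a["key"] for a in attrs}
    for key, value in extra.items():
        if value is not None and key not in existing:
            # Values are encoded as strings here for simplicity.
            attrs.append({"key": key, "value": {"stringValue": str(value)}})
    return {**span, "attributes": attrs}


# Illustrative token-bearing span; the attribute value is synthetic.
token_span = {
    "name": "http.client",
    "attributes": [{"key": "gen_ai.usage.total_tokens", "value": {"intValue": "10"}}],
}
enriched = with_workflow_context(token_span, "Smoke Codex", run_id=123)
print([a["key"] for a in enriched["attributes"]])
```

With these attributes on the token-bearing spans themselves, a single grouped query over the spans dataset would be enough for tokens-by-workflow, with no join through the setup span.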
Generated by 📊 Daily Token Consumption Report (Sentry OTel) · opus48 12.8M · ◷