[token-consumption] Daily Token Consumption Report - 2026-05-25 #34646

@github-actions

Executive Summary

In the last 24 hours, agentic workflows in github/gh-aw emitted 5,475 token-bearing LLM API spans consuming 144.86M tokens (142.75M input / 2.12M output). The token mix is heavily input-dominated (about 98.5 % input): agents push large prompts (full repo context, logs) and receive comparatively small completions. claude-sonnet-4.6 carries 81.9 % of total tokens across 2,418 spans, with gpt-5.4-mini-2026-03-17 a distant second at 11.5 %.

The top 10 individual runs account for ~26 % of the day's tokens (~38.2M), led by a single Daily Firewall Logs Collector and Reporter run that consumed 10.75M tokens — 22× the median trace and ~7 % of the daily total on its own. This is the strongest single optimization target.

Observability gap: gh-aw.workflow.name is set on gen_ai parent spans but is not propagated to the http.client children that carry the gen_ai.usage.* token counts. A direct sum(gen_ai.usage.total_tokens) by gh-aw.workflow.name therefore returns a single null bucket, and per-workflow attribution required cross-referencing via trace IDs. For that reason, the figures below report top runs (one trace each) rather than workflow rollups.
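
The trace-ID cross-reference described above can be sketched as a two-pass join. This is a minimal illustration rather than the query the report actually ran: it assumes spans have been exported as plain dicts, and the key names `trace_id` and `span.op` as well as the sample data are hypothetical.

```python
from collections import defaultdict


def tokens_by_workflow(spans):
    """Attribute token totals to workflows by joining spans on trace_id."""
    # Pass 1: learn each trace's workflow name from its gen_ai parent spans.
    workflow_of_trace = {}
    for span in spans:
        name = span.get("gh-aw.workflow.name")
        if span.get("span.op") == "gen_ai" and name:
            workflow_of_trace[span["trace_id"]] = name

    # Pass 2: sum token counts carried by the http.client children.
    totals = defaultdict(int)
    for span in spans:
        tokens = span.get("gen_ai.usage.total_tokens")
        if span.get("span.op") == "http.client" and tokens is not None:
            workflow = workflow_of_trace.get(span["trace_id"], "(unresolved)")
            totals[workflow] += tokens
    return dict(totals)


# Hypothetical sample: trace t1 has a named parent, trace t2 does not.
sample_spans = [
    {"trace_id": "t1", "span.op": "gen_ai", "gh-aw.workflow.name": "workflow-a"},
    {"trace_id": "t1", "span.op": "http.client", "gen_ai.usage.total_tokens": 1200},
    {"trace_id": "t1", "span.op": "http.client", "gen_ai.usage.total_tokens": 800},
    {"trace_id": "t2", "span.op": "http.client", "gen_ai.usage.total_tokens": 500},
]
print(tokens_by_workflow(sample_spans))
```

Traces with no named parent fall into an `(unresolved)` bucket, which keeps the grand total intact while making the attribution gap visible.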

Key Metrics

| Metric | Value |
| --- | --- |
| Events analyzed (token-bearing http.client spans) | 5,475 |
| Events with token data | 5,475 |
| gen_ai parent spans (24h) | 3,879 |
| Total input tokens | 142,746,607 |
| Total output tokens | 2,118,233 |
| Total tokens | 144,864,840 |
| Unique workflows seen (parent-span attribute, top 50) | 50+ |
| Avg tokens / event | 26,459 |
| P95 tokens / event | 67,859 |
| Errors dataset events (24h) | 0 |
| Logs dataset events (24h) | 0 |

Tokens by Model

| Model | Spans | Input | Output | Total | Share |
| --- | --- | --- | --- | --- | --- |
| claude-sonnet-4.6 | 2,418 | 117,679,870 | 905,133 | 118,585,003 | 81.9 % |
| gpt-5.4-mini-2026-03-17 | 679 | 16,378,706 | 279,620 | 16,658,326 | 11.5 % |
| claude-haiku-4.5 | 122 | 4,715,764 | 35,735 | 4,751,499 | 3.3 % |
| gpt-5.5-2026-04-23 | 49 | 1,927,071 | 28,942 | 1,956,013 | 1.4 % |
| claude-opus-4-7 | 2,182 | 875,372 | 845,218 | 1,720,590 | 1.2 % |
| claude-sonnet-4.5 | 22 | 1,070,870 | 23,361 | 1,094,231 | 0.8 % |
| gpt-4.1-2025-04-14 | 3 | 98,954 | 224 | 99,178 | 0.07 % |
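
The Share column can be re-derived from the per-model totals. The sketch below uses only the figures in the table; the variable names are illustrative.

```python
# Per-model total tokens, copied from the Tokens by Model table.
model_totals = {
    "claude-sonnet-4.6": 118_585_003,
    "gpt-5.4-mini-2026-03-17": 16_658_326,
    "claude-haiku-4.5": 4_751_499,
    "gpt-5.5-2026-04-23": 1_956_013,
    "claude-opus-4-7": 1_720_590,
    "claude-sonnet-4.5": 1_094_231,
    "gpt-4.1-2025-04-14": 99_178,
}

grand_total = sum(model_totals.values())

# Share of the daily total per model, in percent.
shares = {model: 100 * total / grand_total for model, total in model_totals.items()}

for model, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{model:<26} {share:6.2f} %")
```

The per-model totals sum to 144,864,840, which matches the Total tokens row under Key Metrics.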

Top 10 Workflow Runs by Token Consumption

Resolved via trace_id joins between token-bearing http.client spans and the matching gen_ai parent that carries gh-aw.workflow.name / gh-aw.run.id.

| # | Workflow | Run | LLM Spans | Input | Output | Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Daily Firewall Logs Collector and Reporter | §26381491116 | 155 | 10,698,174 | 51,879 | 10,750,053 |
| 2 | daily-experiment-report | §26392702536 | 56 | 4,689,831 | 41,421 | 4,731,252 |
| 3 | Daily Syntax Error Quality Check | §26391962230 | 74 | 3,952,294 | 10,647 | 3,962,941 |
| 4 | Dead Code Removal Agent | §26364187746 | 62 | 3,724,824 | 17,330 | 3,742,154 |
| 5 | Daily Testify Uber Super Expert | §26368957854 | 50 | 3,007,447 | 17,496 | 3,024,943 |
| 6 | Q | §26364669602 | 49 | 2,588,363 | 10,226 | 2,598,589 |
| 7 | Daily Compiler Threat Spec Optimizer | §26381611018 | 43 | 2,492,355 | 13,216 | 2,505,571 |
| 8 | Copilot CLI Deep Research Agent | §26384338048 | 39 | 2,284,317 | 19,531 | 2,303,848 |
| 9 | Layout Specification Maintainer | §26391872642 | 38 | 2,285,306 | 10,070 | 2,295,376 |
| 10 | Daily Compiler Quality Check | §26381848944 | 35 | 2,246,855 | 18,009 | 2,264,864 |

Top-10 combined: ~38.2M tokens, ~26 % of daily total.
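
The concentration figures quoted for the top 10 runs follow from simple arithmetic on the Total column. A short check, using only numbers from this report:

```python
# Total tokens for the ten largest runs, in rank order.
top10_totals = [
    10_750_053, 4_731_252, 3_962_941, 3_742_154, 3_024_943,
    2_598_589, 2_505_571, 2_303_848, 2_295_376, 2_264_864,
]
daily_total = 144_864_840  # Total tokens from Key Metrics

top10_sum = sum(top10_totals)
top10_share = 100 * top10_sum / daily_total        # share of the day's tokens
largest_share = 100 * top10_totals[0] / daily_total  # the single largest run
avg_per_call = top10_totals[0] / 155               # largest run made 155 LLM calls

print(f"top-10 sum:    {top10_sum:,}")
print(f"top-10 share:  {top10_share:.1f} %")
print(f"largest run:   {largest_share:.1f} % of daily total")
print(f"avg per call:  {avg_per_call:,.0f} tokens")
```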

Data Quality and Gaps

  • Workflow attribute not propagated to LLM spans. gh-aw.workflow.name and gh-aw.run.id are populated on span.op:gen_ai parent spans only. On the span.op:http.client children that carry gen_ai.usage.input_tokens / output_tokens / total_tokens, these attributes are null. As a result, sum(gen_ai.usage.total_tokens) by gh-aw.workflow.name over the spans dataset returns a single null bucket (5,458 spans / 144.73M tokens). Per-workflow attribution required iterating over the top traces by token sum and querying trace:<id> has:gh-aw.workflow.name to resolve each one, which is feasible for the top N but not for the long tail.
  • The transaction field is null on all token-bearing spans, so transaction-level rollup is also unavailable.
  • The errors and logs datasets returned zero events for 24h, confirmed via empty list_events(dataset=errors) and list_events(dataset=logs). This is either silent success (no failures captured) or an instrumentation gap; the two cannot be distinguished from telemetry alone.
  • Token-precedence note: all data was sourced from gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.usage.total_tokens. No ai.*_tokens or usage.*_tokens aliases were present, so there is no double-counting risk in this window.
  • gen_ai parent spans (3,879) vs token-bearing spans (5,475): the count gap is expected, because a single agent turn often issues multiple LLM HTTP calls (tool-use loops, retries). Long-tail workflows below the top 25 traces are aggregated into model/global totals but not attributed by name.
  • One unresolved workflow name in the top spans dataset: a bucket labelled [Filtered] (20 spans) appeared in the gen_ai parent aggregation, indicating that Sentry data scrubbing redacted the workflow name for that run.
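
The token-precedence note above implies a first-match rule across attribute families, so that a span carrying the same count under two names is counted once. A minimal sketch of such a rule follows. The report only gives the alias families as wildcards (ai.*_tokens, usage.*_tokens), so the concrete alias keys below are assumed expansions for illustration, not confirmed attribute names.

```python
# Precedence order: gen_ai.usage.* first, then the alias families.
# The "ai." and "usage." keys are assumed expansions of the wildcards.
INPUT_KEYS = ("gen_ai.usage.input_tokens", "ai.input_tokens", "usage.input_tokens")
OUTPUT_KEYS = ("gen_ai.usage.output_tokens", "ai.output_tokens", "usage.output_tokens")
TOTAL_KEYS = ("gen_ai.usage.total_tokens", "ai.total_tokens", "usage.total_tokens")


def first_present(attrs, keys):
    """Return the value of the first key that is present and not None."""
    for key in keys:
        value = attrs.get(key)
        if value is not None:
            return value
    return None


def resolve_usage(attrs):
    """Pick one value per token field so aliases are never summed together."""
    input_tokens = first_present(attrs, INPUT_KEYS)
    output_tokens = first_present(attrs, OUTPUT_KEYS)
    total_tokens = first_present(attrs, TOTAL_KEYS)
    # Fall back to input + output only when no total was reported.
    if total_tokens is None and input_tokens is not None and output_tokens is not None:
        total_tokens = input_tokens + output_tokens
    return {"input": input_tokens, "output": output_tokens, "total": total_tokens}


# Hypothetical span that reports the same total under two names.
duplicated = {"gen_ai.usage.total_tokens": 100, "usage.total_tokens": 100}
print(resolve_usage(duplicated)["total"])
```
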
Top Traces by Tokens (raw)

Ranked aggregate over has:gen_ai.usage.total_tokens grouped by trace, top 25:

| Rank | Trace | LLM Spans | Total Tokens |
| --- | --- | --- | --- |
| 1 | dd64b9489dfccdacc29707d8b81ff798 | 155 | 10,750,053 |
| 2 | 5f88618ba71e17a9f9c6bf2f3de6b2f7 | 56 | 4,731,252 |
| 3 | 684d3866a4544953a15018858ddc80ec | 74 | 3,962,941 |
| 4 | 9f2914242ceb26a1001c4464fc571052 | 62 | 3,742,154 |
| 5 | 49846a6f751540f74fc88df0287e6137 | 50 | 3,024,943 |
| 6 | f7f54ad1d117ed5e78586ea5595e5467 | 49 | 2,598,589 |
| 7 | a6ccf2d1bf48f51c6768a47cb1c356a2 | 43 | 2,505,571 |
| 8 | 6046653e9a0e04c60565ac03a5ac00cd | 39 | 2,303,848 |
| 9 | a3ec39137084759ec6c31f7c645c8116 | 38 | 2,295,376 |
| 10 | c7ab4cf3e3f84e64ff9e119cbdbdaaa7 | 35 | 2,264,864 |
| 11 | 8a43d206c57044f44309ccaf35c74493 | 33 | 2,025,021 |
| 12 | 9576b78819df48bb5b1a9852bdae93b6 | 49 | 1,926,224 |
| 13 | ef3e9a7757f2046bfc4fdc77a4e65234 | 26 | 1,869,476 |
| 14 | 51870a5df94b26ad9ba336966f4605e5 | 36 | 1,794,813 |
| 15 | 9a6cafb05078be9bbd09785930e92440 | 42 | 1,728,219 |
| 16 | b42f5d6e60cae6c60dbd92caeafcba68 | 68 | 1,698,446 |
| 17 | 1004b6b975a7dd8064bd9a33efa3a75f | 43 | 1,603,427 |
| 18 | 9cc48e3ec74785148163b92e56f2a67e | 28 | 1,587,024 |
| 19 | b49e4fa159f1a277d3e5e74d4e4422ec | 37 | 1,586,080 |
| 20 | aa8cee327ae88a068175babb8de849b4 | 32 | 1,539,675 |
| 21 | 2e0a433b0bfc6df4d7ace21524aa6d6f | 51 | 1,494,140 |
| 22 | 9b528ee883a1d7233a3526bd62d32a83 | 28 | 1,488,746 |
| 23 | 233ab5df05d5e5257d1e5d7f9fd50220 | 41 | 1,485,784 |
| 24 | bc5988a0d81f705718e125e1018bf7e7 | 35 | 1,412,593 |
| 25 | bf468a29c4852bdecf1577aac3ed6819 | 27 | 1,343,796 |

Recommendations

  1. Propagate workflow identity onto LLM spans. Add gh-aw.workflow.name, gh-aw.workflow.id, and gh-aw.run.id as OTLP resource attributes in actions/setup/js/send_otlp_span.cjs (or wherever the SDK is initialized for agent runs). Today the auto-instrumented http.client spans for the Anthropic / OpenAI SDKs have these attributes as null, which blocks single-query workflow rollups in Sentry. Until this is fixed, daily reports must trace-walk to attribute tokens.
  2. Investigate Daily Firewall Logs Collector and Reporter run §26381491116. 10.75M tokens in one run (155 LLM calls, ~69K avg per call) is an outlier — single largest consumer in the window. Likely cause: passing raw firewall logs verbatim into a sonnet prompt on every iteration. Consider chunked summarization with a claude-haiku-4.5 first pass, or pre-aggregating logs before they reach the agent.
  3. Move bulk scheduled scans off claude-sonnet-4.6. Sonnet drives 81.9 % of all tokens; many of the top-10 consumers are linting / code-review style passes (Daily Syntax Error Quality Check, Daily Compiler Quality Check, Dead Code Removal Agent, Layout Specification Maintainer) where claude-haiku-4.5 produces comparable structured output at roughly 5× lower input cost. Pilot one of these on haiku and compare PR quality before broader rollout.
  4. Add a per-run token soft cap with checkpoint summarization. 11 of the top 25 traces each exceed 2M tokens, almost entirely input. A max_prompt_tokens budget that triggers summarization of history before re-prompting would clip the long tail of runaway agent loops without changing the model mix.
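
The soft cap in recommendation 4 could take roughly the following shape. This is a sketch under stated assumptions, not an existing gh-aw feature: `enforce_budget`, `count_tokens`, and the `summarize` callback are hypothetical names, token counting is approximated by whitespace splitting rather than a real tokenizer, and in practice `summarize` would call a cheaper model.

```python
def count_tokens(text):
    """Crude token estimate (whitespace split); a real tokenizer would differ."""
    return len(text.split())


def enforce_budget(history, max_prompt_tokens, summarize, keep_recent=2):
    """Collapse older messages into one summary when the history exceeds the budget.

    The most recent `keep_recent` messages are kept verbatim. The result is not
    guaranteed to fit the budget; callers may need to re-check and repeat.
    """
    if sum(count_tokens(message) for message in history) <= max_prompt_tokens:
        return list(history)

    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    if not older:
        return list(history)  # nothing old enough to summarize
    return [summarize(older)] + recent


def stub_summarize(messages):
    """Stand-in for a model call that condenses earlier turns."""
    return f"[summary of {len(messages)} earlier messages]"


# Hypothetical history: 14 whitespace tokens against a budget of 10.
history = ["a b c d e", "f g h i j", "k l", "m n"]
trimmed = enforce_budget(history, max_prompt_tokens=10, summarize=stub_summarize)
print(trimmed)
```

Because only the prompt history is rewritten, this kind of cap is independent of which model serves the run, which is why it can be piloted separately from the model-mix change in recommendation 3.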

References

Generated by 📊 Daily Token Consumption Report (Sentry OTel) · opus47 12.1M ·

  • expires on May 26, 2026, 1:02 PM UTC
