[token-consumption] Daily Token Consumption Report - 2026-05-25 #34646

### Executive Summary

In the last 24 hours, agentic workflows in `github/gh-aw` emitted **5,475 token-bearing LLM API spans** consuming **144.86M tokens** (142.75M input / 2.12M output). The token mix is heavily input-dominated (about 98.5 % input): agents push large prompts (full repo context, logs) and receive comparatively small completions. **`claude-sonnet-4.6` carries 81.9 % of total tokens** across 2,418 spans, with `gpt-5.4-mini-2026-03-17` a distant second at 11.5 %.

The **top 10 individual runs account for ~26 % of the day's tokens (~38.2M)**, led by a single `Daily Firewall Logs Collector and Reporter` run that consumed 10.75M tokens — 22× the median trace and ~7 % of the daily total on its own. This is the strongest single optimization target.

**Observability gap:** `gh-aw.workflow.name` is set on `gen_ai` parent spans but is **not propagated to the `http.client` children that carry `gen_ai.usage.*` token counts**. A direct `sum(gen_ai.usage.total_tokens) by gh-aw.workflow.name` query therefore returns only a null bucket, and per-workflow attribution required cross-referencing via `trace` IDs. For that reason, the figures below report **top runs (one trace each)** rather than workflow rollups.

### Key Metrics

| Metric | Value |
|---|---|
| Events analyzed (token-bearing `http.client` spans) | 5,475 |
| Events with token data | 5,475 |
| `gen_ai` parent spans (24h) | 3,879 |
| Total input tokens | 142,746,607 |
| Total output tokens | 2,118,233 |
| Total tokens | 144,864,840 |
| Unique workflows seen (parent-span attribute, top 50) | 50+ |
| Avg tokens / event | 26,459 |
| P95 tokens / event | 67,859 |
| Errors dataset events (24h) | 0 |
| Logs dataset events (24h) | 0 |

### Tokens by Model

| Model | Spans | Input | Output | Total | Share |
|---|---:|---:|---:|---:|---:|
| claude-sonnet-4.6 | 2,418 | 117,679,870 | 905,133 | 118,585,003 | 81.9 % |
| gpt-5.4-mini-2026-03-17 | 679 | 16,378,706 | 279,620 | 16,658,326 | 11.5 % |
| claude-haiku-4.5 | 122 | 4,715,764 | 35,735 | 4,751,499 | 3.3 % |
| gpt-5.5-2026-04-23 | 49 | 1,927,071 | 28,942 | 1,956,013 | 1.4 % |
| claude-opus-4-7 | 2,182 | 875,372 | 845,218 | 1,720,590 | 1.2 % |
| claude-sonnet-4.5 | 22 | 1,070,870 | 23,361 | 1,094,231 | 0.8 % |
| gpt-4.1-2025-04-14 | 3 | 98,954 | 224 | 99,178 | 0.07 % |
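
The Share column follows directly from the Total column. As a quick consistency check, a minimal Python sketch using the figures from the table above (the variable names are illustrative only):

```python
# Per-model totals, copied from the "Total" column of the table above.
totals = {
    "claude-sonnet-4.6": 118_585_003,
    "gpt-5.4-mini-2026-03-17": 16_658_326,
    "claude-haiku-4.5": 4_751_499,
    "gpt-5.5-2026-04-23": 1_956_013,
    "claude-opus-4-7": 1_720_590,
    "claude-sonnet-4.5": 1_094_231,
    "gpt-4.1-2025-04-14": 99_178,
}

# The sum should match the "Total tokens" row in Key Metrics.
grand_total = sum(totals.values())

# Share of the daily total per model, as a percentage.
shares = {model: 100 * count / grand_total for model, count in totals.items()}

for model, pct in shares.items():
    print(f"{model:26s} {pct:6.2f} %")
```

The per-model totals sum to the 144,864,840 reported under Key Metrics, so the model breakdown accounts for every token-bearing span in the window.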

### Top 10 Workflow Runs by Token Consumption

Resolved via `trace_id` joins between token-bearing `http.client` spans and the matching `gen_ai` parent that carries `gh-aw.workflow.name` / `gh-aw.run.id`.

| # | Workflow | Run | LLM Spans | Input | Output | Total |
|---:|---|---|---:|---:|---:|---:|
| 1 | Daily Firewall Logs Collector and Reporter | [§26381491116](https://github.com/github/gh-aw/actions/runs/26381491116) | 155 | 10,698,174 | 51,879 | 10,750,053 |
| 2 | daily-experiment-report | [§26392702536](https://github.com/github/gh-aw/actions/runs/26392702536) | 56 | 4,689,831 | 41,421 | 4,731,252 |
| 3 | Daily Syntax Error Quality Check | [§26391962230](https://github.com/github/gh-aw/actions/runs/26391962230) | 74 | 3,952,294 | 10,647 | 3,962,941 |
| 4 | Dead Code Removal Agent | [§26364187746](https://github.com/github/gh-aw/actions/runs/26364187746) | 62 | 3,724,824 | 17,330 | 3,742,154 |
| 5 | Daily Testify Uber Super Expert | [§26368957854](https://github.com/github/gh-aw/actions/runs/26368957854) | 50 | 3,007,447 | 17,496 | 3,024,943 |
| 6 | Q | [§26364669602](https://github.com/github/gh-aw/actions/runs/26364669602) | 49 | 2,588,363 | 10,226 | 2,598,589 |
| 7 | Daily Compiler Threat Spec Optimizer | [§26381611018](https://github.com/github/gh-aw/actions/runs/26381611018) | 43 | 2,492,355 | 13,216 | 2,505,571 |
| 8 | Copilot CLI Deep Research Agent | [§26384338048](https://github.com/github/gh-aw/actions/runs/26384338048) | 39 | 2,284,317 | 19,531 | 2,303,848 |
| 9 | Layout Specification Maintainer | [§26391872642](https://github.com/github/gh-aw/actions/runs/26391872642) | 38 | 2,285,306 | 10,070 | 2,295,376 |
| 10 | Daily Compiler Quality Check | [§26381848944](https://github.com/github/gh-aw/actions/runs/26381848944) | 35 | 2,246,855 | 18,009 | 2,264,864 |

**Top-10 combined: ~38.2M tokens, ~26 % of daily total.**
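
The trace-ID join behind this table can be sketched roughly as follows. This is a hedged illustration, not the query the report actually ran: the record shapes, the field names `trace`, `total_tokens`, and `workflow`, and the helper `tokens_by_workflow` are assumptions, and the sample data is a toy set rather than report data.

```python
from collections import defaultdict


def tokens_by_workflow(child_spans, parent_spans):
    """Sum child-span tokens per workflow by joining on trace ID.

    child_spans:  dicts with "trace" and "total_tokens", standing in for the
                  token-bearing http.client spans.
    parent_spans: dicts with "trace" and "workflow", standing in for the
                  gen_ai parents that carry gh-aw.workflow.name.
    """
    # Map each trace to the workflow name found on its parent span.
    workflow_of = {
        p["trace"]: p["workflow"] for p in parent_spans if p.get("workflow")
    }

    totals = defaultdict(int)
    for span in child_spans:
        # Traces with no named parent go into an explicit bucket instead of
        # being dropped silently.
        name = workflow_of.get(span["trace"], "(unattributed)")
        totals[name] += span["total_tokens"]
    return dict(totals)


# Toy data for illustration only.
children = [
    {"trace": "t1", "total_tokens": 700},
    {"trace": "t1", "total_tokens": 300},
    {"trace": "t2", "total_tokens": 500},
    {"trace": "t3", "total_tokens": 200},
]
parents = [
    {"trace": "t1", "workflow": "workflow-a"},
    {"trace": "t2", "workflow": "workflow-b"},
]

result = tokens_by_workflow(children, parents)
print(result)
```

Keeping an explicit unattributed bucket makes the size of the long tail visible, which matters here because only the top traces were resolved by name.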

<details>
<summary>Data Quality and Gaps</summary>

- **Workflow attribute not propagated to LLM spans.** `gh-aw.workflow.name` and `gh-aw.run.id` are populated on `span.op:gen_ai` parent spans only. The `span.op:http.client` children that carry `gen_ai.usage.input_tokens` / `output_tokens` / `total_tokens` have these attributes as `null`. As a result, `sum(gen_ai.usage.total_tokens) by gh-aw.workflow.name` over the spans dataset returns a single null bucket (5,458 spans / 144.73M tokens). Per-workflow attribution required iterating over the top traces by token sum and querying `trace:<id> has:gh-aw.workflow.name` to resolve each one, which is feasible for the top N but not for the long tail.
- **`transaction` field is null** on all token-bearing spans, so transaction-level rollup is also unavailable.
- **`errors` and `logs` datasets returned zero events for 24h** — confirmed via empty `list_events(dataset=errors)` and `list_events(dataset=logs)`. This is either silent success (no failures captured) or an instrumentation gap; cannot disambiguate from telemetry alone.
- **Token-precedence note:** all data was sourced from `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and `gen_ai.usage.total_tokens`. No `ai.*_tokens` or `usage.*_tokens` aliases were present; no double-counting risk in this window.
- **`gen_ai` parent spans (3,879) vs token-bearing spans (5,475):** the count gap is expected — a single agent turn often issues multiple LLM HTTP calls (tool-use loops, retries). Long-tail workflows below the top 25 traces are aggregated into model/global totals but not attributed by name.
- **One unresolved workflow name:** a bucket labelled `[Filtered]` (20 spans) appeared in the `gen_ai` parent aggregation, indicating that Sentry data scrubbing redacted the workflow name for that run.

</details>
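
One possible way to close the propagation gap listed above is to carry workflow identity at the OTLP resource level, so that it is attached to the whole batch rather than to individual parent spans. The sketch below only illustrates the OTLP/JSON shape. The helpers `otlp_attr` and `build_payload` are hypothetical and are not taken from `send_otlp_span.cjs`, the spans are heavily simplified (real ones need trace IDs, span IDs, and timestamps), and whether Sentry exposes resource attributes for span-level grouping is an assumption that would need verifying.

```python
import json


def otlp_attr(key, value):
    """Encode one string attribute in OTLP/JSON key-value form."""
    return {"key": key, "value": {"stringValue": str(value)}}


def build_payload(workflow_name, run_id, spans):
    """Wrap spans in one resourceSpans entry that carries workflow identity.

    Hypothetical helper: every span in the batch shares the resource, so the
    identity does not depend on which span was instrumented by hand.
    """
    return {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        otlp_attr("gh-aw.workflow.name", workflow_name),
                        otlp_attr("gh-aw.run.id", run_id),
                    ]
                },
                "scopeSpans": [
                    {"scope": {"name": "example-scope"}, "spans": spans}
                ],
            }
        ]
    }


# Simplified spans: a gen_ai parent and an http.client child.
payload = build_payload(
    "Daily Firewall Logs Collector and Reporter",
    26381491116,
    [{"name": "gen_ai parent"}, {"name": "http.client child"}],
)
print(json.dumps(payload, indent=2))
```

If the auto-instrumented `http.client` spans are exported by a different SDK instance than the parent spans, the same resource attributes would have to be configured there as well.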

<details>
<summary>Top Traces by Tokens (raw)</summary>

Ranked aggregate over `has:gen_ai.usage.total_tokens` grouped by `trace`, top 25:

| Rank | Trace | LLM Spans | Total Tokens |
|---:|---|---:|---:|
| 1 | dd64b9489dfccdacc29707d8b81ff798 | 155 | 10,750,053 |
| 2 | 5f88618ba71e17a9f9c6bf2f3de6b2f7 | 56 | 4,731,252 |
| 3 | 684d3866a4544953a15018858ddc80ec | 74 | 3,962,941 |
| 4 | 9f2914242ceb26a1001c4464fc571052 | 62 | 3,742,154 |
| 5 | 49846a6f751540f74fc88df0287e6137 | 50 | 3,024,943 |
| 6 | f7f54ad1d117ed5e78586ea5595e5467 | 49 | 2,598,589 |
| 7 | a6ccf2d1bf48f51c6768a47cb1c356a2 | 43 | 2,505,571 |
| 8 | 6046653e9a0e04c60565ac03a5ac00cd | 39 | 2,303,848 |
| 9 | a3ec39137084759ec6c31f7c645c8116 | 38 | 2,295,376 |
| 10 | c7ab4cf3e3f84e64ff9e119cbdbdaaa7 | 35 | 2,264,864 |
| 11 | 8a43d206c57044f44309ccaf35c74493 | 33 | 2,025,021 |
| 12 | 9576b78819df48bb5b1a9852bdae93b6 | 49 | 1,926,224 |
| 13 | ef3e9a7757f2046bfc4fdc77a4e65234 | 26 | 1,869,476 |
| 14 | 51870a5df94b26ad9ba336966f4605e5 | 36 | 1,794,813 |
| 15 | 9a6cafb05078be9bbd09785930e92440 | 42 | 1,728,219 |
| 16 | b42f5d6e60cae6c60dbd92caeafcba68 | 68 | 1,698,446 |
| 17 | 1004b6b975a7dd8064bd9a33efa3a75f | 43 | 1,603,427 |
| 18 | 9cc48e3ec74785148163b92e56f2a67e | 28 | 1,587,024 |
| 19 | b49e4fa159f1a277d3e5e74d4e4422ec | 37 | 1,586,080 |
| 20 | aa8cee327ae88a068175babb8de849b4 | 32 | 1,539,675 |
| 21 | 2e0a433b0bfc6df4d7ace21524aa6d6f | 51 | 1,494,140 |
| 22 | 9b528ee883a1d7233a3526bd62d32a83 | 28 | 1,488,746 |
| 23 | 233ab5df05d5e5257d1e5d7f9fd50220 | 41 | 1,485,784 |
| 24 | bc5988a0d81f705718e125e1018bf7e7 | 35 | 1,412,593 |
| 25 | bf468a29c4852bdecf1577aac3ed6819 | 27 | 1,343,796 |

</details>

### Recommendations

1. **Propagate workflow identity onto LLM spans.** Add `gh-aw.workflow.name`, `gh-aw.workflow.id`, and `gh-aw.run.id` as OTLP **resource attributes** in `actions/setup/js/send_otlp_span.cjs` (or wherever the SDK is initialized for agent runs). Today the auto-instrumented `http.client` spans for the Anthropic / OpenAI SDKs have these attributes as `null`, which blocks single-query workflow rollups in Sentry. Until this is fixed, daily reports must trace-walk to attribute tokens.
2. **Investigate `Daily Firewall Logs Collector and Reporter` run [§26381491116](https://github.com/github/gh-aw/actions/runs/26381491116).** 10.75M tokens in one run (155 LLM calls, ~69K avg per call) is an outlier — single largest consumer in the window. Likely cause: passing raw firewall logs verbatim into a sonnet prompt on every iteration. Consider chunked summarization with a `claude-haiku-4.5` first pass, or pre-aggregating logs before they reach the agent.
3. **Move bulk scheduled scans off `claude-sonnet-4.6`.** Sonnet drives 81.9 % of all tokens; many of the top-10 consumers are linting / code-review style passes (`Daily Syntax Error Quality Check`, `Daily Compiler Quality Check`, `Dead Code Removal Agent`, `Layout Specification Maintainer`) where `claude-haiku-4.5` produces comparable structured output at roughly 5× lower input cost. Pilot one of these on haiku and compare PR quality before broader rollout.
4. **Add a per-run token soft cap with checkpoint summarization.** 11 of the top 25 traces each exceed 2M tokens, almost entirely input. Introducing a `max_prompt_tokens` budget that triggers summarization of history before re-prompting would clip the long tail of runaway agent loops without changing the model mix.
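
The soft cap in recommendation 4 could look roughly like the sketch below. It is a minimal illustration under stated assumptions: `estimate_tokens` uses a crude four-characters-per-token heuristic in place of a real tokenizer, `fit_history` and its `keep_recent` parameter are hypothetical names, and `summarize` is a stub standing in for a call to a cheaper model.

```python
def estimate_tokens(text):
    """Rough token estimate (about 4 characters per token).

    This is an assumption for the sketch; a real tokenizer should replace it.
    """
    return max(1, len(text) // 4)


def fit_history(history, max_prompt_tokens, summarize, keep_recent=2):
    """Return a history that respects the budget, summarizing old turns.

    history:   list of message strings, oldest first.
    summarize: callable taking a list of strings and returning one short
               string (a stub here).
    """
    total = sum(estimate_tokens(message) for message in history)
    if total <= max_prompt_tokens or len(history) <= keep_recent:
        return history

    # Checkpoint: collapse everything except the most recent turns into a
    # single summary message.
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


def stub_summarize(messages):
    """Placeholder summarizer used only for this example."""
    return f"[summary of {len(messages)} earlier messages]"


history = ["x" * 4000, "y" * 4000, "short question", "short answer"]
trimmed = fit_history(history, max_prompt_tokens=500, summarize=stub_summarize)
print(trimmed)
```

The sketch does not re-check the budget after summarizing, so very large recent turns could still exceed it. A production version would likely loop or truncate until the prompt fits.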

### References

- [Sentry — token-bearing spans, last 24h](https://github.sentry.io/explore/traces/?query=has:gen_ai.usage.total_tokens&project=4511347087179777&statsPeriod=24h&table=span)
- [Sentry — by model, last 24h](https://github.sentry.io/explore/traces/?query=has:gen_ai.usage.total_tokens&project=4511347087179777&aggregateField=%7B%22groupBy%22:%22gen_ai.response.model%22%7D&aggregateField=%7B%22yAxes%22:%5B%22sum%28gen_ai.usage.total_tokens%29%22%5D%7D&mode=aggregate&sort=-sum%28gen_ai.usage.total_tokens%29&statsPeriod=24h&table=span)
- [Top run — Daily Firewall Logs Collector and Reporter §26381491116](https://github.com/github/gh-aw/actions/runs/26381491116)







> Generated by [📊 Daily Token Consumption Report (Sentry OTel)](https://github.com/github/gh-aw/actions/runs/26401490457) · opus47 12.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-token-consumption-report%22&type=issues)
> - [x] expires on May 26, 2026, 1:02 PM UTC
