[otel-advisor] OTel improvement: source gh-aw.effective_tokens from agent_usage.json (native cost metric is on 0 spans)

### 📡 OTel Instrumentation Improvement: make `gh-aw.effective_tokens` reliable by reading the durable `agent_usage.json` artifact

**Analysis Date**: 2026-05-30 
**Priority**: High 
**Effort**: Small (< 2h)

### Problem

`sendJobConclusionSpan` in `actions/setup/js/send_otlp_span.cjs` is supposed to emit `gh-aw.effective_tokens` — gh-aw's **engine-agnostic** per-run token-cost metric — on every conclusion span. But it sources the value **only** from the `GH_AW_EFFECTIVE_TOKENS` environment variable:

```javascript
// send_otlp_span.cjs:1745-1747
const rawET = process.env.GH_AW_EFFECTIVE_TOKENS || "";
const effectiveTokens = rawET ? parseInt(rawET, 10) : NaN;
```

That env var is exported via `core.exportVariable` (i.e. `$GITHUB_ENV`) by `parse_token_usage.cjs` **inside the agent job**, but it is not propagated into the environment of the OTLP conclusion **post-step**; it is only explicitly wired into the `safe_outputs` job via `needs.agent.outputs.effective_tokens`. As a result, when the conclusion span is built, `GH_AW_EFFECTIVE_TOKENS` is unset, `effectiveTokens` is `NaN`, and the attribute is silently dropped.

**Live telemetry confirms this is a 100% silent failure**, not a rollout blip:

- Sentry spans dataset (`github` org, `gh-aw` project): `has:gh-aw.effective_tokens` → **0 spans over the last 30 days**.
- In the **last 24h**: **0 of 344** `gh-aw.agent.conclusion` spans carry `gh-aw.effective_tokens` **or** `gen_ai.usage.total_tokens` — across all engines (copilot 215, claude 62, codex 34, pi 11, gemini 11, antigravity 11).

A DevOps engineer therefore **cannot answer "how many tokens / how much did this run cost?"** from OTel today — the one native, engine-agnostic cost attribute gh-aw emits never reaches the backend.

<details>
<summary>Why This Matters (DevOps Perspective)</summary>

`gh-aw.effective_tokens` is the single attribute that normalizes cost across **all** engines (copilot, claude, codex, gemini, pi, antigravity). The OTel GenAI `gen_ai.usage.*` attributes are engine-dependent and, per live data, reach only ~4% of runs (434 `gh-aw.agent.conclusion` spans over 30 days, 0 in the last 24h) because they depend on a result event in `agent-stdio.log` that several engines never emit.

With `gh-aw.effective_tokens` reliably present, these become possible with no per-engine special-casing:

- **Dashboards**: `sum(gh-aw.effective_tokens)` per workflow / per engine / per day — token burn-down and cost attribution.
- **Alerts**: page when a workflow's effective tokens spike vs. its baseline (run-away agent detection).
- **Triage**: correlate `gh-aw.run.status:failure` (68/24h) with token consumption to spot timeouts caused by context exhaustion.

Today all of these silently return empty, which reads as "zero cost" rather than "no data" — the most dangerous kind of observability gap.
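
The "zero cost" versus "no data" distinction can be made explicit in any code that consumes the spans. The following is a minimal sketch, assuming spans are plain objects with an `attributes` map; that shape and the name `sumEffectiveTokens` are hypothetical and are not taken from the gh-aw or Sentry schemas.

```javascript
// Hypothetical span shape: { attributes: { "gh-aw.effective_tokens": number, ... } }
function sumEffectiveTokens(spans) {
  let total = 0;
  let carriers = 0; // number of spans that actually carry the attribute
  for (const span of spans) {
    const value = span.attributes ? span.attributes["gh-aw.effective_tokens"] : undefined;
    if (typeof value === "number" && Number.isFinite(value)) {
      total += value;
      carriers += 1;
    }
  }
  // null means "no span carried the attribute", which is distinct from a real zero.
  return carriers === 0 ? null : total;
}
```

Returning `null` rather than `0` when no span carries the attribute lets a dashboard or alert rule treat the current situation as missing data instead of a cost of zero.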

</details>

<details>
<summary>Current Behavior</summary>

The conclusion span already reads the durable `agent_usage.json` artifact — but only for `gen_ai.usage.*`, ignoring the `effective_tokens` field that the very same file contains:

```javascript
// send_otlp_span.cjs:1745-1747 — effective tokens: ENV ONLY
const rawET = process.env.GH_AW_EFFECTIVE_TOKENS || "";
const effectiveTokens = rawET ? parseInt(rawET, 10) : NaN;

// send_otlp_span.cjs:1905-1906 — emitted only when env was present (it never is)
if (!isNaN(effectiveTokens) && effectiveTokens > 0) {
  attributes.push(buildAttr("gh-aw.effective_tokens", effectiveTokens));
}

// send_otlp_span.cjs:2092 — agent_usage.json IS read here, but only for gen_ai.usage.*
const agentUsage = readJSONIfExists("/tmp/gh-aw/agent_usage.json") || runtimeMetrics.tokenUsage || {};
```

Meanwhile `parse_token_usage.cjs` writes the value to disk every run:

```javascript
// parse_token_usage.cjs:129-142
const agentUsage = {
  input_tokens: summary.totalInputTokens,
  output_tokens: summary.totalOutputTokens,
  cache_read_tokens: summary.totalCacheReadTokens,
  cache_write_tokens: summary.totalCacheWriteTokens,
  effective_tokens: effectiveTokens, // <-- durable, but never read by the OTLP span
  ...(primaryModel ? { primary_model: primaryModel } : {}),
};
fs.writeFileSync(AGENT_USAGE_PATH, JSON.stringify(agentUsage) + "\n"); // /tmp/gh-aw/agent_usage.json
if (effectiveTokens > 0) {
  core.exportVariable("GH_AW_EFFECTIVE_TOKENS", String(effectiveTokens)); // <-- not visible in the post-step
}
```

</details>

<details>
<summary>Proposed Change</summary>

Fall back to the on-disk `agent_usage.json` artifact (already bundled in the agent artifact and present on disk for every job) when the env var is missing, mirroring how `gen_ai.usage.*` is already sourced. Also gate the attribute to the agent job so `sum(gh-aw.effective_tokens)` is not inflated across the multiple downstream jobs that download the same artifact.

```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (~line 1745)
// Prefer the GH_AW_EFFECTIVE_TOKENS env var, but fall back to the durable
// agent_usage.json artifact: the env var is exported to GITHUB_ENV inside the
// agent job and is NOT visible in the OTLP conclusion post-step, so relying on
// it alone drops the attribute on 100% of spans.
const rawET = process.env.GH_AW_EFFECTIVE_TOKENS || "";
let effectiveTokens = rawET ? parseInt(rawET, 10) : NaN;
if (!(Number.isFinite(effectiveTokens) && effectiveTokens > 0)) {
  const usageForET = readJSONIfExists("/tmp/gh-aw/agent_usage.json");
  if (usageForET && typeof usageForET.effective_tokens === "number" && usageForET.effective_tokens > 0) {
    effectiveTokens = usageForET.effective_tokens;
  }
}
```

```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (~line 1905)
// Gate to the agent job to avoid double-counting across downstream jobs that
// also have agent_usage.json on disk (same rationale as gen_ai.usage.* below).
if (jobName === "agent" && Number.isFinite(effectiveTokens) && effectiveTokens > 0) {
  attributes.push(buildAttr("gh-aw.effective_tokens", effectiveTokens));
}
```
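
For reasoning about the precedence and the job gate in isolation, the two snippets above can be condensed into pure functions. This is only an illustrative sketch: `resolveEffectiveTokens` and `shouldEmitEffectiveTokens` are hypothetical names that do not exist in `send_otlp_span.cjs`, `env` stands in for `process.env`, and `usage` stands in for the parsed `agent_usage.json` (or `null` when the file is absent).

```javascript
// Illustrative stand-ins, not the actual send_otlp_span.cjs code.
function resolveEffectiveTokens(env, usage) {
  const raw = env.GH_AW_EFFECTIVE_TOKENS || "";
  const fromEnv = raw ? parseInt(raw, 10) : NaN;
  // A valid, positive env value takes precedence.
  if (Number.isFinite(fromEnv) && fromEnv > 0) return fromEnv;
  // Otherwise fall back to the durable artifact, if it carries a positive number.
  if (usage && typeof usage.effective_tokens === "number" && usage.effective_tokens > 0) {
    return usage.effective_tokens;
  }
  return NaN;
}

// Emit only on the agent job so downstream jobs do not double-count.
function shouldEmitEffectiveTokens(jobName, effectiveTokens) {
  return jobName === "agent" && Number.isFinite(effectiveTokens) && effectiveTokens > 0;
}

console.log(resolveEffectiveTokens({}, { effective_tokens: 1200 })); // 1200
console.log(resolveEffectiveTokens({ GH_AW_EFFECTIVE_TOKENS: "900" }, { effective_tokens: 1200 })); // 900
console.log(shouldEmitEffectiveTokens("safe_outputs", 1200)); // false
```

Keeping the env var as the preferred source preserves current behavior wherever it is already wired in, while the file fallback covers the post-step case described in the Problem section.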

</details>

<details>
<summary>Expected Outcome</summary>

After this change:

- **In Grafana / Honeycomb / Datadog / Sentry**: `gh-aw.effective_tokens` becomes present on `gh-aw.agent.conclusion` spans for **every** engine, enabling `sum`/`avg`/`p95` token-cost dashboards and threshold alerts with no per-engine special-casing.
- **In the JSONL mirror**: the agent conclusion span gains a populated `gh-aw.effective_tokens` attribute, so post-hoc artifact debugging shows run cost without a live collector.
- **For on-call engineers**: failed/timed-out runs can be correlated with token burn (context-exhaustion timeouts become visible).

(Note: Sentry's EAP currently types `gh-aw.*` custom attributes as *string* fields, so `avg()`/`sum()` in Sentry still rejects them — a Sentry schema-inference behavior, not a gh-aw wire-format bug, so out of scope here. Grafana/Honeycomb/Datadog aggregate them fine.)

</details>

<details>
<summary>Implementation Steps</summary>

- [ ] Edit `actions/setup/js/send_otlp_span.cjs`: add the `agent_usage.json` fallback for `effectiveTokens` (~line 1745) and gate emission to `jobName === "agent"` (~line 1905).
- [ ] Update `actions/setup/js/send_otlp_span.test.cjs` to assert `gh-aw.effective_tokens` is emitted from `agent_usage.json` when `GH_AW_EFFECTIVE_TOKENS` is unset, and is absent on non-agent jobs.
- [ ] Run `make test-unit` (or `cd actions/setup/js && npx vitest run send_otlp_span`) to confirm tests pass.
- [ ] Run `make fmt` to ensure formatting.
- [ ] Open a PR referencing this issue.

</details>

<details>
<summary>Evidence from Live OTel Data (Sentry / Grafana)</summary>

**Backend used**: Sentry spans dataset — org `github`, project `gh-aw`, region `https://us.sentry.io`. (Grafana has a Tempo datasource `grafanacloud-ghaw-traces`, but the Grafana MCP build available to this run exposes only `list_datasources`/`get_datasource` — no `tempo_traceql-search`/`tempo_get-trace` — so Tempo trace querying was not possible. Noted as a backend/tooling limitation; Sentry provided sufficient evidence.)

**Pipeline is healthy** (rules out a broad export problem):
- `span.name:gh-aw.*` over 24h → setup+conclusion spans for `activation` (348), `conclusion` (348), `agent` (345 setup / 344 conclusion), `pre_activation` (287), `safe_outputs` (282), `detection` (234).
- Trace continuity intact: trace `797e7af5c08fc5b14427502603b2e4b0` joins gh-aw lifecycle spans with `mcp.tool_call` / `gateway.backend.execute` children under one trace.
- `gh-aw.run.status` populated: `success` 1994 / `failure` 68 (24h).
- Resource attributes verified present (HEAD local mirror + Sentry): `service.version`, `github.repository`, `github.run_id`, `github.event_name`, `deployment.environment`.

**The gap**:
- `has:gh-aw.effective_tokens` → **0 spans / 30 days** (and 0 / 24h).
- `span.name:gh-aw.agent.conclusion has:gh-aw.effective_tokens` grouped by `gh-aw.engine.id` → **No results** (24h).
- For contrast, `gen_ai.usage.total_tokens` reaches only 434 `gh-aw.agent.conclusion` spans / 30d (copilot 297, claude 111, codex 26; gemini/pi/antigravity 0) and **0 in the last 24h**, confirming that token telemetry is broadly missing and that the engine-agnostic `effective_tokens` is the right metric to make reliable.
- The dedicated `gh-aw.agent.agent` span (intended token carrier) returns **0 results / 30 days**, so the conclusion-span fallback is the only carrier — making its `effective_tokens` source the highest-leverage fix.

</details>

<details>
<summary>Related Files</summary>

- `actions/setup/js/send_otlp_span.cjs` (lines 1745-1747, 1905-1906, 2092)
- `actions/setup/js/parse_token_usage.cjs` (lines 129-142 — writes `effective_tokens` into `agent_usage.json`)
- `actions/setup/js/action_conclusion_otlp.cjs` (conclusion-span entrypoint; passes `startMs` only)
- `actions/setup/js/send_otlp_span.test.cjs` (unit tests to extend)
- `pkg/workflow/compiler_safe_outputs_job.go` (line 672 — the only job explicitly wired with `GH_AW_EFFECTIVE_TOKENS`)

</details>

---

*[Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26680726572)*

> Generated by [📊 Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26680726572) · opus48 4.1M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-otel-instrumentation-advisor%22&type=issues)
> - [x] expires on Jun 6, 2026, 10:00 AM UTC
