[otel-advisor] OTel improvement: add gen_ai.response.finish_reasons to the agent span (currently dropping stop_reason) #31322

Description

📡 OTel Instrumentation Improvement: emit gen_ai.response.finish_reasons on the dedicated agent span

Analysis Date: 2026-05-10
Priority: High
Effort: Small (< 2h)

Problem

readAgentRuntimeMetrics in actions/setup/js/send_otlp_span.cjs (lines 1095–1139) parses the Claude/Codex/Gemini agent's stdio JSON result line and extracts only num_turns and total_cost_usd. The same JSON line already contains stop_reason; this is verified by the test fixture in actions/setup/js/parse_threat_detection_results.test.cjs:91 ("stop_reason":"end_turn"). The field is present in the parsed object but is never extracted, so it never reaches a span attribute.

As a result, the dedicated agent span (gh-aw.<job>.agent, built at send_otlp_span.cjs:1485) carries the OTel GenAI request attributes (gen_ai.request.model, gen_ai.system, gen_ai.operation.name) and usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, etc.) but omits the standard gen_ai.response.finish_reasons response attribute. There is no way for a DevOps engineer querying the OTel backend to answer “which runs were truncated by max_tokens?” — a silent-failure mode where the agent's reasoning gets cut mid-thought and the workflow nevertheless reports STATUS_OK because the agent exited without throwing.

Why This Matters (DevOps Perspective)

stop_reason distinguishes between fundamentally different operational outcomes that today look identical in OTel:

| stop_reason | What it means | Operational signal |
| --- | --- | --- |
| end_turn / stop | Clean completion | Healthy |
| max_tokens / length | Output truncated mid-thought | Silent failure: needs a bigger context window or a shorter prompt |
| tool_use | Agent exited mid-tool-call | Often a tool-loop bug |
| stop_sequence | Hit a configured stop string | Usually fine; sometimes premature |

Without this attribute:

  • You cannot alert on truncation pressure (engineers only discover it by reading individual workflow logs).
  • Grafana / Datadog / Honeycomb GenAI dashboards that expect gen_ai.response.finish_reasons display “unknown” for every gh-aw run.
  • MTTR for “why did the agent stop early?” incidents stays in the 10–30 min range because the next step is opening the artifact mirror and grepping agent-stdio.log by hand.

With this attribute, an engineer can run a single PromQL/SQL filter (gen_ai.response.finish_reasons = "max_tokens") to find every truncated run across the org, and a recurring spike on that filter becomes an actionable alert.

Current Behavior

actions/setup/js/send_otlp_span.cjs:1095–1139 — the parser drops stop_reason:

```javascript
function readAgentRuntimeMetrics() {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, warningCount: 0 };
  try {
    const content = fs.readFileSync(AGENT_STDIO_LOG_PATH, "utf8");
    const lines = content.split("\n");
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line) continue;
      if (/^(?:\[WARN\]|npm warn\b)/i.test(line)) metrics.warningCount += 1;
      const jsonStart = line.indexOf("{");
      if (jsonStart < 0) continue;
      try {
        const parsed = JSON.parse(line.slice(jsonStart));
        if (!parsed || parsed.type !== "result") continue;
        if (typeof parsed.num_turns === "number" && parsed.num_turns >= 0) metrics.turns = parsed.num_turns;
        if (typeof parsed.total_cost_usd === "number" && Number.isFinite(parsed.total_cost_usd) && parsed.total_cost_usd >= 0) {
          metrics.estimatedCostUsd = parsed.total_cost_usd;
        }
        // ⚠️ parsed.stop_reason and parsed.is_error are visible here but never extracted.
      } catch { /* ignore */ }
    }
  } catch { return metrics; }
  return metrics;
}
```
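To make the gap concrete, here is a minimal standalone sketch. The sample line mirrors the fixture shape cited above from parse_threat_detection_results.test.cjs; the specific values (num_turns of 7, the cost, the max_tokens reason) are illustrative, not taken from a real run:

```javascript
// Minimal repro of the gap: the same parsed object the loop already inspects
// carries stop_reason and is_error. The sample line mirrors the fixture shape;
// the numeric values are illustrative.
const rawLine = '{"type":"result","subtype":"success","is_error":false,"num_turns":7,"total_cost_usd":0.042,"stop_reason":"max_tokens"}';
const parsed = JSON.parse(rawLine.slice(rawLine.indexOf("{")));

if (parsed && parsed.type === "result") {
  console.log(parsed.num_turns);      // extracted today → 7
  console.log(parsed.total_cost_usd); // extracted today → 0.042
  console.log(parsed.stop_reason);    // dropped today   → max_tokens
  console.log(parsed.is_error);       // dropped today   → false
}
```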

actions/setup/js/send_otlp_span.cjs:1462–1500 — the agent span builds OTel GenAI attributes but never sets a response attribute:

```javascript
const agentAttributes = [...attributes, ...usageAttrs];
agentAttributes.push(buildAttr("gen_ai.operation.name", "chat"));
if (engineId) {
  agentAttributes.push(buildAttr("gen_ai.system", ENGINE_TO_SYSTEM_MAP[engineId] || engineId));
  agentAttributes.push(buildAttr("gh-aw.engine", engineId));
}
if (workflowName) agentAttributes.push(buildAttr("gen_ai.workflow.name", workflowName));
// ⚠️ No gen_ai.response.* attributes anywhere.
```

Proposed Change

Extend readAgentRuntimeMetrics to extract stop_reason and is_error, then attach them as standard OTel GenAI attributes on the dedicated agent span.

```javascript
// 1) actions/setup/js/send_otlp_span.cjs — update the typedef and parser

/**
 * @typedef {Object} AgentRuntimeMetrics
 * @property {number | undefined} turns
 * @property {number | undefined} estimatedCostUsd
 * @property {number} warningCount
 * @property {string | undefined} finishReason   // NEW: e.g. "end_turn", "max_tokens", "tool_use"
 * @property {boolean | undefined} isError       // NEW: Claude SDK reports an internal error
 */

function readAgentRuntimeMetrics() {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, warningCount: 0, finishReason: undefined, isError: undefined };
  // ... existing loop ...
  if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
    metrics.finishReason = parsed.stop_reason;
  }
  if (typeof parsed.is_error === "boolean") {
    metrics.isError = parsed.is_error;
  }
  // ...
}
```

```javascript
// 2) actions/setup/js/send_otlp_span.cjs — in the dedicated-agent-span block (~lines 1462–1483)

if (typeof runtimeMetrics.finishReason === "string") {
  // OTel GenAI semantic convention: gen_ai.response.finish_reasons is an array.
  // We have a single reason, so emit a single-element JSON-encoded array string,
  // which keeps the attribute a plain string for backends that only index scalars.
  agentAttributes.push(buildAttr("gen_ai.response.finish_reasons", JSON.stringify([runtimeMetrics.finishReason])));
  // Mirror as a flat string for backends that filter on scalar attributes.
  agentAttributes.push(buildAttr("gh-aw.agent.finish_reason", runtimeMetrics.finishReason));
}
if (typeof runtimeMetrics.isError === "boolean") {
  agentAttributes.push(buildAttr("gh-aw.agent.is_error", runtimeMetrics.isError));
}
```

Keep the change scoped to the agent span only (not the conclusion span), matching the existing convention that gen_ai.usage.* lives on the agent span to avoid double-counting.
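End-to-end, the extended extraction can be sketched as a pure function over the stdio content. This is an illustration, not the proposed diff: the real readAgentRuntimeMetrics reads AGENT_STDIO_LOG_PATH from disk and also keeps the warning counter, and the sample line below uses made-up values:

```javascript
// Sketch of the extended extraction, factored as a pure function for clarity.
// Assumes the stdio "result" line shape shown in the fixtures.
function extractRuntimeMetrics(stdioContent) {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, finishReason: undefined, isError: undefined };
  for (const rawLine of stdioContent.split("\n")) {
    const line = rawLine.trim();
    const jsonStart = line.indexOf("{");
    if (jsonStart < 0) continue;
    try {
      const parsed = JSON.parse(line.slice(jsonStart));
      if (!parsed || parsed.type !== "result") continue;
      if (typeof parsed.num_turns === "number" && parsed.num_turns >= 0) metrics.turns = parsed.num_turns;
      if (typeof parsed.total_cost_usd === "number" && Number.isFinite(parsed.total_cost_usd) && parsed.total_cost_usd >= 0) {
        metrics.estimatedCostUsd = parsed.total_cost_usd;
      }
      if (typeof parsed.stop_reason === "string" && parsed.stop_reason) metrics.finishReason = parsed.stop_reason;
      if (typeof parsed.is_error === "boolean") metrics.isError = parsed.is_error;
    } catch { /* ignore non-JSON lines */ }
  }
  return metrics;
}

const sample = 'npm warn deprecated foo\n{"type":"result","num_turns":5,"total_cost_usd":0.01,"stop_reason":"end_turn","is_error":false}\n';
console.log(extractRuntimeMetrics(sample));
// → { turns: 5, estimatedCostUsd: 0.01, finishReason: 'end_turn', isError: false }
```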

Expected Outcome

After this change:

  • Grafana / Honeycomb / Datadog: queries like gen_ai.response.finish_reasons = "max_tokens" and gh-aw.agent.finish_reason = "tool_use" become available; native GenAI dashboards in Honeycomb/Grafana will start populating their “Finish Reasons” panels for every gh-aw agent run.
  • JSONL mirror (/tmp/gh-aw/otel.jsonl): every agent span carries the new attributes, so artifact-only debugging gets the same signal without a live collector.
  • On-call engineers: “why did the agent stop early?” collapses from a 10–30 min log-grep to a single dashboard glance. Truncation incidents become alertable rather than discovered-after-the-fact.
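For the artifact-only debugging path above, a scan of the JSONL mirror for truncated runs could look like the sketch below. The span/attribute layout is an assumption based on OTLP/JSON conventions (one span object per line, an attributes array of { key, value: { stringValue } } entries); the real /tmp/gh-aw/otel.jsonl layout may differ, and the span names are invented for illustration:

```javascript
// Sketch: find agent spans whose flat finish-reason attribute is "max_tokens".
// Assumes one span object per JSONL line with an `attributes` array of
// { key, value: { stringValue } } entries, per OTLP/JSON conventions.
function findTruncatedSpans(jsonl) {
  return jsonl
    .split("\n")
    .filter(Boolean)
    .map(line => JSON.parse(line))
    .filter(span => (span.attributes || []).some(
      a => a.key === "gh-aw.agent.finish_reason" && a.value && a.value.stringValue === "max_tokens"
    ));
}

// Illustrative two-span mirror: one truncated run, one clean run.
const mirror = [
  { name: "gh-aw.ci.agent", attributes: [{ key: "gh-aw.agent.finish_reason", value: { stringValue: "max_tokens" } }] },
  { name: "gh-aw.docs.agent", attributes: [{ key: "gh-aw.agent.finish_reason", value: { stringValue: "end_turn" } }] },
].map(s => JSON.stringify(s)).join("\n");

console.log(findTruncatedSpans(mirror).map(s => s.name)); // → [ 'gh-aw.ci.agent' ]
```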

Implementation Steps
  • In actions/setup/js/send_otlp_span.cjs, extend the AgentRuntimeMetrics typedef and the parser loop to capture parsed.stop_reason and parsed.is_error.
  • In the dedicated-agent-span block (around line 1462), push gen_ai.response.finish_reasons (JSON-encoded array) and gh-aw.agent.finish_reason / gh-aw.agent.is_error onto agentAttributes when present.
  • Add unit tests in actions/setup/js/action_conclusion_otlp.test.cjs (or action_otlp.test.cjs) that:
    • Write a fake agent-stdio.log with a result line carrying "stop_reason":"max_tokens".
    • Assert the produced agent-span attributes include gen_ai.response.finish_reasons set to ["max_tokens"].
    • Assert the conclusion span does not carry the same attribute (avoid double-emit).
  • Run make test-unit (or cd actions/setup/js && npx vitest run) to confirm tests pass.
  • Run make fmt to ensure formatting.
  • Open a PR referencing this issue.

Evidence from Live Sentry Data

The Sentry MCP server attached to this workflow (/home/runner/work/_temp/gh-aw/mcp-cli/tools/sentry.json) reports an empty tool list — every direct tool call (find_organizations, search_events, etc.) returns unknown tool. Live Sentry data was therefore unavailable for this run, so this recommendation is grounded in static evidence from the code and the existing test fixtures:

  • actions/setup/js/parse_threat_detection_results.test.cjs:91 confirms that the agent stdio JSON shape carries stop_reason ('{"type":"result","subtype":"success","is_error":false,...,"stop_reason":"end_turn"}').
  • actions/setup/js/send_otlp_span.cjs:1117–1129 shows the parser already iterates these lines but extracts only num_turns and total_cost_usd.
  • actions/setup/js/send_otlp_span.cjs:1462–1500 shows the agent span is built with OTel GenAI request/usage attributes but no gen_ai.response.* attributes.

When Sentry data is available again, the recommendation can be reinforced by querying for spans named gh-aw.*.agent in the spans dataset and confirming none of them carry gen_ai.response.finish_reasons.

Related Files
  • actions/setup/js/send_otlp_span.cjs (parser at lines 1095–1139; agent span at 1462–1500)
  • actions/setup/js/action_conclusion_otlp.test.cjs (add unit tests here)
  • actions/setup/js/parse_threat_detection_results.test.cjs (reference for the stdio JSON shape)
  • actions/setup/js/action_otlp.test.cjs (additional test surface)

Generated by the Daily OTel Instrumentation Advisor workflow
