[otel-advisor] OTel improvement: add gen_ai.response.finish_reasons to the agent span (currently dropping stop_reason) #31322

Description

📡 OTel Instrumentation Improvement: emit gen_ai.response.finish_reasons on the dedicated agent span

Analysis Date: 2026-05-10
Priority: High
Effort: Small (< 2h)

Problem

readAgentRuntimeMetrics in actions/setup/js/send_otlp_span.cjs (lines 1095–1139) parses the Claude/Codex/Gemini agent's stdio JSON result line and extracts only num_turns and total_cost_usd. The same JSON line already contains stop_reason; this is verified by the test fixture in actions/setup/js/parse_threat_detection_results.test.cjs:91 ("stop_reason":"end_turn"). The field is present in the parsed object but is never extracted, so it never reaches a span attribute.

As a result, the dedicated agent span (gh-aw.<job>.agent, built at send_otlp_span.cjs:1485) carries the OTel GenAI request attributes (gen_ai.request.model, gen_ai.system, gen_ai.operation.name) and usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, etc.) but omits the standard gen_ai.response.finish_reasons response attribute. There is no way for a DevOps engineer querying the OTel backend to answer “which runs were truncated by max_tokens?” — a silent-failure mode where the agent's reasoning gets cut mid-thought and the workflow nevertheless reports STATUS_OK because the agent exited without throwing.

Why This Matters (DevOps Perspective)

stop_reason distinguishes between fundamentally different operational outcomes that today look identical in OTel:

| stop_reason | What it means | Operational signal |
| --- | --- | --- |
| end_turn / stop | Clean completion | Healthy |
| max_tokens / length | Output truncated mid-thought | Silent failure: needs a bigger context window or a shorter prompt |
| tool_use | Agent exited mid-tool-call | Often a tool-loop bug |
| stop_sequence | Hit a configured stop string | Usually fine; sometimes premature |

Without this attribute:

  • You cannot alert on truncation pressure (engineers only discover it by reading individual workflow logs).
  • Grafana / Datadog / Honeycomb GenAI dashboards that expect gen_ai.response.finish_reasons display “unknown” for every gh-aw run.
  • MTTR for “why did the agent stop early?” incidents stays in the 10–30 min range because the next step is opening the artifact mirror and grepping agent-stdio.log by hand.

With this attribute, an engineer can run a single PromQL/SQL filter (gen_ai.response.finish_reasons = "max_tokens") to find every truncated run across the org, and a recurring spike on that filter becomes an actionable alert.

Current Behavior

actions/setup/js/send_otlp_span.cjs:1095–1139 — the parser drops stop_reason:

```javascript
function readAgentRuntimeMetrics() {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, warningCount: 0 };
  try {
    const content = fs.readFileSync(AGENT_STDIO_LOG_PATH, "utf8");
    const lines = content.split("\n");
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line) continue;
      if (/^(?:\[WARN\]|npm warn\b)/i.test(line)) metrics.warningCount += 1;
      const jsonStart = line.indexOf("{");
      if (jsonStart < 0) continue;
      try {
        const parsed = JSON.parse(line.slice(jsonStart));
        if (!parsed || parsed.type !== "result") continue;
        if (typeof parsed.num_turns === "number" && parsed.num_turns >= 0) metrics.turns = parsed.num_turns;
        if (typeof parsed.total_cost_usd === "number" && Number.isFinite(parsed.total_cost_usd) && parsed.total_cost_usd >= 0) {
          metrics.estimatedCostUsd = parsed.total_cost_usd;
        }
        // ⚠️ parsed.stop_reason and parsed.is_error are visible here but never extracted.
      } catch { /* ignore */ }
    }
  } catch { return metrics; }
  return metrics;
}
```
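To make the gap concrete, here is a minimal standalone sketch. The sample line mirrors the fixture shape cited above from parse_threat_detection_results.test.cjs; the specific values (num_turns of 7, the cost, the max_tokens reason) are illustrative, not taken from a real run:

```javascript
// Minimal repro of the gap: the same parsed object the loop already inspects
// carries stop_reason and is_error. The sample line mirrors the fixture shape;
// the numeric values are illustrative.
const rawLine = '{"type":"result","subtype":"success","is_error":false,"num_turns":7,"total_cost_usd":0.042,"stop_reason":"max_tokens"}';
const parsed = JSON.parse(rawLine.slice(rawLine.indexOf("{")));

if (parsed && parsed.type === "result") {
  console.log(parsed.num_turns);      // extracted today → 7
  console.log(parsed.total_cost_usd); // extracted today → 0.042
  console.log(parsed.stop_reason);    // dropped today   → max_tokens
  console.log(parsed.is_error);       // dropped today   → false
}
```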

actions/setup/js/send_otlp_span.cjs:1462–1500 — the agent span builds OTel GenAI attributes but never sets a response attribute:

```javascript
const agentAttributes = [...attributes, ...usageAttrs];
agentAttributes.push(buildAttr("gen_ai.operation.name", "chat"));
if (engineId) {
  agentAttributes.push(buildAttr("gen_ai.system", ENGINE_TO_SYSTEM_MAP[engineId] || engineId));
  agentAttributes.push(buildAttr("gh-aw.engine", engineId));
}
if (workflowName) agentAttributes.push(buildAttr("gen_ai.workflow.name", workflowName));
// ⚠️ No gen_ai.response.* attributes anywhere.
```

Proposed Change

Extend readAgentRuntimeMetrics to extract stop_reason and is_error, then attach them as standard OTel GenAI attributes on the dedicated agent span.

```javascript
// 1) actions/setup/js/send_otlp_span.cjs — update the typedef and parser

/**
 * @typedef {Object} AgentRuntimeMetrics
 * @property {number | undefined} turns
 * @property {number | undefined} estimatedCostUsd
 * @property {number} warningCount
 * @property {string | undefined} finishReason   // NEW: e.g. "end_turn", "max_tokens", "tool_use"
 * @property {boolean | undefined} isError       // NEW: Claude SDK reports an internal error
 */

function readAgentRuntimeMetrics() {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, warningCount: 0, finishReason: undefined, isError: undefined };
  // ... existing loop ...
  if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
    metrics.finishReason = parsed.stop_reason;
  }
  if (typeof parsed.is_error === "boolean") {
    metrics.isError = parsed.is_error;
  }
  // ...
}
```

```javascript
// 2) actions/setup/js/send_otlp_span.cjs — in the dedicated-agent-span block (~lines 1462–1483)

if (typeof runtimeMetrics.finishReason === "string") {
  // OTel GenAI semantic convention: gen_ai.response.finish_reasons is an array.
  // We have a single reason, so emit a single-element JSON-encoded array string,
  // which keeps the attribute a plain string for backends that only index scalars.
  agentAttributes.push(buildAttr("gen_ai.response.finish_reasons", JSON.stringify([runtimeMetrics.finishReason])));
  // Mirror as a flat string for backends that filter on scalar attributes.
  agentAttributes.push(buildAttr("gh-aw.agent.finish_reason", runtimeMetrics.finishReason));
}
if (typeof runtimeMetrics.isError === "boolean") {
  agentAttributes.push(buildAttr("gh-aw.agent.is_error", runtimeMetrics.isError));
}
```

Keep the change scoped to the agent span only (not the conclusion span), matching the existing convention that gen_ai.usage.* lives on the agent span to avoid double-counting.
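End-to-end, the extended extraction can be sketched as a pure function over the stdio content. This is an illustration, not the proposed diff: the real readAgentRuntimeMetrics reads AGENT_STDIO_LOG_PATH from disk and also keeps the warning counter, and the sample line below uses made-up values:

```javascript
// Sketch of the extended extraction, factored as a pure function for clarity.
// Assumes the stdio "result" line shape shown in the fixtures.
function extractRuntimeMetrics(stdioContent) {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, finishReason: undefined, isError: undefined };
  for (const rawLine of stdioContent.split("\n")) {
    const line = rawLine.trim();
    const jsonStart = line.indexOf("{");
    if (jsonStart < 0) continue;
    try {
      const parsed = JSON.parse(line.slice(jsonStart));
      if (!parsed || parsed.type !== "result") continue;
      if (typeof parsed.num_turns === "number" && parsed.num_turns >= 0) metrics.turns = parsed.num_turns;
      if (typeof parsed.total_cost_usd === "number" && Number.isFinite(parsed.total_cost_usd) && parsed.total_cost_usd >= 0) {
        metrics.estimatedCostUsd = parsed.total_cost_usd;
      }
      if (typeof parsed.stop_reason === "string" && parsed.stop_reason) metrics.finishReason = parsed.stop_reason;
      if (typeof parsed.is_error === "boolean") metrics.isError = parsed.is_error;
    } catch { /* ignore non-JSON lines */ }
  }
  return metrics;
}

const sample = 'npm warn deprecated foo\n{"type":"result","num_turns":5,"total_cost_usd":0.01,"stop_reason":"end_turn","is_error":false}\n';
console.log(extractRuntimeMetrics(sample));
// → { turns: 5, estimatedCostUsd: 0.01, finishReason: 'end_turn', isError: false }
```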

Expected Outcome

After this change:

  • Grafana / Honeycomb / Datadog: queries like gen_ai.response.finish_reasons = "max_tokens" and gh-aw.agent.finish_reason = "tool_use" become available; native GenAI dashboards in Honeycomb/Grafana will start populating their “Finish Reasons” panels for every gh-aw agent run.
  • JSONL mirror (/tmp/gh-aw/otel.jsonl): every agent span carries the new attributes, so artifact-only debugging gets the same signal without a live collector.
  • On-call engineers: “why did the agent stop early?” collapses from a 10–30 min log-grep to a single dashboard glance. Truncation incidents become alertable rather than discovered-after-the-fact.
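For the artifact-only debugging path above, a scan of the JSONL mirror for truncated runs could look like the sketch below. The span/attribute layout is an assumption based on OTLP/JSON conventions (one span object per line, an attributes array of { key, value: { stringValue } } entries); the real /tmp/gh-aw/otel.jsonl layout may differ, and the span names are invented for illustration:

```javascript
// Sketch: find agent spans whose flat finish-reason attribute is "max_tokens".
// Assumes one span object per JSONL line with an `attributes` array of
// { key, value: { stringValue } } entries, per OTLP/JSON conventions.
function findTruncatedSpans(jsonl) {
  return jsonl
    .split("\n")
    .filter(Boolean)
    .map(line => JSON.parse(line))
    .filter(span => (span.attributes || []).some(
      a => a.key === "gh-aw.agent.finish_reason" && a.value && a.value.stringValue === "max_tokens"
    ));
}

// Illustrative two-span mirror: one truncated run, one clean run.
const mirror = [
  { name: "gh-aw.ci.agent", attributes: [{ key: "gh-aw.agent.finish_reason", value: { stringValue: "max_tokens" } }] },
  { name: "gh-aw.docs.agent", attributes: [{ key: "gh-aw.agent.finish_reason", value: { stringValue: "end_turn" } }] },
].map(s => JSON.stringify(s)).join("\n");

console.log(findTruncatedSpans(mirror).map(s => s.name)); // → [ 'gh-aw.ci.agent' ]
```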

Implementation Steps
  • In actions/setup/js/send_otlp_span.cjs, extend the AgentRuntimeMetrics typedef and the parser loop to capture parsed.stop_reason and parsed.is_error.
  • In the dedicated-agent-span block (around line 1462), push gen_ai.response.finish_reasons (JSON-encoded array) and gh-aw.agent.finish_reason / gh-aw.agent.is_error onto agentAttributes when present.
  • Add unit tests in actions/setup/js/action_conclusion_otlp.test.cjs (or action_otlp.test.cjs) that:
    • Write a fake agent-stdio.log with a result line carrying "stop_reason":"max_tokens".
    • Assert the produced agent-span attributes include gen_ai.response.finish_reasons set to ["max_tokens"].
    • Assert the conclusion span does not carry the same attribute (avoid double-emit).
  • Run make test-unit (or cd actions/setup/js && npx vitest run) to confirm tests pass.
  • Run make fmt to ensure formatting.
  • Open a PR referencing this issue.

Evidence from Live Sentry Data

The Sentry MCP server attached to this workflow (/home/runner/work/_temp/gh-aw/mcp-cli/tools/sentry.json) reports an empty tool list — every direct tool call (find_organizations, search_events, etc.) returns unknown tool. Live Sentry data was therefore unavailable for this run, so this recommendation is grounded in static evidence from the code and the existing test fixtures:

  • actions/setup/js/parse_threat_detection_results.test.cjs:91 confirms that the agent stdio JSON shape carries stop_reason ('{"type":"result","subtype":"success","is_error":false,...,"stop_reason":"end_turn"}').
  • actions/setup/js/send_otlp_span.cjs:1117–1129 shows the parser already iterates these lines but extracts only num_turns and total_cost_usd.
  • actions/setup/js/send_otlp_span.cjs:1462–1500 shows the agent span is built with OTel GenAI request/usage attributes but no gen_ai.response.* attributes.

When Sentry data is available again, the recommendation can be reinforced by querying for spans named gh-aw.*.agent in the spans dataset and confirming none of them carry gen_ai.response.finish_reasons.

Related Files
  • actions/setup/js/send_otlp_span.cjs (parser at lines 1095–1139; agent span at 1462–1500)
  • actions/setup/js/action_conclusion_otlp.test.cjs (add unit tests here)
  • actions/setup/js/parse_threat_detection_results.test.cjs (reference for the stdio JSON shape)
  • actions/setup/js/action_otlp.test.cjs (additional test surface)

Generated by the Daily OTel Instrumentation Advisor workflow
