📡 OTel Instrumentation Improvement: emit gen_ai.response.finish_reasons on the dedicated agent span
Analysis Date: 2026-05-10
Priority: High
Effort: Small (< 2h)
Problem
readAgentRuntimeMetrics in actions/setup/js/send_otlp_span.cjs (lines 1095–1139) parses the Claude/Codex/Gemini agent's stdio JSON result line and extracts only num_turns and total_cost_usd. The same JSON line already contains stop_reason — this is verified by the test fixtures in actions/setup/js/parse_threat_detection_results.test.cjs:91 ("stop_reason":"end_turn"). The field is read into context, ignored, and never makes it into a span attribute.
As a result, the dedicated agent span (gh-aw.<job>.agent, built at send_otlp_span.cjs:1485) carries the OTel GenAI request attributes (gen_ai.request.model, gen_ai.system, gen_ai.operation.name) and usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, etc.) but omits the standard gen_ai.response.finish_reasons response attribute. There is no way for a DevOps engineer querying the OTel backend to answer “which runs were truncated by max_tokens?” — a silent-failure mode where the agent's reasoning gets cut mid-thought and the workflow nevertheless reports STATUS_OK because the agent exited without throwing.
Why This Matters (DevOps Perspective)
stop_reason distinguishes between fundamentally different operational outcomes that today look identical in OTel:
| stop_reason | What it means | Operational signal |
| --- | --- | --- |
| end_turn / stop | Clean completion | Healthy |
| max_tokens / length | Output truncated mid-thought | Silent failure — needs a bigger context window or shorter prompt |
| tool_use | Agent exited mid-tool-call | Often a tool-loop bug |
| stop_sequence | Hit a configured stop string | Usually fine; sometimes premature |
Without this attribute:
- You cannot alert on truncation pressure (engineers only discover it by reading individual workflow logs).
- Grafana / Datadog / Honeycomb GenAI dashboards that expect gen_ai.response.finish_reasons render it as “unknown” for every gh-aw run.
- MTTR for “why did the agent stop early?” incidents stays in the 10–30 min range because the next step is opening the artifact mirror and grepping agent-stdio.log by hand.
With this attribute, an engineer can run a single PromQL/SQL filter (gen_ai.response.finish_reasons = "max_tokens") to find every truncated run across the org, and a recurring spike on that filter becomes an actionable alert.
Current Behavior
actions/setup/js/send_otlp_span.cjs:1095–1139 — the parser drops stop_reason:
```js
function readAgentRuntimeMetrics() {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, warningCount: 0 };
  try {
    const content = fs.readFileSync(AGENT_STDIO_LOG_PATH, "utf8");
    const lines = content.split("\n");
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line) continue;
      if (/^(?:\[WARN\]|npm warn\b)/i.test(line)) metrics.warningCount += 1;
      const jsonStart = line.indexOf("{");
      if (jsonStart < 0) continue;
      try {
        const parsed = JSON.parse(line.slice(jsonStart));
        if (!parsed || parsed.type !== "result") continue;
        if (typeof parsed.num_turns === "number" && parsed.num_turns >= 0) metrics.turns = parsed.num_turns;
        if (typeof parsed.total_cost_usd === "number" && Number.isFinite(parsed.total_cost_usd) && parsed.total_cost_usd >= 0) {
          metrics.estimatedCostUsd = parsed.total_cost_usd;
        }
        // ⚠️ parsed.stop_reason and parsed.is_error are visible here but never extracted.
      } catch { /* ignore */ }
    }
  } catch { return metrics; }
  return metrics;
}
```
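To make the gap concrete, here is a standalone sketch (not code from the repository) that parses a result line of the same shape as the test fixture; the num_turns and total_cost_usd values are made up for the demo:

```javascript
// Illustrative "result" line matching the fixture shape.
const line = '{"type":"result","subtype":"success","is_error":false,"num_turns":3,"total_cost_usd":0.0421,"stop_reason":"end_turn"}';

const parsed = JSON.parse(line.slice(line.indexOf("{")));
console.log(parsed.num_turns);      // extracted today
console.log(parsed.total_cost_usd); // extracted today
console.log(parsed.stop_reason);    // present in the same object, but currently dropped
```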
actions/setup/js/send_otlp_span.cjs:1462–1500 — the agent span builds OTel GenAI attributes but never sets a response attribute:
```js
const agentAttributes = [...attributes, ...usageAttrs];
agentAttributes.push(buildAttr("gen_ai.operation.name", "chat"));
if (engineId) {
  agentAttributes.push(buildAttr("gen_ai.system", ENGINE_TO_SYSTEM_MAP[engineId] || engineId));
  agentAttributes.push(buildAttr("gh-aw.engine", engineId));
}
if (workflowName) agentAttributes.push(buildAttr("gen_ai.workflow.name", workflowName));
// ⚠️ No gen_ai.response.* attributes anywhere.
```
Proposed Change
Extend readAgentRuntimeMetrics to extract stop_reason and is_error, then attach them as standard OTel GenAI attributes on the dedicated agent span.
```js
// 1) actions/setup/js/send_otlp_span.cjs — update the typedef and parser
/**
 * @typedef {Object} AgentRuntimeMetrics
 * @property {number | undefined} turns
 * @property {number | undefined} estimatedCostUsd
 * @property {number} warningCount
 * @property {string | undefined} finishReason // NEW: e.g. "end_turn", "max_tokens", "tool_use"
 * @property {boolean | undefined} isError     // NEW: Claude SDK reports an internal error
 */
function readAgentRuntimeMetrics() {
  const metrics = { turns: undefined, estimatedCostUsd: undefined, warningCount: 0, finishReason: undefined, isError: undefined };
  // ... existing loop ...
  if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
    metrics.finishReason = parsed.stop_reason;
  }
  if (typeof parsed.is_error === "boolean") {
    metrics.isError = parsed.is_error;
  }
  // ...
}
```

```js
// 2) actions/setup/js/send_otlp_span.cjs — in the dedicated-agent-span block (~lines 1462–1483)
if (typeof runtimeMetrics.finishReason === "string") {
  // OTel GenAI semantic convention: gen_ai.response.finish_reasons is an array.
  // We have a single reason, so emit a single-element JSON array string for backends
  // that prefer string-typed attributes (OTLP/HTTP JSON does not natively support arrays).
  agentAttributes.push(buildAttr("gen_ai.response.finish_reasons", JSON.stringify([runtimeMetrics.finishReason])));
  // Mirror as a flat string for backends that filter on scalar attributes.
  agentAttributes.push(buildAttr("gh-aw.agent.finish_reason", runtimeMetrics.finishReason));
}
if (typeof runtimeMetrics.isError === "boolean") {
  agentAttributes.push(buildAttr("gh-aw.agent.is_error", runtimeMetrics.isError));
}
```
Keep the change scoped to the agent span only (not the conclusion span), matching the existing convention that gen_ai.usage.* lives on the agent span to avoid double-counting.
Expected Outcome
After this change:
- Grafana / Honeycomb / Datadog: queries like gen_ai.response.finish_reasons = "max_tokens" and gh-aw.agent.finish_reason = "tool_use" become available; native GenAI dashboards in Honeycomb/Grafana will start populating their “Finish Reasons” panels for every gh-aw agent run.
- JSONL mirror (/tmp/gh-aw/otel.jsonl): every agent span carries the new attributes, so artifact-only debugging gets the same signal without a live collector.
- On-call engineers: “why did the agent stop early?” collapses from a 10–30 min log-grep to a single dashboard glance. Truncation incidents become alertable rather than discovered-after-the-fact.
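In the JSONL mirror, the new attributes would appear in standard OTLP/JSON keyValue encoding, roughly like the fragment below (span name and surrounding fields are illustrative; the job segment of gh-aw.<job>.agent depends on the workflow):

```json
{
  "name": "gh-aw.main.agent",
  "attributes": [
    { "key": "gen_ai.response.finish_reasons", "value": { "stringValue": "[\"max_tokens\"]" } },
    { "key": "gh-aw.agent.finish_reason", "value": { "stringValue": "max_tokens" } },
    { "key": "gh-aw.agent.is_error", "value": { "boolValue": false } }
  ]
}
```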
Implementation Steps
1. In actions/setup/js/send_otlp_span.cjs, extend the AgentRuntimeMetrics typedef and the parser loop to capture parsed.stop_reason and parsed.is_error.
2. In the dedicated-agent-span block, push gen_ai.response.finish_reasons (JSON-encoded array) plus gh-aw.agent.finish_reason and gh-aw.agent.is_error onto agentAttributes when present.
3. Add unit tests in actions/setup/js/action_conclusion_otlp.test.cjs (or action_otlp.test.cjs) that write a fake agent-stdio.log with a result line carrying "stop_reason":"max_tokens" and assert the agent span has gen_ai.response.finish_reasons set to ["max_tokens"].
4. Run make test-unit (or cd actions/setup/js && npx vitest run) to confirm tests pass.
5. Run make fmt to ensure formatting.
Evidence from Live Sentry Data
The Sentry MCP server attached to this workflow (/home/runner/work/_temp/gh-aw/mcp-cli/tools/sentry.json) reports an empty tool list — every direct tool call (find_organizations, search_events, etc.) returns unknown tool. Live Sentry data was therefore unavailable for this run, so this recommendation is grounded in static evidence from the code and the existing test fixtures:
- actions/setup/js/parse_threat_detection_results.test.cjs:91 confirms that the agent stdio JSON shape carries stop_reason ('{"type":"result","subtype":"success","is_error":false,...,"stop_reason":"end_turn"}').
- actions/setup/js/send_otlp_span.cjs:1117–1129 shows the parser already iterates these lines but extracts only num_turns and total_cost_usd.
- actions/setup/js/send_otlp_span.cjs:1462–1500 shows the agent span is built with OTel GenAI request/usage attributes but no gen_ai.response.* attributes.
When Sentry data is available again, the recommendation can be reinforced by querying for spans named gh-aw.*.agent in the spans dataset and confirming none of them carry gen_ai.response.finish_reasons.
Related Files
- actions/setup/js/send_otlp_span.cjs (parser at lines 1095–1139; agent span at 1462–1500)
- actions/setup/js/action_conclusion_otlp.test.cjs (add unit tests here)
- actions/setup/js/parse_threat_detection_results.test.cjs (reference for the stdio JSON shape)
- actions/setup/js/action_otlp.test.cjs (additional test surface)
Generated by the Daily OTel Instrumentation Advisor workflow