📡 OTel Instrumentation Improvement: surface agent output metrics in conclusion spans for all runs
Analysis Date: 2026-04-20
Priority: Medium
Effort: Small (< 2h)
Problem
sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs reads agent_output.json only when the agent job failed or timed out. For successful runs the file is never opened, so the conclusion span carries no information about what the agent actually produced. This means an engineer looking at traces in Grafana, Honeycomb, or Datadog cannot answer:
- "How many items did this run create?"
- "Was this a read-only run or a write-capable run?"
- "Is there a correlation between item count and token cost?"
- "Why did a 'successful' run produce zero outputs when we expected changes?"
The gap is visible by comparing two files in the same directory:
- generate_observability_summary.cjs always reads agent_output.json and surfaces createdItemCount in the job step summary.
- send_otlp_span.cjs skips the read entirely for success, so the same metric never reaches the OTLP backend.
Why This Matters (DevOps Perspective)
The step summary is ephemeral and not queryable. OTLP spans are indexed, filterable, and alertable. Without gh-aw.output.item_count in the span:
- You cannot build a Grafana panel grouping runs by output volume.
- You cannot set an alert when a workflow that normally creates N items suddenly creates 0.
- You cannot segment cost (token counts) by productivity (items created) in a single query.
- An on-call engineer investigating "why did the bot do nothing?" has no span attribute to filter on — they must hunt through step summary HTML.
Adding this single numeric attribute unblocks all of the above.
Current Behavior
```js
// actions/setup/js/send_otlp_span.cjs — lines 708–715
// When the agent failed, read agent_output.json to surface structured error details.
// Lazy-read: skip I/O entirely when the job succeeded or was cancelled.
const agentOutput = isAgentFailure ? readJSONIfExists("/tmp/gh-aw/agent_output.json") || {} : {};
const outputErrors = Array.isArray(agentOutput.errors) ? agentOutput.errors : [];
```
For a successful run isAgentFailure === false, so agentOutput is {}, items is never accessed, and no output-count attribute is set on the span.
Compare with the summary generator which always reads the file:
```js
// actions/setup/js/generate_observability_summary.cjs
const agentOutput = readJSONIfExists(AGENT_OUTPUT_PATH) || { items: [], errors: [] };
const items = Array.isArray(agentOutput.items) ? agentOutput.items : [];
// → shown as "created items: N" in step summary but never in OTLP
```
Proposed Change
```js
// actions/setup/js/send_otlp_span.cjs — replace lines 708–715
// Always read agent_output.json so output metrics appear in spans for all runs,
// not just failures. readJSONIfExists returns null when the file is absent (e.g.
// cancelled jobs), which the fallback {} handles safely.
const agentOutput = readJSONIfExists("/tmp/gh-aw/agent_output.json") || {};
const outputErrors = Array.isArray(agentOutput.errors) ? agentOutput.errors : [];
const outputItems = Array.isArray(agentOutput.items) ? agentOutput.items : [];

// --- further down, after the existing error-message attributes ---
// Always include item count so success-path runs are queryable by output volume.
attributes.push(buildAttr("gh-aw.output.item_count", outputItems.length));
// Include unique item types as a comma-separated string for easy grouping.
const itemTypes = [...new Set(outputItems.map(i => (i && typeof i.type === "string" ? i.type : "")).filter(Boolean))].sort();
if (itemTypes.length > 0) {
  attributes.push(buildAttr("gh-aw.output.item_types", itemTypes.join(",")));
}
```
The same two attributes should also be added to the gh-aw.agent.agent sub-span (built a few lines below using the same attributes array — they'll propagate automatically once added above).
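Assuming buildAttr emits OTLP-style KeyValue pairs (an assumption for illustration; the repo's helper may use a different shape), the two new attributes would serialize along these lines:

```javascript
// Hypothetical buildAttr sketch producing OTLP KeyValue objects.
// The real helper in send_otlp_span.cjs may differ.
function buildAttr(key, value) {
  return Number.isInteger(value)
    ? { key, value: { intValue: String(value) } } // OTLP/JSON encodes ints as strings
    : { key, value: { stringValue: String(value) } };
}

const outputItems = [{ type: "issue" }, { type: "comment" }, { type: "issue" }];
const itemTypes = [...new Set(outputItems.map(i => i.type))].sort();

const countAttr = buildAttr("gh-aw.output.item_count", outputItems.length);
const typesAttr = buildAttr("gh-aw.output.item_types", itemTypes.join(","));
// countAttr carries intValue "3"; typesAttr carries stringValue "comment,issue"
```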
Expected Outcome
After this change:
- In Grafana / Honeycomb / Datadog: filter spans by gh-aw.output.item_count = 0 to find read-only or zero-output runs; group by gh-aw.output.item_types to see which kinds of items each engine creates; build a scatter plot of gh-aw.tokens.input vs gh-aw.output.item_count to spot cost-to-productivity anomalies.
- In the JSONL mirror: every conclusion span line in /tmp/gh-aw/otel.jsonl will include the item count, making artifact-based post-mortems immediately actionable without querying the backend.
- For on-call engineers: a single attribute query (gh-aw.output.item_count = 0 AND gh-aw.agent.conclusion = "success") surfaces runs where the agent ran to completion but produced nothing — a common symptom of silent permission or routing failures.
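For illustration, a conclusion span line in the JSONL mirror would then carry the new attributes roughly as follows (the span name and surrounding fields are illustrative, not taken from the actual file):

```json
{
  "name": "gh-aw.agent.conclusion",
  "attributes": [
    { "key": "gh-aw.agent.conclusion", "value": { "stringValue": "success" } },
    { "key": "gh-aw.output.item_count", "value": { "intValue": "2" } },
    { "key": "gh-aw.output.item_types", "value": { "stringValue": "comment,issue" } }
  ]
}
```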
Implementation Steps
1. In actions/setup/js/send_otlp_span.cjs, change the ternary on line 710 to always call readJSONIfExists("/tmp/gh-aw/agent_output.json") regardless of isAgentFailure.
2. Derive outputItems from the parsed result (same pattern as outputErrors).
3. Add gh-aw.output.item_count (outputItems.length) to attributes unconditionally — place it near the existing gh-aw.effective_tokens attribute for logical grouping.
4. Add gh-aw.output.item_types (comma-separated sorted unique types) to attributes when itemTypes.length > 0.
5. Extend the tests (send_otlp_span.test.cjs) to assert gh-aw.output.item_count is present on success-path conclusion spans and equals the number of items in the mock agent_output.json.
6. Run cd actions/setup/js && npx vitest run to confirm tests pass, then make fmt to ensure formatting.
Evidence from Live Sentry Data
⚠️ Note: The Sentry MCP server returned an empty tool list during this analysis run (sentry --help showed 0 tools). Live span sampling was not possible. The gap above is confirmed purely by static code analysis — specifically the asymmetry between generate_observability_summary.cjs (which always reads agent_output.json) and sendJobConclusionSpan (which skips the read for non-failure runs). No conflicting live-data evidence is available to deprioritize this recommendation.
Related Files
- actions/setup/js/send_otlp_span.cjs — primary change site (lines ~708–760)
- actions/setup/js/generate_observability_summary.cjs — reference implementation for reading agent_output.json
- actions/setup/js/action_conclusion_otlp.cjs — calls sendJobConclusionSpan; no change needed
- actions/setup/js/send_otlp_span.test.cjs — test file (or equivalent)
Generated by the Daily OTel Instrumentation Advisor workflow