
[otel-advisor] add gh-aw.output.item_count to conclusion spans for successful runs #27440

@github-actions


📡 OTel Instrumentation Improvement: surface agent output metrics in conclusion spans for all runs

Analysis Date: 2026-04-20
Priority: Medium
Effort: Small (< 2h)

Problem

sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs reads agent_output.json only when the agent job failed or timed out. For successful runs the file is never opened, so the conclusion span carries no information about what the agent actually produced. This means an engineer looking at traces in Grafana, Honeycomb, or Datadog cannot answer:

  • "How many items did this run create?"
  • "Was this a read-only run or a write-capable run?"
  • "Is there a correlation between item count and token cost?"
  • "Why did a 'successful' run produce zero outputs when we expected changes?"

The gap is visible by comparing two files in the same directory:

  • generate_observability_summary.cjs always reads agent_output.json and surfaces createdItemCount in the job step summary.
  • send_otlp_span.cjs skips the read entirely for success, so the same metric never reaches the OTLP backend.

Why This Matters (DevOps Perspective)

The step summary is ephemeral and not queryable. OTLP spans are indexed, filterable, and alertable. Without gh-aw.output.item_count in the span:

  • You cannot build a Grafana panel grouping runs by output volume.
  • You cannot set an alert when a workflow that normally creates N items suddenly creates 0.
  • You cannot segment cost (token counts) by productivity (items created) in a single query.
  • An on-call engineer investigating "why did the bot do nothing?" has no span attribute to filter on — they must hunt through step summary HTML.

Adding this single numeric attribute unblocks all of the above.

Current Behavior

```javascript
// actions/setup/js/send_otlp_span.cjs — lines 708–715
// When the agent failed, read agent_output.json to surface structured error details.
// Lazy-read: skip I/O entirely when the job succeeded or was cancelled.
const agentOutput = isAgentFailure ? readJSONIfExists("/tmp/gh-aw/agent_output.json") || {} : {};
const outputErrors = Array.isArray(agentOutput.errors) ? agentOutput.errors : [];
```

For a successful run, isAgentFailure is false, so agentOutput is {}, agentOutput.items is never read, and no output-count attribute is set on the span.
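A minimal standalone sketch of the gating logic quoted above makes the asymmetry concrete. Here `readJSONIfExists` is a hypothetical in-memory stub (not the real helper), `currentRead`, `proposedRead`, and `itemCount` are illustrative names, and the item types are placeholder values.

```javascript
// Hypothetical in-memory stand-in for readJSONIfExists: parsed JSON, or null if absent.
const fakeFiles = {
  "/tmp/gh-aw/agent_output.json": {
    items: [{ type: "create_issue" }, { type: "add_comment" }], // placeholder item types
    errors: [],
  },
};
function readJSONIfExists(path) {
  return Object.prototype.hasOwnProperty.call(fakeFiles, path) ? fakeFiles[path] : null;
}

const AGENT_OUTPUT = "/tmp/gh-aw/agent_output.json";

// Current behavior: the file is only read when the agent failed.
function currentRead(isAgentFailure) {
  return isAgentFailure ? readJSONIfExists(AGENT_OUTPUT) || {} : {};
}

// Proposed behavior: always read, falling back to {} when the file is missing.
function proposedRead() {
  return readJSONIfExists(AGENT_OUTPUT) || {};
}

// Same defensive pattern as outputErrors: treat a non-array items field as empty.
function itemCount(agentOutput) {
  return Array.isArray(agentOutput.items) ? agentOutput.items.length : 0;
}

console.log(itemCount(currentRead(false))); // 0: the success path never sees the items
console.log(itemCount(currentRead(true))); // 2
console.log(itemCount(proposedRead())); // 2
```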

Compare this with the summary generator, which always reads the file:

```javascript
// actions/setup/js/generate_observability_summary.cjs
const agentOutput = readJSONIfExists(AGENT_OUTPUT_PATH) || { items: [], errors: [] };
const items = Array.isArray(agentOutput.items) ? agentOutput.items : [];
// → shown as "created items: N" in step summary but never in OTLP
```

Proposed Change

```javascript
// actions/setup/js/send_otlp_span.cjs — replace lines 708–715
// Always read agent_output.json so output metrics appear in spans for all runs,
// not just failures. readJSONIfExists returns null when the file is absent (e.g.
// cancelled jobs), which the fallback {} handles safely.
const agentOutput = readJSONIfExists("/tmp/gh-aw/agent_output.json") || {};
const outputErrors = Array.isArray(agentOutput.errors) ? agentOutput.errors : [];
const outputItems = Array.isArray(agentOutput.items) ? agentOutput.items : [];

// --- further down, after the existing error-message attributes ---

// Always include item count so success-path runs are queryable by output volume.
attributes.push(buildAttr("gh-aw.output.item_count", outputItems.length));

// Include unique item types as a comma-separated string for easy grouping.
const itemTypes = [...new Set(outputItems.map(i => (i && typeof i.type === "string" ? i.type : "")).filter(Boolean))].sort();
if (itemTypes.length > 0) {
  attributes.push(buildAttr("gh-aw.output.item_types", itemTypes.join(",")));
}
```

The gh-aw.agent.agent sub-span should carry the same two attributes. It is built a few lines below from the same attributes array, so both attributes propagate to it automatically once they are pushed above; no separate change is needed there.
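One way to keep the new logic unit-testable is to factor it into a small pure function. The sketch below assumes a hypothetical `outputMetricAttributes` helper and a hypothetical `buildAttr` stand-in that emits an OTLP/JSON KeyValue with int64 values encoded as decimal strings (the proto3 JSON convention); the real `buildAttr` in send_otlp_span.cjs may encode values differently.

```javascript
// Hypothetical stand-in for buildAttr: returns an OTLP/JSON KeyValue object.
// Integers use intValue as a decimal string; everything else becomes stringValue.
function buildAttr(key, value) {
  if (typeof value === "number" && Number.isInteger(value)) {
    return { key, value: { intValue: String(value) } };
  }
  return { key, value: { stringValue: String(value) } };
}

// Hypothetical helper: derive output-metric attributes from parsed agent_output.json.
// Accepts null/{} (missing file) and always emits gh-aw.output.item_count.
function outputMetricAttributes(agentOutput) {
  const items = Array.isArray(agentOutput && agentOutput.items) ? agentOutput.items : [];
  const attrs = [buildAttr("gh-aw.output.item_count", items.length)];

  // Unique, sorted, non-empty string types, joined for easy group-by queries.
  const itemTypes = [...new Set(items.map(i => (i && typeof i.type === "string" ? i.type : "")).filter(Boolean))].sort();
  if (itemTypes.length > 0) {
    attrs.push(buildAttr("gh-aw.output.item_types", itemTypes.join(",")));
  }
  return attrs;
}

const attrs = outputMetricAttributes({
  items: [{ type: "add_comment" }, { type: "create_issue" }, { type: "add_comment" }],
});
console.log(JSON.stringify(attrs));
```

In send_otlp_span.cjs the result could then be spread into the existing array with `attributes.push(...outputMetricAttributes(agentOutput))`. For the sub-span to inherit the values, that push has to happen before the gh-aw.agent.agent span reads the array.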

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: filter spans by gh-aw.output.item_count = 0 to find read-only or zero-output runs; group by gh-aw.output.item_types to see which kinds of items each engine creates; build a scatter plot of gh-aw.tokens.input vs gh-aw.output.item_count to spot cost-to-productivity anomalies.
  • In the JSONL mirror: every conclusion span line in /tmp/gh-aw/otel.jsonl will include the item count, making artifact-based post-mortems immediately actionable without querying the backend.
  • For on-call engineers: a single attribute query (gh-aw.output.item_count = 0 AND gh-aw.agent.conclusion = "success") surfaces runs where the agent ran to completion but produced nothing — a common symptom of silent permission or routing failures.
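The on-call query above can also be run offline against the JSONL mirror. This sketch assumes each line of /tmp/gh-aw/otel.jsonl is an OTLP/JSON export payload (`resourceSpans` → `scopeSpans` → `spans`, each span with an `attributes` list of KeyValue objects); that layout and the function names `attrValue`, `spansFromLine`, and `zeroOutputSuccesses` are assumptions, not confirmed details of the mirror file.

```javascript
const fs = require("fs");

// Look up one attribute in an OTLP/JSON attribute list; intValue may be a string or a number.
function attrValue(attributes, key) {
  const kv = (attributes || []).find(a => a && a.key === key);
  const v = kv && kv.value;
  if (!v) return undefined;
  if (v.intValue !== undefined) return Number(v.intValue);
  if (v.doubleValue !== undefined) return Number(v.doubleValue);
  if (v.stringValue !== undefined) return v.stringValue;
  if (v.boolValue !== undefined) return v.boolValue;
  return undefined;
}

// Collect every span in one JSONL line, assuming the line is an OTLP/JSON export payload.
function spansFromLine(line) {
  const payload = JSON.parse(line);
  const spans = [];
  for (const rs of payload.resourceSpans || []) {
    for (const ss of rs.scopeSpans || []) {
      spans.push(...(ss.spans || []));
    }
  }
  return spans;
}

// Names of spans where the agent concluded "success" but produced zero output items.
function zeroOutputSuccesses(jsonlText) {
  return jsonlText
    .split("\n")
    .filter(line => line.trim() !== "")
    .flatMap(spansFromLine)
    .filter(
      s =>
        attrValue(s.attributes, "gh-aw.output.item_count") === 0 &&
        attrValue(s.attributes, "gh-aw.agent.conclusion") === "success"
    )
    .map(s => s.name);
}

// Usage against a downloaded artifact, if the mirror file is present.
const mirrorPath = "/tmp/gh-aw/otel.jsonl";
if (fs.existsSync(mirrorPath)) {
  console.log(zeroOutputSuccesses(fs.readFileSync(mirrorPath, "utf8")));
}
```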

Implementation Steps
  • In actions/setup/js/send_otlp_span.cjs: change the ternary on line 710 to always call readJSONIfExists("/tmp/gh-aw/agent_output.json") regardless of isAgentFailure.
  • Extract outputItems from the parsed result (same pattern as outputErrors).
  • Add gh-aw.output.item_count (outputItems.length) to attributes unconditionally — place it near the existing gh-aw.effective_tokens attribute for logical grouping.
  • Add gh-aw.output.item_types (comma-separated sorted unique types) to attributes when itemTypes.length > 0.
  • Update the corresponding test file (send_otlp_span.test.cjs) to assert gh-aw.output.item_count is present on success-path conclusion spans and equals the number of items in the mock agent_output.json.
  • Run cd actions/setup/js && npx vitest run to confirm tests pass.
  • Run make fmt to ensure formatting.
  • Open a PR referencing this issue.

Evidence from Live Sentry Data

⚠️ Note: The Sentry MCP server returned an empty tool list during this analysis run (sentry --help showed 0 tools). Live span sampling was not possible. The gap above is confirmed purely by static code analysis — specifically the asymmetry between generate_observability_summary.cjs (which always reads agent_output.json) and sendJobConclusionSpan (which skips the read for non-failure runs). No conflicting live-data evidence is available to deprioritize this recommendation.

Related Files

  • actions/setup/js/send_otlp_span.cjs — primary change site (lines ~708–760)
  • actions/setup/js/generate_observability_summary.cjs — reference implementation for reading agent_output.json
  • actions/setup/js/action_conclusion_otlp.cjs — calls sendJobConclusionSpan; no change needed
  • Test file: actions/setup/js/send_otlp_span.test.cjs (or equivalent)

Generated by the Daily OTel Instrumentation Advisor workflow


  • expires on Apr 27, 2026, 9:30 PM UTC
