[otel-advisor] OTel improvement: emit agent execution span for timed-out runs where agent_output.json is absent #27228

@github-actions

📡 OTel Instrumentation Improvement: emit gh-aw.agent.agent span for timed-out runs

Analysis Date: 2026-04-19
Priority: High
Effort: Small (< 2h)

Problem

The gh-aw.agent.agent sub-span, which measures pure AI execution latency, is only emitted when agent_output.json exists with a valid mtime. For timed-out runs (GH_AW_AGENT_CONCLUSION=timed_out), the agent process is killed before agent_output.json is written, so fs.statSync throws and agentEndMs stays null. The guard condition on line 837 of send_otlp_span.cjs then fails silently, and no agent span is emitted for the most operationally critical failure mode.

A DevOps engineer today cannot answer: "Did this workflow time out after 5 minutes (misconfigured) or after 50 minutes (model ran long)?" That distinction is invisible in timed-out traces.

Why This Matters (DevOps Perspective)

Timed-out runs are the failure mode most likely to hide cost and latency regressions. Without the agent span for timeouts:

  • Grafana / Honeycomb / Datadog: you cannot plot AI execution duration for failed runs, making it impossible to set duration-based alerts that catch runaway agents before they exhaust budget.
  • MTTR: engineers triaging a timeout must mentally subtract setup overhead from the conclusion span duration rather than reading the AI latency directly.
  • Trace consistency: successful traces have 3 spans (setup, agent, conclusion); timed-out traces have only 2 (setup, conclusion). The missing span breaks span-count-based dashboards and makes trace shapes inconsistent.

Current Behavior

// actions/setup/js/send_otlp_span.cjs (lines 827–837)
const agentStartMs = options.startMs;
let agentEndMs = null;
try {
  agentEndMs = fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs;
} catch {
  // agent_output.json may not exist for non-agent jobs; skip dedicated span.
}

if (jobName === "agent" && typeof agentStartMs === "number" && agentStartMs > 0
    && typeof agentEndMs === "number" && agentEndMs > agentStartMs) {
  // ... emit agent span (never reached for timed-out runs)
}

For GH_AW_AGENT_CONCLUSION=timed_out, agent_output.json is absent → statSync throws → agentEndMs is null → the typeof agentEndMs === "number" guard fails → no agent span emitted.
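
The failure chain above can be reproduced in isolation. The following is a minimal sketch, not the code from send_otlp_span.cjs: shouldEmitAgentSpan is a hypothetical helper that copies the guard from the excerpt, and a path in a directory that is never created stands in for the missing /tmp/gh-aw/agent_output.json.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: the same boolean guard as in the excerpt above.
function shouldEmitAgentSpan(jobName, agentStartMs, agentEndMs) {
  return (
    jobName === "agent" &&
    typeof agentStartMs === "number" &&
    agentStartMs > 0 &&
    typeof agentEndMs === "number" &&
    agentEndMs > agentStartMs
  );
}

// Stand-in for agent_output.json in a timed-out run: the parent directory is
// never created, so statSync throws ENOENT.
const missingOutput = path.join(
  os.tmpdir(),
  `gh-aw-absent-${process.pid}-${Date.now()}`,
  "agent_output.json"
);

const agentStartMs = Date.now() - 60_000; // pretend the agent started a minute ago
let agentEndMs = null;
let statErrorCode = null;
try {
  agentEndMs = fs.statSync(missingOutput).mtimeMs;
} catch (err) {
  statErrorCode = err.code; // the real code swallows the error silently
}

const emitted = shouldEmitAgentSpan("agent", agentStartMs, agentEndMs);
console.log({ statErrorCode, agentEndMs, emitted });
```

With the file absent, agentEndMs stays null and the guard evaluates to false even though the job name and start time are valid.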

Proposed Change

Fall back to nowMs() as the agent span end time when the run is a timed-out failure and agent_output.json is absent. This bounds the AI execution duration to the interval [setup-end, conclusion-start], which is a useful upper bound even though it is slightly larger than the true agent wall-clock time.

// Proposed change to actions/setup/js/send_otlp_span.cjs (around line 827)
const agentStartMs = options.startMs;
let agentEndMs = null;
try {
  agentEndMs = fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs;
} catch {
  // agent_output.json absent (e.g. timed-out run where the agent process was killed
  // before writing output): fall back to nowMs() so the agent span still bounds
  // execution duration. Only do this for agent failures; non-agent jobs (safe-outputs,
  // activation) should not emit an agent span.
  if (isAgentFailure && jobName === "agent"
      && typeof agentStartMs === "number" && agentStartMs > 0) {
    agentEndMs = nowMs();
  }
}

if (jobName === "agent" && typeof agentStartMs === "number" && agentStartMs > 0
    && typeof agentEndMs === "number" && agentEndMs > agentStartMs) {
  // ... emit agent span (now also reached for timed-out jobs)
}
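
To make the fallback easier to reason about and to test, the end-time decision can be pulled into a pure function with the file read and the clock injected. This is only a sketch under assumptions: resolveAgentEndMs, readMtimeMs, and the parameter object are hypothetical names, not part of send_otlp_span.cjs, and how isAgentFailure is derived is left to the existing code.

```javascript
// Hypothetical helper; the name and parameter object are illustrative only.
// readMtimeMs stands in for fs.statSync(...).mtimeMs and may throw.
function resolveAgentEndMs({ readMtimeMs, nowMs, isAgentFailure, jobName, agentStartMs }) {
  try {
    return readMtimeMs();
  } catch {
    // Output file absent: fall back to the current time only for failed agent jobs
    // with a usable start time; every other job keeps the "no span" behavior.
    const hasValidStart = typeof agentStartMs === "number" && agentStartMs > 0;
    return isAgentFailure && jobName === "agent" && hasValidStart ? nowMs() : null;
  }
}

// Simulates statSync failing because agent_output.json was never written.
const throwEnoent = () => {
  throw Object.assign(new Error("ENOENT: no such file or directory"), { code: "ENOENT" });
};

const timedOutAgent = resolveAgentEndMs({
  readMtimeMs: throwEnoent,
  nowMs: () => 5_000,
  isAgentFailure: true,
  jobName: "agent",
  agentStartMs: 1_000,
});

const nonAgentJob = resolveAgentEndMs({
  readMtimeMs: throwEnoent,
  nowMs: () => 5_000,
  isAgentFailure: true,
  jobName: "safe-outputs",
  agentStartMs: 1_000,
});

const successfulAgent = resolveAgentEndMs({
  readMtimeMs: () => 4_200,
  nowMs: () => 5_000,
  isAgentFailure: false,
  jobName: "agent",
  agentStartMs: 1_000,
});

console.log({ timedOutAgent, nonAgentJob, successfulAgent });
```

The three calls cover the cases the issue cares about: a failed agent job with no output file gets the clock value, a non-agent job still gets null, and a run that wrote its output keeps the file mtime.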

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: timed-out traces now have 3 spans (setup, agent, conclusion), matching successful traces. You can plot gh-aw.agent.agent span duration across all outcomes and alert when AI latency exceeds a threshold regardless of whether the run succeeded.
  • In the JSONL mirror: otel.jsonl gains an agent span entry for every timed-out run, improving post-hoc artifact-based debugging.
  • For on-call engineers: "How long did the AI run before timing out?" becomes a one-click query on the gh-aw.agent.agent span duration rather than a manual subtraction from conclusion span duration.

Implementation Steps

  • In actions/setup/js/send_otlp_span.cjs (lines 828–836): update the catch block to set agentEndMs = nowMs() when isAgentFailure && jobName === "agent" && typeof agentStartMs === "number" && agentStartMs > 0
  • Update actions/setup/js/send_otlp_span.test.cjs (around line 1614, the "does not emit a dedicated agent span when agent_output mtime is unavailable" test): add a sibling test that asserts an agent span IS emitted when GH_AW_AGENT_CONCLUSION=timed_out and statSync throws
  • Keep the existing test at line 1614 but scope it to non-failure cases (e.g. GH_AW_AGENT_CONCLUSION unset) to preserve the "non-agent jobs skip the span" invariant
  • Run cd actions/setup/js && npx vitest run to confirm tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue

Evidence from Live Sentry Data

The Sentry MCP server returned 0 available tools during this analysis run and could not be queried. The finding is based entirely on static code analysis of send_otlp_span.cjs (lines 827–858). The gap is confirmed by the existing test at line 1614 of send_otlp_span.test.cjs, which explicitly asserts that no agent span is emitted when statSync throws. That test passes today, documenting the missing span as known and seemingly intentional behavior. No comparable test asserts that the span IS emitted for timed-out failure runs.

Related Files

  • actions/setup/js/send_otlp_span.cjs: primary change (lines 827–837)
  • actions/setup/js/send_otlp_span.test.cjs: add test for timed-out agent span emission
  • actions/setup/js/action_conclusion_otlp.cjs: no change needed (orchestrates sendJobConclusionSpan, which handles the logic)

Generated by the Daily OTel Instrumentation Advisor workflow

  • expires on Apr 26, 2026, 9:24 PM UTC
