
[otel-advisor] add dedicated gh-aw.job.agent span to measure AI execution latency #26499

@github-actions

Description


📡 OTel Instrumentation Improvement: Add dedicated gh-aw.job.agent span for AI execution latency

Analysis Date: 2026-04-15
Priority: High
Effort: Small (< 2h)

Problem

The current instrumentation emits two spans per job: a brief gh-aw.<job>.setup span and a gh-aw.<job>.conclusion span that covers the entire job execution window from setup-end to job-end (from GITHUB_AW_OTEL_JOB_START_MS to Date.now()).

That conclusion span window includes:

  1. The agent AI execution step (the core operation)
  2. The safe-outputs step (GitHub write-back)
  3. Any other cleanup steps

There is no span that isolates just the AI model execution phase. A DevOps engineer cannot answer: "How long did the AI model specifically take?" or "Is this workflow slow because of AI latency or because safe-outputs is hitting GitHub API rate limits?"

The gap is in sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs (lines 653–847): it emits a single span for the whole window and never separately timestamps the agent execution phase.
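To make the conflation concrete, here is a toy calculation with hypothetical timestamps (none of these values come from real runs):

```javascript
// Hypothetical millisecond timestamps for one job run.
const setupEndMs = 1_000;      // GITHUB_AW_OTEL_JOB_START_MS (conclusion span start)
const agentEndMs = 241_000;    // AI execution finishes — not recorded today
const jobEndMs = 280_000;      // post step runs (conclusion span end)

// What the trace shows today: one undifferentiated window.
const conclusionDurationMs = jobEndMs - setupEndMs;   // 279000

// What a DevOps engineer actually needs, but cannot compute from the trace:
const aiDurationMs = agentEndMs - setupEndMs;         // 240000
const writeBackAndCleanupMs = jobEndMs - agentEndMs;  //  39000

console.log(conclusionDurationMs, aiDurationMs, writeBackAndCleanupMs);
```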

Why This Matters (DevOps Perspective)

This is an AI agent runner — the AI execution time is the primary cost driver and the primary latency risk. Without isolating it:

  • No AI latency SLO: You can't alert on "AI step > 5 min" because the conclusion span also includes safe-outputs time.
  • No cost attribution: Token-per-minute rate is impossible to compute without the denominator (AI-only duration).
  • No root-cause triage: A slow job could be slow AI or slow GitHub API — the current trace doesn't disambiguate.
  • No model comparison: When switching models, you can't compare latency in Grafana/Honeycomb because there's no AI-specific duration field.

Adding a dedicated gh-aw.job.agent span reduces MTTR for slow-workflow investigations from "re-read step logs" to "look at the trace waterfall."

Current Behavior

sendJobConclusionSpan currently reads agent_output.json only when the job failed:

// Current: actions/setup/js/send_otlp_span.cjs (lines 707–709)
// agent_output.json is only read to extract error messages on failure.
// Its mtime — which approximates when the agent step finished — is never examined.
const agentOutput = isAgentFailure ? readJSONIfExists("/tmp/gh-aw/agent_output.json") || {} : {};

The conclusion span start time is the setup-end timestamp, and the end time is when the post step runs, making the span duration = AI time + safe-outputs time + everything else.

Proposed Change

After reading agent_output.json, also stat it to get its modification time as a close approximation of when the agent step finished. Use that to emit a child gh-aw.<jobName>.agent span covering just the AI execution window.

// Proposed addition to actions/setup/js/send_otlp_span.cjs
// (inside sendJobConclusionSpan, after building resourceAttributes;
// fs is assumed to already be required at the top of the module)

// Attempt to bound the agent execution window using agent_output.json mtime.
// The file is written by the agent runner at the end of AI execution, so its
// mtime is a reliable approximation of when AI processing finished.
// agentStartMs === options.startMs === GITHUB_AW_OTEL_JOB_START_MS (setup-end).
const agentStartMs = options.startMs;
let agentEndMs = null;
try {
  agentEndMs = fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs;
} catch {
  // agent_output.json may not exist for non-agent jobs; skip silently.
}

if (
  typeof agentStartMs === "number" && agentStartMs > 0 &&
  typeof agentEndMs === "number" && agentEndMs > agentStartMs
) {
  // Build a dedicated span for the AI execution phase. It carries the
  // AI-specific attributes (model, tokens, conclusion, errors) and shares the
  // conclusion span's parent, so it renders as a distinct nested segment in trace views.
  const agentSpanAttrs = [
    ...attributes, // reuse the full attribute set including model, tokens, conclusion
  ];
  const agentSpanPayload = buildOTLPPayload({
    traceId,
    spanId: generateSpanId(),
    ...(parentSpanId ? { parentSpanId } : {}),
    spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
    startMs: agentStartMs,
    endMs: agentEndMs,
    serviceName,
    scopeVersion: version,
    attributes: agentSpanAttrs,
    resourceAttributes,
    statusCode,
    statusMessage,
    events: spanEvents,
  });
  appendToOTLPJSONL(agentSpanPayload);
  if (endpoint) {
    await sendOTLPSpan(endpoint, agentSpanPayload, { skipJSONL: true });
  }
}

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: A new gh-aw.job.agent span appears in every trace waterfall, showing AI execution time as a distinct segment separate from setup and safe-outputs. You can group by gh-aw.model and plot P50/P95 AI duration over time.
  • In the JSONL mirror: An additional JSON line per run with name: "gh-aw.<job>.agent" and tight startTimeUnixNano/endTimeUnixNano bounds around the AI execution.
  • For on-call engineers: "Why was this workflow slow?" becomes a one-glance answer in the trace view — if the agent span is long, the AI model was slow; if the conclusion span has significant time after the agent span ends, safe-outputs or cleanup was slow.
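One wrinkle worth noting: the JSONL bounds are unix-nanosecond strings, while statSync's mtimeMs is milliseconds. A sketch of the conversion buildOTLPPayload is assumed to perform internally (msToUnixNano is an illustrative name, not an existing helper in this codebase):

```javascript
// OTLP/JSON encodes startTimeUnixNano/endTimeUnixNano as stringified uint64 ns.
// BigInt avoids float precision loss above Number.MAX_SAFE_INTEGER.
const msToUnixNano = ms => (BigInt(Math.round(ms)) * 1_000_000n).toString();

console.log(msToUnixNano(1_700_000_000_000.25)); // "1700000000000000000"
```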

Implementation Steps

  • In actions/setup/js/send_otlp_span.cjs, inside sendJobConclusionSpan (after resourceAttributes is built, ~line 795), add fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs to get agentEndMs
  • Emit the gh-aw.<jobName>.agent child span when both options.startMs and agentEndMs are valid and agentEndMs > startMs
  • Ensure the new span shares the same traceId, parentSpanId, resourceAttributes, and AI attributes (model, tokens, agentConclusion, error events)
  • Update actions/setup/js/send_otlp_span.test.cjs to assert the agent span is emitted with the expected name, startTimeUnixNano, and endTimeUnixNano when agent_output.json is present
  • Add a test case asserting no agent span is emitted when agent_output.json is absent (non-agent jobs)
  • Run cd actions/setup/js && npx vitest run to confirm tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue

Evidence from Live Sentry Data

Note: No Sentry MCP server was available in this analysis environment, so live span data could not be queried; the recommendation is based entirely on static analysis of the instrumentation files. The gap (no agent-execution span) is structural and confirmed by the codebase itself: grep -r "gh-aw.job.agent" actions/setup/js/ returns no matches in production code, only test expectations that would validate the new span once added.

To validate with real data after this improvement ships: query Sentry spans for name:"gh-aw.*.agent" and verify startTimeUnixNano aligns with the conclusion span's startTimeUnixNano, confirming the mtime-based boundary is accurate.

Related Files

  • actions/setup/js/send_otlp_span.cjs — primary change: sendJobConclusionSpan
  • actions/setup/js/send_otlp_span.test.cjs — new test cases for agent span emission
  • actions/setup/js/action_conclusion_otlp.cjs — no changes needed (passes startMs through already)
  • actions/setup/js/action_otlp.test.cjs — may need update if integration-level tests are added

Generated by the Daily OTel Instrumentation Advisor workflow

