
[otel-advisor] add dedicated gh-aw.job.agent span to measure AI execution latency #26499

@github-actions

Description


📡 OTel Instrumentation Improvement: Add dedicated gh-aw.job.agent span for AI execution latency

Analysis Date: 2026-04-15
Priority: High
Effort: Small (< 2h)

Problem

The current instrumentation emits two spans per job: a brief gh-aw.<job>.setup span and a gh-aw.<job>.conclusion span that covers the entire job execution window from setup-end to job-end (from GITHUB_AW_OTEL_JOB_START_MS to Date.now()).

That conclusion span window includes:

  1. The agent AI execution step (the core operation)
  2. The safe-outputs step (GitHub write-back)
  3. Any other cleanup steps

There is no span that isolates just the AI model execution phase. A DevOps engineer cannot answer: "How long did the AI model specifically take?" or "Is this workflow slow because of AI latency or because safe-outputs is hitting GitHub API rate limits?"

The gap is in sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs (lines 653–847): it emits a single span for the whole window and never separately timestamps the agent execution phase.
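To make the conflation concrete, here is a toy calculation with hypothetical timestamps (none of these values come from real runs):

```javascript
// Hypothetical millisecond timestamps for one job run.
const setupEndMs = 1_000;      // GITHUB_AW_OTEL_JOB_START_MS (conclusion span start)
const agentEndMs = 241_000;    // AI execution finishes — not recorded today
const jobEndMs = 280_000;      // post step runs (conclusion span end)

// What the trace shows today: one undifferentiated window.
const conclusionDurationMs = jobEndMs - setupEndMs;   // 279000

// What a DevOps engineer actually needs, but cannot compute from the trace:
const aiDurationMs = agentEndMs - setupEndMs;         // 240000
const writeBackAndCleanupMs = jobEndMs - agentEndMs;  //  39000

console.log(conclusionDurationMs, aiDurationMs, writeBackAndCleanupMs);
```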

Why This Matters (DevOps Perspective)

This is an AI agent runner — the AI execution time is the primary cost driver and the primary latency risk. Without isolating it:

  • No AI latency SLO: You can't alert on "AI step > 5 min" because the conclusion span also includes safe-outputs time.
  • No cost attribution: Token-per-minute rate is impossible to compute without the denominator (AI-only duration).
  • No root-cause triage: A slow job could be slow AI or slow GitHub API — the current trace doesn't disambiguate.
  • No model comparison: When switching models, you can't compare latency in Grafana/Honeycomb because there's no AI-specific duration field.

Adding a dedicated gh-aw.job.agent span reduces MTTR for slow-workflow investigations from "re-read step logs" to "look at the trace waterfall."

Current Behavior

sendJobConclusionSpan currently reads agent_output.json only when the job failed:

// Current: actions/setup/js/send_otlp_span.cjs (lines 707–709)
// agent_output.json is only read to extract error messages on failure.
// Its mtime — which approximates when the agent step finished — is never examined.
const agentOutput = isAgentFailure ? readJSONIfExists("/tmp/gh-aw/agent_output.json") || {} : {};

The conclusion span start time is the setup-end timestamp, and the end time is when the post step runs, making the span duration = AI time + safe-outputs time + everything else.

Proposed Change

After reading agent_output.json, also stat it to get its modification time as a close approximation of when the agent step finished. Use that to emit a child gh-aw.<jobName>.agent span covering just the AI execution window.

// Proposed addition to actions/setup/js/send_otlp_span.cjs
// (inside sendJobConclusionSpan, after building resourceAttributes;
// fs is assumed to already be required at the top of the module)

// Attempt to bound the agent execution window using agent_output.json mtime.
// The file is written by the agent runner at the end of AI execution, so its
// mtime is a reliable approximation of when AI processing finished.
// agentStartMs === options.startMs === GITHUB_AW_OTEL_JOB_START_MS (setup-end).
const agentStartMs = options.startMs;
let agentEndMs = null;
try {
  agentEndMs = fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs;
} catch {
  // agent_output.json may not exist for non-agent jobs; skip silently.
}

if (
  typeof agentStartMs === "number" && agentStartMs > 0 &&
  typeof agentEndMs === "number" && agentEndMs > agentStartMs
) {
  // Build a dedicated span for the AI execution phase. It carries the
  // AI-specific attributes (model, tokens, conclusion, errors) and shares the
  // conclusion span's parent, so it renders as a distinct nested segment in trace views.
  const agentSpanAttrs = [
    ...attributes, // reuse the full attribute set including model, tokens, conclusion
  ];
  const agentSpanPayload = buildOTLPPayload({
    traceId,
    spanId: generateSpanId(),
    ...(parentSpanId ? { parentSpanId } : {}),
    spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
    startMs: agentStartMs,
    endMs: agentEndMs,
    serviceName,
    scopeVersion: version,
    attributes: agentSpanAttrs,
    resourceAttributes,
    statusCode,
    statusMessage,
    events: spanEvents,
  });
  appendToOTLPJSONL(agentSpanPayload);
  if (endpoint) {
    await sendOTLPSpan(endpoint, agentSpanPayload, { skipJSONL: true });
  }
}

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: A new gh-aw.job.agent span appears in every trace waterfall, showing AI execution time as a distinct segment separate from setup and safe-outputs. You can group by gh-aw.model and plot P50/P95 AI duration over time.
  • In the JSONL mirror: An additional JSON line per run with name: "gh-aw.<job>.agent" and tight startTimeUnixNano/endTimeUnixNano bounds around the AI execution.
  • For on-call engineers: "Why was this workflow slow?" becomes a one-glance answer in the trace view — if the agent span is long, the AI model was slow; if the conclusion span has significant time after the agent span ends, safe-outputs or cleanup was slow.
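One wrinkle worth noting: the JSONL bounds are unix-nanosecond strings, while statSync's mtimeMs is milliseconds. A sketch of the conversion buildOTLPPayload is assumed to perform internally (msToUnixNano is an illustrative name, not an existing helper in this codebase):

```javascript
// OTLP/JSON encodes startTimeUnixNano/endTimeUnixNano as stringified uint64 ns.
// BigInt avoids float precision loss above Number.MAX_SAFE_INTEGER.
const msToUnixNano = ms => (BigInt(Math.round(ms)) * 1_000_000n).toString();

console.log(msToUnixNano(1_700_000_000_000.25)); // "1700000000000000000"
```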

Implementation Steps

  • In actions/setup/js/send_otlp_span.cjs, inside sendJobConclusionSpan (after resourceAttributes is built, ~line 795), add fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs to get agentEndMs
  • Emit the gh-aw.<jobName>.agent child span when both options.startMs and agentEndMs are valid and agentEndMs > startMs
  • Ensure the new span shares the same traceId, parentSpanId, resourceAttributes, and AI attributes (model, tokens, agentConclusion, error events)
  • Update actions/setup/js/send_otlp_span.test.cjs to assert the agent span is emitted with the expected name, startTimeUnixNano, and endTimeUnixNano when agent_output.json is present
  • Add a test case asserting no agent span is emitted when agent_output.json is absent (non-agent jobs)
  • Run cd actions/setup/js && npx vitest run to confirm tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue

Evidence from Live Sentry Data

Note: No Sentry MCP server was available in this analysis environment, so live span data could not be queried; the recommendation is based entirely on static analysis of the instrumentation files. The gap (no agent-execution span) is structural and confirmed by the codebase itself: grep -r "gh-aw.job.agent" actions/setup/js/ returns no matches in production code, only test expectations that would validate the new span once added.

To validate with real data after this improvement ships: query Sentry spans for name:"gh-aw.*.agent" and verify startTimeUnixNano aligns with the conclusion span's startTimeUnixNano, confirming the mtime-based boundary is accurate.

Related Files

  • actions/setup/js/send_otlp_span.cjs — primary change: sendJobConclusionSpan
  • actions/setup/js/send_otlp_span.test.cjs — new test cases for agent span emission
  • actions/setup/js/action_conclusion_otlp.cjs — no changes needed (passes startMs through already)
  • actions/setup/js/action_otlp.test.cjs — may need update if integration-level tests are added

Generated by the Daily OTel Instrumentation Advisor workflow

