Add dedicated gh-aw.<job>.agent OTLP span to isolate AI execution latency#26504
Add dedicated gh-aw.<job>.agent OTLP span to isolate AI execution latency#26504
gh-aw.<job>.agent OTLP span to isolate AI execution latency#26504Conversation
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e31de787-cc5f-42d0-8e09-ff9d5248fc52 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
gh-aw.<job>.agent OTLP span to isolate AI execution latency
There was a problem hiding this comment.
Pull request overview
Adds a dedicated OTLP span for the agent execution phase so AI runtime latency can be distinguished from post-agent conclusion work.
Changes:
- Emit an additional
gh-aw.<jobName>.agent(orgh-aw.job.agent) span insidesendJobConclusionSpan, bounded byoptions.startMsand/tmp/gh-aw/agent_output.jsonmtime. - Mirror the new span to the OTLP JSONL artifact and optionally export it to the configured OTLP endpoint.
- Add unit tests covering agent-span emission and omission when
agent_output.jsonmtime is unavailable.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/send_otlp_span.cjs | Adds agent-phase span creation/export alongside the existing conclusion span. |
| actions/setup/js/send_otlp_span.test.cjs | Adds tests validating when the agent span is (and isn’t) emitted. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
actions/setup/js/send_otlp_span.cjs:856
- The new agent span is described as a child span, but it currently uses the same
parentSpanIdas the conclusion span (the setup span). That makes the agent and conclusion spans siblings in the trace rather than nesting the agent phase within the conclusion span. Consider generating the conclusion spanId once, using it asparentSpanIdfor the agent span, and then reusing that same spanId for the conclusion payload (with the setup span as the conclusion's parent).
const agentPayload = buildOTLPPayload({
traceId,
spanId: generateSpanId(),
...(parentSpanId ? { parentSpanId } : {}),
spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
startMs: agentStartMs,
endMs: agentEndMs,
serviceName,
scopeVersion: version,
attributes,
resourceAttributes,
statusCode,
statusMessage,
events: spanEvents,
});
appendToOTLPJSONL(agentPayload);
if (endpoint) {
await sendOTLPSpan(endpoint, agentPayload, { skipJSONL: true });
}
}
const payload = buildOTLPPayload({
traceId,
spanId: generateSpanId(),
...(parentSpanId ? { parentSpanId } : {}),
spanName,
- Files reviewed: 2/2 changed files
- Comments generated: 2
| resourceAttributes, | ||
| statusCode, | ||
| statusMessage, | ||
| events: spanEvents, |
There was a problem hiding this comment.
spanEvents timestamps are built using nowMs() earlier in the function, but this new agent span can end at agentEndMs (mtime) which may be earlier than nowMs(). That can produce OTLP spans where event timestamps fall after the span end time. Consider either recomputing events for the agent span with timestamps clamped to [agentStartMs, agentEndMs] (e.g., use agentEndMs for event time), or omit events from the agent span to avoid invalid timing.
| events: spanEvents, |
|
|
||
| const conclusionBody = JSON.parse(mockFetch.mock.calls[1][1].body); | ||
| const conclusionSpan = conclusionBody.resourceSpans[0].scopeSpans[0].spans[0]; | ||
| expect(conclusionSpan.name).toBe("gh-aw.agent.conclusion"); |
There was a problem hiding this comment.
The new tests validate span names/timestamps, but they don't assert the intended parent/child relationship between the agent span and the conclusion span. Adding an assertion around parentSpanId/spanId would prevent regressions (e.g., agent span should be a child of the conclusion span if that's the desired nesting).
| expect(conclusionSpan.name).toBe("gh-aw.agent.conclusion"); | |
| expect(conclusionSpan.name).toBe("gh-aw.agent.conclusion"); | |
| expect(agentSpan.traceId).toBe(conclusionSpan.traceId); | |
| expect(agentSpan.parentSpanId).toBe(conclusionSpan.spanId); |
🧪 Test Quality Sentinel ReportTest Quality Score: 85/100✅ Excellent test quality
Test Classification Details
Flagged Tests — Requires ReviewNo tests were flagged. Both tests are behavioral contracts with observable-output assertions. Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. References: §24482849214
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 85/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). Both new tests verify behavioral contracts (OTLP span names, timestamps, call counts) and properly exercise the happy path and ENOENT error path.
|
@copilot, review all comments. |
…tion tests Agent-Logs-Url: https://github.com/github/gh-aw/sessions/8e74eb36-f8b3-4611-8d80-3174611c8250 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Current OTLP instrumentation only exposes setup and conclusion spans, where conclusion includes AI execution plus post-agent work (safe-outputs/cleanup). This makes AI latency indistinguishable from downstream step latency.
Agent-phase span in
sendJobConclusionSpangh-aw.<jobName>.agent(orgh-aw.job.agentwhen unnamed).startMs: existing job start boundary (options.startMs/ setup-end timestamp)endMs:/tmp/gh-aw/agent_output.jsonmtimeMs(agent completion proxy)Export behavior
skipJSONL: trueon network export)Targeted test coverage
startMs+agent_output.jsonmtime are available