📡 OTel Instrumentation Improvement: Add dedicated gh-aw.job.agent span for AI execution latency
Analysis Date: 2026-04-15
Priority: High
Effort: Small (< 2h)
Problem
The current instrumentation has two spans per job: a brief gh-aw.<job>.setup span and a gh-aw.<job>.conclusion span that covers the entire job execution window from setup-end to job-end (GITHUB_AW_OTEL_JOB_START_MS → nowMs()).
That conclusion span window includes:
The agent AI execution step (the core operation)
The safe-outputs step (GitHub write-back)
Any other cleanup steps
There is no span that isolates just the AI model execution phase. A DevOps engineer cannot answer: "How long did the AI model specifically take?" or "Is this workflow slow because of AI latency or because safe-outputs is hitting GitHub API rate limits?"
The gap is in sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs (lines 653–847): it emits a single span for the whole window and never separately timestamps the agent execution phase.
Why This Matters (DevOps Perspective)
This is an AI agent runner — the AI execution time is the primary cost driver and the primary latency risk. Without isolating it:
No AI latency SLO: You can't alert on "AI step > 5 min" because the conclusion span also includes safe-outputs time.
No cost attribution: Token-per-minute rate is impossible to compute without the denominator (AI-only duration).
No root-cause triage: A slow job could be slow AI or slow GitHub API — the current trace doesn't disambiguate.
No model comparison: When switching models, you can't compare latency in Grafana/Honeycomb because there's no AI-specific duration field.
Adding a dedicated gh-aw.job.agent span reduces MTTR for slow-workflow investigations from "re-read step logs" to "look at the trace waterfall."
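As a hypothetical illustration of what the missing span would unlock, the SLO check and cost attribution above reduce to simple arithmetic once an AI-only duration exists. Field names here (`endTimeUnixNano`, `attributes.tokensTotal`) are illustrative, not taken from the gh-aw codebase.

```javascript
// Hypothetical sketch: metrics derivable once a dedicated agent span exists.
// The span/attribute field names are assumptions for this example only.
function agentMetrics(span) {
  const durationMs = (span.endTimeUnixNano - span.startTimeUnixNano) / 1e6;
  return {
    durationMs,
    // Cost attribution: tokens per minute of AI execution.
    tokensPerMinute: span.attributes.tokensTotal / (durationMs / 60000),
    // SLO check: the "AI step > 5 min" alert described above.
    sloBreached: durationMs > 5 * 60 * 1000,
  };
}
```

With the current conclusion span, `durationMs` would also include safe-outputs time, making both derived values wrong.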
Current Behavior
sendJobConclusionSpan currently reads agent_output.json only when the job failed:
```js
// Current: actions/setup/js/send_otlp_span.cjs (lines 707–709)
// agent_output.json is only read to extract error messages on failure.
// Its mtime — which approximates when the agent step finished — is never examined.
const agentOutput = isAgentFailure
  ? readJSONIfExists("/tmp/gh-aw/agent_output.json") || {}
  : {};
```
The conclusion span start time is the setup-end timestamp, and the end time is when the post step runs, making the span duration = AI time + safe-outputs time + everything else.
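To make the window arithmetic concrete, here is an illustrative timeline split at the `agent_output.json` mtime; all timestamp values are made up for the example.

```javascript
// Illustrative split of the conclusion window; every value is hypothetical.
const startMs = 1_700_000_000_000;      // GITHUB_AW_OTEL_JOB_START_MS (setup-end)
const agentEndMs = startMs + 240_000;   // mtime of agent_output.json (4 min of AI)
const jobEndMs = startMs + 300_000;     // nowMs() when the post step runs

const aiMs = agentEndMs - startMs;      // isolated AI latency
const otherMs = jobEndMs - agentEndMs;  // safe-outputs + cleanup
```

Today only the full `jobEndMs - startMs` window is recorded; the `aiMs` / `otherMs` split is exactly what the proposed span adds.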
Proposed Change
After reading agent_output.json, also stat it to get its modification time as a close approximation of when the agent step finished. Use that to emit a child gh-aw.<jobName>.agent span covering just the AI execution window.
```js
// Proposed addition to actions/setup/js/send_otlp_span.cjs
// (inside sendJobConclusionSpan, after building resourceAttributes)

// Attempt to bound the agent execution window using agent_output.json mtime.
// The file is written by the agent runner at the end of AI execution, so its
// mtime is a reliable approximation of when AI processing finished.
// agentStartMs === options.startMs === GITHUB_AW_OTEL_JOB_START_MS (setup-end).
const agentStartMs = options.startMs;
let agentEndMs = null;
try {
  agentEndMs = fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs;
} catch {
  // agent_output.json may not exist for non-agent jobs; skip silently.
}

if (
  typeof agentStartMs === "number" &&
  agentStartMs > 0 &&
  typeof agentEndMs === "number" &&
  agentEndMs > agentStartMs
) {
  // Build a dedicated span for the AI execution phase.
  // Carries the AI-specific attributes (model, tokens, conclusion, errors)
  // as a child of the setup span so it appears as a nested segment in trace views.
  const agentSpanAttrs = [
    ...attributes, // reuse the full attribute set including model, tokens, conclusion
  ];
  const agentSpanPayload = buildOTLPPayload({
    traceId,
    spanId: generateSpanId(),
    ...(parentSpanId ? { parentSpanId } : {}),
    spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
    startMs: agentStartMs,
    endMs: agentEndMs,
    serviceName,
    scopeVersion: version,
    attributes: agentSpanAttrs,
    resourceAttributes,
    statusCode,
    statusMessage,
    events: spanEvents,
  });
  appendToOTLPJSONL(agentSpanPayload);
  if (endpoint) {
    await sendOTLPSpan(endpoint, agentSpanPayload, { skipJSONL: true });
  }
}
```
Expected Outcome
After this change:
In Grafana / Honeycomb / Datadog: A new gh-aw.job.agent span appears in every trace waterfall, showing AI execution time as a distinct segment separate from setup and safe-outputs. You can group by gh-aw.model and plot P50/P95 AI duration over time.
In the JSONL mirror: An additional JSON line per run with name: "gh-aw.<job>.agent" and tight startTimeUnixNano/endTimeUnixNano bounds around the AI execution.
For on-call engineers: "Why was this workflow slow?" becomes a one-glance answer in the trace view — if the agent span is long, the AI model was slow; if the conclusion span has significant time after the agent span ends, safe-outputs or cleanup was slow.
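For illustration, the additional JSONL line might look roughly like the object below. The field names follow OTLP JSON conventions, but the exact shape depends on buildOTLPPayload (not reproduced here), and the job name is hypothetical.

```javascript
// Sketch of the expected JSONL record. BigInt avoids precision loss when
// converting milliseconds to the nanosecond strings OTLP JSON uses.
const msToNano = (ms) => (BigInt(ms) * 1_000_000n).toString();

const sampleLine = {
  name: "gh-aw.daily-plan.agent",  // hypothetical job name
  startTimeUnixNano: msToNano(1_700_000_000_000),
  endTimeUnixNano: msToNano(1_700_000_240_000),
};
```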
Implementation Steps
In actions/setup/js/send_otlp_span.cjs, inside sendJobConclusionSpan (after resourceAttributes is built, ~line 795), add fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs to get agentEndMs
Emit the gh-aw.<jobName>.agent child span when both options.startMs and agentEndMs are valid and agentEndMs > startMs
Ensure the new span shares the same traceId, parentSpanId, resourceAttributes, and AI attributes (model, tokens, agentConclusion, error events)
Update actions/setup/js/send_otlp_span.test.cjs to assert the agent span is emitted with the expected name, startTimeUnixNano, and endTimeUnixNano when agent_output.json is present
Add a test case asserting no agent span is emitted when agent_output.json is absent (non-agent jobs)
Run cd actions/setup/js && npx vitest run to confirm tests pass
Run make fmt to ensure formatting
Open a PR referencing this issue
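The emission guard described in the steps above can be captured as a small pure helper so the boundary cases (missing file, zero start, end not after start) are unit-testable in isolation. The name shouldEmitAgentSpan is hypothetical, not part of the codebase.

```javascript
// Hypothetical helper mirroring the proposed guard condition: emit the agent
// span only when both bounds are valid numbers and the window is non-empty.
function shouldEmitAgentSpan(agentStartMs, agentEndMs) {
  return (
    typeof agentStartMs === "number" &&
    agentStartMs > 0 &&
    typeof agentEndMs === "number" &&
    agentEndMs > agentStartMs
  );
}
```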
Evidence from Live Sentry Data
Note: No Sentry MCP server was available in this analysis environment; live span data could not be queried. The recommendation is based entirely on static code analysis of the instrumentation files. The gap (no agent-execution span) is structural and confirmed by the absence of any gh-aw.job.agent span name in the codebase (grep -r "gh-aw.job.agent" actions/setup/js/ returns no results in production code, only test expectations that would validate the new span once added).
To validate with real data after this improvement ships: query Sentry spans for name:"gh-aw.*.agent" and verify startTimeUnixNano aligns with the conclusion span's startTimeUnixNano, confirming the mtime-based boundary is accurate.
Related Files
actions/setup/js/send_otlp_span.cjs — primary change: sendJobConclusionSpan
actions/setup/js/send_otlp_span.test.cjs — new test cases for agent span emission
actions/setup/js/action_conclusion_otlp.cjs — no changes needed (passes startMs through already)
actions/setup/js/action_otlp.test.cjs — may need update if integration-level tests are added
Generated by the Daily OTel Instrumentation Advisor workflow