
Add dedicated gh-aw.<job>.agent OTLP span to isolate AI execution latency #26504

Merged
pelikhan merged 3 commits into main from copilot/otel-advisor-add-gh-aw-job-agent-span
Apr 15, 2026

Copilot AI commented Apr 15, 2026

Current OTLP instrumentation only exposes setup and conclusion spans, where conclusion includes AI execution plus post-agent work (safe-outputs/cleanup). This makes AI latency indistinguishable from downstream step latency.

  • Agent-phase span in sendJobConclusionSpan

    • Added a new child span: gh-aw.<jobName>.agent (or gh-aw.job.agent when unnamed).
    • Span bounds:
      • startMs: existing job start boundary (options.startMs / setup-end timestamp)
      • endMs: /tmp/gh-aw/agent_output.json mtimeMs (agent completion proxy)
    • Span is emitted only when both timestamps are valid and ordered.
  • Export behavior

    • New agent span follows existing conclusion span behavior:
      • mirrored to OTLP JSONL artifact
      • exported to OTLP endpoint when configured (skipJSONL: true on network export)
  • Targeted test coverage

    • Added tests for:
      • agent span emission when startMs + agent_output.json mtime are available
      • agent span omission when file mtime is unavailable
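const fs = require("fs");

// The agent phase starts at the existing job start boundary (setup end).
const agentStartMs = options.startMs;
// The output file's mtime serves as a proxy for agent completion.
let agentEndMs = null;
try {
  agentEndMs = fs.statSync("/tmp/gh-aw/agent_output.json").mtimeMs;
} catch {
  // File missing or unreadable: agentEndMs stays null, so no agent span is emitted.
}

// Emit only when both timestamps are valid and correctly ordered.
if (typeof agentStartMs === "number" && agentStartMs > 0 &&
    typeof agentEndMs === "number" && agentEndMs > agentStartMs) {
  // emit gh-aw.<job>.agent span
}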
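
The bounds check described above can also be sketched as a standalone function, which makes the "valid and ordered" rule easy to exercise in isolation. This is a minimal sketch, not the PR's code: `resolveAgentSpanBounds` and its injectable `statSync` parameter are hypothetical names introduced here for illustration.

```javascript
const fs = require("node:fs");

// Hypothetical helper (not part of the PR): returns { startMs, endMs } when an
// agent span should be emitted, or null when it should be skipped.
function resolveAgentSpanBounds(startMs, outputPath, statSync = fs.statSync) {
  let endMs = null;
  try {
    // mtime of the agent output file acts as the agent-completion proxy.
    endMs = statSync(outputPath).mtimeMs;
  } catch {
    // Missing or unreadable file: no completion timestamp is available.
    return null;
  }
  const valid =
    typeof startMs === "number" && startMs > 0 &&
    typeof endMs === "number" && endMs > startMs;
  return valid ? { startMs, endMs } : null;
}
```

Passing the stat function as a parameter is only a convenience for testing; the PR itself mocks `fs.statSync` directly in its tests.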

Copilot AI changed the title from "[WIP] Add dedicated gh-aw.job.agent span to measure AI execution latency" to "Add dedicated gh-aw.<job>.agent OTLP span to isolate AI execution latency" Apr 15, 2026
Copilot AI requested a review from pelikhan April 15, 2026 22:45
@pelikhan pelikhan marked this pull request as ready for review April 15, 2026 23:03
Copilot AI review requested due to automatic review settings April 15, 2026 23:03

Copilot AI left a comment

Pull request overview

Adds a dedicated OTLP span for the agent execution phase so AI runtime latency can be distinguished from post-agent conclusion work.

Changes:

  • Emit an additional gh-aw.<jobName>.agent (or gh-aw.job.agent) span inside sendJobConclusionSpan, bounded by options.startMs and /tmp/gh-aw/agent_output.json mtime.
  • Mirror the new span to the OTLP JSONL artifact and optionally export it to the configured OTLP endpoint.
  • Add unit tests covering agent-span emission and omission when agent_output.json mtime is unavailable.
Summary per file:

| File | Description |
| --- | --- |
| actions/setup/js/send_otlp_span.cjs | Adds agent-phase span creation/export alongside the existing conclusion span. |
| actions/setup/js/send_otlp_span.test.cjs | Adds tests validating when the agent span is (and isn’t) emitted. |

Copilot's findings

Comments suppressed due to low confidence (1)

actions/setup/js/send_otlp_span.cjs:856

  • The new agent span is described as a child span, but it currently uses the same parentSpanId as the conclusion span (the setup span). That makes the agent and conclusion spans siblings in the trace rather than nesting the agent phase within the conclusion span. Consider generating the conclusion spanId once, using it as parentSpanId for the agent span, and then reusing that same spanId for the conclusion payload (with the setup span as the conclusion's parent).
    const agentPayload = buildOTLPPayload({
      traceId,
      spanId: generateSpanId(),
      ...(parentSpanId ? { parentSpanId } : {}),
      spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
      startMs: agentStartMs,
      endMs: agentEndMs,
      serviceName,
      scopeVersion: version,
      attributes,
      resourceAttributes,
      statusCode,
      statusMessage,
      events: spanEvents,
    });
    appendToOTLPJSONL(agentPayload);
    if (endpoint) {
      await sendOTLPSpan(endpoint, agentPayload, { skipJSONL: true });
    }
  }

  const payload = buildOTLPPayload({
    traceId,
    spanId: generateSpanId(),
    ...(parentSpanId ? { parentSpanId } : {}),
    spanName,
  • Files reviewed: 2/2 changed files
  • Comments generated: 2
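
The nesting proposed in the low-confidence comment above (agent span as a child of the conclusion span, conclusion span as a child of the setup span) could be sketched roughly as follows. This is an illustration under assumptions: `planSpanIds` is a hypothetical helper, and `generateSpanId` here is a stand-in that produces 8 random bytes as hex, not the PR's own implementation.

```javascript
const { randomBytes } = require("node:crypto");

// Stand-in for the PR's generateSpanId(): OTLP span IDs are 8 bytes,
// rendered here as 16 hex characters.
function generateSpanId() {
  return randomBytes(8).toString("hex");
}

// Hypothetical helper: generate the conclusion span ID once, then reuse it
// as the agent span's parent so the agent phase nests under the conclusion.
function planSpanIds(setupSpanId) {
  const conclusionSpanId = generateSpanId();
  return {
    conclusion: {
      spanId: conclusionSpanId,
      // The setup span remains the conclusion span's parent when known.
      ...(setupSpanId ? { parentSpanId: setupSpanId } : {}),
    },
    agent: {
      spanId: generateSpanId(),
      parentSpanId: conclusionSpanId,
    },
  };
}
```

With this arrangement the two spans are no longer siblings under the setup span; whether that nesting is actually desired is the open question the comment raises.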

Comment thread actions/setup/js/send_otlp_span.cjs Outdated
resourceAttributes,
statusCode,
statusMessage,
events: spanEvents,

Copilot AI Apr 15, 2026

spanEvents timestamps are built using nowMs() earlier in the function, but this new agent span can end at agentEndMs (mtime) which may be earlier than nowMs(). That can produce OTLP spans where event timestamps fall after the span end time. Consider either recomputing events for the agent span with timestamps clamped to [agentStartMs, agentEndMs] (e.g., use agentEndMs for event time), or omit events from the agent span to avoid invalid timing.

Suggested change
- events: spanEvents,
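
The alternative mentioned in this comment, clamping event timestamps into the span's own window instead of dropping the events, might look like the sketch below. The helper name `clampEventsToSpan` and the event shape (`{ name, timeMs }`) are assumptions made for illustration; the PR's actual event structure may differ.

```javascript
// Hypothetical helper: returns copies of the events with each timestamp
// clamped into [startMs, endMs], so no event falls outside the span.
function clampEventsToSpan(events, startMs, endMs) {
  return events.map(event => ({
    ...event,
    timeMs: Math.min(Math.max(event.timeMs, startMs), endMs),
  }));
}

// Example: an event stamped after the agent finished is pulled back to endMs.
const clamped = clampEventsToSpan(
  [{ name: "conclusion", timeMs: 5000 }],
  1000,
  2000
);
```

Clamping keeps the events attached to the agent span at the cost of altering their recorded time; omitting them, as the suggested change does, avoids that distortion.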

const conclusionBody = JSON.parse(mockFetch.mock.calls[1][1].body);
const conclusionSpan = conclusionBody.resourceSpans[0].scopeSpans[0].spans[0];
expect(conclusionSpan.name).toBe("gh-aw.agent.conclusion");

Copilot AI Apr 15, 2026

The new tests validate span names/timestamps, but they don't assert the intended parent/child relationship between the agent span and the conclusion span. Adding an assertion around parentSpanId/spanId would prevent regressions (e.g., agent span should be a child of the conclusion span if that's the desired nesting).

Suggested change
- expect(conclusionSpan.name).toBe("gh-aw.agent.conclusion");
+ expect(conclusionSpan.name).toBe("gh-aw.agent.conclusion");
+ expect(agentSpan.traceId).toBe(conclusionSpan.traceId);
+ expect(agentSpan.parentSpanId).toBe(conclusionSpan.spanId);

@github-actions github-actions bot mentioned this pull request Apr 15, 2026
@github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 85/100

Excellent test quality

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 2 |
| ✅ Design tests (behavioral contracts) | 2 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (50%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (ratio: 1.48:1) |
| 🚨 Coding-guideline violations | None |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| emits a dedicated gh-aw.<job>.agent span when startMs and agent_output mtime are available | actions/setup/js/send_otlp_span.test.cjs | ✅ Design | None. Verifies span name, start/end timestamps, and that both agent + conclusion spans are emitted |
| does not emit a dedicated agent span when agent_output mtime is unavailable | actions/setup/js/send_otlp_span.test.cjs | ✅ Design | None. Covers the ENOENT error path; asserts graceful fallback to conclusion span only |

Flagged Tests — Requires Review

No tests were flagged. Both tests are behavioral contracts with observable-output assertions.


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 2 tests (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). Both tests verify the observable OTLP output (span names, timestamps, call counts) and mocking is limited to legitimate external I/O targets (fs.statSync, fetch).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.


References: §24482849214

🧪 Test quality analysis by Test Quality Sentinel · ● 519.7K ·

@github-actions github-actions bot left a comment

✅ Test Quality Sentinel: 85/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). Both new tests verify behavioral contracts (OTLP span names, timestamps, call counts) and properly exercise the happy path and ENOENT error path.

@pelikhan

@copilot, review all comments.

@pelikhan pelikhan merged commit 9a627bb into main Apr 15, 2026
@pelikhan pelikhan deleted the copilot/otel-advisor-add-gh-aw-job-agent-span branch April 15, 2026 23:21
Copilot stopped work on behalf of pelikhan due to an error April 15, 2026 23:21
Development

Successfully merging this pull request may close these issues.

[otel-advisor] add dedicated gh-aw.job.agent span to measure AI execution latency

3 participants