OTel: propagate GenAI semantic attrs to `gh-aw.agent.conclusion` spans by Copilot · Pull Request #33245 · github/gh-aw

Copilot · 2026-05-19T09:01:42Z

gh-aw currently emits GenAI semantic attributes only on the dedicated gh-aw.<job>.agent span, but production traces frequently surface only gh-aw.agent.conclusion. As a result, inference spans are not consistently recognized by OTel GenAI dashboards.

What changed
- Added GenAI inference attributes to the shared attribute set used by agent conclusion spans (jobName === "agent"):
  - gen_ai.operation.name = "chat"
  - gen_ai.workflow.name (when available)
  - gen_ai.response.finish_reasons (when stop_reason is available)
- This ensures gh-aw.agent.conclusion carries required/expected GenAI semantics even when the dedicated agent span is absent.
Deduplication cleanup
- Removed redundant per-agent-span pushes for the same GenAI attrs in the dedicated span block.
- Dedicated agent span now inherits these fields from the shared attribute list, keeping behavior aligned across both span types.
Test updates
- Extended send_otlp_span.test.cjs assertions to validate GenAI attrs on the conclusion span:
  - present for operation/workflow when available
  - present/absent for finish reasons based on runtime stop_reason

if (jobName === "agent") {
  attributes.push(buildAttr("gen_ai.operation.name", "chat"));
  if (workflowName) attributes.push(buildAttr("gen_ai.workflow.name", workflowName));
  if (runtimeMetrics.stopReason) {
    attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [runtimeMetrics.stopReason]));
  }
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Propagates OpenTelemetry GenAI semantic attributes onto gh-aw.agent.conclusion spans so inference activity is recognized by GenAI dashboards even when the dedicated agent sub-span is missing.

Changes:

Add gen_ai.operation.name, gen_ai.workflow.name, and gen_ai.response.finish_reasons (when available) to the shared attribute set for jobName === "agent" conclusion spans.
Remove redundant GenAI attribute pushes from the dedicated agent span path so it inherits from the shared attribute list.
Extend unit tests to assert GenAI attributes on the conclusion span (plus several workflow/action lockfile updates).

Show a summary per file

File	Description
actions/setup/js/send_otlp_span.cjs	Adds GenAI semantic attrs to agent conclusion shared attributes; deduplicates agent-span-specific pushes.
actions/setup/js/send_otlp_span.test.cjs	Adds assertions that conclusion spans include GenAI semantic attrs (operation/workflow/finish reasons).
.github/workflows/stale-repo-identifier.lock.yml	Updates pinned `github/stale-repos` action SHA/version in the lock workflow.
.github/workflows/smoke-otel-backends.lock.yml	Adjusts Datadog MCP header value interpolation to use `DD_APP_KEY`.
.github/workflows/daily-geo-optimizer.lock.yml	Updates pinned `actions/checkout` and `actions/setup-python` versions/SHAs for the geo audit job.
.github/aw/actions-lock.json	Adds lock entries for additional action versions referenced by the updated lock workflows.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 6/6 changed files
Comments generated: 2

+    expect(conclusionAttrs["gen_ai.operation.name"]).toBe("chat");
+    expect(conclusionAttrs["gen_ai.workflow.name"]).toBe("otel-advisor");
  });



      - name: Run stale-repos
        id: stale-repos
-        uses: github/stale-repos@56718cc2bafc8c0fc5fd66751e0b32a33ed36c22 # v9.0.11 (source v9.0.8)
+        uses: github/stale-repos@5f2e18fc5432823f96c1feb69327f665c2acab59 # v9.0.8
        env:


github-actions

Skills-Based Review 🧠

Applied /tdd and /zoom-out based on the observability refactor + test coverage in this PR.

Key Themes

Test coverage gap: The new jobName === "agent" conditional lacks a negative test validating that non-agent jobs don't receive gen_ai.* attributes
Documentation context: Inline comments could better explain the production issue motivating this change and provide explicit line references for deduplication
OTel compliance: Strong adherence to OpenTelemetry GenAI semantic conventions

Positive Highlights

✅ Excellent test discipline: All three existing tests extended to validate conclusion span attributes — both presence (when data available) and absence (when not)
✅ Clean deduplication: Removing redundant attribute pushes improves maintainability and establishes a single source of truth
✅ Spec-driven design: GenAI attributes follow OTel semantic conventions precisely (operation.name, workflow.name, finish_reasons)
✅ Edge case coverage: Tests validate both stop_reason present and absent scenarios

Verdict

The implementation is solid and the test coverage is thorough for the happy path. The suggestions above are minor improvements to guard against regressions and improve code archaeology for future maintainers.

Learn more about the skills applied:

/tdd — Test-driven development: red-green-refactor loop
/zoom-out — Architectural consistency and broader context

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 3.3M

github-actions · 2026-05-19T09:29:28Z

+
+    const conclusionBody = JSON.parse(mockFetch.mock.calls[1][1].body);
+    const conclusionSpan = conclusionBody.resourceSpans[0].scopeSpans[0].spans[0];
+    const conclusionKeys = conclusionSpan.attributes.map(a => a.key);


[/tdd] The new if (jobName === "agent") conditional at line 1750 lacks an explicit negative test. Consider adding a test case that validates gen_ai.* attributes are NOT added to non-agent conclusion spans:

it("does not add gen_ai attributes to non-agent conclusion spans", async () => { fs.writeFileSync(logPath, "job_name=activation\n"); await sendJobConclusionSpan("gh-aw.agent.conclusion"); const conclusionBody = JSON.parse(mockFetch.mock.calls[0][1].body); const conclusionSpan = conclusionBody.resourceSpans[0].scopeSpans[0].spans[0]; const keys = conclusionSpan.attributes.map(a => a.key); expect(keys).not.toContain("gen_ai.operation.name"); expect(keys).not.toContain("gen_ai.workflow.name"); });

This guards against accidental attribute leakage to non-LLM jobs.

github-actions · 2026-05-19T09:29:28Z

  }
+  if (jobName === "agent") {
+    // Emit OTel GenAI semantic attributes on agent conclusion spans even when the
+    // dedicated agent sub-span is missing, so LLM activity remains discoverable.


[/zoom-out] The comment mentions "when the dedicated agent sub-span is missing" — is this a known production issue? Consider adding a reference to a GitHub issue, runbook, or ADR explaining:

Why the dedicated agent span might be absent in production traces

What monitoring alerts or dashboards motivated this change

Whether this is a temporary workaround or permanent architecture

This helps future maintainers understand the context when they encounter this code during debugging.

github-actions · 2026-05-19T09:29:28Z

-      agentAttributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [runtimeMetrics.stopReason]));
-    }
+    // gen_ai.operation.name / gen_ai.workflow.name / gen_ai.response.finish_reasons
+    // are already included via the shared attributes list above.


[/zoom-out] This comment could be more specific. Instead of "shared attributes list above", consider:

// gen_ai.operation.name, gen_ai.workflow.name, and gen_ai.response.finish_reasons // are already included via the jobName === "agent" block (lines 1750-1757).

Explicit line references make it easier for future maintainers to trace the deduplication during code review or debugging.

github-actions · 2026-05-19T09:31:38Z

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

✅ Excellent test quality

Metric	Value
New/modified tests analyzed	3
✅ Design tests (behavioral contracts)	3 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	1 (33%)
Duplicate test clusters	0
Test inflation detected	No
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
GenAI attributes on conclusion span (operation & workflow)	`send_otlp_span.test.cjs:2823`	✅ Design	None
GenAI finish_reasons present when stop_reason available	`send_otlp_span.test.cjs:2979`	✅ Design	None
GenAI finish_reasons absent when stop_reason missing	`send_otlp_span.test.cjs:3014`	✅ Design	None

Strengths

✅ All tests verify behavioral contracts — Every test addition validates observable OTel span attributes sent to endpoints, not internal implementation details.

✅ Both positive and negative cases covered — Tests verify attributes are present when expected AND absent when they shouldn't be (e.g., finish_reasons only when stop_reason is available).

✅ Follows existing test patterns — Assertions parse the actual OTLP payload from mock fetch calls, ensuring end-to-end correctness.

✅ Proportional test growth — 17 lines of test code validate 3 different conditional branches in production code that actually decreased in size (refactoring that moved attributes to shared list).

Suggestions for Improvement

While the test quality is excellent, here are opportunities to strengthen edge case coverage:

Missing edge case: jobName !== "agent" path

The production code has this conditional:

if (jobName === "agent") {
  attributes.push(buildAttr("gen_ai.operation.name", "chat"));
  // ...
}

Consider adding a test that verifies GenAI attributes are NOT added to conclusion spans when jobName is something other than "agent" (e.g., "setup", "teardown"). This would validate the guard condition and catch regressions if someone accidentally removes the jobName === "agent" check.

Example test:

it("does not add GenAI attributes to non-agent job conclusion spans", async () => {
  // Test with jobName="setup" or another non-agent job
  // Assert gen_ai.operation.name is NOT present on conclusion span
});

This would bring edge case coverage from 33% to 50%, improving the overall score to 85/100.

Language Support

Tests analyzed:

🟨 JavaScript (*.test.cjs): 3 test additions (vitest)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All test additions verify behavioral contracts and observable outputs.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 7.4M · ◷

github-actions

✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All tests verify behavioral contracts with observable outputs.

Initial plan

1ee6c59

Copilot AI assigned Copilot and pelikhan May 19, 2026

Copilot started work on behalf of pelikhan May 19, 2026 09:01 View session

Copilot AI linked an issue May 19, 2026 that may be closed by this pull request

[grafana-otel-advisor] OTel improvement: surface gen_ai.operation.name and gen_ai.response.finish_reasons on agent.conclusion spans (dedicated agent sp [Content truncated due to length] #33223

Closed

8 tasks

Copilot AI and others added 2 commits May 19, 2026 09:08

Plan minimal OTel attribute propagation

cb0a15a

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Add GenAI attrs to agent conclusion spans

46aaf00

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Improve OTel instrumentation to surface GenAI attributes on spans~~ OTel: propagate GenAI semantic attrs to gh-aw.agent.conclusion spans May 19, 2026

Copilot AI requested a review from pelikhan May 19, 2026 09:17

Copilot finished work on behalf of pelikhan May 19, 2026 09:17

pelikhan marked this pull request as ready for review May 19, 2026 09:18

Copilot AI review requested due to automatic review settings May 19, 2026 09:18

Copilot started reviewing on behalf of pelikhan May 19, 2026 09:20 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 19, 2026

[aw] No-Op Runs #32279

Open

github-actions Bot reviewed May 19, 2026

View reviewed changes

github-actions Bot approved these changes May 19, 2026

View reviewed changes

pelikhan merged commit 176ca18 into main May 19, 2026
61 of 67 checks passed

pelikhan deleted the copilot/grafana-otel-advisor-otel-improvement-surface-gen branch May 19, 2026 09:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OTel: propagate GenAI semantic attrs to `gh-aw.agent.conclusion` spans#33245

OTel: propagate GenAI semantic attrs to `gh-aw.agent.conclusion` spans#33245
pelikhan merged 3 commits into
mainfrom
copilot/grafana-otel-advisor-otel-improvement-surface-gen

Copilot AI commented May 19, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 19, 2026

Uh oh!

github-actions Bot May 19, 2026

Uh oh!

github-actions Bot May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Verdict

Uh oh!

github-actions Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 19, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

Test Classification Details

Strengths

Suggestions for Improvement

Language Support

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented May 19, 2026 •

edited

Loading