OTel: propagate GenAI semantic attrs to gh-aw.agent.conclusion spans#33245
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
gh-aw.agent.conclusion spans
There was a problem hiding this comment.
Pull request overview
Propagates OpenTelemetry GenAI semantic attributes onto gh-aw.agent.conclusion spans so inference activity is recognized by GenAI dashboards even when the dedicated agent sub-span is missing.
Changes:
- Add
gen_ai.operation.name,gen_ai.workflow.name, andgen_ai.response.finish_reasons(when available) to the shared attribute set forjobName === "agent"conclusion spans. - Remove redundant GenAI attribute pushes from the dedicated agent span path so it inherits from the shared attribute list.
- Extend unit tests to assert GenAI attributes on the conclusion span (plus several workflow/action lockfile updates).
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/send_otlp_span.cjs | Adds GenAI semantic attrs to agent conclusion shared attributes; deduplicates agent-span-specific pushes. |
| actions/setup/js/send_otlp_span.test.cjs | Adds assertions that conclusion spans include GenAI semantic attrs (operation/workflow/finish reasons). |
| .github/workflows/stale-repo-identifier.lock.yml | Updates pinned github/stale-repos action SHA/version in the lock workflow. |
| .github/workflows/smoke-otel-backends.lock.yml | Adjusts Datadog MCP header value interpolation to use DD_APP_KEY. |
| .github/workflows/daily-geo-optimizer.lock.yml | Updates pinned actions/checkout and actions/setup-python versions/SHAs for the geo audit job. |
| .github/aw/actions-lock.json | Adds lock entries for additional action versions referenced by the updated lock workflows. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 6/6 changed files
- Comments generated: 2
| expect(conclusionAttrs["gen_ai.operation.name"]).toBe("chat"); | ||
| expect(conclusionAttrs["gen_ai.workflow.name"]).toBe("otel-advisor"); | ||
| }); | ||
|
|
| - name: Run stale-repos | ||
| id: stale-repos | ||
| uses: github/stale-repos@56718cc2bafc8c0fc5fd66751e0b32a33ed36c22 # v9.0.11 (source v9.0.8) | ||
| uses: github/stale-repos@5f2e18fc5432823f96c1feb69327f665c2acab59 # v9.0.8 | ||
| env: |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /zoom-out based on the observability refactor + test coverage in this PR.
Key Themes
- Test coverage gap: The new
jobName === "agent"conditional lacks a negative test validating that non-agent jobs don't receive gen_ai.* attributes - Documentation context: Inline comments could better explain the production issue motivating this change and provide explicit line references for deduplication
- OTel compliance: Strong adherence to OpenTelemetry GenAI semantic conventions
Positive Highlights
- ✅ Excellent test discipline: All three existing tests extended to validate conclusion span attributes — both presence (when data available) and absence (when not)
- ✅ Clean deduplication: Removing redundant attribute pushes improves maintainability and establishes a single source of truth
- ✅ Spec-driven design: GenAI attributes follow OTel semantic conventions precisely (operation.name, workflow.name, finish_reasons)
- ✅ Edge case coverage: Tests validate both
stop_reasonpresent and absent scenarios
Verdict
The implementation is solid and the test coverage is thorough for the happy path. The suggestions above are minor improvements to guard against regressions and improve code archaeology for future maintainers.
Learn more about the skills applied:
/tdd— Test-driven development: red-green-refactor loop/zoom-out— Architectural consistency and broader context
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 3.3M
|
|
||
| const conclusionBody = JSON.parse(mockFetch.mock.calls[1][1].body); | ||
| const conclusionSpan = conclusionBody.resourceSpans[0].scopeSpans[0].spans[0]; | ||
| const conclusionKeys = conclusionSpan.attributes.map(a => a.key); |
There was a problem hiding this comment.
[/tdd] The new if (jobName === "agent") conditional at line 1750 lacks an explicit negative test. Consider adding a test case that validates gen_ai.* attributes are NOT added to non-agent conclusion spans:
it("does not add gen_ai attributes to non-agent conclusion spans", async () => {
fs.writeFileSync(logPath, "job_name=activation\n");
await sendJobConclusionSpan("gh-aw.agent.conclusion");
const conclusionBody = JSON.parse(mockFetch.mock.calls[0][1].body);
const conclusionSpan = conclusionBody.resourceSpans[0].scopeSpans[0].spans[0];
const keys = conclusionSpan.attributes.map(a => a.key);
expect(keys).not.toContain("gen_ai.operation.name");
expect(keys).not.toContain("gen_ai.workflow.name");
});This guards against accidental attribute leakage to non-LLM jobs.
| } | ||
| if (jobName === "agent") { | ||
| // Emit OTel GenAI semantic attributes on agent conclusion spans even when the | ||
| // dedicated agent sub-span is missing, so LLM activity remains discoverable. |
There was a problem hiding this comment.
[/zoom-out] The comment mentions "when the dedicated agent sub-span is missing" — is this a known production issue? Consider adding a reference to a GitHub issue, runbook, or ADR explaining:
- Why the dedicated agent span might be absent in production traces
- What monitoring alerts or dashboards motivated this change
- Whether this is a temporary workaround or permanent architecture
This helps future maintainers understand the context when they encounter this code during debugging.
| agentAttributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [runtimeMetrics.stopReason])); | ||
| } | ||
| // gen_ai.operation.name / gen_ai.workflow.name / gen_ai.response.finish_reasons | ||
| // are already included via the shared attributes list above. |
There was a problem hiding this comment.
[/zoom-out] This comment could be more specific. Instead of "shared attributes list above", consider:
// gen_ai.operation.name, gen_ai.workflow.name, and gen_ai.response.finish_reasons
// are already included via the jobName === "agent" block (lines 1750-1757).Explicit line references make it easier for future maintainers to trace the deduplication during code review or debugging.
🧪 Test Quality Sentinel ReportTest Quality Score: 80/100✅ Excellent test quality
Test Classification Details
Strengths✅ All tests verify behavioral contracts — Every test addition validates observable OTel span attributes sent to endpoints, not internal implementation details. ✅ Both positive and negative cases covered — Tests verify attributes are present when expected AND absent when they shouldn't be (e.g., finish_reasons only when stop_reason is available). ✅ Follows existing test patterns — Assertions parse the actual OTLP payload from mock fetch calls, ensuring end-to-end correctness. ✅ Proportional test growth — 17 lines of test code validate 3 different conditional branches in production code that actually decreased in size (refactoring that moved attributes to shared list). Suggestions for ImprovementWhile the test quality is excellent, here are opportunities to strengthen edge case coverage: Missing edge case: The production code has this conditional: if (jobName === "agent") {
attributes.push(buildAttr("gen_ai.operation.name", "chat"));
// ...
}Consider adding a test that verifies GenAI attributes are NOT added to conclusion spans when Example test: it("does not add GenAI attributes to non-agent job conclusion spans", async () => {
// Test with jobName="setup" or another non-agent job
// Assert gen_ai.operation.name is NOT present on conclusion span
});This would bring edge case coverage from 33% to 50%, improving the overall score to 85/100. Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
|
gh-awcurrently emits GenAI semantic attributes only on the dedicatedgh-aw.<job>.agentspan, but production traces frequently surface onlygh-aw.agent.conclusion. As a result, inference spans are not consistently recognized by OTel GenAI dashboards.What changed
jobName === "agent"):gen_ai.operation.name = "chat"gen_ai.workflow.name(when available)gen_ai.response.finish_reasons(whenstop_reasonis available)gh-aw.agent.conclusioncarries required/expected GenAI semantics even when the dedicated agent span is absent.Deduplication cleanup
Test updates
send_otlp_span.test.cjsassertions to validate GenAI attrs on the conclusion span:stop_reason