Make OTEL service names workflow-specific in compiled OTLP workflows #32488

Merged
pelikhan merged 9 commits into main from copilot/fix-workflow-specific-otel-service-names
May 16, 2026
Conversation

Copilot AI commented May 15, 2026

observability.otlp currently emits a single OTEL_SERVICE_NAME (gh-aw) across all compiled workflows, which collapses traces into one logical service and makes workflow-level filtering noisy.
This change scopes service naming to each workflow while preserving backward-compatible fallback behavior.

  • Compiler behavior

    • injectOTLPConfig now sets:
      • OTEL_SERVICE_NAME: gh-aw.<sanitized-workflow-id> when WorkflowID is available
      • OTEL_SERVICE_NAME: gh-aw when it is not
    • Added otelServiceName(workflowData *WorkflowData) to centralize naming logic.
  • Sanitization contract

    • Service suffix is derived via existing workflow-name sanitization (lowercase + separator/special-char normalization), producing stable backend-friendly names.
  • Targeted coverage updates

    • Updated OTLP injection expectations from fixed gh-aw to workflow-scoped names where applicable.
    • Added direct unit coverage for:
      • workflow ID present (including sanitization path)
      • empty workflow ID fallback
      • nil workflow data fallback
  • Docs

    • Updated OpenTelemetry reference to document gh-aw.<sanitized-workflow-id> and include a concrete transformation example.
func otelServiceName(workflowData *WorkflowData) string {
	const defaultServiceName = "gh-aw"
	if workflowData == nil {
		return defaultServiceName
	}
	sanitizedWorkflowID := SanitizeWorkflowName(workflowData.WorkflowID)
	if sanitizedWorkflowID == "" {
		return defaultServiceName
	}
	return defaultServiceName + "." + sanitizedWorkflowID
}
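
To make the three paths above (workflow ID present, empty workflow ID, nil workflow data) concrete, here is a minimal runnable sketch. It is not the compiler's code: `WorkflowData` is reduced to one field, and `sanitizeName` is a hypothetical stand-in for `SanitizeWorkflowName` that assumes "lowercase, collapse every run of non-alphanumeric characters into one hyphen, trim hyphens at the ends". The real sanitizer may treat some characters differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// WorkflowData is a reduced stand-in for the compiler's type (assumption).
type WorkflowData struct {
	WorkflowID string
}

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// sanitizeName is a hypothetical stand-in for SanitizeWorkflowName:
// lowercase, replace runs of non-alphanumerics with "-", trim "-" at the ends.
func sanitizeName(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	s = nonAlnum.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

// otelServiceName mirrors the shape of the function in this description.
func otelServiceName(wd *WorkflowData) string {
	const defaultServiceName = "gh-aw"
	if wd == nil {
		return defaultServiceName
	}
	suffix := sanitizeName(wd.WorkflowID)
	if suffix == "" {
		return defaultServiceName
	}
	return defaultServiceName + "." + suffix
}

func main() {
	fmt.Println(otelServiceName(&WorkflowData{WorkflowID: "Repo Triage/Weekly"})) // gh-aw.repo-triage-weekly
	fmt.Println(otelServiceName(&WorkflowData{}))                                  // gh-aw
	fmt.Println(otelServiceName(nil))                                              // gh-aw
}
```

Keeping the `gh-aw` prefix on every name means existing backend filters on that prefix should keep matching, while the suffix allows per-workflow filtering.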

Copilot AI linked an issue May 15, 2026 that may be closed by this pull request
Copilot AI and others added 3 commits May 15, 2026 22:30
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix workflow-specific OTEL service names for observability Make OTEL service names workflow-specific in compiled OTLP workflows May 15, 2026
Copilot AI requested a review from pelikhan May 15, 2026 22:35
@pelikhan

@copilot Use the workflow name as the workflow call ID, which also handles workflow calls. There is a more general name you can use.

Copilot AI and others added 2 commits May 15, 2026 22:48
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented May 15, 2026

@copilot Use the workflow name as the workflow call ID, which also handles workflow calls. There is a more general name you can use.

Updated in 3fdd227. OTEL service naming now derives from the compiled workflow name (workflow-call aware), with WorkflowID fallback when the name is unavailable.

@pelikhan pelikhan marked this pull request as ready for review May 15, 2026 22:52
Copilot AI review requested due to automatic review settings May 15, 2026 22:52
Copilot AI left a comment

Pull request overview

This PR updates compiled OTLP workflows so OTEL_SERVICE_NAME can be workflow-specific instead of always gh-aw.

Changes:

  • Adds otelServiceName and uses it during OTLP env injection.
  • Updates/extends OTLP unit tests for service-name behavior.
  • Updates OpenTelemetry docs for the new service-name format.
| File | Description |
| --- | --- |
| pkg/workflow/observability_otlp.go | Computes and injects workflow-scoped OTEL service names. |
| pkg/workflow/observability_otlp_test.go | Updates OTLP injection expectations and adds helper tests. |
| docs/src/content/docs/reference/open-telemetry.md | Documents the new OTEL_SERVICE_NAME behavior. |

Copilot's findings

Comments suppressed due to low confidence (1)

pkg/workflow/observability_otlp_test.go:666

  • This test locks in display-name precedence, but the workflow-scoped service name should be based on WorkflowID when it is present. Please cover the normal compiler case where both Name and WorkflowID are populated and assert the WorkflowID-derived suffix so duplicate display names across files cannot regress.
	t.Run("uses workflow-specific service name when workflow name is present", func(t *testing.T) {
		got := otelServiceName(&WorkflowData{Name: "Repo Triage/Weekly"})
		assert.Equal(t, "gh-aw.repo-triage-weekly", got)
  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment thread pkg/workflow/observability_otlp.go Outdated
Comment on lines +456 to +460
// Prefer the compiled workflow name because it also reflects workflow_call
// invocations; fall back to WorkflowID for compatibility in edge/test cases.
workflowNameOrID := strings.TrimSpace(workflowData.Name)
if workflowNameOrID == "" {
	workflowNameOrID = workflowData.WorkflowID
| --- | --- |
| `OTEL_EXPORTER_OTLP_HEADERS` | Comma-separated `key=value` headers for the first endpoint (when headers are configured). |
-| `OTEL_SERVICE_NAME` | Always `gh-aw`. |
+| `OTEL_SERVICE_NAME` | `gh-aw.<sanitized-workflow-name>` when the workflow name is available (for example, `Repo Triage/Weekly` → `gh-aw.repo-triage-weekly`); otherwise `gh-aw`. |
Comment thread pkg/workflow/observability_otlp_test.go Outdated
Comment on lines 361 to 365
t.Run("OTEL_SERVICE_NAME includes sanitized workflow name when available", func(t *testing.T) {
	c := newCompiler()
	wd := &WorkflowData{
		Name: "Repo Triage/Weekly",
		ParsedFrontmatter: &FrontmatterConfig{
@github-actions github-actions Bot mentioned this pull request May 15, 2026

@github-actions github-actions Bot left a comment

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 5.5M

got := otelServiceName(&WorkflowData{Name: "Repo Triage/Weekly"})
assert.Equal(t, "gh-aw.repo-triage-weekly", got)
})

[/tdd] The test value WorkflowID: "Repo Triage/Weekly" doesn't reflect what WorkflowID actually looks like at runtime — GetWorkflowIDFromPath always returns a plain basename without extension (e.g. "repo-triage"), never a slash-separated display name.

Using a realistic value would make this test a better specification and avoid the false implication that WorkflowID can contain slashes or spaces:

t.Run("falls back to workflow ID when workflow name is empty", func(t *testing.T) {
    got := otelServiceName(&WorkflowData{WorkflowID: "repo-triage"})
    assert.Equal(t, "gh-aw.repo-triage", got)
})

The current value still exercises SanitizeWorkflowName, but it conflates the WorkflowID and Name concepts in a way that could mislead future readers.
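
To illustrate why a realistic `WorkflowID` never contains slashes or spaces, the sketch below derives an ID the way this comment describes: basename of the path with the extension removed. `workflowIDFromPath` is a hypothetical stand-in, not the actual `GetWorkflowIDFromPath`, and the example path is assumed for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// workflowIDFromPath is a hypothetical stand-in for GetWorkflowIDFromPath:
// it keeps only the file's basename and drops the extension.
func workflowIDFromPath(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	// The path below is an assumed example, not taken from the repository.
	id := workflowIDFromPath(".github/workflows/repo-triage.md")
	fmt.Println(id)            // repo-triage
	fmt.Println("gh-aw." + id) // gh-aw.repo-triage
}
```

Because directory separators are discarded by `filepath.Base`, a fixture such as `"repo-triage"` matches what this derivation can actually produce.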

Comment thread pkg/workflow/observability_otlp.go Outdated
// invocations; fall back to WorkflowID for compatibility in edge/test cases.
workflowNameOrID := strings.TrimSpace(workflowData.Name)
if workflowNameOrID == "" {
	workflowNameOrID = workflowData.WorkflowID
[/tdd] strings.TrimSpace is applied to workflowData.Name on the previous line but not to workflowData.WorkflowID here. In practice WorkflowID is set from filepath.Base(path) so it won't have surrounding whitespace, but the asymmetry is a latent bug if that invariant ever changes. A quick fix keeps the function consistent:

workflowNameOrID = strings.TrimSpace(workflowData.WorkflowID)
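
One way to make the trimming symmetric by construction, rather than by remembering to call `strings.TrimSpace` on each branch, is a small helper that trims every candidate before testing it. This is only a sketch; `firstNonBlank` is a hypothetical name and is not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonBlank (hypothetical helper) returns the first candidate that is
// non-empty after trimming surrounding whitespace, or "" if none qualifies.
func firstNonBlank(candidates ...string) string {
	for _, c := range candidates {
		if trimmed := strings.TrimSpace(c); trimmed != "" {
			return trimmed
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", firstNonBlank("  ", " repo-triage "))               // "repo-triage"
	fmt.Printf("%q\n", firstNonBlank("Repo Triage/Weekly", "repo-triage")) // "Repo Triage/Weekly"
	fmt.Printf("%q\n", firstNonBlank("", "\t"))                            // ""
}
```

With a helper like this, the name-then-ID fallback could be written as `firstNonBlank(workflowData.Name, workflowData.WorkflowID)`, so both fields get identical treatment.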

@github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

Excellent test quality

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 3 |
| ✅ Design tests (behavioral contracts) | 3 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 2 (67%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| TestInjectOTLPConfig: subtest "OTEL_SERVICE_NAME includes sanitized workflow name when available" | observability_otlp_test.go:358 | ✅ Design | None |
| TestOTELServiceName: subtest "uses workflow-specific service name when workflow name is present" (updated fixture) | observability_otlp_test.go:663 | ✅ Design | Minor: assert.Equal lacks descriptive message |
| TestOTELServiceName: subtest "falls back to workflow ID when workflow name is empty" (new) | observability_otlp_test.go:668 | ✅ Design | Minor: assert.Equal lacks descriptive message |

Analysis

All three changed/added tests are behavioral contract tests — they assert on the observable output of otelServiceName() and injectOTLPConfig(), verifying that OTEL_SERVICE_NAME is derived from the workflow Name field (primary) with graceful fallback to WorkflowID when Name is empty, and finally to a bare "gh-aw" prefix when both are empty.

The new falls back to workflow ID when workflow name is empty subtest is the highest-value addition: it explicitly guards the fallback contract, ensuring a behavioral regression would be caught if the priority logic changes.

Minor Notes (non-blocking)

Two assert.Equal calls in TestOTELServiceName lack descriptive message arguments (e.g., assert.Equal(t, "gh-aw.repo-triage-weekly", got) without a trailing context string). The codebase guideline requires descriptive messages on all assertions. These follow a pre-existing pattern in the surrounding context but are worth cleaning up.


Score Breakdown

| Component | Points | Notes |
| --- | --- | --- |
| Behavioral Coverage (40 pts) | 40/40 | 3/3 tests are design tests |
| Error/Edge Case Coverage (30 pts) | 20/30 | 2/3 tests cover fallback/edge cases |
| Low Duplication (20 pts) | 20/20 | No duplicate clusters |
| Proportional Growth (10 pts) | 10/10 | +25 test lines vs +29 prod lines (ratio 0.86) |

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 3 subtests — unit (//go:build !integration) ✅
  • 🟨 JavaScript: no test files changed

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The test changes correctly align behavioral coverage with the new workflow-name-based OTEL service naming contract.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct

Goal: Shift toward tests that describe the system's behavioral contract.

References: §25945201645

🧪 Test quality analysis by Test Quality Sentinel · ● 8.7M ·


@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 analyzed tests verify behavioral contracts directly, including the new fallback coverage for the WorkflowID-based service name.

@github-actions

@copilot review all comments

Unresolved review feedback remains on OTEL service naming precedence and the workflow-id fixture. Please update the implementation to prefer WorkflowID when available, adjust the tests/docs to use a realistic basename WorkflowID, and then request another review.

Generated by 👨‍🍳 PR Sous Chef ·

@pelikhan

@copilot review all comments

Copilot AI and others added 2 commits May 15, 2026 23:33
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented May 15, 2026

@copilot review all comments

Addressed all actionable review feedback in commits fdbf563 and 234d1d4:

  • OTEL service naming now prefers WorkflowID (with fallback to Name, then gh-aw)
  • Tests updated to validate ID-first behavior and precedence
  • OpenTelemetry docs updated to match the WorkflowID-first contract
  • Added missing assertion messages requested by validation feedback
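
The ID-first precedence described in this comment (WorkflowID, then Name, then bare `gh-aw`) could look roughly like the sketch below. It is an illustration under assumptions, not the merged code: `WorkflowData` is reduced to two fields, and `sanitizeName` is a hypothetical stand-in for `SanitizeWorkflowName` whose exact character rules may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// WorkflowData is a reduced stand-in for the compiler's type (assumption).
type WorkflowData struct {
	Name       string
	WorkflowID string
}

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// sanitizeName is a hypothetical stand-in for SanitizeWorkflowName.
func sanitizeName(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	s = nonAlnum.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

// otelServiceName tries WorkflowID first, then Name, then the bare default.
func otelServiceName(wd *WorkflowData) string {
	const defaultServiceName = "gh-aw"
	if wd == nil {
		return defaultServiceName
	}
	for _, candidate := range []string{wd.WorkflowID, wd.Name} {
		if suffix := sanitizeName(candidate); suffix != "" {
			return defaultServiceName + "." + suffix
		}
	}
	return defaultServiceName
}

func main() {
	// Two files sharing one display name still get distinct service names.
	a := &WorkflowData{Name: "Repo Triage", WorkflowID: "repo-triage"}
	b := &WorkflowData{Name: "Repo Triage", WorkflowID: "repo-triage-weekly"}
	fmt.Println(otelServiceName(a))                                         // gh-aw.repo-triage
	fmt.Println(otelServiceName(b))                                         // gh-aw.repo-triage-weekly
	fmt.Println(otelServiceName(&WorkflowData{Name: "Repo Triage/Weekly"})) // gh-aw.repo-triage-weekly
	fmt.Println(otelServiceName(nil))                                       // gh-aw
}
```

Putting the ID first addresses the review concern about duplicate display names across files: the suffix follows the file-derived ID, and the display name is used only when no ID is set.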

@pelikhan pelikhan merged commit 30c771c into main May 16, 2026
7 checks passed
@pelikhan pelikhan deleted the copilot/fix-workflow-specific-otel-service-names branch May 16, 2026 00:34
Development

Successfully merging this pull request may close these issues.

Workflow-Specific OTEL Service Names

3 participants