Skip to content

Improve OTel trace signal quality: stack traces, route cardinality, rate-limit status, and container resource metadata#6588

Merged
lpcox merged 4 commits into
mainfrom
copilot/go-fan-review-opentelemetry-sdk
May 27, 2026
Merged

Improve OTel trace signal quality: stack traces, route cardinality, rate-limit status, and container resource metadata#6588
lpcox merged 4 commits into
mainfrom
copilot/go-fan-review-opentelemetry-sdk

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 27, 2026

This PR addresses observability gaps in the current OpenTelemetry integration: error spans were missing stack traces, HTTP server spans used high-cardinality paths, rate-limit failures were not consistently marked as errors, and container metadata was not attached to resources.

  • Error spans now include stack traces

    • Updated RecordError call sites in internal/server/unified.go to pass oteltrace.WithStackTrace(true) for tool denial, tool-call limit, DIFC deny, circuit-breaker open, and backend execution failure paths.
  • Rate-limit outcomes are explicitly error-classified

    • When rate limiting is detected in backend tool execution, both spans now set:
      • tracing.RateLimitHit=true
      • span status = codes.Error with "rate limit exceeded"
    • This now applies to both gateway.backend.execute and the enclosing mcp.tool_call span.
  • HTTP server spans now use semantic-convention route/path attributes

    • internal/tracing/http.go now always records semconv.URLPathKey from r.URL.Path.
    • semconv.HTTPRouteKey is set only when r.Pattern is available (normalized to strip method prefix and method-matched), and is omitted otherwise.
    • This preserves low-cardinality route templates for routed handlers while retaining concrete request paths for debugging.
  • OTel resource now includes container detector

    • Added resource.WithContainer() in internal/tracing/provider.go so container attributes are emitted automatically when available.
  • Focused coverage for route attribute behavior

    • Updated TestWrapHTTPHandler_UsesHTTPRouteWhenPatternAvailable in internal/tracing/provider_test.go to verify http.route template capture while keeping url.path.
    • Added TestWrapHTTPHandler_UsesURLPathWhenPatternUnavailable to verify fallback behavior when no route template is available.
route := r.Pattern
if method, path, ok := strings.Cut(route, " "); ok && strings.EqualFold(method, r.Method) {
    route = path
} else if ok {
    route = ""
}

attrs := []attribute.KeyValue{
    semconv.HTTPRequestMethodKey.String(r.Method),
    semconv.URLPathKey.String(r.URL.Path),
}
if route != "" {
    attrs = append(attrs, semconv.HTTPRouteKey.String(route))
}

Copilot AI linked an issue May 27, 2026 that may be closed by this pull request
6 tasks
Copilot AI changed the title [WIP] Review OpenTelemetry Go SDK integration Improve OTel trace signal quality: stack traces, route cardinality, rate-limit status, and container resource metadata May 27, 2026
Copilot finished work on behalf of lpcox May 27, 2026 15:13
Copilot AI requested a review from lpcox May 27, 2026 15:13
@lpcox lpcox marked this pull request as ready for review May 27, 2026 15:20
Copilot AI review requested due to automatic review settings May 27, 2026 15:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves OpenTelemetry trace signal quality across the gateway by adding stack traces to recorded errors, reducing HTTP span cardinality via route templates, explicitly classifying rate-limit outcomes as errors, and enriching OTEL resources with container metadata.

Changes:

  • Add oteltrace.WithStackTrace(true) to multiple RecordError call sites in internal/server/unified.go.
  • Update HTTP server span attributes to prefer http.route (derived from r.Pattern) instead of url.path, and add a focused unit test for route-template capture.
  • Add container resource detection via resource.WithContainer() during tracer provider initialization.
Show a summary per file
File Description
internal/tracing/provider.go Adds container resource detector during OTEL resource initialization.
internal/tracing/http.go Switches HTTP span attribute from url.path to http.route, using r.Pattern when available.
internal/tracing/provider_test.go Adds coverage to ensure route templates are captured as http.route.
internal/server/unified.go Records stack traces on error spans and marks rate-limit execution spans as errors.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

internal/tracing/provider.go:149

  • resource.New can return a partially populated Resource along with a non-nil error (e.g., one detector fails). With resource.WithContainer() added, detector failures are more likely, and the current code discards all resource attributes by replacing with resource.Empty(). Consider keeping the returned res and only logging the warning (or only falling back to resource.Empty() when res is nil/empty).
	res, err := resource.New(ctx,
		resource.WithTelemetrySDK(),
		resource.WithSchemaURL(semconv.SchemaURL),
		resource.WithContainer(),
		resource.WithAttributes(
			semconv.ServiceName(serviceName),
			semconv.ServiceVersion(version.Get()),
		),
		resource.WithProcessPID(),
		resource.WithHost(),
	)
	if err != nil {
		// Non-fatal: proceed with empty resource
		logTracing.Printf("Warning: failed to create OTEL resource: %v", err)
		res = resource.Empty()
	}
  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment thread internal/tracing/http.go Outdated
Comment on lines +50 to +60
route := r.Pattern
if method, path, ok := strings.Cut(route, " "); ok && strings.EqualFold(method, r.Method) {
route = path
}
if route == "" {
route = r.URL.Path
}

attrs := append([]attribute.KeyValue{
semconv.HTTPRequestMethodKey.String(r.Method),
semconv.URLPathKey.String(r.URL.Path),
semconv.HTTPRouteKey.String(route),
Comment on lines +358 to +371
var foundRoute, foundPath bool
for _, attr := range spans[0].Attributes {
if attr.Key == semconv.HTTPRouteKey {
assert.Equal(t, "/mcp/{serverID}", attr.Value.AsString())
foundRoute = true
}
if attr.Key == semconv.URLPathKey {
assert.Equal(t, "/mcp/github", attr.Value.AsString())
foundPath = true
}
}
assert.True(t, foundRoute, "http.route attribute must be present")
assert.False(t, foundPath, "url.path attribute should not be present when route template exists")
}
Comment thread internal/server/unified.go Outdated
if rateLimited, resetAt := isRateLimitToolResult(backendResult); rateLimited {
cb.RecordRateLimit(resetAt)
execSpan.SetAttributes(tracing.RateLimitHit.Bool(true))
execSpan.SetStatus(codes.Error, "rate limit exceeded")
@lpcox
Copy link
Copy Markdown
Collaborator

lpcox commented May 27, 2026

@copilot address review feedback

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 27, 2026

@copilot address review feedback

Addressed the review feedback in commit 058ccce.

  • http.route is now only set when a method-matched route template is available, while url.path is always recorded.
  • Updated tracing tests to assert route-template behavior and fallback behavior when no template exists.
  • Rate-limit responses now also set error status/attributes on the top-level mcp.tool_call span (in addition to backend execution span).

Copilot finished work on behalf of lpcox May 27, 2026 15:46
@lpcox lpcox merged commit 329b508 into main May 27, 2026
22 checks passed
@lpcox lpcox deleted the copilot/go-fan-review-opentelemetry-sdk branch May 27, 2026 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[go-fan] Go Module Review: go.opentelemetry.io/otel

3 participants