Improve OTel trace signal quality: stack traces, route cardinality, rate-limit status, and container resource metadata #6588
Merged
6 tasks
Copilot AI changed the title from "[WIP] Review OpenTelemetry Go SDK integration" to "Improve OTel trace signal quality: stack traces, route cardinality, rate-limit status, and container resource metadata" on May 27, 2026
Contributor
Pull request overview
This PR improves OpenTelemetry trace signal quality across the gateway by adding stack traces to recorded errors, reducing HTTP span cardinality via route templates, explicitly classifying rate-limit outcomes as errors, and enriching OTEL resources with container metadata.
Changes:
- Add `oteltrace.WithStackTrace(true)` to multiple `RecordError` call sites in `internal/server/unified.go`.
- Update HTTP server span attributes to prefer `http.route` (derived from `r.Pattern`) instead of `url.path`, and add a focused unit test for route-template capture.
- Add container resource detection via `resource.WithContainer()` during tracer provider initialization.
Show a summary per file
| File | Description |
|---|---|
| internal/tracing/provider.go | Adds container resource detector during OTEL resource initialization. |
| internal/tracing/http.go | Switches HTTP span attribute from url.path to http.route, using r.Pattern when available. |
| internal/tracing/provider_test.go | Adds coverage to ensure route templates are captured as http.route. |
| internal/server/unified.go | Records stack traces on error spans and marks rate-limit execution spans as errors. |
Copilot's findings
Comments suppressed due to low confidence (1)
internal/tracing/provider.go:149
`resource.New` can return a partially populated Resource along with a non-nil error (e.g., when one detector fails). With `resource.WithContainer()` added, detector failures are more likely, and the current code discards all resource attributes by replacing the result with `resource.Empty()`. Consider keeping the returned `res` and only logging the warning (or falling back to `resource.Empty()` only when `res` is nil/empty).
res, err := resource.New(ctx,
	resource.WithTelemetrySDK(),
	resource.WithSchemaURL(semconv.SchemaURL),
	resource.WithContainer(),
	resource.WithAttributes(
		semconv.ServiceName(serviceName),
		semconv.ServiceVersion(version.Get()),
	),
	resource.WithProcessPID(),
	resource.WithHost(),
)
if err != nil {
	// Non-fatal: proceed with empty resource
	logTracing.Printf("Warning: failed to create OTEL resource: %v", err)
	res = resource.Empty()
}
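The fallback suggested in this comment can be sketched without the SDK. The snippet below is a dependency-free model of the pattern, not the gateway's code: `Resource`, `detectAll`, and `chooseResource` are hypothetical stand-ins for `*resource.Resource`, `resource.New`, and the provider's initialization logic. It logs the error, keeps whatever attributes were detected, and falls back to an empty resource only when nothing was returned.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Resource is a hypothetical stand-in for *resource.Resource.
type Resource struct {
	Attrs map[string]string
}

// detectAll is a hypothetical stand-in for resource.New: it may return a
// partially populated Resource together with a non-nil error.
func detectAll(containerFails bool) (*Resource, error) {
	res := &Resource{Attrs: map[string]string{"service.name": "gateway"}}
	if containerFails {
		return res, errors.New("container detector failed")
	}
	res.Attrs["container.id"] = "abc123"
	return res, nil
}

// chooseResource logs a detector error but keeps the partial result.
// It falls back to an empty Resource only when nothing was returned.
func chooseResource(res *Resource, err error) *Resource {
	if err != nil {
		log.Printf("Warning: OTEL resource is incomplete: %v", err)
	}
	if res == nil {
		return &Resource{Attrs: map[string]string{}}
	}
	return res
}

func main() {
	// The container detector fails, yet service.name survives.
	res := chooseResource(detectAll(true))
	fmt.Println(res.Attrs["service.name"]) // prints "gateway"
}
```

With the real SDK, the same shape would presumably mean checking `res` for nil before substituting `resource.Empty()`, rather than substituting it on every non-nil error.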
- Files reviewed: 4/4 changed files
- Comments generated: 3
Comment on lines +50 to +60
route := r.Pattern
if method, path, ok := strings.Cut(route, " "); ok && strings.EqualFold(method, r.Method) {
	route = path
}
if route == "" {
	route = r.URL.Path
}

attrs := append([]attribute.KeyValue{
	semconv.HTTPRequestMethodKey.String(r.Method),
	semconv.URLPathKey.String(r.URL.Path),
	semconv.HTTPRouteKey.String(route),
Comment on lines +358 to +371
	var foundRoute, foundPath bool
	for _, attr := range spans[0].Attributes {
		if attr.Key == semconv.HTTPRouteKey {
			assert.Equal(t, "/mcp/{serverID}", attr.Value.AsString())
			foundRoute = true
		}
		if attr.Key == semconv.URLPathKey {
			assert.Equal(t, "/mcp/github", attr.Value.AsString())
			foundPath = true
		}
	}
	assert.True(t, foundRoute, "http.route attribute must be present")
	assert.False(t, foundPath, "url.path attribute should not be present when route template exists")
}
if rateLimited, resetAt := isRateLimitToolResult(backendResult); rateLimited {
	cb.RecordRateLimit(resetAt)
	execSpan.SetAttributes(tracing.RateLimitHit.Bool(true))
	execSpan.SetStatus(codes.Error, "rate limit exceeded")
Collaborator
@copilot address review feedback
Contributor
Author
Addressed the review feedback in commit
This was referenced May 28, 2026
This PR addresses observability gaps in the current OpenTelemetry integration: error spans were missing stack traces, HTTP server spans used high-cardinality paths, rate-limit failures were not consistently marked as errors, and container metadata was not attached to resources.
Error spans now include stack traces
- Updated `RecordError` call sites in `internal/server/unified.go` to pass `oteltrace.WithStackTrace(true)` for tool denial, tool-call limit, DIFC deny, circuit-breaker open, and backend execution failure paths.
Rate-limit outcomes are explicitly error-classified
- Rate-limit results now set `tracing.RateLimitHit=true` and span status = `codes.Error` with `"rate limit exceeded"`.
- This applies to `gateway.backend.execute` and the enclosing `mcp.tool_call` span.
HTTP server spans now use semantic-convention route/path attributes
- `internal/tracing/http.go` now always records `semconv.URLPathKey` from `r.URL.Path`.
- `semconv.HTTPRouteKey` is set only when `r.Pattern` is available (normalized to strip the method prefix, and method-matched), and is omitted otherwise.
OTel resource now includes container detector
- Added `resource.WithContainer()` in `internal/tracing/provider.go` so container attributes are emitted automatically when available.
Focused coverage for route attribute behavior
- Added `TestWrapHTTPHandler_UsesHTTPRouteWhenPatternAvailable` in `internal/tracing/provider_test.go` to verify `http.route` template capture while keeping `url.path`.
- Added `TestWrapHTTPHandler_UsesURLPathWhenPatternUnavailable` to verify fallback behavior when no route template is available.
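The route normalization described above can be illustrated with a small standalone function. This is a minimal sketch, not the code in `internal/tracing/http.go`: `routeTemplate` is a hypothetical helper that takes the value of `r.Pattern` (available on `http.Request` since Go 1.23) and the request method, strips a matching method prefix, and reports whether a route should be emitted at all.

```go
package main

import (
	"fmt"
	"strings"
)

// routeTemplate (hypothetical helper) derives a low-cardinality http.route
// value from a ServeMux-style pattern such as "GET /mcp/{serverID}".
// It returns ok=false when no pattern is available, so the caller can omit
// http.route and rely on url.path alone.
func routeTemplate(pattern, method string) (route string, ok bool) {
	if pattern == "" {
		return "", false
	}
	// Strip the method prefix only when it matches the request method.
	if m, path, found := strings.Cut(pattern, " "); found && strings.EqualFold(m, method) {
		return strings.TrimLeft(path, " \t"), true
	}
	// No prefix, or a prefix that does not match: leave the pattern as is,
	// mirroring the hunk quoted in the review.
	return pattern, true
}

func main() {
	route, ok := routeTemplate("GET /mcp/{serverID}", "GET")
	fmt.Println(route, ok) // prints "/mcp/{serverID} true"
}
```

A caller would pass `r.Pattern` and `r.Method`, always record `url.path`, and add `http.route` only when `ok` is true, so requests for `/mcp/github` and `/mcp/gitlab` share one route value.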