
Deploy OpenTelemetry tracing across all services #46

@haasonsaas


Problem

Only gate has OpenTelemetry tracing, and it exports spans to stdout rather than over OTLP. The other 13 services have no distributed tracing at all.

When a request flows through identity -> llm-gateway -> prompts -> meter, there is no way to trace it end-to-end. Debugging cross-service latency or failure requires correlating timestamps across separate log streams.

Proposal

Phase 1: Tracing primitives in service-runtime

// service-runtime/tracing/tracing.go
func InitTracer(serviceName string, opts ...Option) (*sdktrace.TracerProvider, error)

type Option func(*tracerConfig)
func WithOTLPEndpoint(endpoint string) Option  // default: OTEL_EXPORTER_OTLP_ENDPOINT env var
func WithStdout() Option                       // for local dev

Phase 2: Auto-instrumentation middleware

// service-runtime/tracing/middleware.go
func HTTPMiddleware(next http.Handler) http.Handler
// Creates a span per request, propagates trace context via W3C headers,
// and records status code and route pattern as span attributes (duration is
// captured by the span's own start and end timestamps)

Phase 3: Outbound propagation

// service-runtime/tracing/transport.go
func NewTracingTransport(base http.RoundTripper) http.RoundTripper
// Injects trace context into outbound HTTP requests (identity client, mTLS calls)

Phase 4: Adopt in all services

  • Gate: migrate from stdout exporter to OTLP
  • All others: add tracing.InitTracer() + middleware in main.go

Infrastructure

  • An OTLP collector (Grafana Alloy, Jaeger, or similar) is needed in the cluster
  • Start with Jaeger all-in-one for dev; move to Tempo or Jaeger in production later

Impact

  • End-to-end request tracing across the entire platform
  • Latency attribution per service hop
  • Error correlation across service boundaries

Metadata

Labels: enhancement (New feature or request)
