Problem
Only gate has OpenTelemetry tracing, and even that exports to stdout rather than OTLP. The other 13 services have no distributed tracing at all.
When a request flows through identity -> llm-gateway -> prompts -> meter, there is no way to trace it end-to-end. Debugging cross-service latency or failures requires manually correlating timestamps across separate log streams.
Proposal
Phase 1: Tracing primitives in service-runtime
// service-runtime/tracing/tracing.go
func InitTracer(serviceName string, opts ...Option) (*sdktrace.TracerProvider, error)
type Option func(*tracerConfig)
func WithOTLPEndpoint(endpoint string) Option // default: OTEL_EXPORTER_OTLP_ENDPOINT env var
func WithStdout() Option // for local dev
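The option signatures above follow Go's functional options pattern. A minimal sketch of how the options might resolve into a config is shown here, using only the standard library. The fields of tracerConfig and the resolveConfig helper are assumptions made for illustration; the real InitTracer would go on to build an OpenTelemetry exporter and TracerProvider from the resolved values.

```go
package main

import (
	"fmt"
	"os"
)

// tracerConfig holds the settings InitTracer would resolve before
// constructing an exporter. The field names are hypothetical.
type tracerConfig struct {
	otlpEndpoint string // OTLP collector address
	stdout       bool   // print spans to stdout instead (local dev)
}

// Option mutates the config; callers pass zero or more of them.
type Option func(*tracerConfig)

// WithOTLPEndpoint overrides the endpoint taken from the environment.
func WithOTLPEndpoint(endpoint string) Option {
	return func(c *tracerConfig) { c.otlpEndpoint = endpoint }
}

// WithStdout selects the stdout exporter for local development.
func WithStdout() Option {
	return func(c *tracerConfig) { c.stdout = true }
}

// resolveConfig (hypothetical helper) seeds the config from
// OTEL_EXPORTER_OTLP_ENDPOINT and then applies options in order, so
// explicit options win over the environment. getenv is injected to
// keep the function testable without touching the process environment.
func resolveConfig(getenv func(string) string, opts ...Option) tracerConfig {
	cfg := tracerConfig{otlpEndpoint: getenv("OTEL_EXPORTER_OTLP_ENDPOINT")}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := resolveConfig(os.Getenv, WithStdout())
	fmt.Printf("stdout=%v endpoint=%q\n", cfg.stdout, cfg.otlpEndpoint)
}
```

Seeding from the environment first keeps the default deploy path free of configuration, while individual services can still override the endpoint in code.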
Phase 2: Auto-instrumentation middleware
// service-runtime/tracing/middleware.go
func HTTPMiddleware(next http.Handler) http.Handler
// Creates a span per request, propagates trace context via W3C headers,
// records status code, duration, and route pattern as attributes
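To make the middleware behaviour concrete, here is a standard-library-only sketch of the mechanics. It is a stand-in, not the proposed implementation: the real middleware would presumably use the OpenTelemetry SDK and its propagators rather than hand-parsing headers. The helpers traceIDFrom, randomHex, statusRecorder, and traceKey are hypothetical, the span is "recorded" with a log line, the request path stands in for the route pattern, and header validation is simplified to length checks.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// traceKey is the context key under which the current traceparent is stored.
type traceKey struct{}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// randomHex returns n random bytes encoded as 2n lowercase hex characters.
func randomHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// traceIDFrom extracts the trace ID from a W3C traceparent header
// ("00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>").
// If the header is missing or malformed, it starts a new trace.
func traceIDFrom(header string) string {
	parts := strings.Split(header, "-")
	if len(parts) == 4 && len(parts[1]) == 32 && len(parts[2]) == 16 {
		return parts[1]
	}
	return randomHex(16)
}

// HTTPMiddleware creates one span per request, continuing the caller's
// trace when a traceparent header is present.
func HTTPMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		traceID := traceIDFrom(r.Header.Get("traceparent"))
		spanID := randomHex(8)
		// Expose this span as the parent for any outbound calls.
		ctx := context.WithValue(r.Context(), traceKey{}, "00-"+traceID+"-"+spanID+"-01")
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r.WithContext(ctx))
		log.Printf("trace=%s span=%s path=%s status=%d duration=%s",
			traceID, spanID, r.URL.Path, rec.status, time.Since(start))
	})
}

func main() {
	h := HTTPMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusTeapot)
	}))
	req := httptest.NewRequest(http.MethodGet, "/v1/prompts", nil)
	req.Header.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	h.ServeHTTP(httptest.NewRecorder(), req)
}
```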
Phase 3: Outbound propagation
// service-runtime/tracing/transport.go
func NewTracingTransport(base http.RoundTripper) http.RoundTripper
// Injects trace context into outbound HTTP requests (identity client, mTLS calls)
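The outbound side can be sketched the same way. The example below assumes the current traceparent value travels in the request context under a hypothetical traceKey; a real implementation would more likely call the OpenTelemetry propagator's Inject on the request headers. withTraceparent, roundTripFunc, and injectedHeader are illustrative helpers, and the identity.internal URL is a placeholder.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// traceKey is the context key under which the current traceparent is stored.
type traceKey struct{}

// withTraceparent (hypothetical helper) stores a W3C traceparent value in ctx.
func withTraceparent(ctx context.Context, tp string) context.Context {
	return context.WithValue(ctx, traceKey{}, tp)
}

type tracingTransport struct{ base http.RoundTripper }

// NewTracingTransport wraps base so that every outbound request carries
// the trace context found in its request context.
func NewTracingTransport(base http.RoundTripper) http.RoundTripper {
	if base == nil {
		base = http.DefaultTransport
	}
	return &tracingTransport{base: base}
}

func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	tp, ok := req.Context().Value(traceKey{}).(string)
	if !ok || tp == "" {
		return t.base.RoundTrip(req) // no active trace, pass through
	}
	// A RoundTripper must not modify the caller's request, so clone it first.
	clone := req.Clone(req.Context())
	clone.Header.Set("traceparent", tp)
	return t.base.RoundTrip(clone)
}

// roundTripFunc adapts a function to http.RoundTripper for the demo.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// injectedHeader sends one request through the tracing transport with a
// fake base transport and reports the traceparent header the base saw.
func injectedHeader(tp string) string {
	var seen string
	fake := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get("traceparent")
		return &http.Response{StatusCode: http.StatusOK, Header: make(http.Header), Body: http.NoBody}, nil
	})
	ctx := withTraceparent(context.Background(), tp)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://identity.internal/whoami", nil)
	if err != nil {
		panic(err)
	}
	resp, err := NewTracingTransport(fake).RoundTrip(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return seen
}

func main() {
	fmt.Println("outbound traceparent:",
		injectedHeader("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
}
```

Wrapping http.RoundTripper rather than individual clients means the identity client and mTLS callers only need their Transport field swapped, with no change at call sites.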
Phase 4: Adopt in all services
- Gate: migrate from stdout exporter to OTLP
- All others: add tracing.InitTracer() + middleware in main.go
Infrastructure
- OTLP collector (Grafana Alloy, Jaeger, or similar) needed in the cluster
- Can start with Jaeger all-in-one for dev, then move to Tempo or Jaeger for production later
Impact
- End-to-end request tracing across the entire platform
- Latency attribution per service hop
- Error correlation across service boundaries