Add OTel tracing to the deploy/build path #602
Conversation
Instruments the full deploy flow so we can see a connected trace from `miren deploy` CLI through to "new version activated" in Jaeger/etc. CLI sets up tracing (gated on OTEL_EXPORTER_OTLP_ENDPOINT) and creates a root span with phase children (create_deployment, upload, finalize). The RPC framework propagates context automatically to the server, where build.go adds spans for each major phase (receive_tar, setup, buildkit, locate_artifact, create_version, activate). BuildKit client gets the global TracerProvider so gRPC spans connect into the trace, and OTEL env vars are forwarded to the buildkitd container for daemon-internal spans. Launcher controller gets standalone spans with app/version attributes for correlation. SetupTracing now respects caller-provided ServiceName so the CLI can identify as "miren-cli" instead of the default "miren". All spans are no-ops when OTEL_EXPORTER_OTLP_ENDPOINT is unset.
📝 Walkthrough

OpenTelemetry tracing was added across multiple components. The CLI deploy flow gained spans for deploy, create_deployment, upload, and finalize, plus a package-level tracer. BuildKit client creation now wires a tracer provider and forwards OTEL environment variables to the daemon. The deployment launcher starts a reconcile span with app attributes and records errors/status. The build server wraps build phases (receive_tar, setup, buildkit, build, locate_artifact, create_version, activate) in spans. Shared tracing setup now only injects a default service name when one is absent.
```go
// Set up OTel tracing if an OTLP endpoint is configured
if os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") != "" {
```
I wonder if we should let the server expose its version of this value back to the CLI rather than the CLI having to have its own version. Fine for now, though!
Gooood point let me think about whether there's a way to sneak that in
Filed MIR-707 to explore a proxy exporter approach — CLI would ship its spans through the existing RPC connection to the server, which re-exports them to the collector. That way the CLI gets tracing without needing its own collector config. Keeping the CLI span code as-is for now since it'll serve as the scaffold for that.
ah! yeah, that's definitely a new task. sounds good!
- Use returned span context (not `_`) in locate_artifact, create_version, and activate spans so child calls appear under the correct parent
- Forward OTEL_SERVICE_NAME to the buildkitd container so daemon spans are identifiable in traces
Building on the OTel infrastructure work, this lights up tracing for the deploy path. With a collector running, `miren deploy` produces a nice end-to-end trace showing how time breaks down across CLI, RPC, server-side build phases, and BuildKit internals.
The CLI acts as the trace root and sets up tracing when `OTEL_EXPORTER_OTLP_ENDPOINT` is set. The RPC framework propagates context to the server automatically from there. On the server side, BuildFromTargets adds spans for each phase: tar receive, setup/stack detection, buildkit build, artifact lookup, version creation, and activation.
BuildKit gets two layers of tracing: the client gets the global TracerProvider for gRPC spans, and the daemon container gets OTEL env vars forwarded so it can export its own internal per-step spans (dockerfile parse, layer pulls, RUN steps, etc.).
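Forwarding the daemon's env vars amounts to filtering the host environment for the OTEL-prefixed entries and passing them into the container. A minimal sketch (the `otelEnv` helper is hypothetical; the PR names `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_SERVICE_NAME` specifically, but a prefix filter is one plausible shape):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// otelEnv collects OTEL_* variables from an environment list so they
// can be handed to the buildkitd container (e.g. as `-e` flags).
func otelEnv(environ []string) []string {
	var out []string
	for _, kv := range environ {
		if strings.HasPrefix(kv, "OTEL_") {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	fmt.Println(otelEnv(os.Environ()))
}
```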
The launcher controller gets standalone `launcher.reconcile` spans with app/version attributes. These aren't connected to the deploy trace yet (that would need a data model change to thread context through), but the attributes make correlation easy enough for now.
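Since the reconcile spans are disconnected from the deploy trace, the attributes are what ties them back together in the trace UI. A tiny stand-in for the attribute set attached to each `launcher.reconcile` span (the helper name and key names are illustrative):

```go
package main

import "fmt"

// reconcileAttrs builds the app/version attributes attached to each
// launcher.reconcile span, so a reconcile can be matched by hand to
// the deploy trace that created the version.
func reconcileAttrs(app, version string) []string {
	return []string{"app=" + app, "version=" + version}
}

func main() {
	fmt.Println(reconcileAttrs("web", "v42")) // [app=web version=v42]
}
```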
Everything is a no-op when `OTEL_EXPORTER_OTLP_ENDPOINT` isn't set.