Part of the Observability — OpenTelemetry Tracing v1 initiative (master tracking: #108). Effort: S (2–3 engineer-days). Risk: medium (resolution precedence, gate logic, shutdown ordering). Depends on: Phase 1 (#102).
Goal
Resolve tracing config (CLI > env > forge.yaml > default), enforce the enable+endpoint gate, install the provider via the Phase 0 seam, and register a shutdown flush so spans are exported before the agent exits.
Files
| File | Change |
|---|---|
| forge-cli config types (alongside cors_origins) | Add observability.tracing block to the forge.yaml struct |
| forge-cli runner wiring (runner.go) | Resolve config, build egress transport, call observability.NewTracerProvider, runtime.SetTracerProvider, store provider for shutdown |
| forge-cli/cmd/run.go | Add CLI flags; pass graceful-shutdown flush into existing --shutdown-timeout path |
forge.yaml block
observability:
  tracing:
    enabled: false
    endpoint: ""                  # e.g. https://otel.initializ.ai:4318
    protocol: http/protobuf       # http/protobuf | grpc
    sampler: parentbased_always_on
    sampler_ratio: 1.0
    headers: {}                   # prefer OTEL_EXPORTER_OTLP_HEADERS for secrets
    timeout: 10s
    redact: true
    capture_content: false        # enterprise opt-in (inert in v1)
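A minimal sketch of how the block above could map onto Go config types. Only the yaml keys and default values come from this issue; the type names (TracingConfig, ObservabilityConfig) and the DefaultTracingConfig helper are hypothetical, and whether "10s" decodes directly into a time.Duration depends on the YAML library forge-cli already uses.

```go
package main

import "time"

// TracingConfig mirrors the observability.tracing keys from the forge.yaml block.
// Type and field names are hypothetical; only the yaml keys come from the issue.
type TracingConfig struct {
	Enabled        bool              `yaml:"enabled"`
	Endpoint       string            `yaml:"endpoint"`
	Protocol       string            `yaml:"protocol"` // http/protobuf | grpc
	Sampler        string            `yaml:"sampler"`
	SamplerRatio   float64           `yaml:"sampler_ratio"`
	Headers        map[string]string `yaml:"headers"`
	Timeout        time.Duration     `yaml:"timeout"`
	Redact         bool              `yaml:"redact"`
	CaptureContent bool              `yaml:"capture_content"` // inert in v1
}

// ObservabilityConfig is a hypothetical parent block on the forge.yaml struct.
type ObservabilityConfig struct {
	Tracing TracingConfig `yaml:"tracing"`
}

// DefaultTracingConfig returns the defaults listed in the forge.yaml block.
func DefaultTracingConfig() TracingConfig {
	return TracingConfig{
		Enabled:        false,
		Endpoint:       "",
		Protocol:       "http/protobuf",
		Sampler:        "parentbased_always_on",
		SamplerRatio:   1.0,
		Headers:        map[string]string{},
		Timeout:        10 * time.Second,
		Redact:         true,
		CaptureContent: false,
	}
}
```

Starting from DefaultTracingConfig() and overlaying the yaml, env, and CLI layers keeps the lowest-precedence values in one place.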
Resolution order (highest first), mirroring cors_origins
- CLI flags: --tracing (bool→enabled), --tracing-endpoint, --tracing-protocol, --tracing-sampler
- Env: FORGE_TRACING_ENABLED; standard OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_EXPORTER_OTLP_HEADERS, OTEL_SERVICE_NAME, OTEL_TRACES_SAMPLER, OTEL_TRACES_SAMPLER_ARG, OTEL_RESOURCE_ATTRIBUTES
- observability.tracing in forge.yaml
- Defaults
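The precedence above can be sketched with two small helpers. This is an illustration under assumptions, not the cors_origins implementation: firstNonEmpty and firstSet are hypothetical names. The one design point worth noting is that the boolean needs a "not set" state (a nil pointer here), otherwise an explicit --tracing=false on the command line could not override FORGE_TRACING_ENABLED=true.

```go
package main

// firstNonEmpty returns the first non-empty string. Callers pass values in
// precedence order: CLI flag, env var, forge.yaml, default.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// firstSet returns the first non-nil bool in precedence order, falling back
// to def. A nil pointer means "this layer did not set the value", which is
// different from the layer explicitly setting it to false.
func firstSet(def bool, layers ...*bool) bool {
	for _, p := range layers {
		if p != nil {
			return *p
		}
	}
	return def
}
```

For example, firstNonEmpty(flagEndpoint, os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), yamlEndpoint) would yield the resolved endpoint, where flagEndpoint and yamlEndpoint are placeholders for whatever the runner already holds.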
Gate (exact behavior)
active = cfg.Enabled && resolvedEndpoint != ""
if cfg.Enabled && resolvedEndpoint == "":
    log.Warn("tracing enabled but no OTLP endpoint resolved; tracing disabled")
    -> leave runtime on noop, do not construct provider
if !active:
    -> noop (Phase 0 default), no provider constructed
if active:
    -> construct provider, SetTracerProvider, register shutdown flush
Standard OTEL_* env vars may supply the endpoint, protocol, and related settings, but enabled (flag / FORGE_TRACING_ENABLED / yaml) is still the gate: an endpoint on its own never activates tracing. Document this in forge run --help.
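The gate can be isolated as a pure function so it is easy to unit test separately from provider construction. This is a sketch under assumptions: gateDecision and evaluateGate are hypothetical names, and the caller is expected to log the warning and do the actual wiring.

```go
package main

// gateDecision is a hypothetical result type: whether to construct the
// provider, plus an optional warning for the caller to log.
type gateDecision struct {
	Active  bool
	Warning string
}

// warnNoEndpoint is the warning text specified in the gate pseudocode.
const warnNoEndpoint = "tracing enabled but no OTLP endpoint resolved; tracing disabled"

// evaluateGate applies the enable+endpoint gate. Tracing is active only when
// it is explicitly enabled AND an endpoint was resolved. An endpoint without
// the enable flag stays silent on noop; the enable flag without an endpoint
// stays on noop and returns a warning.
func evaluateGate(enabled bool, resolvedEndpoint string) gateDecision {
	if enabled && resolvedEndpoint == "" {
		return gateDecision{Active: false, Warning: warnNoEndpoint}
	}
	return gateDecision{Active: enabled && resolvedEndpoint != ""}
}
```

Only when Active is true would the runner go on to construct the provider, call SetTracerProvider, and register the shutdown flush.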
Shutdown
On graceful shutdown, call provider.Shutdown(ctx) bounded by --shutdown-timeout so the batch processor flushes. Short-lived and scheduled runs must flush before exit or spans are lost.
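A minimal sketch of the bounded flush, assuming the stored provider exposes Shutdown(ctx) error (the OpenTelemetry Go SDK's TracerProvider does). The shutdowner interface, flushOnShutdown, and fakeProvider are hypothetical names for illustration; how the timeout budget is shared with the rest of the --shutdown-timeout path is left to the existing runner code.

```go
package main

import (
	"context"
	"time"
)

// shutdowner is the small subset of the provider this sketch needs.
type shutdowner interface {
	Shutdown(ctx context.Context) error
}

// flushOnShutdown calls Shutdown with a context bounded by timeout, so a slow
// or unreachable collector cannot hold the process open indefinitely.
// A nil provider (tracing inactive, runtime on noop) is a no-op.
func flushOnShutdown(p shutdowner, timeout time.Duration) error {
	if p == nil {
		return nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return p.Shutdown(ctx)
}

// fakeProvider is a stand-in used only to exercise the sketch.
type fakeProvider struct {
	called      bool
	hadDeadline bool
}

// Shutdown records that it was called and whether the context had a deadline.
func (f *fakeProvider) Shutdown(ctx context.Context) error {
	f.called = true
	_, f.hadDeadline = ctx.Deadline()
	return nil
}
```

Returning the error rather than logging inside the helper lets the caller decide whether a failed flush should affect the exit code.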
Verify
go build ./...
# disabled by default:
forge run --port 8099 &
# logs must NOT mention an exporter; noop active
# enabled + endpoint (run a local collector or use otel-tui / jaeger all-in-one OTLP):
docker run --rm -p 4318:4318 -p 16686:16686 jaegertracing/all-in-one:latest &
FORGE_TRACING_ENABLED=true OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  forge run --port 8098
# enabled but NO endpoint: confirm warning + agent still starts on noop:
forge run --tracing --port 8097
# expect WARN, agent healthy
Anti-patterns to avoid
- Auto-enabling when only an endpoint is set (enable flag is mandatory).
- Panicking on missing endpoint (always log a warning and leave the no-op tracer in place; never crash the agent over telemetry config).
- Putting OTLP header secrets in forge.yaml examples (use OTEL_EXPORTER_OTLP_HEADERS env var instead).