
OTel Phase 2 — Config resolution + CLI flags + runner wiring #103

@initializ-mk


Part of the Observability — OpenTelemetry Tracing v1 initiative (master tracking: #108). Effort: S (2–3 engineer-days). Risk: medium (resolution precedence, gate logic, shutdown ordering). Depends on: Phase 1 (#102).

Goal

Resolve the tracing config (CLI > env > forge.yaml > default), enforce the enable+endpoint gate, install the provider via the Phase 0 seam, and register a shutdown flush so buffered spans are exported before the agent exits.

Files

| File | Change |
|---|---|
| forge-cli config types (alongside cors_origins) | Add observability.tracing block to the forge.yaml struct |
| forge-cli runner wiring (runner.go) | Resolve config, build egress transport, call observability.NewTracerProvider, runtime.SetTracerProvider, store provider for shutdown |
| forge-cli/cmd/run.go | Add CLI flags; pass graceful-shutdown flush into existing --shutdown-timeout path |

forge.yaml block

```yaml
observability:
  tracing:
    enabled: false
    endpoint: ""                    # e.g. https://otel.initializ.ai:4318
    protocol: http/protobuf         # http/protobuf | grpc
    sampler: parentbased_always_on
    sampler_ratio: 1.0
    headers: {}                     # prefer OTEL_EXPORTER_OTLP_HEADERS for secrets
    timeout: 10s
    redact: true
    capture_content: false          # enterprise opt-in (inert in v1)
```

Resolution order (highest first), mirroring cors_origins

  1. CLI flags: --tracing (bool→enabled), --tracing-endpoint, --tracing-protocol, --tracing-sampler
  2. Env: FORGE_TRACING_ENABLED; standard OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, OTEL_EXPORTER_OTLP_HEADERS, OTEL_SERVICE_NAME, OTEL_TRACES_SAMPLER, OTEL_TRACES_SAMPLER_ARG, OTEL_RESOURCE_ATTRIBUTES
  3. observability.tracing in forge.yaml
  4. Defaults
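The precedence can be sketched as layering sources lowest-first, so each later layer overwrites the one before it. This sketch assumes forge.yaml has already been unmarshalled over the defaults (covering steps 3 and 4), that CLI overrides arrive as nil-able pointers so an unset flag never masks env or yaml, and that blank or unparseable env values are ignored. `Tracing`, `CLIOverrides`, `LookupEnv`, and `Resolve` are hypothetical names, and only a subset of the env vars is shown; the remaining `OTEL_*` variables would follow the same pattern.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Tracing is a trimmed-down, hypothetical view of the resolved config.
type Tracing struct {
	Enabled  bool
	Endpoint string
	Protocol string
	Sampler  string
}

// CLIOverrides holds only flags the user explicitly passed; nil means "not set".
type CLIOverrides struct {
	Enabled  *bool
	Endpoint *string
	Protocol *string
	Sampler  *string
}

// LookupEnv has the same shape as os.LookupEnv so tests can inject a fake.
type LookupEnv func(key string) (string, bool)

// nonEmpty treats unset and blank env vars the same way: "no value".
func nonEmpty(env LookupEnv, key string) (string, bool) {
	v, ok := env(key)
	v = strings.TrimSpace(v)
	return v, ok && v != ""
}

// Resolve applies steps 2 and 1 on top of file, which is assumed to be
// forge.yaml already unmarshalled over the defaults (steps 3 and 4).
func Resolve(file Tracing, env LookupEnv, cli CLIOverrides) Tracing {
	out := file

	// Step 2: environment overrides forge.yaml.
	if v, ok := nonEmpty(env, "FORGE_TRACING_ENABLED"); ok {
		if b, err := strconv.ParseBool(v); err == nil {
			out.Enabled = b // unparseable values are ignored (assumption)
		}
	}
	if v, ok := nonEmpty(env, "OTEL_EXPORTER_OTLP_ENDPOINT"); ok {
		out.Endpoint = v
	}
	if v, ok := nonEmpty(env, "OTEL_EXPORTER_OTLP_PROTOCOL"); ok {
		out.Protocol = v
	}
	if v, ok := nonEmpty(env, "OTEL_TRACES_SAMPLER"); ok {
		out.Sampler = v
	}

	// Step 1: explicitly set CLI flags win over everything else.
	if cli.Enabled != nil {
		out.Enabled = *cli.Enabled
	}
	if cli.Endpoint != nil {
		out.Endpoint = *cli.Endpoint
	}
	if cli.Protocol != nil {
		out.Protocol = *cli.Protocol
	}
	if cli.Sampler != nil {
		out.Sampler = *cli.Sampler
	}
	return out
}

func main() {
	file := Tracing{Protocol: "http/protobuf", Sampler: "parentbased_always_on"}
	on := true
	resolved := Resolve(file, os.LookupEnv, CLIOverrides{Enabled: &on})
	fmt.Printf("%+v\n", resolved)
}
```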

Gate (exact behavior)

```
active = cfg.Enabled && resolvedEndpoint != ""
if cfg.Enabled && resolvedEndpoint == "":
    log.Warn("tracing enabled but no OTLP endpoint resolved; tracing disabled")
    -> leave runtime on noop, do not construct provider
if !active:
    -> noop (Phase 0 default), no provider constructed
if active:
    -> construct provider, SetTracerProvider, register shutdown flush
```

Standard OTEL_* env vars may supply the endpoint, protocol, and other settings, but enabled (set via --tracing, FORGE_TRACING_ENABLED, or forge.yaml) is still the gate: an endpoint alone never turns tracing on. Document this in forge run --help.
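One way to keep unset flags from overriding env and forge.yaml, and to surface the gate rule in `forge run --help`, is to record only the flags the user actually passed. This sketch uses the standard library `flag` package, whose `FlagSet.Visit` walks only flags that were set; forge-cli may use a different flag library, and `ParseTracingFlags`, `CLIOverrides`, and the help wording are hypothetical. It registers only the tracing flags, so a real `forge run` would define them alongside `--port` and `--shutdown-timeout`.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// CLIOverrides records only flags the user passed; nil means "not set".
type CLIOverrides struct {
	Enabled  *bool
	Endpoint *string
	Protocol *string
	Sampler  *string
}

// Help text that states the gate explicitly (hypothetical wording).
const tracingHelp = "enable OpenTelemetry tracing. Tracing is active only when enabled " +
	"(this flag, FORGE_TRACING_ENABLED, or observability.tracing.enabled) AND an OTLP " +
	"endpoint resolves; OTEL_EXPORTER_OTLP_ENDPOINT alone never enables it"

// ParseTracingFlags parses the tracing flags and keeps only explicitly set ones.
func ParseTracingFlags(args []string) (CLIOverrides, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	enabled := fs.Bool("tracing", false, tracingHelp)
	endpoint := fs.String("tracing-endpoint", "", "OTLP endpoint (overrides OTEL_EXPORTER_OTLP_ENDPOINT and forge.yaml)")
	protocol := fs.String("tracing-protocol", "", "OTLP protocol: http/protobuf | grpc")
	sampler := fs.String("tracing-sampler", "", "trace sampler, e.g. parentbased_always_on")
	if err := fs.Parse(args); err != nil {
		return CLIOverrides{}, err
	}

	var o CLIOverrides
	// Visit only calls back for flags that appeared on the command line.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "tracing":
			o.Enabled = enabled
		case "tracing-endpoint":
			o.Endpoint = endpoint
		case "tracing-protocol":
			o.Protocol = protocol
		case "tracing-sampler":
			o.Sampler = sampler
		}
	})
	return o, nil
}

func main() {
	o, err := ParseTracingFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("--tracing passed:", o.Enabled != nil)
}
```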

Shutdown

On graceful shutdown, call provider.Shutdown(ctx) with a context bounded by --shutdown-timeout so the batch span processor flushes its queued spans. Short-lived and scheduled runs must flush before exit, or their spans are lost.

Verify

```bash
go build ./...

# disabled by default:
forge run --port 8099 &
# logs must NOT mention an exporter; noop active

# enabled + endpoint (run a local collector, or use otel-tui / Jaeger all-in-one with OTLP):
docker run --rm -p 4318:4318 -p 16686:16686 jaegertracing/all-in-one:latest &
FORGE_TRACING_ENABLED=true OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
  forge run --port 8098
# spans should appear in the Jaeger UI at http://localhost:16686

# enabled but NO endpoint: confirm the warning and that the agent still starts on noop:
forge run --tracing --port 8097
# expect WARN, agent healthy
```

Anti-patterns to avoid

  • Auto-enabling when only an endpoint is set (the enable flag is mandatory).
  • Panicking on a missing endpoint (always log a warning and leave the runtime on the no-op tracer; never crash the agent over telemetry config).
  • Putting OTLP header secrets in forge.yaml examples (use the OTEL_EXPORTER_OTLP_HEADERS env var instead).
