Observability
Authoritative rule: `agents/rules/observability.md` · Logging companion: `agents/rules/logging.md`
NoETL is a distributed runtime: gateway → server → NATS → worker → tools → server (event log). When something misbehaves in production the only way to find it is the trace. This rule constrains what gets traced / measured and how IDs flow through the system.
Every new feature ships with three artefacts in the same change set:
- A span covering the operation (`tracing::info_span!("command.claim", ...)`).
- At least one metric capturing throughput / latency / failure rate (`noetl_*_total`, `noetl_*_duration_seconds`, `noetl_*_lag`).
- `execution_id` correlation as a span attribute / metric exemplar.
"We'll add observability later" = production debugging becomes guesswork.
Floods of INFO logs are not observability. Counters / histograms / gauges scale; logs don't.
| Want to know | Use | Not |
|---|---|---|
| Throughput | counter `noetl_*_total` | grep INFO |
| Slow tail | histogram `noetl_*_duration_seconds` | grep slow lines |
| Backlog | gauge `noetl_*_lag` | watch debug logs |
| Per-execution failure | structured event + WARN/ERROR with `execution_id` | spam INFO |
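To make the three metric kinds in the table concrete, here is a minimal std-only sketch of a counter, a gauge, and a cumulative-bucket histogram. The `Metrics` type, the bucket bounds, and the `noetl_example_*` names are hypothetical and exist only for illustration; a real service would presumably use a Prometheus or OpenTelemetry client library. Note that no metric name or key contains an `execution_id`.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Histogram upper bounds in seconds (assumed values, for illustration only).
pub const BUCKETS: [f64; 4] = [0.01, 0.1, 1.0, 10.0];

/// Hypothetical in-process metrics store keyed by metric name.
#[derive(Default)]
pub struct Metrics {
    counters: BTreeMap<String, u64>,
    gauges: BTreeMap<String, i64>,
    // One cumulative count per bound in BUCKETS, plus a final +Inf bucket.
    histograms: BTreeMap<String, Vec<u64>>,
}

impl Metrics {
    /// Counter: monotonically increasing, answers "how many / how fast".
    pub fn inc(&mut self, name: &str) {
        *self.counters.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Gauge: last observed value, answers "how deep is the backlog now".
    pub fn set_gauge(&mut self, name: &str, value: i64) {
        self.gauges.insert(name.to_string(), value);
    }

    /// Histogram: cumulative buckets, answers "how slow is the tail".
    pub fn observe(&mut self, name: &str, elapsed: Duration) {
        let secs = elapsed.as_secs_f64();
        let buckets = self
            .histograms
            .entry(name.to_string())
            .or_insert_with(|| vec![0; BUCKETS.len() + 1]);
        // Increment every bucket whose upper bound covers the observation.
        for (i, bound) in BUCKETS.iter().enumerate() {
            if secs <= *bound {
                buckets[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        buckets[BUCKETS.len()] += 1;
    }

    pub fn counter(&self, name: &str) -> u64 {
        self.counters.get(name).copied().unwrap_or(0)
    }

    pub fn gauge(&self, name: &str) -> Option<i64> {
        self.gauges.get(name).copied()
    }

    pub fn bucket_counts(&self, name: &str) -> Vec<u64> {
        self.histograms.get(name).cloned().unwrap_or_default()
    }
}
```

The storage cost of each metric is fixed by its name and bucket count, regardless of how many executions pass through, which is why these scale where log lines do not.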
Identifiers that need to exist before the row hits the database (`execution_id`, `command_id`, `event_id`) are generated by the application using a snowflake algorithm. The database's `gen_snowflake()` stays as a fallback for out-of-band writes.
Reasons:
- Spans need the id at span-open time, not after the round-trip.
- Retries are idempotent only if the id is stable.
- Cross-component coordination needs the id before publish.
- Test fixtures need deterministic ids.
- Sharded / multi-cluster deployments can't share a DB sequence.
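As an illustration of application-side generation, the sketch below uses the classic snowflake layout (41-bit millisecond timestamp, 10-bit node id, 12-bit sequence). That layout, the `Snowflake` type, and the injected clock are assumptions; the source does not specify NoETL's actual bit layout or epoch. Injecting the clock is what lets a test fixture produce deterministic ids.

```rust
// Assumed bit widths (classic snowflake layout, not confirmed by the source).
const NODE_BITS: u64 = 10;
const SEQ_BITS: u64 = 12;
const MAX_NODE: u64 = (1 << NODE_BITS) - 1;
const MAX_SEQ: u64 = (1 << SEQ_BITS) - 1;

/// Hypothetical generator. `clock_ms` returns milliseconds since a custom
/// epoch; production code would read the system clock, tests pass a fixed one.
pub struct Snowflake<C: FnMut() -> u64> {
    clock_ms: C,
    node_id: u64,
    last_ms: u64,
    seq: u64,
}

impl<C: FnMut() -> u64> Snowflake<C> {
    pub fn new(node_id: u64, clock_ms: C) -> Self {
        assert!(node_id <= MAX_NODE, "node id must fit in 10 bits");
        Self { clock_ms, node_id, last_ms: 0, seq: 0 }
    }

    /// Returns the next id; ids from one generator are strictly increasing.
    pub fn next_id(&mut self) -> u64 {
        let mut now = (self.clock_ms)();
        // If the clock moved backwards, keep using the last timestamp.
        if now < self.last_ms {
            now = self.last_ms;
        }
        if now == self.last_ms {
            self.seq += 1;
            if self.seq > MAX_SEQ {
                // Sequence exhausted within this millisecond: move to the next one.
                now += 1;
                self.seq = 0;
            }
        } else {
            self.seq = 0;
        }
        self.last_ms = now;
        (now << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self.seq
    }
}
```

Because the node id is part of the value, two generators with different node ids never collide, which is the property a shared database sequence cannot offer across shards or clusters.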
- HTTP requests: `execution_id` in path or query param, not just in body.
- NATS messages: `execution_id` as a header / message attribute.
- Tracing spans: `execution_id` as a span field.
- Metrics: `execution_id` is NOT a label (cardinality bomb); it IS a span attribute for exemplar correlation.
- Logs: every WARN/ERROR carries `execution_id` as a structured field.
Boundary: `execution_id` rides every wire format. Recipients missing it should WARN, generate a synthetic id, and continue.
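The boundary behaviour can be sketched as a small resolver on the receiving side. Everything named here is hypothetical: the header name `Noetl-Execution-Id`, the `ExecutionId` enum, and the `synth` callback that supplies the synthetic id. The `eprintln!` call is a stand-in for a structured WARN log. The sketch also treats an unparsable value as missing, which is an assumption rather than something the rule states.

```rust
use std::collections::HashMap;

/// Hypothetical header name; the source does not specify one.
pub const EXECUTION_ID_HEADER: &str = "Noetl-Execution-Id";

/// Records whether the id arrived on the wire or was made up locally.
#[derive(Debug, PartialEq)]
pub enum ExecutionId {
    Propagated(u64),
    Synthetic(u64),
}

impl ExecutionId {
    pub fn value(&self) -> u64 {
        match self {
            ExecutionId::Propagated(id) | ExecutionId::Synthetic(id) => *id,
        }
    }
}

/// Reads the execution id from message or request headers. If it is absent
/// (or, by assumption, not a valid integer), warns, asks `synth` for a
/// synthetic id, and lets processing continue.
pub fn resolve_execution_id(
    headers: &HashMap<String, String>,
    mut synth: impl FnMut() -> u64,
) -> ExecutionId {
    let parsed = headers
        .get(EXECUTION_ID_HEADER)
        .and_then(|raw| raw.trim().parse::<u64>().ok());
    match parsed {
        Some(id) => ExecutionId::Propagated(id),
        None => {
            let id = synth();
            // Stand-in for a structured WARN carrying the id as a field.
            eprintln!(
                "level=WARN msg=\"missing execution_id, generated synthetic id\" execution_id={id}"
            );
            ExecutionId::Synthetic(id)
        }
    }
}
```

Keeping the `Propagated` / `Synthetic` distinction in the return type lets downstream code tag spans so that synthetic ids are easy to spot when reading a trace.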
| Component | Endpoint | Metrics |
|---|---|---|
| Rust worker | `/metrics` on each pod | `noetl_worker_commands_*`, `noetl_worker_tool_dispatch_*`, `noetl_worker_nats_consumer_*_pending`, `noetl_worker_result_store_put_*`, `noetl_worker_concurrent_dispatches` |
| Python server | `/metrics` | `noetl_server_*` (HTTP latency per route, event-log INSERT rate, command publish rate) |
| Projector | `/metrics` | `noetl_projector_inserted_total`, `noetl_projector_batch_size`, `noetl_projector_lag_seconds` |
| NATS supercluster | `:8222/jsz` | Stream + consumer lag (KEDA scaler reads these) |
- Deployment Validation: every change validates `/metrics` reachability on kind.
- noetl-server wiki: https://github.com/noetl/server/wiki (per-component metric details).
- noetl-worker wiki: https://github.com/noetl/worker/wiki (worker observability harness, PR-2e).