feat(observability): per-call OpenTelemetry spans (closes #9 — final FleetQ ask) by escapeboy · Pull Request #16 · escapeboy/boruna

escapeboy · 2026-04-25T16:24:32Z

Sprint 0.4-S5 — closes #9 (FleetQ P2, the LAST ask)

Final FleetQ implementer-feedback ask. Closes the letter completely — ALL 9 P1/P2 asks shipped (#3, #5, #6, #7, #8, #9 + the two P0s in v0.2.0). 6 sprints in a row driven by FleetQ signal.

Surface

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4318
export OTEL_SERVICE_NAME=my-agent # optional, defaults to "boruna"
boruna run app.ax --policy allow-all --live
```

Each capability call emits a `boruna.cap` span:

```
boruna.cap {cap.name="net.fetch", bytes_in=247, bytes_out=1834, cap.budget_remaining=42} 142ms
boruna.cap {cap.name="llm.call", bytes_in=890, bytes_out=2103, cap.budget_remaining=8} 7.4s
boruna.cap {cap.name="fs.read", bytes_in=64, bytes_out=12834} 8ms
```

What's in this PR

Always-on tracing (zero-cost when no subscriber)

`tracing = "0.1"` is now a non-optional dep on `boruna-vm`. `CapabilityGateway::call` wraps the call body in a `tracing::info_span!` named `boruna.cap`. When no subscriber is installed (the default), span macros are essentially no-ops — single atomic check, a few stack-allocated structs, no allocations.

Span attributes:

Field	Type	Notes
`cap.name`	string	e.g. `"net.fetch"`
`bytes_in`	u64	sum of UTF-8 string content reachable from args (recurses through containers)
`bytes_out`	u64	UTF-8 string content of result (recurses through Record/Enum/etc)
`cap.budget_remaining`	u64	post-call quota when a budget rule applies
`error.kind`	string	only on failure: `denied` / `budget_exceeded` / `runtime_error`

`telemetry` Cargo feature

Adds `opentelemetry 0.27` + `opentelemetry_sdk 0.27` (rt-tokio) + `opentelemetry-otlp 0.27` (http-proto + reqwest-client, default-features off) + `tracing-opentelemetry 0.28` + `tracing-subscriber 0.3` + `tokio` (workspace, optional).

New module `boruna_vm::telemetry` with `init() -> TelemetryHandle`:

Endpoint env var unset → `Disabled` no-op handle (Boruna behaves identically to a non-telemetry build).
Endpoint set → installs OTLP-over-HTTP exporter into the global tracing subscriber registry.
`TelemetryHandle::Drop` calls `force_flush()`.

CLI integration

New `telemetry` feature on `boruna-cli`. When built with `--features telemetry`, `main` starts a tokio runtime, calls `init_telemetry()` BEFORE parsing CLI args, holds the handle for the binary lifetime, and on shutdown drops the handle THEN drains the runtime with a 5-second timeout (so in-flight OTel HTTP POSTs complete instead of being killed by `process::exit`).

Determinism contract (per ADR 001)

Spans are operational metadata only. Their content (durations, byte counts) is never fed into an `EventLog`, `AuditLog`, or `EvidenceBundle`. A replayed run produces identical replay state but may produce different span durations on a faster/slower host — by design. Documented in `CapabilityGateway::call` doc comment + `boruna_vm::telemetry` module doc.

Tests

4 new VM tests using a per-test scoped tracing subscriber with proper Id-keyed span matching:
- `test_capability_call_emits_boruna_cap_span_with_attributes`
- `test_capability_call_records_bytes_out_for_string_returning_handler`
- `test_capability_call_records_error_kind_denied`
- `test_capability_call_records_error_kind_budget_exceeded`
1 telemetry-module test (Disabled handle Drop is a clean no-op)
All 591+ existing workspace tests pass
`cargo clippy --workspace -- -D warnings` clean (with and without `--features boruna-vm/telemetry`)
`cargo fmt --all -- --check` clean

Review

`ce-correctness-reviewer` surfaced 5 HIGH findings. All addressed before commit:

#	Finding	Fix
1	Env-var mutation in test was UB risk under cargo's parallel runner (POSIX setenv/getenv data race)	REMOVED the test rather than introduce flakes; documented why in the test module
2	Test matcher was logically broken — `bytes_out` captured but never asserted	REWROTE harness with proper Id-keyed span matching + explicit `bytes_out` assertion
3	`cap.budget_remaining=0` ambiguous (last-allowed vs rejected)	DOCUMENTED post-call semantics; integrators join `(cap.budget_remaining, error.kind)` to disambiguate
4	`process::exit(1)` killed in-flight OTel batches	ADDED `runtime.shutdown_timeout(5s)` before exit
5	`approx_value_bytes` ignored Record/Enum payloads — `bytes_out` structurally 0 for the dominant capability shape	RECURSED into `Record.fields` and `Enum.payload`

Documented limitations

Calling `init()` twice silently overwrites the propagator (CLI is the single owner of the global subscriber)
Empty-string `OTEL_EXPORTER_OTLP_ENDPOINT` silently disables (same behavior as unset)
Boruna args are intentionally NOT in span attributes (privacy + size; integrators wanting this can write a custom subscriber)

Closes

Closes [P2] Per-call observability hooks (OpenTelemetry) #9 (FleetQ P2: per-call OpenTelemetry observability)

FleetQ feedback letter status: COMPLETE

Ask	Status	PR
#3 versioned `capability_set_hash` (P1)	✅ Closed	#10
#4 streaming output from `boruna_run` (P1)	⏸ Not addressed (deferred to 0.4.0)	—
#5 structured resource limits (P1)	✅ Closed	#13
#6 stable MCP response schemas (P1)	✅ Closed	#11
#7 record/replay for net.fetch (P2)	✅ Closed	#15
#8 output schema validation gate (P2)	✅ Closed	#14
#9 OTel observability (P2)	✅ Closed	this PR
Plus 2 P0s in v0.2.0 (fine-grained policy, multi-target binaries)	✅ Shipped 2026-04-25	—

Note: #4 (streaming output) was tracked in the original feedback but not in the immediate sprint queue — defer to 0.4.0 alongside other operations work.

Next sprint pivots to `0.3-S2` (persistent state) — the critical-path 0.3.0 work that ADR PR #12 unblocks.

🤖 Generated with Claude Code

Sprint 0.4-S5. The LAST FleetQ ask. Closes the implementer feedback letter completely (6 sprints in a row, 9/9 P1+P2 asks shipped). ## What changed ### Always-on tracing (zero-cost when no subscriber) - New non-optional `tracing = "0.1"` dep on `boruna-vm`. - `CapabilityGateway::call` now wraps the call body in a `tracing::info_span!` named `boruna.cap` with attributes: cap.name (str), bytes_in (u64), bytes_out (u64), cap.budget_remaining (u64, post-call quota), error.kind (str: "denied" / "budget_exceeded" / "runtime_error", set only on the failure path) - When no subscriber is installed (default), span macros expand to essentially no-ops (single atomic check + a few stack structs). - `approx_value_bytes` recurses through every container variant (List, Map, Record, Enum, Some/Ok/Err) so bytes_out for record- returning capabilities (db.query, llm.call) is meaningful — not structurally zero. ### `telemetry` Cargo feature - Adds `opentelemetry 0.27` + `opentelemetry_sdk 0.27` (rt-tokio) + `opentelemetry-otlp 0.27` (http-proto + reqwest-client, default- features off) + `tracing-opentelemetry 0.28` + `tracing-subscriber 0.3` + `tokio` (workspace, optional). - New module `boruna_vm::telemetry` with `init() -> TelemetryHandle`. - Reads `OTEL_EXPORTER_OTLP_ENDPOINT` (the OTel standard env var) and optional `OTEL_SERVICE_NAME` (defaults to "boruna"). - Endpoint unset → returns Disabled no-op handle (Boruna behaves identically to a non-telemetry build). - Endpoint set → installs OTLP-over-HTTP exporter into the global tracing subscriber registry. - `TelemetryHandle::Drop` calls `force_flush()` on the provider. ### CLI integration - New `telemetry` feature on `boruna-cli` that depends on `boruna-vm/telemetry` and `tokio`. - When built with `--features telemetry`, `main` starts a tokio runtime, enters its context, calls `init_telemetry()` BEFORE parsing CLI args, holds the handle for the binary lifetime. - On shutdown: drops the handle (queues flush), then drops the runtime guard, then calls `runtime.shutdown_timeout(5s)` so in- flight OTel HTTP POSTs get a chance to complete instead of being killed by `process::exit`. ## Tests - 4 new VM tests (`test_capability_call_emits_boruna_cap_span_with_attributes`, `..._records_bytes_out_for_string_returning_handler`, `..._records_error_kind_denied`, `..._records_error_kind_budget_exceeded`) using a per-test scoped tracing subscriber that captures spans by Id (proper matching, not best-effort). - 1 telemetry-module test (Disabled handle Drop is a clean no-op). - All 591+ existing workspace tests pass. - `cargo clippy --workspace -- -D warnings` clean (with and without --features boruna-vm/telemetry). - `cargo fmt --all -- --check` clean. ## Review ce-correctness-reviewer surfaced 5 HIGH findings. All addressed before commit: 1. Env-var mutation in test was UB risk under cargo's parallel test runner (POSIX setenv/getenv data race). REMOVED the test rather than introduce flakes; documented why in the test module. 2. Test matcher was logically broken — `bytes_out` was captured but never asserted. REWROTE the test harness with proper Id-keyed span matching (HashMap<u64, CapturedSpan>) and added an explicit `bytes_out` assertion in the new test_records_bytes_out_for_string_returning_handler test. 3. cap.budget_remaining=0 was ambiguous (last-allowed call vs rejected). DOCUMENTED the post-call semantics in the call() doc comment; integrators must join (cap.budget_remaining, error.kind) to disambiguate. 4. process::exit(1) in CLI killed in-flight OTel batches. ADDED explicit runtime.shutdown_timeout(5s) before exit. 5. approx_value_bytes ignored Record/Enum payloads — bytes_out was structurally 0 for the dominant capability shape (db.query, llm.call). RECURSED into Record.fields and Enum.payload. ## Documented limitations - Calling init() twice silently overwrites the propagator (CLI is the single owner of the global subscriber). - Empty-string OTEL_EXPORTER_OTLP_ENDPOINT silently disables (matches the "absent → off" semantics; a Docker compose with `=` typo gets the same behavior as no env var). ## Design `docs/design-otel.md` — span shape, attribute table, library-version pin set, determinism contract, BYO-subscriber fallback path. ## Closes - Closes #9 (FleetQ P2: per-call OpenTelemetry observability) ## FleetQ status after this PR **ALL 9 P1/P2 asks closed** (#3, #5, #6, #7, #8, #9 + the two P0s shipped in v0.2.0). The FleetQ implementer feedback letter is fully addressed. Next sprint pivots to 0.3-S2 (persistent state) — the critical-path 0.3.0 work the ADR (PR #12) unblocks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the 5 HIGH review findings and how each was addressed. Notable: the env-var-mutating test was UB risk under cargo's parallel runner (POSIX setenv/getenv data race per Rust 2024) — removed and documented. Also captures the closing arc: 6 sprints in a row driven by FleetQ implementer feedback. 9/9 P1+P2 asks shipped + 2 P0s already in v0.2.0. Next pivot: 0.3-S2 (persistent workflow state) once the 7 queued PRs settle and the persistence ADR (PR #12) merges. Establishes new project conventions: - Pre-flight dep probes for unfamiliar version-pin sets (5 min of /tmp/probe saves hours of in-tree archaeology). - For library-mediated infrastructure requiring tokio: handle Drop before runtime shutdown_timeout before process::exit. Drop alone doesn't drain in-flight async work that process::exit kills. - For test harnesses recording side-effects, prefer Id-keyed dispatch over best-effort matchers — the cost is a HashMap; the payoff is correctness in multi-event tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

escapeboy and others added 2 commits April 25, 2026 19:23

escapeboy merged commit 0bc58b6 into master Apr 25, 2026
1 of 3 checks passed

escapeboy deleted the feat/0.4-s5-otel-observability branch April 25, 2026 17:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(observability): per-call OpenTelemetry spans (closes #9 — final FleetQ ask)#16

feat(observability): per-call OpenTelemetry spans (closes #9 — final FleetQ ask)#16
escapeboy merged 2 commits intomasterfrom
feat/0.4-s5-otel-observability

escapeboy commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

escapeboy commented Apr 25, 2026

Sprint 0.4-S5 — closes #9 (FleetQ P2, the LAST ask)

Surface

What's in this PR

Always-on tracing (zero-cost when no subscriber)

`telemetry` Cargo feature

CLI integration

Determinism contract (per ADR 001)

Tests

Review

Documented limitations

Closes

FleetQ feedback letter status: COMPLETE

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant