Skip to content

feat(observability): per-call OpenTelemetry spans (closes #9 — final FleetQ ask)#16

Merged
escapeboy merged 2 commits intomasterfrom
feat/0.4-s5-otel-observability
Apr 25, 2026
Merged

feat(observability): per-call OpenTelemetry spans (closes #9 — final FleetQ ask)#16
escapeboy merged 2 commits intomasterfrom
feat/0.4-s5-otel-observability

Conversation

@escapeboy
Copy link
Copy Markdown
Owner

Sprint 0.4-S5 — closes #9 (FleetQ P2, the LAST ask)

Final FleetQ implementer-feedback ask. Closes the letter completely — ALL 9 P1/P2 asks shipped (#3, #5, #6, #7, #8, #9 + the two P0s in v0.2.0). 6 sprints in a row driven by FleetQ signal.

Surface

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.internal:4318
export OTEL_SERVICE_NAME=my-agent # optional, defaults to "boruna"
boruna run app.ax --policy allow-all --live
```

Each capability call emits a `boruna.cap` span:

```
boruna.cap {cap.name="net.fetch", bytes_in=247, bytes_out=1834, cap.budget_remaining=42} 142ms
boruna.cap {cap.name="llm.call", bytes_in=890, bytes_out=2103, cap.budget_remaining=8} 7.4s
boruna.cap {cap.name="fs.read", bytes_in=64, bytes_out=12834} 8ms
```

What's in this PR

Always-on tracing (zero-cost when no subscriber)

`tracing = "0.1"` is now a non-optional dep on `boruna-vm`. `CapabilityGateway::call` wraps the call body in a `tracing::info_span!` named `boruna.cap`. When no subscriber is installed (the default), span macros are essentially no-ops — single atomic check, a few stack-allocated structs, no allocations.

Span attributes:

Field Type Notes
`cap.name` string e.g. `"net.fetch"`
`bytes_in` u64 sum of UTF-8 string content reachable from args (recurses through containers)
`bytes_out` u64 UTF-8 string content of result (recurses through Record/Enum/etc)
`cap.budget_remaining` u64 post-call quota when a budget rule applies
`error.kind` string only on failure: `denied` / `budget_exceeded` / `runtime_error`

`telemetry` Cargo feature

Adds `opentelemetry 0.27` + `opentelemetry_sdk 0.27` (rt-tokio) + `opentelemetry-otlp 0.27` (http-proto + reqwest-client, default-features off) + `tracing-opentelemetry 0.28` + `tracing-subscriber 0.3` + `tokio` (workspace, optional).

New module `boruna_vm::telemetry` with `init() -> TelemetryHandle`:

  • Endpoint env var unset → `Disabled` no-op handle (Boruna behaves identically to a non-telemetry build).
  • Endpoint set → installs OTLP-over-HTTP exporter into the global tracing subscriber registry.
  • `TelemetryHandle::Drop` calls `force_flush()`.

CLI integration

New `telemetry` feature on `boruna-cli`. When built with `--features telemetry`, `main` starts a tokio runtime, calls `init_telemetry()` BEFORE parsing CLI args, holds the handle for the binary lifetime, and on shutdown drops the handle THEN drains the runtime with a 5-second timeout (so in-flight OTel HTTP POSTs complete instead of being killed by `process::exit`).

Determinism contract (per ADR 001)

Spans are operational metadata only. Their content (durations, byte counts) is never fed into an `EventLog`, `AuditLog`, or `EvidenceBundle`. A replayed run produces identical replay state but may produce different span durations on a faster/slower host — by design. Documented in `CapabilityGateway::call` doc comment + `boruna_vm::telemetry` module doc.

Tests

  • 4 new VM tests using a per-test scoped tracing subscriber with proper Id-keyed span matching:
    • `test_capability_call_emits_boruna_cap_span_with_attributes`
    • `test_capability_call_records_bytes_out_for_string_returning_handler`
    • `test_capability_call_records_error_kind_denied`
    • `test_capability_call_records_error_kind_budget_exceeded`
  • 1 telemetry-module test (Disabled handle Drop is a clean no-op)
  • All 591+ existing workspace tests pass
  • `cargo clippy --workspace -- -D warnings` clean (with and without `--features boruna-vm/telemetry`)
  • `cargo fmt --all -- --check` clean

Review

`ce-correctness-reviewer` surfaced 5 HIGH findings. All addressed before commit:

# Finding Fix
1 Env-var mutation in test was UB risk under cargo's parallel runner (POSIX setenv/getenv data race) REMOVED the test rather than introduce flakes; documented why in the test module
2 Test matcher was logically broken — `bytes_out` captured but never asserted REWROTE harness with proper Id-keyed span matching + explicit `bytes_out` assertion
3 `cap.budget_remaining=0` ambiguous (last-allowed vs rejected) DOCUMENTED post-call semantics; integrators join `(cap.budget_remaining, error.kind)` to disambiguate
4 `process::exit(1)` killed in-flight OTel batches ADDED `runtime.shutdown_timeout(5s)` before exit
5 `approx_value_bytes` ignored Record/Enum payloads — `bytes_out` structurally 0 for the dominant capability shape RECURSED into `Record.fields` and `Enum.payload`

Documented limitations

  • Calling `init()` twice silently overwrites the propagator (CLI is the single owner of the global subscriber)
  • Empty-string `OTEL_EXPORTER_OTLP_ENDPOINT` silently disables (same behavior as unset)
  • Boruna args are intentionally NOT in span attributes (privacy + size; integrators wanting this can write a custom subscriber)

Closes

FleetQ feedback letter status: COMPLETE

Ask Status PR
#3 versioned `capability_set_hash` (P1) ✅ Closed #10
#4 streaming output from `boruna_run` (P1) ⏸ Not addressed (deferred to 0.4.0)
#5 structured resource limits (P1) ✅ Closed #13
#6 stable MCP response schemas (P1) ✅ Closed #11
#7 record/replay for net.fetch (P2) ✅ Closed #15
#8 output schema validation gate (P2) ✅ Closed #14
#9 OTel observability (P2) ✅ Closed this PR
Plus 2 P0s in v0.2.0 (fine-grained policy, multi-target binaries) ✅ Shipped 2026-04-25

Note: #4 (streaming output) was tracked in the original feedback but not in the immediate sprint queue — defer to 0.4.0 alongside other operations work.

Next sprint pivots to `0.3-S2` (persistent state) — the critical-path 0.3.0 work that ADR PR #12 unblocks.

🤖 Generated with Claude Code

escapeboy and others added 2 commits April 25, 2026 19:23
Sprint 0.4-S5. The LAST FleetQ ask. Closes the implementer feedback
letter completely (6 sprints in a row, 9/9 P1+P2 asks shipped).

## What changed

### Always-on tracing (zero-cost when no subscriber)

- New non-optional `tracing = "0.1"` dep on `boruna-vm`.
- `CapabilityGateway::call` now wraps the call body in a
  `tracing::info_span!` named `boruna.cap` with attributes:
    cap.name (str), bytes_in (u64), bytes_out (u64),
    cap.budget_remaining (u64, post-call quota),
    error.kind (str: "denied" / "budget_exceeded" / "runtime_error",
    set only on the failure path)
- When no subscriber is installed (default), span macros expand to
  essentially no-ops (single atomic check + a few stack structs).
- `approx_value_bytes` recurses through every container variant
  (List, Map, Record, Enum, Some/Ok/Err) so bytes_out for record-
  returning capabilities (db.query, llm.call) is meaningful — not
  structurally zero.

### `telemetry` Cargo feature

- Adds `opentelemetry 0.27` + `opentelemetry_sdk 0.27` (rt-tokio) +
  `opentelemetry-otlp 0.27` (http-proto + reqwest-client, default-
  features off) + `tracing-opentelemetry 0.28` + `tracing-subscriber 0.3`
  + `tokio` (workspace, optional).
- New module `boruna_vm::telemetry` with `init() -> TelemetryHandle`.
- Reads `OTEL_EXPORTER_OTLP_ENDPOINT` (the OTel standard env var) and
  optional `OTEL_SERVICE_NAME` (defaults to "boruna").
- Endpoint unset → returns Disabled no-op handle (Boruna behaves
  identically to a non-telemetry build).
- Endpoint set → installs OTLP-over-HTTP exporter into the global
  tracing subscriber registry.
- `TelemetryHandle::Drop` calls `force_flush()` on the provider.

### CLI integration

- New `telemetry` feature on `boruna-cli` that depends on
  `boruna-vm/telemetry` and `tokio`.
- When built with `--features telemetry`, `main` starts a tokio
  runtime, enters its context, calls `init_telemetry()` BEFORE parsing
  CLI args, holds the handle for the binary lifetime.
- On shutdown: drops the handle (queues flush), then drops the
  runtime guard, then calls `runtime.shutdown_timeout(5s)` so in-
  flight OTel HTTP POSTs get a chance to complete instead of being
  killed by `process::exit`.

## Tests

- 4 new VM tests (`test_capability_call_emits_boruna_cap_span_with_attributes`,
  `..._records_bytes_out_for_string_returning_handler`,
  `..._records_error_kind_denied`, `..._records_error_kind_budget_exceeded`)
  using a per-test scoped tracing subscriber that captures spans by Id
  (proper matching, not best-effort).
- 1 telemetry-module test (Disabled handle Drop is a clean no-op).
- All 591+ existing workspace tests pass.
- `cargo clippy --workspace -- -D warnings` clean (with and without
  --features boruna-vm/telemetry).
- `cargo fmt --all -- --check` clean.

## Review

ce-correctness-reviewer surfaced 5 HIGH findings. All addressed before
commit:

1. Env-var mutation in test was UB risk under cargo's parallel test
   runner (POSIX setenv/getenv data race). REMOVED the test rather
   than introduce flakes; documented why in the test module.
2. Test matcher was logically broken — `bytes_out` was captured but
   never asserted. REWROTE the test harness with proper Id-keyed span
   matching (HashMap<u64, CapturedSpan>) and added an explicit
   `bytes_out` assertion in the new
   test_records_bytes_out_for_string_returning_handler test.
3. cap.budget_remaining=0 was ambiguous (last-allowed call vs
   rejected). DOCUMENTED the post-call semantics in the call() doc
   comment; integrators must join (cap.budget_remaining, error.kind)
   to disambiguate.
4. process::exit(1) in CLI killed in-flight OTel batches. ADDED
   explicit runtime.shutdown_timeout(5s) before exit.
5. approx_value_bytes ignored Record/Enum payloads — bytes_out was
   structurally 0 for the dominant capability shape (db.query,
   llm.call). RECURSED into Record.fields and Enum.payload.

## Documented limitations

- Calling init() twice silently overwrites the propagator (CLI is the
  single owner of the global subscriber).
- Empty-string OTEL_EXPORTER_OTLP_ENDPOINT silently disables (matches
  the "absent → off" semantics; a Docker compose with `=` typo gets
  the same behavior as no env var).

## Design

`docs/design-otel.md` — span shape, attribute table, library-version
pin set, determinism contract, BYO-subscriber fallback path.

## Closes

- Closes #9 (FleetQ P2: per-call OpenTelemetry observability)

## FleetQ status after this PR

**ALL 9 P1/P2 asks closed** (#3, #5, #6, #7, #8, #9 + the two P0s
shipped in v0.2.0). The FleetQ implementer feedback letter is fully
addressed. Next sprint pivots to 0.3-S2 (persistent state) — the
critical-path 0.3.0 work the ADR (PR #12) unblocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the 5 HIGH review findings and how each was addressed. Notable:
the env-var-mutating test was UB risk under cargo's parallel runner
(POSIX setenv/getenv data race per Rust 2024) — removed and documented.

Also captures the closing arc: 6 sprints in a row driven by FleetQ
implementer feedback. 9/9 P1+P2 asks shipped + 2 P0s already in v0.2.0.
Next pivot: 0.3-S2 (persistent workflow state) once the 7 queued PRs
settle and the persistence ADR (PR #12) merges.

Establishes new project conventions:
- Pre-flight dep probes for unfamiliar version-pin sets (5 min of
  /tmp/probe saves hours of in-tree archaeology).
- For library-mediated infrastructure requiring tokio: handle Drop
  before runtime shutdown_timeout before process::exit. Drop alone
  doesn't drain in-flight async work that process::exit kills.
- For test harnesses recording side-effects, prefer Id-keyed dispatch
  over best-effort matchers — the cost is a HashMap; the payoff is
  correctness in multi-event tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@escapeboy escapeboy merged commit 0bc58b6 into master Apr 25, 2026
1 of 3 checks passed
@escapeboy escapeboy deleted the feat/0.4-s5-otel-observability branch April 25, 2026 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P2] Per-call observability hooks (OpenTelemetry)

1 participant