Operations and Observability

Ivan Seredkin edited this page May 23, 2026 · 2 revisions

This page covers local-side health and observability: where the daemon and the CLI surface their internal state so operators can tell whether budi is working without reading the source code.

This page is split into two halves. Local ops is everything the daemon and CLI expose on a single machine — budi doctor, pricing status, the SQLite schema check, the e2e regression guards. Cloud ops is what gets surfaced for team / fleet visibility once cloud sync is enabled; in the v8.x cloud alpha that surface is intentionally minimal and most of it is still being built.

Local ops

budi doctor

The canonical health check. Runs without requiring the cloud and reports daemon, schema, transcript visibility, statusline wiring, attribution ratios, and legacy-residue cleanup. Implementation: crates/budi-cli/src/commands/doctor.rs.

Three attribution-quality checks ride alongside the connectivity checks:

| Check | Trigger | Window |
| --- | --- | --- |
| Session visibility | Fails when a window has assistant rows but zero returned sessions, which indicates a session-attribution bug | today, 7d, 30d |
| Branch attribution (per provider) | Yellow at > 10 % of assistant rows missing git_branch, red at > 50 % | 7d |
| Activity attribution (per provider) | Red at ≥ 99.9 % missing activity tags with ≥ 5 rows in window (silent classifier regression), yellow at > 90 % | 7d |
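The thresholds in the table can be read as a small classification rule. The sketch below is illustrative only and is not taken from doctor.rs: the function names, the "green" fallback, the handling of an empty window, and the choice to let a fully untagged window with fewer than 5 rows fall through to yellow are all assumptions.

```python
from typing import Optional


def branch_attribution_status(missing: int, total: int) -> Optional[str]:
    """Classify the share of assistant rows that lack git_branch (7d window)."""
    if total == 0:
        return None  # assumption: an empty window is not scored
    ratio = missing / total
    if ratio > 0.50:
        return "red"
    if ratio > 0.10:
        return "yellow"
    return "green"  # assumption: anything at or below 10 % is healthy


def activity_attribution_status(missing: int, total: int) -> Optional[str]:
    """Classify the share of assistant rows that lack activity tags (7d window)."""
    if total == 0:
        return None  # assumption: an empty window is not scored
    ratio = missing / total
    # Red needs both a near-total miss rate and at least 5 rows in the window.
    if ratio >= 0.999 and total >= 5:
        return "red"
    if ratio > 0.90:
        return "yellow"
    return "green"
```

With these rules, 11 of 100 rows missing git_branch is yellow, while exactly 10 of 100 is still green because the threshold is strict.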

A first-run nudge fires when the DB has no assistant activity yet, so day-zero users don't misread an empty attribution dashboard as a setup failure.

budi doctor also reports retained legacy proxy residue read-only — see crates/budi-core/src/legacy_proxy.rs. The proxy-routing era is over; budi uninstall is the managed-cleanup path that removes residue from shell profiles and agent configs.

budi status

Quick overview: daemon up, today's cost, first-run hints. Designed for "is this thing on" rather than diagnosing a broken install. Implementation: crates/budi-cli/src/commands/status.rs.

Cost-confidence semantics

Every messages row carries a cost_confidence field that drives the ~ prefix in CLI / dashboard rendering. Values:

| Value | Meaning |
| --- | --- |
| estimated | Tokens parsed from local transcript; cost computed via the manifest. Default for JSONL ingestion. |
| exact | Per-request truth-up from a provider's billing API. Cursor Usage API rows; Copilot Chat billing-API rows. |
| estimated_unknown_model | Model id not present in the manifest. cost_cents = 0, surfaced as a warn icon on the row and a count in budi pricing status. Auto-backfills via pricing_source = 'backfilled:vNNN' once upstream catches up. |
| proxy_estimated (legacy) | Pre-v8.2 proxy-era rows. Read-only; no new writes. |
| otel_exact (legacy) | Pre-v8.x OTEL-era rows. Read-only; OTEL ingestion has been removed. |
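As a rough illustration of how cost_confidence could drive the ~ prefix, the sketch below treats the two exact values as unprefixed and everything else as approximate. That grouping, the function name, and the dollar formatting are assumptions; the page does not state which values receive the prefix.

```python
# Assumption: only these two values render without the "~" prefix.
EXACT_VALUES = {"exact", "otel_exact"}


def render_cost(cost_cents: float, cost_confidence: str) -> str:
    """Format a cost in dollars, prefixing "~" when the value is not exact."""
    dollars = f"${cost_cents / 100:.2f}"
    if cost_confidence in EXACT_VALUES:
        return dollars
    return "~" + dollars
```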

Pricing observability

| Surface | What it shows |
| --- | --- |
| GET /pricing/status | Manifest version, on-disk-cache mtime, embedded-baseline version, rejected upstream rows (rejected_upstream_rows[]), unknown-model counts |
| budi pricing status | Text rendering of the above for humans |
| budi pricing status --format json | Same data, JSON shape |
| POST /pricing/refresh (loopback only) | Trigger an on-demand refresh tick |
| BUDI_PRICING_REFRESH=0 | Disable the 24 h refresh worker (also disables the team-pricing pull) |

The rejected_upstream_rows[] surface is load-bearing for operators: per the 2026-04-22 amendment to Model Pricing – Embedded Baseline and Runtime Refresh (ADR-0091 §2), per-row sanity rejection prevents one bad upstream row from blocking the whole refresh. Rejected rows are still visible — operators can see what got dropped and why.
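Per-row sanity rejection can be sketched as follows: each upstream row is validated on its own, passing rows are applied, and failing rows are collected with a reason instead of aborting the refresh. The specific checks, the function names, and the shape of each rejected entry are hypothetical; only the accept-some, report-the-rest behavior comes from the text above.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumption: these two per-token price fields are the ones being sanity-checked.
PRICE_FIELDS = ("input_cost_per_token", "output_cost_per_token")


def check_row(row: Any) -> Optional[str]:
    """Return a rejection reason, or None when the row passes (hypothetical checks)."""
    if not isinstance(row, dict):
        return "row is not an object"
    for field in PRICE_FIELDS:
        value = row.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return f"{field} missing or not numeric"
        if value < 0:
            return f"{field} is negative"
    return None


def apply_refresh(
    upstream: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[Dict[str, str]]]:
    """Split an upstream manifest into accepted rows and rejected rows with reasons."""
    accepted: Dict[str, Any] = {}
    rejected: List[Dict[str, str]] = []
    for model_id, row in upstream.items():
        reason = check_row(row)
        if reason is None:
            accepted[model_id] = row
        else:
            # One bad row is recorded and skipped; the rest still apply.
            rejected.append({"model": model_id, "reason": reason})
    return accepted, rejected
```

The point of the design is that the rejected list stays inspectable, so a dropped row shows up as an operator-visible entry instead of a silent gap or a failed refresh.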

Schema audits and repair

| Surface | What it does |
| --- | --- |
| budi db check | Reports schema drift, missing indices, orphaned rows |
| budi db check --fix | Upgrades the schema to current; repairs drift in place |
| budi db import | Re-runs JSONL ingestion against every transcript on disk (idempotent: duplicate offsets are skipped) |
| budi db import --force | Clear all data and re-ingest from scratch |

budi db <verb> is the only surface for database operations. The CLI never opens SQLite directly — every query goes through the daemon HTTP API.
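One common way to get the "duplicate offsets are skipped" behavior is a uniqueness constraint on (transcript, offset) combined with INSERT OR IGNORE. The sketch below shows that technique against an in-memory SQLite database; the table and column names are hypothetical and do not reflect budi's actual schema or ingestion code.

```python
import sqlite3
from typing import Iterable, Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE messages (
               transcript_path TEXT NOT NULL,
               byte_offset     INTEGER NOT NULL,
               payload         TEXT NOT NULL,
               UNIQUE (transcript_path, byte_offset)
           )"""
    )
    return conn


def ingest(
    conn: sqlite3.Connection,
    transcript_path: str,
    lines: Iterable[Tuple[int, str]],
) -> int:
    """Insert (byte_offset, payload) pairs; return how many rows were new."""
    before = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    conn.executemany(
        # Rows that collide with the UNIQUE constraint are silently skipped.
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?)",
        [(transcript_path, offset, payload) for offset, payload in lines],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    return after - before
```

Re-running ingest with the same input returns 0, which is what makes a repeated import safe.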

Session health vitals

budi sessions latest and budi sessions <id> render four vitals (green / yellow / red) for the session:

| Vital | Minimum samples to score |
| --- | --- |
| context_drag | ≥ 5 assistant messages in the post-compact, dominant-model effective set |
| cache_efficiency | ≥ 4 assistant messages in the recent model run |
| cost_acceleration | ≥ 6 assistant messages and ≥ $0.25 session cost |
| thrashing | Permanently N/A: load_tool_events is stubbed pending rebuild on top of the tool-outcome signal |

The overall verdict needs ≥ 2 of the remaining 3 to fire; otherwise it stays insufficient_data with a tip that surfaces the assistant-message count, so users can tell data IS flowing during the warm-up window. Tips are provider-aware via the ProviderKind enum (Claude Code → /compact / /clear, Cursor → "new composer session", Other → neutral).
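The minimum-sample gates and the 2-of-3 rule can be sketched as below. Two things here are assumptions and not taken from the source: "fire" is read as "has enough samples to be scored", and the overall verdict is taken as the worst scored color. Function and constant names are hypothetical.

```python
from typing import Dict, Optional

# Minimum assistant-message counts from the table above.
MIN_MESSAGES = {"context_drag": 5, "cache_efficiency": 4, "cost_acceleration": 6}
MIN_COST_USD = 0.25  # applies to cost_acceleration only


def can_score(vital: str, assistant_messages: int, session_cost_usd: float = 0.0) -> bool:
    """Return True when a vital has enough data to be scored."""
    if vital not in MIN_MESSAGES:
        return False  # thrashing is permanently N/A
    if assistant_messages < MIN_MESSAGES[vital]:
        return False
    if vital == "cost_acceleration" and session_cost_usd < MIN_COST_USD:
        return False
    return True


def overall_verdict(vitals: Dict[str, Optional[str]]) -> str:
    """vitals maps a vital name to 'green'/'yellow'/'red', or None when unscored."""
    scored = [
        color
        for name, color in vitals.items()
        if name != "thrashing" and color is not None
    ]
    if len(scored) < 2:
        return "insufficient_data"
    # Assumption: the verdict is the worst color among the scored vitals.
    for level in ("red", "yellow"):
        if level in scored:
            return level
    return "green"
```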

Vitals freshness is bounded by the tailer's notify debounce (~500 ms) plus one SQLite write — no rollup-table dependency.

End-to-end regression guards

Shell-driven end-to-end tests live under scripts/e2e/. They exercise real release binaries (budi + budi-daemon) against an isolated $HOME so they never touch real user data. Each script pins a specific bug or contract; scripts/e2e/README.md is the index.

Design rules (from SOUL.md):

  • No shared mutable state. Every script allocates its own ports and HOME; parallel runs are safe.
  • Fail loud, fail fast. set -euo pipefail; daemon log printed on any failure.
  • Negative-path provable. Each regression test must fail when the fix it guards is reverted (verified before merging).
  • Keep fixtures minimal. Mock upstream responses live inline; no binary fixtures checked in.
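The first rule (own ports, own HOME) is the one that makes parallel runs safe. The real guards are shell scripts; the Python sketch below only illustrates the isolation idea. BUDI_TEST_PORT is a hypothetical variable name, not a documented budi setting.

```python
import os
import socket
from typing import Dict


def free_port() -> int:
    """Ask the OS for an unused loopback TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick a free port
        return s.getsockname()[1]


def isolated_env(home_dir: str, port: int) -> Dict[str, str]:
    """Build a per-test environment without mutating the current process."""
    env = dict(os.environ)
    env["HOME"] = home_dir  # keeps the run away from real user data
    env["BUDI_TEST_PORT"] = str(port)  # hypothetical variable name
    return env
```

A test would typically create the home directory with tempfile.TemporaryDirectory() and pass the returned mapping as env= to subprocess.run, so nothing leaks between scripts.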

Logs

The daemon emits structured JSON logs to stdout. On launchd, systemd, and Task Scheduler, the platform service captures stdout into the system log; the journal locations are listed in Daemon Lifecycle and Autostart.

Notable warn-level events (each fires once per (provider, key) per daemon run to avoid log flooding):

  • unknown_model — pricing lookup found no manifest entry
  • cursor_bubble_schema_unrecognized — Cursor's cursorDiskKV table shape changed; degrading to Usage API fallback
  • cloud_sync_auth_error — 401 from ingest; sync parked until re-auth
  • cloud_sync_schema_mismatch — 422 from ingest; sync parked until daemon upgrade
  • rejected_upstream_rows — at least one row of the LiteLLM manifest failed per-row sanity checks; structured payload lists the offenders
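The once-per-(provider, key) behavior can be sketched with an in-memory set that lives for the duration of the process. The class name and the JSON field names are assumptions for illustration; the daemon's actual log schema is not shown on this page.

```python
import json
from typing import Any, Optional, Set, Tuple


class OncePerKeyWarner:
    """Emit each warn event at most once per (provider, key) per process run."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str, str]] = set()

    def warn(self, event: str, provider: str, key: str, **fields: Any) -> Optional[str]:
        """Return a JSON log line the first time, None on repeats."""
        ident = (event, provider, key)
        if ident in self._seen:
            return None  # already logged during this run
        self._seen.add(ident)
        record = {"level": "warn", "event": event, "provider": provider, "key": key}
        record.update(fields)
        return json.dumps(record, sort_keys=True)
```

Because the set is never persisted, a daemon restart resets it and each event can fire once more, matching the "per daemon run" wording.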

Cloud ops

The cloud alpha (R4) ships with a minimal operator surface. As of v8.5.x:

  • Per-device sync status via GET /v1/ingest/status (cloud-side): server's view of the device's watermark, last-confirmed payload time, sync health
  • Workspace-level dashboards at app.getbudi.dev: cost by day, by user/device, by repo, by model, by ticket. Manager sees aggregated org data; member sees own data.
  • Row-level security enforced via Supabase RLS policies — see ADR-0083 §8

What does not exist yet, and is tracked as forward work in Release and Versioning:

  • Datadog index for the cloud ingest service
  • OpsGenie escalation policy
  • Slack alert channel for ingest-side incidents
  • Published SLOs

Once those surfaces ship for app.getbudi.dev, the operator playbook will live here.
