Cloud Sync Mechanics

Ivan Seredkin edited this page May 23, 2026 · 2 revisions

This page is the operational sibling of Cloud Data Contract and Privacy Boundary (ADR-0083). That ADR pins what can leave the machine and what shape the wire format takes; this page documents how the daemon-side sync worker delivers it: the loop, the watermark, retry behavior, manual paths, and the onboarding UX.

Cloud sync is disabled by default. The daemon never phones home, never opts in automatically, and never carries telemetry that the user has not explicitly enabled by running budi cloud init or budi cloud join.

The sync worker

A single background task inside budi-daemon (workers/cloud_sync.rs) owns the upload path. The worker:

  1. Reads from rollup tables only. The envelope is built from message_rollups_daily aggregates and a curated projection of session metadata. The worker has no access to messages.raw_json, prompts, or file paths. This is the structural enforcement of the privacy contract from ADR-0083 §1.
  2. Builds the sync envelope (daily_rollups[] + session_summaries[]) per the schema in ADR-0083 §2.
  3. Tracks a watermark in the existing sync_state table under the __budi_cloud_sync__ keys. On each tick it sends new days (bucket_day > watermark) plus today's rollups (always re-sent, because they may have grown).
  4. POSTs the envelope to https://app.getbudi.dev/v1/ingest with Authorization: Bearer budi_<key>, parses the server's confirmation, and advances the local watermark to the server-confirmed value.
  5. Idle-loops at the configured interval (default 300 s; configurable via [cloud.sync].interval_seconds).
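
The day-selection rule in step 3 can be sketched as follows. This is a minimal illustration, not the daemon's code: `days_to_send` is a hypothetical helper, and days are assumed to be ISO-8601 `YYYY-MM-DD` strings so that string comparison matches date order. A missing watermark (first sync) is assumed to mean "send everything".

```rust
/// Hypothetical helper: pick the rollup days to include in the next envelope.
/// Assumes every day is an ISO-8601 `YYYY-MM-DD` string, so comparing the
/// strings orders them the same way as the dates themselves.
fn days_to_send<'a>(
    local_days: &[&'a str],
    watermark: Option<&str>,
    today: &str,
) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = local_days
        .iter()
        .copied()
        .filter(|day| {
            // New since the last server-confirmed day (or no watermark yet).
            let is_new = watermark.map_or(true, |w| *day > w);
            // Today's bucket is always re-sent because it may have grown.
            is_new || *day == today
        })
        .collect();
    out.sort_unstable();
    out.dedup();
    out
}
```

With a watermark of 2026-05-23 and today also 2026-05-23, the function still returns today's bucket, which is what makes the re-send rule safe to combine with server-side UPSERT.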

Double-post safety

A separate AppState.cloud_syncing AtomicBool guards the worker and the manual budi cloud sync path from running concurrently. Both surfaces share the same sync_tick() call; the AtomicBool ensures only one execution at a time, so a developer who hits budi cloud sync while the background worker is mid-upload doesn't trigger a double-post.
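
One common way to build such a guard is a compare-and-swap on the flag plus an RAII type that clears it on drop. The sketch below shows that pattern under stated assumptions: `SyncGuard` and `try_acquire` are hypothetical names, and the memory orderings are a conservative choice rather than something this page specifies.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical RAII guard: holds the "sync in progress" flag until dropped.
struct SyncGuard<'a> {
    flag: &'a AtomicBool,
}

impl<'a> SyncGuard<'a> {
    /// Returns `Some(guard)` if no sync was running, `None` if one already is.
    fn try_acquire(flag: &'a AtomicBool) -> Option<Self> {
        // Atomically flip false -> true; fails if another caller got there first.
        flag.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| SyncGuard { flag })
    }
}

impl Drop for SyncGuard<'_> {
    fn drop(&mut self) {
        // Clear the flag even if the tick returns early or panics.
        self.flag.store(false, Ordering::Release);
    }
}
```

Both the background worker and the manual path would call `try_acquire` before running the tick; whichever loses the race skips its run instead of posting a second time.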

Server-side, the ingest endpoint is UPSERT-only keyed on (device_id, bucket_day, role, provider, model, repo_id, git_branch) for rollups and (device_id, session_id) for sessions, per ADR-0083 §5. A re-uploaded row is a no-op.
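
The idempotency argument can be illustrated with an in-memory stand-in for the rollup table. This is a sketch only: `RollupStore` is hypothetical, the real server uses a database UPSERT, and `cost_cents` is an invented payload field. Only the key columns come from the text above.

```rust
use std::collections::HashMap;

/// Composite key for a rollup row, mirroring the columns listed above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RollupKey {
    device_id: String,
    bucket_day: String,
    role: String,
    provider: String,
    model: String,
    repo_id: String,
    git_branch: String,
}

/// Hypothetical in-memory stand-in for the server-side rollup table.
/// The `u64` value (`cost_cents`) is an invented payload for illustration.
#[derive(Default)]
struct RollupStore {
    rows: HashMap<RollupKey, u64>,
}

impl RollupStore {
    /// Insert-or-overwrite: the row count grows only for keys not seen before.
    fn upsert(&mut self, key: RollupKey, cost_cents: u64) {
        self.rows.insert(key, cost_cents);
    }
}
```

Re-sending an identical row leaves the store unchanged, and re-sending today's row with a larger aggregate overwrites it in place, so neither case creates a duplicate.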

Retry and backoff

| Server response | Worker behavior |
|---|---|
| 200 OK | Advance watermark to the server-confirmed value. Continue next tick. |
| 401 Unauthorized | Stop syncing. The user must re-authenticate. The status endpoint reports auth_error; the CLI prompts on next budi cloud status. |
| 422 Unprocessable Entity | Pause syncing until the daemon is upgraded. Implies the cloud has moved to a newer schema. Structured log at warn. |
| 429 Too Many Requests | Exponential backoff (1 s → 2 s → 4 s → … → 5 min cap). Watermark not advanced. |
| 5xx Server Error | Same exponential backoff as 429. |
| Network failure | Same backoff. The local DB keeps growing; no data is lost. |
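
The table above maps onto a small decision function. The sketch below is illustrative: `SyncAction` and `classify` are hypothetical names, `None` stands for a network failure with no HTTP response, and treating unlisted status codes as retryable is an assumption of this sketch, not documented behavior.

```rust
/// What the worker does next, following the rows of the table.
#[derive(Debug, PartialEq, Eq)]
enum SyncAction {
    AdvanceWatermark,
    StopUntilReauth,
    PauseUntilUpgrade,
    Backoff,
}

/// Hypothetical classifier for the outcome of one POST.
/// `None` means the request never produced an HTTP response.
fn classify(status: Option<u16>) -> SyncAction {
    match status {
        Some(200) => SyncAction::AdvanceWatermark,
        Some(401) => SyncAction::StopUntilReauth,
        Some(422) => SyncAction::PauseUntilUpgrade,
        // 429, any 5xx, and network failures share one backoff path.
        Some(429) | Some(500..=599) | None => SyncAction::Backoff,
        // Assumption: codes the table does not list are retried as well.
        Some(_) => SyncAction::Backoff,
    }
}
```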

The backoff cap is 5 minutes; the worker does not give up entirely. A daemon left running through a 12-hour cloud outage catches up on the first successful POST after the outage clears.
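
The doubling-with-cap schedule can be written in a few lines. This is a sketch assuming pure doubling from 1 s with no jitter; `backoff_delay` is a hypothetical name, and `cap` would correspond to the configured backoff cap (300 s by default).

```rust
use std::time::Duration;

/// Hypothetical backoff schedule: 1 s, 2 s, 4 s, ... limited to `cap`.
/// `attempt` counts consecutive failures so far (0 for the first retry).
fn backoff_delay(attempt: u32, cap: Duration) -> Duration {
    // 2^attempt seconds; shifts of 64 bits or more fall back to u64::MAX
    // so a very long outage cannot overflow.
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(secs).min(cap)
}
```

With a 300 s cap the delays run 1, 2, 4, ..., 256 s and then stay at 300 s, so during a long outage the worker keeps retrying every five minutes rather than giving up.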

Manual sync paths

| Surface | Underlying call | Notes |
|---|---|---|
| budi cloud sync | POST /cloud/sync (loopback-only) | Triggers the same sync_tick() the worker runs. Returns non-zero exit code on non-ok sync. Useful for "force the cloud to catch up before I quit my laptop". |
| budi cloud status | GET /cloud/status | Read-only. Reports readiness + watermarks. No network call: the daemon answers from local state, so this works offline. |
| GET /v1/ingest/status (cloud-side) | n/a | Returns the server's view of the device's watermark and sync health. Used by the dashboard. |

The CLI returns text by default; --format json is supported by both cloud sync and cloud status. A non-ok sync exits with code 2.
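
A wrapper script can branch on that exit code. The sketch below shells out to the CLI with the flags named above; `ManualSync`, `interpret_exit`, and `run_manual_sync` are hypothetical names, and treating exit code 0 as a successful sync is an assumption. Running `run_manual_sync` requires `budi` on PATH and a running daemon.

```rust
use std::io;
use std::process::Command;

/// Outcome of a manual sync as seen by a calling script.
#[derive(Debug, PartialEq, Eq)]
enum ManualSync {
    Synced,
    /// Exit code 2, which the text above assigns to a non-ok sync.
    NotOk,
    /// Any other code, or `None` if the process was killed by a signal.
    Other(Option<i32>),
}

/// Map a process exit code to an outcome.
/// Assumption: exit code 0 means the sync completed ok.
fn interpret_exit(code: Option<i32>) -> ManualSync {
    match code {
        Some(0) => ManualSync::Synced,
        Some(2) => ManualSync::NotOk,
        other => ManualSync::Other(other),
    }
}

/// Run `budi cloud sync --format json` and classify the result.
fn run_manual_sync() -> io::Result<ManualSync> {
    let status = Command::new("budi")
        .args(["cloud", "sync", "--format", "json"])
        .status()?;
    Ok(interpret_exit(status.code()))
}
```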

Onboarding helper

budi cloud init writes ~/.config/budi/cloud.toml from a commented template. Three modes:

budi cloud init                        # write commented template; manual edit to enable
budi cloud init --api-key budi_xxx     # one-shot: write key, set enabled = true
budi cloud init --force [--yes]        # overwrite existing config (--yes skips confirm)

The status renderer distinguishes five states so the onboarding UX is precise rather than binary:

  1. Disabled: no config (cloud.toml does not exist)
  2. Disabled: stub key (config exists but api_key is the template placeholder)
  3. Enabled but missing API key (enabled = true, no key)
  4. Enabled but not fully configured (key present, workspace not joined)
  5. Ready (key + workspace + watermark all healthy)

CloudSyncStatus carries config_exists + api_key_stub flags so the daemon's GET /cloud/status envelope drives the three-way UX without a separate filesystem poke on every render.
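
The two flags are enough to separate states 1 and 2 from states 3 to 5, which is a three-way split. A minimal sketch of that split follows; `ConfigPresence` and `config_presence` are hypothetical names, and checking `config_exists` before `api_key_stub` is an assumption about precedence.

```rust
/// Three-way split derived from the two flags on the status envelope.
#[derive(Debug, PartialEq, Eq)]
enum ConfigPresence {
    /// State 1: cloud.toml does not exist.
    NoConfig,
    /// State 2: config exists but the key is the template placeholder.
    StubKey,
    /// States 3 to 5: a real config; further checks decide which one.
    Configured,
}

/// Hypothetical helper a renderer could call on the two flags.
fn config_presence(config_exists: bool, api_key_stub: bool) -> ConfigPresence {
    if !config_exists {
        ConfigPresence::NoConfig
    } else if api_key_stub {
        ConfigPresence::StubKey
    } else {
        ConfigPresence::Configured
    }
}
```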

Config surface

~/.config/budi/cloud.toml:

[cloud]
enabled     = false
api_key     = "budi_..."
device_id   = "dev_..."
workspace_id = "ws_..."             # legacy `org_id` still read via serde alias
endpoint    = "https://app.getbudi.dev"

[cloud.sync]
interval_seconds  = 300              # 5 minutes
retry_max_seconds = 300              # backoff cap

Environment overrides (highest precedence):

| Var | Purpose |
|---|---|
| BUDI_CLOUD_ENABLED | true / false override |
| BUDI_CLOUD_API_KEY | Override API key (CI / scripted setup) |
| BUDI_CLOUD_ENDPOINT | Override cloud endpoint (self-hosted) |
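
The precedence rule (environment over file) can be sketched as a pure function. `CloudSettings` and `apply_env_overrides` are hypothetical names; the environment lookup is injected as a closure so the logic can be exercised without touching the real process environment. Accepting only the literal strings "true" and "false" for BUDI_CLOUD_ENABLED is an assumption of this sketch.

```rust
/// Hypothetical resolved settings; field names follow the [cloud] table above.
#[derive(Debug, PartialEq, Eq)]
struct CloudSettings {
    enabled: bool,
    api_key: Option<String>,
    endpoint: String,
}

/// Apply environment overrides on top of values read from cloud.toml.
/// In a real binary, `env` could be `|k| std::env::var(k).ok()`.
fn apply_env_overrides(
    mut file: CloudSettings,
    env: impl Fn(&str) -> Option<String>,
) -> CloudSettings {
    if let Some(v) = env("BUDI_CLOUD_ENABLED") {
        // Assumption: unrecognised values leave the file setting in place.
        match v.as_str() {
            "true" => file.enabled = true,
            "false" => file.enabled = false,
            _ => {}
        }
    }
    if let Some(v) = env("BUDI_CLOUD_API_KEY") {
        file.api_key = Some(v);
    }
    if let Some(v) = env("BUDI_CLOUD_ENDPOINT") {
        file.endpoint = v;
    }
    file
}
```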

Per the 2026-05-15 amendment to ADR-0083, workspace_id is the canonical key; the legacy org_id is accepted as an alias on read (TOML, CLI flag, JSON output) during the rename deprecation window. Users do not need to edit their config when upgrading.
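
The read-side fallback can be pictured as a lookup that prefers the canonical key. The config comment above says the daemon does this with a serde alias; the sketch below uses a plain map lookup instead so that it needs no external crates. `resolve_workspace_id` is a hypothetical name, and letting `workspace_id` win when both keys are present is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical lookup over an already-parsed [cloud] table.
/// Assumption: if both keys are present, the canonical `workspace_id` wins.
fn resolve_workspace_id(cloud: &HashMap<String, String>) -> Option<&str> {
    cloud
        .get("workspace_id")
        .or_else(|| cloud.get("org_id")) // legacy name, still accepted on read
        .map(String::as_str)
}
```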

Privacy invariants (recap)

The wire envelope never carries:

  • Prompt or response content
  • File paths, cwd, workspace_root
  • Email addresses
  • Raw JSON payloads
  • Tag values (only known keys, namely repo_id, git_branch, ticket, and ticket_source, make it through)
  • Tool arguments or tool results

The full never-upload table and its reasoning are in Cloud Data Contract and Privacy Boundary §1.
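
The tag rule in the list above amounts to an allowlist filter. The sketch below shows one way to express it; `ALLOWED_TAG_KEYS` and `filter_tags` are hypothetical names, and the real worker may enforce the rule at a different layer (for example, by only ever reading those columns).

```rust
/// Tag keys the list above names as allowed on the wire.
const ALLOWED_TAG_KEYS: [&str; 4] = ["repo_id", "git_branch", "ticket", "ticket_source"];

/// Hypothetical filter: keep allowlisted tags, drop everything else.
fn filter_tags<'a>(tags: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    tags.iter()
        .copied()
        .filter(|(key, _)| ALLOWED_TAG_KEYS.contains(key))
        .collect()
}
```

An allowlist fails closed: a tag key introduced later is dropped by default until someone adds it to the list deliberately.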

Reference implementations

Out of scope here
