Cloud Sync Mechanics

This page is the operational companion to Cloud Data Contract and Privacy Boundary (ADR-0083). That ADR pins what may leave the machine and what shape the wire format takes; this page documents how the daemon-side sync worker delivers it: the loop, the watermark, retry behavior, manual paths, and the onboarding UX.

Cloud sync is disabled by default. The daemon never phones home, never opts in automatically, and never carries telemetry that the user has not explicitly enabled by running budi cloud init or budi cloud join.

The sync worker

A single background task inside budi-daemon (workers/cloud_sync.rs) owns the upload path. The worker:

Reads from rollup tables only. The envelope is built from message_rollups_daily aggregates and a curated projection of session metadata. The worker has no path to messages.raw_json, no path to prompts, no path to file paths. This is the structural enforcement of the privacy contract from ADR-0083 §1.
Builds the sync envelope (daily_rollups[] + session_summaries[]) per the schema in ADR-0083 §2.
Tracks a watermark in the existing sync_state table under the __budi_cloud_sync__ keys. On each tick it sends new days (bucket_day > watermark) plus today's rollups (always re-sent — they may have grown).
POSTs https://app.getbudi.dev/v1/ingest with Authorization: Bearer budi_<key>, parses the server's confirmation, and advances the local watermark to the server-confirmed value.
Idle-loops at the configured interval (default 300 s; configurable via [cloud.sync].interval_seconds).
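The watermark rule in the list above can be sketched as a pure function. This is a minimal illustration, not the code in workers/cloud_sync.rs. It assumes days are ISO-8601 `YYYY-MM-DD` strings, so string order matches calendar order; the name `days_to_send` is hypothetical.

```rust
/// Hypothetical helper: choose which rollup days go into the next envelope.
/// Days are ISO-8601 "YYYY-MM-DD" strings, so string comparison is date order.
pub fn days_to_send<'a>(
    rollup_days: &[&'a str],
    watermark: Option<&str>,
    today: &str,
) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = rollup_days
        .iter()
        .copied()
        // New days strictly after the watermark (all days if there is none yet)...
        .filter(|day| watermark.map_or(true, |w| *day > w))
        // ...plus today's bucket, always re-sent because it may have grown.
        .chain(rollup_days.iter().copied().filter(|day| *day == today))
        .collect();
    // Today can appear twice (new and re-sent); keep one copy, in date order.
    out.sort_unstable();
    out.dedup();
    out
}
```

Only existing rollup days are considered, so an idle day with no rows for today produces an empty envelope rather than a placeholder entry.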

Double-post safety

A separate AppState.cloud_syncing AtomicBool guards the worker and the manual budi cloud sync path from running concurrently. Both surfaces share the same sync_tick() call; the AtomicBool ensures only one execution at a time, so a developer who hits budi cloud sync while the background worker is mid-upload doesn't trigger a double-post.
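One way to build such a guard is a compare-and-swap on the flag plus an RAII type that clears it on drop, so an early return or a panic inside the tick cannot leave the flag stuck. This is a sketch under assumptions: `SyncGuard` and `run_exclusive` are hypothetical names, and skipping (rather than waiting) when a sync is already running is this sketch's choice, not a documented behavior.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Holds the "sync in progress" flag; clearing it happens in Drop.
pub struct SyncGuard<'a> {
    flag: &'a AtomicBool,
}

impl<'a> SyncGuard<'a> {
    /// Claim the flag atomically. Returns None if another sync already holds it.
    pub fn try_acquire(flag: &'a AtomicBool) -> Option<Self> {
        flag.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| SyncGuard { flag })
    }
}

impl Drop for SyncGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal return, early return, and panic unwinding.
        self.flag.store(false, Ordering::Release);
    }
}

/// Hypothetical wrapper that both the worker loop and POST /cloud/sync could call
/// around sync_tick(). Returns None when the tick was skipped.
pub fn run_exclusive<T>(flag: &AtomicBool, tick: impl FnOnce() -> T) -> Option<T> {
    let _guard = SyncGuard::try_acquire(flag)?;
    Some(tick())
}
```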

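Server-side, the ingest endpoint is UPSERT-only, keyed on (device_id, bucket_day, role, provider, model, repo_id, git_branch) for rollups and (device_id, session_id) for sessions, per ADR-0083 §5. Re-uploading an unchanged row is a no-op, and re-sending today's growing rollup replaces the earlier value for that key instead of adding to it.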
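The property that makes retries safe is that an upsert replaces rather than accumulates. The in-memory model below illustrates it; `RollupKey` and `RollupStore` are hypothetical, the real table lives in Postgres (siropkin/budi-cloud), and using a single token count as the row payload and plain `String` for every key column are simplifying assumptions.

```rust
use std::collections::HashMap;

/// Upsert key for a daily rollup row, mirroring the key columns listed above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct RollupKey {
    pub device_id: String,
    pub bucket_day: String,
    pub role: String,
    pub provider: String,
    pub model: String,
    pub repo_id: String,
    pub git_branch: String,
}

/// Stand-in for the server table; the value is a hypothetical token total.
#[derive(Default)]
pub struct RollupStore {
    rows: HashMap<RollupKey, u64>,
}

impl RollupStore {
    /// Insert-or-replace: the latest upload for a key wins and nothing is summed,
    /// so a retried POST or a re-sent "today" row cannot double-count.
    pub fn upsert(&mut self, key: RollupKey, tokens: u64) {
        self.rows.insert(key, tokens);
    }

    pub fn total_tokens(&self) -> u64 {
        self.rows.values().sum()
    }

    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```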

Retry and backoff

Server response	Worker behavior
`200 OK`	Advance watermark to the server-confirmed value. Continue next tick.
`401 Unauthorized`	Stop syncing. The user must re-authenticate. The status endpoint reports `auth_error`; the CLI prompts on next `budi cloud status`.
`422 Unprocessable Entity`	Pause syncing until the daemon is upgraded. Implies the cloud has moved to a newer schema. Structured log at warn.
`429 Too Many Requests`	Exponential backoff (1 s → 2 s → 4 s → … → 5 min cap). Watermark not advanced.
`5xx Server Error`	Same exponential backoff as 429.
Network failure	Same backoff. The local DB keeps growing; no data is lost.
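The table reduces to a small classification step after each POST. The sketch below is illustrative only: `TickOutcome` and `classify` are hypothetical names, and treating status codes the table does not list as retryable is an assumption.

```rust
/// What the worker does after one POST attempt (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum TickOutcome {
    /// 200: advance the watermark to the server-confirmed value.
    Advance,
    /// 401: stop syncing until the user re-authenticates (status shows auth_error).
    StopAuthError,
    /// 422: pause until the daemon is upgraded to the cloud's newer schema.
    PauseSchemaMismatch,
    /// 429, 5xx, or network failure: back off and leave the watermark alone.
    Backoff,
}

/// `status` is None when no HTTP response arrived at all (network failure).
pub fn classify(status: Option<u16>) -> TickOutcome {
    match status {
        Some(200) => TickOutcome::Advance,
        Some(401) => TickOutcome::StopAuthError,
        Some(422) => TickOutcome::PauseSchemaMismatch,
        Some(429) | Some(500..=599) | None => TickOutcome::Backoff,
        // Codes the table does not list; retrying is this sketch's assumption.
        Some(_) => TickOutcome::Backoff,
    }
}
```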

Backoff is capped at 5 minutes (retry_max_seconds), and the worker never gives up. A daemon left running through a 12-hour cloud outage catches up on the first successful POST after the outage clears, because that POST carries every day past the watermark.
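The delay schedule from the table (1 s, 2 s, 4 s, and so on, capped) fits in one function. This is a sketch: `backoff_delay` is a hypothetical name, and the source does not say whether the real worker adds jitter, so none is added here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): 1 s, 2 s, 4 s, ...
/// capped at `retry_max_seconds` (300 by default).
pub fn backoff_delay(attempt: u32, retry_max_seconds: u64) -> Duration {
    // checked_shl returns None once the shift reaches 64 bits; treat that as
    // "very large" so long outages never overflow and simply hit the cap.
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(secs.min(retry_max_seconds))
}
```

With the default cap, the ninth retry and every one after it wait the full 300 s, so during a long outage the worker keeps probing roughly every 5 minutes.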

Manual sync paths

Surface	Underlying call	Notes
`budi cloud sync`	`POST /cloud/sync` (loopback-only)	Triggers the same `sync_tick()` the worker runs. Returns non-zero exit code on non-ok sync. Useful for "force the cloud to catch up before I quit my laptop".
`budi cloud status`	`GET /cloud/status`	Read-only. Reports readiness + watermarks. No network call — the daemon answers from local state, so this works offline.
`GET /v1/ingest/status` (cloud-side)	—	Returns the server's view of the device's watermark and sync health. Used by the dashboard.

Both budi cloud sync and budi cloud status print human-readable text by default and accept --format json for machine-readable output. budi cloud sync exits with code 2 when the sync is not ok.
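A CLI-side sketch of the output and exit-code contract follows. Only the exit code 2 on a non-ok sync comes from this page; `SyncReport`, `OutputFormat`, `render`, the text wording, and the JSON field names are hypothetical and do not describe the real output shape.

```rust
/// Hypothetical summary of one `budi cloud sync` run as seen by the CLI.
pub struct SyncReport {
    pub ok: bool,
    pub rollups_sent: usize,
}

#[derive(Clone, Copy)]
pub enum OutputFormat {
    Text,
    Json,
}

/// Render the report and choose the process exit code (2 on a non-ok sync).
pub fn render(report: &SyncReport, format: OutputFormat) -> (String, i32) {
    let status = if report.ok { "ok" } else { "error" };
    let body = match format {
        OutputFormat::Text => format!("cloud sync: {} ({} rollups sent)", status, report.rollups_sent),
        // Hand-built JSON is safe here because both values are plain tokens.
        OutputFormat::Json => format!(
            "{{\"status\":\"{}\",\"rollups_sent\":{}}}",
            status, report.rollups_sent
        ),
    };
    let exit_code = if report.ok { 0 } else { 2 };
    (body, exit_code)
}
```

A caller would print the body and pass the code to `std::process::exit`, which lets scripts such as a pre-shutdown hook detect a failed catch-up.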

Onboarding helper

budi cloud init writes ~/.config/budi/cloud.toml from a commented template. Three modes:

```sh
budi cloud init                        # write commented template; manual edit to enable
budi cloud init --api-key budi_xxx     # one-shot: write key, set enabled = true
budi cloud init --force [--yes]        # overwrite existing config (--yes skips confirm)
```
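The three modes reduce to a small decision over "does the file exist" and the flags. The sketch below assumes that init refuses to touch an existing cloud.toml without --force and that --api-key can be combined with --force; `plan_init`, `InitPlan`, and `Contents` are hypothetical names.

```rust
/// What gets written to ~/.config/budi/cloud.toml.
#[derive(Debug, PartialEq, Eq)]
pub enum Contents {
    /// Commented template, enabled = false.
    Template,
    /// Template with the key filled in and enabled = true.
    TemplateWithKey(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum InitPlan {
    /// Config already exists and --force was not given: leave it alone.
    Refuse,
    /// Write `contents`, asking first when overwriting without --yes.
    Write { contents: Contents, confirm_first: bool },
}

pub fn plan_init(config_exists: bool, force: bool, yes: bool, api_key: Option<&str>) -> InitPlan {
    if config_exists && !force {
        return InitPlan::Refuse;
    }
    let contents = match api_key {
        Some(key) => Contents::TemplateWithKey(key.to_string()),
        None => Contents::Template,
    };
    // A confirmation prompt only matters when an existing file would be replaced.
    InitPlan::Write { contents, confirm_first: config_exists && !yes }
}
```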

The status renderer distinguishes five states so the onboarding UX is precise rather than binary:

Disabled — no config (cloud.toml does not exist)
Disabled — stub key (config exists but api_key is the template placeholder)
Enabled but missing API key (enabled = true, no key)
Enabled but not fully configured (key present, workspace not joined)
Ready (key + workspace + watermark all healthy)

CloudSyncStatus carries config_exists and api_key_stub flags, so the envelope returned by the daemon's GET /cloud/status is enough to drive the five-state UX without the CLI re-reading the filesystem on every render.
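The five states can be derived from a handful of booleans in a fixed order. In this sketch, `StatusView` is a hypothetical subset of CloudSyncStatus (only config_exists and api_key_stub are named on this page), and three choices are assumptions: a stub key counts as disabled even when enabled = true, an unhealthy watermark counts as "not fully configured", and a disabled config with a real key is not one of the listed states, so it returns None.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CloudState {
    DisabledNoConfig,
    DisabledStubKey,
    EnabledMissingKey,
    EnabledNotConfigured,
    Ready,
}

/// Hypothetical subset of the status fields the renderer needs.
#[derive(Clone, Copy)]
pub struct StatusView {
    pub config_exists: bool,
    pub api_key_stub: bool,
    pub enabled: bool,
    pub has_api_key: bool,
    pub workspace_joined: bool,
    pub watermark_healthy: bool,
}

/// Checks run from "least set up" to "fully set up"; the first match wins.
pub fn classify_state(s: &StatusView) -> Option<CloudState> {
    if !s.config_exists {
        return Some(CloudState::DisabledNoConfig);
    }
    if s.api_key_stub {
        return Some(CloudState::DisabledStubKey);
    }
    if !s.enabled {
        return None; // not among the five states listed above
    }
    if !s.has_api_key {
        return Some(CloudState::EnabledMissingKey);
    }
    if !s.workspace_joined || !s.watermark_healthy {
        return Some(CloudState::EnabledNotConfigured);
    }
    Some(CloudState::Ready)
}
```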

Config surface

~/.config/budi/cloud.toml:

```toml
[cloud]
enabled      = false
api_key      = "budi_..."
device_id    = "dev_..."
workspace_id = "ws_..."            # legacy `org_id` still read via serde alias
endpoint     = "https://app.getbudi.dev"

[cloud.sync]
interval_seconds  = 300            # 5 minutes
retry_max_seconds = 300            # backoff cap
```

Environment overrides (highest precedence):

Var	Purpose
`BUDI_CLOUD_ENABLED`	`true` / `false` override
`BUDI_CLOUD_API_KEY`	Override API key (CI / scripted setup)
`BUDI_CLOUD_ENDPOINT`	Override cloud endpoint (self-hosted)
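Applying the overrides is a last-writer-wins merge over the values read from cloud.toml. This sketch injects the environment as a closure so the logic stays testable; in a daemon it would be `|k| std::env::var(k).ok()`. `Effective` and `apply_env_overrides` are hypothetical names, and ignoring a BUDI_CLOUD_ENABLED value other than true/false is an assumption.

```rust
/// Hypothetical effective settings after overrides are applied.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Effective {
    pub enabled: bool,
    pub api_key: Option<String>,
    pub endpoint: String,
}

/// Layer BUDI_CLOUD_* variables over the file values; env wins when set.
pub fn apply_env_overrides(file: Effective, env: impl Fn(&str) -> Option<String>) -> Effective {
    let mut out = file;
    if let Some(v) = env("BUDI_CLOUD_ENABLED") {
        // Only true/false are honored; any other value keeps the file setting.
        match v.trim().to_ascii_lowercase().as_str() {
            "true" => out.enabled = true,
            "false" => out.enabled = false,
            _ => {}
        }
    }
    if let Some(key) = env("BUDI_CLOUD_API_KEY") {
        out.api_key = Some(key);
    }
    if let Some(endpoint) = env("BUDI_CLOUD_ENDPOINT") {
        out.endpoint = endpoint;
    }
    out
}
```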

Per the 2026-05-15 amendment to ADR-0083, workspace_id is the canonical key; the legacy org_id is accepted as an alias on read (TOML, CLI flag, JSON output) during the rename deprecation window. Users do not need to edit their config when upgrading.
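The real config reads the legacy key through a serde alias; the std-only sketch below only illustrates the resolution order over an already-parsed `[cloud]` table. `resolve_workspace_id` is a hypothetical name, and preferring workspace_id when both keys are present is this sketch's assumption, not necessarily what the serde-based loader does.

```rust
use std::collections::HashMap;

/// Canonical key first, legacy alias second.
pub fn resolve_workspace_id(cloud_table: &HashMap<String, String>) -> Option<&str> {
    cloud_table
        .get("workspace_id")
        .or_else(|| cloud_table.get("org_id"))
        .map(String::as_str)
}
```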

Privacy invariants (recap)

The wire envelope never carries:

Prompt or response content
File paths, cwd, workspace_root
Email addresses
Raw JSON payloads
Tag values, except those under the known keys repo_id, git_branch, ticket, and ticket_source
Tool arguments or tool results
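The tag rule is an allow-list, not a deny-list: anything not explicitly named stays local, so a newly introduced tag cannot leak by default. The sketch below is illustrative; `wire_safe_tags` and `ALLOWED_TAG_KEYS` are hypothetical names, and per ADR-0083 the primary enforcement is structural (the worker only reads rollup tables), with a filter like this acting as a second line of defense.

```rust
use std::collections::BTreeMap;

/// The only tag keys allowed onto the wire, per the list above.
const ALLOWED_TAG_KEYS: [&str; 4] = ["repo_id", "git_branch", "ticket", "ticket_source"];

/// Keep only allow-listed tags; every other key/value pair is dropped.
pub fn wire_safe_tags(tags: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    tags.iter()
        .filter(|(k, _)| ALLOWED_TAG_KEYS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```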

The full never-upload table and its reasoning are in Cloud Data Contract and Privacy Boundary §1.

Reference implementations

crates/budi-core/src/cloud_sync.rs — envelope builder, watermark tracking, HTTPS-only HTTP client with retry/backoff, privacy-safe rollup extraction
crates/budi-daemon/src/workers/cloud_sync.rs — background loop, interval / backoff / auth / schema-error handling
crates/budi-daemon/src/routes/cloud.rs — POST /cloud/sync (loopback) and GET /cloud/status
crates/budi-cli/src/commands/cloud.rs — budi cloud sync / status / init

Out of scope here

What the wire format actually carries — the schema and never-upload contract → Cloud Data Contract and Privacy Boundary (ADR-0083)
The cloud-side Postgres schema, RLS policies, dashboard UI → siropkin/budi-cloud
Team pricing recompute that runs alongside sync → Custom Team Pricing and Effective Cost (ADR-0094)
Local DB inspection, schema repair → Operations and Observability
