feat: lazy on-demand R2 index restore (skip eager bootstrap) #14
kristof-siket wants to merge 1 commit into
The build-logs streams server does a blocking eager full-index restore on boot with `--bootstrap-from-r2`: it lists every stream and, per stream, GETs the manifest, HEADs every segment, and GETs the schema into local SQLite, all before `createApp` + `Bun.serve`. That cost scales with the whole R2 backlog and now exceeds the 60s deploy health gate, so the deploy rolls back on every merge, and every restart (deploy/rollback/idle-sleep-wake on Compute) re-triggers it, causing intermittent read outages.

SQLite is a local cache of durable R2 state, and a completed stream's rows are fully derivable from its R2 manifest (which is exactly what the eager bootstrap does per stream). So the eager pass is unnecessary: a stream's index can be hydrated on demand from its manifest on the first read miss.

Design (Option A: reuse the reader, don't refactor it):

- Extract the per-manifest restore loop body from `bootstrapFromR2` into `restoreManifestIntoDb`, shared by both the eager path and a new `hydrateStreamFromR2(cfg, store, db, streamName)`. The rows a lazy read writes are identical to what the eager pass writes for that stream, so the reader's segment/WAL merge stays correct either way.
- Add `--lazy-restore` (and `DS_LAZY_RESTORE`), which skips the eager `bootstrapFromR2` in both entry points (`server.ts` and the compute `demo_entry.ts`) so the server serves `/health` immediately. When both flags are passed, `--lazy-restore` wins.
- In `createAppCore`, the read handlers resolve the stream row through `getStreamForRead`, which on a miss (and only when lazy restore is on) hydrates from R2, then re-reads. A stream with no manifest is a genuine 404. Concurrent first reads of the same cold stream share one hydration (single-flight, evicted on settle). Eager and local modes keep their synchronous hot path unchanged.

Eviction of hydrated rows to bound SQLite is left as future work; on Compute local SQLite resets per redeploy and the working set is small.
Measurement (opt-in `test:bootstrap-scaling`, MockR2 with modeled R2 round-trips, authentic manifest/segment layout):

| streams | store ops | @5ms measured | @25ms projected |
|---|---|---|---|
| 100 | 408 | 2.7s | 10.2s |
| 1000 | 4008 | 24.6s | 100.2s |
| 5000 | 20008 | 119.9s | 500.2s |

Wall-clock is linear in the backlog and, at a realistic 25ms round-trip, the 5000-stream eager restore projects to ~500s, about 8x over the 60s gate. Even at 5ms it already hits ~120s at 5000.

Verification: `bun run verify` (369 pass), full + local conformance (239/239 each), and new `test/lazy_r2_restore.test.ts` covering boot-serves-immediately, byte-identical hydrate-on-miss vs the eager path, never-written 404, K-of-N row-count, and single-flight.

pdp-control-plane adopts this by swapping `--bootstrap-from-r2` for `--lazy-restore` in its build-runner streams compute entry and bumping the `@prisma/streams-server` dependency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
✅ End-to-end verification: real binary, real restart, durable store

Verified

Boot time: eager vs lazy at scale

Identical backlog pre-seeded into MinIO (256 stream manifests / 668 sealed segments), each instance booted one at a time on its own fresh, empty rootDir (so local SQLite starts empty, as on a real redeploy). Wall-clock from process spawn to first

Eager scales linearly with the backlog; lazy stays flat. The lazy instance had 0 backlog streams in local SQLite at boot, yet a cold read of

Correctness: genuine cross-process restart

Seeded 6 per-build streams (
Setup (reproducible)

MinIO as a durable S3 store:

    docker run -d --name streams-minio -p 9000:9000 -p 9001:9001 \
      -e MINIO_ROOT_USER=minio -e MINIO_ROOT_PASSWORD=minio12345 \
      minio/minio server /data --console-address ":9001"
    docker exec streams-minio mc mb local/streams-test

Server env (each instance on its own empty
Subtleties handled
Baseline

The in-process suite

Conclusion

Lazy restore behaves exactly as intended: instant boot regardless of backlog, correct on-demand hydration, no data loss, no eager scan. The eager path's boot cost grows linearly with the backlog (4.3 s at 256 streams and climbing); lazy is flat at ~0.1 s. 🟢

(No source was modified for this run; the streams repo was left clean on

🤖 Test run generated with Claude Code
Problem
The build-logs Streams server (`@prisma/streams-server`, deployed on Prisma Compute) does a blocking eager full-index restore on boot when started with `--bootstrap-from-r2`. `bootstrapFromR2` lists every stream and, per stream, GETs the manifest, HEADs every segment, and GETs the schema, writing all of it into local SQLite, and it `await`s this before `createApp` + `Bun.serve`.

That cost scales with the whole R2 backlog. It now exceeds the 60s deploy health gate, so:

- the deploy rolls back on every merge;
- every restart (deploy, rollback, or idle-sleep-wake on Compute) re-triggers the restore, causing intermittent read outages.
Why lazy restore is correct
Per this repo's own docs (`docs/architecture.md` §"High-level components" → Reader, `docs/recovery-integrity-runbook.md` §1.3), SQLite is a local cache/index of durable R2 state. Historical log data already streams from R2 on a segment cache miss. A completed stream is fully in R2 (manifest = complete metadata, segments = data), and its SQLite index rows are 100% derivable from its R2 manifest, which is exactly what `bootstrapFromR2` does per stream.

So the eager pass is unnecessary: a single stream's index can be hydrated on demand from its manifest, then the existing reader runs unchanged.
Design — Option A (reuse the reader, don't refactor it)
- `bootstrapFromR2`'s per-manifest loop body is extracted into `restoreManifestIntoDb`, now shared by the eager path and a new `hydrateStreamFromR2(cfg, store, db, streamName)` (which computes the manifest key from the stream name). `bootstrapFromR2` stays behavior-preserving: it just enumerates and delegates. The ~300-line body is unchanged (not re-indented), so the diff is a clean 56-line extraction.
- In `src/app_core.ts`, the read handlers resolve the stream row through `getStreamForRead`. On a miss (and only when lazy restore is on) it `await`s `hydrateStreamFromR2`, then re-reads. A stream with no manifest in R2 is a genuine 404. Append and touch sites keep plain `db.getStream` (they don't restore history).
- The `--lazy-restore` flag (and `DS_LAZY_RESTORE` config field) skips the eager `await bootstrapFromR2(...)` in both entry points: `src/server.ts` (which the published `@prisma/streams-server/compute` export routes through via `package_entry.ts` → `../server`) and `src/compute/demo_entry.ts` (its own bootstrap gate). `--bootstrap-from-r2` keeps its existing meaning; `--lazy-restore` is a new mode and wins when both are passed. The server serves `/health` immediately.
- Hydration is single-flight: a `Map<string, Promise>` lets concurrent reads of the same cold stream hydrate once; the entry is evicted on settle.

The rows a lazy read writes are identical to what the eager pass writes for that stream, so the reader's segment/WAL merge stays correct whether the index was restored eagerly or lazily. Safety invariants (runbook §1.2) and read correctness (§1.3) are preserved.
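The hydrate-on-miss and single-flight behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/app_core.ts`: `StreamRow`, `IndexDb`, `Hydrate`, and `makeGetStreamForRead` are hypothetical names, the `hydrate` callback stands in for `hydrateStreamFromR2(cfg, store, db, streamName)`, and the sketch is always async, whereas the PR keeps the eager and local hot path synchronous.

```typescript
// Hypothetical, simplified shapes; the real row and db types are not shown in this PR.
type StreamRow = { name: string; segmentCount: number };

interface IndexDb {
  getStream(name: string): StreamRow | undefined;
}

// Stand-in for hydrateStreamFromR2: resolves true when a manifest was found
// and its rows were written into the local index, false when R2 has no manifest.
type Hydrate = (streamName: string) => Promise<boolean>;

function makeGetStreamForRead(db: IndexDb, hydrate: Hydrate, lazyRestore: boolean) {
  // At most one in-flight hydration per cold stream, keyed by stream name.
  const inFlight = new Map<string, Promise<boolean>>();

  return async function getStreamForRead(name: string): Promise<StreamRow | undefined> {
    const hit = db.getStream(name);
    // A local hit is returned directly; without lazy restore a miss stays a miss.
    if (hit !== undefined || !lazyRestore) return hit;

    let pending = inFlight.get(name);
    if (pending === undefined) {
      // First reader starts the hydration; the entry is evicted once it settles.
      pending = hydrate(name).finally(() => inFlight.delete(name));
      inFlight.set(name, pending);
    }

    const restored = await pending;
    // No manifest in R2: the caller should answer 404.
    return restored ? db.getStream(name) : undefined;
  };
}
```

Evicting the map entry on settle means a failed hydration is not cached, so a later read of the same stream retries against R2.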
Deferred: eviction of hydrated rows to bound SQLite. On Compute the local SQLite resets each redeploy and the working set is small; a long-lived node reading a very large stream set would accumulate index rows. Noted in `docs/overview.md` as future work.

Measurement: the eager-boot curve
Opt-in harness `test/bootstrap_restore_scaling.test.ts` (run via `bun run test:bootstrap-scaling`, gated behind `DS_BOOTSTRAP_SCALING=1` like `test:large-index-filter`). It seeds a small authentic corpus through the real append→segment→upload path into a MockR2 (genuine manifest/segment layout), replicates those R2 objects under distinct stream names to reach N, injects a fixed per-op latency to model R2 round-trips, and times `bootstrapFromR2` at N = 100 / 1k / 5k.

Captured with `DS_BOOTSTRAP_SCALING_DELAY_MS=5` (the table always projects the measured op count to a realistic 25ms round-trip):

| streams | store ops | @5ms measured | @25ms projected |
|---|---|---|---|
| 100 | 408 | 2.7s | 10.2s |
| 1000 | 4008 | 24.6s | 100.2s |
| 5000 | 20008 | 119.9s | 500.2s |

Wall-clock is linear in the backlog (50× streams → 50× ops → ~45× time). At a realistic 25ms R2 round-trip the 5000-stream eager restore projects to ~500s, about 8× over the 60s gate. Even the measured 5ms run already hits ~120s at N=5000 (2× the gate). This reproduces the production wall with no real R2 needed.
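The 25ms projection is consistent with simple arithmetic: store ops multiplied by the round-trip time, assuming the ops run sequentially. The sketch below reproduces the projected figures from the measured op counts. The function names are hypothetical, and the `4 * N + 8` fit is only an observation that matches the three reported data points, not a formula stated by the harness.

```typescript
// Reported data points from the scaling run (5ms injected per-op latency).
const measured = [
  { streams: 100, storeOps: 408, measuredSeconds: 2.7 },
  { streams: 1000, storeOps: 4008, measuredSeconds: 24.6 },
  { streams: 5000, storeOps: 20008, measuredSeconds: 119.9 },
];

const DEPLOY_GATE_SECONDS = 60;

// Latency-only estimate: every store op costs one sequential round-trip.
function projectSeconds(storeOps: number, roundTripMs: number): number {
  return (storeOps * roundTripMs) / 1000;
}

// Observed fit for the three points above only (assumption, not documented).
function fittedStoreOps(streams: number): number {
  return 4 * streams + 8;
}

for (const m of measured) {
  const projected = projectSeconds(m.storeOps, 25);
  const overGate = projected / DEPLOY_GATE_SECONDS;
  console.log(
    `${m.streams} streams: ${m.storeOps} ops, ` +
      `${projected.toFixed(1)}s projected at 25ms (${overGate.toFixed(1)}x the gate)`,
  );
}
```

The same estimate at 5ms gives about 100s for 5000 streams, below the measured 119.9s, so the latency-only model reads as a lower bound on the eager restore time.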
Correctness tests
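`test/lazy_r2_restore.test.ts` (MockR2, real seed path):

- boot serves immediately
- byte-identical hydrate-on-miss vs the eager path
- never-written stream returns 404
- K-of-N row count
- single-flight

Verification

`bun run verify` (result-policy + typecheck + `bun test`): 369 pass, 0 fail (9 skip, incl. the opt-in scaling test)

How pdp-control-plane adopts this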
Swap `--bootstrap-from-r2` → `--lazy-restore` in `services/build-runner/streams/compute-entry.ts` (or the deploy args) and bump the `@prisma/streams-server` dependency. The compute entry routes through `server.ts`, whose new `--lazy-restore` gate skips the eager restore. `/health` then comes up immediately regardless of the R2 backlog, so the deploy gate stops rolling back, and idle-wake/restart no longer re-runs the full restore.

🤖 Generated with Claude Code