feat: lazy on-demand R2 index restore (skip eager bootstrap) by kristof-siket · Pull Request #14 · prisma/streams

kristof-siket · 2026-07-01T08:42:33Z

Problem

The build-logs Streams server (@prisma/streams-server, deployed on Prisma Compute) does a blocking eager full-index restore on boot when started with --bootstrap-from-r2. bootstrapFromR2 lists every stream and, per stream, GETs the manifest, HEADs every segment, and GETs the schema, writing all of it into local SQLite — and it awaits this before createApp + Bun.serve.

That cost scales with the whole R2 backlog. It now exceeds the 60s deploy health gate, so:

the deploy rolls back on every merge, and
every restart (deploy / rollback / idle-sleep-wake on Compute) re-triggers the full restore → intermittent read outages.

Why lazy restore is correct

Per this repo's own docs (docs/architecture.md §"High-level components" → Reader, docs/recovery-integrity-runbook.md §1.3), SQLite is a local cache/index of durable R2 state. Historical log data already streams from R2 on a segment cache miss. A completed stream is fully in R2 (manifest = complete metadata, segments = data), and its SQLite index rows are 100% derivable from its R2 manifest — which is exactly what bootstrapFromR2 does per stream.

So the eager pass is unnecessary: a single stream's index can be hydrated on demand from its manifest, then the existing reader runs unchanged.

Design — Option A (reuse the reader, don't refactor it)

Extract per-stream hydration. bootstrapFromR2's per-manifest loop body is extracted into restoreManifestIntoDb, now shared by the eager path and a new hydrateStreamFromR2(cfg, store, db, streamName) (computes the manifest key from the stream name). bootstrapFromR2 stays behavior-preserving — it just enumerates and delegates. The ~300-line body is unchanged (not re-indented), so the diff is a clean 56-line extraction.
Hydrate on read-miss. In src/app_core.ts the read handlers resolve the stream row through getStreamForRead. On a miss (and only when lazy restore is on) it awaits hydrateStreamFromR2, then re-reads. A stream with no manifest in R2 is a genuine 404. Append and touch sites keep plain db.getStream (they don't restore history).
Non-eager boot. A new --lazy-restore flag (and DS_LAZY_RESTORE config field) skips the eager await bootstrapFromR2(...) in both entry points: src/server.ts (which the published @prisma/streams-server/compute export routes through via package_entry.ts → ../server) and src/compute/demo_entry.ts (which has its own bootstrap gate). --bootstrap-from-r2 keeps its existing meaning; --lazy-restore is a new mode and wins when both are passed. The server serves /health immediately.
Single-flight. A per-stream in-flight Map<string, Promise> so concurrent reads of the same cold stream hydrate once; the entry is evicted on settle.

The rows a lazy read writes are identical to what the eager pass writes for that stream, so the reader's segment/WAL merge stays correct whether the index was restored eagerly or lazily. Safety invariants (runbook §1.2) and read correctness (§1.3) are preserved.
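The hydrate-on-read-miss and single-flight steps can be sketched roughly as follows. This is a minimal illustration and not the code in src/app_core.ts: `IndexDb`, `StreamRow`, `Hydrate`, and `makeGetStreamForRead` are hypothetical names, and `hydrate` stands in for hydrateStreamFromR2 with its cfg/store/db arguments already bound.

```typescript
// Hypothetical minimal shapes; the real types are not shown in this PR text.
type StreamRow = { name: string };

interface IndexDb {
  getStream(name: string): StreamRow | null;
}

// Assumed contract: resolves true when a manifest was found and its rows
// were written to the local index, false when R2 holds no manifest.
type Hydrate = (streamName: string) => Promise<boolean>;

function makeGetStreamForRead(db: IndexDb, hydrate: Hydrate, lazyRestore: boolean) {
  // At most one in-flight hydration per stream name.
  const inFlight = new Map<string, Promise<boolean>>();

  return async function getStreamForRead(name: string): Promise<StreamRow | null> {
    const hit = db.getStream(name);
    // Hot path: the row is already indexed, or lazy restore is off.
    if (hit || !lazyRestore) return hit;

    let pending = inFlight.get(name);
    if (!pending) {
      // First cold reader starts the hydration; the entry is evicted on
      // settle (success or failure) so a later read can retry.
      pending = hydrate(name).finally(() => {
        inFlight.delete(name);
      });
      inFlight.set(name, pending);
    }
    await pending;

    // Re-read. Still null means R2 has no manifest: a genuine 404.
    return db.getStream(name);
  };
}
```

Concurrent callers that arrive while the first hydration is pending await the same promise, so a burst of cold reads costs one manifest fetch. Append and touch sites would keep calling `db.getStream` directly.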

Deferred: eviction of hydrated rows to bound SQLite. On Compute the local SQLite resets each redeploy and the working set is small; a long-lived node reading a very large stream set would accumulate index rows. Noted in docs/overview.md as future work.

Measurement — the eager-boot curve

Opt-in harness test/bootstrap_restore_scaling.test.ts (run via bun run test:bootstrap-scaling, gated behind DS_BOOTSTRAP_SCALING=1 like test:large-index-filter). It seeds a small authentic corpus through the real append→segment→upload path into a MockR2 (genuine manifest/segment layout), replicates those R2 objects under distinct stream names to reach N, injects a fixed per-op latency to model R2 round-trips, and times bootstrapFromR2 at N = 100 / 1k / 5k.

Captured with DS_BOOTSTRAP_SCALING_DELAY_MS=5 (the table always projects the measured op count to a realistic 25ms round-trip):

streams	segments	r2 objects	store ops	@5ms measured	@25ms projected
100	100	204	408	2,668 ms	10.2 s
1000	1000	2,004	4,008	24,568 ms	100.2 s
5000	5000	10,004	20,008	119,919 ms	500.2 s

Wall-clock is roughly linear in the backlog (50× streams → ~49× ops → ~45× time). At a realistic 25ms R2 round-trip the 5000-stream eager restore projects to ~500s, about 8× over the 60s gate. Even the measured 5ms run already hits ~120s at N=5000 (2× the gate). This reproduces the production wall with no real R2 needed.
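The @25ms column is consistent with a simple model: store ops multiplied by one sequential round-trip each. A small sketch of that arithmetic, using only numbers from the table; the function names are illustrative and not part of the harness.

```typescript
// Values copied from the table above.
const measured = [
  { streams: 100, storeOps: 408 },
  { streams: 1000, storeOps: 4008 },
  { streams: 5000, storeOps: 20008 },
];

const HEALTH_GATE_MS = 60_000; // the 60s deploy health gate

// Projected eager-restore time if every store op costs one sequential round-trip.
function projectRestoreMs(storeOps: number, roundTripMs: number): number {
  return storeOps * roundTripMs;
}

// Largest op count that still fits under the gate at a given round-trip.
function maxOpsUnderGate(roundTripMs: number, gateMs: number = HEALTH_GATE_MS): number {
  return Math.floor(gateMs / roundTripMs);
}

for (const row of measured) {
  const ms = projectRestoreMs(row.storeOps, 25);
  const seconds = (ms / 1000).toFixed(1);
  const overGate = (ms / HEALTH_GATE_MS).toFixed(1);
  console.log(`${row.streams} streams: ${seconds} s (${overGate}x the gate)`);
}
```

Under this model the gate allows about 2,400 store ops at 25ms. The table rows show roughly four ops per one-segment stream, so an eager boot would already cross the gate at around 600 such streams.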

Correctness tests

test/lazy_r2_restore.test.ts (MockR2, real seed path):

boots immediately — lazy node has 0 hydrated streams at boot (no eager restore ran)
hydrate-on-miss byte-identical — a cold read returns records byte-for-byte identical to the eager-restored path
never-written stream 404s
K-of-N — after reading K of N cold streams, exactly K rows are hydrated (not N)
single-flight — a burst of 12 concurrent reads of one cold stream triggers exactly one manifest fetch

Verification

bun run verify (result-policy + typecheck + bun test): 369 pass, 0 fail (9 skip, incl. the opt-in scaling test)
Full-server conformance: 239/239 · Local-mode conformance: 239/239

How pdp-control-plane adopts this

Swap --bootstrap-from-r2 → --lazy-restore in services/build-runner/streams/compute-entry.ts (or the deploy args) and bump the @prisma/streams-server dependency. The compute entry routes through server.ts, whose new --lazy-restore gate skips the eager restore. /health then comes up immediately regardless of the R2 backlog, so the deploy gate stops rolling back, and idle-wake/restart no longer re-runs the full restore.
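The flag precedence that makes this swap safe can be illustrated with a small helper. This is a hypothetical sketch of the rule stated in the design ("--lazy-restore wins when both are passed"), not the actual argv handling in src/server.ts; `BootMode` and `resolveBootMode` are made-up names.

```typescript
type BootMode = "lazy" | "eager" | "none";

// Hypothetical helper: decides which restore behavior a set of args selects.
function resolveBootMode(argv: readonly string[]): BootMode {
  // Lazy wins when both flags are present, so the eager pass is skipped.
  if (argv.includes("--lazy-restore")) return "lazy";
  if (argv.includes("--bootstrap-from-r2")) return "eager";
  return "none";
}

// Only "eager" blocks boot on the full restore before serving /health.
function blocksBootOnRestore(mode: BootMode): boolean {
  return mode === "eager";
}
```

A deploy that still passes both flags during the migration therefore gets the lazy behavior.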

🤖 Generated with Claude Code


The build-logs streams server does a blocking eager full-index restore on boot with `--bootstrap-from-r2`: it lists every stream and, per stream, GETs the manifest, HEADs every segment, and GETs the schema into local SQLite, all before `createApp` + `Bun.serve`. That cost scales with the whole R2 backlog and now exceeds the 60s deploy health gate, so the deploy rolls back on every merge, and every restart (deploy/rollback/idle-sleep-wake on Compute) re-triggers it, causing intermittent read outages.

SQLite is a local cache of durable R2 state, and a completed stream's rows are fully derivable from its R2 manifest (which is exactly what the eager bootstrap does per stream). So the eager pass is unnecessary: a stream's index can be hydrated on demand from its manifest on the first read miss.

Design (Option A: reuse the reader, don't refactor it):

- Extract the per-manifest restore loop body from `bootstrapFromR2` into `restoreManifestIntoDb`, shared by both the eager path and a new `hydrateStreamFromR2(cfg, store, db, streamName)`. The rows a lazy read writes are identical to what the eager pass writes for that stream, so the reader's segment/WAL merge stays correct either way.
- Add `--lazy-restore` (and `DS_LAZY_RESTORE`) which skips the eager `bootstrapFromR2` in both entry points (`server.ts` and the compute `demo_entry.ts`) so the server serves `/health` immediately. When both flags are passed, `--lazy-restore` wins.
- In `createAppCore`, the read handlers resolve the stream row through `getStreamForRead`, which on a miss (and only when lazy restore is on) hydrates from R2, then re-reads. A stream with no manifest is a genuine 404. Concurrent first reads of the same cold stream share one hydration (single-flight, evicted on settle). Eager and local modes keep their synchronous hot path unchanged.

Eviction of hydrated rows to bound SQLite is left as future work; on Compute local SQLite resets per redeploy and the working set is small.

Measurement (opt-in `test:bootstrap-scaling`, MockR2 with modeled R2 round-trips, authentic manifest/segment layout):

streams | store ops | @5ms measured | @25ms projected
100 | 408 | 2.7s | 10.2s
1000 | 4008 | 24.6s | 100.2s
5000 | 20008 | 119.9s | 500.2s

Wall-clock is linear in the backlog and, at a realistic 25ms round-trip, the 5000-stream eager restore projects to ~500s, 8x over the 60s gate. Even at 5ms it already hits ~120s at 5000.

Verification: `bun run verify` (369 pass), full + local conformance (239/239 each), and new `test/lazy_r2_restore.test.ts` covering boot-serves-immediately, byte-identical hydrate-on-miss vs the eager path, never-written 404, K-of-N row-count, and single-flight.

pdp-control-plane adopts this by swapping `--bootstrap-from-r2` for `--lazy-restore` in its build-runner streams compute entry and bumping the `@prisma/streams-server` dependency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kristof-siket · 2026-07-01T09:59:58Z

✅ End-to-end verification — real binary, real restart, durable store

Verified --lazy-restore against a real MinIO object store with a genuine cross-process restart of the actual bun run src/server.ts binary — no prod creds, no Cloudflare. Every assertion passed, and the boot-time win is decisive.

Boot time — eager vs lazy at scale

Identical backlog pre-seeded into MinIO (256 stream manifests / 668 sealed segments), each instance booted one-at-a-time on its own fresh empty rootDir (so local SQLite starts empty = a real redeploy). Wall-clock from process spawn to first /health 200 (tight 5 ms poll loop):

Mode	Flag	`/health`-ready
Eager	`--bootstrap-from-r2`	4,324 ms — blocks restoring all 256 streams
Lazy	`--lazy-restore`	119 ms — ~36× faster, skips restore

Eager scales linearly with the backlog; lazy stays flat. The lazy instance had 0 backlog streams in local SQLite at boot, yet a cold read of backlog/s0100 (never touched on that instance) hydrated it on demand and returned all 6 records — deferred, not lost.
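For reference, a readiness timer of the kind described (tight poll until the first /health 200) might look like the sketch below. It is an assumption about the measurement script, which is not included in this comment; `timeUntilReady` and `healthProbe` are illustrative names, and the timer here starts when the function is called, so it should be invoked right after spawning the server process.

```typescript
// Polls `probe` until it reports ready and returns the elapsed milliseconds.
async function timeUntilReady(
  probe: () => Promise<boolean>,
  pollMs: number = 5,
  timeoutMs: number = 60_000,
): Promise<number> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    if (await probe()) return Date.now() - start;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`not ready within ${timeoutMs} ms`);
}

// Probe for a server: ready once GET /health answers 200.
// Connection errors before the listener is up count as "not ready yet".
function healthProbe(baseUrl: string): () => Promise<boolean> {
  return async () => {
    try {
      const res = await fetch(`${baseUrl}/health`);
      return res.status === 200;
    } catch {
      return false;
    }
  };
}
```

Usage would be along the lines of `await timeUntilReady(healthProbe(baseUrl))`, with `baseUrl` pointing at the instance's unique PORT.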

Correctness — genuine cross-process restart

Seeded 5 per-build streams (build-logs/b1…build-logs/b5, 15 records each) → confirmed uploaded to MinIO → killed the writer process → started a fresh instance (empty SQLite) with --lazy-restore against the same MinIO:

No eager restore. Lazy /health-ready in 160 ms; local SQLite held only the streams that were read (b1, b3) plus internal __stream_metrics__ — b2/b4/b5 sat un-restored in MinIO.
Cold read hydrates, byte-identical. GET /v1/stream/build-logs%2Fb1?offset=-1&format=json on the fresh instance returned all 15 records identical to the exact request bodies captured before the restart (deep + string compare, key order preserved). build-logs/b3 likewise (15 records, all buildId=b3).
On-demand. Each previously-absent stream appeared in local SQLite only after it was read.
Missing → 404. GET /v1/stream/build-logs%2Fnope?... → HTTP 404 {"error":{"code":"not_found","message":"not_found"}}.
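The "identical to the captured request bodies" check can be approximated with an order-sensitive comparison like the one below. This is a sketch under assumptions: the captured bodies are taken to be JSON strings, the read result a parsed JSON array, and `firstMismatch` is a hypothetical helper rather than the script that was actually run.

```typescript
// Returns the index of the first record that differs from what was sent,
// or -1 when every record matches.
function firstMismatch(sent: readonly string[], read: readonly unknown[]): number {
  // A length difference is reported at the first index one side lacks.
  if (sent.length !== read.length) return Math.min(sent.length, read.length);
  // JSON.stringify emits string keys in insertion order, so an object whose
  // keys were reordered on the way through fails this comparison too.
  return sent.findIndex(
    (body, i) => JSON.stringify(JSON.parse(body)) !== JSON.stringify(read[i]),
  );
}
```

Parsing and re-serializing the sent body normalizes whitespace only, which keeps the comparison strict about values and key order.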

Setup (reproducible)

MinIO as a durable S3 store — R2ObjectStore honors DURABLE_STREAMS_R2_ENDPOINT with standard SigV4 path-style addressing and worked with zero code changes:

docker run -d --name streams-minio -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minio -e MINIO_ROOT_PASSWORD=minio12345 \
  minio/minio server /data --console-address ":9001"
docker exec streams-minio mc mb local/streams-test

Server env (each instance on its own empty DS_ROOT + unique PORT):

DURABLE_STREAMS_R2_BUCKET=streams-test
DURABLE_STREAMS_R2_ACCESS_KEY_ID=minio
DURABLE_STREAMS_R2_SECRET_ACCESS_KEY=minio12345
DURABLE_STREAMS_R2_ENDPOINT=http://127.0.0.1:9000
DURABLE_STREAMS_R2_REGION=us-east-1

Writer (forces sealing + upload): DS_SEGMENT_TARGET_ROWS=5 DS_SEGMENT_MAX_INTERVAL_MS=200 DS_SEGMENT_CHECK_MS=100 DS_UPLOAD_CHECK_MS=100, argv --object-store r2 --no-auth.
Lazy reader: argv --object-store r2 --no-auth --lazy-restore.
Eager reader: argv --object-store r2 --no-auth --bootstrap-from-r2.

Subtleties handled

Forced sealing + upload — with prod-size segments a handful of records stays in the WAL and never reaches the object store (lazy would then correctly return no_logs). Lowered the segment thresholds and polled GET /v1/stream/<name>/_details until uploaded_segment_count == segment_count (and confirmed manifest.json objects in MinIO) before restarting.
_details shape — the counters (segment_count, uploaded_segment_count, sealed_through, uploaded_through) are nested under a top-level stream key, not the root.
_details is heavy — 250 concurrent _details calls (per-stream storage stats) reset a socket; the large-backlog confirmation was run at concurrency 8 with retries.
Auth — src/auth.ts requires exactly one of --no-auth / --auth-strategy api-key; used --no-auth.
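Two of these points lend themselves to a short sketch: reading the counters from under the top-level stream key, and capping _details concurrency with retries. The helper names are hypothetical, only the two counters used here are typed, and the `segment_count > 0` guard is an added assumption so that a stream whose records are all still in the WAL is not treated as uploaded.

```typescript
// Counters are nested under a top-level `stream` key, not at the root.
type StreamDetails = {
  stream: { segment_count: number; uploaded_segment_count: number };
};

function isFullyUploaded(details: StreamDetails): boolean {
  const { segment_count, uploaded_segment_count } = details.stream;
  return segment_count > 0 && uploaded_segment_count === segment_count;
}

// Runs `task` over `items` with at most `limit` in flight, retrying each
// item up to `retries` extra times before giving up.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  retries: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: JS runs this synchronously between awaits
      for (let attempt = 0; ; attempt++) {
        try {
          results[i] = await task(items[i]);
          break;
        } catch (err) {
          if (attempt >= retries) throw err;
        }
      }
    }
  }

  const workers = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return results;
}
```

With `limit` set to 8, the 256-stream confirmation would issue at most eight _details requests at a time instead of one burst.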

Baseline

The in-process suite bun test test/lazy_r2_restore.test.ts also passes (4/4) — full createApp HTTP requests, fresh-SQLite over a populated MockR2, byte-identical vs the eager path, K-of-N hydration, and single-flight (12 concurrent reads → 1 manifest fetch).

Conclusion

Lazy restore behaves exactly as intended: instant boot regardless of backlog, correct on-demand hydration, no data loss, no eager scan. The eager path's boot cost grows linearly with the backlog (4.3 s at 256 streams and climbing); lazy is flat at ~0.1 s. 🟢

(No source was modified for this run; the streams repo was left clean on feat/lazy-r2-restore.)

🤖 Test run generated with Claude Code

kristof-siket marked this pull request as ready for review July 1, 2026 09:45

kristof-siket mentioned this pull request Jul 2, 2026

Reap expired and deleted streams from the object store #15

Draft

kristof-siket wants to merge 1 commit into main from feat/lazy-r2-restore
