feat: lazy on-demand R2 index restore (skip eager bootstrap) #14

Open
kristof-siket wants to merge 1 commit into main from feat/lazy-r2-restore

@kristof-siket

Problem

The build-logs Streams server (@prisma/streams-server, deployed on Prisma Compute) does a blocking eager full-index restore on boot when started with --bootstrap-from-r2. bootstrapFromR2 lists every stream and, per stream, GETs the manifest, HEADs every segment, and GETs the schema, writing all of it into local SQLite — and it awaits this before createApp + Bun.serve.

That cost scales with the whole R2 backlog. It now exceeds the 60s deploy health gate, so:

  • the deploy rolls back on every merge, and
  • every restart (deploy / rollback / idle-sleep-wake on Compute) re-triggers the full restore → intermittent read outages.

Why lazy restore is correct

Per this repo's own docs (docs/architecture.md §"High-level components" → Reader, docs/recovery-integrity-runbook.md §1.3), SQLite is a local cache/index of durable R2 state. Historical log data already streams from R2 on a segment cache miss. A completed stream is fully in R2 (manifest = complete metadata, segments = data), and its SQLite index rows are 100% derivable from its R2 manifest — which is exactly what bootstrapFromR2 does per stream.

So the eager pass is unnecessary: a single stream's index can be hydrated on demand from its manifest, then the existing reader runs unchanged.

Design — Option A (reuse the reader, don't refactor it)

  1. Extract per-stream hydration. bootstrapFromR2's per-manifest loop body is extracted into restoreManifestIntoDb, now shared by the eager path and a new hydrateStreamFromR2(cfg, store, db, streamName) (computes the manifest key from the stream name). bootstrapFromR2 stays behavior-preserving — it just enumerates and delegates. The ~300-line body is unchanged (not re-indented), so the diff is a clean 56-line extraction.
  2. Hydrate on read-miss. In src/app_core.ts the read handlers resolve the stream row through getStreamForRead. On a miss (and only when lazy restore is on) it awaits hydrateStreamFromR2, then re-reads. A stream with no manifest in R2 is a genuine 404. Append and touch sites keep plain db.getStream (they don't restore history).
  3. Non-eager boot. A new --lazy-restore flag (and DS_LAZY_RESTORE config field) skips the eager await bootstrapFromR2(...) in both entry points: src/server.ts (which the published @prisma/streams-server/compute export routes through via package_entry.ts → ./server) and src/compute/demo_entry.ts (which has its own bootstrap gate). --bootstrap-from-r2 keeps its existing meaning; --lazy-restore is a new mode and wins when both are passed. The server serves /health immediately.
  4. Single-flight. A per-stream in-flight Map<string, Promise> so concurrent reads of the same cold stream hydrate once; the entry is evicted on settle.
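
Steps 2 and 4 can be sketched together as follows. This is a minimal illustration, not the code in src/app_core.ts: the `IndexDb` and `Hydrate` shapes and the `makeGetStreamForRead` factory are hypothetical stand-ins, and the real `hydrateStreamFromR2` takes `(cfg, store, db, streamName)`.

```typescript
// Hypothetical minimal shapes; the repo's real db and store types differ.
type StreamRow = { name: string };

interface IndexDb {
  getStream(name: string): StreamRow | null;
}

// Resolves true when a manifest was found and restored, false when R2 has none.
type Hydrate = (streamName: string) => Promise<boolean>;

function makeGetStreamForRead(db: IndexDb, hydrate: Hydrate, lazyRestore: boolean) {
  // One in-flight hydration per stream name.
  const inFlight = new Map<string, Promise<boolean>>();

  return async function getStreamForRead(name: string): Promise<StreamRow | null> {
    const hit = db.getStream(name);
    if (hit !== null || !lazyRestore) return hit; // hot path, or lazy restore off

    let pending = inFlight.get(name);
    if (pending === undefined) {
      pending = hydrate(name).finally(() => {
        inFlight.delete(name); // evict on settle, success or failure
      });
      inFlight.set(name, pending);
    }

    const restored = await pending;
    return restored ? db.getStream(name) : null; // null maps to a 404
  };
}
```

Evicting on settle rather than on success means a failed hydration is not cached, so a later read of the same stream can retry.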

The rows a lazy read writes are identical to what the eager pass writes for that stream, so the reader's segment/WAL merge stays correct whether the index was restored eagerly or lazily. Safety invariants (runbook §1.2) and read correctness (§1.3) are preserved.

Deferred: eviction of hydrated rows to bound SQLite. On Compute the local SQLite resets each redeploy and the working set is small; a long-lived node reading a very large stream set would accumulate index rows. Noted in docs/overview.md as future work.

Measurement — the eager-boot curve

Opt-in harness test/bootstrap_restore_scaling.test.ts (run via bun run test:bootstrap-scaling, gated behind DS_BOOTSTRAP_SCALING=1 like test:large-index-filter). It seeds a small authentic corpus through the real append→segment→upload path into a MockR2 (genuine manifest/segment layout), replicates those R2 objects under distinct stream names to reach N, injects a fixed per-op latency to model R2 round-trips, and times bootstrapFromR2 at N = 100 / 1k / 5k.
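
The latency injection can be pictured as a thin wrapper around the mock store. The sketch below is an assumption about the general shape only: the `ObjectStore` interface and `withLatency` helper are hypothetical and not taken from the harness.

```typescript
// Hypothetical object-store surface; the repo's real store interface may differ.
interface ObjectStore {
  get(key: string): Promise<Uint8Array | null>;
  head(key: string): Promise<boolean>;
  list(prefix: string): Promise<string[]>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wraps a store so that every operation pays a fixed delay and is counted.
function withLatency(inner: ObjectStore, delayMs: number) {
  let ops = 0;
  const store: ObjectStore = {
    async get(key) {
      ops++;
      await sleep(delayMs);
      return inner.get(key);
    },
    async head(key) {
      ops++;
      await sleep(delayMs);
      return inner.head(key);
    },
    async list(prefix) {
      ops++;
      await sleep(delayMs);
      return inner.list(prefix);
    },
  };
  return { store, opCount: () => ops };
}
```

Counting operations separately from timing them is what allows a run measured at one delay to be projected to another.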

Captured with DS_BOOTSTRAP_SCALING_DELAY_MS=5 (the table always projects the measured op count to a realistic 25ms round-trip):

| streams | segments | r2 objects | store ops | @5ms measured | @25ms projected |
|---|---|---|---|---|---|
| 100 | 100 | 204 | 408 | 2,668 ms | 10.2 s |
| 1000 | 1000 | 2,004 | 4,008 | 24,568 ms | 100.2 s |
| 5000 | 5000 | 10,004 | 20,008 | 119,919 ms | 500.2 s |

Wall-clock is linear in the backlog (50× streams → 50× ops → ~45× time). At a realistic 25ms R2 round-trip the 5000-stream eager restore projects to ~500s — 8× over the 60s gate. Even the measured 5ms run already hits ~120s at N=5000 (2× the gate). This reproduces the production wall with no real R2 needed.
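
The projected column is the store-op count multiplied by the round-trip time. A small sketch of that arithmetic, using the op counts from the table (`projectRestoreMs` is an illustrative helper, not part of the harness):

```typescript
const GATE_MS = 60000; // the 60s deploy health gate

// Projected wall-clock for a strictly sequential restore: ops x round-trip.
// This counts round-trip latency only; the measured 5 ms runs come out higher
// than ops x 5 ms (2,668 ms vs 2,040 ms at N = 100), so it is a lower bound.
function projectRestoreMs(storeOps: number, roundTripMs: number): number {
  return storeOps * roundTripMs;
}

for (const [streams, ops] of [[100, 408], [1000, 4008], [5000, 20008]]) {
  const ms = projectRestoreMs(ops, 25);
  console.log(`${streams} streams: ${(ms / 1000).toFixed(1)} s, ${(ms / GATE_MS).toFixed(1)}x the gate`);
}
// 100 streams: 10.2 s, 0.2x the gate
// 1000 streams: 100.2 s, 1.7x the gate
// 5000 streams: 500.2 s, 8.3x the gate
```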

Correctness tests

test/lazy_r2_restore.test.ts (MockR2, real seed path):

  • boots immediately — lazy node has 0 hydrated streams at boot (no eager restore ran)
  • hydrate-on-miss byte-identical — a cold read returns records byte-for-byte identical to the eager-restored path
  • never-written stream 404s
  • K-of-N — after reading K of N cold streams, exactly K rows are hydrated (not N)
  • single-flight — a burst of 12 concurrent reads of one cold stream triggers exactly one manifest fetch

Verification

  • bun run verify (result-policy + typecheck + bun test): 369 pass, 0 fail (9 skip, incl. the opt-in scaling test)
  • Full-server conformance: 239/239 · Local-mode conformance: 239/239

How pdp-control-plane adopts this

Swap --bootstrap-from-r2 for --lazy-restore in services/build-runner/streams/compute-entry.ts (or the deploy args) and bump the @prisma/streams-server dependency. The compute entry routes through server.ts, whose new --lazy-restore gate skips the eager restore. /health then comes up immediately regardless of the R2 backlog, so the deploy gate stops rolling back, and idle-wake/restart no longer re-runs the full restore.
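
The flag precedence behind that gate can be sketched as a small pure function. This is illustrative only: `resolveRestoreMode` is a hypothetical name, and reading DS_LAZY_RESTORE from the environment with "1" or "true" as the on values is an assumption, since the PR only calls it a config field.

```typescript
type RestoreMode = "lazy" | "eager" | "none";

// Assumption: DS_LAZY_RESTORE is an environment variable that is on for "1" or
// "true". The PR does not state how the config field is read or its accepted values.
function resolveRestoreMode(
  argv: string[],
  env: Record<string, string | undefined>,
): RestoreMode {
  const lazyEnv = env.DS_LAZY_RESTORE;
  const lazy = argv.includes("--lazy-restore") || lazyEnv === "1" || lazyEnv === "true";
  if (lazy) return "lazy"; // wins even when --bootstrap-from-r2 is also passed
  return argv.includes("--bootstrap-from-r2") ? "eager" : "none";
}
```

Under this sketch, an entry point would run `await bootstrapFromR2(...)` only when the mode is "eager".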

🤖 Generated with Claude Code

The build-logs streams server does a blocking eager full-index restore on
boot with `--bootstrap-from-r2`: it lists every stream and, per stream, GETs
the manifest, HEADs every segment, and GETs the schema into local SQLite,
all before `createApp` + `Bun.serve`. That cost scales with the whole R2
backlog and now exceeds the 60s deploy health gate, so the deploy rolls back
on every merge, and every restart (deploy/rollback/idle-sleep-wake on
Compute) re-triggers it, causing intermittent read outages.

SQLite is a local cache of durable R2 state, and a completed stream's rows
are fully derivable from its R2 manifest (which is exactly what the eager
bootstrap does per stream). So the eager pass is unnecessary: a stream's
index can be hydrated on demand from its manifest on the first read miss.

Design (Option A — reuse the reader, don't refactor it):
- Extract the per-manifest restore loop body from `bootstrapFromR2` into
  `restoreManifestIntoDb`, shared by both the eager path and a new
  `hydrateStreamFromR2(cfg, store, db, streamName)`. The rows a lazy read
  writes are identical to what the eager pass writes for that stream, so the
  reader's segment/WAL merge stays correct either way.
- Add `--lazy-restore` (and `DS_LAZY_RESTORE`) which skips the eager
  `bootstrapFromR2` in both entry points (`server.ts` and the compute
  `demo_entry.ts`) so the server serves `/health` immediately. When both
  flags are passed, `--lazy-restore` wins.
- In `createAppCore`, the read handlers resolve the stream row through
  `getStreamForRead`, which on a miss (and only when lazy restore is on)
  hydrates from R2, then re-reads. A stream with no manifest is a genuine
  404. Concurrent first reads of the same cold stream share one hydration
  (single-flight, evicted on settle). Eager and local modes keep their
  synchronous hot path unchanged.

Eviction of hydrated rows to bound SQLite is left as future work; on Compute
local SQLite resets per redeploy and the working set is small.

Measurement (opt-in `test:bootstrap-scaling`, MockR2 with modeled R2
round-trips, authentic manifest/segment layout):

  streams | store ops | @5ms measured | @25ms projected
      100 |       408 |         2.7s  |          10.2s
     1000 |      4008 |        24.6s  |         100.2s
     5000 |     20008 |       119.9s  |         500.2s

Wall-clock is linear in the backlog and, at a realistic 25ms round-trip, the
5000-stream eager restore projects to ~500s — 8x over the 60s gate. Even at
5ms it already hits ~120s at 5000.

Verification: `bun run verify` (369 pass), full + local conformance
(239/239 each), and new `test/lazy_r2_restore.test.ts` covering
boot-serves-immediately, byte-identical hydrate-on-miss vs the eager path,
never-written 404, K-of-N row-count, and single-flight.

pdp-control-plane adopts this by swapping `--bootstrap-from-r2` for
`--lazy-restore` in its build-runner streams compute entry and bumping the
`@prisma/streams-server` dependency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kristof-siket marked this pull request as ready for review July 1, 2026 09:45
@kristof-siket

Author

✅ End-to-end verification — real binary, real restart, durable store

Verified --lazy-restore against a real MinIO object store with a genuine cross-process restart of the actual bun run src/server.ts binary — no prod creds, no Cloudflare. Every assertion passed, and the boot-time win is decisive.

Boot time — eager vs lazy at scale

Identical backlog pre-seeded into MinIO (256 stream manifests / 668 sealed segments), each instance booted one-at-a-time on its own fresh empty rootDir (so local SQLite starts empty = a real redeploy). Wall-clock from process spawn to first /health 200 (tight 5 ms poll loop):

| Mode | Flag | /health-ready |
|---|---|---|
| Eager | --bootstrap-from-r2 | 4,324 ms (blocks restoring all 256 streams) |
| Lazy | --lazy-restore | 119 ms (~36× faster, skips restore) |

Eager scales linearly with the backlog; lazy stays flat. The lazy instance had 0 backlog streams in local SQLite at boot, yet a cold read of backlog/s0100 (never touched on that instance) hydrated it on demand and returned all 6 records — deferred, not lost.
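
The spawn-to-first-200 measurement can be approximated with a tight poll loop like the one below. It is a sketch rather than the script used for this run: `timeUntilHealthy` is a hypothetical helper, and the probe is injected so that the loop does not depend on a particular HTTP client or port.

```typescript
// Polls `probe` until it reports healthy and returns the elapsed milliseconds.
// A probe that rejects (for example, connection refused during boot) counts as not ready.
async function timeUntilHealthy(
  probe: () => Promise<boolean>,
  pollMs = 5,
  timeoutMs = 60000,
): Promise<number> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    if (await probe().catch(() => false)) return Date.now() - start;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`not healthy within ${timeoutMs} ms`);
}

// Example probe against a locally spawned server (the port is a placeholder):
// const probe = () => fetch("http://127.0.0.1:8080/health").then((r) => r.status === 200);
```

Start the timer immediately before spawning the process so that the result covers process startup as well as the restore.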

Correctness — genuine cross-process restart

Seeded 6 per-build streams (build-logs/b1 … build-logs/b5, 15 records each) → confirmed uploaded to MinIO → killed the writer process → started a fresh instance (empty SQLite) with --lazy-restore against the same MinIO:

  • No eager restore. Lazy /health-ready in 160 ms; local SQLite held only the streams that were read (b1, b3) plus internal __stream_metrics__; b2, b4, and b5 sat un-restored in MinIO.
  • Cold read hydrates, byte-identical. GET /v1/stream/build-logs%2Fb1?offset=-1&format=json on the fresh instance returned all 15 records identical to the exact request bodies captured before the restart (deep + string compare, key order preserved). build-logs/b3 likewise (15 records, all buildId=b3).
  • On-demand. Each previously-absent stream appeared in local SQLite only after it was read.
  • Missing → 404. GET /v1/stream/build-logs%2Fnope?... → HTTP 404 {"error":{"code":"not_found","message":"not_found"}}.
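
The "deep + string compare" check can be approximated by comparing serialized records. This is a sketch of the idea, not the script used for the run, and `sameRecords` is a hypothetical helper:

```typescript
// True when both record lists serialize to the same JSON text, record by record.
// JSON.stringify emits string keys in property (insertion) order, so a change in
// key order is detected as well as a change in values.
function sameRecords(before: unknown[], after: unknown[]): boolean {
  if (before.length !== after.length) return false;
  return before.every((rec, i) => JSON.stringify(rec) === JSON.stringify(after[i]));
}
```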

Setup (reproducible)

MinIO as a durable S3 store — R2ObjectStore honors DURABLE_STREAMS_R2_ENDPOINT with standard SigV4 path-style addressing and worked with zero code changes:

docker run -d --name streams-minio -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minio -e MINIO_ROOT_PASSWORD=minio12345 \
  minio/minio server /data --console-address ":9001"
docker exec streams-minio mc mb local/streams-test

Server env (each instance on its own empty DS_ROOT + unique PORT):

DURABLE_STREAMS_R2_BUCKET=streams-test
DURABLE_STREAMS_R2_ACCESS_KEY_ID=minio
DURABLE_STREAMS_R2_SECRET_ACCESS_KEY=minio12345
DURABLE_STREAMS_R2_ENDPOINT=http://127.0.0.1:9000
DURABLE_STREAMS_R2_REGION=us-east-1
  • Writer (forces sealing + upload): DS_SEGMENT_TARGET_ROWS=5 DS_SEGMENT_MAX_INTERVAL_MS=200 DS_SEGMENT_CHECK_MS=100 DS_UPLOAD_CHECK_MS=100, argv --object-store r2 --no-auth.
  • Lazy reader: argv --object-store r2 --no-auth --lazy-restore.
  • Eager reader: argv --object-store r2 --no-auth --bootstrap-from-r2.

Subtleties handled

  • Forced sealing + upload — with prod-size segments a handful of records stays in the WAL and never reaches the object store (lazy would then correctly return no_logs). Lowered the segment thresholds and polled GET /v1/stream/<name>/_details until uploaded_segment_count == segment_count (and confirmed manifest.json objects in MinIO) before restarting.
  • _details shape — the counters (segment_count, uploaded_segment_count, sealed_through, uploaded_through) are nested under a top-level stream key, not the root.
  • _details is heavy — 250 concurrent _details calls (per-stream storage stats) reset a socket; the large-backlog confirmation was run at concurrency 8 with retries.
  • Auth: src/auth.ts requires exactly one of --no-auth / --auth-strategy api-key; used --no-auth.
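
The upload-confirmation polling described in the first three points can be sketched as below. Only the counter names and their nesting under `stream` come from the notes above; `StreamDetails`, `isFullyUploaded`, and `mapLimited` are hypothetical helpers, and the `segment_count > 0` guard is an added assumption.

```typescript
// Counters nested under a top-level `stream` key, as noted above.
interface StreamDetails {
  stream: { segment_count: number; uploaded_segment_count: number };
}

function isFullyUploaded(details: StreamDetails): boolean {
  const { segment_count, uploaded_segment_count } = details.stream;
  // Extra guard (an assumption): 0 === 0 would pass before anything is sealed.
  return segment_count > 0 && uploaded_segment_count === segment_count;
}

// Runs `task` over `items` with at most `limit` in flight, retrying each item
// up to `retries` extra times before giving up.
async function mapLimited<T, R>(
  items: T[],
  limit: number,
  retries: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      for (let attempt = 0; ; attempt++) {
        try {
          results[i] = await task(items[i]);
          break;
        } catch (err) {
          if (attempt >= retries) throw err;
        }
      }
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

For the large-backlog confirmation this would be called with a limit of 8 and a task that fetches /v1/stream/<name>/_details for one stream.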

Baseline

The in-process suite bun test test/lazy_r2_restore.test.ts also passes (4/4) — full createApp HTTP requests, fresh-SQLite over a populated MockR2, byte-identical vs the eager path, K-of-N hydration, and single-flight (12 concurrent reads → 1 manifest fetch).

Conclusion

Lazy restore behaves exactly as intended: instant boot regardless of backlog, correct on-demand hydration, no data loss, no eager scan. The eager path's boot cost grows linearly with the backlog (4.3 s at 256 streams and climbing); lazy is flat at ~0.1 s. 🟢

(No source was modified for this run; the streams repo was left clean on feat/lazy-r2-restore.)

🤖 Test run generated with Claude Code
