Out-of-core STARK proving: a trace larger than RAM/VRAM, bit-identical & cgroup-enforced. Relevant to SP1? #2843

nzengi · 2026-06-09T18:34:09Z

nzengi
Jun 9, 2026

I built a research prover that overlaps a lot with what SP1 and Plonky3 are doing, and I wanted to ask whether the problem it goes after is one you actually care about or a dead end.

Short version: it's a DEEP-FRI STARK over Goldilocks, except every stage that would normally hold the whole trace in RAM streams to disk instead. So peak memory stops depending on
how big the trace is.

How it's built

A few mechanics, since this crowd will want them:

NTT/LDE. Four-step (Bailey) split, N = n1 * n2: NTT the rows, apply twiddles, NTT the columns, transpose. Each pass only touches one row or column, so the working set is
O(sqrt N). The matrix sits on NVMe and tiles move through a fixed buffer.
Merkle commitment. Built bottom-up straight off a file of leaves, one chunk per pread, so the full tree never lives in memory. Columns commit in batched groups (a leaf is the
Poseidon hash of a group's columns at one row).
FRI. Each fold reads the previous round's file and writes the next. I commit every other layer (arity-4) and let the verifier recompute the skipped layer from the committed
endpoints.
Page cache. This one bit me. Under a hard cgroup with swap off, dirty pages count against you and get you OOM-killed even with a flat heap. A flusher that writes back and drops
every ~64 MiB keeps it bounded.

Why I trust the numbers

Every memory number is measured under a real systemd cgroup (MemoryMax=256M, MemorySwapMax=0). With swap off the limit is a hard wall, so when I say the in-core prover dies
and the streaming one lives, that's something I watched, not a model.

Soundness gets poked at too: flip any field in a proof, tamper with a lookup table, hand it wrong LogUp multiplicities, or feed an invalid trace, and it rejects.

Numbers (256 MB cgroup, swap off)

2^28 NTT, 2 GB of data: in-core OOM-killed, out-of-core runs in 40 MB and gives the identical checksum.
Full FRI low-degree proof at 2^26 / 2^27 / 2^28 (512 MB up to 2 GB): in-core OOMs at every size, out-of-core stays flat at 34 MB across the whole 8x range.
Full Fibonacci STARK at 2^22, roughly 7 GB if kept in core: OOM in core, 45 MB and 61 s out of core.
A sponge hash proving Sponge(public msg) = digest, 54 columns, 2^18 blocks: OOM in core, 121 MB and 15.5 s out of core.

Resident set tracks the tile size, not N. The CPU side is already parallel (a 2^24 self-proof goes 65.7 s to 11.2 s on 16 cores). The thing eating the time is Poseidon hashing in
the commitment, not the streaming I/O.

The part I want your read on

Same Goldilocks/FRI/STARK stack SP1 runs on, so you'd know better than me where this lands. I don't think the interesting axis is cheap CPU RAM, it's VRAM. A GPU prover can't hold a
trace bigger than its VRAM, and that caps how big a proof a given card can do. Does a streaming layer that spills past VRAM actually buy anything (bigger traces per card, or a lower
hardware bar for people running provers in the network), or is it moot because you'd shard the trace or just rent a bigger card?

What this isn't

Straight about the gaps: it's a research artifact. The AIRs are small, though they go up to a real sponge, and the Poseidon is a research one, not a standardized constant set. It is
not wired into SP1 or Plonky3. And since the bottleneck is Poseidon, the obvious next step is GPU hashing, which is the one piece I haven't had the hardware to test.

Full writeup with diagrams and a screen recording of the cgroup run (in-core dies, out-of-core verifies in tens of MB):
https://dev.to/nzengi/a-zk-provers-ram-is-a-dial-not-a-wall-3blg

So is out-of-core or VRAM-bounded proving something you'd find useful, and if so where would it hurt first? Fine to tell me it's irrelevant, that's useful to know too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Out-of-core STARK proving: a trace larger than RAM/VRAM, bit-identical & cgroup-enforced. Relevant to SP1? #2843

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Out-of-core STARK proving: a trace larger than RAM/VRAM, bit-identical & cgroup-enforced. Relevant to SP1? #2843

Uh oh!

nzengi Jun 9, 2026

How it's built

Why I trust the numbers

Numbers (256 MB cgroup, swap off)

The part I want your read on

What this isn't

Replies: 0 comments

nzengi
Jun 9, 2026