[r3.4] cl/caplin: fix blob and data column pruning (never ran, wrong range, configurable keep window) by lystopad · Pull Request #20380 · erigontech/erigon

lystopad · 2026-04-07T09:33:55Z

Cherry-pick of #20379 to release/3.4.

Summary

Three bugs caused caplin blob and PeerDAS data column storage to grow unboundedly, filling disks on long-running nodes (observed: 1.6TB in caplin/ on a Hoodi node).

Bug 1: `CleanupAndPruning` stage was never reached

ForkChoice transitioned directly to SleepForSlot, making CleanupAndPruning a dead stage. Pruning never ran on any node, ever.
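The dead-stage problem can be illustrated with a small reachability check. This is a minimal sketch, not Caplin's stage API: the `Stage` type, the map-based graph, and the `reachable` helper are hypothetical, and only the three stage names come from the description above.

```go
package main

import "fmt"

// Stage names mirror those in the PR description; the graph representation
// is a hypothetical illustration, not Caplin's actual stage machinery.
type Stage string

const (
	ForkChoice        Stage = "ForkChoice"
	CleanupAndPruning Stage = "CleanupAndPruning"
	SleepForSlot      Stage = "SleepForSlot"
)

// reachable follows the single outgoing edge of each stage, starting at
// start, and returns the set of stages visited along the way.
func reachable(next map[Stage]Stage, start Stage) map[Stage]bool {
	seen := map[Stage]bool{}
	cur := start
	for !seen[cur] {
		seen[cur] = true
		nxt, ok := next[cur]
		if !ok {
			break // no outgoing edge in this simplified graph
		}
		cur = nxt
	}
	return seen
}

func main() {
	// Before the fix: ForkChoice skips straight to SleepForSlot.
	before := map[Stage]Stage{ForkChoice: SleepForSlot, CleanupAndPruning: SleepForSlot}
	// After the fix: ForkChoice -> CleanupAndPruning -> SleepForSlot.
	after := map[Stage]Stage{ForkChoice: CleanupAndPruning, CleanupAndPruning: SleepForSlot}

	fmt.Println("before:", reachable(before, ForkChoice)[CleanupAndPruning])
	fmt.Println("after: ", reachable(after, ForkChoice)[CleanupAndPruning])
}
```

In the "before" graph the pruning stage still has an outgoing edge, but no edge leads into it, which is why the stage compiled and looked wired up while never executing.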

Bug 2: Prune loop only covered a narrow window

Even if pruning had run, both BlobStore.Prune() and dataColumnStorageImpl.Prune() started the delete loop from currentSlot - minSlotsForBlobSidecarRequest instead of 0. Data older than that window was never deleted.
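To make the range problem concrete, the sketch below compares where the delete loop starts under the old and new behaviour. It is an illustration only: `pruneStart` and its parameters are hypothetical names, and the slot numbers are made-up example values rather than measurements from a real node.

```go
package main

import "fmt"

// pruneStart returns the first slot the delete loop visits; the loop is
// assumed to run from that slot up to (but excluding) cutoff.
// With fixed == false it models the old behaviour (start one window below
// the cutoff); with fixed == true it models the new behaviour (start at 0).
func pruneStart(cutoff, window uint64, fixed bool) uint64 {
	if fixed || cutoff < window {
		return 0
	}
	return cutoff - window
}

func main() {
	const window = 131_072      // example window size in slots
	const cutoff = 1_868_928    // example: everything below this slot should go

	old := pruneStart(cutoff, window, false)
	fmt.Printf("old loop visits [%d, %d); slots [0, %d) are never deleted\n", old, cutoff, old)

	fresh := pruneStart(cutoff, window, true)
	fmt.Printf("new loop visits [%d, %d)\n", fresh, cutoff)
}
```

With these example values the old loop leaves every slot below 1,737,856 untouched, which matches the described symptom: data older than one window stays on disk indefinitely.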

Bug 3: Column keep window was 1M slots (~138 days) — too large

The hardcoded pruneDistance = 1_000_000 for data columns meant that on networks where PeerDAS activated recently (e.g. Hoodi ~50 days ago), all column data fell within the keep window and nothing was pruned even after fixing bugs 1 and 2.
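The arithmetic behind this can be checked quickly. The sketch below assumes 12-second slots (the slot time on mainnet and Hoodi); the helper names are hypothetical.

```go
package main

import "fmt"

const secondsPerDay = 86_400

// slotsToDays converts a slot count into days for a given slot duration.
func slotsToDays(slots, secondsPerSlot uint64) float64 {
	return float64(slots*secondsPerSlot) / secondsPerDay
}

// daysToSlots converts whole days into the number of slots they contain.
func daysToSlots(days, secondsPerSlot uint64) uint64 {
	return days * secondsPerDay / secondsPerSlot
}

func main() {
	const secondsPerSlot = 12 // assumption: mainnet/Hoodi slot time

	fmt.Printf("1_000_000 slots = %.1f days\n", slotsToDays(1_000_000, secondsPerSlot))
	fmt.Printf("50 days = %d slots\n", daysToSlots(50, secondsPerSlot))
}
```

Fifty days of column data is about 360,000 slots, well inside a 1,000,000-slot keep window, so a correctly running pruner would still have had nothing to delete.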

Test plan

- Deploy on a Hoodi node and verify that `du -sh /erigon-data/caplin/*` decreases after reaching head
- Verify that `ls /erigon-data/caplin/blobs/ | sort -n | head -5` shows only recent subdirs after prune
- Verify that the `--caplin.columns-keep-slots` flag is accepted and overrides the default
- `go test ./cl/persistence/blob_storage/...` passes

Co-Authored-By: Claude

The Prune() functions for both BlobStore and dataColumnStorageImpl had a bug where the loop started from (cutoff - minSlotsForBlobSidecarRequest) instead of 0. This meant only a narrow ~131K-slot window just before the pruning cutoff was ever iterated, leaving all data older than that window on disk indefinitely. On a long-running Hoodi node this caused 444GB of blob accumulation and 1.1TB of PeerDAS data column accumulation (1.6TB total in caplin/).

Fix: start the delete loop from slot 0. Also add an underflow guard for the case where currentSlot < keepDistance (e.g. very early in sync).

Co-Authored-By: Claude
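The underflow guard mentioned in this commit matters because slot numbers are unsigned: subtracting a larger keep distance from a smaller current slot wraps around instead of going negative. The following is a minimal sketch of such a guard; `pruneCutoff` is a hypothetical helper, not the function in the patch.

```go
package main

import "fmt"

// pruneCutoff returns the slot below which data may be deleted.
// ok is false when the node has not yet progressed past keepDistance,
// in which case nothing should be pruned.
func pruneCutoff(currentSlot, keepDistance uint64) (cutoff uint64, ok bool) {
	if currentSlot < keepDistance {
		return 0, false
	}
	return currentSlot - keepDistance, true
}

func main() {
	var currentSlot, keepDistance uint64 = 100, 131_072

	// Unguarded subtraction wraps around to a value close to 2^64,
	// which would make every stored slot look "older than the cutoff".
	fmt.Println("unguarded:", currentSlot-keepDistance)

	cutoff, ok := pruneCutoff(currentSlot, keepDistance)
	fmt.Println("guarded:  ", cutoff, ok)
}
```

Returning an explicit `ok` flag rather than a cutoff of 0 is a design choice of this sketch; it keeps "nothing to prune yet" distinguishable from "prune everything below slot 0".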

Replace hardcoded 1_000_000 slot column keep distance with a configurable flag --caplin.columns-keep-slots (default: 131072 = MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS * SLOTS_PER_EPOCH = 4096 * 32, ~18 days).

The previous default of 1M slots (~138 days) caused unbounded disk growth on nodes where PeerDAS activated recently: all column data fell within the keep window so nothing was ever pruned. Operators running DA oracle or rollup nodes that need longer column history can increase the value via the flag.

Co-Authored-By: Claude

ForkChoice was transitioning directly to SleepForSlot, leaving CleanupAndPruning as a dead stage that was never executed. Blob and data column pruning therefore never ran on any node.

Fix ForkChoice to transition to CleanupAndPruning, which then transitions to SleepForSlot, matching the intended graph:

ForkChoice -> CleanupAndPruning -> SleepForSlot

Co-Authored-By: Claude

…value

- Derive the column keep distance from beaconCfg when ColumnKeepSlots is 0 (i.e. not set via CLI), using MinEpochsForDataColumnSidecarsRequests * SlotsPerEpoch. This gives the correct spec minimum per chain: mainnet/Hoodi/Sepolia = 131072 slots (~18 days), Gnosis/Chiado = 65536 slots (~3.8 days).
- Guards against ColumnKeepSlots being zero (struct zero-value when config is constructed without CLI), which would otherwise delete all column data.

Co-Authored-By: Claude
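The fallback described in this commit can be sketched as follows. The `beaconConfig` struct is a stand-in with only the two fields named in the commit message, and `columnKeepSlots` is a hypothetical helper; the Gnosis value of 16 slots per epoch is an assumption chosen to match the 65536-slot figure quoted above.

```go
package main

import "fmt"

// beaconConfig is a stand-in for the real beacon chain config; only the two
// fields named in the commit message are modelled here.
type beaconConfig struct {
	MinEpochsForDataColumnSidecarsRequests uint64
	SlotsPerEpoch                          uint64
}

// columnKeepSlots returns the configured keep window, falling back to the
// per-chain spec minimum when the value is 0 (i.e. not set via CLI).
func columnKeepSlots(configured uint64, cfg beaconConfig) uint64 {
	if configured != 0 {
		return configured
	}
	return cfg.MinEpochsForDataColumnSidecarsRequests * cfg.SlotsPerEpoch
}

func main() {
	mainnet := beaconConfig{MinEpochsForDataColumnSidecarsRequests: 4096, SlotsPerEpoch: 32}
	gnosis := beaconConfig{MinEpochsForDataColumnSidecarsRequests: 4096, SlotsPerEpoch: 16} // assumed values

	fmt.Println("mainnet default:", columnKeepSlots(0, mainnet))
	fmt.Println("gnosis default: ", columnKeepSlots(0, gnosis))
	fmt.Println("operator override:", columnKeepSlots(500_000, mainnet))
}
```

Treating 0 as "unset" means a zero-valued config struct can no longer translate into a keep window of zero slots, which is the failure mode the second bullet guards against.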

…tion) (#20729)

## Summary

Documents the `--caplin.columns-keep-slots` flag introduced in #20380.

- Adds a new **PeerDAS Data Column Retention** subsection to `caplin.md`
- Flag: `--caplin.columns-keep-slots` (default: 131072, ~18 days)
- Explains use case for DA oracle / rollup nodes needing longer column history

## Test plan

- [ ] Verify flag appears in `erigon --help` on `release/3.4`
- [ ] Verify default value matches source: `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS * SLOTS_PER_EPOCH = 131072`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: bloxster <gianni.morselli@erigon.tech>

lystopad added 3 commits April 7, 2026 11:30

lystopad requested review from AskAlexSharov, Giulio2002 and domiwei as code owners April 7, 2026 09:33

lystopad requested a review from yperbasis April 7, 2026 09:34

lystopad self-assigned this Apr 7, 2026

AskAlexSharov approved these changes Apr 8, 2026


AskAlexSharov merged commit 70013ce into release/3.4 Apr 8, 2026
22 checks passed

AskAlexSharov deleted the feature/lystopad/fix-blob-column-pruning-r34 branch April 8, 2026 08:08

bloxster mentioned this pull request Apr 22, 2026

docs: document --caplin.columns-keep-slots flag (PeerDAS column retention) #20729

Merged

2 tasks

AskAlexSharov merged 4 commits into release/3.4 from feature/lystopad/fix-blob-column-pruning-r34
