prune, state, snapshotsync: bound commitment history retention by block count #21021

Open
MysticRyuujin wants to merge 10 commits into erigontech:main from MysticRyuujin:commitment-history-distance-blocks

@MysticRyuujin MysticRyuujin commented May 6, 2026

Summary

Adds --prune.commitment-history.distance.blocks=N for nodes running with --prune.include-commitment-history that want bounded snapshot disk usage. Composes with the existing bool: N must be ≤ --prune.distance (validated at startup) because eth_getProof needs the underlying state history kept by the regular prune mode.

Behavior

| Flag combination | Effect |
|---|---|
| --prune.include-commitment-history only | Full archive (existing behavior, unchanged) |
| + --prune.commitment-history.distance.blocks=N | Snapshots covering blocks older than the retention boundary skipped at download and removed locally on next prune cycle |
| --prune.commitment-history.distance.blocks without the bool | Fatal at startup |
| N > --prune.distance | Fatal at startup |

Runtime tightening between runs is allowed (extra files cleaned up next prune cycle); widening requires resync because filtered files no longer exist on disk.
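
The startup rules in the table can be pictured with a small Go sketch. It is only an illustration under stated assumptions: `validateRetention` is a hypothetical stand-in (not the PR's `validateCommitmentHistoryRetention`), a value of 0 is assumed to mean "flag not set", and the error texts are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRetention is a hypothetical stand-in for the startup checks listed
// in the Behavior table. Assumption: commitmentBlocks == 0 means the bounded
// flag was not set, so the existing full-archive behavior applies.
func validateRetention(includeCommitmentHistory bool, commitmentBlocks, pruneDistance uint64) error {
	if commitmentBlocks == 0 {
		// Bounded flag not set: nothing to validate.
		return nil
	}
	if !includeCommitmentHistory {
		// The bounded flag only makes sense on top of the existing bool.
		return errors.New("--prune.commitment-history.distance.blocks requires --prune.include-commitment-history")
	}
	if commitmentBlocks > pruneDistance {
		// eth_getProof needs state history that --prune.distance retains.
		return fmt.Errorf("commitment history retention (%d blocks) exceeds --prune.distance (%d blocks)",
			commitmentBlocks, pruneDistance)
	}
	return nil
}

func main() {
	fmt.Println(validateRetention(true, 0, 2_500_000))         // bool only
	fmt.Println(validateRetention(true, 2_000_000, 2_500_000)) // bounded, within prune distance
	fmt.Println(validateRetention(false, 2_000_000, 2_500_000)) // bounded flag without the bool
	fmt.Println(validateRetention(true, 3_000_000, 2_500_000)) // N > --prune.distance
}
```

The first two calls return no error; the last two correspond to the "Fatal at startup" rows.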

Implementation

  • Download filter in db/snapshotsync/snapshotsync.go: pre-computes commitmentMinStep once before the loop and skips commitment-history segments that end before the retention boundary, while keeping boundary-overlapping segments.
  • Local prune in execution/stagedsync/stage_snapshots.go + db/state/aggregator.go: pruneCommitmentHistorySnapshots runs alongside block snapshot pruning, but uses the state aggregator's file lifecycle rather than block/receipt pruning. Commitment-history files are removed from dirtyFiles, marked canDelete, and physically removed only by the reader-owned cleanup path so live mmap readers are not invalidated. Frozen-file deletion is opted in only for commitment history via deleteFrozen.
  • Persistence in db/rawdb/accessors_chain.go + db/kv/tables.go: new CommitmentLayoutBlocksKey in kv.DatabaseInfo. Legacy datadirs without the key are tolerated.
  • Validation in node/cli/flags.go + node/eth/backend.go: startup check enforces commitment ≤ prune.distance; restart-mismatch check allows shrinks, rejects expands with an actionable error.
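
A minimal sketch of the download-filter rule from the first bullet, assuming each commitment-history segment is described by a step range such as 0-128. The `segment` type, `filterSegments`, and the example `minStep` value are hypothetical; the real code in db/snapshotsync/snapshotsync.go may derive and compare the boundary differently.

```go
package main

import "fmt"

// segment describes a commitment-history snapshot file by its step range.
// The type and its fields are illustrative, not Erigon's own types.
type segment struct {
	Name     string
	From, To uint64
}

// filterSegments keeps every segment that reaches past minStep.
// A segment whose end is at or before minStep lies entirely outside the
// retention window and is skipped; a segment that straddles minStep is kept
// so that the retained range covers the full requested window.
func filterSegments(segs []segment, minStep uint64) []segment {
	kept := make([]segment, 0, len(segs))
	for _, s := range segs {
		if s.To <= minStep {
			continue // ends before the retention boundary
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	segs := []segment{
		{"commitment.0-128", 0, 128},
		{"commitment.128-192", 128, 192},
		{"commitment.192-200", 192, 200},
	}
	// minStep is computed once before the loop; 150 is an arbitrary example.
	const minStep = 150
	for _, s := range filterSegments(segs, minStep) {
		fmt.Println("download", s.Name)
	}
}
```

With a boundary of 150, the 0-128 segment is skipped while 128-192 is kept because it overlaps the boundary.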

Tests

  • db/snapshotsync/snapshotsync_test.go: boundary coverage for commitment-history download filtering.
  • db/state/aggregator_test.go: downloader delete notification, idempotent no-op on repeat prune, immediate cleanup for unpinned dirty files, and deferred cleanup while frozen files have active readers.
  • db/rawdb/accessors_chain_test.go: round-trip + malformed-length for the new persistence helpers.
  • node/eth/backend_test.go: restart mismatch behavior for fresh / equal / shrink / expand cases.
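
The four restart cases named in the last bullet (fresh, equal, shrink, expand) could be decided roughly as follows. This is a sketch, not the code in node/eth/backend.go: `checkRestart` and `effectiveWindow` are hypothetical names, 0 is assumed to mean "unbounded", and a missing persisted value is assumed to be accepted as-is.

```go
package main

import (
	"fmt"
	"math"
)

// effectiveWindow maps the assumed "0 = unbounded" convention onto a
// comparable number, so that full archive is the widest possible window.
func effectiveWindow(blocks uint64) uint64 {
	if blocks == 0 {
		return math.MaxUint64
	}
	return blocks
}

// checkRestart compares the retention persisted in the datadir with the
// value requested on this run. It reports whether retention was tightened
// (worth a warning) and returns an error when the window would widen.
func checkRestart(hasPersisted bool, persisted, requested uint64) (tightened bool, err error) {
	if !hasPersisted {
		return false, nil // fresh datadir, or one written before the key existed
	}
	prev, next := effectiveWindow(persisted), effectiveWindow(requested)
	switch {
	case next == prev:
		return false, nil
	case next < prev:
		return true, nil // shrink: extra files go away on the next prune cycle
	default:
		return false, fmt.Errorf(
			"commitment history retention widened from %d to %d blocks: filtered files no longer exist on disk, resync required",
			persisted, requested)
	}
}

func main() {
	fmt.Println(checkRestart(false, 0, 2_000_000))       // fresh
	fmt.Println(checkRestart(true, 2_000_000, 2_000_000)) // equal
	fmt.Println(checkRestart(true, 2_000_000, 500_000))   // shrink
	fmt.Println(checkRestart(true, 500_000, 2_000_000))   // expand
}
```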

Validation

End-to-end on Hoodi (~2.7M blocks):

  1. Phase 1 (--prune.distance=2.5M --prune.commitment-history.distance.blocks=2M): the commitment.0-128 segment is filtered at download (~32 GB saved) while the accounts/code/storage/receipt 0-128 segments are all downloaded, so the new flag is the binding constraint for commitment history only.
  2. Phase 2 (shrink to =500K): the "[snapshots] commitment history retention tightened" warning fires at startup; on the next prune cycle, "[snapshots] pruned commitment history files removed=4" is logged and commitment.128-192 (32 GB) plus its inverted-index pair are deleted. The chain continues without disruption.
  3. RPC: eth_getProof succeeds in 9-36 ms for blocks inside the retention window and returns "old data not available due to pruning: commitment start: <txnum>" for blocks outside it. The reported commitment start value moves up after the shrink (50M → 75M tx-num), confirming that the cleanup actually removed data.
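
For readers who want to repeat the RPC check, the following sketch builds a standard eth_getProof JSON-RPC request body (address, storage keys, block number). The helper name `newGetProofRequest`, the zero address, and the block number are placeholders chosen for the example; they are not taken from the validation run.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a plain JSON-RPC 2.0 envelope.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  []any  `json:"params"`
}

// newGetProofRequest builds the body of an eth_getProof call.
// The block number is encoded as a hex quantity, as JSON-RPC expects.
func newGetProofRequest(address string, storageKeys []string, blockNumber uint64) ([]byte, error) {
	if storageKeys == nil {
		storageKeys = []string{} // marshal as [] rather than null
	}
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "eth_getProof",
		Params:  []any{address, storageKeys, fmt.Sprintf("0x%x", blockNumber)},
	}
	return json.Marshal(req)
}

func main() {
	// Placeholder address and block; pick a block inside or outside the
	// retention window to observe the two outcomes described above.
	body, err := newGetProofRequest("0x0000000000000000000000000000000000000000", nil, 2_200_000)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body can be POSTed to the node's HTTP RPC endpoint with any client.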

Docs

Updated:

  • docs/gitbook/src/fundamentals/configuring-erigon/README.md — flag table and verbatim help-text dump.
  • docs/gitbook/src/interacting-with-erigon/eth.md — paragraph in the eth_getProof section explaining the bound and the ≤ --prune.distance invariant.

Test plan

  • make lint
  • go test ./db/state ./db/snapshotsync -count=1
  • End-to-end Hoodi sync + retention shrink + RPC validation

…ck count

Adds --prune.commitment-history.distance.blocks=N for nodes running with
--prune.include-commitment-history that want bounded snapshot disk usage.
Composes with the existing bool: must be ≤ --prune.distance because
eth_getProof needs the underlying state history kept by the regular
prune mode.

Implementation mirrors how transactions snapshots are bounded: a download-
time filter that skips preverified segments older than the retention
boundary, plus a local prune pass that removes files outside the window
after each retire cycle. Runtime tightening is allowed (extra files are
removed on the next prune cycle); widening requires resync because the
filtered files no longer exist on disk.

The retention value is persisted in kv.DatabaseInfo alongside the existing
commitment-history bool. Mismatched config on restart errors out the same
way the bool does.
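
As a rough picture of what persisting the retention value involves, the sketch below round-trips a block count through a byte slice and rejects malformed lengths. The 8-byte big-endian encoding and the names `encodeRetention` and `decodeRetention` are assumptions made for illustration; the source does not state the actual on-disk format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRetention serializes the block count. Assumption: 8 bytes, big-endian.
func encodeRetention(blocks uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, blocks)
	return buf
}

// decodeRetention reads a stored value. An empty value is treated as
// "key absent" (for example a datadir written before the key existed);
// any other length that is not 8 bytes is reported as malformed.
func decodeRetention(v []byte) (blocks uint64, found bool, err error) {
	if len(v) == 0 {
		return 0, false, nil
	}
	if len(v) != 8 {
		return 0, false, fmt.Errorf("malformed retention value: got %d bytes, want 8", len(v))
	}
	return binary.BigEndian.Uint64(v), true, nil
}

func main() {
	stored := encodeRetention(2_000_000)
	fmt.Println(decodeRetention(stored))        // round-trip
	fmt.Println(decodeRetention(nil))           // key absent
	fmt.Println(decodeRetention([]byte{1, 2})) // malformed length
}
```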

Copilot AI left a comment

Pull request overview

Adds a new commitment-history retention option (bounded by block count) for nodes running with commitment-history enabled, aiming to cap snapshot disk usage while preserving the eth_getProof dependency on state-history retention.

Changes:

  • Introduces --prune.commitment-history.distance.blocks and persists its value in the datadir for restart consistency checks.
  • Filters commitment-history snapshot downloads and adds a local prune path that deletes expired commitment-history snapshot files.
  • Adds unit tests for filename expiration logic, DB persistence helpers, and backend startup validation.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
|---|---|
| node/ethconfig/features/sync_features.go | Reads persisted commitment-history block-retention value into sync config. |
| node/ethconfig/config.go | Adds block-retention field and helper for comparing effective retention windows. |
| node/eth/backend.go | Persists retention blocks in DB and enforces “shrink-only” on restart (widen requires resync). |
| node/eth/backend_test.go | Adds test coverage for restart mismatch behavior (fresh/equal/shrink/expand). |
| node/cli/flags.go | Validates commitment-history retention does not exceed regular prune distance. |
| node/cli/default_flags.go | Registers the new CLI flag in default flag set. |
| cmd/utils/flags.go | Defines the new flag and enforces it requires commitment-history to be enabled. |
| db/kv/tables.go | Adds a new DatabaseInfo key for persisting commitment-history block-retention. |
| db/rawdb/accessors_chain.go | Adds rawdb helpers to read/write the persisted block-retention value. |
| db/rawdb/accessors_chain_test.go | Adds tests for the new rawdb read/write helpers (round-trip + malformed). |
| db/snapshotsync/snapshotsync.go | Adds download-side filtering to skip expired commitment-history segments. |
| db/snapshotsync/snapshotsync_test.go | Adds unit tests for commitment-history segment expiration logic. |
| execution/stagedsync/stage_snapshots.go | Adds a prune step that removes commitment-history snapshot files below the retention boundary. |
| db/state/aggregator.go | Implements file selection + deletion for expired commitment-history history/idx snapshot files. |
| docs/gitbook/src/interacting-with-erigon/eth.md | Documents bounded commitment-history retention and the invariant with --prune.distance. |
| docs/gitbook/src/fundamentals/configuring-erigon/README.md | Updates flag documentation and CLI help output with the new option. |

Comment thread db/state/aggregator.go
Comment thread db/state/aggregator.go Outdated
Comment thread db/state/aggregator.go Outdated
- aggregator: notify the downloader (a.onFilesDelete) with relative paths
  before unlinking, matching the cleanAfterMerge ordering so the downloader
  cannot recreate files after local removal. Treat ENOENT from RemoveFile
  as success (deleteMergeFile may have already removed non-frozen files)
  and count only successful removals.
- snapshotsync: keep commitment-history segments that overlap the retention
  boundary instead of skipping them. Aligns the download filter with the
  local prune (endTxNum <= threshold) and ensures retention covers the
  full requested window.
- node/cli/flags: only enforce commit <= prune.distance when the bounded
  flag is actually set; otherwise existing --prune.include-commitment-history
  setups regress at startup. Extract validateCommitmentHistoryRetention
  for direct testing.

Adds focused tests for the downloader callback, the boundary-overlap
download behavior, and the CLI validation matrix.
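
The aggregator bullet in this commit message (notify first, then unlink, tolerate ENOENT, count only real removals) can be sketched with the standard library alone. `removeFiles`, `runExample`, and the file names are hypothetical; the real code goes through the aggregator's own helpers rather than calling os.Remove directly, and a later revision of this PR moved deletion to the reader-owned cleanup path.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeFiles notifies a listener with the relative paths before unlinking,
// then removes each file. A file that is already gone is not an error, but
// it is not counted either: only successful removals are counted.
func removeFiles(dir string, relPaths []string, notify func([]string)) (int, error) {
	notify(relPaths) // tell the downloader first so it cannot recreate the files
	removed := 0
	for _, rel := range relPaths {
		err := os.Remove(filepath.Join(dir, rel))
		switch {
		case err == nil:
			removed++
		case errors.Is(err, fs.ErrNotExist):
			// ENOENT: something else already removed it; treat as success.
		default:
			return removed, err
		}
	}
	return removed, nil
}

// runExample creates one real file in a temp dir and asks for two removals.
func runExample() (int, error) {
	dir, err := os.MkdirTemp("", "retention-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "present.v"), []byte("x"), 0o644); err != nil {
		return 0, err
	}
	notify := func(paths []string) { fmt.Println("notify downloader:", paths) }
	return removeFiles(dir, []string{"present.v", "already-gone.v"}, notify)
}

func main() {
	n, err := runExample()
	fmt.Println("removed:", n, "err:", err)
}
```
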
@MysticRyuujin
Contributor Author

I think I addressed the Copilot review findings. I also ran a PR review with Opus 4.7 and GPT 5.5; both said it looked good.

Comment thread db/state/aggregator.go Outdated
Comment thread db/state/aggregator.go Outdated
Comment thread execution/stagedsync/stage_snapshots.go
Mark commitment-history files for deletion through the existing reader-owned cleanup path so retention cannot unlink mapped files while readers are alive.
@MysticRyuujin
Contributor Author

Updated in 333d3fd35c to address the review comments:

  • Removed direct file unlinking from commitment-history retention. Files are now removed from dirtyFiles, marked canDelete, and physically closed/unlinked only through the reader-owned cleanup path, so live mmap readers are respected.
  • Avoided the merge cleanup helper as requested; retention now uses a dedicated marker path rather than deleteMergeFile/merge-specific methods.
  • Kept the download-time filter for expired commitment-history segments, so unnecessary files are skipped before download; local prune remains only the cleanup path for already-present files.

I did not reuse block/receipt pruning directly because those use the block snapshot subsystem, while commitment history is managed by db/state aggregator files. The dirty_files.go change is scoped to commitment history only: regular frozen state files keep the old behavior, while commitment-history retention opts frozen history files into refcounted, last-reader deletion via a separate deleteFrozen flag.
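
The "refcounted, last-reader deletion" described above can be illustrated with a toy model. Everything here is hypothetical (`trackedFile` and its methods are not the types in db/state/dirty_files.go), and the `unlinked` flag merely stands in for closing and unlinking the file; the point is only the ordering: marking a file never unlinks it while readers are alive.

```go
package main

import (
	"fmt"
	"sync"
)

// trackedFile is a toy model of a snapshot file with live readers.
type trackedFile struct {
	mu        sync.Mutex
	name      string
	readers   int
	canDelete bool
	unlinked  bool // stands in for closing the mmap and unlinking the file
}

// acquire pins the file for a reader. Once the file is marked for deletion,
// new readers are refused (in the PR it is removed from dirtyFiles instead).
func (f *trackedFile) acquire() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.canDelete {
		return false
	}
	f.readers++
	return true
}

// release unpins the file; the last reader out performs the deferred cleanup.
func (f *trackedFile) release() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.readers--
	if f.readers == 0 && f.canDelete {
		f.unlinked = true
	}
}

// markCanDelete is what retention would call: clean up immediately if the
// file is unpinned, otherwise leave the cleanup to the last reader.
func (f *trackedFile) markCanDelete() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.canDelete = true
	if f.readers == 0 {
		f.unlinked = true
	}
}

func (f *trackedFile) isUnlinked() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.unlinked
}

func main() {
	f := &trackedFile{name: "commitment.128-192"}
	f.acquire()
	f.markCanDelete()
	fmt.Println("unlinked while a reader is alive:", f.isUnlinked())
	f.release()
	fmt.Println("unlinked after last reader left:", f.isUnlinked())
}
```

This mirrors the two aggregator test cases listed in the description: immediate cleanup for unpinned files and deferred cleanup while readers are active.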

Validation: go test ./db/state ./db/snapshotsync -count=1 and make lint.

@AskAlexSharov AskAlexSharov self-assigned this May 11, 2026
…stance-blocks

# Conflicts:
#	docs/gitbook/src/fundamentals/configuring-erigon/README.md
…/site location

The gitbook docs where this flag was originally documented were removed
on main. Port the same one-liner to docs/site/.
…stance-blocks

# Conflicts:
#	db/state/dirty_files.go
#	db/state/history.go
#	db/state/inverted_index.go
@JkLondon
Member

working on it here #21199 and here #21198

5 participants