fix: skip configure-state-sync for S3 snapshot restores #62

Merged: bdchatham merged 1 commit into main from fix/skip-state-sync-for-s3-snapshots on Apr 6, 2026
@bdchatham (Collaborator)

Summary

configure-state-sync was incorrectly inserted into init plans for S3 snapshot restores, causing nodes to ignore restored data and loop forever: "no snapshots discovered, sleeping."

Root Cause

S3 snapshot-restore writes a complete data directory (app.db, blockstore, state). CometBFT should start with statesync.enable=false and block-sync forward. But configure-state-sync runs after config-apply and overwrites that setting with enable=true, so seid enters its state sync reactor and tries to discover snapshot chunks from peers, which have already rotated.

Fix

Change both buildBasePlan and buildBootstrapProgression to only insert configure-state-sync when hasStateSync(snap) is true (StateSync source), not when any snapshot source exists.
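
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the `SnapshotSource`, `S3Source`, and `StateSyncSource` types, the signatures of `hasStateSync` and `buildBasePlan`, and the position of `snapshot-restore` in the plan are all assumptions made for the example. Only the task names and the fact that configure-state-sync runs after config-apply come from this PR.

```go
package main

import "fmt"

// SnapshotSource is a hypothetical stand-in for the snapshot configuration.
// Exactly one of the fields is expected to be set.
type SnapshotSource struct {
	S3        *S3Source
	StateSync *StateSyncSource
}

// S3Source and StateSyncSource are hypothetical; their fields are illustrative.
type S3Source struct{ Bucket string }
type StateSyncSource struct{ RPCServers []string }

// hasStateSync reports whether the source is a StateSync source
// (assumed semantics; the real helper may differ).
func hasStateSync(snap *SnapshotSource) bool {
	return snap != nil && snap.StateSync != nil
}

// buildBasePlan returns an ordered list of task names. The ordering of
// snapshot-restore relative to config-apply is an assumption.
func buildBasePlan(snap *SnapshotSource) []string {
	var plan []string
	if snap != nil && snap.S3 != nil {
		plan = append(plan, "snapshot-restore")
	}
	plan = append(plan, "config-apply")
	// Guard from this PR: insert only for a StateSync source,
	// not for any non-nil snapshot source.
	if hasStateSync(snap) {
		plan = append(plan, "configure-state-sync")
	}
	return plan
}

func main() {
	fmt.Println(buildBasePlan(&SnapshotSource{S3: &S3Source{Bucket: "example-bucket"}}))
	fmt.Println(buildBasePlan(&SnapshotSource{StateSync: &StateSyncSource{}}))
}
```

With this guard, an S3-sourced plan in the sketch contains no configure-state-sync task, while a StateSync-sourced plan still ends with it.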

Tests

  • Removed TaskConfigureStateSync from S3 snapshot plan progression assertions
  • Updated expected task count (6 → 5) for snapshot mode
  • Removed state-sync step from snapshot integration test
  • StateSync and Archive tests unchanged (they correctly use StateSync source)

Test plan

  • make test — all tests pass
  • StateSync plans still include configure-state-sync
  • S3 snapshot plans no longer include configure-state-sync

🤖 Generated with Claude Code

configure-state-sync enables CometBFT's state sync protocol, which
discovers and applies snapshot chunks from peers. This is only needed
for StateSync source bootstrapping, not S3 snapshot restores.

S3 snapshot-restore writes a complete data directory (app.db, blockstore,
state). CometBFT should start normally with statesync.enable=false and
block-sync forward. When configure-state-sync runs on an S3-restored
node, it overwrites this with enable=true, causing seid to ignore the
restored data and loop forever: "no snapshots discovered, sleeping."

Fix both buildBasePlan and buildBootstrapProgression to only insert
configure-state-sync when hasStateSync(snap) is true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bdchatham merged commit a4d141f into main on Apr 6, 2026
2 checks passed
bdchatham added a commit that referenced this pull request Apr 10, 2026
Reverts the logic change from #62 that skipped configure-state-sync for
S3 snapshot sources. The original rationale assumed S3 snapshot-restore
writes a complete data directory, but the sidecar extracts ABCI snapshot
chunks to data/snapshots/. Without configure-state-sync (useLocalSnapshot:
true), CometBFT starts with statesync.enable=false, never applies the
snapshot, and panics at InitChain with an empty state DB.

Restores the original guard (`snap != nil`) so configure-state-sync runs
for both S3 and StateSync sources. Fixes the pacific-1-shadow-replayer
canonicalRpc port typo (2665 → 26657).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
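
The restored guard from this revert might look something like the sketch below. The types, the `planStateSync` helper, and the `StateSyncTask` shape are hypothetical and exist only for illustration. The commit message confirms that the guard is `snap != nil` and that S3 restores rely on `useLocalSnapshot: true`; treating the flag as false for a StateSync source is an assumption.

```go
package main

import "fmt"

// SnapshotSource is a hypothetical stand-in for the snapshot configuration.
type SnapshotSource struct {
	S3        *S3Source
	StateSync *StateSyncSource
}

// S3Source and StateSyncSource are hypothetical; their fields are illustrative.
type S3Source struct{ Bucket string }
type StateSyncSource struct{ RPCServers []string }

// StateSyncTask models the parameters of a configure-state-sync task
// (hypothetical shape).
type StateSyncTask struct {
	UseLocalSnapshot bool
}

// planStateSync returns the configure-state-sync task to insert into the
// plan, or nil when no snapshot source is configured.
func planStateSync(snap *SnapshotSource) *StateSyncTask {
	// Restored guard: any snapshot source gets configure-state-sync.
	if snap == nil {
		return nil
	}
	// For S3, the chunks extracted to data/snapshots/ are applied locally.
	// Assumption: a StateSync source leaves the flag false.
	return &StateSyncTask{UseLocalSnapshot: snap.S3 != nil}
}

func main() {
	cases := []struct {
		name string
		snap *SnapshotSource
	}{
		{"none", nil},
		{"s3", &SnapshotSource{S3: &S3Source{Bucket: "example-bucket"}}},
		{"statesync", &SnapshotSource{StateSync: &StateSyncSource{}}},
	}
	for _, c := range cases {
		task := planStateSync(c.snap)
		if task == nil {
			fmt.Printf("%s: no configure-state-sync task\n", c.name)
			continue
		}
		fmt.Printf("%s: configure-state-sync (useLocalSnapshot=%t)\n", c.name, task.UseLocalSnapshot)
	}
}
```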