fix(node): async chain.db save off BFT critical path (Fix A) #565
Merged
Conversation
Per-peer libp2p request-response is single-attempt delivery: when a
prevote or precommit was dropped on a hop in the WAN mesh, the only
retry was our manual 0.5 s × 6 tick in the validator loop. With
validators ~10-50 ms apart across hosts, each phase paid that latency ×
peer-count, so mainnet block time sat at ~3.6 s/blk while testnet
(localhost) ran at 0.46 s.
Switches the four BFT message classes to gossipsub mesh topics:
sentrix/bft/proposal/1
sentrix/bft/prevote/1
sentrix/bft/precommit/1
sentrix/bft/round-status/1
Mesh fan-out + IHAVE/IWANT lazy-push handle retransmission natively;
the validator-side vote rebroadcast tick is gone (proposal rebroadcast
stays — it replays the saved signed proposal verbatim, separate from
the missed-vote retry pattern). The SwarmCommand::Broadcast variant is
now dead and has been removed.
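For illustration, a minimal sketch of what the topic wiring could look like with rust-libp2p's gossipsub behaviour. The topic strings are the ones listed above; the function names and error handling are placeholders, not the node's actual code.

```rust
use libp2p::gossipsub;

// The four mesh topics this PR moves BFT traffic onto.
const BFT_TOPICS: [&str; 4] = [
    "sentrix/bft/proposal/1",
    "sentrix/bft/prevote/1",
    "sentrix/bft/precommit/1",
    "sentrix/bft/round-status/1",
];

// Subscribe once at swarm construction; gossipsub's mesh fan-out and
// IHAVE/IWANT lazy-push then handle retransmission for every topic.
fn subscribe_bft_topics(
    gs: &mut gossipsub::Behaviour,
) -> Result<(), gossipsub::SubscriptionError> {
    for name in BFT_TOPICS {
        gs.subscribe(&gossipsub::IdentTopic::new(name))?;
    }
    Ok(())
}

// Publishing a signed prevote replaces the old per-peer request-response
// send (and the 0.5 s × 6 manual retry tick that backed it).
fn publish_prevote(
    gs: &mut gossipsub::Behaviour,
    signed_prevote: Vec<u8>,
) -> Result<(), gossipsub::PublishError> {
    gs.publish(
        gossipsub::IdentTopic::new("sentrix/bft/prevote/1"),
        signed_prevote,
    )?;
    Ok(())
}
```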
Wire types: new GossipBft{Proposal,Prevote,Precommit,RoundStatus}
envelopes alongside the existing GossipBlock / GossipTransaction.
SENTRIX_PROTOCOL bumped 2.0.0 → 2.1.0 so old peers (RR-only BFT) can't
silently interop with new peers (gossipsub-only BFT).
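The PR only names the new envelope types, so the sketch below is one plausible shape: the constant's type and the payload fields are assumptions, not the actual wire format.

```rust
use serde::{Deserialize, Serialize};

// Bumped so 2.0.0 (RR-only BFT) peers fail protocol negotiation against
// 2.1.0 (gossipsub-only BFT) peers instead of silently half-interoperating.
pub const SENTRIX_PROTOCOL: &str = "2.1.0";

// One envelope per topic; the signed BFT payload stays opaque bytes here.
#[derive(Serialize, Deserialize)]
pub struct GossipBftProposal {
    pub height: u64,
    pub round: u32,
    pub signed_proposal: Vec<u8>,
}

#[derive(Serialize, Deserialize)]
pub struct GossipBftPrevote {
    pub height: u64,
    pub round: u32,
    pub signed_vote: Vec<u8>,
}

#[derive(Serialize, Deserialize)]
pub struct GossipBftPrecommit {
    pub height: u64,
    pub round: u32,
    pub signed_vote: Vec<u8>,
}

#[derive(Serialize, Deserialize)]
pub struct GossipBftRoundStatus {
    pub height: u64,
    pub round: u32,
    pub step: u8,
}
```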
Deploy: halt-all + simul-start. Mid-deploy a validator on the old
binary would gossip nothing the new binary subscribes to, and vice
versa. Same procedure as the 2026-05-10 evening swap to v2.1.91 +
watchdog removal.
Inbound boundary checks (verify_sig + is_active_bft_signer) mirror
the existing RR path so Byzantine peers can't push forged votes into
the engine via either transport. RR BFT request handlers stay in
place as a defensive no-op for any peer still attempting the old
path; the protocol negotiation will reject them anyway.
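A compilable sketch of that shared boundary check, under the assumption that verify_sig and is_active_bft_signer look roughly like this; Vote and Engine are placeholder types, not the node's real ones.

```rust
use std::collections::HashSet;

// Placeholder stand-ins so the sketch is self-contained.
struct Vote {
    signer: [u8; 32],
    signature: Vec<u8>,
}

struct Engine {
    active_signers: HashSet<[u8; 32]>,
}

impl Vote {
    fn verify_sig(&self) -> bool {
        // Real node: verify the signature over the vote payload.
        !self.signature.is_empty()
    }
}

impl Engine {
    fn is_active_bft_signer(&self, signer: &[u8; 32]) -> bool {
        self.active_signers.contains(signer)
    }
    fn handle_vote(&mut self, _vote: Vote) {
        // Feed the BFT state machine.
    }
}

// Both transports (gossipsub topics and the legacy RR handlers) funnel
// inbound votes through the same check, so a Byzantine peer cannot push a
// forged vote into the engine via either path.
fn accept_inbound_vote(engine: &mut Engine, vote: Vote) -> bool {
    if !vote.verify_sig() {
        return false;
    }
    if !engine.is_active_bft_signer(&vote.signer) {
        return false;
    }
    engine.handle_vote(vote);
    true
}
```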
save_blockchain previously fsync-blocked the validator loop inside the
BftAction::FinalizeBlock arm — on mainnet that was 500 ms-1 s per block
on a 5 GB chain.db, sitting in front of the next round's propose call.
Move it to a single tokio writer task drained from an mpsc channel.
- Writer takes a brief read lock to serialise the state blob, releases
  it, commits MDBX. Multiple queued heights coalesce into one snapshot
  since save_blockchain always writes the latest state.
- B2 load-replay (PR #556) already covers the crash window between the
  in-memory commit and the queued disk save.
- Shutdown still does one final sync save_blockchain after the validator
  task exits — safety net.
- save_block (block bytes only) on the peer-propose path stays sync;
  it's small and we don't broadcast unless the block bytes hit disk.
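A minimal sketch of that writer task, assuming placeholder Chain / SaveRequest types and helper names; only the lock-then-serialise-then-commit ordering and the coalescing behaviour are taken from the description above.

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, RwLock};

struct Chain; // stand-in for the in-memory chain state

impl Chain {
    fn serialize_state(&self) -> Vec<u8> {
        Vec::new() // placeholder for the real state blob
    }
}

struct SaveRequest {
    height: u64,
}

async fn commit_to_mdbx(_height: u64, _blob: Vec<u8>) {
    // Real node: MDBX put + commit; the fsync cost lands here, off the
    // validator loop.
}

async fn writer_task(chain: Arc<RwLock<Chain>>, mut rx: mpsc::Receiver<SaveRequest>) {
    while let Some(first) = rx.recv().await {
        // Coalesce: save_blockchain always writes the latest state, so every
        // height queued behind `first` collapses into this one snapshot.
        let mut latest = first.height;
        while let Ok(next) = rx.try_recv() {
            latest = latest.max(next.height);
        }
        // Brief read lock only while serialising the state blob ...
        let blob = { chain.read().await.serialize_state() };
        // ... then commit with the lock already released.
        commit_to_mdbx(latest, blob).await;
    }
}
```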
Codecov Report: ✅ All modified and coverable lines are covered by tests.
github-actions bot pushed a commit that referenced this pull request on May 11, 2026:
Captures the post-#564 stack as a distinct version label:
- #564 BFT votes over gossipsub (wire SENTRIX_PROTOCOL 2.0.0 -> 2.1.0)
- #565 Fix A: async chain.db save off BFT critical path
- #566 Fix C: speculative pre-build of next proposal
- #567 self-describe chain_name from genesis + load-fixup
- #568 remove inbound-silence watchdog
Multiple distinct binaries shipped under 2.1.91 today during the mainnet
stall recovery cycle. Bumping so the next build maps 1:1 to a single
sha + version label. Going forward: every chain-touching PR bumps in the
same commit set (see operator memory feedback_bump_version_per_fix.md).
Summary
`save_blockchain` previously fsync-blocked the validator loop inside
the `BftAction::FinalizeBlock` arm. On mainnet that was ~500 ms-1 s
per block on a 5 GB chain.db, sitting in front of the next round's
propose call.
Move it to a single tokio writer task drained from a bounded mpsc
channel. The writer briefly takes the read lock to serialise the state
blob, releases it, then commits to MDBX. Multiple queued heights coalesce
into one snapshot, since save_blockchain always writes the latest state.
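For completeness, a sketch of the enqueue side; the `BftAction::FinalizeBlock` name comes from the PR, everything else is a placeholder.

```rust
use tokio::sync::mpsc;

enum BftAction {
    FinalizeBlock { height: u64 },
    // ... other actions elided
}

// The FinalizeBlock arm now queues the disk save instead of calling
// save_blockchain inline, so the next round's propose starts immediately
// after the in-memory commit.
fn on_bft_action(action: BftAction, save_tx: &mpsc::Sender<u64>) {
    match action {
        BftAction::FinalizeBlock { height } => {
            // try_send keeps the validator loop non-blocking even if the
            // bounded channel is full; a dropped request is harmless because
            // the pending save will persist the latest state anyway.
            let _ = save_tx.try_send(height);
        }
    }
}
```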
Why
Per-block timing on mainnet (BFT phases ~200 ms, save_blockchain
~500 ms to 1 s, propose-build ~100 ms) showed the synchronous disk save
as the single biggest contributor to the ~2.7 s/blk mainnet WAN block
time under the PR #564 transport. Pipelining the save off the critical
path lets the next round start immediately after the in-memory commit.
Crash safety
The B2 load-replay path (PR #556) already covers a crash between the
in-memory commit and a queued save: on restart, `load_blockchain`
detects `disk_height > blob_height` and replays the missing blocks
via `add_block_from_peer`. Fix A relies on that mechanism to cover any
blocks committed in memory but not yet persisted when the process dies.
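A sketch of the replay this relies on, with placeholder types: `load_blockchain`, `add_block_from_peer`, and the `disk_height > blob_height` check are from PR #556 as described above, the rest is illustrative.

```rust
struct Blockchain {
    blob_height: u64, // height covered by the last saved state blob
}

impl Blockchain {
    fn add_block_from_peer(&mut self, _block_bytes: Vec<u8>) {
        // Full verification + state transition, same path as a live peer block.
        self.blob_height += 1;
    }
}

// On restart, block bytes on disk above the saved blob mark the crash window
// between the in-memory commit and a queued (but never written) save.
fn load_blockchain(
    mut chain: Blockchain,
    disk_height: u64,
    read_block_bytes: impl Fn(u64) -> Vec<u8>,
) -> Blockchain {
    while chain.blob_height < disk_height {
        let next = read_block_bytes(chain.blob_height + 1);
        chain.add_block_from_peer(next);
    }
    chain
}
```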
The shutdown signal handler still does one final synchronous
`save_blockchain` after the validator task exits — explicit safety net.
`save_block` (block bytes only, on the peer-propose path) stays
synchronous; the small write is kept on the critical path so we don't
broadcast a block whose bytes never reached disk.
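The ordering that invariant protects, as a trivial sketch with placeholder closures:

```rust
use std::io;

// Keep the small save_block write on the critical path: only gossip a block
// whose bytes are already durable on disk.
fn persist_then_broadcast(
    block_bytes: Vec<u8>,
    save_block: impl Fn(&[u8]) -> io::Result<()>,
    broadcast: impl Fn(Vec<u8>),
) -> io::Result<()> {
    save_block(&block_bytes)?; // synchronous, small write
    broadcast(block_bytes); // announce only after the bytes hit disk
    Ok(())
}
```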
Deploy results — 2026-05-10 ~00:00 WIB
Binary sha `9345dd807b10254a1dd678ce707f0e2f69d8a79ab3cc558ef4f9e8957a093edc`
built in `rust:1.95-bullseye`, deployed across all 11 nodes via
halt-all + simul-start.
60-second rolling window after stabilisation:
Test plan