
fix(node): async chain.db save off BFT critical path (Fix A) #565

Merged
github-actions[bot] merged 2 commits into main from feat/async-chain-save on May 10, 2026

Conversation

@satyakwok
Member

Summary

`save_blockchain` previously fsync-blocked the validator loop inside
the `BftAction::FinalizeBlock` arm. On mainnet that was ~500 ms-1 s
per block on a 5 GB chain.db, sitting in front of the next round's
propose call.

Move it to a single tokio writer task drained from a bounded mpsc
channel. The writer briefly takes the read lock to serialise state,
releases it, commits MDBX. Multiple queued heights coalesce into one
snapshot.
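
For context, a minimal sketch of that shape. `Blockchain`, `serialize_state`, and `commit_to_mdbx` below are hypothetical stand-ins for the real types; the channel discipline is the point:

```rust
// Sketch only: stand-in types, not the actual Fix A code.
use std::sync::Arc;
use tokio::sync::{mpsc, RwLock};

struct Blockchain;
impl Blockchain {
    fn serialize_state(&self) -> Vec<u8> { Vec::new() } // stand-in
}

fn commit_to_mdbx(_height: u64, _blob: Vec<u8>) {
    // stand-in: open an MDBX write txn, put the blob, commit (the fsync lives here)
}

fn spawn_chain_saver(chain: Arc<RwLock<Blockchain>>, mut rx: mpsc::Receiver<u64>) {
    tokio::spawn(async move {
        while let Some(mut height) = rx.recv().await {
            // Coalesce: drain everything already queued; one snapshot of
            // the latest state covers every pending height.
            while let Ok(newer) = rx.try_recv() {
                height = newer;
            }
            // Hold the read lock only long enough to serialise...
            let blob = {
                let chain = chain.read().await;
                chain.serialize_state()
            };
            // ...then commit with the lock released, so the validator
            // loop never parks behind the fsync.
            commit_to_mdbx(height, blob);
        }
    });
}

// FinalizeBlock arm: non-blocking enqueue on the bounded channel. A full
// queue is logged, never awaited; this is the "save queue full" warning
// the test plan watches.
fn queue_save(tx: &mpsc::Sender<u64>, height: u64) {
    if tx.try_send(height).is_err() {
        eprintln!("save queue full at height {height}");
    }
}
```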

Why

Per-block timing on mainnet breaks down as BFT phases ~200 ms,
save_blockchain ~500 ms-1 s, and propose-build ~100 ms: the
synchronous disk save was the single biggest contributor to the
~2.7 s/blk mainnet WAN block time under the PR #564 transport.
Pipelining the save off the critical path lets the next round start
immediately after the in-memory commit.

Crash safety

The B2 load-replay path (PR #556) already covers a crash between the
in-memory commit and a queued save: on restart, `load_blockchain`
detects `disk_height > blob_height` and replays the missing blocks
via `add_block_from_peer`. Fix A relies on that mechanism for
unsaved-on-shutdown safety.

The shutdown signal handler still does one final synchronous
`save_blockchain` after the validator task exits — explicit safety net.

`save_block` (block bytes only, on the peer-propose path) stays
synchronous; the small write is kept on the critical path so we don't
broadcast a block whose bytes never reached disk.
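
A sketch of how that load-replay check fits together. The `Db` helpers here are illustrative stand-ins; `add_block_from_peer` and the height comparison are the PR #556 mechanism described above:

```rust
// Sketch of the B2 load-replay path; Db and its methods are stand-ins.
struct Db;
impl Db {
    fn state_blob_height(&self) -> u64 { 0 }   // height baked into the last saved state blob
    fn max_block_height(&self) -> u64 { 0 }    // highest block whose bytes reached disk
    fn load_block_bytes(&self, _h: u64) -> Vec<u8> { Vec::new() }
}

struct Blockchain;
impl Blockchain {
    fn add_block_from_peer(&mut self, _bytes: Vec<u8>) {}
}

fn replay_after_crash(db: &Db, chain: &mut Blockchain) {
    let blob_height = db.state_blob_height();
    let disk_height = db.max_block_height();
    // A crash between the in-memory commit and a queued save leaves
    // disk_height > blob_height: replay the gap through the same code
    // path a peer-proposed block takes.
    for h in (blob_height + 1)..=disk_height {
        chain.add_block_from_peer(db.load_block_bytes(h));
    }
}
```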

Deploy results — 2026-05-10 ~00:00 WIB

Binary sha `9345dd807b10254a1dd678ce707f0e2f69d8a79ab3cc558ef4f9e8957a093edc`
built in `rust:1.95-bullseye`, deployed across all 11 nodes via
halt-all + simul-start.

60-second rolling window after stabilisation:

| Network | Pre-Fix A  | Post-Fix A | Δ                         |
|---------|------------|------------|---------------------------|
| Testnet | 0.7 s/blk  | 0.72 s/blk | ~same (already in target) |
| Mainnet | 2.73 s/blk | 1.67 s/blk | -39 %                     |

Test plan

  • Build in `rust:1.95-bullseye` (GLIBC ≤ 2.30).
  • Halt-all + simul-start all 11 nodes (testnet 5 + mainnet 6).
  • Verify both networks advancing.
  • Measure 60 s rolling block time on mainnet — confirm improvement.
  • Confirm `save queue full` warnings stay near zero.
  • Bake ≥1 h; confirm no chain.db replay anomalies on restart.

satyakwok added 2 commits May 10, 2026 23:07
Per-peer libp2p request-response is single-attempt delivery: when a
prevote or precommit was dropped on a hop in the WAN mesh, the only
retry was our manual 0.5 s × 6 tick from the validator loop. With
validators ~10-50 ms apart across hosts, each phase paid that latency
× peer-count, so mainnet block time sat at ~3.6 s/blk while testnet
(localhost) was 0.46 s/blk.

Switches the four BFT message classes to gossipsub mesh topics:

  sentrix/bft/proposal/1
  sentrix/bft/prevote/1
  sentrix/bft/precommit/1
  sentrix/bft/round-status/1

Mesh fan-out + IHAVE/IWANT lazy-push handle retransmission natively;
the validator-side vote rebroadcast tick is gone (proposal rebroadcast
stays — it replays the saved signed proposal verbatim, separate from
the missed-vote retry pattern). SwarmCommand::Broadcast variant is
dead and removed.
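
Roughly what the topic wiring looks like against the libp2p-gossipsub API; the behaviour setup around it is elided and the function names are illustrative:

```rust
// Sketch: assumes a libp2p::gossipsub::Behaviour already installed on
// the swarm (libp2p built with the `gossipsub` feature).
use libp2p::gossipsub::{Behaviour, IdentTopic};

const BFT_TOPICS: [&str; 4] = [
    "sentrix/bft/proposal/1",
    "sentrix/bft/prevote/1",
    "sentrix/bft/precommit/1",
    "sentrix/bft/round-status/1",
];

fn subscribe_bft(gossip: &mut Behaviour) {
    for name in BFT_TOPICS {
        // Real code should log the Result instead of dropping it.
        let _ = gossip.subscribe(&IdentTopic::new(name));
    }
}

// Publishing a prevote; mesh fan-out plus IHAVE/IWANT replaces the
// old manual 0.5 s × 6 rebroadcast tick.
fn publish_prevote(gossip: &mut Behaviour, bytes: Vec<u8>) {
    let _ = gossip.publish(IdentTopic::new("sentrix/bft/prevote/1"), bytes);
}
```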

Wire types: new GossipBft{Proposal,Prevote,Precommit,RoundStatus}
envelopes alongside the existing GossipBlock / GossipTransaction.
SENTRIX_PROTOCOL bumped 2.0.0 → 2.1.0 so old peers (RR-only BFT) can't
silently interop with new peers (gossipsub-only BFT).

Deploy: halt-all + simul-start. Mid-deploy a validator on the old
binary would gossip nothing the new binary subscribes to, and vice
versa. Same procedure as the 2026-05-10 evening swap to v2.1.91 +
watchdog removal.

Inbound boundary checks (verify_sig + is_active_bft_signer) mirror
the existing RR path so byzantine peers can't push forged votes into
the engine via either transport. RR BFT request handlers stay in
place as a defensive no-op for any peer still attempting the old
path; the protocol negotiation will reject them anyway.
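
A sketch of that inbound boundary. `verify_sig` and `is_active_bft_signer` are the checks named above; the vote struct is an illustrative stand-in:

```rust
// Sketch only: the gossip-side gate mirrors the RR-path checks.
struct Prevote { signer: Vec<u8>, payload: Vec<u8>, signature: Vec<u8> }

fn verify_sig(_signer: &[u8], _payload: &[u8], _sig: &[u8]) -> bool { true } // stand-in
fn is_active_bft_signer(_signer: &[u8]) -> bool { true }                     // stand-in

fn accept_gossip_prevote(vote: &Prevote) -> bool {
    // Drop forged signatures before the engine ever sees the vote...
    if !verify_sig(&vote.signer, &vote.payload, &vote.signature) {
        return false;
    }
    // ...and only current validator-set members may vote this round.
    is_active_bft_signer(&vote.signer)
}
```
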
save_blockchain previously fsync-blocked the validator loop inside the
BftAction::FinalizeBlock arm — on mainnet that was 500 ms-1 s per block
on a 5 GB chain.db, sitting in front of the next round's propose call.
Move it to a single tokio writer task drained from a bounded mpsc channel.

- Writer takes a brief read lock to serialise the state blob, releases
  it, commits MDBX. Multiple queued heights coalesce into one snapshot
  since save_blockchain always writes the latest state.
- B2 load-replay (PR #556) already covers the crash window between the
  in-memory commit and the queued disk save.
- Shutdown still does one final sync save_blockchain after the validator
  task exits — safety net.
- save_block (block bytes only) on the peer-propose path stays sync;
  it's small and we don't broadcast unless the block bytes hit disk.
github-actions Bot enabled auto-merge (squash) May 10, 2026 22:00
@codecov

codecov Bot commented May 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

github-actions Bot merged commit afdc47f into main May 10, 2026
8 checks passed
github-actions Bot pushed a commit that referenced this pull request May 11, 2026
Captures the post-#564 stack as a distinct version label:
- #564 BFT votes over gossipsub (wire SENTRIX_PROTOCOL 2.0.0 -> 2.1.0)
- #565 Fix A: async chain.db save off BFT critical path
- #566 Fix C: speculative pre-build of next proposal
- #567 self-describe chain_name from genesis + load-fixup
- #568 remove inbound-silence watchdog

Multiple distinct binaries shipped under 2.1.91 today during the
mainnet stall recovery cycle. Bumping so the next build maps 1:1 to
a single sha + version label.

Going forward: every chain-touching PR bumps in the same commit set
(see operator memory feedback_bump_version_per_fix.md).