
split blocksync reactor into 2 modes. #3511

Merged
pompon0 merged 17 commits into main from gprusak-blocksync on May 29, 2026


@pompon0 (Contributor) commented May 27, 2026

Autobahn nodes will need to be able to send pre-giga blocks to pre-giga nodes during the transition period (so that pre-giga nodes can blocksync up to the upgrade height). However, all the remaining parts of the blocksync reactor should be disabled. This PR extracts the Tendermint-only blocksync logic into syncController, which is an optional part of the blocksync reactor. Additionally, I have:

  • refactored the blocksync reactor to use structured concurrency
  • fixed a bug in poolRoutine that silently terminated blocksync on block validation failure (it now retries fetching the block)
  • fixed a busy loop in the blocksync pool.run.
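To illustrate the retry behaviour from the second bullet, here is a minimal sketch. It assumes a hypothetical `fetch` callback that returns the block for a height from any peer not yet evicted, and a boolean `valid` field standing in for full block validation. None of these names come from the PR. Instead of giving up on the first invalid block, the loop evicts the peer that sent it and requests the same height again.

```go
package main

import "errors"

// block is a hypothetical stand-in for a fetched block plus the peer that sent it.
type block struct {
	height int64
	peer   string
	valid  bool // stands in for the outcome of full block validation
}

// errNoPeers is returned when every candidate peer has been evicted.
var errNoPeers = errors.New("no peers left to fetch from")

// fetchFunc asks some non-evicted peer for the block at height h (hypothetical).
type fetchFunc func(h int64, evicted map[string]bool) (block, error)

// syncHeight fetches and validates the block at height h. On a validation
// failure it evicts the sending peer and retries the same height instead of
// terminating the sync loop.
func syncHeight(h int64, fetch fetchFunc) (block, map[string]bool, error) {
	evicted := map[string]bool{}
	for {
		b, err := fetch(h, evicted)
		if err != nil {
			return block{}, evicted, err // e.g. errNoPeers: nothing left to try
		}
		if b.valid {
			return b, evicted, nil
		}
		// Previously the loop would simply stop here; now the bad peer is
		// dropped and the height is requested again.
		evicted[b.peer] = true
	}
}
```

The loop only ends when a valid block arrives or the fetcher reports that it has no peers left. A stalled sync therefore surfaces as an explicit error rather than a silent exit.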

cursor Bot commented May 27, 2026

PR Summary

High Risk
Large refactor of blocksync startup, P2P message routing, pool concurrency, and consensus handoff. This touches the core node sync path and includes behavior changes (monotone catch-up, validation retry).

Overview
Blocksync is split into an always-on query path and an optional active sync path. The Reactor keeps the single blocksync P2P channel and serves BlockRequest / StatusRequest from the local store even when catch-up is off. An optional SyncerConfig wires in a syncController that owns the pool, outbound requests, block application, consensus handoff, and lag metrics. NewReactor no longer takes the block executor / consensus reactor directly; those move into SyncerConfig (utils.Option).

Pool and concurrency are reworked. BlockPool drops BaseService, owns its internal request/error channels, and runs via pool.run with a scope plus per-height bpRequester tasks (utils.Watch / Option). The caught-up check uses a monotone max peer height, so retracted peer heights do not falsely mark the node as synced. poolRoutine no longer exits on validation failure; it evicts bad peers and continues (new tests).
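The "monotone max peer height" idea can be sketched as follows. The type and method names are hypothetical. The sketch also assumes the node counts as caught up once its height reaches the recorded maximum, and is not caught up while no peer has reported anything; the real threshold in the pool may differ. The key property is that the target only ever grows.

```go
package main

import "sync"

// heightTracker is a hypothetical sketch of a monotone max peer height:
// a peer that later reports a lower height (or disconnects) cannot lower
// the target and so cannot make the node look caught up.
type heightTracker struct {
	mu        sync.Mutex
	peers     map[string]int64 // latest height reported by each peer
	maxHeight int64            // monotone: never decreases
}

func newHeightTracker() *heightTracker {
	return &heightTracker{peers: map[string]int64{}}
}

// setPeerHeight records a peer's reported height and bumps the monotone max.
func (t *heightTracker) setPeerHeight(peer string, h int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.peers[peer] = h
	if h > t.maxHeight {
		t.maxHeight = h
	}
}

// removePeer forgets a peer but deliberately leaves maxHeight untouched.
func (t *heightTracker) removePeer(peer string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.peers, peer)
}

// isCaughtUp compares our height against the monotone max, not the current max.
func (t *heightTracker) isCaughtUp(ourHeight int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.maxHeight > 0 && ourHeight >= t.maxHeight
}
```

If a peer first reports 100, later reports 50, and then disconnects, a node at height 60 is still not considered caught up. With a "current max over live peers" check, that node would have been declared synced.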

Wiring and RPC: Node construction passes SyncerConfig; SwitchToBlockSync no longer takes context. RPC status reads state-sync metrics from StateSyncReactor instead of a Metricer interface (mock removed). Mempool tests use utils.TestRng / GenBytes for deterministic concurrency tests.

Reviewed by Cursor Bugbot for commit 4f62265.

github-actions Bot commented May 27, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build      Format     Lint       Breaking   Updated (UTC)
✅ passed  ✅ passed  ✅ passed  ✅ passed  May 29, 2026, 10:12 AM

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 72.72727% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.22%. Comparing base (3936ac9) to head (4f62265).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                        Patch %   Lines
sei-tendermint/internal/blocksync/reactor.go    64.41%    71 Missing and 8 partials ⚠️
sei-tendermint/internal/blocksync/pool.go       92.72%    6 Missing and 2 partials ⚠️
sei-tendermint/internal/rpc/core/status.go      0.00%     8 Missing ⚠️
sei-tendermint/node/node.go                     90.00%    1 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3511      +/-   ##
==========================================
- Coverage   59.04%   58.22%   -0.82%     
==========================================
  Files        2199     2129      -70     
  Lines      182096   173895    -8201     
==========================================
- Hits       107513   101252    -6261     
+ Misses      64933    63659    -1274     
+ Partials     9650     8984     -666     
Flag              Coverage Δ
sei-chain-pr      63.22% <72.72%> (?)
sei-db            70.41% <ø> (ø)
sei-db-state-db   ?

Flags with carried forward coverage won't be shown.

Files with missing lines                        Coverage Δ
sei-tendermint/internal/rpc/core/env.go         76.15% <ø> (ø)
sei-tendermint/internal/statesync/reactor.go    71.72% <100.00%> (+1.26%) ⬆️
sei-tendermint/libs/utils/option.go             90.90% <100.00%> (ø)
sei-tendermint/node/node.go                     65.37% <90.00%> (+0.17%) ⬆️
sei-tendermint/internal/blocksync/pool.go       89.53% <92.72%> (+5.00%) ⬆️
sei-tendermint/internal/rpc/core/status.go      73.62% <0.00%> (ø)
sei-tendermint/internal/blocksync/reactor.go    65.10% <64.41%> (+1.01%) ⬆️

... and 72 files with indirect coverage changes


Comment thread sei-tendermint/internal/blocksync/reactor.go
Comment thread sei-tendermint/internal/blocksync/reactor.go Outdated
Comment thread sei-tendermint/internal/blocksync/pool.go Outdated
Comment thread sei-tendermint/internal/blocksync/reactor.go
Comment thread sei-tendermint/internal/blocksync/pool.go
        return err
    switch update.Status {
    case p2p.PeerStatusUp:
        s.channel.Send(wrap(&pb.StatusRequest{}), update.NodeID)

Peer-up handler sends StatusRequest instead of StatusResponse

Medium Severity

On PeerStatusUp, the old code sent a StatusResponse (advertising local height to the new peer), enabling the remote node to immediately learn this node's height. The new code sends a StatusRequest instead, which asks the peer for their height. While this node still learns about the peer, the peer no longer receives an immediate height advertisement. Peers that rely on receiving a StatusResponse on connection (e.g., pre-giga nodes during transition) won't learn this node's height until the next periodic StatusRequest broadcast (every 10 seconds).
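The difference the reviewer describes can be modelled with a tiny message exchange. This is a sketch under stated assumptions: `msg` and `node` are hypothetical types rather than the real protobuf messages, and only one direction of the connection is modelled (the side whose peer-up handler fires). With the new handler, the initiating node learns the peer's height after one request/response round trip, while the peer learns nothing until it sends a StatusRequest of its own.

```go
package main

// msg is a hypothetical stand-in for the blocksync status messages.
type msg struct {
	kind   string // "StatusRequest" or "StatusResponse"
	height int64  // only meaningful for StatusResponse
}

// node is a hypothetical peer that knows its own height and what it has
// learned about the other side (0 means "unknown").
type node struct {
	height     int64
	peerHeight int64
}

// onPeerUpOld mirrors the old behaviour: advertise our own height immediately.
func (n *node) onPeerUpOld() msg { return msg{kind: "StatusResponse", height: n.height} }

// onPeerUpNew mirrors the new behaviour: ask the peer for its height.
func (n *node) onPeerUpNew() msg { return msg{kind: "StatusRequest"} }

// receive handles one incoming message and returns a reply, if any.
func (n *node) receive(m msg) (msg, bool) {
	switch m.kind {
	case "StatusRequest":
		return msg{kind: "StatusResponse", height: n.height}, true
	case "StatusResponse":
		n.peerHeight = m.height
	}
	return msg{}, false
}
```

In this model, a pre-giga peer that never sends a StatusRequest on connect only learns the other node's height when the periodic StatusRequest broadcast triggers the next exchange.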


Reviewed by Cursor Bugbot for commit 3206044.

Contributor Author

This will be a minor, temporary regression until the upgrade completes.

Comment thread sei-tendermint/internal/blocksync/doc.go Outdated
Comment thread sei-tendermint/internal/blocksync/reactor.go Outdated
-        "height", first.Height,
-        "err", err)
-    return
+    return consensusHandoff{}, fmt.Errorf("first.MakePartSet(%d): %w", first.Height, err)
@wen-coding (Contributor) commented May 28, 2026

We used to log the error and stop blocksync; now we return an error, which may propagate back to the caller. Will this cause a panic?
Is that the intended behavior?

@pompon0 (Contributor Author) commented May 29, 2026

Silently stopping blocksync on error here would just make the node halt, so a panic is better. AFAICT MakePartSet can fail here only due to a serialization error. Since the blocks were already deserialized to get to this point, serialization is expected to always succeed.

Comment thread sei-tendermint/internal/blocksync/reactor.go Outdated
    errorsCh: make(chan peerError, maxPeerErrBuffer), // NOTE: capacity should exceed peer count.
    lastSyncRate: 0,
    router: router,
    reportErr: reportErr,
Contributor

If we can restructure AddBlock (see below), then maybe the test can read the error channel instead and we don't need this reportErr argument?

// height of the extended commit and the height of the block do not match, we
// do not add the block and return an error.
// TODO: ensure that blocks come in order for each peer.
func (pool *BlockPool) AddBlock(peerID types.NodeID, block *types.Block, blockSize int) error {
Contributor

How about we do:

func (pool *BlockPool) AddBlock(...) error {
    pendingErr, pendingPeerID, returnErr := pool.addBlockLocked(...)
    if pendingErr != nil {
        pool.sendError(pendingErr, pendingPeerID)   // statically outside the lock
    }
    return returnErr
}

func (pool *BlockPool) addBlockLocked(...) (pendingErr error, pendingPeerID types.NodeID, returnErr error) {
    pool.mtx.Lock()
    defer pool.mtx.Unlock()
    // ... logic ...
}

Then maybe we don't need the test which requires reportErr injection?
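Here is the reviewer's suggestion filled in as a small compilable sketch. The reduced `pool` fields, the validation rule (a block must come from the peer it was requested from), and the error texts are all assumptions for illustration. The point is only the shape: every decision is made inside `addBlockLocked` while the mutex is held, and the potentially blocking `sendError` runs after the deferred unlock has fired.

```go
package main

import (
	"fmt"
	"sync"
)

// peerError is a hypothetical error report destined for the pool's error channel.
type peerError struct {
	err    error
	peerID string
}

// pool is a hypothetical, heavily reduced BlockPool.
type pool struct {
	mtx      sync.Mutex
	expected map[int64]string // height -> peer the block was requested from
	errorsCh chan peerError
}

// sendError may block if errorsCh is full, so it must never run under mtx.
func (p *pool) sendError(err error, peerID string) {
	p.errorsCh <- peerError{err: err, peerID: peerID}
}

// AddBlock decides under the lock and reports after unlocking.
func (p *pool) AddBlock(peerID string, height int64) error {
	pending, returnErr := p.addBlockLocked(peerID, height)
	if pending != nil {
		p.sendError(pending.err, pending.peerID) // statically outside the lock
	}
	return returnErr
}

// addBlockLocked holds mtx for its whole body and only returns what to report.
func (p *pool) addBlockLocked(peerID string, height int64) (*peerError, error) {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	want, ok := p.expected[height]
	if !ok {
		return nil, fmt.Errorf("unrequested block at height %d", height)
	}
	if want != peerID {
		err := fmt.Errorf("peer %s sent block %d requested from %s", peerID, height, want)
		return &peerError{err: err, peerID: peerID}, err
	}
	delete(p.expected, height)
	return nil, nil
}
```

With this split, "sendError is never called under the lock" follows from the code structure itself, which is the property the existing `reportErr` injection test checks dynamically.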

Contributor Author

The point of the injection was to test that sendError is not performed under the lock, which requires being able to pause the goroutine while it is inside sendError. This is a preexisting test. Alternatively, I can remove the test, I suppose; waiting for goroutines to block on a channel is fragile logic to have.

Comment thread sei-tendermint/internal/blocksync/pool.go
cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 5891958.

Comment thread sei-tendermint/internal/blocksync/pool.go
@pompon0 pompon0 requested a review from wen-coding May 29, 2026 10:11
@pompon0 pompon0 added this pull request to the merge queue May 29, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026
@pompon0 pompon0 added this pull request to the merge queue May 29, 2026
Merged via the queue into main with commit c4e1a2a May 29, 2026
54 checks passed
@pompon0 pompon0 deleted the gprusak-blocksync branch May 29, 2026 17:23

3 participants