split blocksync reactor into 2 modes.#3511
Conversation
PR SummaryHigh Risk Overview Pool and concurrency are reworked. Wiring and RPC: Node construction passes Reviewed by Cursor Bugbot for commit 4f62265. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3511 +/- ##
==========================================
- Coverage 59.04% 58.22% -0.82%
==========================================
Files 2199 2129 -70
Lines 182096 173895 -8201
==========================================
- Hits 107513 101252 -6261
+ Misses 64933 63659 -1274
+ Partials 9650 8984 -666
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| return err | ||
| switch update.Status { | ||
| case p2p.PeerStatusUp: | ||
| s.channel.Send(wrap(&pb.StatusRequest{}), update.NodeID) |
There was a problem hiding this comment.
Peer-up handler sends StatusRequest instead of StatusResponse
Medium Severity
On PeerStatusUp, the old code sent a StatusResponse (advertising local height to the new peer), enabling the remote node to immediately learn this node's height. The new code sends a StatusRequest instead, which asks the peer for their height. While this node still learns about the peer, the peer no longer receives an immediate height advertisement. Peers that rely on receiving a StatusResponse on connection (e.g., pre-giga nodes during transition) won't learn this node's height until the next periodic StatusRequest broadcast (every 10 seconds).
Reviewed by Cursor Bugbot for commit 3206044. Configure here.
There was a problem hiding this comment.
this will be a minor temporary regression until upgrade completes.
| "height", first.Height, | ||
| "err", err) | ||
| return | ||
| return consensusHandoff{}, fmt.Errorf("first.MakePartSet(%d): %w", first.Height, err) |
There was a problem hiding this comment.
We used to log error and stop blocksync, now we return an error, which may propagate back to the caller, will this cause a panic?
Is that intended behavior?
There was a problem hiding this comment.
silently stopping blocksync on error here will just make the node halt, so panic is better. Afaict MakePartSet can fail only due to serialization error here. Since the blocks were already deserialized to get to this point, serialization is expected to always succeed.
| errorsCh: make(chan peerError, maxPeerErrBuffer), // NOTE: capacity should exceed peer count. | ||
| lastSyncRate: 0, | ||
| router: router, | ||
| reportErr: reportErr, |
There was a problem hiding this comment.
If we can restructure AddBlock (see below), then maybe the test can read the error channel instead and we don't need this reportErr argument?
| // height of the extended commit and the height of the block do not match, we | ||
| // do not add the block and return an error. | ||
| // TODO: ensure that blocks come in order for each peer. | ||
| func (pool *BlockPool) AddBlock(peerID types.NodeID, block *types.Block, blockSize int) error { |
There was a problem hiding this comment.
How about we do:
func (pool *BlockPool) AddBlock(...) error {
pendingErr, pendingPeerID, returnErr := pool.addBlockLocked(...)
if pendingErr != nil {
pool.sendError(pendingErr, pendingPeerID) // statically outside the lock
}
return returnErr
}
func (pool *BlockPool) addBlockLocked(...) (pendingErr error, pendingPeerID types.NodeID, returnErr error) {
pool.mtx.Lock()
defer pool.mtx.Unlock()
// ... logic ...
}
Then maybe we don't need the test which requires reportErr injection?
There was a problem hiding this comment.
The point of injection was to test that sendError is not performed under lock, which requires being able to pause the goroutine while inside sendError. This is a preexisting test. Alternatively I can remove the test, I suppose. Waiting for goroutines to block on channel is a fragile logic to have.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5891958. Configure here.


Autobahn nodes will need to be able to send pre-giga blocks to pre-giga nodes during transition period (so that pre-giga nodes can blocksync to the upgrade height). However all the remaining parts of the blocksync reactor should be disabled. This pr extracts the tendermint-only blocksync logic to syncController which is optional part of the blocksync reactor. Additionally I have