
The new mechanism of block synchronization using Index. #1378

Closed
ShawnYun opened this issue Dec 19, 2019 · 3 comments · Fixed by #1397
Labels
discussion Initial issue state - proposed but not yet accepted

Comments

@ShawnYun
Contributor

Background

Thanks to @ixje, @erikzhang, @vncoelho, @jsolman, and @shargon for the discussion in #522, #1138, and #781, which added a new P2P message, GetBlockData.
Now we will discuss the new mechanism of block synchronization using Index.

Do you have any solution you want to propose?

We recommend creating a new SyncManager class to manage synchronization, while TaskManager continues to handle Inv messages. The overall process is as follows:


Process:

  1. Get the latest block height of the remote node through Ping/Pong messages.

  2. Calculate the block segment to be synchronized from the current local block height and the remote node's block height. According to the task interval rule (for example, each task can synchronize at most 50 blocks), divide the block segment into several synchronization tasks. Select a suitable node according to the task assignment rule and assign a synchronization task to it. Save the startIndex, endIndex, task time, and other information of the synchronization task, then send the GetBlockData request.

  3. Each task maintains a BitArray to record the indexes of received blocks. Find the corresponding task for each received block, set the bit corresponding to the block's index to true, and then forward the received block to Blockchain.

  4. Blockchain verifies the block and persists it, then reports the index of the persisted block to SyncManager.

  5. SyncManager checks the index of the persisted block. If the index reaches the endIndex of a task, it scans the task list, deletes the completed task, and assigns a new task to the node.
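The per-task bookkeeping described in steps 2-5 could be sketched as follows. This is a minimal illustration, not actual neo code; the class and method names (SyncTask, mark_received, is_complete) are hypothetical.

```python
import time

class SyncTask:
    """Tracks one block segment [start_index, end_index] assigned to a node."""

    def __init__(self, start_index, end_index, node_id):
        self.start_index = start_index
        self.end_index = end_index
        self.node_id = node_id
        self.start_time = time.time()  # task time, used for timeout checks
        # one bit per block in the segment (step 3's BitArray)
        self.received = [False] * (end_index - start_index + 1)

    def mark_received(self, block_index):
        """Set the bit for a received block (step 3)."""
        if self.start_index <= block_index <= self.end_index:
            self.received[block_index - self.start_index] = True

    def is_complete(self):
        """True once every block in the segment has arrived."""
        return all(self.received)

task = SyncTask(101, 150, node_id="node-A")
for i in range(101, 151):
    task.mark_received(i)
print(task.is_complete())  # True
```

In the full mechanism, SyncManager would keep a list of such tasks and, per step 5, drop a task once the persisted-block index reaches its endIndex.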

Task assignment rule:
Assign task to the node with the lowest number of tasks.

Task interval rule:
startIndex is the endIndex of the previous task + 1; endIndex is the height that is greater than, and nearest to, startIndex and divisible by 50.

Exception handling:

  1. A timer checks for task timeouts; if a task times out, it is reassigned to another node.

  2. If Blockchain receives an invalid block, it notifies SyncManager, which reassigns the task to another node. (The node that sent the invalid block can be marked as a bad node.)
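The timeout check in rule 1 might look like this minimal sketch; the timeout value, dict layout, and function names are assumptions for illustration only.

```python
TASK_TIMEOUT = 30.0  # seconds; an assumed threshold

def reassign_timed_out(tasks, now, pick_other_node):
    """Scan tasks (dicts with 'start_time' and 'node_id') and move any
    task that exceeded the timeout to a different node."""
    for task in tasks:
        if now - task["start_time"] > TASK_TIMEOUT:
            task["node_id"] = pick_other_node(task["node_id"])
            task["start_time"] = now  # restart the timer for the new node

tasks = [{"start_time": 0.0, "node_id": "A"},
         {"start_time": 100.0, "node_id": "B"}]
reassign_timed_out(tasks, now=100.0, pick_other_node=lambda bad: "C")
print([t["node_id"] for t in tasks])  # ['C', 'B']
```

Rule 2 would reuse the same reassignment path, triggered by a verification failure instead of a timer.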

Neo Version

  • Neo 3

Where in the software does this update apply?

  • Ledger
  • P2P (TCP)
@ShawnYun added the discussion label Dec 19, 2019
@ixje
Contributor

ixje commented Dec 27, 2019

I'll share what I implemented for neo3-python to give ideas for the above mechanism.

I have split Node management and Sync management into 2 separate classes.

Node manager
The node manager is responsible for establishing and maintaining a pool of active connections to healthy NEO nodes. It runs three services on an interval.

  1. a connection pool monitor to ensure the pool meets the configured minimum and maximum client settings, attempting to fill any open spots.

  2. an address list filler, which asks connected nodes for new addresses to ensure there is always a new node to connect to.

  3. a connection pool monitor to ensure that each remote node's blockchain height keeps advancing (via PING/PONG messages). Nodes that are stuck are replaced by new nodes and tagged for poor performance.

In the unlikely event that all nodes fail and there are no more new addresses to connect to, the node manager will recycle all addresses that it historically was able to connect to.
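The stall check of service 3 amounts to comparing the heights reported in successive ping/pong rounds. A minimal sketch, with the function name and data shapes assumed:

```python
def find_stuck_nodes(prev_heights, curr_heights):
    """Return the ids of nodes whose reported best height did not
    advance between two ping/pong rounds."""
    return [node for node, height in curr_heights.items()
            if height <= prev_heights.get(node, -1)]

prev = {"A": 500, "B": 500}
curr = {"A": 510, "B": 500}
print(find_stuck_nodes(prev, curr))  # ['B']
```

The node manager would then disconnect the stuck nodes, tag them, and fill the freed pool slots from its address list.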

Sync manager
The sync manager is responsible for bringing the local blockchain in sync with the global blockchain and keeping it in sync.

The sync manager depends on the node manager to provide it with healthy nodes to request data from. It does three things on an interval:

  1. check the local blockchain height against the best height of the remote nodes. If any remote node has a better height, request the next blocks.

  2. monitor that the data requests of step 1 have not exceeded a configured time threshold. If the threshold is exceeded (by default, 5 seconds), tag the node for poor performance and request the data from another node.

    Tagging the node changes its weight. Weight is used by the node manager to determine which node from its pool to provide to the sync manager for its data request. A higher weight is returned first. Node weight is based on four attributes:

    1. avg speed - calculated from the size of the payload and the time it took the node to deliver the data.

    2. avg request time - the longer ago a node was used to request data, the higher the score. This helps load balance the data requests over all connected nodes.

    3. timeout count - if a node does not reply within the set timeout threshold then the weight is lowered. If it reaches a maximum timeout count, the node will be disconnected and replaced by a healthy node (done by the node manager).

    4. error count - if a node delivers faulty data (e.g. data that fails to deserialize, or unrequested data), the weight is lowered. If it reaches a maximum error count, the node will be disconnected and replaced by a healthy node (done by the node manager).

  3. finally, schedule received data for persisting. This part is Python specific, because there is no true parallelism, so it probably isn't relevant for C#.
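A toy weight function combining the four attributes might look as follows. The actual scoring in neo3-python differs; the function name, coefficients, and units here are all assumptions, meant only to show that fast, idle, well-behaved nodes rank higher.

```python
def node_weight(avg_speed, seconds_since_last_request,
                timeout_count, error_count):
    """Higher weight means the node is preferred for the next request."""
    score = avg_speed                     # faster delivery raises the weight
    score += seconds_since_last_request   # favors idle nodes (load balancing)
    score -= 10 * timeout_count           # penalize unresponsive nodes
    score -= 10 * error_count             # penalize faulty data
    return score

fast_idle = node_weight(100.0, 30.0, timeout_count=0, error_count=0)
slow_flaky = node_weight(20.0, 1.0, timeout_count=2, error_count=1)
print(fast_idle > slow_flaky)  # True
```

The disconnect-and-replace behavior at the maximum timeout or error count would sit in the node manager, outside this scoring function.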

If you have questions let me know.

@ShawnYun
Contributor Author

ShawnYun commented Dec 30, 2019

@ixje Thank you for your introduction to the Python implementation. Based on your sharing, I think the following mechanisms can be added in NEO3.

  1. Use Ping/Pong messages to detect and replace nodes that have been stuck for a long time.

  2. Add an assessment of node health, including the four attributes you mentioned above and possibly other attributes.

@ixje
Contributor

ixje commented Dec 30, 2019

@ShawnYun you're welcome.

I believe a mechanism equivalent to point 1 is a must, to avoid ever getting stuck with a list of peers that are also stuck. Point 2 is a nice-to-have and mostly depends on how well syncing works now.
