
The new mechanism of block synchronization using Index. #1378

Closed
ShawnYun opened this issue Dec 19, 2019 · 3 comments · Fixed by #1397
Labels
discussion Initial issue state - proposed but not yet accepted

Comments

@ShawnYun
Contributor

Background

Thanks to @ixje, @erikzhang, @vncoelho, @jsolman, and @shargon for the discussion in #522, #1138, and #781, which added a new P2P message, GetBlockData.
Now we will discuss the new mechanism of block synchronization using Index.

Do you have any solution you want to propose?

We recommend creating a new SyncManager class to manage synchronization, while TaskManager continues to handle Inv messages. The overall process is as follows:


Process:

  1. Get the latest block height of the remote node through Ping/Pong messages.

  2. Calculate the block segment to be synchronized from the current local block height and the remote node's block height. According to the task interval rule (for example, each task can synchronize at most 50 blocks), divide the block segment into several synchronization tasks. Select a suitable node according to the task assignment rule and assign a synchronization task to it. Save the startIndex, endIndex, task time, and other information of the synchronization task, then send the GetBlockData request.

  3. Each task maintains a BitArray to record the indexes of received blocks. Find the corresponding task for each received block, set the bit corresponding to the block's index to true, and then forward the received block to Blockchain.

  4. Blockchain verifies the block and persists it, then reports the index of the persisted block to SyncManager.

  5. SyncManager checks the index of the persisted block. If the index reaches the endIndex of a task, it scans the task list, deletes the completed task, and assigns a new task to the node.
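The per-task bookkeeping described in steps 2-5 could be sketched as follows. This is a minimal illustration, not actual neo code; the class and method names (SyncTask, mark_received, is_complete) are hypothetical.

```python
import time

class SyncTask:
    """Tracks one block segment [start_index, end_index] assigned to a node."""

    def __init__(self, start_index, end_index, node_id):
        self.start_index = start_index
        self.end_index = end_index
        self.node_id = node_id
        self.start_time = time.time()  # task time, used for timeout checks
        # one bit per block in the segment (step 3's BitArray)
        self.received = [False] * (end_index - start_index + 1)

    def mark_received(self, block_index):
        """Set the bit for a received block (step 3)."""
        if self.start_index <= block_index <= self.end_index:
            self.received[block_index - self.start_index] = True

    def is_complete(self):
        """True once every block in the segment has arrived."""
        return all(self.received)

task = SyncTask(101, 150, node_id="node-A")
for i in range(101, 151):
    task.mark_received(i)
print(task.is_complete())  # True
```

In the full mechanism, SyncManager would keep a list of such tasks and, per step 5, drop a task once the persisted-block index reaches its endIndex.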

Task assignment rule:
Assign task to the node with the lowest number of tasks.

Task interval rule:
startIndex is the endIndex of the previous task + 1; endIndex is the height that is greater than, and nearest to, startIndex and divisible by 50.

Exception handling:

  1. A timer checks for task timeouts; if a task times out, it is reassigned to another node.

  2. If Blockchain receives an invalid block, it notifies SyncManager, which reassigns the task to another node. (The node that sent the invalid block can be marked as a bad node.)
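The timeout check in rule 1 might look like this minimal sketch; the timeout value, dict layout, and function names are assumptions for illustration only.

```python
TASK_TIMEOUT = 30.0  # seconds; an assumed threshold

def reassign_timed_out(tasks, now, pick_other_node):
    """Scan tasks (dicts with 'start_time' and 'node_id') and move any
    task that exceeded the timeout to a different node."""
    for task in tasks:
        if now - task["start_time"] > TASK_TIMEOUT:
            task["node_id"] = pick_other_node(task["node_id"])
            task["start_time"] = now  # restart the timer for the new node

tasks = [{"start_time": 0.0, "node_id": "A"},
         {"start_time": 100.0, "node_id": "B"}]
reassign_timed_out(tasks, now=100.0, pick_other_node=lambda bad: "C")
print([t["node_id"] for t in tasks])  # ['C', 'B']
```

Rule 2 would reuse the same reassignment path, triggered by a verification failure instead of a timer.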

Neo Version

  • Neo 3

Where in the software does this update apply?

  • Ledger
  • P2P (TCP)
@ShawnYun added the discussion label Dec 19, 2019
@ixje
Contributor

ixje commented Dec 27, 2019

I'll share what I implemented for neo3-python to give ideas for the above mechanism.

I have split Node management and Sync management into 2 separate classes.

Node manager
The node manager is responsible for establishing and maintaining a pool of active connections to healthy NEO nodes. It runs three services on an interval.

  1. a connection pool monitor to ensure the pool meets the configured minimum and maximum client settings, attempting to fill any open spots.

  2. an address list filler, which asks connected nodes for new addresses to ensure there is always a new node to connect to.

  3. a connection pool monitor to ensure that each remote node's blockchain height keeps advancing (via PING/PONG messages). Nodes that are stuck are replaced by new nodes and tagged for poor performance.

In the unlikely event that all nodes fail and there are no more new addresses to connect to, the node manager will recycle all addresses that it historically was able to connect to.
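The stall check of service 3 amounts to comparing the heights reported in successive ping/pong rounds. A minimal sketch, with the function name and data shapes assumed:

```python
def find_stuck_nodes(prev_heights, curr_heights):
    """Return the ids of nodes whose reported best height did not
    advance between two ping/pong rounds."""
    return [node for node, height in curr_heights.items()
            if height <= prev_heights.get(node, -1)]

prev = {"A": 500, "B": 500}
curr = {"A": 510, "B": 500}
print(find_stuck_nodes(prev, curr))  # ['B']
```

The node manager would then disconnect the stuck nodes, tag them, and fill the freed pool slots from its address list.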

Sync manager
The sync manager is responsible for bringing the local blockchain in sync with the global blockchain and keeping it in sync.

The sync manager depends on the node manager to provide it with healthy nodes to request data from. It does three things on an interval:

  1. check the local blockchain height against the best height of the remote nodes. If any remote node has a better height, request the next blocks.

  2. monitor that the data requests of step 1 have not exceeded a configured time threshold. If the threshold is exceeded (by default, 5 seconds), tag the node for poor performance and request the data from another node.

    Tagging the node changes its weight. Weight is used by the node manager to determine which node from its pool to provide to the sync manager for its data request. A higher weight is returned first. Node weight is based on four attributes:

    1. avg speed - calculated from the size of the payload and the time it took the node to deliver the data.

    2. avg request time - the longer ago a node was used to request data, the higher the score. This helps load balance the data requests over all connected nodes.

    3. timeout count - if a node does not reply within the set timeout threshold then the weight is lowered. If it reaches a maximum timeout count, the node will be disconnected and replaced by a healthy node (done by the node manager).

    4. error count - if a node delivers faulty data (e.g. data that fails to deserialize, or unrequested data), the weight is lowered. If it reaches a maximum error count, the node will be disconnected and replaced by a healthy node (done by the node manager).

  3. finally, schedule received data for persisting. This part is Python specific, because there is no true parallelism, so it probably isn't relevant for C#.
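A toy weight function combining the four attributes might look as follows. The actual scoring in neo3-python differs; the function name, coefficients, and units here are all assumptions, meant only to show that fast, idle, well-behaved nodes rank higher.

```python
def node_weight(avg_speed, seconds_since_last_request,
                timeout_count, error_count):
    """Higher weight means the node is preferred for the next request."""
    score = avg_speed                     # faster delivery raises the weight
    score += seconds_since_last_request   # favors idle nodes (load balancing)
    score -= 10 * timeout_count           # penalize unresponsive nodes
    score -= 10 * error_count             # penalize faulty data
    return score

fast_idle = node_weight(100.0, 30.0, timeout_count=0, error_count=0)
slow_flaky = node_weight(20.0, 1.0, timeout_count=2, error_count=1)
print(fast_idle > slow_flaky)  # True
```

The disconnect-and-replace behavior at the maximum timeout or error count would sit in the node manager, outside this scoring function.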

If you have questions let me know.

@ShawnYun
Contributor Author

ShawnYun commented Dec 30, 2019

@ixje Thank you for your introduction to the Python implementation. Based on your sharing, I think the following mechanisms can be added in NEO3.

  1. Use Ping/Pong messages to detect and replace nodes that have been stuck for a long time.

  2. Add an assessment of node health, including the four attributes you mentioned above and possibly other attributes.

@ixje
Contributor

ixje commented Dec 30, 2019

@ShawnYun you're welcome.

I believe a mechanism equivalent to point 1 is a must, to avoid ever getting stuck with a list of peers that are also stuck. Point 2 is a nice-to-have and mostly depends on how well syncing works now.
