-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit reconnect node count metrics #12412
Comments
Here's a log from mainnet when a real reconnect happened a couple of weeks ago:
We can see that the |
Whether "any internal node lesson is redundant" depends on the reconnect algorithm. The current implementation transfers internal nodes to try and narrow down the branches of the tree that contain dirty leaves, so the internal node lessons aren't redundant but are in fact essential in order to avoid transferring all the leaves regardless of whether they are clean or dirty. And as example stats above show, it may in fact happen that some (or most, but not all) internal node transfers are redundant. However, in an ideal world (an algorithm that waits for confirmations such as the LevelByLevel traversal), the number of redundant internal node transfers should be minimal. To rephrase: depending on the used algorithm, it may in fact be important to differentiate between the total number of internal nodes transferred, and those that could've been avoided because they were in fact clean (aka the redundant transfers.) |
Given the considerations from above comments, here's a proposal for the counters to be collected:
This set of counters captures the work that the current reconnect implementation performs, and it should be usable for any new implementations that we may introduce or enable in the future. We can also revise this list in the future as needed. |
After each reconnect, learners report the following metrics in logs:
All metrics related to node counters are slightly confusing. In the current "push" reconnect implementation, they are set to
The problem here is that if a lesson is sent for a node, the lesson contains hashes for all its children. Today such a single lesson contributes to only totalNodes and one of internalNodes/leafNodes, but in reality information about more than one node is sent. redundantLeafNodes is always close to zero. It happens because virtual maps are large, and by the time when teacher sends a leaf node, a response from learner about the node is already received. For clean leaves, it looks like they are successfully skipped, but their hashes were still read by the teacher, sent to the learner (as a part of parent node's lesson), and then verified on the learner side. This should be reflected in leafNodes, but it is not. Same observation for redundantInternalNodes.
For virtual maps, another observation is any internal node lesson is redundant. Ideally, only dirty virtual leaves should be sent, since in virtual maps all data is in leaves. It means, for virtual maps, having internalNodes and redundantInternalNodes separate doesn't make sense. Only number of sent internal nodes does matter.
Here is my proposal (names are TBD):
For the current push implementation:
For pull-based implementation:
I am even thinking that these node counters should only be available for virtual maps, but not for other merkle nodes. There shouldn't be many non-virtual nodes these days anyway.
Finally, it would be helpful to see per-map counters in addition to stats about the whole merkle tree.
Tasks
The text was updated successfully, but these errors were encountered: