[Merged by Bors] - Improved peer management #2993
Conversation
Ok, this guy is looking much better. There were a few other things I had to add in. In particular, when we dial peers, libp2p sometimes doesn't return the peer-id when we cannot connect to a multiaddr. This leaves the peer in the Dialing state in the peerdb. This would be fine, except that we use the number of dialing peers in determining whether to accept incoming connections. I've added a 15-second timeout, such that if a peer stays in the dialing state for more than 15 seconds (i.e. libp2p has not informed us of its failure) we mark it as disconnected. This is handled in the peer manager.
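A minimal sketch of the dialing-timeout idea described above. The 15-second constant comes from the comment; the `PeerDb` struct, its fields, and the method names here are hypothetical simplifications, not Lighthouse's actual API:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical, simplified connection states; Lighthouse's real PeerDB is richer.
#[derive(Debug, PartialEq)]
enum ConnectionStatus {
    Dialing { since: Instant },
    Connected,
    Disconnected,
}

const DIAL_TIMEOUT: Duration = Duration::from_secs(15);

struct PeerDb {
    peers: HashMap<String, ConnectionStatus>,
}

impl PeerDb {
    /// Called periodically by the peer manager: any peer stuck in
    /// `Dialing` for longer than DIAL_TIMEOUT (i.e. libp2p never
    /// reported success or failure for the dial) is marked disconnected
    /// so it no longer counts toward the dialing total used when
    /// deciding whether to accept incoming connections.
    fn expire_stale_dials(&mut self, now: Instant) {
        for status in self.peers.values_mut() {
            if let ConnectionStatus::Dialing { since } = status {
                if now.duration_since(*since) > DIAL_TIMEOUT {
                    *status = ConnectionStatus::Disconnected;
                }
            }
        }
    }

    fn dialing_count(&self) -> usize {
        self.peers
            .values()
            .filter(|s| matches!(s, ConnectionStatus::Dialing { .. }))
            .count()
    }
}
```

The key design point is that the timeout is a safety net: in the normal case libp2p reports the dial outcome well before 15 seconds, and the state transitions happen through the usual event path.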
Force-pushed from dddcf53 to fa14cbe.
That's some convoluted logic, but looks quite reasonable
I ended up reading this quite thoroughly, and it looks great!
I just suggested a few style tweaks and one small change to the logic regarding removed sync committee peers
Review comments on beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs (outdated, resolved).
bors r+
Pull request successfully merged into unstable. Build succeeded.

Issue Addressed

I noticed in some logs some excess and unnecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we fall below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots.

This PR corrects this issue and advances the pruning mechanism to favour subnet peers.

An overview of the new logic added is:
- We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count.
- We only search for more peers if there is room to do so, and we are below the minimum outbound peer count, not the target. This gives us some buffer for peers to disconnect. The buffer is currently 10%.

The modified pruning logic is documented in the code but for reference it should do the following:
- Prune peers with bad scores first.
- If we need to prune more peers, prune peers that are not subscribed to a long-lived subnet.
- If we still need to prune peers, prune peers that we have a higher density of on any given subnet, which should drive toward a uniform distribution of peers across all subnets.

This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.
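The pruning priority and the discovery gate described above can be sketched roughly as follows. The `Peer` fields, helper names, and the example peer counts are hypothetical illustrations of the described logic, not Lighthouse's actual implementation:

```rust
// Hypothetical, simplified peer record for illustration.
#[derive(Debug, Clone)]
struct Peer {
    id: u64,
    score: f64,
    long_lived_subnets: usize, // long-lived subnet subscriptions
    subnet_density: usize,     // peers we already have on its busiest subnet
}

/// Order peers from most-prunable to least-prunable, following the
/// priority in the PR description:
/// 1. bad (negative) scores first,
/// 2. then peers not subscribed to any long-lived subnet,
/// 3. then peers on the most over-represented subnets, driving
///    toward a uniform spread of peers across subnets.
fn pruning_order(mut peers: Vec<Peer>) -> Vec<Peer> {
    peers.sort_by(|a, b| {
        let a_bad = a.score < 0.0;
        let b_bad = b.score < 0.0;
        b_bad
            .cmp(&a_bad) // bad-scored peers sort first
            // peers without long-lived subnets sort before subnet peers
            .then((a.long_lived_subnets > 0).cmp(&(b.long_lived_subnets > 0)))
            // among subnet peers, densest subnets are pruned first
            .then(b.subnet_density.cmp(&a.subnet_density))
    });
    peers
}

/// Only run a discovery query when there is actually room for new
/// peers AND we are below the *minimum* outbound count (not the
/// higher pruning target), per the PR description.
fn should_discover(peer_count: usize, max_peers: usize, outbound: usize, min_outbound: usize) -> bool {
    peer_count < max_peers && outbound < min_outbound
}
```

Separating the pruning target from the discovery threshold is what removes the futile queries: after a prune we sit between the minimum and the target, so no search fires until enough peers have actually disconnected.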