Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Split peer slots between full and light nodes #10688

Merged
merged 4 commits into from
Jan 19, 2022

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Jan 18, 2022

Close #8610

This PR adds a new CLI option --in-peers-light whose value defaults to 100.
The "existing slots" (--out-peers and --in-peers) are now reserved only for full nodes, and these 100 new additional slots are reserved for light clients.

Note that this concerns only the syncing, transactions, and GrandPa/Beefy protocols/peer sets. The parachains-related code and any other user-added protocol isn't concerned.

Motivation

The problem that this PR solves is that, currently, all the "in peers" of all nodes are always full. Since there is an equal number of "out peers" and "in peers", each full node on the network brings as many slots as it consumes.

Because some nodes are unfortunately behind a NAT, there are actually fewer available "in peers" than there are "out peers" being consumed, making the problem worse.

The obvious solution to that problem would be to increase the number of "in peers". While this solves the problem in theory, in practice older nodes have more chances of being discovered, and these older nodes would likely become full while younger nodes would not receive as many "in peers".

Having for example 125 full nodes connected to you consumes a huge amount of resources, as a node sends all transactions and GrandPa messages to all full nodes it is connected to.

This PR proposes another solution: the number of "in peers" for full nodes doesn't change, but we add new "in peers" specifically for light nodes. Having a light node connected to you consumes significantly fewer resources.

Actual behavior

For pragmatic reasons, the actual behavior of this PR is not what I've just described.
The code that assigns slots (the peerset) isn't aware of in/out slots, and I am too scared to do the necessary refactoring (paritytech/polkadot-sdk#556), as each big refactoring has lead to months of tedious debugging sessions.

Instead, in practice the node can right now open outbound substreams towards light clients.
This means that a certain number of slots, both out and in, are dedicated to full nodes, and the rest is dedicated to light nodes.

I think that this is fine, especially because light clients are typically unreachable and will be purged quite rapidly, and because light clients are not discoverable through the DHT but only because they've connected to us in the past.

@tomaka tomaka added A0-please_review Pull request needs code review. B5-clientnoteworthy C1-low PR touches the given topic and has a low impact on builders. labels Jan 18, 2022
@tomaka tomaka added this to Triage in Networking (Outdated) via automation Jan 18, 2022
@wigy-opensource-developer
Copy link
Contributor

"each big refactoring has lead to months of tedious debugging sessions" -> I hope that was not your best motivation speech ever 😝 Anyways, I will go through the code, but I guess it needs a few commits to get all CI checks green yet.

Copy link
Contributor

@wigy-opensource-developer wigy-opensource-developer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is typical to what I see around this part of the codebase. If you look at this single patch, it is easy to reason about it. But you have to look below and above to find a simple invariant:

in:u32 + out:u32 = in_light + (in_full + out) = light:usize + full:usize

There is lots of accidental complexity in the logic where we construct objects from the configuration. We might reason, parts of it is to stay backward compatible, but we still break that every now and then.

I am staring at the codebase for a month now trying to come up with a resolution to simplify this. Every protocol needs their own config parameters, and extra_sets does not allow to provide them. I mention this, because I wonder if there are light-client specific configuration that are relevant for protocols in extra_sets, too.

debug!(target: "sync", "Too many full nodes, rejecting {}", who);
self.behaviour.disconnect_peer(&who, HARDCODED_PEERSETS_SYNC);
return Err(())
} else if status.roles.is_light() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else is redundant, please get rid of it. I love that you used roles.is_light() here, although it is actually always !roles.is_full(), but having it made it easier to read the code. Why are we not worried here about overflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.peers contains both full and light peers, but self.sync only contains full peers, so the subtraction should always succeed

@tomaka
Copy link
Contributor Author

tomaka commented Jan 19, 2022

Every protocol needs their own config parameters, and extra_sets does not allow to provide them.

I swear I opened an issue about this, but I can't find it.
To me the solution is to refactor the public API of sc-network to make it possible to accept/reject peers based on their handshake. This is essentially what this PR right now is doing: accepting/rejecting peers based on their handshake.

There's no need to "inject" a set of accept/deny conditions to the networking, you just have to add a new state to a peer, which is "waiting for acceptance/denial by the upper layers".

This is not really doable right now because all new peers substreams events are reported to all the listeners, whereas making this accept/deny system work requires a specific higher-level owner for each peer-substream tuple.

We might reason, parts of it is to stay backward compatible, but we still break that every now and then.

I agree it's not great, but the fact that we're doing peer-to-peer networking is really helpful here, because nodes don't have to give a reason to deny a peer or close a substream.
The networking protocol doesn't give any guarantee of liveness, and so we are more or less free to break this.

@tomaka
Copy link
Contributor Author

tomaka commented Jan 19, 2022

Merging, as we want to fork off Polkadot 0.9.16 today.

@tomaka
Copy link
Contributor Author

tomaka commented Jan 19, 2022

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 104c2cd into paritytech:master Jan 19, 2022
Networking (Outdated) automation moved this from Triage to Done Jan 19, 2022
@tomaka tomaka deleted the light-slots branch January 19, 2022 10:58
tomaka added a commit to tomaka/polkadot that referenced this pull request Jan 24, 2022
paritytech-processbot bot pushed a commit that referenced this pull request Jan 24, 2022
chevdor pushed a commit that referenced this pull request Jan 24, 2022
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
* Split peer slots between full and light nodes

* Rustfmt

* Oops, accidentally removed a comma

* Remove else
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
* Split peer slots between full and light nodes

* Rustfmt

* Oops, accidentally removed a comma

* Remove else
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. C1-low PR touches the given topic and has a low impact on builders.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Reserved a certain amount of incoming slots for full nodes
3 participants