Split peer slots between full and light nodes #10688

tomaka · 2022-01-18T10:59:34Z

This PR adds a new CLI option --in-peers-light whose value defaults to 100.
The "existing slots" (--out-peers and --in-peers) are now reserved only for full nodes, and these 100 new additional slots are reserved for light clients.

Note that this only affects the syncing, transactions, and GrandPa/Beefy protocols/peer sets. The parachains-related code and any other user-added protocols are not affected.

Motivation

The problem that this PR solves is that, currently, all the "in peers" of all nodes are always full. Since there is an equal number of "out peers" and "in peers", each full node on the network brings as many slots as it consumes.

Because some nodes are unfortunately behind a NAT, there are actually fewer available "in peers" than there are "out peers" being consumed, making the problem worse.
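
To make the imbalance concrete, here is a rough back-of-the-envelope sketch. The node counts and slot counts used with it are illustrative assumptions, not measurements from any network, and `slot_balance` is a hypothetical helper that does not exist in the codebase.

```rust
/// Inbound slots offered minus outbound slots demanded across a network of
/// full nodes. All inputs are illustrative assumptions.
fn slot_balance(nodes: u64, reachable_nodes: u64, out_peers: u64, in_peers: u64) -> i64 {
    // Every full node tries to fill all of its outbound slots.
    let demanded = nodes * out_peers;
    // Only nodes that can accept incoming connections (i.e. not behind a NAT)
    // actually offer inbound slots to the rest of the network.
    let offered = reachable_nodes * in_peers;
    offered as i64 - demanded as i64
}
```

With 1000 nodes that are all reachable and 25 slots in each direction, the balance is exactly zero, so there is no spare inbound capacity. If only 800 of those nodes are reachable, the balance drops to -5000: that many outbound slots cannot find a free inbound slot.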

The obvious solution to that problem would be to increase the number of "in peers". While this solves the problem in theory, in practice older nodes are more likely to be discovered, so these older nodes would likely fill all their slots while younger nodes would not receive as many "in peers".

Having for example 125 full nodes connected to you consumes a huge amount of resources, as a node sends all transactions and GrandPa messages to all full nodes it is connected to.

This PR proposes another solution: the number of "in peers" for full nodes doesn't change, but we add new "in peers" specifically for light nodes. Having a light node connected to you consumes significantly fewer resources.

Actual behavior

For pragmatic reasons, the actual behavior of this PR is not what I've just described.
The code that assigns slots (the peerset) isn't aware of in/out slots, and I am too scared to do the necessary refactoring (paritytech/polkadot-sdk#556), as each big refactoring has led to months of tedious debugging sessions.

Instead, in practice the node can right now open outbound substreams towards light clients.
This means that a certain number of slots, both out and in, are dedicated to full nodes, and the rest is dedicated to light nodes.
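
A minimal sketch of the resulting split, assuming the limits are derived directly from the three CLI values. `SlotLimits` and `slot_limits` are hypothetical names used for illustration only; the actual types in the networking code may differ.

```rust
/// Hypothetical per-role limits derived from the CLI values.
#[derive(Debug, PartialEq)]
struct SlotLimits {
    /// Maximum number of full-node peers, inbound and outbound combined.
    full: u32,
    /// Maximum number of light-client peers.
    light: u32,
}

/// The pre-existing slots (`--out-peers` and `--in-peers`) go to full nodes;
/// the new `--in-peers-light` slots go to light clients.
fn slot_limits(out_peers: u32, in_peers: u32, in_peers_light: u32) -> SlotLimits {
    SlotLimits {
        // Saturate rather than wrap if someone passes absurdly large values.
        full: out_peers.saturating_add(in_peers),
        light: in_peers_light,
    }
}
```

For example, assuming 25 slots in each direction together with the default of 100 light slots, this gives room for 50 full nodes and 100 light clients, regardless of which side opened the connection.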

I think that this is fine, especially because light clients are typically unreachable and will be purged quite rapidly, and because light clients are not discoverable through the DHT: we only know about them because they've connected to us in the past.

wigy-opensource-developer · 2022-01-18T12:40:08Z

"each big refactoring has led to months of tedious debugging sessions" -> I hope that was not your best motivational speech ever 😝 Anyway, I will go through the code, but I guess it still needs a few commits to get all CI checks green.

wigy-opensource-developer

This change is typical of what I see around this part of the codebase. If you look at this single patch, it is easy to reason about it. But you have to look below and above to find a simple invariant:

in:u32 + out:u32 = in_light + (in_full + out) = light:usize + full:usize

There is a lot of accidental complexity in the logic where we construct objects from the configuration. We might reason that parts of it are there to stay backward compatible, but we still break that every now and then.

I have been staring at the codebase for a month now, trying to come up with a way to simplify this. Every protocol needs its own config parameters, and extra_sets does not allow providing them. I mention this because I wonder whether any light-client-specific configuration is relevant for protocols in extra_sets, too.

wigy-opensource-developer · 2022-01-19T08:36:49Z

client/network/src/protocol.rs

+			debug!(target: "sync", "Too many full nodes, rejecting {}", who);
+			self.behaviour.disconnect_peer(&who, HARDCODED_PEERSETS_SYNC);
+			return Err(())
+		} else if status.roles.is_light() &&


This else is redundant, please get rid of it. I love that you used roles.is_light() here: it is actually always equal to !roles.is_full(), but spelling it out makes the code easier to read. Why are we not worried about overflow here?

self.peers contains both full and light peers, but self.sync only contains full peers, so the subtraction should always succeed
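
The reasoning in this thread can be sketched as follows. This is a simplified illustration, not the code from client/network/src/protocol.rs: `Role`, `Limits`, `Rejection`, and `check_handshake` are hypothetical names, and the peer counts are passed in as plain integers instead of being read from `self.peers` and `self.sync`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Full,
    Light,
}

#[derive(Debug, PartialEq)]
enum Rejection {
    TooManyFull,
    TooManyLight,
}

/// Hypothetical limits; in the PR these would come from the CLI options.
struct Limits {
    max_full: usize,
    max_light: usize,
}

/// `total_peers` counts every accepted peer, `full_peers` only the full nodes.
fn check_handshake(
    role: Role,
    total_peers: usize,
    full_peers: usize,
    limits: &Limits,
) -> Result<(), Rejection> {
    // Full peers are a subset of all peers, so this cannot underflow as long
    // as that invariant holds. `checked_sub` makes the assumption explicit.
    let light_peers = total_peers
        .checked_sub(full_peers)
        .expect("full peers are a subset of all peers");

    match role {
        Role::Full if full_peers >= limits.max_full => Err(Rejection::TooManyFull),
        Role::Light if light_peers >= limits.max_light => Err(Rejection::TooManyLight),
        _ => Ok(()),
    }
}
```

Using `checked_sub` with an `expect` message is one way to document why the subtraction is safe; a plain `-` behaves the same while the invariant holds.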

tomaka · 2022-01-19T09:34:55Z

"Every protocol needs its own config parameters, and extra_sets does not allow providing them."

I swear I opened an issue about this, but I can't find it.
To me the solution is to refactor the public API of sc-network to make it possible to accept/reject peers based on their handshake. This is essentially what this PR right now is doing: accepting/rejecting peers based on their handshake.

There's no need to "inject" a set of accept/deny conditions to the networking, you just have to add a new state to a peer, which is "waiting for acceptance/denial by the upper layers".

This is not really doable right now because all new peer-substream events are reported to all the listeners, whereas making this accept/deny system work requires a specific higher-level owner for each peer-substream tuple.
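
As a rough sketch of what such an extra state could look like (purely hypothetical; none of these names exist in sc-network, and peer ids and protocol names are plain strings here for simplicity):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PeerState {
    /// Handshake received; waiting for the upper layer to accept or deny.
    Pending,
    Accepted,
    Rejected,
}

/// Tracks one state per (peer id, protocol name) tuple.
#[derive(Default)]
struct Substreams {
    states: HashMap<(String, String), PeerState>,
}

impl Substreams {
    /// Records that a handshake arrived and a decision is now pending.
    fn on_handshake(&mut self, peer: &str, protocol: &str) {
        self.states
            .insert((peer.to_owned(), protocol.to_owned()), PeerState::Pending);
    }

    /// Called by the single owner of this (peer, protocol) tuple.
    /// Returns `false` if there was no pending decision to make.
    fn decide(&mut self, peer: &str, protocol: &str, accept: bool) -> bool {
        match self.states.get_mut(&(peer.to_owned(), protocol.to_owned())) {
            Some(state) if *state == PeerState::Pending => {
                *state = if accept { PeerState::Accepted } else { PeerState::Rejected };
                true
            }
            _ => false,
        }
    }

    /// Current state of the tuple, if a handshake was ever recorded for it.
    fn state(&self, peer: &str, protocol: &str) -> Option<PeerState> {
        self.states
            .get(&(peer.to_owned(), protocol.to_owned()))
            .copied()
    }
}
```

The important property is that `decide` only makes sense if exactly one higher-level component is responsible for each tuple, which is the ownership requirement described above.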

"We might reason that parts of it are there to stay backward compatible, but we still break that every now and then."

I agree it's not great, but the fact that we're doing peer-to-peer networking is really helpful here, because nodes don't have to give a reason to deny a peer or close a substream.
The networking protocol doesn't give any guarantee of liveness, and so we are more or less free to break this.

tomaka · 2022-01-19T10:58:20Z

Merging, as we want to fork off Polkadot 0.9.16 today.

tomaka · 2022-01-19T10:58:31Z

bot merge

* Split peer slots between full and light nodes
* Rustfmt
* Oops, accidentally removed a comma
* Remove else

Split peer slots between full and light nodes

69e501c

tomaka added the A0-please_review (Pull request needs code review), B5-clientnoteworthy, and C1-low (PR touches the given topic and has a low impact on builders) labels Jan 18, 2022

tomaka requested review from kpp, wigy-opensource-developer and arkpar January 18, 2022 10:59

tomaka added this to Triage in Networking (Outdated) via automation Jan 18, 2022

Rustfmt

0875dc5

arkpar approved these changes Jan 18, 2022


Oops, accidentally removed a comma

571fe5a

wigy-opensource-developer approved these changes Jan 19, 2022


Remove else

fd87416

paritytech-processbot bot merged commit 104c2cd into paritytech:master Jan 19, 2022

Networking (Outdated) automation moved this from Triage to Done Jan 19, 2022

tomaka deleted the light-slots branch January 19, 2022 10:58

tomaka added a commit to tomaka/polkadot that referenced this pull request Jan 24, 2022

Fix paritytech#10688 being misimplemented

14c7a0c

tomaka mentioned this pull request Jan 24, 2022

Fix #10688 being misimplemented #10721

Merged

paritytech-processbot bot pushed a commit that referenced this pull request Jan 24, 2022

Fix #10688 being misimplemented (#10721)

b57c08b

chevdor pushed a commit that referenced this pull request Jan 24, 2022

Fix #10688 being misimplemented (#10721)

ac8c2cd

tomaka mentioned this pull request Feb 5, 2022

https://github.com/paritytech/substrate/pull/10688 doesn't take reserved nodes into account paritytech/polkadot-sdk#533

Open

librelois mentioned this pull request Feb 7, 2022

Update substrate/polkadot/cumulus from v0.9.15 to v0.9.16 moonbeam-foundation/moonbeam#1259

Closed

grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022

Split peer slots between full and light nodes (paritytech#10688)

12de528

* Split peer slots between full and light nodes
* Rustfmt
* Oops, accidentally removed a comma
* Remove else

grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022

Fix paritytech#10688 being misimplemented (paritytech#10721)

d2cf601

ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023

Split peer slots between full and light nodes (paritytech#10688)

60f350e

* Split peer slots between full and light nodes
* Rustfmt
* Oops, accidentally removed a comma
* Remove else

ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023

Fix paritytech#10688 being misimplemented (paritytech#10721)

9611071
