core/muxing: Use total number of alive inbound streams for backpressure #2878

Closed
wants to merge 21 commits

Conversation

thomaseizinger
Contributor

@thomaseizinger thomaseizinger commented Sep 8, 2022

Description

A PoC implementation for what is described in #2865.

Links to any relevant issues

Open tasks

Open Questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Member

@mxinden mxinden left a comment

Thanks for writing this POC!

From what I can see here, I am in favor of this proposal.

I would like to only merge here once we have at least one user of this API. I would guess that the best user for this would be libp2p-metrics.

While we could emit a new SwarmEvent on each new stream, I wonder how we could inform libp2p-metrics once a stream closes.

In addition to the number of substreams, it would be wonderful to also expose the protocol that is being negotiated on the substream. That would enable us to expose the following metric:

libp2p_swarm_stream { direction = "inbound", protocol = "/libp2p/kad/1.0.0" } 10

@thomaseizinger
Contributor Author

thomaseizinger commented Sep 15, 2022

> Thanks for writing this POC!
>
> From what I can see here, I am in favor of this proposal.
>
> I would like to only merge here once we have at least one user of this API. I would guess that the best user for this would be libp2p-metrics.

Yes, makes sense. I see the first user in swarm::Connection but for that #2861 needs to merge first. Together with #2861, this will give us actual backpressure on the number of open streams and not just the ones that are currently negotiating.

> While we could emit a new SwarmEvent on each new stream, I wonder how we could inform libp2p-metrics once a stream closes.
>
> In addition to the number of substreams, it would be wonderful to also expose the protocol that is being negotiated on the substream. That would enable us to expose the following metric:
>
> libp2p_swarm_stream { direction = "inbound", protocol = "/libp2p/kad/1.0.0" } 10

I've done something like this in the past. If we make swarm::Connection aware of metrics, it is fairly easy to do.

@thomaseizinger thomaseizinger changed the title core/muxing: Track number of active inbound and outbound streams core/muxing: Use total number of alive inbound streams for backpressure Sep 22, 2022
@thomaseizinger thomaseizinger marked this pull request as ready for review September 22, 2022 07:47
Comment on lines 3 to 10
- Use the total number of alive inbound streams for back-pressure. This can have a BIG impact on your application
  depending on how it uses `libp2p`. Previously, the limit for inbound streams per connection only applied to the
  _upgrade_ phase, i.e. for the time `InboundUpgrade` was running. Any stream being returned from `InboundUpgrade` and
  given to the `ConnectionHandler` did not count towards that limit, essentially undermining the back-pressure mechanism.
  With this release, substreams count towards that limit until they are dropped and thus we actually enforce how many
  inbound streams can be active at one time _per connection_. `libp2p` will not accept any more incoming streams once
  that limit is hit. If you experience stalls or unaccepted streams in your application, consider upping the limit via
  `SwarmBuilder::max_negotiating_inbound_streams`. See [PR 2878].
Contributor Author

Not sure if I am giving this too much attention, but it feels like a really important change to me.

Member

@mxinden mxinden Sep 26, 2022

I have to give this more thought. Previously I was under the impression that this would be a feature for reporting only, not one that actually enforces a limit.

As of today I am undecided whether the number of established inbound streams should have a global connection maximum, or whether it should only have a maximum per ConnectionHandler implementation, potentially coordinated with the NetworkBehaviour and thus a maximum per NetworkBehaviour implementation. I am not sure the global application has enough knowledge to set a suitable limit for all protocols on a single connection, nor do I think a static limit per connection is a good idea in the first place.

Ideally I would like to enforce a limit for connections only. With #2828 a connection limit could e.g. be dynamic based on the current memory utilization. Protocols, e.g. Kademlia, would enforce a streams-per-connection limit in their ConnectionHandler implementation. That limit would be a value of maximum expected parallelization, e.g. 16 (as in "we don't expect an implementation to handle more than 16 requests in parallel").

What do you think @thomaseizinger?

Contributor Author

If we want the limit to be dynamic, all we need to do is add a function to ConnectionHandler that allows us to query the max number of allowed streams:

trait ConnectionHandler {
	fn max_inbound_streams(&self) -> usize {
		128 // This is today's default.
	}
}

We can then use this limit on every iteration of Connection::poll to check if we should poll for more inbound streams.
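
For illustration, here is a minimal, self-contained sketch of that check. The trait and function below are made-up stand-ins, not the actual libp2p types.

/// Illustrative stand-in for the proposed `ConnectionHandler` method;
/// not the real libp2p trait.
trait HandlerWithStreamLimit {
    /// Maximum number of inbound streams this handler wants alive at once.
    fn max_inbound_streams(&self) -> usize {
        128 // Today's default, as in the snippet above.
    }
}

/// The check the connection task could perform on every poll iteration:
/// only request another inbound stream from the muxer while the number
/// of currently alive inbound streams is below the handler's limit.
fn should_poll_for_inbound<H: HandlerWithStreamLimit>(
    handler: &H,
    alive_inbound_streams: usize,
) -> bool {
    alive_inbound_streams < handler.max_inbound_streams()
}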

If we move forward with #2863 then the notion of upgrades will be gone and this limit will no longer mean anything. Thus, I am inclined to say we should remove this from the Swarm and Pool altogether and only have the ConnectionHandler decide.

Member

> If we move forward with #2863 then the notion of upgrades will be gone and this limit will no longer mean anything. Thus, I am inclined to say we should remove this from the Swarm and Pool altogether and only have the ConnectionHandler decide.

That would be my preference.

Contributor Author

Let me know what you think of 40ab076 (#2878).

This will result in peers never being able to open a substream
and we currently depend on this behaviour, despite the upgrade
never being successful.
These will eventually go away so we don't bother replacing the links.
@thomaseizinger
Contributor Author

@mxinden Updated the PR description with an open question.

@thomaseizinger
Contributor Author

@mxinden In #2957, a use case for tracking the number of substreams in metrics came up.

I can't think of a way of recording this, though, if we don't want to take on a dependency on prometheus-client. If we had a metric instance inside Connection, I could update it on every loop iteration. We could emit events every time we open or accept a new substream?
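
For illustration only, a rough sketch of what such a metric could look like if it were backed by a prometheus-client gauge family keyed by direction and protocol. The label struct, metric name and registration shown here are assumptions (and the exact derive/registration API differs between prometheus-client versions), not something this PR adds.

use prometheus_client::encoding::EncodeLabelSet;
use prometheus_client::metrics::family::Family;
use prometheus_client::metrics::gauge::Gauge;
use prometheus_client::registry::Registry;

/// Hypothetical label set; not an existing type in libp2p-metrics.
#[derive(Clone, Debug, Hash, PartialEq, Eq, EncodeLabelSet)]
struct StreamLabels {
    direction: String,
    protocol: String,
}

fn main() {
    let mut registry = Registry::default();

    // One gauge per (direction, protocol) pair, matching the sample
    // output quoted earlier in this thread.
    let streams = Family::<StreamLabels, Gauge>::default();
    registry.register(
        "libp2p_swarm_stream",
        "Number of alive streams per direction and protocol",
        streams.clone(),
    );

    // Whatever observes "stream opened" / "stream closed" events would
    // bump the gauge accordingly.
    let labels = StreamLabels {
        direction: "inbound".into(),
        protocol: "/libp2p/kad/1.0.0".into(),
    };
    streams.get_or_create(&labels).inc();
    // ...and `.dec()` once the stream is dropped.
}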

@thomaseizinger
Contributor Author

@AgeManning I'd be curious what you think of this PR with regard to what it changes in gossipsub (534c3f6 (#2878)).

The gist is that we can now enforce how many substreams to accept from the remote. Given that GossipSub should only ever have one inbound stream, I set this to one.

@AgeManning
Contributor

Yeah, the changes look fine to me. We only have one inbound substream, as you've pointed out, so I don't see any issue here.

@thomaseizinger
Contributor Author

thomaseizinger commented Oct 14, 2022

I can't reproduce this error on my machine. Has anyone else had any luck?

Also, can you have another look at this, @mxinden? I implemented a per-connection configurable limit now, as requested in #2878 (comment) :)

Member

@mxinden mxinden left a comment

I think long term this is something we should be doing. I am not yet sure whether the current implementation allows collaborative usage of these limits across ConnectionHandler implementations.

In my eyes, before we do this, we should tackle #2863 first, thus simplifying the sets (pending-multistream, pending-upgrade, negotiated) of inbound streams.

Another thought would be to redesign max_inbound_streams to be closer to Sink::poll_ready:

fn poll_new_inbound_stream_ready(&self, cx: &mut Context<'_>) -> Poll<()>

A ConnectionHandler could thus signal asynchronously whether it is willing to accept new streams.
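
Expanded a little, that could look something like the following. This is purely a sketch: the trait, types and example implementation are illustrative, not the actual ConnectionHandler API.

use std::task::{Context, Poll};

/// Sketch of a `Sink::poll_ready`-style readiness check for inbound streams.
trait InboundStreamReadiness {
    /// Returns `Poll::Ready(())` when the handler is willing to accept one
    /// more inbound stream. Otherwise it would register the waker from `cx`
    /// and return `Poll::Pending`, to be woken once capacity frees up.
    fn poll_new_inbound_stream_ready(&mut self, cx: &mut Context<'_>) -> Poll<()>;
}

/// Example implementation that checks a simple counter against a limit.
struct CountingHandler {
    alive_inbound: usize,
    limit: usize,
}

impl InboundStreamReadiness for CountingHandler {
    fn poll_new_inbound_stream_ready(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
        if self.alive_inbound < self.limit {
            Poll::Ready(())
        } else {
            // A real implementation would store the waker and wake it when a
            // stream closes; omitted here for brevity.
            Poll::Pending
        }
    }
}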

Also adding this to the agenda for the next rust-libp2p community call.

Comment on lines +120 to +125
/// The maximum number of inbound substreams allowed on the underlying connection.
///
/// Once this limit is hit, we will stop accepting new inbound streams from the remote.
fn max_inbound_streams(&self) -> usize {
    DEFAULT_MAX_INBOUND_STREAMS
}
Member

In case an implementation of ConnectionHandler does not override this method but accepts DEFAULT_MAX_INBOUND_STREAMS streams, it would starve all other ConnectionHandler implementations of further inbound streams, correct?

Contributor Author

From the connection task PoV, there is only one ConnectionHandler. I think if we get the combinators right, this shouldn't be an issue?

self.handlers
    .values()
    .map(|h| h.max_inbound_streams())
    .max()
Member

Shouldn't this be the sum? I.e. the sum of all limits equals the total limit.

    .values()
    .map(|h| h.max_inbound_streams())
    .max()
    .unwrap_or(0) // No handlers? No substreams.
Member

:D

@thomaseizinger
Contributor Author

> I think long term this is something we should be doing. I am not yet sure whether the current implementation allows collaborative usage of these limits across ConnectionHandler implementations.

Do you think the proposed implementation is an improvement?

I think it is, but I agree with you that it can be even better.

> In my eyes, before we do this, we should tackle #2863 first, thus simplifying the sets (pending-multistream, pending-upgrade, negotiated) of inbound streams.

Funny, I see this as a step towards that solution. With this PR, a substream counts towards a limit as soon as it exists and not just during the upgrade phase.

The next idea (which could be done in parallel) for #2863 is to make less use of upgrades in other protocols and instead build a FuturesUnordered-like primitive that implementations can use to do the upgrade themselves.
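
To make that idea a bit more concrete, here is a very rough sketch of such a primitive. All names are made up for illustration; this is not an API that exists in rust-libp2p.

use futures::future::BoxFuture;
use futures::stream::{FuturesUnordered, StreamExt};
use std::task::{Context, Poll};

/// Rough sketch of a `FuturesUnordered`-like helper that a
/// `ConnectionHandler` could use to drive its own upgrades instead of
/// relying on the dedicated upgrade phase.
struct StreamUpgrades<T> {
    inner: FuturesUnordered<BoxFuture<'static, T>>,
}

impl<T> StreamUpgrades<T> {
    fn new() -> Self {
        Self {
            inner: FuturesUnordered::new(),
        }
    }

    /// Queue a negotiation/upgrade future for a freshly opened stream.
    fn push(&mut self, upgrade: BoxFuture<'static, T>) {
        self.inner.push(upgrade);
    }

    /// Poll for the next completed upgrade; to be called from the
    /// handler's own poll function.
    fn poll_next_completed(&mut self, cx: &mut Context<'_>) -> Poll<Option<T>> {
        self.inner.poll_next_unpin(cx)
    }
}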

> Another thought would be to redesign max_inbound_streams to be closer to Sink::poll_ready:
>
> fn poll_new_inbound_stream_ready(&self, cx: &mut Context<'_>) -> Poll<()>
>
> A ConnectionHandler could thus signal asynchronously whether it is willing to accept new streams.

This is an interesting idea! Should a ConnectionHandler perhaps have multiple poll functions (see the sketch after this list)?

  • poll_ready_inbound
  • poll_new_outbound
  • poll_event
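
A hedged sketch of what such a split could look like; the method names follow the list above, while the trait itself and its associated types are made up for illustration and are not the real ConnectionHandler API.

use std::task::{Context, Poll};

/// Illustrative split of the handler's poll surface into three methods.
trait SplitPollHandler {
    type OutboundRequest;
    type Event;

    /// Ready when the handler can accept one more inbound stream.
    fn poll_ready_inbound(&mut self, cx: &mut Context<'_>) -> Poll<()>;

    /// Ready when the handler wants to open a new outbound stream.
    fn poll_new_outbound(&mut self, cx: &mut Context<'_>) -> Poll<Self::OutboundRequest>;

    /// Drives the handler's internal state and surfaces events to the swarm.
    fn poll_event(&mut self, cx: &mut Context<'_>) -> Poll<Self::Event>;
}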

> Also adding this to the agenda for the next rust-libp2p community call.

👍

@thomaseizinger
Contributor Author

Setting to draft until we reach a decision.

@thomaseizinger thomaseizinger marked this pull request as draft November 2, 2022 04:06
@thomaseizinger
Contributor Author

Closing because stale.

@thomaseizinger thomaseizinger deleted the track-alive-substreams branch April 26, 2023 16:38