core/muxing: Use total number of alive inbound streams for backpressure #2878

Closed
wants to merge 21 commits

Conversation

thomaseizinger
Contributor

@thomaseizinger thomaseizinger commented Sep 8, 2022

Description

A PoC implementation for what is described in #2865.

Links to any relevant issues

Open tasks

Open Questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Member

@mxinden mxinden left a comment

Thanks for writing this POC!

From what I can see here, I am in favor of this proposal.

I would like to only merge here once we have at least one user of this API. I would guess that the best user for this would be libp2p-metrics.

While we could emit a new SwarmEvent on each new stream, I wonder how we could inform libp2p-metrics once a stream closes.

In addition to the number of substreams, it would be wonderful to also expose the protocol that is being negotiated on the substream. That would enable us to expose the following metric:

libp2p_swarm_stream { direction = "inbound", protocol = "/libp2p/kad/1.0.0" } 10

@thomaseizinger
Contributor Author

thomaseizinger commented Sep 15, 2022

> Thanks for writing this POC!
>
> From what I can see here, I am in favor of this proposal.
>
> I would like to only merge here once we have at least one user of this API. I would guess that the best user for this would be libp2p-metrics.

Yes, makes sense. I see the first user in swarm::Connection but for that #2861 needs to merge first. Together with #2861, this will give us actual backpressure on the number of open streams and not just the ones that are currently negotiating.

> While we could emit a new SwarmEvent on each new stream, I wonder how we could inform libp2p-metrics once a stream closes.
>
> In addition to the number of substreams, it would be wonderful to also expose the protocol that is being negotiated on the substream. That would enable us to expose the following metric:
>
> libp2p_swarm_stream { direction = "inbound", protocol = "/libp2p/kad/1.0.0" } 10

I've done something like this in the past. If we make swarm::Connection aware of metrics, it is fairly easy to do.

@thomaseizinger thomaseizinger changed the title core/muxing: Track number of active inbound and outbound streams core/muxing: Use total number of alive inbound streams for backpressure Sep 22, 2022
@thomaseizinger thomaseizinger marked this pull request as ready for review September 22, 2022 07:47
Comment on lines 3 to 10
- Use the total number of alive inbound streams for back-pressure. This can have a BIG impact on your application
  depending on how it uses `libp2p`. Previously, the limit for inbound streams per connection only applied to the
  _upgrade_ phase, i.e. for the time `InboundUpgrade` was running. Any stream being returned from `InboundUpgrade` and
  given to the `ConnectionHandler` did not count towards that limit, essentially undermining the back-pressure mechanism.
  With this release, substreams count towards that limit until they are dropped and thus we actually enforce how many
  inbound streams can be active at one time _per connection_. `libp2p` will not accept any more incoming streams once
  that limit is hit. If you experience stalls or unaccepted streams in your application, consider upping the limit via
  `SwarmBuilder::max_negotiating_inbound_streams`. See [PR 2878].
Contributor Author

Not sure if I am giving this too much attention, but it feels like a really important change to me.

Member

@mxinden mxinden Sep 26, 2022

I have to give this more thought. Previously I was under the impression that this would be a feature for reporting only, not one that actually enforces a limit.

As of today I am undecided whether the number of established inbound streams should have a global connection maximum, or whether it should only have a maximum per ConnectionHandler implementation, potentially coordinated with the NetworkBehaviour and thus a maximum per NetworkBehaviour implementation. I am not sure the global application has enough knowledge to set a suitable limit for all protocols on a single connection, nor do I think a static limit per connection is a good idea in the first place.

Ideally I would like to enforce a limit for connections only. With #2828 a connection limit could e.g. be dynamic based on the current memory utilization. Protocols, e.g. Kademlia, would enforce a streams-per-connection limit in their ConnectionHandler implementation. That limit would be a value of maximum expected parallelization, e.g. 16 (as in "we don't expect an implementation to handle more than 16 requests in parallel").

What do you think @thomaseizinger?

Contributor Author

If we want the limit to be dynamic, all we need to do is add a function to ConnectionHandler that allows us to query the max number of allowed streams:

trait ConnectionHandler {
	fn max_inbound_streams(&self) -> usize {
		128 // This is today's default.
	}
}

We can then use this limit on every iteration of Connection::poll to check if we should poll for more inbound streams.
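
For illustration, here is a minimal, self-contained sketch of that check. The trait and function below are made-up stand-ins, not the actual libp2p types.

/// Illustrative stand-in for the proposed `ConnectionHandler` method;
/// not the real libp2p trait.
trait HandlerWithStreamLimit {
    /// Maximum number of inbound streams this handler wants alive at once.
    fn max_inbound_streams(&self) -> usize {
        128 // Today's default, as in the snippet above.
    }
}

/// The check the connection task could perform on every poll iteration:
/// only request another inbound stream from the muxer while the number
/// of currently alive inbound streams is below the handler's limit.
fn should_poll_for_inbound<H: HandlerWithStreamLimit>(
    handler: &H,
    alive_inbound_streams: usize,
) -> bool {
    alive_inbound_streams < handler.max_inbound_streams()
}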

If we move forward with #2863 then the notion of upgrades will be gone and this limit will no longer mean anything. Thus, I am inclined to say we should remove this from the Swarm and Pool altogether and only have the ConnectionHandler decide.

Member

> If we move forward with #2863 then the notion of upgrades will be gone and this limit will no longer mean anything. Thus, I am inclined to say we should remove this from the Swarm and Pool altogether and only have the ConnectionHandler decide.

That would be my preference.

Contributor Author

Let me know what you think of 40ab076 (#2878).

This will result in peers never being able to open a substream
and we currently depend on this behaviour, despite the upgrade
never being successful.
These will eventually go away so we don't bother replacing the links.
@thomaseizinger
Contributor Author

@mxinden Updated the PR description with an open question.

@thomaseizinger
Contributor Author

@mxinden In #2957, a use case for tracking the number of substreams in metrics came up.

I can't think of a way of recording this, though, if we don't want to take on a dependency on prometheus-client. If we had a metric instance inside Connection, I could update it on every loop iteration. We could emit events every time we open or accept a new substream?
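
For illustration only, a rough sketch of what such a metric could look like if it were backed by a prometheus-client gauge family keyed by direction and protocol. The label struct, metric name and registration shown here are assumptions (and the exact derive/registration API differs between prometheus-client versions), not something this PR adds.

use prometheus_client::encoding::EncodeLabelSet;
use prometheus_client::metrics::family::Family;
use prometheus_client::metrics::gauge::Gauge;
use prometheus_client::registry::Registry;

/// Hypothetical label set; not an existing type in libp2p-metrics.
#[derive(Clone, Debug, Hash, PartialEq, Eq, EncodeLabelSet)]
struct StreamLabels {
    direction: String,
    protocol: String,
}

fn main() {
    let mut registry = Registry::default();

    // One gauge per (direction, protocol) pair, matching the sample
    // output quoted earlier in this thread.
    let streams = Family::<StreamLabels, Gauge>::default();
    registry.register(
        "libp2p_swarm_stream",
        "Number of alive streams per direction and protocol",
        streams.clone(),
    );

    // Whatever observes "stream opened" / "stream closed" events would
    // bump the gauge accordingly.
    let labels = StreamLabels {
        direction: "inbound".into(),
        protocol: "/libp2p/kad/1.0.0".into(),
    };
    streams.get_or_create(&labels).inc();
    // ...and `.dec()` once the stream is dropped.
}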

@thomaseizinger
Contributor Author

@AgeManning I'd be curious what you think of this PR with regard to what it changes in gossipsub (534c3f6 (#2878)).

The gist is that we can now enforce how many substreams to accept from the remote. Given that GossipSub should only ever have one inbound stream, I set this to one.

@AgeManning
Contributor

Yeah, the changes look fine to me. We only have one inbound substream, as you've pointed out, so I don't see any issue here.

@thomaseizinger
Contributor Author

thomaseizinger commented Oct 14, 2022

I can't reproduce this error on my machine. Has anyone else had any luck?

Also, can you have another look at this, @mxinden? I implemented a per-connection configurable limit now, as requested in #2878 (comment) :)

Member

@mxinden mxinden left a comment

I think long term this is something we should be doing. I am not yet sure whether the current implementation allows collaborative usage of these limits across ConnectionHandler implementations.

In my eyes, before we do this, we should tackle #2863 first, thus simplifying the sets (pending-multistream, pending-upgrade, negotiated) of inbound streams.

Another thought would be to redesign max_inbound_streams to be closer to Sink::poll_ready:

fn poll_new_inbound_stream_ready(&self, cx: &mut Context<'_>) -> Poll<()>

A ConnectionHandler could thus signal asynchronously whether it is willing to accept new streams.
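
Expanded a little, that could look something like the following. This is purely a sketch: the trait, types and example implementation are illustrative, not the actual ConnectionHandler API.

use std::task::{Context, Poll};

/// Sketch of a `Sink::poll_ready`-style readiness check for inbound streams.
trait InboundStreamReadiness {
    /// Returns `Poll::Ready(())` when the handler is willing to accept one
    /// more inbound stream. Otherwise it would register the waker from `cx`
    /// and return `Poll::Pending`, to be woken once capacity frees up.
    fn poll_new_inbound_stream_ready(&mut self, cx: &mut Context<'_>) -> Poll<()>;
}

/// Example implementation that checks a simple counter against a limit.
struct CountingHandler {
    alive_inbound: usize,
    limit: usize,
}

impl InboundStreamReadiness for CountingHandler {
    fn poll_new_inbound_stream_ready(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
        if self.alive_inbound < self.limit {
            Poll::Ready(())
        } else {
            // A real implementation would store the waker and wake it when a
            // stream closes; omitted here for brevity.
            Poll::Pending
        }
    }
}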

Also adding this to the agenda for the next rust-libp2p community call.

Comment on lines +120 to +125
/// The maximum number of inbound substreams allowed on the underlying connection.
///
/// Once this limit is hit, we will stop accepting new inbound streams from the remote.
fn max_inbound_streams(&self) -> usize {
    DEFAULT_MAX_INBOUND_STREAMS
}
Member

In case an implementation of ConnectionHandler does not override this method but accepts DEFAULT_MAX_INBOUND_STREAMS streams, it would starve all other ConnectionHandler implementations of further inbound streams, correct?

Contributor Author

From the connection task PoV, there is only one ConnectionHandler. I think if we get the combinators right, this shouldn't be an issue?

self.handlers
    .values()
    .map(|h| h.max_inbound_streams())
    .max()
Member

Shouldn't this be the sum? I.e. the sum of all limits equals the total limit.

    .values()
    .map(|h| h.max_inbound_streams())
    .max()
    .unwrap_or(0) // No handlers? No substreams.
Member

:D

@thomaseizinger
Contributor Author

> I think long term this is something we should be doing. I am not yet sure whether the current implementation allows collaborative usage of these limits across ConnectionHandler implementations.

Do you think the proposed implementation is an improvement?

I think it is, but I agree with you that it can be even better.

> In my eyes, before we do this, we should tackle #2863 first, thus simplifying the sets (pending-multistream, pending-upgrade, negotiated) of inbound streams.

Funny, I see this as a step towards that solution. With this PR, a substream counts towards a limit as soon as it exists and not just during the upgrade phase.

The next idea (which could be done in parallel) for #2863 is to make less use of upgrades in other protocols and instead build a FuturesUnordered-like primitive that implementations can use to do the upgrade themselves.
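
To make that idea a bit more concrete, here is a very rough sketch of such a primitive. All names are made up for illustration; this is not an API that exists in rust-libp2p.

use futures::future::BoxFuture;
use futures::stream::{FuturesUnordered, StreamExt};
use std::task::{Context, Poll};

/// Rough sketch of a `FuturesUnordered`-like helper that a
/// `ConnectionHandler` could use to drive its own upgrades instead of
/// relying on the dedicated upgrade phase.
struct StreamUpgrades<T> {
    inner: FuturesUnordered<BoxFuture<'static, T>>,
}

impl<T> StreamUpgrades<T> {
    fn new() -> Self {
        Self {
            inner: FuturesUnordered::new(),
        }
    }

    /// Queue a negotiation/upgrade future for a freshly opened stream.
    fn push(&mut self, upgrade: BoxFuture<'static, T>) {
        self.inner.push(upgrade);
    }

    /// Poll for the next completed upgrade; to be called from the
    /// handler's own poll function.
    fn poll_next_completed(&mut self, cx: &mut Context<'_>) -> Poll<Option<T>> {
        self.inner.poll_next_unpin(cx)
    }
}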

> Another thought would be to redesign max_inbound_streams to be closer to Sink::poll_ready:
>
> fn poll_new_inbound_stream_ready(&self, cx: &mut Context<'_>) -> Poll<()>
>
> A ConnectionHandler could thus signal asynchronously whether it is willing to accept new streams.

This is an interesting idea! Should a ConnectionHandler perhaps have multiple poll functions (see the sketch after this list)?

  • poll_ready_inbound
  • poll_new_outbound
  • poll_event
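
A hedged sketch of what such a split could look like; the method names follow the list above, while the trait itself and its associated types are made up for illustration and are not the real ConnectionHandler API.

use std::task::{Context, Poll};

/// Illustrative split of the handler's poll surface into three methods.
trait SplitPollHandler {
    type OutboundRequest;
    type Event;

    /// Ready when the handler can accept one more inbound stream.
    fn poll_ready_inbound(&mut self, cx: &mut Context<'_>) -> Poll<()>;

    /// Ready when the handler wants to open a new outbound stream.
    fn poll_new_outbound(&mut self, cx: &mut Context<'_>) -> Poll<Self::OutboundRequest>;

    /// Drives the handler's internal state and surfaces events to the swarm.
    fn poll_event(&mut self, cx: &mut Context<'_>) -> Poll<Self::Event>;
}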

> Also adding this to the agenda for the next rust-libp2p community call.

👍

@thomaseizinger
Contributor Author

Setting to draft until we reach a decision.

@thomaseizinger thomaseizinger marked this pull request as draft November 2, 2022 04:06
@thomaseizinger
Contributor Author

Closing because stale.

@thomaseizinger thomaseizinger deleted the track-alive-substreams branch April 26, 2023 16:38