Unify connection and endpoint drivers #1219

Closed · Ralith wants to merge 5 commits

Conversation

Ralith (Collaborator) commented Oct 30, 2021

Fixes #914. This is a high-risk change, so we probably don't want it in 0.8, but I'm getting a head start on it all the same.

Ralith changed the title from "Unify connection and endpoint drivers" to "WIP: Unify connection and endpoint drivers" on Oct 30, 2021.
Ralith force-pushed the unify-drivers branch 2 times, most recently from 33da0d5 to 7a12269, on November 5, 2021.
Ralith marked this pull request as ready for review on November 17, 2021.
Ralith (Collaborator, Author) commented Nov 17, 2021

This could probably use some more iteration, but is now functionally complete. Next step is careful benchmarking.

Ralith changed the title from "WIP: Unify connection and endpoint drivers" back to "Unify connection and endpoint drivers" on Nov 17, 2021.
Ralith force-pushed the unify-drivers branch 3 times, most recently from 16e1e43 to 589ec83, on November 19, 2021.
Ralith force-pushed the unify-drivers branch 3 times, most recently from b4082d4 to dffe9ba, on November 27, 2021.
Ralith force-pushed the unify-drivers branch 2 times, most recently from a5e57e5 to 17fd83a, on December 29, 2021.
djc (Collaborator) commented Dec 29, 2021

Is this still waiting for benchmarking?

Ralith (Collaborator, Author) commented Dec 29, 2021

Initial results were inconclusive, showing an increase in ACK frames sent, presumably due to reduced internal buffering. I've therefore been waiting for @Matthias247 to finish his WIP delayed-ACK support, which should remove that variable.

[Eight comments between @DemiMarie and @Ralith were marked as off-topic and hidden.]

Ralith (Collaborator, Author) commented Jan 15, 2023

> prioritizing soaking up rustls review bandwidth while it's available.

Makes sense; that's definitely a precious resource! I don't think this is blocking anything immediate; @rrichardson's interest in horizontal scaling as discussed in #914 can be pursued independently at the proto layer.

mxinden (Contributor) commented Feb 24, 2023

👋 rust-libp2p maintainer here,

Is there any way I could help move this pull request forward?

Ways that come to my mind:

  • More reviews
  • Testing in production environments
  • Writing tests
  • Providing benchmark results

My motivation:

  • We are currently using quinn-proto in rust-libp2p.
  • We don't use quinn directly, but instead maintain our own wrapper around quinn-proto, because quinn uses unbounded channels between the Endpoint and the Connection. See Stop using unbounded channels #1353.
  • I don't think it makes sense for us (rust-libp2p) to maintain our own wrapper long term. It introduces unnecessary maintenance work and we don't benefit from upstream (quinn) improvements and fixes.
  • This pull request would remove the unbounded channels, thus enabling us to use quinn directly.

On a general note, thank you for all your work! With the release of QUIC support in rust-libp2p via quinn-proto:

  • We are seeing significant connection establishment improvements in the 95th percentile compared to TCP (chart omitted; see https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p for more numbers).

  • We can leverage stream creation backpressure at the protocol level.

  • We see significant improvements to our hole punching success rate.

  • ...

Ralith (Collaborator, Author) commented Feb 24, 2023

Thanks for sharing those numbers, that's really cool! I think knowing there's demand is helpful already, but thoughtful review of the core 294e5d7 commit, which I think @djc is mainly hung up on, certainly wouldn't hurt.

DemiMarie commented:

@mxinden @Ralith This does not remove all unbounded channels. What would it take to remove the rest of them?

djc (Collaborator) commented Feb 24, 2023

Is there any evidence that this particular usage of unbounded channels is bad? If you are so worried about this, feel free to submit a PR and we'll consider it.

Ralith (Collaborator, Author) commented Feb 24, 2023

The remaining "unbounded" channel introduced in this PR is, in fact, bounded by the number of concurrently active connections. It could be replaced with various other sorts of queue to make that more obvious (an intrusive list might be cool), but the behavior would be the same.

mxinden (Contributor) commented Feb 24, 2023

> @mxinden @Ralith This does not remove all unbounded channels. What would it take to remove the rest of them?

@DemiMarie are you referring to the unbounded channel used by a connection to wake up the endpoint?

    /// `handle` is sent here when `wake` is called, prompting the endpoint to drive the connection
    dirty: mpsc::UnboundedSender<ConnectionHandle>,

https://github.com/quinn-rs/quinn/pull/1219/files#diff-60e27999f6922becedd55cb952eb6a67db93ed5d136f2e22451d0ce83a73b89bR779-R780

I don't think this poses any issues. As @Ralith says above, it is bounded by the number of connections. I consider the usage a neat many-wake-one pattern, transporting the connection id along the way.
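
In simplified form, I read the pattern roughly like this (a sketch with made-up names and types; only the dirty sender mirrors the PR):

    use tokio::sync::mpsc;

    /// Illustrative stand-in for quinn-proto's connection id.
    #[derive(Clone, Copy)]
    struct ConnectionHandle(usize);

    struct ConnectionInner {
        handle: ConnectionHandle,
        /// Clone of the endpoint's sender. At most one message per
        /// connection is ever in flight, so the queue length is bounded
        /// by the number of connections.
        dirty: mpsc::UnboundedSender<ConnectionHandle>,
        is_dirty: bool,
    }

    impl ConnectionInner {
        /// Ask the endpoint to drive this connection on its next poll.
        fn wake(&mut self) {
            if !self.is_dirty {
                self.is_dirty = true;
                // A send error means the endpoint is gone; nothing to do.
                let _ = self.dirty.send(self.handle);
            }
        }
    }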

> Is there any evidence that this particular usage of unbounded channels is bad?

Assuming I'm not missing something and we are indeed talking about the dirty unbounded channel, I don't see a downside to this usage.

If one wanted to ban unbounded channels across the codebase, a bounded channel could be used instead: since the sender is cloned for each connection, each sender gets a guaranteed slot, so there is always capacity to send and no .await is needed. (Not advocating for it.)

> The channel’s capacity is equal to buffer + num-senders.

https://docs.rs/futures/latest/futures/channel/mpsc/fn.channel.html
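
As a minimal sketch of that guarantee (names are mine, not quinn's; futures::channel::mpsc as in the docs above):

    use futures::channel::mpsc;

    fn main() {
        // Zero extra buffer: total capacity == number of live senders.
        let (tx, _rx) = mpsc::channel::<u32>(0);

        // One sender clone per connection; each clone brings its own slot.
        let mut conn_tx = tx.clone();

        // With at most one notification in flight per connection, try_send
        // never observes a full channel, so no .await is required.
        conn_tx.try_send(42).expect("guaranteed slot per sender");
    }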

thomaseizinger commented:

> In case one would want to ban the usage of unbounded channels across the codebase

FWIW, clippy can help with this via its disallowed_methods lint, configured in clippy.toml:

disallowed-methods = [
  { path = "futures::channel::mpsc::unbounded", reason = "does not enforce backpressure" },
]
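
With that entry in clippy.toml, cargo clippy flags each call to futures::channel::mpsc::unbounded and includes the given reason in the warning.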

mxinden (Contributor) left a review:

Small review with a small suggestion and a question for better understanding.

(Review threads on quinn/src/connection.rs and quinn/src/endpoint.rs were marked outdated and resolved.)

Comment on lines 528 to 532:
// Buffer the list of initially dirty connections, guaranteeing that the connection
// processing loop below takes a predictable amount of time.
while let Poll::Ready(Some(conn_handle)) = self.dirty_recv.poll_recv(cx) {
self.dirty_buffer.push(conn_handle);
}
mxinden (Contributor) commented:

Do I understand correctly that the assumption here is that self.dirty_recv may refill faster than the Endpoint can drive the connections below, thus leading to an endless loop without the self.dirty_buffer intermediary step introduced here?

Ralith (Collaborator, Author) replied:

Yep, exactly. This initial loop is safe because it doesn't reset is_dirty.
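
To spell it out, a simplified, self-contained sketch of the two-phase idea (everything except poll_recv is an illustrative stand-in, not this PR's actual code):

    use std::collections::HashMap;
    use std::task::{Context, Poll};
    use tokio::sync::mpsc;

    struct Conn {
        is_dirty: bool,
    }

    impl Conn {
        fn drive(&mut self, _cx: &mut Context<'_>) {
            // Transmit, handle timeouts, etc.; may call the connection's
            // wake path again, re-queueing it for the endpoint's next poll.
        }
    }

    struct Endpoint {
        dirty_recv: mpsc::UnboundedReceiver<usize>,
        dirty_buffer: Vec<usize>,
        connections: HashMap<usize, Conn>,
    }

    impl Endpoint {
        fn drive_connections(&mut self, cx: &mut Context<'_>) {
            // Phase 1: snapshot the dirty set. is_dirty is deliberately NOT
            // reset here, so a connection cannot re-enqueue itself mid-drain
            // and turn this loop into an endless one.
            while let Poll::Ready(Some(handle)) = self.dirty_recv.poll_recv(cx) {
                self.dirty_buffer.push(handle);
            }
            // Phase 2: drive each buffered connection exactly once, resetting
            // is_dirty first so it may mark itself dirty again later.
            for handle in self.dirty_buffer.drain(..) {
                if let Some(conn) = self.connections.get_mut(&handle) {
                    conn.is_dirty = false;
                    conn.drive(cx);
                }
            }
        }
    }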

djc (Collaborator) commented Mar 2, 2023

@mxinden thanks for the careful review!

Ralith (Collaborator, Author) commented Mar 4, 2023

Tweaked per @mxinden's feedback (thanks!) and rebased.

Arqu commented Apr 16, 2023

Hey 👋

We're using quinn in iroh. I ran some quick benchmarks on a branch propped up to use this PR and noticed a significant drop in throughput (40-50%) when serving data from one node to two or more clients. One-to-one transfers don't seem significantly affected yet, most probably because other parts of our code don't saturate the pipe.

From my observations, with the PR the CPU on my test rig pegs at around 220%, i.e. 2.2 cores, and additional clients just dilute the throughput between them. The same test against 0.9.3 uses as much CPU as the workload demands, scaling roughly linearly with the number of clients; I'm seeing some 700-900% in the 1-to-10 test cases.

I haven't dug too deep into it yet, but my initial guess is that some mutex contention or polling behavior is causing a run loop to starve itself.

Let me know if there's anything specific you would like me to poke at or test out.

Ralith (Collaborator, Author) commented Apr 19, 2023

Interesting, thank you for the report! We should try to understand, and hopefully mitigate, that regression before merging this. I do think this change is an important foundation for an ultimately much more parallel quinn, but undermining our present scalability for a hypothetical future isn't ideal.

Ralith marked this pull request as a draft on July 7, 2023.
Ralith (Collaborator, Author) commented Dec 17, 2023

Closing, as this is quite stale and now understood to be a suboptimal direction due to reduced opportunity for work stealing and parallelism.

Ralith closed this on Dec 17, 2023.
Linked issue: Reduce task-switching overhead on I/O path (#914)