Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a timeout for checking or creating async-cluster connections #866

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

barshaul
Copy link
Contributor

@barshaul barshaul commented Jun 17, 2023

When PINGing or re-connecting to a node that has issues or has been shut down, the request can get hanged and stuck, preventing the client from getting the new cluster topology. In order to prevent hanging on broken connections
a timeout of 250 milliseconds was added to get_or_create_conn().

@barshaul barshaul marked this pull request as ready for review June 17, 2023 10:46
@jaymell
Copy link
Contributor

jaymell commented Jun 21, 2023

Hi, thanks for the submission. Does it make more sense to implement this as a configurable feature on the underlying connection instead? This is something that I think has been badly needed on the async side but has thus far not been implemented.

@nihohit
Copy link
Contributor

nihohit commented Jun 21, 2023

Discussed this with @barshaul already - it's very similar to #834
IMO we don't need to configure this on the underlying connection, maybe on get_simple_async_connection, but the issue this comes to solve (hanging when the server replaces nodes) doesn't require only a timeout on connection, but also on the check_connection function, since a dropped connection might hang indefinitely. So we'll need a timeout in the cluster connection in any case, unless we also want to configure a response timeout in the underlying connection.

@jaymell
Copy link
Contributor

jaymell commented Jun 23, 2023

Exactly, response timeouts is what I was attempting to refer to. We have this on the sync connection, would be wonderful if we could do the same for async.

@nihohit
Copy link
Contributor

nihohit commented Jun 23, 2023

Adding a response timeout to MultiplexedConnection is rather simple, but IMO it's the wrong thing to do. It's easy for the user to wrap each async operation with a timeout, and the user also doesn't need to use a runtime agnostic timeout function - they can choose the ones that are native to Tokio or async-std. Since timeout checks aren't zero-cost, this means that users that aren't interested in them will also pay the price.
IMO a better solution is to add a response timeout which is used internally in the cluster connection, though, or add a type that wraps MultiplexedConnection with timeouts. Just say what's your preferred solution and I'll implement it pronto.

@jaymell
Copy link
Contributor

jaymell commented Jun 26, 2023

I'm not clear on the overhead of implementing timeouts, but I disagree that it's easy for users to wrap requests in timeouts... Obviously it's not hard to do so, but it's not obvious to users whether it's safe to do so without corrupting the connection. This is supported on sync connections, and in my view it seems reasonable to do the same for async if we can do so with a sane implementation.

@nihohit
Copy link
Contributor

nihohit commented Jun 26, 2023

proposed timeouts for async connections:
#875

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants