Add connection timeout to the cluster client #834
Conversation
Note: This is a proposed alternative to #833
I think this looks good -- simple and non-breaking!
3rd option: #835
Force-pushed from 26a5ad4 to c05ec82
@jaymell right, moved to ready for review. Also added support for sync cluster, for consistency - I didn't want to have a configuration in
hey, mates. I'm using ConnectionManager and really waiting for the connection timeout configuration for ConnectionManager / MultiplexedConnection. @jaymell @nihohit if I am able to speed up and help with delivering this timeout feature - just let me know.
Force-pushed from c05ec82 to 1eb8318
@jaymell Hey, mate. Would you be able to review it?
@nihohit seems you have conflicts after the latest merges.
Force-pushed from 1eb8318 to c6e7c98
rebased
@jaymell any chance to review this PR?
Yep, will get to it. Thanks for your patience.
rebased over #875
Force-pushed from 37a7c74 to 5a8d2ad
I've been trying to benchmark the response-timeout changes locally and haven't found a significant difference, though I don't particularly trust my results. Any lingering doubts about the performance impacts of those changes?
I've received roughly the same results in repeated benchmarks. I tried moving things around, but can't find either the reason or a solution.
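Neither comment includes the benchmark harness; a rough sketch of the kind of local comparison being described, assuming a plain timing loop against a test cluster (the node URL, key, iteration count, and timeout value are placeholders, not taken from the PR):

```rust
use redis::cluster::ClusterClientBuilder;
use std::time::{Duration, Instant};

// Hypothetical micro-benchmark: time a burst of GETs with and without a
// response timeout configured, to look for overhead from the timeout wrapping.
async fn time_gets(with_timeout: bool) -> redis::RedisResult<Duration> {
    let mut builder = ClusterClientBuilder::new(vec!["redis://127.0.0.1:7000"]);
    if with_timeout {
        builder = builder.response_timeout(Duration::from_secs(1));
    }
    let mut connection = builder.build()?.get_async_connection().await?;

    let start = Instant::now();
    for _ in 0..10_000 {
        let _: Option<String> = redis::cmd("GET")
            .arg("bench-key")
            .query_async(&mut connection)
            .await?;
    }
    Ok(start.elapsed())
}
```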
Force-pushed from a30880c to e4852ee
fixed breaking changes in
Force-pushed from e4852ee to 442c12d
I tried to use this branch in my service, and it is broken for me. If I trigger a timeout I get a panic in

How to Reproduce

I am using a 1 node Redis cluster for testing. There I use the command

The code I use:

```rust
use redis::{cluster::ClusterClientBuilder, cluster_async::ClusterConnection, Value};
use std::{
    sync::atomic::{AtomicU64, Ordering},
    time::Duration,
};

static REQUEST_COUNTER: AtomicU64 = AtomicU64::new(0);

#[tokio::main]
async fn main() {
    let connection = ClusterClientBuilder::new(vec!["redis://:password@redis-0"])
        .connection_timeout(Duration::from_millis(400))
        .response_timeout(Duration::from_millis(400))
        .build()
        .unwrap()
        .get_async_connection()
        .await
        .unwrap();

    loop {
        let connection = connection.clone();
        tokio::spawn(do_request(connection));
        tokio::time::sleep(Duration::from_millis(250)).await;
    }
}

async fn do_request(mut connection: ClusterConnection) {
    redis::cmd("role")
        .query_async::<_, Value>(&mut connection)
        .await
        .unwrap();
    let counter = REQUEST_COUNTER.fetch_add(1, Ordering::SeqCst);
    println!("request {counter}");
}
```
@kamulos can you please write a self-contained test to demonstrate the issue? It's unclear how/when

```rust
#[test]
fn test_response_timeout_reuse() {
    let cluster = TestClusterContext::new(3, 0);
    block_on_all(async move {
        let mut connection = cluster.async_connection().await;
        let mut cmd = redis::Cmd::new();
        cmd.arg("BLPOP").arg("foo").arg(0); // 0 timeout blocks indefinitely
        let result = connection.req_packed_command(&cmd).await;
        assert!(result.is_err());
        assert!(result.unwrap_err().is_timeout());

        loop {
            let result: RedisResult<Value> = redis::cmd("GET")
                .arg("foo")
                .query_async(&mut connection.clone())
                .await;
            let counter = REQUEST_COUNTER.fetch_add(1, Ordering::SeqCst);
            println!("request {counter} {}", result.is_ok());
        }
    });
}
```
Force-pushed from 422857b to a0a293d
Force-pushed from a0a293d to c11df7e
Let's get it in!
@nihohit sorry for being a bit quiet. I'll be able to test it later today
@nihohit I finally got to testing it again. I am not entirely sure what is happening, but my educated guess would be something along these lines:

In my test I have a Redis node, where I execute the command

I don't believe this merge request is at fault, because the whole panic in

When the panic occurs, it ultimately is caught by tokio and other tasks are able to continue running. However, the ClusterConnection is broken at that point and does not recover until the whole program is restarted. This is quite critical in my use-case.

If I were to speculate about why the connection is broken, I think it is just not panic-safe and we somehow end up in an inconsistent state. What I observe is that after the panic the first few requests end with the error

For some reason the mpsc is broken, but the

In my application I solved the timeouts by just wrapping the

So in summary I think there is a critical flaw in the
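The wrapper mentioned above is not shown in the comment; a minimal sketch of that kind of application-level workaround, assuming tokio::time::timeout around the query future (the helper name and error mapping are illustrative, not from the PR):

```rust
use redis::{cluster_async::ClusterConnection, RedisResult, Value};
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical helper: bound a single query with an outer timeout instead of
// relying on the connection's internal response timeout.
async fn query_with_timeout(
    connection: &mut ClusterConnection,
    cmd: &redis::Cmd,
    limit: Duration,
) -> RedisResult<Value> {
    match timeout(limit, cmd.query_async(connection)).await {
        Ok(result) => result,
        // The elapsed case is mapped to an IO error here; pick whatever error fits.
        Err(_) => Err((redis::ErrorKind::IoError, "query timed out").into()),
    }
}
```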
Yes, I assume that the timeouts caused connections to be removed from the connections map, which eventually causes the panic. I believe #968 will help there, but as you mentioned, it's not caused by this change.
This allows users to define a connection timeout for the async cluster, which will cause the refresh_slots action to time out if it takes too long to connect to a node.
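Based on the builder calls shown in the reproduction code above, configuring the new timeout would look roughly like this (the node URL and durations are placeholders):

```rust
use redis::cluster::ClusterClientBuilder;
use std::time::Duration;

#[tokio::main]
async fn main() -> redis::RedisResult<()> {
    // Connections made during setup and during refresh_slots give up after the
    // configured connection timeout instead of hanging indefinitely.
    let client = ClusterClientBuilder::new(vec!["redis://127.0.0.1:6379"])
        .connection_timeout(Duration::from_millis(500))
        .build()?;

    let mut connection = client.get_async_connection().await?;
    let pong: String = redis::cmd("PING").query_async(&mut connection).await?;
    println!("{pong}");
    Ok(())
}
```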