Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should timeout be read_timeout? #795

Closed
jesseforrest opened this issue Apr 20, 2016 · 5 comments · Fixed by #2459
Closed

Should timeout be read_timeout? #795

jesseforrest opened this issue Apr 20, 2016 · 5 comments · Fixed by #2459

Comments

@jesseforrest
Copy link

We are trying to understand the difference between the timeout and read_timeout. We noticed that c->waitms is set to timeout instead of read_timeout on this line:

/* Set up our waitms based on timeout */

We recently reduced timeout from 5 seconds to 1 second. Now we are seeing a significant number of "Timed out attempting to find data in the correct node!" errors and would have assumed it was tied to read_timeout, but it's actually tied to timeout. Is timeout the correct value to be set for c->waitms?

Our current production settings have:

timeout = 1 sec
read_timeout = 6 sec

Any advice on the appropriate next steps would be appreciated.

@michael-grunder
Copy link
Member

michael-grunder commented Apr 20, 2016

Hey,

Right so the "timeout" setting in the context of cluster refers to how long you are willing to let phpredis attempt to get your data before it returns execution to you.

Timeouts are pretty simple on a single instance as you can just use something like a connection timeout (how long to wait for a connection), and a read timeout (how long to wait to read a response).

If you're using a cluster, however, it gets more complicated. The cluster could be resharding, or failing over, etc. A way to describe it is "how long should phpredis let the cluster bounce us around before failing". It's different than a standard socket timeout, in that there may never be any TCP/IP communication error, but we still want to time out. The reason for this is so it doesn't just wait an infinite amount of time while it could be getting MOVED, ASKING, requests, etc.

Does that make sense?

Cheers,
Mike

@jesseforrest
Copy link
Author

jesseforrest commented Apr 20, 2016

Thanks for the fast response Michael. We are using Redis cluster, so I do understand the logic is more complicated. Your description does clearly define the differences.

However, I'm still confused why the code is using the timeout variable instead of the read_timeout. I figured timeout would only be for establishing a connection, while read_timeout would be how long to wait to read a response. A little more clarity would be helpful. I just want to make sure the engineers here understand the difference between the two variables.

@CatKang
Copy link
Contributor

CatKang commented Jun 29, 2016

Hey
I encounter the same confusion with @jesseforrest

What's more, I found the condition set on the line :

timedout = resp && c->waitms ? mstime() - msstart >= c->waitms : 0;

makes the client has no chance to retry after it received MOVED, when the connect timeout happened in CLUSTER_LAZY_CONNECT.

Would you please give us more clarity information?

@ppanphper
Copy link

redis_cluster_init {
...
/* Set our timeout and read_timeout which we'll pass through to the
* socket type operations */
c->timeout = timeout;
c->read_timeout = read_timeout;

/* Set our option to use or not use persistent connections */
c->persistent = persistent;

/* Calculate the number of miliseconds we will wait when bouncing around,
 * (e.g. a node goes down), which is not the same as a standard timeout.*/

**c->waitms = (long)(timeout * 1000);**

...
}

cluster_send_command {
....
/* Figure out if we've timed out trying to read or write the data */
timedout = resp && c->waitms ? mstime() - msstart >= c->waitms : 0;
...
}
the redis cluster not use read_timeout?

@yatsukhnenko
Copy link
Member

Old issue without any activity for a long time

kjoe added a commit to kjoe/phpredis that referenced this issue Mar 17, 2024
When a node timeout occurs, then phpredis will try to connect to another
node, whose answer probably will be MOVED redirect. After this we need
more time to accomplish the redirection, otherwise we get "Timed out
attempting to find data in the correct node" error message.

Fixes phpredis#795 phpredis#888 phpredis#1142 phpredis#1385 phpredis#1633 phpredis#1707 phpredis#1811 phpredis#2407
michael-grunder pushed a commit that referenced this issue Mar 25, 2024
When a node timeout occurs, then phpredis will try to connect to another
node, whose answer probably will be MOVED redirect. After this we need
more time to accomplish the redirection, otherwise we get "Timed out
attempting to find data in the correct node" error message.

Fixes #795 #888 #1142 #1385 #1633 #1707 #1811 #2407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants