Redis connection issues #27
In case it helps, here is the tcpdump output.
The first capture is the timeout case, the second one is OK. As you can see, there is an R (reset) flag; I don't know why it is actually being sent.
@urosgruber Several notes:
And I should add that, when connect() times out, nginx will close the current connection right away, and subsequent (late) response packets from the Redis server side will yield RST.
Btw, is it better to use set_keepalive for the redis client or to disable keepalive?
Hello! On Sun, Nov 10, 2013 at 5:36 PM, Uroš Gruber wrote:
Yes, of course. This has actually been a common problem :) The Redis
Enabling the Redis connection pool on the client side (i.e., in Nginx) Regards,
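For reference, a minimal sketch of what enabling the client-side pool with lua-resty-redis looks like (the host, key name, timeout, and pool numbers below are placeholders, not values from this thread):

```lua
local redis = require "resty.redis"

local red = redis:new()
red:set_timeout(1000)  -- 1s, applied to connect/send/read individually

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return ngx.exit(500)
end

local res, err = red:get("some_key")  -- "some_key" is just an example key
if err then
    ngx.log(ngx.ERR, "failed to GET: ", err)
end

-- return the connection to the per-worker pool instead of closing it:
-- keep it idle for at most 10s, with up to 100 pooled connections
local ok, err = red:set_keepalive(10000, 100)
if not ok then
    ngx.log(ngx.ERR, "failed to set keepalive: ", err)
end
```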
With the timeout set to 10s, things are looking better. I also checked the backlog limit I have on this server; it is probably a bit low (kern.ipc.somaxconn: 128). Checking netstat -Lan gives me this:
Port 80 is the frontend nginx proxy and the backend servers are at 5000.
Hm, and the issue is back. The server was idle; there were no open connections, no pf states. When I refreshed the webpage I received a 500 error, checked the log, and there was the timeout problem again.
I also tried the unix socket configuration, connecting through /tmp/redis.sock, and it works without any issues. Btw, increasing kern.ipc.somaxconn=1024 does change the backlog limit set for redis (it's now 512), but it still does not help: random timeouts on an idle server still happen.
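For completeness, pointing lua-resty-redis at that unix domain socket is just a different argument to connect(); a sketch (error handling trimmed):

```lua
local redis = require "resty.redis"

local red = redis:new()
red:set_timeout(1000)

-- connect over the local unix domain socket instead of TCP,
-- which sidesteps the TCP listen backlog entirely
local ok, err = red:connect("unix:/tmp/redis.sock")
if not ok then
    ngx.log(ngx.ERR, "failed to connect via /tmp/redis.sock: ", err)
    return
end
```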
Another possibility for timeout issues is that your Nginx worker process's event loop is heavily blocked by something. You can try porting the epoll-loop-blocking-distr tool over to kqueue and FreeBSD: https://github.com/agentzh/stapxx#epoll-loop-blocking-distr and the off-CPU flamegraph tool over to FreeBSD's dtrace too: https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt-off-cpu

It's also wise to monitor the latency in the accept() and recv() queues in your kernel; for example, see these tools for Linux: https://github.com/agentzh/nginx-systemtap-toolkit#tcp-accept-queue https://github.com/agentzh/nginx-systemtap-toolkit#tcp-recv-queue

You can also write your own custom dtrace scripts to trace, on the fly, the various user-land and kernel-space events and socket states associated with those timed-out connections on your side.

Also, please distinguish timeout errors in different operations like connect(), send(), and receive(). They mean different things :)
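As an illustration of that last point, here is a sketch of how the same "timeout" string shows up for different operations in lua-resty-redis (host, port, and key are placeholders):

```lua
local redis = require "resty.redis"

local red = redis:new()
red:set_timeout(10000)  -- 10s, used for connect, send and read alike

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    -- a "timeout" here is a connect() timeout: look at the TCP handshake
    -- and the server-side accept queue
    ngx.log(ngx.ERR, "redis connect failed: ", err)
    return
end

local res, err = red:get("some_key")
if err then
    -- a "timeout" here is a read timeout: the connection was established,
    -- but the reply did not arrive in time
    ngx.log(ngx.ERR, "redis GET failed: ", err)
end
```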
For now I'll try to stick with /tmp/redis.sock because I don't have enough knowledge of DTrace. Btw, you have a lot of nice tools there :) I also need to try the same setup on some other servers.
Hello, agentzh:
@xidianwlc Ah, please no Chinese here. This place is English only. If you really want to use Chinese, please join the openresty (Chinese) mailing list instead. Please see https://openresty.org/#Community Regarding your question, it's already an FAQ. See my (Chinese) replies (to similar questions from others) on the aforementioned mailing list: https://groups.google.com/d/msg/openresty/e-r69KtAWek/wJ3cdzxluhUJ https://groups.google.com/d/msg/openresty/h3l6jAo3aD0/UvQGlF77cUwJ
Hi, agentzh:
@xidianwlc When you see "connect timed out", it usually means
You can consider
@xidianwlc Regarding the … my hunch is that you always fail to put your connections back into the connection pool in the first place, so your pool is always empty.
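To illustrate that hunch: any code path that returns without calling set_keepalive leaves the pool empty for the next request. A sketch of the pattern, using a hypothetical finish() helper (not from this thread; host, key, and numbers are placeholders):

```lua
local redis = require "resty.redis"

-- hypothetical helper: every exit path goes through it so the
-- connection always ends up back in the pool
local function finish(red)
    local ok, err = red:set_keepalive(10000, 100)
    if not ok then
        ngx.log(ngx.ERR, "failed to put redis connection into the pool: ", err)
    end
end

local red = redis:new()
red:set_timeout(1000)

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "connect failed: ", err)
    return ngx.exit(500)
end

local res, err = red:get("some_key")
if err then
    finish(red)  -- return the connection on the error path too
    return ngx.exit(500)
end

finish(red)
```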
@agentzh there was no nginx error log for this
@agentzh I found some articles at https://groups.google.com/forum/#!topic/openresty/h3l6jAo3aD0
@xidianwlc Isn't it already stated in the mailing list posts I mentioned earlier? https://groups.google.com/d/msg/openresty/e-r69KtAWek/wJ3cdzxluhUJ https://groups.google.com/d/msg/openresty/h3l6jAo3aD0/UvQGlF77cUwJ Alas, you didn't read my comments carefully.
@xidianwlc It's also officially documented: https://github.com/openresty/lua-nginx-module#tcpsocksetkeepalive "When the connection pool exceeds the available size limit, the least recently used (idle) connection already in the pool will be closed to make room for the current connection."
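In other words, the second argument to set_keepalive is that per-worker pool size limit; a sketch (assuming `red` is a connected resty.redis object as in the snippets above, and the numbers are arbitrary):

```lua
-- keep the connection idle for up to 60s in a pool of at most 200
-- connections per nginx worker; once a 201st idle connection is added,
-- the least recently used one is closed, as documented above
local ok, err = red:set_keepalive(60000, 200)
if not ok then
    ngx.log(ngx.ERR, "set_keepalive failed: ", err)
end
```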
I also met the error info "lua tcp socket connect timed out, when connecting to xxx:xxx" (xxx:xxx is the master redis server). Our group has been working on it for several days. I checked a lot of issues but found no resolution. In the end we used the connection pool to reduce this error, but it did not fix it. It's on a production env, and I'm sure the configuration is not mistaken. Across 3 different prod envs, one gives a big chance of this error msg, while on the others it is rare. This one only has 2 cores (another one has 8 cores and uses 4 of them for this nginx), but the other sys config is totally the same, and the redis setup is also the same. Here is some info:
Did the timeout issue occur while you were capturing the TCP traffic?
Nope, there was no clue about the connection timeout issue. There were no SYN retransmissions, and no packets that looked abnormal at SYN.
What did you mean by "abnormal at SYN"? |
If there were connect timeouts, I thought, there might be:
The TCP streams that I saw were all normal, as expected.
Hi,
I'm doing some redis lookups in my lua script and I'm getting some weird connection problems that I don't know how to properly debug to actually get to the source of the problem.
Here is a snippet from the config:
On a random basis I get a lot of "Redis connection failure: timeout" in the nginx error log. While debug logging on the redis server, I don't see any connections at that same moment.
I'm using the latest lua-resty-redis and the 0.8.6 ngx_lua module. I also tried upgrading to 0.9.2 but then I got even more strange errors.
I don't know if this one is actually related. Is there anything I can try to get more detailed info about where things actually broke? The server is FreeBSD with pf enabled, nginx 1.5.x.
Regards,
Uros