Latency problem with 1k req/sec #50

Open
zavada opened this issue Oct 9, 2014 · 8 comments

zavada commented Oct 9, 2014

I'm using Redis server 2.8.17.
I send requests through a Unix socket via nginx+lua; each request is a single SMEMBERS command. Right now I handle about 1k requests per second. In the nginx access log I mostly see the following:

[09/Oct/2014:18:04:49 +0300] [ 1412867089.670 : 0.002 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.670 : 0.002 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.670 : 0.002 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.670 : 0.002 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.671 : 0.001 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.672 : 0.000 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.673 : 0.001 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.673 : 0.001 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.674 : 0.000 : 198 ] 212.1.66.241 - - "POST     /API/0.1/url/getCategory/ HTTP/1.1" 200 37 
[09/Oct/2014:18:04:49 +0300] [ 1412867089.681 : 0.000 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 

As you can see, the request time is usually no more than 2 ms. But almost every second a few requests take about 100 ms:

[09/Oct/2014:18:05:22 +0300] [ 1412867122.843 : 0.131 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:05:23 +0300] [ 1412867123.212 : 0.100 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29 
[09/Oct/2014:18:05:23 +0300] [ 1412867123.613 : 0.101 : 190 ] 212.1.66.241 - - "POST /API/0.1/url/getCategory/ HTTP/1.1" 200 29

I can't find where the problem is.
I have checked everything from the latency guide:
- the slowlog has no entries
- the system is not swapping
- I don't use AOF

I have read a lot of Stack Overflow posts like this one:
http://stackoverflow.com/questions/16841469/redis-performance-tuning/23719773#23719773
but I still can't find a solution.

redis-benchmark gives me the following:

redis-benchmark -s /tmp/redis.sock -q -n 10000 -c 1 -d 50
PING_INLINE: 51282.05 requests per second
PING_BULK: 64935.07 requests per second
SET: 39840.64 requests per second
GET: 28901.73 requests per second
INCR: 26525.20 requests per second
LPUSH: 29411.76 requests per second
LPOP: 23255.81 requests per second
SADD: 21551.72 requests per second
SPOP: 23041.47 requests per second
LPUSH (needed to benchmark LRANGE): 45045.04 requests per second
LRANGE_100 (first 100 elements): 10101.01 requests per second
LRANGE_300 (first 300 elements): 4196.39 requests per second
LRANGE_500 (first 450 elements): 3337.78 requests per second
LRANGE_600 (first 600 elements): 7320.64 requests per second
MSET (10 keys): 62111.80 requests per second

agentzh commented Oct 9, 2014

@zavada What nginx variable are you using for the "request time" field in your access log? Is it $request_time? If yes, then the time specified by the $request_time variable includes the network I/O time between the client and your web server, which can be relatively large for slow downstream networks.

zavada commented Oct 10, 2014

@agentzh Yes, I use $request_time.
But if I comment out the line

local categories, err = red:smembers("categories:"..clear_url)

then these slow (>100 ms) requests no longer show up. That's why I think the problem is not in the network I/O time between the client and my web server.

Do you have any other ideas?

agentzh commented Oct 10, 2014

@zavada Then you'll have to use tools like systemtap and dtrace to grab more details from those slow request samples in various places: nginx processes, kernel network stack, and the redis-server process. See my sample tools based on systemtap in the nginx-systemtap-toolkit and stapxx projects on GitHub for some ideas.

agentzh commented Oct 10, 2014

@zavada Also, $request_time may have some error due to the time caching mechanism in the nginx core. Use the ngx-lua-tcp-recv-time tool in my stapxx project to measure the upstream cosocket recv latency accurately in real time:

https://github.com/openresty/stapxx#ngx-lua-tcp-recv-time
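For reference, invoking it against one nginx worker looks roughly like this (check the stapxx README for the exact options; the pid is just a placeholder):

export PATH=$PWD:$PATH                 # make the stap++ wrapper visible to the .sxx scripts
ps aux | grep 'nginx: worker'          # pick the pid of a busy worker, e.g. 12345
./samples/ngx-lua-tcp-recv-time.sxx -x 12345   # prints a latency distribution for the cosocket receives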

agentzh commented Oct 10, 2014

@zavada Furthermore, watch the CPU usage of your nginx worker processes. If their CPU usage hits 100% from time to time, then you may simply be running out of CPU time, and such intermittent long latency is entirely expected. Similarly, you should watch the CPU usage of your redis-server process as well.
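For example, per-process CPU usage can be watched with something like this (pidstat comes from the sysstat package; the process name patterns are only examples):

# print per-second CPU usage for the nginx workers and the redis-server process
pidstat -u -p "$(pgrep -d, -f 'nginx: worker|redis-server')" 1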

If any process is exhausting the CPU time, then you should use the various flame graph sampling tools to analyze the bottleneck:

https://github.com/openresty/nginx-systemtap-toolkit#sample-bt

https://github.com/openresty/stapxx#lj-lua-stacks
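A rough sketch of producing an on-CPU flame graph with sample-bt (flags as in the nginx-systemtap-toolkit README; the pid is a placeholder and Brendan Gregg's FlameGraph scripts are assumed to be on PATH):

./sample-bt -p 12345 -t 10 -u > a.bt      # sample user-space backtraces of one worker for 10 seconds
stackcollapse-stap.pl a.bt > a.cbt
flamegraph.pl a.cbt > a.svg               # open a.svg in a browser to find the hot code paths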

Regards,
-agentzh

agentzh commented Oct 10, 2014

@zavada Sometimes nginx may simply block on blocking I/O system calls (like file I/O) or on semaphores used as internal locks in nginx. Such blocking can contribute to your cosocket latency as well. You can use the epoll-loop-blocking-distr tool to verify this:

https://github.com/openresty/stapxx#epoll-loop-blocking-distr

and further use the off-CPU time flame graph tool to analyze the causes:

https://github.com/openresty/nginx-systemtap-toolkit#sample-bt-off-cpu
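Rough invocation sketches for the two tools (check the respective READMEs for the exact options; pids are placeholders):

./samples/epoll-loop-blocking-distr.sxx -x 12345 --arg time=60   # latency distribution of blocking inside the event loop
./sample-bt-off-cpu -p 12345 -t 10 -u > a.bt                     # off-CPU backtraces: where the worker sleeps instead of running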

zavada commented Oct 12, 2014

@agentzh When I try to run your examples I get the following error:
Checking "/lib/modules/2.6.32-431.el6.x86_64/build/.config" failed with error: No such file or directory
On my CentOS server the directory "/lib/modules/2.6.32-431.el6.x86_64/build/" exists, but there is no ".config" file inside it. Do you know how I can fix this?

Thank you for your time and help.

agentzh commented Oct 12, 2014

Hello!

Have you installed the "kernel-devel" and "kernel-debuginfo" packages for the kernel you are actually running?

See the following documentation for more details:

https://www.sourceware.org/systemtap/SystemTap_Beginners_Guide/using-systemtap.html#install-kinfo
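On CentOS 6 that usually boils down to something like the following (the package versions must match the running kernel exactly; debuginfo-install comes from yum-utils):

# install the kernel headers/config and debug symbols matching the running kernel
yum install -y yum-utils kernel-devel-$(uname -r)
debuginfo-install -y kernel-$(uname -r)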

Regards,
-agentzh
