Nginx overloaded while redis server unavailable #172

Closed
Fry-kun opened this Issue · 11 comments

2 participants

@Fry-kun

While the redis server is restarting, nginx CPU usage goes to 100% for all workers. I would have expected that, since fairly small timeouts are set, the nginx processes would not be affected...

Here's my config:

nginx.conf:
...
upstream redis-metrics {
    hash $arg_key;
    server gw513:6391;
    server gw514:6391;
    keepalive 1024; # keepalive must come after hash!
}
...
location /redis_metrics_internal {
    internal;
    redis2_connect_timeout 5ms;
    redis2_send_timeout 5ms;
    redis2_read_timeout 10ms;
    redis2_pass redis-metrics;
    redis2_raw_query $echo_request_body;
}

sitemetrics.lua:
...
local qry={'EVAL', "local u=require('redis_utils'); return u.sitemetrics(KEYS,ARGV)", 6, ... }
ngx.location.capture("/redis_metrics_internal?key="..hex_ip, {method=ngx.HTTP_PUT, body=redis_parser_build_query(qry)})

Here are the errors that show up en masse while the redis server is being restarted. Note: these are written to a RAM drive (/dev/shm) so as not to overload the HDD with writes.

2012/10/29 14:11:22 [error] 15033#0: *7354385 upstream timed out (110: Connection timed out) while connecting to upstream, client: 123.45.53.122, server: mysites, request: "GET /somepage HTTP/1.1", subrequest: "/redis_metrics_internal", upstream: "redis2://192.168.0.15:6391", host: "mysite.net", referrer: "http://mysite.net/"

2012/10/29 14:01:35 [error] 15034#0: *4681058 lua handler aborted: runtime error: /usr/local/nginx/lua/sitemetrics.lua:38: failed to issue subrequest: -1
stack traceback:
coroutine 0:

redis_version:2.9.7 (built from git)

nginx version: nginx/1.3.6
built by gcc 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
TLS SNI support enabled

@agentzh
Owner
@agentzh
Owner
@Fry-kun
  • pstack returns nothing (do I need to recompile without stripping debug info?)
  • strace has a lot of output, will send email if you don't mind
  • will try without upstream module
  • yes, lots of traffic on the server

Definitely not out of system RAM:
Mem: 132017596k total, 81299744k used, 50717852k free, 1000488k buffers
Swap: 8191996k total, 0k used, 8191996k free, 46979548k cached

@agentzh
Owner
@agentzh
Owner
@Fry-kun

Yes, nginx workers stay at 100% forever (until the redis server is restored).
You're right, there's actually a message saying "subrequests cycle"! That must be it!

Also, I didn't realize lua-resty-redis is more efficient/etc. -- I'll probably switch the config over soon :)
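
For anyone reading later, here is a rough sketch of what the switch to lua-resty-redis could look like. It is not the config that was actually deployed: it assumes a lua-nginx-module build that supports access_by_lua_block, connects to a single server by the IP seen in the error log (so it drops the hash-by-key distribution across gw513 and gw514), and uses a placeholder script in place of the real sitemetrics call.

```nginx
# Sketch only: assumes lua-nginx-module and lua-resty-redis are installed.
location / {
    access_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(10)  -- milliseconds; applies to connect, send and read

        -- IP and port taken from the error log in this thread.
        local ok, err = red:connect("192.168.0.15", 6391)
        if not ok then
            -- Log and carry on: a metrics failure should not fail the page.
            ngx.log(ngx.WARN, "redis unavailable: ", err)
            return
        end

        -- Placeholder script; the real EVAL arguments are not shown in the thread.
        local res, err = red:eval("return 1", 0)
        if not res then
            ngx.log(ngx.WARN, "redis eval failed: ", err)
            return
        end

        -- Hand the connection back to the per-worker pool (10s idle, 1024 slots).
        red:set_keepalive(10000, 1024)
    }

    # regular content handling for the page goes here
}
```

Since there is no subrequest involved, a redis failure here only produces a log line and the request continues.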

@Fry-kun

Unfortunately, setting redis2_next_upstream off didn't help -- I'm still getting "subrequests cycle", and it still takes ~140 tries to get there... that's weird.

@agentzh
Owner
@Fry-kun

Found the problem:
error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 500 501 502 503 504 505 $request_uri; # hack to use nginx internal error pages

Stupid config that seemed to work with other pages for some reason -- but with these requests, nginx kept banging on the redis2 subrequest (~140 times).

Got rid of this garbage and everything is fine now.
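
For reference, a sketch of a safer setup. My reading of the failure, which is a guess: the catch-all error_page was inherited by /redis_metrics_internal, so each upstream timeout was presumably redirected internally to $request_uri (the original page), which ran the Lua handler again and issued yet another subrequest until nginx gave up with "subrequests cycle". The /50x.html URI and the root path below are assumptions, not part of my actual config.

```nginx
# Sketch only; /50x.html and the root path are assumptions, not from the thread.
server {
    # No catch-all "error_page ... $request_uri;" here. With no error_page
    # directive at all, nginx already serves its built-in error pages.

    # Optional: a custom page for upstream failures, served from a fixed URI
    # so that an error can never be redirected back to the failing request.
    error_page 500 502 503 504 /50x.html;

    location = /50x.html {
        internal;
        root /usr/local/nginx/html;
    }
}
```

The point is that the error_page target is a fixed URI with no Lua handler behind it, so a failed redis2 subrequest ends there instead of starting over.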

P.S. pstack never returned anything, though gdb showed a stack trace without a problem...

@agentzh
Owner
@agentzh
Owner

I'm closing this :)

@agentzh agentzh closed this