Nginx overloaded while redis server unavailable #172

Fry-kun opened this Issue

While restarting redis server, nginx CPU usage goes to 100% for all workers. I would've expected that since pretty small timeouts are set, nginx processes should not be affected...

Here's my config

upstream redis-metrics {
    hash $arg_key;
    server gw513:6391;
    server gw514:6391;
    keepalive 1024; # keepalive must come after hash!
}

location /redis_metrics_internal {
    redis2_connect_timeout 5ms;
    redis2_send_timeout 5ms;
    redis2_read_timeout 10ms;
    redis2_pass redis-metrics;
    redis2_raw_query $echo_request_body;
}

local qry={'EVAL', "local u=require('redis_utils'); return u.sitemetrics(KEYS,ARGV)", 6, ... }
ngx.location.capture("/redis_metrics_internal?key="..hex_ip, {method=ngx.HTTP_PUT, body=redis_parser_build_query(qry)})

Here are the errors that show up en masse while the redis server is being restarted. Note: these are written to a RAM drive (/dev/shm) so as not to overload the HDD with writes.

2012/10/29 14:11:22 [error] 15033#0: *7354385 upstream timed out (110: Connection timed out) while connecting to upstream, client:, server: mysites, request: "GET /somepage HTTP/1.1", subrequest: "/redis_metrics_internal", upstream: "redis2://", host: "", referrer: ""

2012/10/29 14:01:35 [error] 15034#0: *4681058 lua handler aborted: runtime error: /usr/local/nginx/lua/sitemetrics.lua:38: failed to issue subrequest: -1
stack traceback:
coroutine 0:

redis_version:2.9.7 (built from git)

nginx version: nginx/1.3.6
built by gcc 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
TLS SNI support enabled

  • pstack returns nothing (do I need to recompile without stripping debug info?)
  • strace has a lot of output, will send email if you don't mind
  • will try without upstream module
  • yes, lots of traffic on the server

Definitely not out of system RAM:
Mem: 132017596k total, 81299744k used, 50717852k free, 1000488k buffers
Swap: 8191996k total, 0k used, 8191996k free, 46979548k cached


Yes, nginx workers stay at 100% forever (until redis server is restored)
You're right, there's actually a message saying "subrequests cycle"! That must be it!

Also, I didn't realize lua-resty-redis is more efficient/etc. -- I'll probably switch the config over soon :)
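
For anyone considering the same switch, here is a rough sketch of what the lua-resty-redis version might look like. The location name, the numeric address, the pool settings, and the placeholder script are all assumptions for illustration, not the config from this report. It also does not reproduce the `hash $arg_key` sharding across gw513/gw514, which would have to be done in Lua. `content_by_lua_block` needs a reasonably recent lua-nginx-module; older builds would put the same Lua in a file loaded with `content_by_lua_file`.

```nginx
# Sketch only: /site_metrics, the address, the timeout, the pool settings,
# and the placeholder script are assumptions, not the original config.
location /site_metrics {
    content_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()

        -- Timeout in milliseconds, applied to connect, send, and read.
        red:set_timeout(10)

        -- Hypothetical address; a hostname here would need a resolver directive.
        local ok, err = red:connect("127.0.0.1", 6391)
        if not ok then
            -- Redis is unavailable: log it and answer without metrics.
            ngx.log(ngx.ERR, "redis connect failed: ", err)
            ngx.say("metrics skipped")
            return
        end

        -- Placeholder script; the real call would be the sitemetrics EVAL.
        local key = ngx.var.arg_key or "unknown"
        local res, eval_err = red:eval("return redis.call('INCR', KEYS[1])", 1, key)
        if not res then
            ngx.log(ngx.ERR, "redis eval failed: ", eval_err)
            ngx.say("metrics skipped")
            return
        end

        -- Return the connection to the pool: 10 s max idle, up to 100 connections.
        red:set_keepalive(10000, 100)
        ngx.say("metrics recorded")
    }
}
```

Because failures come back as `nil, err` return values, the handler can decide what to do when redis is down instead of raising a runtime error from a failed subrequest.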


Unfortunately, setting redis2_next_upstream off didn't help: still getting "subrequests cycle", and it still takes ~140 tries for it to get there. That's weird.


Found the problem:
error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 500 501 502 503 504 505 $request_uri; # hack to use nginx internal error pages

Stupid config that seemed to work with other pages for some reason -- but with these requests, nginx kept banging on the redis2 subrequest (~140 times)

Got rid of this garbage and everything is fine now
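
If a custom error page is still wanted, one common arrangement is sketched below. `error_page` performs an internal redirect to the URI it is given, so pointing it at `$request_uri` presumably sent each failed request back through the same handler, while a fixed URI does not. The `/50x.html` path and the list of status codes are assumptions for illustration. Dropping `error_page` entirely, as done above, simply falls back to nginx's built-in error pages.

```nginx
# Sketch only: /50x.html and the status list are assumptions for illustration.
error_page 500 502 503 504 /50x.html;

location = /50x.html {
    # Reachable only through internal redirects such as error_page.
    internal;
    root html;
}
```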

P.S. pstack never returned anything, though gdb showed a stack trace without a problem...


I'm closing this :)

@agentzh closed this
