Add graceful shutdown timeout for Apache #66
…s to not-present, so default behaviour should not change.
…wn any idle connections. This is the same timeout used for client connections, so should have no user-visible effect.
What problem is this solving? Five minutes is much longer than we would normally want to wait...
I don't ever remember waiting that long for Apache to restart, so either the default is less than that, or we just don't have connections that need that long, or the master process restarts to take new connections and the old one hangs around harmlessly sending data back to the outstanding clients.
I think that this may help mitigate circumstances where we would otherwise get
Well, "scoreboard is full" is normally a secondary problem, not a root cause. It just means something else is screwed and connections aren't making progress, so eventually Apache runs out of slots. I'm not sure why changing the behaviour of shutdown would even help that. Presumably nobody was shutting Apache down when they happened, given we were all asleep... An actual connection timeout would help, but that's hard because a small proportion of our connections are legitimately long running.
The suggestion in this Apache bug report is that MPM workers stopping due to reduced load will enter a "shutting down" state which consumes "scoreboard" slots while waiting for the connections to finish. The default timeout is infinity, so they can wait until the TCP connection resets if the other side has vanished. When the server takes more load, the "shutting down" state isn't reversed; instead new MPM workers are started, which leads to eventual resource starvation. Unfortunately, I wasn't around to capture
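The starvation mechanism described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Apache manages its scoreboard: the function name `stuck_slots_over_time`, the idea of a fixed number of workers getting stuck per load cycle, and measuring the timeout in cycles are all hypothetical simplifications.

```python
from typing import List, Optional


def stuck_slots_over_time(
    cycles: int, stuck_per_cycle: int, timeout: Optional[int] = None
) -> List[int]:
    """Toy model: count slots held by stuck 'shutting down' workers.

    Each cycle, `stuck_per_cycle` workers enter graceful shutdown with a
    connection that never finishes (hypothetical simplification). With
    `timeout=None` they hold their slot forever; otherwise they are
    dropped once they have been stuck for `timeout` cycles.
    """
    ages: List[int] = []  # cycles each stuck worker has been waiting
    history: List[int] = []
    for _ in range(cycles):
        # Every stuck worker has now waited one more cycle.
        ages = [age + 1 for age in ages]
        if timeout is not None:
            # A finite shutdown timeout frees slots held for too long.
            ages = [age for age in ages if age < timeout]
        # Load drops again and more workers get stuck shutting down.
        ages.extend([0] * stuck_per_cycle)
        history.append(len(ages))
    return history


no_timeout = stuck_slots_over_time(cycles=6, stuck_per_cycle=2)
with_timeout = stuck_slots_over_time(cycles=6, stuck_per_cycle=2, timeout=3)
print("no timeout:  ", no_timeout)
print("with timeout:", with_timeout)
```

In this model the slots held without a timeout grow without bound, while a finite timeout caps them. Whether real traffic behaves this way depends on how often workers are actually left waiting on dead connections.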
So the thing is I read http://httpd.apache.org/docs/2.4/mod/mpm_common.html#gracefulshutdowntimeout as only applying when apache is shutdown with Now you may well be right - the documentation could easily be read either way. I'm not sure it helps that much though if you're the person whose large diff upload happened to be one of the last requests sent to a server and you don't get the reply because the five minute timeout was hit...
One data point - the longest request on thorn-04 today was 18.5 minutes.
Good point. I see we already have the timeout for proxied connections set very high. In which case, adding a shutdown timeout on top of it probably isn't going to make much difference. The real fix is clearly an API change, but while our 99.99th-percentile upload response time is 351s (for a very large changeset, it has to be said), a timeout of 300s isn't going to work.
Oh I was still looking at this... But I need to unbreak logstash first so I can see the distribution of run times, and I had to give up on that and go to bed in the end last night.
I just did some changeset upload statistics from thorn-04's
0.04% (35 / 87975) finished at or after 300s.
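As a quick check of the arithmetic behind the counts quoted above, here is a minimal sketch. The helper name `share_at_or_over` is hypothetical; only the two counts (35 and 87975) and the 300s threshold come from the thread.

```python
def share_at_or_over(count_over: int, total: int) -> float:
    """Return the percentage of requests that ran for at least the timeout."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count_over / total


# Counts quoted in the thread: 35 of 87975 uploads took 300s or longer.
pct = share_at_or_over(35, 87975)
print(f"{pct:.3f}% of uploads finished at or after 300s")
print(f"so 300s sits at roughly the {100 - pct:.2f}th percentile")
```

On these counts, a 300s shutdown timeout would cut off roughly 4 uploads in every 10,000 if they happened to be in flight on a worker being shut down.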
The figure in the above comment should have been 399s not 351s - I wasn't paying enough attention to the rounding, so it's the 99.985th percentile rather than the 99.99th!
@tomhughes humour me and my worrying? 😉