Add graceful shutdown timeout for Apache #66

zerebubuth · 2016-05-31T18:54:51Z

@tomhughes humour me and my worrying? 😉

…s to not-present, so default behaviour should not change.

…wn any idle connections. This is the same timeout used for client connections, so should have no user-visible effect.

tomhughes · 2016-05-31T18:57:32Z

What problem is this solving? I mean five minutes is much longer than we would normally want to wait...

tomhughes · 2016-05-31T19:01:06Z

I mean I don't ever remember waiting that long for apache to restart so either the default is less than that, or we just don't have connections that need that long, or the master process restarts to take new connections and the old one hangs around harmlessly sending data back to the outstanding clients.

zerebubuth · 2016-05-31T19:02:22Z

I think that this may help mitigate circumstances where we would otherwise get "scoreboard is full" errors. Under normal conditions, I don't think this would have an effect, and I've never waited that long for Apache, which is partly why I chose a long timeout to try and avoid impacting normal operations.

tomhughes · 2016-05-31T19:23:20Z

Well "scoreboard is full" is a secondary problem normally, not a root cause. It just means something else is screwed and connections aren't making progress so eventually apache runs out of slots.

I'm not sure why changing the behaviour of shutdown would even help that? Presumably nobody was shutting apache down when they happened given we were all asleep...

An actual connection timeout would help but that's hard because a small proportion of our connections are legitimately long running.

zerebubuth · 2016-05-31T19:53:44Z

The suggestion in this Apache bug report is that MPM workers stopping due to reduced load will enter a "shutting down" state which consumes "scoreboard" slots while waiting for the connections to finish. The default timeout is infinity, so they can wait until the TCP connection resets if the other side has vanished. When the server takes more load, the "shutting down" state isn't reversed, but new MPM workers are started instead, which leads to eventual resource starvation.

Unfortunately, I wasn't around to capture /server-status when it was happening, so I can't confirm whether all the slots were really in the G state. But I think it's worth adding the timeout, just to be on the safe side. If 300s is too short, then 600s or more would still be better than an outage.
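For anyone wanting to check this next time it happens, here is a minimal sketch of counting slot states from a saved copy of the machine-readable status output. It assumes mod_status is enabled and that the text was fetched from /server-status?auto, which includes a "Scoreboard:" line with one character per slot. The sample string and the function names are made up for illustration and were not captured from any of our servers.

```python
from collections import Counter

# Scoreboard key characters as listed on the mod_status page:
# "G" is "Gracefully finishing", "." is an open slot with no process.
GRACEFUL = "G"


def scoreboard_summary(auto_status: str) -> Counter:
    """Count slot states from the 'Scoreboard:' line of ?auto output."""
    for line in auto_status.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    raise ValueError("no Scoreboard line found")


def graceful_fraction(auto_status: str) -> float:
    """Fraction of all slots currently in the gracefully-finishing state."""
    counts = scoreboard_summary(auto_status)
    total = sum(counts.values())
    return counts[GRACEFUL] / total if total else 0.0


# Hypothetical sample output, for illustration only.
sample = "BusyWorkers: 3\nIdleWorkers: 1\nScoreboard: GGGW_K..\n"
print(scoreboard_summary(sample))
print(graceful_fraction(sample))
```

A fraction close to 1.0 during an incident would support the theory that the slots were all stuck in the G state; a low fraction would point at some other cause.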

tomhughes · 2016-05-31T22:04:53Z

So the thing is I read http://httpd.apache.org/docs/2.4/mod/mpm_common.html#gracefulshutdowntimeout as only applying when apache is shut down with apachectl graceful but you're reading it as applying when an individual process is recycled because it has hit its connection limit.

Now you may well be right - the documentation could easily be read either way.

I'm not sure it helps that much though if you're the person whose large diff upload happened to be one of the last requests sent to a server and you don't get the reply because the five minute timeout was hit...

tomhughes · 2016-05-31T22:13:51Z

One data point - the longest request on thorn-04 today was 18.5 minutes.

zerebubuth · 2016-06-01T11:25:31Z

Good point. I see we already have the timeout for proxied connections set very high. In which case, adding a shutdown timeout on top of it probably isn't going to make much difference.

The real fix is clearly an API change, but while our 99.99% upload response time is 351s (for a very large changeset, it has to be said) a timeout of 300s isn't going to work.

tomhughes · 2016-06-01T11:29:13Z

Oh I was still looking at this... But I need to unbreak logstash first so I can see the distribution of run times and I had to give up on that and go to bed in the end last night.

zerebubuth · 2016-06-01T11:35:32Z

I just did some changeset upload statistics from thorn-04's access.log.1, so probably the same you were looking at:

99% of requests finished in <42s
99.9% of requests finished in <161s
99.99% of requests finished in <399s.

0.04% (35 / 87975) finished at or after 300s.
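For reference, a rough sketch of how figures like these can be derived once the request durations have been pulled out of the log. How the durations are extracted depends on the access log format, which isn't shown here, so the list below is made-up data and the helper names are hypothetical. It uses the nearest-rank percentile definition, which may differ slightly from whatever method produced the numbers above.

```python
import math


def nearest_rank_percentile(sorted_values, pct):
    """Smallest value with at least pct% of the data at or below it."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def share_at_or_above(values, threshold):
    """Fraction of values that are at or above the threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


# Made-up durations in seconds, for illustration only.
durations = sorted([1, 2, 2, 3, 5, 8, 13, 40, 160, 400])

print(nearest_rank_percentile(durations, 90))
print(share_at_or_above(durations, 300))

# The share quoted in this comment, as a percentage of all requests.
late_share_pct = 35 / 87975 * 100
print(round(late_share_pct, 2))
```

The last line is just the arithmetic on the counts given in the comment: 35 requests out of 87975 is roughly 0.04% of the total.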

zerebubuth · 2016-06-01T11:37:34Z

The figure in the above comment should have been 399s not 351s - I wasn't paying enough attention to the rounding, so it's the 99.985th percentile rather than the 99.99th!

zerebubuth added 2 commits May 31, 2016 19:53

Add graceful shutdown timeout option to Apache configuration. Default…

0739169

…s to not-present, so default behaviour should not change.

Web frontends should wait no longer than 5 minutes before shutting do…

16b4f78

…wn any idle connections. This is the same timeout used for client connections, so should have no user-visible effect.

zerebubuth closed this Jun 1, 2016

zerebubuth deleted the add-graceful-shutdown-timeout branch June 1, 2016 11:25
