Raise the Keep-Alive idle timeout to 100 seconds #261
Merged
Heroku's Router 2.0 now supports Keep-Alive connections:
https://www.heroku.com/blog/tips-tricks-router-2dot0-migration/#keepalives-always-on
https://devcenter.heroku.com/articles/http-routing#legacy-router-and-router-2-0
Gunicorn supports Keep-Alive connections, so if an app is using Router 2.0, connections between the router instances and the app will now be kept alive.
However, gunicorn's default keepalive idle timeout setting is only 5 seconds:
https://docs.gunicorn.org/en/stable/settings.html#keepalive
This is shorter than the 90 second Router 2.0 keep-alive timeout:
https://devcenter.heroku.com/articles/http-routing#keepalives
As such:
(a) This causes connections to be closed sooner than they need to be (as noted in the gunicorn docs, the default 5 seconds is more suited for cases where many clients are connecting directly to gunicorn, rather than gunicorn being behind a load balancer).
(b) There is a possibility of a race condition whereby gunicorn starts closing an idle Keep-Alive connection just at the moment that the Heroku Router sends a new request over it. If this occurred, that new request could fail.
This race condition isn't unique to Heroku or gunicorn, but applies to any load balancer/app server combination that supports Keep-Alive. For example, see AWS' explanation about this in their ELB docs:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/edit-load-balancer-attributes.html#connection-idle-timeout
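To make the timing argument concrete, here is a deliberately simplified sketch of which side gives up on an idle connection first. The function name and the model itself are illustrative assumptions, not part of gunicorn or the Heroku Router; real behaviour also depends on network latency and clock skew.

```python
def first_to_close(app_idle_timeout: float, router_idle_timeout: float) -> str:
    """Return which side times out an idle connection first (simplified model)."""
    if app_idle_timeout < router_idle_timeout:
        # The app closes while the router still considers the connection usable,
        # so a request sent at that moment can hit a closing connection.
        return "app"
    if app_idle_timeout > router_idle_timeout:
        # The router closes first and knows not to reuse the connection.
        return "router"
    return "tie"


# gunicorn's default (5s) against the Router 2.0 timeout (90s): the app closes first.
default_case = first_to_close(5, 90)

# With the app timeout above the router's, the router closes first.
raised_case = first_to_close(100, 90)
```

In this model the race is only possible when the result is "app" (or "tie"), which is the case with gunicorn's default setting.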
Therefore, the gunicorn idle timeout has been raised from 5 seconds to 100 seconds, so it is greater than the Router's idle timeout - ensuring that the router is always the one initiating connection closing (and the router will know to not send new requests to a connection it's in the middle of closing).
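As a minimal sketch of what such a setting looks like, assuming the app is configured through a gunicorn.conf.py file (the file location and the ROUTER_IDLE_TIMEOUT_SECONDS constant are illustrative assumptions, not taken from this change):

```python
# gunicorn.conf.py (assumed file name; gunicorn config files are plain Python)

# Heroku Router 2.0 keep-alive idle timeout, in seconds, per the Heroku docs.
# Illustrative constant only: gunicorn ignores names that are not its settings.
ROUTER_IDLE_TIMEOUT_SECONDS = 90

# gunicorn's `keepalive` setting: seconds to wait for the next request on an
# idle Keep-Alive connection. Kept above the router's timeout so that the
# router is the side that closes idle connections.
keepalive = 100

# Sanity check that the intended relationship between the two timeouts holds.
assert keepalive > ROUTER_IDLE_TIMEOUT_SECONDS
```

The same value can also be passed on the command line with gunicorn's --keep-alive option.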
GUS-W-18319007.