Graceful draining/shutdown of connections #915
Thanks. I can base a test on this and then implement something matching the test.

Any updates on this?

I just had a discussion about zero-downtime deploys with Kubernetes and Elixir/Phoenix in the elixir-lang Slack channel, and I think Cowboy is probably the right place to implement connection draining. When Kubernetes is ready to cycle your app/pod, it sends a SIGTERM to your app so you can perform pre-shutdown tasks and drain connections. No new requests go to your app during this time. 30s later, if your app is still running, it sends a SIGKILL and brutally kills the app. The problem is that Elixir/Phoenix seems to shut down immediately after receiving the SIGTERM. There is a discussion about adding connection draining to Phoenix, but it was rejected; see phoenixframework/phoenix#1742. Regardless of whether anyone has time to implement it, do you think Cowboy is at least the right place to do it?

Just putting in my 2 cents that I like this idea; also a disclaimer that I'm currently being bitten by it on a production web app whenever I do a CI deploy... it's only a few seconds, but still.
Cowboy is not the right place to do things like this; Ranch is.

In the setup you describe, something external (a load balancer etc.) holds off sending new connections while you effectively gracefully restart the listener. I am not really interested in this because it only works for the more complex setups. If you don't have a load balancer, you don't want to wait before accepting new connections. It has to happen even while older connections exist.

What I would like to have is a way in Ranch to close/reopen the listening socket with new options. The existing supervision tree would stay, and Ranch would just propagate the changed socket everywhere it's needed. New and old connections would continue working side by side. The end result is that if you don't have a load balancer you have a very minimal interruption in accepting connections but existing connections stay alive, and if you do have a more complex setup then you can do this update in a few milliseconds instead of waiting for connections to slowly be dropped.

(I do not know Kubernetes so I can't say if it would keep connections alive. It depends on your deployment. If it just stops sending connections and then resumes sending them later without touching existing connections, no problem.)

I have no plans for this before Ranch 2.0, though. There's an old ticket, ninenines/ranch#83, but I have no plans to work on it at the moment, at least not while Cowboy 2.0 still needs heavy maintenance.
@essen thanks for the thoughtful response! I may be misunderstanding you, but I don't think "wait before accepting new connections" is actually something we're looking for. When we deploy, we bring up a new instance that immediately starts serving requests. Then we stop sending requests to the old instance, drain its requests, and terminate it. We're not really trying to "restart" the app at all; we bring up a brand new one and destroy the old one.
Ah right. You want a graceful_stop_listener, and I'm talking about a graceful_restart_listener. Well, the good news is that they're not incompatible goals.

Ah, awesome :) Would you still say that the graceful_stop_listener should be implemented in Ranch and not Cowboy?

Yes, everything that has to do with listeners is Ranch's territory. Cowboy only has some shortcuts.

The most recent commit of Ranch has the ability to suspend listeners (killing the listening socket but leaving connections alive). This is a first step toward a graceful drain. Not sure more should be done. Please look it up!

Can't wait for this to be implemented 👍

Is someone working on this issue? Our Elixir pods are being killed instantaneously.

Ranch can already be used to implement graceful draining via https://ninenines.eu/docs/en/ranch/1.6/manual/ranch.suspend_listener/ and https://ninenines.eu/docs/en/ranch/1.6/manual/ranch.wait_for_connections/ It can be combined with start_listener or set_transport_options depending on what you set out to do. Please experiment and provide feedback.
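To make the combination concrete, here is a minimal sketch of a drain using the two Ranch calls linked above (the listener reference `Ref` and the use of the three-argument form, which blocks until the condition holds, are assumptions; adapt to your application):

```erlang
%% Sketch: drain a Ranch listener gracefully.
drain(Ref) ->
    %% Stop accepting new connections; existing ones keep running.
    ok = ranch:suspend_listener(Ref),
    %% Block until the number of live connections reaches 0.
    %% Wrap this in a process with a shutdown timeout if you need
    %% an upper bound on how long draining may take.
    ok = ranch:wait_for_connections(Ref, '==', 0).
```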
I've implemented graceful HTTP connection draining in my project by adding a GenServer after my HTTP endpoint (Phoenix in this case) with the following implementation:

```elixir
defmodule GracefulShutdownManager do
  use GenServer

  def child_spec(_) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, []},
      shutdown: 10_000
    }
  end

  def start_link() do
    GenServer.start_link(__MODULE__, nil)
  end

  def init(nil) do
    Process.flag(:trap_exit, true)
    {:ok, nil}
  end

  def terminate(_reason, nil) do
    :ranch.suspend_listener(MyPhoenixEndpoint.HTTP)
    :ranch.wait_for_connections(MyPhoenixEndpoint.HTTP, :==, 0, 10_000)
  end
end
```

I do feel that something like this belongs in either Cowboy, Plug, or Phoenix. Is there appetite for adding something like this to Cowboy? (And in any case, maybe this code snippet will help the next person who comes here looking for answers.)
Doesn't sound like it's worth adding to Cowboy directly. |
On the contrary, if this is implemented in Cowboy directly, then everyone building software on Cowboy would benefit. IMHO connection draining isn't a frivolous feature, but something most programs can benefit from. That said, I respect your decision as a maintainer and will take this up the chain (to Plug / Phoenix). |
If you want the reuse you could also just make a small "drainer" lib that anyone (and not just plug/phoenix) can pair up with their cowboy install. |
@ferd already on it ;) edit: here it is: https://hex.pm/packages/ranch_connection_drainer |
The problem with adding this to Cowboy (or Ranch; it'd be more fitting there) is that there are a number of different scenarios that may be interesting to people, and at this point I do not know what people need. So I would encourage experimentation, and then we can revisit when we have more data. Sorry for the short answer earlier; I wanted to say something before leaving, but it was getting late. :-)
Has there been any talk about how to handle keep-alive connections? Ranch will suspend the listener and wait a configurable amount of time before forcing a shutdown, but keep-alive connections are still able to send requests, and those will be happily processed even though the listener is suspended.
For existing connections, the Cowboy processes should handle the shutdown exit signal (or similar); there's more work to be done on that.
Thanks @essen. For follow-up for other readers (I hate not coming back with a solution after posting a problem): I ended up setting a value in an ets table that indicates that any open connections should be terminated upon their next request (https://github.com/pushex-project/pushex/blob/master/lib/push_ex_web/config.ex#L11). A "drainer process", which suspends the Ranch listener, also sets this value. The value is then checked on each API request to see whether a close header needs to be sent (https://github.com/pushex-project/pushex/blob/master/lib/push_ex_web/controllers/push_controller.ex#L31). This works great for the particular application I'm working on, and it took us from a bunch of errors on shutdown to 0 errors on shutdown.
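A stripped-down sketch of that drain-flag pattern (the table name, function names, and flag key here are all hypothetical, not the linked project's actual code):

```erlang
%% Hypothetical sketch: a named ets table holds a draining flag.
%% The drainer process sets it after suspending the Ranch listener,
%% and each request handler checks it to decide whether to ask
%% keep-alive clients to close their connection.

init_drain_state() ->
    ets:new(drain_state, [named_table, public, {read_concurrency, true}]),
    ets:insert(drain_state, {draining, false}).

begin_drain(ListenerRef) ->
    ok = ranch:suspend_listener(ListenerRef),
    ets:insert(drain_state, {draining, true}).

%% Called from a Cowboy handler before replying.
maybe_close(Req) ->
    case ets:lookup(drain_state, draining) of
        [{draining, true}] ->
            %% Ask the client to drop the keep-alive connection.
            cowboy_req:set_resp_header(<<"connection">>, <<"close">>, Req);
        _ ->
            Req
    end.
```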
Cowboy now has graceful shutdown of HTTP/2 connections, but it can't be triggered by the user just yet. Still, it shouldn't take much work to do it for both HTTP/1.1 and HTTP/2, since the mechanisms are already there. For Websocket the mechanism is missing and will need to be added.
@essen which version of cowboy are you targeting for these changes? Then I can add a notice to the readme of ranch_connection_drainer. |
I'm not sure yet. Currently working on 2.7 but I can't promise this will be in it. |
Ok no worries. If you update this when you know then I can just add it then 👍 |
Considering the scope of this ticket is still fairly large it won't make it into 2.7. However I think it should be worked on soon after 2.7 so that the changes are available for testing as long as possible before 2.8. |
In our use case, we have very long-lived HTTP/2 connections, used for machine-to-machine communication (5G mobile network infrastructure in our case). They are never idle, so they never time out. In order to trigger load balancing after adding more nodes (VMs/containers/etc.) to the system, we need a way to tell some of the clients to re-connect, i.e. trigger a graceful shutdown (goaway) on individual connections. Something as simple as:

```erlang
Pids = ranch:procs(Ref, connections),
[Pid ! goaway || Pid <- lists:sublist(Pids, 1, 5)].
```

We are willing to contribute an interface for it, but first it would be nice to know whether it would be accepted and how you'd want it to look. We only need it for HTTP/2. Thanks!
Via
The graceful shutdown PR has been merged. Closing, thanks! |
This is the code we use in a project to make sure Cowboy closes down gracefully when the application stops. To do this, we have a listener bound to http_api, and we have an application callback using prep_stop/1 to prepare the application for stopping gracefully. The helper drain_connections/1 runs the following loop:

It would be really nice to have some kind of "official" support for this kind of thing, so we didn't have to go peek inside Ranch. I'm also not sure this is entirely the right way to go about it.
The reason this is nice to have is that if you close down the system, then connections are finished while no new connections are made. Once drained, the application is stopped for real. This means any dependencies the app might have on correct operation is torn down after the last connection has been drained. It avoids races when you stop a node to deploy a new version of the code.
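The code blocks from the comment above didn't survive extraction. As a rough illustration only, a hypothetical reconstruction of such a prep_stop/1 drain (the listener name http_api comes from the comment; everything else, including the polling approach via the public ranch:procs/2 API, is assumed):

```erlang
%% Hypothetical reconstruction: drain the http_api listener from the
%% application's prep_stop/1 callback before the app really stops.
prep_stop(State) ->
    %% Stop accepting new connections; existing ones keep running.
    ok = ranch:suspend_listener(http_api),
    drain_connections(http_api),
    State.

%% Poll the list of live connection processes until it is empty.
drain_connections(Ref) ->
    case ranch:procs(Ref, connections) of
        [] -> ok;
        _Pids ->
            timer:sleep(100),
            drain_connections(Ref)
    end.
```

A real version would likely also cap the total drain time so a stuck connection cannot delay shutdown indefinitely.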