
Graceful draining/shutdown of connections #915

Closed
jlouis opened this issue Nov 12, 2015 · 30 comments

@jlouis
Contributor

jlouis commented Nov 12, 2015

This is the code we use in a project to make sure Cowboy closes down gracefully when the application stops. We have a listener named http_api and an application callback using prep_stop/1 to prepare the application for stopping gracefully:

prep_stop(_) ->
    %% Refuse any new connections on the listener...
    ranch:set_max_connections(http_api, 0),
    %% ...then wait up to 10 seconds for existing ones to finish.
    drain_connections(10),
    [].

The helper drain_connections/1 runs the following loop:

drain_connections(0) -> ok;
drain_connections(N) ->
    timer:sleep(1000),
    %% Note: ranch_server is an internal Ranch module.
    case ranch_server:count_connections(http_api) of
        0 -> ok;
        K when K > 0 ->
            lager:info("finishing work on active connections: ~p connections left", [K]),
            drain_connections(N-1)
    end.

It would be really nice to have some kind of "official" support for this kind of thing, so we didn't have to go peek inside Ranch. I'm also not sure this is entirely the right way to go about it.

The reason this is nice to have is that when you shut down the system, existing connections are allowed to finish while no new connections are accepted. Once drained, the application is stopped for real. This means any dependencies the app needs for correct operation are torn down only after the last connection has been drained. It avoids races when you stop a node to deploy a new version of the code.

@essen
Member

essen commented Nov 12, 2015

Thanks. I can base a test on this and then implement something matching the test.

@essen essen added this to the 2.0.0 milestone Nov 12, 2015
@nhooyr

nhooyr commented Feb 2, 2017

Any updates for this?

@essen essen modified the milestones: 2.0.0, After 2.0 Feb 3, 2017
@essen essen removed this from the After 2.0 milestone Oct 2, 2017
@jesseshieh

jesseshieh commented Nov 2, 2017

I just had a discussion about zero-downtime deploys with Kubernetes and Elixir/Phoenix in the elixir-lang slack channel and I think cowboy is probably the right place to implement connection draining.

When Kubernetes is ready to cycle your app/pod, it sends a SIGTERM to your app so you can perform pre-shutdown tasks and drain connections. No new requests go to your app during this time. 30s later, if your app is still running, it sends a SIGKILL and brutally kills the app.

The problem is that Elixir/Phoenix seems to shut down immediately after receiving the SIGTERM. There was a discussion about adding connection draining to Phoenix, but the proposal was rejected; see phoenixframework/phoenix#1742

Regardless of whether anyone has time to implement it, do you think cowboy is at least the right place to do it?

@pmarreck

pmarreck commented Nov 2, 2017

Just putting in my 2 cents that I like this idea; also a disclaimer that I'm currently being bitten by it on a production web app whenever I do a CI deploy... it's only a few seconds, but still

@essen
Member

essen commented Nov 2, 2017

Cowboy is not the right place to do things like this, Ranch is.

The way you describe it, something external (a load balancer etc.) holds off sending new connections while you effectively restart the listener gracefully. I am not really interested in this because it only works for the more complex setups. If you don't have a load balancer, you don't want to wait before accepting new connections; it has to happen even while older connections exist.

What I would like to have is a way in Ranch to close/reopen the listening socket with new options. The existing supervision tree would stay, and Ranch would just propagate the changed socket everywhere it's needed. New and old connections would continue working side by side.

The end result is that if you don't have a load balancer you have a very minimal interruption in accepting connections but existing connections stay alive, and if you do have a more complex setup then you can do this update in a few milliseconds instead of waiting for connections to slowly be dropped. (I do not know Kubernetes so I can't say if it would keep connections alive though. It depends on your deployment. If it just stops sending connections and then resumes sending them later without touching existing connections, no problem.)

I have no plans for this before Ranch 2.0 though. There's an old ticket, ninenines/ranch#83, but I have no plans to work on this at the moment, at least not while Cowboy 2.0 still needs heavy maintenance.
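[Editor's note: the close/reopen idea described above later became expressible with the suspend/resume API that shipped in Ranch 1.6. A rough sketch, assuming that API; the listener reference and options are illustrative, and set_transport_options requires the listener to be suspended first:]

```erlang
%% Sketch of a graceful restart using the Ranch 1.6+ public API.
%% Existing connections keep running while the listening socket is
%% closed and reopened with new options.
graceful_restart(Ref, NewTransOpts) ->
    %% Close the listening socket; connection processes are untouched.
    ok = ranch:suspend_listener(Ref),
    %% Swap in the new transport options, e.g. [{port, 8081}].
    ok = ranch:set_transport_options(Ref, NewTransOpts),
    %% Reopen the socket and start accepting again.
    ok = ranch:resume_listener(Ref).
```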

@jesseshieh

@essen thanks for the thoughtful response!

I may be misunderstanding you, but I don't think "wait before accepting new connections" is actually something we're looking for. When we deploy, we bring up a new instance that immediately starts serving requests. Then we stop sending requests to the old instance, drain its requests, and terminate it. We're not really trying to "restart" the app at all. We bring up a brand new one and destroy the old one.

@essen
Member

essen commented Nov 2, 2017

Ah right. You want a graceful_stop_listener, and I'm talking about a graceful_restart_listener. Well the good news is that they're not incompatible goals.

@jesseshieh

Ah awesome :) would you still say that the graceful_stop_listener should be implemented in ranch and not cowboy?

@essen
Member

essen commented Nov 2, 2017

Yes everything that has to do with listeners is Ranch's territory. Cowboy only has some shortcuts.

@essen
Member

essen commented May 2, 2018

The most recent commit of Ranch has the ability to suspend listeners (killing the listening socket but leaving connections alive). This is a first step toward a graceful drain; I'm not sure more needs to be done. Please try it out!

@gmanolache

Can't wait for this to be implemented 👍

@hugohenley

Is someone working on this issue? Our Elixir pods are being killed instantly.

@essen
Member

essen commented Sep 25, 2018

Ranch can already be used to implement graceful draining via https://ninenines.eu/docs/en/ranch/1.6/manual/ranch.suspend_listener/ and https://ninenines.eu/docs/en/ranch/1.6/manual/ranch.wait_for_connections/

It can be combined with start_listener or set_transport_options depending on what you set out to do.

Please experiment and provide feedback.
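[Editor's note: a minimal graceful stop combining the two documented calls above might look like the following sketch. The function name is illustrative; note that wait_for_connections takes an optional fourth argument, which is the polling interval in milliseconds:]

```erlang
%% Sketch of a graceful stop using the public Ranch 1.6 API, per the
%% documentation linked above. Ref is the listener name, e.g. http_api.
graceful_stop(Ref) ->
    %% Stop accepting new connections; existing ones keep running.
    ok = ranch:suspend_listener(Ref),
    %% Block until the connection count drops to zero.
    ok = ranch:wait_for_connections(Ref, '==', 0),
    %% Finally tear the listener down for good.
    ranch:stop_listener(Ref).
```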

@derekkraan

derekkraan commented Jan 22, 2019

I've implemented graceful HTTP connection draining in my project by adding a GenServer after my HTTP endpoint (Phoenix in this case) with the following implementation:

defmodule GracefulShutdownManager do
  use GenServer

  def child_spec(_) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, []},
      # Give terminate/2 up to 10 seconds before being brutally killed.
      shutdown: 10_000
    }
  end

  def start_link() do
    GenServer.start_link(__MODULE__, nil)
  end

  def init(nil) do
    # Trapping exits is required for terminate/2 to run on shutdown.
    Process.flag(:trap_exit, true)
    {:ok, nil}
  end

  def terminate(_reason, nil) do
    # Stop accepting new connections...
    :ranch.suspend_listener(MyPhoenixEndpoint.HTTP)

    # ...then wait until the connection count reaches zero.
    :ranch.wait_for_connections(MyPhoenixEndpoint.HTTP, :==, 0, 10_000)
  end
end

I do feel that something like this belongs in either Cowboy, Plug, or Phoenix. Is there appetite for adding something like this to Cowboy? (In any case, maybe this code snippet will help the next person who comes here looking for answers.)

@essen
Member

essen commented Jan 22, 2019

Doesn't sound like it's worth adding to Cowboy directly.

@derekkraan

On the contrary, if this is implemented in Cowboy directly, then everyone building software on Cowboy would benefit. IMHO connection draining isn't a frivolous feature, but something most programs can benefit from.

That said, I respect your decision as a maintainer and will take this up the chain (to Plug / Phoenix).

@ferd
Contributor

ferd commented Jan 22, 2019

If you want the reuse you could also just make a small "drainer" lib that anyone (and not just plug/phoenix) can pair up with their cowboy install.

@derekkraan

derekkraan commented Jan 22, 2019

@ferd already on it ;)

edit: here it is: https://hex.pm/packages/ranch_connection_drainer

@essen
Member

essen commented Jan 22, 2019

The problem with adding to Cowboy (or Ranch, it'd be more fitting there) is that there's a number of different scenarios that may be interesting to people and at this point I do not know what people need. So I would encourage experimentation and then we can revisit when we have more data.

Sorry for the short answer earlier; I wanted to say something before leaving but it was getting late. :-)

@sb8244

sb8244 commented Apr 24, 2019

Has there been any talk about how to handle keep-alive connections? Ranch will suspend the listener and wait a configurable amount of time before forcing shutdown, but keep-alive connections are still able to send requests, and those will happily be processed even while the listener is suspended.

@essen
Member

essen commented May 2, 2019

For existing connections, the Cowboy processes should handle the shutdown exit signal or similar; there's more work to be done on that.

@sb8244

sb8244 commented May 2, 2019

Thanks @essen . For follow up for other readers (I hate not coming back with a solution after posting a problem):

I ended up setting a value in an ets table that indicates that any open connections should be terminated upon their next request (https://github.com/pushex-project/pushex/blob/master/lib/push_ex_web/config.ex#L11). A "drainer process" which suspends ranch listener also sets this value. This value is then checked in each API request to see if it needs to send a close header (https://github.com/pushex-project/pushex/blob/master/lib/push_ex_web/controllers/push_controller.ex#L31).

This works great for the particular application I'm working with, and took us from a bunch of errors on shutdown to zero.
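[Editor's note: the close-header approach described above can be sketched for a plain Cowboy 2 handler roughly as follows. The ets table name and the handler itself are hypothetical; cowboy_req:set_resp_header/3 and cowboy_req:reply/2 are the real API calls:]

```erlang
%% Hypothetical Cowboy 2 handler sketching the approach above: when a
%% draining flag has been set (here in an ets table named drain_flags),
%% reply with "connection: close" so keep-alive clients reconnect
%% elsewhere on their next request.
init(Req0, State) ->
    Req1 = case ets:lookup(drain_flags, draining) of
        [{draining, true}] ->
            cowboy_req:set_resp_header(<<"connection">>, <<"close">>, Req0);
        _ ->
            Req0
    end,
    Req = cowboy_req:reply(204, Req1),
    {ok, Req, State}.
```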

@essen
Member

essen commented Oct 3, 2019

Cowboy now has graceful shutdown of HTTP/2 connections, but it can't be triggered by the user just yet. Still, it shouldn't take much work to enable it for both HTTP/1.1 and HTTP/2 since the mechanisms are already there. For Websocket the mechanism is missing and will need to be added.

@derekkraan

@essen which version of cowboy are you targeting for these changes? Then I can add a notice to the readme of ranch_connection_drainer.

@essen
Member

essen commented Oct 3, 2019

I'm not sure yet. Currently working on 2.7 but I can't promise this will be in it.

@derekkraan

Ok no worries. If you update this when you know then I can just add it then 👍

@essen essen changed the title Graceful draining of connections Graceful draining/shutdown of connections Oct 10, 2019
@essen essen added this to the 2.8 milestone Oct 10, 2019
@essen
Member

essen commented Oct 10, 2019

Considering the scope of this ticket is still fairly large, it won't make it into 2.7. However, I think it should be worked on soon after 2.7 so that the changes are available for testing as long as possible before 2.8.

@zuiderkwast
Contributor

In our use case, we have very long-lived HTTP/2 connections, used for machine-to-machine communication (5G mobile network infrastructure in our case). They are never idle so they never time out. In order to trigger load balancing after adding more nodes (VMs/containers/etc.) to the system, we need a way to tell some of the clients to re-connect, i.e. trigger a graceful shutdown (goaway) on individual connections. Something as simple as Pid ! goaway would do. Then we can use ranch to find the connections:

    Pids = ranch:procs(Ref, connections),
    [Pid ! goaway || Pid <- lists:sublist(Pids, 1, 5)].

> Cowboy now has graceful shutdown of HTTP/2 connections, but it can't be triggered by the user just yet

We are willing to contribute an interface for it, but first it would be nice to know whether it would be accepted and how you'd want it to look. We only need it for HTTP/2. Thanks!

@essen
Member

essen commented Sep 24, 2020

Via sys:terminate; there's a TODO about using graceful shutdown there. There's a similar TODO for the parent process exit signal, so that all connections attempt to shut down gracefully when the supervisor exits.

@essen
Member

essen commented Nov 27, 2020

The graceful shutdown PR has been merged. Closing, thanks!
