Graceful shutdown #604

hardie · 2017-06-07T11:30:49Z

Martin Thomson's presentation in Paris proposed a pattern in which the application layer protocol manages graceful shutdown by using timeouts. It might also define in its mapping documents an explicit signal from the application protocol to the transport for graceful shutdown.

After discussion, we agreed to consider an additional pattern, common across applications, that would allow an application protocol to gracefully close a transport without having to define an explicit signal. Two initial ideas (both with some issues) are closing all streams and forcing idle state by setting the stream maximum to 0.

This issue is to track further discussion of possible patterns that might work for this.

mirjak · 2017-06-07T11:34:09Z

One or multiple shutdown procedures can/should be described in the applicability document (based on one or multiple protocol mechanisms QUIC provides). Further, we already discussed at the last meeting (in Chicago) that we might want another document to describe an (abstract) application interface.

martinthomson · 2017-08-22T01:43:21Z

Here's the model that I've been using to drive the design for HTTP:

Model

Each endpoint has a set of goals for the connection.

For an HTTP client, that is to make a certain set of requests and get responses to those requests (add push and you might include getting responses to any pushes as well). An HTTP server doesn't really have goals of its own; it largely wants to ensure that clients achieve their goals.

The process goes approximately like this:

1. Agree to what goals will be attempted. In HTTP GOAWAY is used to divide the space of potential requests into two sets: those that might be fulfilled and those that won't be. This might require a bit of iteration or negotiation. Here, GOAWAY can be sent multiple times with reducing values.
2. Complete those goals that you agree to. That could take any amount of time, during which there might be a reassessment of the agreement on either side (in which case, goto 1).
3. Once agreed goals are complete, which could mean attempted and failed, then close. An HTTP client can walk away as soon as it gets responses to all agreed requests (and pushes). An HTTP server can walk away as soon as it knows that the client got all the responses.
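
A minimal sketch of the process above may help make it concrete. Everything in it is an assumption made for illustration: `ShutdownTracker` and its methods are hypothetical names, request IDs are plain integers, and a GOAWAY is simplified to carry the lowest request ID that will not be processed. It is not the HTTP mapping's actual mechanism.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ShutdownTracker:
    """Hypothetical per-connection record of the agreed goals."""

    # Lowest request ID that will NOT be processed; None until a GOAWAY is seen.
    goaway_limit: Optional[int] = None
    # Requests that were started and are still awaiting a response.
    outstanding: Set[int] = field(default_factory=set)

    def start_request(self, request_id: int) -> bool:
        """Return True if this request is still inside the agreed set."""
        if self.goaway_limit is not None and request_id >= self.goaway_limit:
            return False
        self.outstanding.add(request_id)
        return True

    def on_goaway(self, limit: int) -> Set[int]:
        """Step 1: agree (or re-agree) on the goals; the limit may only shrink.

        Returns the outstanding requests that are now outside the agreement.
        """
        if self.goaway_limit is not None and limit > self.goaway_limit:
            raise ValueError("GOAWAY limit must not increase")
        self.goaway_limit = limit
        dropped = {r for r in self.outstanding if r >= limit}
        self.outstanding -= dropped
        return dropped

    def on_response(self, request_id: int) -> None:
        """Step 2: an agreed request completed (success or failure)."""
        self.outstanding.discard(request_id)

    def can_close(self) -> bool:
        """Step 3: close once a shutdown was agreed and nothing is pending."""
        return self.goaway_limit is not None and not self.outstanding
```

Note that `start_request` still accepts IDs below the limit after a GOAWAY, so new requests inside the agreed set can begin after shutdown has been initiated.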

That doesn't suggest much more mechanism than a TIME_WAIT (or DRAIN) period during which any ACKs can be generated in response to unnecessary retransmissions of STREAM and other frames.
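
As a rough illustration of such a drain period, the sketch below models a connection that, once draining, only acknowledges what it receives and then discards everything after a deadline. The class name, the state names, and the string return values are all hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    OPEN = auto()
    DRAINING = auto()
    CLOSED = auto()


class DrainingConnection:
    """Hypothetical connection with a TIME_WAIT-like drain period."""

    def __init__(self, drain_seconds: float) -> None:
        self.drain_seconds = drain_seconds
        self.state = State.OPEN
        self.drain_deadline: Optional[float] = None

    def begin_drain(self, now: float) -> None:
        """Called once the application has decided it is done."""
        self.state = State.DRAINING
        self.drain_deadline = now + self.drain_seconds

    def on_packet(self, now: float) -> str:
        """Decide how to treat an incoming packet at time `now`."""
        if (self.state is State.DRAINING
                and self.drain_deadline is not None
                and now >= self.drain_deadline):
            self.state = State.CLOSED
        if self.state is State.OPEN:
            return "process"
        if self.state is State.DRAINING:
            # Only acknowledge, so the peer stops retransmitting.
            return "ack_only"
        return "drop"
```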

I don't see a way that this can be translated into artifacts that are visible to the transport without some bad compromises. Critically, the process could take an indeterminate period. Also, HTTP shows that stream creation has to continue beyond the point that a shutdown is initiated.

Alternative: When Streams Close

The most plausible suggestion I've heard for this is to make stream closure the signal. When all streams are closed, then the connection can be discarded. I don't think that it produces the right outcome either.

Aside: The fact that this approach doesn't work with unidirectional streams is indicative of a critical point: that connection state isn't necessarily bound to any one stream.

Let's say that the decision to close streams is driven by the above logic and that closing any remaining streams creates the signal to the transport. What has this gained? It's just indirection: rather than telling the transport to close, the application is using other existing APIs to generate an indirect signal. (We'd also have to resolve better the question of what closed means, which doesn't currently suit any mechanism that requires strong assurances about the state.)

Closing stream 1 on HTTP would take an RTT or two, during which time no useful work is performed, and then you transition into the DRAIN timer. That's wasteful; why do that when the alternative (closing when you are done) is so simple and appealing? Not to mention the secondary effects of the choice: you can no longer treat closure of stream 1 as a protocol error because it's potentially a shutdown signal. And now there are two signals that can get confused. If a shutdown wasn't negotiated you might still see a stream close. Does that mean that you missed a GOAWAY?

There might be value in an explicit signal at the point that an application decided to abandon the connection, but we'd have to establish that a signal is both necessary and justified. Signals like this take time to exchange.

Applicability Seems Right

I tend to agree with @mirjak on the point regarding applicability. If this is as amorphous as my model suggests, then text in an applicability statement is perfect.

mirjak · 2017-08-22T16:28:48Z

I think this is actually connected to the keep-alive question. I agree that there should be some (configurable) timer, after which the connection should simply be closed, or a ping sent if the application has explicitly indicated that keep-alives should be used and the connection is not ready to be closed yet.

Therefore I would say the default should be a silent close after a certain idle time and no pings, given that an application needs to be prepared anyway for a connection that no longer works even after a short idle time (e.g. if the network state is gone because the UDP timeout was incredibly low), and also given that QUIC is optimized to minimize start-up latency and therefore reconnecting is fast.
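
The default described above could be sketched roughly as follows. `IdlePolicy` and `on_timer` are hypothetical names, the return values are illustrative strings, and times are plain seconds passed in by the caller; this is a sketch of the proposal, not of any QUIC implementation.

```python
from dataclasses import dataclass


@dataclass
class IdlePolicy:
    """Hypothetical per-connection idle configuration."""
    idle_timeout: float        # seconds of inactivity before acting
    keep_alive: bool = False   # default: no pings, close silently


def on_timer(policy: IdlePolicy, last_activity: float, now: float) -> str:
    """Return the action to take when the idle timer is checked."""
    idle_for = now - last_activity
    if idle_for < policy.idle_timeout:
        return "wait"
    # The timeout has expired: ping only if the application asked for it.
    return "ping" if policy.keep_alive else "close"
```

As the next comment in the thread points out, a policy like this only takes effect once the connection has actually become idle.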

martinthomson · 2017-08-23T00:18:35Z

@mirjak, the timeout exists. But a timeout only works if the connection becomes idle. The primary use case for graceful shutdown - at least in my experience - is on high-use connections between servers where moving to a new connection or server without losing in-progress work is critical.

martinthomson · 2017-10-23T03:47:53Z

@hardie, has this been addressed by recent changes? That is, the application decides, but signals to the transport (and any intermediaries with keys) using APPLICATION_CLOSE when it has concluded its shutdown.

hardie · 2017-10-23T18:50:59Z

@martinthomson I'm not sure if you're referring to a PR or the published docs. If the published docs, transport looks like it covers APPLICATION_CLOSE but the HTTP mapping doc doesn't mention it and 7.1 seems to be using the HTTP application error codes with CONNECTION_CLOSE. So I think it would need a PR to cover APPLICATION_CLOSE.

martinthomson · 2017-10-24T01:07:39Z

@hardie, thanks for checking HTTP. This issue was specifically about the generic issue though. Can we say that sending APPLICATION_CLOSE when a connection is "done" is sufficiently generic a signal?

I'll open an editorial PR on the HTTP spec to correct the error(s) there.

@hardie

As noted by @hardie in #604, the description of GOAWAY hasn't tracked the changes in the transport. This updates it. Most importantly, it says what to do to finish a graceful shutdown, including the sending of APPLICATION_CLOSE frames. I added a new error code for when the server receives a GOAWAY. I also changed the description to be less generic and use client and server as appropriate.

hardie · 2017-10-24T15:51:49Z

On Mon, Oct 23, 2017 at 6:07 PM, Martin Thomson wrote:

> @hardie, thanks for checking HTTP. This issue was specifically about the generic issue though. Can we say that sending APPLICATION_CLOSE when a connection is "done" is sufficiently generic a signal?

As a generic, I think this is fine.

> I'll open an editorial PR on the HTTP spec to correct the error(s) there.

Personally, I would close this issue when that PR is merged, but up to you.

martinthomson added the design (an issue that affects the design of the protocol; resolution requires consensus) and -transport labels Jun 7, 2017

mnot changed the title ~~Identify, if possible, a common application protocol pattern for graceful shutdown~~ Graceful shutdown Jun 20, 2017

mnot added this to Closing, Shutdown, Reset in QUIC Jun 21, 2017

martinthomson mentioned this issue Jun 27, 2017

Unidirectional Streams #643

Closed

martinthomson mentioned this issue Oct 24, 2017

Improve GOAWAY description #898

Merged

martinthomson closed this as completed Jan 16, 2018

mnot removed this from Connection End in QUIC Mar 6, 2018

mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019
