Graceful shutdown #604

Closed
hardie opened this issue Jun 7, 2017 · 8 comments
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@hardie

hardie commented Jun 7, 2017

Martin Thomson's presentation in Paris proposed a pattern in which the application layer protocol manages graceful shutdown by using timeouts. It might also define in its mapping documents an explicit signal from the application protocol to the transport for graceful shutdown.

After discussion, we agreed to consider an additional pattern, common across applications, that would allow an application protocol to gracefully close a transport without having to define an explicit signal. Two initial ideas (both with some issues) are closing all streams and forcing idle state by setting the stream maximum to 0.

This issue is to track further discussion of possible patterns that might work for this.

@mirjak
Contributor

mirjak commented Jun 7, 2017

One or more shutdown procedures can/should be described in the applicability document (based on one or more of the protocol mechanisms QUIC provides). Further, we already discussed at the last meeting (in Chicago) that we might want another document to describe an (abstract) application interface.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Jun 7, 2017
@mnot mnot changed the title from "Identify, if possible, a common application protocol pattern for graceful shutdown" to "Graceful shutdown" Jun 20, 2017
@mnot mnot added this to Closing, Shutdown, Reset in QUIC Jun 21, 2017
@martinthomson
Member

Here's the model that I've been using to drive the design for HTTP:

Model

Each endpoint has a set of goals for the connection.

For an HTTP client, that is to make a certain set of requests and get responses to those requests (add push and you might include getting responses to any pushes as well). An HTTP server doesn't really have goals of its own, it largely wants to ensure that clients achieve their goals.

The process goes approximately like this:

  1. Agree on which goals will be attempted. In HTTP, GOAWAY is used to divide the space of potential requests into two sets: those that might be fulfilled and those that won't be. This might require a bit of iteration or negotiation. Here, GOAWAY can be sent multiple times with decreasing values.

  2. Complete those goals that you agree to. That could take any amount of time, during which there might be a reassessment of the agreement on either side (in which case, goto 1).

  3. Once agreed goals are complete, which could mean attempted and failed, then close. An HTTP client can walk away as soon as it gets responses to all agreed requests (and pushes). An HTTP server can walk away as soon as it knows that the client got all the responses.

That doesn't suggest much more mechanism than a TIME_WAIT (or DRAIN) period during which any ACKs can be generated in response to unnecessary retransmissions of STREAM and other frames.

I don't see a way that this can be translated into artifacts that are visible to the transport without some bad compromises. Critically, the process could take an indeterminate period. Also, HTTP shows that stream creation has to continue beyond the point that a shutdown is initiated.
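As an illustration only, the three steps above can be sketched as a small bookkeeping class. This is a toy model under stated assumptions, not anything from the HTTP or transport drafts: the class name `DrainingConnection`, its methods, and the use of plain integers as request identifiers are all hypothetical. It also simplifies step 3 by treating "no agreed request outstanding" as done, whereas a real endpoint would additionally need to know that its peer will not start more work below the limit.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DrainingConnection:
    """Toy model of the agree / complete / close process (hypothetical names)."""

    # Highest request ID still eligible for processing; None until a GOAWAY is sent.
    limit: Optional[int] = None
    # Requests that were accepted and have not finished (or failed) yet.
    in_progress: Set[int] = field(default_factory=set)

    def goaway(self, new_limit: int) -> None:
        """Step 1: agree on goals. May be repeated, but only with lower values."""
        if self.limit is not None and new_limit > self.limit:
            raise ValueError("a later GOAWAY may only lower the limit")
        self.limit = new_limit
        # Requests above the new limit move to the "won't be fulfilled" set.
        self.in_progress = {r for r in self.in_progress if r <= new_limit}

    def accept(self, request_id: int) -> bool:
        """New requests remain possible after shutdown starts, up to the limit."""
        if self.limit is not None and request_id > self.limit:
            return False
        self.in_progress.add(request_id)
        return True

    def complete(self, request_id: int) -> None:
        """Step 2: an agreed goal finished; failure counts as finished too."""
        self.in_progress.discard(request_id)

    @property
    def can_close(self) -> bool:
        """Step 3: shutdown was initiated and no agreed work is outstanding."""
        return self.limit is not None and not self.in_progress
```

Note that `accept` keeps succeeding after `goaway` for identifiers at or below the limit, which mirrors the observation that stream creation has to continue beyond the point at which shutdown is initiated, and that nothing here bounds how long the process takes.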

Alternative: When Streams Close

The most plausible suggestion I've heard for this is to make stream closure the signal. When all streams are closed, then the connection can be discarded. I don't think that it produces the right outcome either.

Aside: The fact that this approach doesn't work with unidirectional streams is indicative of a critical point: that connection state isn't necessarily bound to any one stream.

Let's say that the decision to close streams is driven by the above logic and that closing any remaining streams creates the signal to the transport. What has this gained? It's just indirection: rather than telling the transport to close, the application is using other existing APIs to generate an indirect signal. (We'd also have to resolve better the question of what closed means, which doesn't currently suit any mechanism that requires strong assurances about the state.)

Closing stream 1 on HTTP would take an RTT or two, during which time no useful work is performed, and then you transition into the DRAIN timer. That's wasteful; why do that when the alternative (closing when you are done) is so simple and appealing? Not to mention the secondary effects of the choice: you can no longer treat closure of stream 1 as a protocol error because it's potentially a shutdown signal. And now there are two signals that can get confused. If a shutdown wasn't negotiated you might still see a stream close. Does that mean that you missed a GOAWAY?

There might be value in an explicit signal at the point that an application decided to abandon the connection, but we'd have to establish that a signal is both necessary and justified. Signals like this take time to exchange.

Applicability Seems Right

I tend to agree with @mirjak on the point regarding applicability. If this is as amorphous as my model suggests, then text in an applicability statement is perfect.

@mirjak
Contributor

mirjak commented Aug 22, 2017

I think this is actually connected to the keep-alive question. I agree that there should be some (configurable) timer, after which the connection should simply be closed, or a ping sent if the application has explicitly indicated that keep-alives should be used and the connection is not ready to be closed yet.

Therefore I would say the default should be a silent close after a certain idle time and no pings, given that an application always needs to be prepared for a connection no longer working even after a short idle time (e.g. if the network state is gone because the UDP timeout was incredibly low), and also given that QUIC is optimized to minimize start-up latency, so reconnecting is fast.
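A minimal sketch of the policy proposed in this comment, assuming a hypothetical `IdlePolicy` type and string return values chosen purely for illustration (none of these names come from the QUIC drafts): the timer is configurable, the default is a silent close, and a ping is sent only when the application has opted in to keep-alives.

```python
from dataclasses import dataclass


@dataclass
class IdlePolicy:
    """Hypothetical idle-timer policy; names and return values are illustrative."""

    idle_timeout: float       # configurable idle period, in seconds
    keep_alive: bool = False  # default: no pings unless the application opts in

    def on_timer(self, idle_for: float) -> str:
        """Decide what to do after the connection has been idle for `idle_for` seconds."""
        if idle_for < self.idle_timeout:
            return "wait"
        # The timer expired: ping only if explicitly requested, else close silently.
        return "ping" if self.keep_alive else "silent_close"
```

With `IdlePolicy(idle_timeout=30.0)`, an expired timer yields a silent close; passing `keep_alive=True` turns the same expiry into a ping.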

@martinthomson
Member

@mirjak, the timeout exists. But a timeout only works if the connection becomes idle. The primary use case for graceful shutdown - at least in my experience - is on high-use connections between servers where moving to a new connection or server without losing in-progress work is critical.

@martinthomson
Member

@hardie, has this been addressed by recent changes? That is, the application decides, but signals to the transport (and any intermediaries with keys) using APPLICATION_CLOSE when it has concluded its shutdown.

@hardie
Author

hardie commented Oct 23, 2017

@martinthomson I'm not sure if you're referring to a PR or the published docs. If the published docs, transport looks like it covers APPLICATION_CLOSE but the HTTP mapping doc doesn't mention it and 7.1 seems to be using the HTTP application error codes with CONNECTION_CLOSE. So I think it would need a PR to cover APPLICATION_CLOSE.

@martinthomson
Member

@hardie, thanks for checking HTTP. This issue was specifically about the generic issue though. Can we say that sending APPLICATION_CLOSE when a connection is "done" is sufficiently generic a signal?

I'll open an editorial PR on the HTTP spec to correct the error(s) there.

martinthomson added a commit that referenced this issue Oct 24, 2017
As noted by @hardie in #604, the description of GOAWAY hasn't tracked
the changes in the transport.  This updates it.  Most importantly, it says what
to do to finish a graceful shutdown, including the sending of APPLICATION_CLOSE
frames.

I added a new error code for when the server receives a GOAWAY.  I also changed
the description to be less generic and use client and server as appropriate.
@hardie
Author

hardie commented Oct 24, 2017 via email

MikeBishop pushed a commit that referenced this issue Nov 11, 2017
As noted by @hardie in #604, the description of GOAWAY hasn't tracked
the changes in the transport.  This updates it.  Most importantly, it says what
to do to finish a graceful shutdown, including the sending of APPLICATION_CLOSE
frames.

I added a new error code for when the server receives a GOAWAY.  I also changed
the description to be less generic and use client and server as appropriate.
@mnot mnot removed this from Connection End in QUIC Mar 6, 2018
@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019
4 participants