Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clarify shutdown procedure #3962

Closed
fholzer opened this issue Jul 16, 2023 · 7 comments · Fixed by #4138
Closed

clarify shutdown procedure #3962

fholzer opened this issue Jul 16, 2023 · 7 comments · Fixed by #4138
Labels
Milestone

Comments

@fholzer
Copy link
Contributor

fholzer commented Jul 16, 2023

While writing up a summary for #3961 I realized that quic-go's Listener.Close() behaves differently than Go's TCPListener.Close() with the former closing all active connection, and the latter not doing that.

Conceptually I like the idea of being able to first stop listening for new connections, then take care of gracefully shutting down any active connections. Another issue is that listener.Close() doesn't accept a user error like connection.CloseWithError(error) does. Even if it would accept an error, users may not want to close all active connections with the same error.

I also noticed that I can cancel calls to listener.Accept(ctx) by cancelling the passed context, though I am not sure if it would have an adverse effect when not calling Accept for some time and clients want to open new connections.

As it is now I have to decide between:

  1. stopping to listen by calling listener.Close() and not being able to convey the desired error to the remote side (neither to have it surfaced via connections context cancellation cause, see connection: surface connection error as connection context cancellation cause #3961)
  2. stopping to listen by calling listener.Close(), also calling conn.CloseWithError(err), and hope that conn.CloseWithError(err) can cleanly close the connection before listener.Close() does what it does.
  3. Canceling any pending listener.Accept(ctx) calls, closing all open connections via conn.CloseWithError(err), once all connection contexts are done, close the listener, then the transport, then the transport.Conn.

I have a feeling that option 3 is probably the recommended way, but unsure about cancelling the context on listener.Accept(ctx). Though upon further testing I found that even when cancelling the accept context, the library still continues to accept connections. At least on client side the call to transport.Dial completes with err = nil. Later, after transport.Close is called on the server, the client receives Application error 0x0 (remote).

@marten-seemann
Copy link
Member

marten-seemann commented Sep 6, 2023

Hi @fholzer! Sorry for the late response here, and thank you for your detailed issue!

I have a feeling that option 3 is probably the recommended way, but unsure about cancelling the context on listener.Accept(ctx).

Calling the context on Listener.Accept doesn't have any effect beyond making the currently blocked Listener.Accept call return. It neither stops the listener nor closes any connections.

While writing up a summary for #3961 I realized that quic-go's Listener.Close() behaves differently than Go's TCPListener.Close() with the former closing all active connection, and the latter not doing that.

This is an important data point, and a very strong argument for changing the behavior of Listener.Close. The current behavior comes from a time when we didn't have the Transport yet, and we needed a way to make the Go routine that's reading from the underlying connection return.
We don't have that problem anymore. Now we have a Transport, which comes with a Close method.

I suggest changing the API such that:

  • Listener.Close only closes the listener, but it doesn't close any QUIC connections. After closing, the Transport will behave as if no listener had ever been attached, i.e. it will drop incoming Initial packets, but will still allow dialing out new connections.
  • It is the caller's responsibility to handle existing connections. This might entail closing them, but that's up to the application to decide.

Open questions:

  1. What happens with connections that are currently in the server's accept queue (i.e. have finished the QUIC handshake, but weren't yet returned by Accept)?
  2. What happens with connections that are currently handshaking?
  3. What happens when the quic.Listen convenience function was used? How would the Go routine reading from the underlying connection be closed?

@fholzer, wdyt? Does this API make more sense to you?

@marten-seemann
Copy link
Member

3. What happens when the quic.Listen convenience function was used? How would the Go routine reading from the underlying connection be closed?

We could count the number of connections tracked by the packetHandlerMap, and once it reaches zero, we shut down the Go routine. This is not as clean as calling a Close method explicitly, but this seems to be the best we can do when using quic.Listen.

@fholzer
Copy link
Contributor Author

fholzer commented Sep 12, 2023

What happens with connections that are currently in the server's accept queue (i.e. have finished the QUIC handshake, but weren't yet returned by Accept)?

I tried to figure this out, looking at winsock API, and man pages for linux tcp functions. After going down that rabbit hole I ended up finding something in the linux kernel source, and something in the TCP RFC 9293.

i am not entirely sure how to interpret the paragraph about the "LISTEN STATE" in the "Close Call" section. It doesn't talk about implementation details like the accept queue. The exact behavior could very well be implementation(/OS) specific.

Based on what I found in the Linux source code, i think that the kernel sends RST to any established connection that is not yet returned to user space via an accept call. With that in mind, if you want to model the behavior after Linux TCP stack, then any connection that hasn't been returned yet to the user via an accept call should be closed. Regardless of whether the handshake is in progress or completed already. Ideally not silently, but in a way that let's the remote end know that something failed, though I don't know enough about the QUIC spec to say how that could/should be done.

Sorry, haven't had time to look into the convenience function quic.Listen yet.

@marten-seemann
Copy link
Member

Thanks for researching, @fholzer!

i am not entirely sure how to interpret the paragraph about the "LISTEN STATE" in the "Close Call" section. It doesn't talk about implementation details like the accept queue. The exact behavior could very well be implementation(/OS) specific.

It's reasonable to assume that most users will call Accept in a loop until that call returns an error. Therefore it makes sense to return these connections to the application and have it decide what to do. Even if it just decides to close them, at least it can close them using a meaningful application-specific error.

This means that we'll only need to deal with connections that are still handshaking. Unfortunately, there's no dedicated error code defined for that in the RFC. CONNECTION_REFUSED seems like the best fit though.

@fholzer
Copy link
Contributor Author

fholzer commented Sep 14, 2023

It's reasonable to assume that most users will call Accept in a loop until that call returns an error. Therefore it makes sense to return these connections to the application and have it decide what to do. Even if it just decides to close them, at least it can close them using a meaningful application-specific error.

If I understand correctly, that means calling close on the listener, while having another go routine call accept in a loop will not make accept error immediately if there's connections in the backlog where the handshake completed already. Instead it will keep returning those connections from the backlog until none are left, at which point accept will return some error signaling that that the listener was closed. This is different than what the Linux kernel does, puts additional burden on users of the library, but allows them more freedom in how they can deal with connections in that scenario.

Reg. close during handshake, CONNECTION_REFUSED looks like a good choice.

@marten-seemann marten-seemann modified the milestones: v0.39, v0.40 Sep 16, 2023
@marten-seemann
Copy link
Member

#4072 implements this logic when using a Listener derived from a Transport. It still uses the old logic when using a Listener created with quic.Listen.

@marten-seemann
Copy link
Member

I'm not sure about this anymore. The problem is that if you use quic.Listen and close the listener, you have no way of knowing when you can use the UDP port again. Maybe we should just document that closing a listener created with quic.Listen immediately kills all connections, and if you don't want this, you should use a Transport?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants