Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement graceful shutdown of servers and connections #153

Open
lucas-clemente opened this issue May 30, 2016 · 35 comments
Open

Implement graceful shutdown of servers and connections #153

lucas-clemente opened this issue May 30, 2016 · 35 comments

Comments

@lucas-clemente
Copy link
Member

For normal shutdowns, we should send GOAWAY frames, stop accepting new connections, and let all requests finish before shutting down the server.

@lucas-clemente lucas-clemente modified the milestone: 0.4 Jun 1, 2016
@marten-seemann marten-seemann self-assigned this Jun 6, 2016
@lucas-clemente lucas-clemente modified the milestones: 0.4, 0.5 Aug 10, 2016
@lucas-clemente lucas-clemente modified the milestone: 0.5 Aug 25, 2016
@lucas-clemente lucas-clemente self-assigned this Oct 17, 2016
@marten-seemann marten-seemann added this to the 0.6 milestone Nov 1, 2016
@jameshartig
Copy link

Now that Golang 1.8 brought graceful shutdown [1], hopefully this is simpler?

[1] https://golang.org/doc/go1.8#http_shutdown

@lucas-clemente
Copy link
Member Author

Yes, we could implement a similar API for both quic and h2quic. I didn't start working on this yet, if anyone wants to help please ping the issue and I will provide more context.

@lucas-clemente lucas-clemente removed their assignment Apr 25, 2017
@lucas-clemente lucas-clemente modified the milestone: 0.6 Aug 22, 2017
@ocastx
Copy link

ocastx commented Apr 13, 2018

I would like to give a hand here

@Kh4n
Copy link

Kh4n commented Jan 23, 2020

@lucas-clemente Is this issue still available? I see that the file has moved to http3/server.go, but the stub is still there.

@marten-seemann
Copy link
Member

@Kh4n Yes, we haven't worked on this yet.

@Kh4n
Copy link

Kh4n commented Jan 23, 2020

Looking at the most recent draft version for QUIC (v25), GOAWAY frames are no longer part of the spec (section C.21). Should we now send a STOP_SENDING frame with an appropriate application error code?

@marten-seemann
Copy link
Member

GOAWAY is a HTTP feature, not a QUIC feature, see https://quicwg.org/base-drafts/draft-ietf-quic-http.html#name-goaway.

@Kh4n
Copy link

Kh4n commented Jan 23, 2020

That makes sense. I have added a GOAWAY frame into frames.go, following how the headersFrame is constructed. Looking at the spec, I need to send this frame on a control stream, but I do not have access to any of the sessions or their respective streams. There are a variety of ways I think we can track the sessions and streams; is there a preferred method of doing this that you have in mind?

@marten-seemann
Copy link
Member

The control stream is opened here: https://github.com/lucas-clemente/quic-go/blob/62d3a4166a86de0f4a6a31f66411a6c3446f28fb/http3/server.go#L166-L171.

It would probably make sense to save it in a member of the server. Furthermore, you'll need to accept the control stream (by calling Session.AcceptUniStream()) when a new connection is established, and then process frames arriving on this stream.

Does that make sense? Let me know if you run into any problems, I'm happy to help!

@Kh4n
Copy link

Kh4n commented Jan 23, 2020

Ok, so I opted to make a map[quic.Stream]quic.SendStream, basically each stream and its associated control stream. CloseGracefully now looks like this:

func (s *Server) CloseGracefully(timeout time.Duration) error {
	s.closed.Set(true)

	s.mutex.Lock()
	defer s.mutex.Unlock()

	// send GOAWAY frames to each open incoming stream
	var err error
	for str, ctrlStr := range s.ctrlStreams {
		goaway := goawayFrame{
			StreamID: str.StreamID(),
		}
		buf := bytes.NewBuffer([]byte{0})
		goaway.Write(buf)
		_, cerr := ctrlStr.Write(buf.Bytes())
		if cerr != nil && err == nil {
			err = cerr
		}
	}

	// give time for those connections to either complete or self terminate
	time.Sleep(timeout)

	// close them
	for ln := range s.listeners {
		if cerr := (*ln).Close(); cerr != nil && err == nil {
			err = cerr
		}
	}
	return err
}

I am confused by the AcceptUniStream() part. You want me to add this into client.go? I don't see where it would fit in the server. Additionally, I don't see the session ever closing, where does that happen? Does it close by itself when the streams are closed?

@marten-seemann
Copy link
Member

Why do you need a map? There's only a single control stream: https://quicwg.org/base-drafts/draft-ietf-quic-http.html#name-control-streams. For the graceful shutdown, it looks like you just need to send a single GOAWAY frame on that control stream: https://quicwg.org/base-drafts/draft-ietf-quic-http.html#name-connection-shutdown.

Regarding accepting the control stream using AcceptUniStream(), I would expected that both client and server accept each other's control stream in setupSession() and handleConn(), respectively.

Does that make sense?

@Kh4n
Copy link

Kh4n commented Jan 24, 2020

Ah ok, I see that in the spec. I used a map originally because I thought we needed a GOAWAY frame for each open stream, and its easier to use a map[stream]controlStream than map[controlStream][]stream

So I have added a basic implementation in. All it does is send a GOAWAY with the max StreamID as suggested in the spec, waits for a timeout, then closes the connections. You can get more fancy by checking for new requests and letting those complete, or timing the current ones, but I have not done that.
I also added the AcceptUniStream calls, but this is now breaking a ton of tests with:

Unexpected call to *mockquic.MockSession.AcceptUniStream([context.Background]) at quic-go/internal/mocks/quic/session.go:59 because: there are no expected calls of the method "AcceptUniStream" for that receiver

so I will see if I can fix that. Testing with the example web server and client seems to show that it is working. How does one check if an error is an Application error code 0x0, by the way?

@marten-seemann
Copy link
Member

So I have added a basic implementation in. All it does is send a GOAWAY with the max StreamID as suggested in the spec, waits for a timeout, then closes the connections. You can get more fancy by checking for new requests and letting those complete, or timing the current ones, but I have not done that.

After sending the GOAWAY, do you stop accepting new requests?

I also added the AcceptUniStream calls, but this is now breaking a ton of tests with:
Unexpected call to *mockquic.MockSession.AcceptUniStream([context.Background]) at quic-go/internal/mocks/quic/session.go:59 because: there are no expected calls of the method "AcceptUniStream" for that receiver

That's expected. We're using a mock session to check what calls the http3 package is making to a QUIC session. You'll probably need to add a MockSession.EXPECT().AcceptUniStream() in the test cases.

@Kh4n
Copy link

Kh4n commented Jan 24, 2020

I thought that is what Server.closed was supposed to do, but I see now that is only checked once at the top of servImpl, so I will also add it in handleConn. As far as the tests go, it seems they are checking with an established quic server at https://quic.clemente.io/foobar.html. However, since that server does not accept the control stream, it just hangs while testing (and fails the test). Not sure how to proceed. I have fixed some of the other issues via MockSession.EXPECT().AcceptUniStream()

@marten-seemann
Copy link
Member

None of our tests actually dials a QUIC connection to an external server. That's what we have mocks for.

@Kh4n
Copy link

Kh4n commented Jan 24, 2020

@marten-seemann
Copy link
Member

It's using a mock session, not a real quic.Session. It doesn't doesn't dial any external server (or any server at all, for that matter).

@Kh4n
Copy link

Kh4n commented Jan 24, 2020

How do I modify this mock session? I see that one of the tests returns an error 266 (0x10A) which is an error i just added indicating that the settings stream was not received.

@marten-seemann
Copy link
Member

You can't modify the mock session. You can register calls and define return values via EXPECT(), for example as done here: https://github.com/lucas-clemente/quic-go/blob/62d3a4166a86de0f4a6a31f66411a6c3446f28fb/http3/roundtrip_test.go#L95-L97

@Kh4n
Copy link

Kh4n commented Jan 25, 2020

Finally fixed the errors. Cleaning up a little now and testing again with example server

@Kh4n
Copy link

Kh4n commented Jan 25, 2020

A common thing that I'm adding is keeping track of streams. However, I feel that the quic.Session interface should have some method for doing this by itself. I didn't see any method inside quic.Session, but I do see that the implementation has a streamsMap field. Is it ok to add a method like this?

@marten-seemann
Copy link
Member

Why do you need to keep track of streams for graceful shutdown?

@Kh4n
Copy link

Kh4n commented Jan 26, 2020

Well the server almost certainly needs to, as it uses the streamIDs to decide what streams to allow to continue. As of right now, it just allows them all to continue. On the client side, requests with streamIDs >= goaway frame streamID need to be terminated, as they will not complete. However, for right now, I have not added that in, given that the server sends the max streamID anyways.
I think it is good for now, should I just submit a pull request?

@marten-seemann
Copy link
Member

Well the server almost certainly needs to, as it uses the streamIDs to decide what streams to allow to continue.

Why is keeping track of the highest opened stream ID not sufficient?

On the client side, requests with streamIDs >= goaway frame streamID need to be terminated, as they will not complete.

As long as a request is in flight, there's a go routine somewhere in RoundTrip(). Wouldn't it be possible to notify all Go routines of the GOAWAY and have the ones with a higher stream ID shut down?

@Kh4n
Copy link

Kh4n commented Jan 27, 2020

Keeping track of the highest streamID is not really necessary, if all you want to do is let all the requests finish and then quit. In that case, just send the highest possible streamID (2^62 - 4 i believe) and the servers job is done. But if you wanted to do something more complicated like evaluate what requests you want to process, you'll need to save the requests and their associated streams somewhere so you can know when they complete, as well as close any streams above the selected streamID.

How would I notify the go routines in the client? I didn't see a built in way. I see a Cancel channel in the requests, maybe use that?

@marten-seemann
Copy link
Member

Keeping track of the highest streamID is not really necessary, if all you want to do is let all the requests finish and then quit. In that case, just send the highest possible streamID (2^62 - 4 i believe) and the servers job is done.

That doesn't match my understanding of how GOAWAY works: https://quicwg.org/base-drafts/draft-ietf-quic-http.html#name-connection-shutdown

How would I notify the go routines in the client? I didn't see a built in way. I see a Cancel channel in the requests, maybe use that?

That's something that you'd need to build. Currently this is not yet supported. Note that there are multiple places where RoundTrip() can block: when opening the stream, and when doing the request. Canceling OpenStreamSync should be trivial, as it accepts a context. Making the Read call from the stream return could be accomplished by setting a read deadline on that stream that lies in the past.

@Kh4n
Copy link

Kh4n commented Jan 27, 2020

Hmm, I'm rereading it but I don't see any way to implement third to last paragraph of that section without keeping track of the streams/requests. I will take your word for it though.

As far as the client goes, I am alright with adding it in. I will try those suggestions as well.

@marten-seemann
Copy link
Member

Hmm, I'm rereading it but I don't see any way to implement third to last paragraph of that section without keeping track of the streams/requests. I will take your word for it though.

AcceptStream returns streams in order (the stream ID will increase by 4).

@Kh4n
Copy link

Kh4n commented Jan 27, 2020

Ok, maybe I have a misunderstanding of how the server works. Basically, the StreamID parameter in a GOAWAY frame indicates that all requests with StreamID >= to it have are not going to be processed. How would the server, without having access to a list of streams, close those requests?

@marten-seemann
Copy link
Member

Wouldn't it make sense that the server processes all requests on streams that it already has accepted, and then stops accepting new streams after that?

@Kh4n
Copy link

Kh4n commented Jan 27, 2020

Ah, yes that is true for graceful shutdown. General case of GOAWAY allows server to terminate streams at will. I guess we should change the CloseGracefully doc comment to indicate that there is no timeout, and that all current requests will be processed. Otherwise, I don't think that would follow the quic-http spec.

@Kh4n
Copy link

Kh4n commented Feb 3, 2020

Was unable to work due to school, but I just finished it up. Redid most of it bc of merges. As of now:

  • The server keeps track of control streams and gets/expects a settings frame. It maintains the highest streamID encountered and when CloseGracefully is called, it sends out that streamID and waits for all of its requests to finish, while also cancelling the AcceptStream for each session
  • The client receives the goaway frame and closes streams >= to it by setting the read deadline in the past. I have tested this and it works, but it does return an error that could be confusing. It also cancels the context in OpenStreamSync, however this required me to change what context it originally was (used to take a req.Context(), but I changed to be a context with a parent of our own context)
  • The tests are fixed acceptably, with the exception of two tests in roundtrip_test, where I made the OpenStreamSync calls take a gomock.Any() instead of a context, because I did not see a way to access the client context from there

@marten-seemann
Copy link
Member

Thank you @Kh4n. That sounds reasonable. Would you mind sending us a PR?

@Kh4n
Copy link

Kh4n commented Feb 4, 2020

Done, but it looks like I missed a test. Trying to see if I can fix

Kh4n added a commit to Kh4n/quic-go that referenced this issue Feb 12, 2020
@ice-cronus
Copy link

Any news?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants