Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Shutdown must make all future Serves return errors #20239

Closed
mrwonko opened this issue May 4, 2017 · 20 comments

Comments

Projects
None yet
@mrwonko
Copy link

commented May 4, 2017

What version of Go are you using (go version)?

1.8.1

What operating system and processor architecture are you using (go env)?

darwin amd64, but this is a general issue I also encounter in docker containers

What did you do?

I'm trying to reliably gracefully shut down an http Server: https://play.golang.org/p/ESmgJ6O_WU

However, because ListenAndServe() blocks while the Server is running, there is no way to ensure its call happens before a call to Shutdown(), so Shutdown() may be called before ListenAndServe(), in which case it returns nil and the later call to ListenAndServe() still starts a server that is then never shut down.

What did you expect to see?

I expect to have some way of knowing that the server is now up and running such that Shutdown() will stop it, short of sending it http requests. Maybe an interface closer to httptest.Server.

What did you see instead?

It seems I either need to repeatedly do http requests to the server to ensure it is up before calling Shutdown() or repeatedly call Shutdown() until ListenAndServe() returns, like this: https://play.golang.org/p/LPfpz0LaBY

Note that due to the pre-emptive nature of Goroutines, since Shutdown() can never be called on the same Goroutine as ListenAndServe(), this theoretically affects all programs using Shutdown() without a loop.

@matipan

This comment has been minimized.

Copy link

commented May 4, 2017

Hi @mrwonko, in order to follow Go's conventions on issues: could you please change the subject of the issue to: net/http: No simple way to definitely shut down an http.Server

Thanks

@bradfitz bradfitz changed the title No simple way to definitely shut down an http.Server net/http: some bad interaction between Server Shutdown and ListenAndServe May 4, 2017

@bradfitz

This comment has been minimized.

Copy link
Member

commented May 4, 2017

I'm busy this week and don't have time to look at this. Can somebody distill this down to a short summary for me?

@mrwonko

This comment has been minimized.

Copy link
Author

commented May 4, 2017

tldr: Because ListenAndServe() never returns, there's no way to ensure a call to Shutdown() happens after the server is started, so it may be a noop and afterwards the server still starts and runs forever. Non-returning functions are very prone to race conditions.

@bradfitz

This comment has been minimized.

Copy link
Member

commented May 4, 2017

ListenAndServe is a convenience function. If you don't find it convenient, do the Listen and Serve (https://golang.org/pkg/net/http/#Server.Serve) steps separately.

@bradfitz

This comment has been minimized.

Copy link
Member

commented May 4, 2017

But I still haven't read this enough to understand the problem so I might be replying prematurely, though.

@mrwonko

This comment has been minimized.

Copy link
Author

commented May 4, 2017

Serve() has the same problems as it does not return (until the server is shut down) either.

@vcabbage

This comment has been minimized.

Copy link
Member

commented May 4, 2017

I believe the issue is that there's no simple way to know if the server has started.

If the shutdown is happening immediately after the Serve() goroutine is started there's a race and Shutdown() may be called on a server which has not started.

This could be worked around with a custom listener (https://play.golang.org/p/x15ZPYYmfo), but I think it would be useful if there were a Started() method (or something similar) on a server that blocked until the server has started.

@kdomanski

This comment has been minimized.

Copy link

commented May 5, 2017

As @quentinmit wrote in #16220 (comment)

If we were designing net/http from scratch, I would say that ListenAndServe should take a Context argument. But we can't add that now without breaking the Go1 compatibility guarantee, and I don't think it's worth adding a second ListenAndServeWithContext. :/

Turns out adding a ServeWithContext() might now be necessary given the raciness of current Shutdown() solution

@magiconair

This comment has been minimized.

Copy link
Contributor

commented May 19, 2017

@bradfitz Here is a version which just uses Listen and Serve and it deadlocks on play.golang.org when you close the server immediately. I'm running into this with some tests.

https://play.golang.org/p/LOD878em-R

When you trigger a context switch it works.

2009/11/10 23:00:00 waiting
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x107814dc, 0x5516)
	/usr/local/go/src/runtime/sema.go:47 +0x40
sync.(*WaitGroup).Wait(0x107814d0, 0x1)
	/usr/local/go/src/sync/waitgroup.go:131 +0x80
main.main()
	/tmp/sandbox649309351/main.go:29 +0x200

goroutine 5 [semacquire]:
sync.runtime_notifyListWait(0x107504e0, 0x0)
	/usr/local/go/src/runtime/sema.go:297 +0x140
sync.(*Cond).Wait(0x107504d8, 0x107504d0)
	/usr/local/go/src/sync/cond.go:57 +0xc0
syscall.(*queue).waitRead(0x107504d0, 0x1, 0x0, 0x0, 0x9f6c0, 0x5516, 0x3fd980, 0x0)
	/usr/local/go/src/syscall/net_nacl.go:284 +0x120
syscall.(*msgq).read(0x107504d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/go/src/syscall/net_nacl.go:401 +0xc0
syscall.(*netFile).accept(0x10734240, 0x5516, 0x10734240, 0x0, 0x0, 0x1070e030, 0x800000, 0x0)
	/usr/local/go/src/syscall/net_nacl.go:563 +0x60
syscall.Accept(0x3, 0x107421e8, 0x4f700, 0x5516, 0x107321fc, 0x10715e98, 0x8, 0x5516)
	/usr/local/go/src/syscall/net_nacl.go:783 +0xa0
net.accept(0x3, 0x2fbec0, 0x107421e0, 0x0, 0x18, 0x2bf020, 0x0, 0x1070e030)
	/usr/local/go/src/net/sys_cloexec.go:45 +0x40
net.(*netFD).accept(0x107421e0, 0x5516, 0x0, 0x0, 0x0, 0x1070e030)
	/usr/local/go/src/net/fd_unix.go:422 +0xe0
net.(*TCPListener).accept(0x1070c518, 0x5516, 0x1073213c, 0x10740d40, 0x2a1140, 0x3f5100)
	/usr/local/go/src/net/tcpsock_posix.go:136 +0x40
net.(*TCPListener).Accept(0x1070c518, 0x10740d20, 0x2a1140, 0x3f5100, 0x2c2760, 0x107409c0)
	/usr/local/go/src/net/tcpsock.go:228 +0x60
net/http.(*Server).Serve(0x1073e680, 0x3d2ae0, 0x1070c518, 0x0, 0x0, 0x0)
	/usr/local/go/src/net/http/server.go:2643 +0x260
main.main.func1(0x107814d0, 0x1073e680, 0x3d2ae0, 0x1070c518)
	/tmp/sandbox649309351/main.go:22 +0x80
created by main.main
	/tmp/sandbox649309351/main.go:25 +0x140
@magiconair

This comment has been minimized.

Copy link
Contributor

commented May 19, 2017

One way to work around this for now is to use a wait group in reverse. I think you can still have a context switch between isUp.Done() and srv.Serve so this shouldn't be 100% safe but better than sprinkled context switches. At least code from the right go routine was executed.

srv := &http.Server{}
var isUp sync.WaitGroup
isUp.Add(1)
go func() {
    isUp.Done()
    if err := srv.Serve(l); err != nil {...}
}()
isUp.Wait()
// safe to call srv.Shutdown() from here on

@bradfitz bradfitz added this to the Go1.9Maybe milestone May 19, 2017

@lmb

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2017

I think there is another, related race condition: a connection can be accepted in Serve but not closed by Shutdown. The sequence is something like this:

Server.Serve: ln.Accept()
Server.Shutdown: s.closeListenersLocked()
Server.Shutdown: s.closeIdleConns() == true
Server.Serve: c.setState(StateNew)

The fix would be to wait for Serve to exit before returning from Shutdown.

@lestrrat

This comment has been minimized.

Copy link

commented Oct 6, 2017

I stumbled across this issue, so I took a few hours taking a look. I investigated the interaction between Serve() and Shutdown()/Close() instead of ListenAndServe() as the original report discusses, as that's where the main logic is.

First, I did see a somewhat surprising behavior when I tried to run code similar to #20239 (comment). You would expect the server to stop and Serve() to bail out upon calling Close(), but Serve() just blocks on l.Accept(), waiting for connections.

The reason for this is that while Close() kills off all net.Listeners that were registered before the call to Close() is made, it does not necessarily "stop" the Serve() call. The order of event that leads to Serve() to block is as follows:

  1. srv.Close() gets called, kills all previous connections
  2. srv.Serve(l) gets called
  3. srv.trackListeners(l) gets called, registers the listener l
  4. At the same time as (3), srv.trackListeners(l) sees that the number of registered listeners == 0, so clears srv.doneChan, and basically resets the state of the Server as running.
  5. the server merrily goes onto waiting for incoming connections, and blocks on l.Accept(). It will even go into the server process loop again, because the srv.doneChan says that we're running.

The really tricky part is that srv.trackListeners() actually resets srv.doneChan.

From the perspective of a user who is creating a simple Web server, you would expect Close() to set a flag in the Server to not allow it to process anymore requests, so the fact that Serve() could block should be understandably confusing.

However, this reset is there to allow re-using the Server even after Close() was called. In pseudocode, this is allowed in the current http.Server:

var srv http.Server
l1, err := newListener(...)
go srv.Serve(l1)
srv.Close()

// ... later ...
l2, err := newListener(...)
go srv.Serve(l2)

It's fairly easy to make Serve() not block when Close() is called before it, but a simple fix would actually make the Server not work when Serve() is called again.

We have a situation where we are sharing http.Server (global state) between multiple goroutines, and their behavior is timing dependent as to when you call Close() and Serve().

Given this fact and that we have a backwards compatible guarantee, I think we should be looking at introducing a context.Context aware API. This would free the user from having to know the timing when the server is ready as the OP posted alltogether

I expect to have some way of knowing that the server is now up and running such that Shutdown() will stop it, short of sending it http requests.

l1, ... := newListener(addr1)
l2, ... := newListener(addr2)
ctx, cancel := context.WithCancel()
go srv.ServeCtx(ctx, l1)
go srv.ServeCtx(ctx, l2)

// ... later ...
cancel() // kills both previous
// if you need to reuse srv, you can, by using a new context
go srv.ServeCtx(ctx2, l3)

Maybe inclusion of such API is planned, I don't know.

But meanwhile, without a new API, I don't think this an easy to fix problem. So I'd like to suggest documenting that it is the users' responsibility to synchronize calling Close()/Shutdown() and Serve() (or equivalent), or make sure to call Close()/Shutdown() repeatedly until the call to Serve() returns (this is not an elegant solution, but it should work 99% of the times).


If people are up for (1) documentation and (2) context.Context aware API, I'm willing to work on either.

evan2645 added a commit to evan2645/spire that referenced this issue Oct 11, 2017

Fix flaky test and mutex bug in server endpoint
When not unlocking the mutex with defer, ensure that all return paths unlock
Since ListenAndServe blocks forever, you are generally forced to run it in a goroutine. The goroutine generally returns before the servers are fully loaded. As a result, tests which invoke the server in a goroutine and then call Shutdown will hang indefinitely as the Shutdown call can't be guaranteed to occur after the servers have fully loaded.
golang/go#20239
@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 22, 2017

This is a fundamental mistake in the Shutdown API. The idea that you can Shutdown a Server and then once it's done call Serve again is wrong. We should make it be that once you Shutdown a server, it's shut down. Future calls to Serve simply return errShutdown.

If this is too invasive for now it would be fine to punt to Go 1.11 but there's really no other way it can work. Any other way, you have this race between concurrent Shutdown and Serve behaving unpredictably.

It's only a few lines of code to make shutdown sticky, and there's a decent argument for making this possibly-breaking change now instead of letting Shutdown last another release in the racy form (it was introduced in Go 1.8), just to head off 6 more months of people potentially thinking they can Serve again after Shutdown.

/cc @tombergan @bradfitz to decide about Go 1.10 vs Go 1.11

@rsc rsc changed the title net/http: some bad interaction between Server Shutdown and ListenAndServe net/http: Shutdown must make all future Serves return errors Nov 22, 2017

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 22, 2017

(The needed decision is what milestone.)

@gopherbot

This comment has been minimized.

Copy link

commented Dec 4, 2017

Change https://golang.org/cl/81778 mentions this issue: net/http: prevent Server reuse after a Shutdown

@rsc rsc removed this from the Go1.10 milestone Dec 4, 2017

@rsc rsc added this to the Go1.11 milestone Dec 4, 2017

@pam4

This comment has been minimized.

Copy link

commented Apr 15, 2018

I'm trying to understand better this issue and I have a few questions:
(All identifiers are fields or methods of http.Server)

  1. This CL address Shutdown, but what about Close? Just an oversight or there is some reason to still allow reuse after Close? It seems to me that Close suffers from exactly the same problems.

  2. Is it correct that we want to be able to unblock any (present and future) ListenAndServe/Serve reliably with just one call to Close/Shutdown?
    That would be useful and spare the need for the aforementioned loop workaround.

This CL doesn't seem to achieve it:

  • it is still possible that a Close/Shutdown will leave ListenAndServe/Serve blocked;
  • it is still possible that a Close/Shutdown will make a ListenAndServe/Serve return "use of closed network connection" instead of "server closed" (see test below).
  1. I'm trying to understand why there is a need for doneChan to be replaced during the server life cycle (besides supporting reuse).
    Can someone explain the following piece of code, please?
    How, closing doneChan "closes http2 conns" here?
func (srv *Server) SetKeepAlivesEnabled(v bool) {
	// [CUT]
	// Close HTTP/2 conns, as soon as they become idle, but reset
	// the chan so future conns (if the listener is still active)
	// still work and don't get a GOAWAY immediately, before their
	// first request:
	srv.mu.Lock()
	defer srv.mu.Unlock()
	srv.closeDoneChanLocked() // closes http2 conns
	srv.doneChan = nil

Proposal:

  • Change the type of inShutdown to http.atomicBool, to make it explicit that it goes from false to true and never back.
  • Replace every occurrence of shuttingDown with inShutdown.isSet (they do the same thing).
  • Add srv.inShutdown.setTrue() on top of both Close and Shutdown.
    This will make doKeepAlives return false not only after Shutdown but after Close too; I don't suppose it would be a problem but correct me if I'm wrong.
  • In method Serve, add if srv.inShutdown.isSet() { return ErrServerClosed } between the call to trackListener and the for loop.
    This will ensure that, if l.Accept is reached, then the listener l was already tracked before any Close/Shutdown had the chance to run.
  • In method Serve, replace the select with if srv.inShutdown.isSet() { return ErrServerClosed }.
  • Remove methods closeDoneChanLocked, getDoneChan, getDoneChanLocked, shuttingDown, and field doneChan.

This should fix mentioned problems and simplify a bit, without changing any behavior that is currently predictable.
But I may be missing something, specially about the above code block.

I run a test program spawning 3 goroutines per http.Server (2 calling Serve and 1 calling Close).
In one test run with the current version (100,000 iterations):

  • 192,900 Serve returned http.ErrServerClosed
  • 832 Serve returned "use of closed network connection"
  • 6,268 Serve never returned

With the proposed modifications all 200,000 Serve returned http.ErrServerClosed.

@bradfitz bradfitz self-assigned this May 18, 2018

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 2, 2018

@pam4, let's do that conversion to atomicBool in August when the Go 1.12 tree opens. I added a TODO for that in the just-updated https://go-review.googlesource.com/c/go/+/81778

@gopherbot gopherbot closed this in 17e503f Jul 3, 2018

@pam4

This comment has been minimized.

Copy link

commented Jul 3, 2018

@bradfitz Thanks for your reply, I have a few concerns about the CL:

  1. The serveDone channel in method Serve seems to be unused. Is there a purpose to it?
serveDone := make(chan struct{})
defer close(serveDone)
  1. This code in the trackListener method should probably be removed, since we are now preventing Server reuse:
// If the *Server is being reused after a previous
// Close or Shutdown, reset its doneChan:
if len(s.listeners) == 0 && len(s.activeConn) == 0 {
    s.doneChan = nil
}
  1. Actually, I don't understand why we are still using Server.doneChan at all. Is it because at the moment we want to change as little as possible?
    My suggestion was to replace the select in the Serve method with if srv.shuttingDown() { return ErrServerClosed }.
    Is there any reason why doneChan is not completely redundant?

  2. doneChan is closed and reset in the SetKeepAlivesEnabled method, but I can't think of any useful effect in doing so.
    The only effect I can imagine is to cause Serve to terminate unexpectedly, and that could be a problem.
    How can it affect http2 connections, as the comment says?

// Close HTTP/2 conns, as soon as they become idle, but reset
// the chan so future conns (if the listener is still active)
// still work and don't get a GOAWAY immediately, before their
// first request:
srv.mu.Lock()
defer srv.mu.Unlock()
srv.closeDoneChanLocked() // closes http2 conns
srv.doneChan = nil
  1. (A minor observation) I see that you put the shuttingDown check inside trackListener (changing trackListener signature).
    I see no problem with that code, I just wanted to mention that doing the check anywhere between trackListener and the Accept should be equally safe for our purposes (there is no need for the check to be in the critical section, or for the listener to be tracked conditionally).
@bradfitz

This comment has been minimized.

Copy link
Member

commented Jul 9, 2018

Thanks for your reply, I have a few concerns about the CL:

Replies on Gerrit would be easier, but thanks for the comments.

The serveDone channel in method Serve seems to be unused. Is there a purpose to it?

No, looks unused. I'll send out a change to delete it.

This code in the trackListener method should probably be removed, since we are now preventing Server reuse:

Nice catch. Will remove.

And yes, the doneChan stuff is for http2. Or, it was. That changed in https://go-review.googlesource.com/43230 which is only for Go 1.9+.

So, yeah, the "closes http2 conns" comment doesn't seem accurate anymore. I'll file a bug.

@gopherbot

This comment has been minimized.

Copy link

commented Jul 9, 2018

Change https://golang.org/cl/122820 mentions this issue: net/http: remove dead code noted in post-submit review of CL 81778

gopherbot pushed a commit that referenced this issue Jul 10, 2018

net/http: remove dead code noted in post-submit review of CL 81778
Per comments in #20239 (comment)

Updates #20239
Updates #26303

Change-Id: Iddf34c0452bd30ca9111b951bca48d1e011bd85a
Reviewed-on: https://go-review.googlesource.com/122820
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

jasonwbarnett pushed a commit to jasonwbarnett/fileserver that referenced this issue Jul 11, 2018

net/http: remove dead code noted in post-submit review of CL 81778
Per comments in golang/go#20239 (comment)

Updates #20239
Updates #26303

Change-Id: Iddf34c0452bd30ca9111b951bca48d1e011bd85a
Reviewed-on: https://go-review.googlesource.com/122820
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.