Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/poll: Close should not return until the fd is really closed #21856

Closed
slingamn opened this issue Sep 13, 2017 · 10 comments
Closed

internal/poll: Close should not return until the fd is really closed #21856

slingamn opened this issue Sep 13, 2017 · 10 comments

Comments

@slingamn
Copy link

@slingamn slingamn commented Sep 13, 2017

What version of Go are you using (go version)?

1.9

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

linux/amd64

Proposal

I'm taking TCPListener as an example, but I'm pretty sure the issue applies to UDP and Unix domain sockets as well. TCPListener.Close is documented as "Close stops listening on the TCP address." However, Close() does not guarantee that the address is safe to listen on again. This is because if a concurrent goroutine is in the middle of an Accept() call on the same listener, it will hold an additional reference, via an fdmutex readLock, to the underlying file descriptor. Consequently, in the typical case, Close() will merely decrement the reference count and return without actually calling close(2) on the file descriptor.

Because the runtime sets SO_REUSEADDR on all its listen sockets, the address will be free for bind(2) again immediately after the close(2) system call returns. But since close(2) will not execute until both Close() and Accept() return, a synchronization primitive (e.g., a chan or a sync.WaitGroup) is currently required in order to produce the necessary happens-before relationship to safely create a new TCPListener.

Here's a test case (adapted from one suggested by @davecheney on the golang-nuts list):

https://gist.github.com/slingamn/3b0f81169de6578c732ec279828c4866

This case will reliably panic with: panic: listen tcp :6502: bind: address already in use.

The proposal is that all the Listener types affected by this issue should perform this synchronization automatically in their Close methods. This would (at least on POSIX platforms) make it safe to re-listen on an address immediately after Close.

Here are some reasons in support of this change:

  1. The present behavior is counterintuitive because it contradicts the behavior of the underlying, eponymous system calls. (On Linux, with SO_REUSEADDR set, the address is available immediately after close(2) returns; any concurrent accept(2) call will fail with EBADF. This behavior is arguably being simulated, inaccurately, by the polling layer, which does not actually do a blocking accept(2).)
  2. The present behavior is a source of race conditions. If the time.Sleep() call is deleted from the above test case, the program reliably executes without a panic, even when built with -race. It's easy to write code that appears to work, but has a lurking race.
  3. The present behavior is a source of confusion. Here's one example, which includes a workaround that either never worked, or does not work under the current runtime.
  4. Althought "happens-after both Accept() and Close() return" is currently a sufficient condition to safely listen on the address again, this is not clearly specified in the documentation; it seems more like an implementation detail than anything else. It is not clear that future versions of the runtime will behave the same way.
  5. The change appears to have no serious implications for backwards compatibility or performance (since closing a Listener is a relatively uncommon operation, relative to, e.g., accepting new connections).

Thanks very much for your time.

@gopherbot gopherbot added this to the Proposal milestone Sep 13, 2017
@gopherbot gopherbot added the Proposal label Sep 13, 2017
@ianlancetaylor ianlancetaylor changed the title proposal: net.Listener.Close() should not return until the underlying address is freed proposal: net: Listener.Close should not return until the socket is really closed Sep 13, 2017
@slingamn

This comment has been minimized.

Copy link
Author

@slingamn slingamn commented Sep 15, 2017

Sorry --- did anyone have thoughts on this?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 15, 2017

Proposal reviews are done approximately weekly. When this one will be reviewed depends on the overall load. There have been a bunch of proposals recently, for some reason.

My initial reaction is that this seems reasonable, but what is special about TCPListener? Currently I believe that for all the net types with a Close method the Close method returns only with a promise that the socket will be closed when all pending I/O is complete or times out. Should we change all of the Close methods to wait until the socket descriptor is actually really closed?

@slingamn

This comment has been minimized.

Copy link
Author

@slingamn slingamn commented Sep 15, 2017

Thanks for the clarification --- I was confused because the latency of the review process in practice often outperforms the specification :-) (This seems to be a common issue in systems design, e.g., O_PONIES.)

In my opinion, TCPListener, UDPListener, and UnixListener are all special because their file descriptor is associated with a unique resource --- the address they were bound to with bind(2) --- that will not be released until close(2) is called. (Normally, the resource being held is not unique --- something like an ephemeral port, or just the entry in the file descriptor table. It is less important to know exactly when such a resource has been released.)

This was my intention with the original proposal title of "Listener.Close() should not return until the underlying address is freed". The goal is to have Listener.Close reclaim the address so that it can be bound again; the fact that this can only be accomplished via close(2) is more of a detail.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Sep 18, 2017

It's not clear to me if this should be restricted to listeners. It's plausible that other special fds might also exist and that we should instead make sure that when any fd Close returns, the underlying close(2) system call has returned.

It's then not clear to me if this would potentially cause any deadlocks by making Close wait for pending I/O to be aborted, and that pending I/O might take a long time.

Or maybe we should do this for all fds we poll, since at least there we (think we) know how to abort pending I/O promptly.

@slingamn

This comment has been minimized.

Copy link
Author

@slingamn slingamn commented Sep 20, 2017

I think it would definitely help to enumerate any other cases that might be "special"; thus far I can only think of listeners.

It seems like an implementation at the poll / fdMutex level should be deadlock-free, because the fdMutex semaphores are the "innermost" synchronization primitives held on those code paths. It also seems like the code shouldn't have to wait for any blocking I/O calls to complete, just for other goroutines to wake up from waitRead and waitWrite and realize that Close is trying to close the socket? But this in itself could conceivably have an impact on performance for some workloads.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Sep 25, 2017

Per discussion with @ianlancetaylor let's just try making internal/poll's Close do this for all file descriptors and back down to Listeners only if we find some reason to do so.

@rsc rsc changed the title proposal: net: Listener.Close should not return until the socket is really closed internal/poll: Close should not return until the fd is really closed Sep 25, 2017
@rsc rsc modified the milestones: Proposal, Go1.10 Sep 25, 2017
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Sep 26, 2017

Change https://golang.org/cl/66150 mentions this issue: internal/poll: don't return from Close until descriptor is closed

@gopherbot gopherbot closed this in 382d492 Sep 26, 2017
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Sep 27, 2017

Change https://golang.org/cl/66334 mentions this issue: net: use newLocalListener in TestClosingListener.

gopherbot pushed a commit that referenced this issue Sep 28, 2017
Updates #21856

Change-Id: I9baa51fe23e6dd2fcf9dd14f7acfaf7457571e1d
Reviewed-on: https://go-review.googlesource.com/66334
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 13, 2017

Change https://golang.org/cl/83715 mentions this issue: net, os: don't wait for Close in blocking mode

gopherbot pushed a commit that referenced this issue Dec 14, 2017
Updates #7970
Updates #21856
Updates #23111

Change-Id: I0cd0151fcca740c40c3c976f941b04e98e67b0bf
Reviewed-on: https://go-review.googlesource.com/83715
Reviewed-by: Russ Cox <rsc@golang.org>
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 14, 2017

Change https://golang.org/cl/83995 mentions this issue: os: don't wait for Close if the File was returned by NewFile

gopherbot pushed a commit that referenced this issue Dec 14, 2017
os.NewFile doesn't put the fd into non-blocking mode.
In most cases, an *os.File returned by os.NewFile is in blocking mode.

Updates #7970
Updates #21856
Updates #23111

Change-Id: Iab08432e41f7ac1b5e25aaa8855d478adb7f98ed
Reviewed-on: https://go-review.googlesource.com/83995
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Dec 14, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.