net: concurrent Accept-Close with no timeout on TCPListener defers closing underlying socket #10527
Just reproduced with tip also:

```
$ go version
```
Perhaps; sometimes Accept holds the shared fdMutex in the Listener when you call Close. A workaround would be to use SetReadDeadline with an appropriate value just before Accept, or to wait for the acceptor goroutine to stop before calling Close.
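For illustration only (this sketch is not from the original thread): a minimal version of the deadline-based workaround, written against a *net.TCPListener, which exposes SetDeadline rather than SetReadDeadline. The 100 ms value, the acceptLoop helper, and the stop channel are invented for the example.

```go
package main

import (
	"log"
	"net"
	"time"
)

// acceptLoop accepts connections until stop is closed. Re-arming a deadline
// before each Accept bounds how long a single Accept can block, so the loop
// regularly gets a chance to notice the stop signal and return.
func acceptLoop(ln *net.TCPListener, stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		ln.SetDeadline(time.Now().Add(100 * time.Millisecond)) // assumed value
		conn, err := ln.Accept()
		if err != nil {
			if ne, ok := err.(net.Error); ok && ne.Timeout() {
				continue // deadline expired; loop back and re-check stop
			}
			return // listener closed or another fatal error
		}
		conn.Close() // a real server would hand the connection off here
	}
}

func main() {
	addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	ln, err := net.ListenTCP("tcp", addr)
	if err != nil {
		log.Fatal(err)
	}
	stop := make(chan struct{})
	go acceptLoop(ln, stop)

	time.Sleep(time.Second)
	close(stop) // ask the acceptor to stop
	ln.Close()  // close the listener; any blocked Accept returns shortly after
}
```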
I don't agree that a timeout is a solution here. That only bounds the time between the Listener.Close() and when the underlying socket is closed; it doesn't improve the situation in any other way. Regarding waiting for the acceptor goroutine to stop before calling Close, that doesn't seem like an easily implementable (or common) pattern. Assuming the goroutine is blocked on Accept(), how does one signal that goroutine to exit? That's exactly what Close() is typically used for. I've hacked together a net.Listener wrapper that seems to do what I want, but it seems like it would be better if these were the semantics provided by the standard library.

```go
// NewSaneListener returns a new SaneListener around the provided net.Listener.
func NewSaneListener(l net.Listener) net.Listener {
return &SaneListener{l: l, c: sync.NewCond(&sync.Mutex{})}
}
// SaneListener wraps a net.Listener and ensures that once SaneListener.Close
// returns the underlying socket has been closed.
type SaneListener struct {
l net.Listener
c *sync.Cond
refCount int
}
func (s *SaneListener) incRef() {
s.c.L.Lock()
s.refCount++
s.c.L.Unlock()
}
func (s *SaneListener) decRef() {
s.c.L.Lock()
s.refCount--
s.c.Broadcast()
s.c.L.Unlock()
}
// Accept waits for and returns the next connection to the listener.
func (s *SaneListener) Accept() (net.Conn, error) {
s.incRef()
defer s.decRef()
return s.l.Accept()
}
// Close closes the listener.
// Any blocked Accept operations will be unblocked and return errors.
func (s *SaneListener) Close() error {
err := s.l.Close()
if err == nil {
s.c.L.Lock()
for s.refCount > 0 {
s.c.Wait()
}
s.c.L.Unlock()
}
return err
}
// Addr returns the listener's network address.
func (s *SaneListener) Addr() net.Addr {
return s.l.Addr()
}
```

I'm mostly including the code above to demonstrate the behavior I would like to see from the default Listener. I would need to vet/test this much more thoroughly before I trusted it in a production environment.
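For illustration only (not part of the original comment), here is a minimal usage sketch of the wrapper above, assuming it lives in the same package; the address and loop structure are invented for the example.

```go
package main

import (
	"log"
	"net"
)

func main() {
	inner, err := net.Listen("tcp", "127.0.0.1:12345")
	if err != nil {
		log.Fatal(err)
	}
	ln := NewSaneListener(inner)

	// Acceptor goroutine: exits once Accept starts returning errors.
	go func() {
		for {
			if _, err := ln.Accept(); err != nil {
				return
			}
		}
	}()

	// Per the analysis in this thread, Close only returns after any blocked
	// Accept has finished, i.e. after the OS socket has actually been closed,
	// so a subsequent Listen on the same address should not fail with
	// "address already in use".
	if err := ln.Close(); err != nil {
		log.Fatal(err)
	}
}
```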
I don't think it's a "sometimes" thing. I believe that every time there is a concurrent Close with Accept, the Close will return before the underlying socket is closed. The only thing that makes the test not fully deterministic is that sometimes the acceptor goroutine hasn't run yet when Close is invoked. I've modified the test to demonstrate this better:

```go
package foo
import (
"net"
"testing"
)
const addr = "127.0.0.1:12345"
func TestXxx(t *testing.T) {
for i := 0; i < 100000; i++ {
t.Logf("Iteration %d", i+1)
l, err := net.Listen("tcp4", addr)
if err != nil {
t.Fatalf("Failed to listen on %s: %v", addr, err)
}
ch := make(chan struct{})
go func(l net.Listener) {
ch <- struct{}{}
for {
_, err := l.Accept()
if err != nil {
return
}
}
}(l)
<-ch
if err := l.Close(); err != nil {
t.Fatalf("Failed to close server: %v", err)
}
}
}
```

This ensures that the acceptor goroutine runs before Close is invoked, and at least on my machine this makes it reliably fail on the second iteration.
I just did some more investigation and it is indeed the fdmu in netFD. The OS socket is not closed until netFD.destroy is called. The sequence of events is roughly:

1. The acceptor goroutine blocks in Accept, holding a reference on the netFD.
2. Close marks the netFD as closed, releases its own reference, and returns.
3. The blocked Accept becomes runnable; when it is next scheduled it returns an error and drops the last reference.
4. Only then does netFD.destroy run and close the OS socket.

This all appears to be working as intended, but it leads to the confusing behavior exhibited by my test case. I would think it would be far more intuitive to have Close remove the file descriptor from the network poller and close it immediately, and have the other methods return an "already closed" error whenever they are scheduled next.
I think we could reasonably do something in fdMutex to block until the pending operations are done, so that when Close returns the fd really is closed. Too late for Go 1.5 though. @dvyukov might be interested.
We want this behavior only for closing of passive sockets, right?
My particular case is on passive sockets, and I think that's certainly where this behavior is more likely to be seen, but ideally it would apply to both passive and active sockets. If Close() is called on a socket, it should invoke close on the underlying file descriptor before returning. I would also expect that shutdown is called before CloseRead() or CloseWrite() return on a TCPConn. Is there a reason to only implement it for passive sockets?
If you can't observe the effects, then there is no point in adding additional waits, goroutine rescheduling, and goroutine unblocking and rescheduling again. Just as when the write syscall returns: the data is not yet on disk, and it would be prohibitively expensive to wait for the data to reach the disk.
My SaneListener may have confused the situation more than it helped. I'm not proposing waiting for other goroutines to schedule before returning from Close(). I'm simply saying that Close() should invoke the underlying close syscall before returning. I don't know Go internals at all, but at a high level I'm imagining that Close() would set a flag on the netFD to indicate it's closed, then immediately close the underlying file descriptor, removing it from the network poller, and then return. Any goroutines still blocked on netFD operations would become runnable and return with a "connection closed" error whenever they are next scheduled. I'm not proposing a synchronization barrier that prevents Close() from returning until all other goroutines have been scheduled.
You can't do that.
Thanks for explaining. I hadn't considered the race between checking closed, starting a syscall, and the interaction with reusing fds. In that case I guess the behavior in SaneListener does very accurately mimic what needs to happen, and I can understand why it would be preferable to only have this behavior on passive sockets.
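As an aside added here (not from the original thread), the descriptor-reuse hazard mentioned above can be illustrated with a small Linux-oriented sketch; the /etc/hostname path is an arbitrary choice for the example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Illustrates that the kernel is free to reuse a file descriptor number as
// soon as it is closed. If the runtime closed a socket's fd while another
// goroutine was between "check the closed flag" and "issue the syscall", that
// goroutine's syscall could land on a completely unrelated file.
func main() {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("socket fd:", fd)

	// Close the socket; its descriptor number becomes free for reuse.
	syscall.Close(fd)

	// The very next open is likely to be handed the same number. A stale
	// accept(2) on the old fd would now operate on this file instead.
	f, err := os.Open("/etc/hostname")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fmt.Println("reused fd:", f.Fd())
}
```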
I recently ran into this while trying to write a benchmark for an HTTP server. Here was my reproducer: https://play.golang.org/p/2TDWbImCgow

I originally ran into the issue when using the fasthttp library, with code that looked similar to this:

```go
package wtf
import (
"fmt"
"io/ioutil"
"math/rand"
"net"
"net/http"
"strconv"
"testing"
"time"
"github.com/valyala/fasthttp"
)
func BenchmarkFastHttpHandler(b *testing.B) {
fmt.Println("\n\nTest Iterations:", b.N)
// Get a random number
rand.Seed(time.Now().UTC().UnixNano())
r := strconv.Itoa(rand.Intn(1000000))
fmt.Println("Starting Server:", r)
// Create listener
ln, err := net.Listen("tcp", ":8080")
Must(b, err)
// Start server
go func() {
httpServer := &fasthttp.Server{Handler: func(ctx *fasthttp.RequestCtx) {
fmt.Println("Hit:", r)
ctx.SetBody([]byte(r))
}}
Must(b, httpServer.Serve(ln))
fmt.Println("Closed Server:", r)
}()
time.Sleep(10 * time.Millisecond) // Let the server start
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
req, err := http.NewRequest("GET", "http://127.0.0.1:8080", nil)
Must(b, err)
resp, err := (&http.Client{}).Do(req)
Must(b, err)
if err == nil {
body, err := ioutil.ReadAll(resp.Body)
Must(b, err)
Must(b, resp.Body.Close())
// Check that we are hitting the correct server
if string(body) != r {
b.Fatal("Expected:", r, "Got: ", string(body))
}
}
}
})
Must(b, ln.Close())
}
func Must(tb testing.TB, err error) {
if err != nil {
tb.Fatal(err)
}
}
```

This gave the results:
Reproducing on Go 1.10.1 with the playground link:

```go
package main
import (
"fmt"
"io/ioutil"
"math/rand"
"net"
"net/http"
"strconv"
"testing"
"time"
)
func TestListener(t *testing.T) {
rand.Seed(time.Now().UTC().UnixNano())
for i := 0; i < 5; i++ {
// Create listener
ln, err := net.Listen("tcp", ":8080")
Must(t, err)
// Get a random number
r := strconv.Itoa(rand.Intn(1000000))
fmt.Println("Starting Server:", r)
// Setup server
mux := http.NewServeMux()
mux.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
fmt.Println("Hit:", r)
_, err := writer.Write([]byte(r))
Must(t, err)
})
srv := &http.Server{Handler: mux}
// Start server
go func() {
srv.Serve(ln)
fmt.Println("Closed Server")
}()
time.Sleep(time.Millisecond) // Let the server start
httpClient := &http.Client{}
req, err := http.NewRequest("GET", "http://127.0.0.1:8080", nil)
Must(t, err)
resp, err := httpClient.Do(req)
Must(t, err)
body, err := ioutil.ReadAll(resp.Body)
Must(t, err)
Must(t, resp.Body.Close())
fmt.Println("Got:", string(body))
if string(body) != r {
t.Error("Expected:", r, "Got:", string(body))
}
Must(t, ln.Close())
time.Sleep(time.Second) // Let the listener close
}
}
func Must(tb testing.TB, err error) {
if err != nil {
tb.Fatal(err)
}
}
```

This gave:
Go version: go1.4.2
OS: Linux 3.13 x86_64
The following test demonstrates what I consider a bug:
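(The test itself is not reproduced in this copy of the issue. Below is a sketch reconstructed from the description that follows; the test name and iteration count are assumed.)

```go
package foo

import (
	"net"
	"testing"
)

func TestAcceptClose(t *testing.T) {
	const addr = "127.0.0.1:12345"
	for i := 0; i < 1000; i++ {
		l, err := net.Listen("tcp4", addr)
		if err != nil {
			t.Fatalf("Failed to listen on %s: %v", addr, err)
		}
		// Acceptor goroutine: returns once Accept starts failing after Close.
		go func(l net.Listener) {
			for {
				if _, err := l.Accept(); err != nil {
					return
				}
			}
		}(l)
		// Close the listener; the next iteration's Listen on the same port
		// fails with "address already in use" if the socket isn't closed yet.
		if err := l.Close(); err != nil {
			t.Fatalf("Failed to close listener: %v", err)
		}
	}
}
```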
Briefly, within a loop it Listens on TCP port 12345, starts a goroutine that calls Accept on the listener, and then Closes the listener. My expectation was that this would always pass, regardless of scheduler behavior or the number of loop iterations, because I expected Close() to only return once the underlying socket is no longer in use.
Instead, what I see is that it sometimes passes but often fails on the net.Listen call of a subsequent loop iteration with an error that the address is already in use. I believe I understand the cause of the failure. A netFD underlies the TCPListener, and it is a reference-counted object. A goroutine blocked on Accept will increment the reference count. Close() flips a flag on the netFD to indicate the socket is considered closed, which makes other netFD operations like Accept return errors indicating the socket is closed. The underlying OS socket is only closed once the reference count for the netFD has reached zero.
The end result is that an OS socket is only closed after TCPListener.Close() is invoked AND all goroutines that are blocked in TCPListener.Accept() are scheduled and return the expected error condition. It seems like more intuitive behavior would be for TCPListener.Close() to immediately close the underlying OS socket, and goroutines blocked in TCPListener.Accept() to return errors whenever they were next scheduled/run.