FreeBSD sporadic EADDRINUSE #1694
http://code.google.com/p/go/issues/attachmentText?id=1694&aid=7352691858693800505&name=stress.go&token=61fdba68ed6cb1d98fc58743e3f81265#19

o_O How is that possible? When you bind a socket with a port of 0, the kernel is supposed to choose a unique port on your behalf.
After hammering this for a while on an 8.2-PRERELEASE host I got this:

```
2011/04/13 23:49:53 at 13500
2011/04/13 23:49:53 at 13600
2011/04/13 23:49:53 at 13700
2011/04/13 23:49:53 Accept: accept tcp4 127.0.0.1:15132: too many open files in system
```

Could it be that the sockets are not being closed, or is this some kind of SO_LINGER behaviour?
I've been playing with this on a FreeBSD-8.2-PRE host and am only able to trigger it very sporadically. Specifically, once it's triggered the issue goes into hiatus for a few minutes, not that it helps with CI builds.

I have a suspicion that this bug may not be triggered if the socket is bound to 0.0.0.0:0, but it's proven hard to demonstrate as I can't reliably reproduce the error. Binding to 0.0.0.0 will drive the Darwin users with strict firewall settings homicidal, so I'd like to propose submitting a CL that switches on syscall.OS and substitutes net.IPv4zero for 127.0.0.1 when a BSD system is present.
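The proposed switch might look roughly like this (a sketch only; modern Go would use `runtime.GOOS` rather than the old `syscall.OS`, and the exact set of BSD targets is my assumption, not from the thread):

```go
package main

import (
	"fmt"
	"net"
	"runtime"
)

// stressAddr picks the address the stress test should listen on:
// the wildcard address on BSD-derived systems (where binding
// 127.0.0.1:0 appeared to trigger the sporadic EADDRINUSE), and
// loopback elsewhere so Darwin firewalls don't prompt on every run.
func stressAddr() *net.TCPAddr {
	switch runtime.GOOS {
	case "freebsd", "netbsd", "openbsd":
		return &net.TCPAddr{IP: net.IPv4zero, Port: 0}
	}
	return &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 0}
}

func main() {
	fmt.Println("listening on", stressAddr())
}
```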
Digging into this a bit more, I don't think that net.netFDs are finalized (except on Windows), so my best guess is this is a race between the OS clearing out sockets in TIME_WAIT and the test code creating more of them. After changing stress.go to close each connection and running ./6.out in a loop for an hour, I haven't been able to reproduce the issue.
With

```go
package main

import (
	"log"
	"net"
)

func server(ls net.Listener) {
	for {
		c, err := ls.Accept()
		if err != nil {
			log.Fatalf("Accept: %v", err)
		}
		c.Close()
	}
}

func main() {
	ls, err := net.ListenTCP("tcp4", &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 0})
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	go server(ls)
	for i := 0; ; i++ {
		if i%100 == 0 {
			log.Printf("at %d", i)
		}
		c, err := net.Dial("tcp", ls.Addr().String())
		if err != nil {
			log.Fatalf("Accept: %v", err)
		}
		var buf [1]byte
		c.Read(buf[:])
		c.Close()
	}
}
```

... it now runs until 50,000-ish and hangs, usually around 55,500. But if I control-C and re-run, it gets right back to the same spot quickly and hangs again. Do you have this running for long periods of time without hanging?
Hi Brad,

You're right: when the program freezes there are only around 3,500 sockets in TIME_WAIT on my system. Curiously, in testing this I saw hangs at 60,600 and 13,500, so I don't think there is anything special about the number of connections. In my testing, while ./6.out stopped processing connections, I could telnet to the accepting socket and it picked up my connection, read a byte, then closed the socket as expected, but that did not unblock the test program.

I ran this code on a darwin/6g system and got the familiar:

```
2011/04/22 15:03:24 at 16200
2011/04/22 15:03:24 at 16300
2011/04/22 15:03:24 Accept: dial tcp 127.0.0.1:51630: address already in use
```

Have you run this test code on any linux hosts? I'm trying now on an 8g and a 6g system, but the results are inconclusive so far. If you can't reproduce the hang on a linux system then I would like to close this issue and open another one against the kqueue pollsters in fd_darwin.go and fd_freebsd.go. What do you think?
Again: change 4bfb4ffacef6 broke the freebsd-386 build:
http://godashboard.appspot.com/log/cb0fb10ddfc730f7c0c637c3675279474a4569093f4bb0bc034114f855dcb2ed
Running the code from comment 11 on linux/6g, it's still running strong (no slow-downs) past 750,000 iterations. And this is holding steady at around 28,230 +/- a couple:

```
$ sudo netstat -n | grep TIME_WAIT | grep 127.0.0.1: | wc -l
28230
```

So Linux must recycle things in TIME_WAIT quicker. This doesn't really help us, though.
Seems like an old friend. I've just created a wild hotfix as http://golang.org/cl/4445067, but I'm not sure whether this is the correct approach for the Go standard library.
Hi Brad,

Okay, I will push CL 4445067 through the review process. What I wanted to say in comment #18 is that I'm a bit afraid of 4445067's side effects, which are described in an old textbook as below.

> From UNIX Network Programming Volume 1, Third Edition: The Sockets Networking API:
>
> 4.4BSD introduced the SO_REUSEPORT socket option when support for multicasting was added. Instead of overloading SO_REUSEADDR with the desired multicast semantics that allow completely duplicate bindings, this new socket option was introduced with the following semantics:
>
> 1) This option allows completely duplicate bindings, but only if each socket that wants to bind the same IP address and port specifies this socket option.
>
> 2) SO_REUSEADDR is considered equivalent to SO_REUSEPORT if the IP address being bound is a multicast address (p. 731 of TCPv2).
This issue was closed by revision bc92671. Status changed to Fixed. |