syscall: Accept sometimes returns address family not supported by protocol family error on OS X #3849

Closed
snaury opened this issue Jul 22, 2012 · 10 comments

Comments

@snaury
Contributor

@snaury snaury commented Jul 22, 2012

I had a Go HTTP server running for several days. Today I fired up my screen and saw this:

panic: accept tcp 0.0.0.0:12345: address family not supported by protocol family

This is on the latest Go 1 on Mac OS X 10.7.4, 64-bit. I wonder how accept could fail like
that?
@snaury
Contributor Author

@snaury snaury commented Jul 22, 2012

Comment 1:

Btw, the panic is coming from my code; this is an error returned from
http.ListenAndServe(":12345",...)
@mikioh
Contributor

@mikioh mikioh commented Jul 22, 2012

Comment 2:

I guess the OS's accept syscall basically never returns EAFNOSUPPORT.
Walking through the Go standard library, it looks like syscall.anyToSockaddr
does return EAFNOSUPPORT, and it's called by syscall.Accept. How could that
be? Memory corruption or an alignment failure? Perhaps... not sure.
@snaury
Contributor Author

@snaury snaury commented Jul 22, 2012

Comment 3:

Yes, you may be right, it's very strange indeed. I looked into xnu and I think there
might be a race condition there somewhere. Basically, during accept it takes a malloc'ed
sockaddr, which could be NULL, and there might be a chance (haven't checked if locking
is correct) for socket accept to succeed, but sockaddr_in generation to fail (result of
in_setpeeraddr is not checked for errors) due to a connection being reset in the
meantime (the other possibility, malloc failing, is too unlikely).
In short, it may be that on Mac OS X accept() may succeed but not fill sockaddr address
parameter, leaving it with garbage. :(
@snaury
Contributor Author

@snaury snaury commented Jul 22, 2012

Comment 4:

Oh, and the comment in there is just marvelous, "Actually, we don't trap because there
actually /is/ a programming error somewhere..." :D
@snaury
Contributor Author

@snaury snaury commented Jul 23, 2012

Comment 6:

I've been running my server for a while now and found that this error is actually
happening quite often (probably because Chinese IP addresses are constantly trying to
scan me). I wrapped ListenAndServe in a for loop (since I got tired of restarting it),
and here it is, less than an hour and two errors logged already:
2012/07/24 00:05:11 Server running...
2012/07/24 00:25:02 accept tcp 0.0.0.0:12345: address family not supported by protocol family
2012/07/24 00:51:35 accept tcp 0.0.0.0:12345: address family not supported by protocol family
@mikioh
Contributor

@mikioh mikioh commented Jul 24, 2012

Comment 7:

Can you show a dump of the garbage sockaddr as captured
in syscall.anyToSockaddr? RawSockaddr.Len and RawSockaddr.Family
would give us more hints, I guess.
@snaury
Contributor Author

@snaury snaury commented Jul 24, 2012

Comment 8:

Sure, I added a simple Write(1, ...) to anyToSockaddr and here's what I got:
anyToSockaddr: Addr.Len = 0 Addr.Family = 0
anyToSockaddr failed, nfd = 3
2012/07/25 00:07:49 accept tcp 0.0.0.0:12345: address family not supported by protocol family
anyToSockaddr: Addr.Len = 0 Addr.Family = 0
anyToSockaddr failed, nfd = 4
2012/07/25 00:07:49 accept tcp 0.0.0.0:12345: address family not supported by protocol family
anyToSockaddr: Addr.Len = 0 Addr.Family = 0
anyToSockaddr failed, nfd = 4
2012/07/25 00:07:55 accept tcp 0.0.0.0:12345: address family not supported by protocol family
anyToSockaddr: Addr.Len = 0 Addr.Family = 0
anyToSockaddr failed, nfd = 4
2012/07/25 00:07:56 accept tcp 0.0.0.0:12345: address family not supported by protocol family
This is especially cool since I found how to reproduce it basically 100% of the time.
Server up on my machine (Mac OS X 10.7.4), then I run this:
$ nmap -p 80 myhost
On my linux box in London (the port 80 is forwarded by my router to port 12345 on my
machine). Turns out it happens to work like a charm and reliably triggers this bug. :)
Here's a diff in pkg/syscall where I added that print:
diff -r 5e806355a9e1 src/pkg/syscall/syscall_bsd.go
--- a/src/pkg/syscall/syscall_bsd.go    Thu Jun 14 12:50:42 2012 +1000
+++ b/src/pkg/syscall/syscall_bsd.go    Wed Jul 25 00:14:05 2012 +0400
@@ -294,6 +294,7 @@
                }
                return sa, nil
        }
+       Write(1, []byte("anyToSockaddr: Addr.Len = " + itoa(int(rsa.Addr.Len)) + " Addr.Family = " + itoa(int(rsa.Addr.Family)) + "\n"))
        return nil, EAFNOSUPPORT
 }
 
@@ -306,6 +307,7 @@
        }
        sa, err = anyToSockaddr(&rsa)
        if err != nil {
+               Write(1, []byte("anyToSockaddr failed, nfd = " + itoa(nfd) + "\n"))
                Close(nfd)
                nfd = 0
        }
It's probably just like I suspected: the socket is accepted, but due to a race in the kernel
there's no in_pcb associated with it anymore, so there's no sockaddr and thus nothing is
copied to user space, leaving all zeroes.
If you're interested this is a possible call chain during accept:
http://fxr.watson.org/fxr/source/bsd/kern/kpi_socket.c?v=xnu-1699.24.8#L154
http://fxr.watson.org/fxr/source/bsd/netinet/tcp_usrreq.c?v=xnu-1699.24.8;im=10#L537
http://fxr.watson.org/fxr/source/bsd/netinet/in_pcb.c?v=xnu-1699.24.8;im=10#L1072
I'm not sure there's any sane way to fix (or even workaround) it in go though... it's
just not supposed to work like that. :(
@snaury
Contributor Author

@snaury snaury commented Jul 24, 2012

Comment 9:

Oh wow, it's been right there... First I was looking in the wrong file (should be
uipc_syscalls.c, not kpi_socket.c), but even there accept_nocancel() simply ignores
error result from soacceptlock(). There might not even be any races, ECONNABORTED simply
gets ignored and never propagates to the caller. :-/ Is it feasible to convert such
"successes" when Addr.Len is 0 to ECONNABORTED? A preliminary patch is something like this:
diff -r 5e806355a9e1 src/pkg/syscall/syscall_bsd.go
--- a/src/pkg/syscall/syscall_bsd.go    Thu Jun 14 12:50:42 2012 +1000
+++ b/src/pkg/syscall/syscall_bsd.go    Wed Jul 25 01:25:41 2012 +0400
@@ -304,6 +304,14 @@
        if err != nil {
                return
        }
+       if rsa.Addr.Len == 0 && rsa.Addr.Family == AF_UNSPEC {
+               // Workaround for Darwin: xnu ignores errors from
+               // soacceptlock, so ECONNABORTED is not returned
+               // from accept syscall. Turn it into a correct
+               // error here.
+               Close(nfd)
+               return 0, nil, ECONNABORTED
+       }
        sa, err = anyToSockaddr(&rsa)
        if err != nil {
                Close(nfd)
I'm not sure if this is entirely correct though.
@snaury
Contributor Author

@snaury snaury commented Jul 24, 2012

Comment 10:

Oh, and just for reference, I managed to find what is probably the same bug, but only in FreeBSD 3:
http://fxr.watson.org/fxr/source/kern/uipc_syscalls.c?v=FREEBSD3#L271 (in later FreeBSD
versions it appears to be fixed)
@adg
Contributor

@adg adg commented Jul 29, 2012

Comment 11:

This issue was closed by revision 5197fa8.

Status changed to Fixed.

@snaury snaury added fixed labels Jul 29, 2012
adg added a commit that referenced this issue May 11, 2015
««« backport 0eae95b0307a
syscall: workaround accept() bug on Darwin

Darwin kernels have a bug in accept() where error result from
an internal call is not checked and socket is accepted instead
of ECONNABORTED error. However, such sockets have no sockaddr,
which results in EAFNOSUPPORT error from anyToSockaddr, making
Go http servers running on Mac OS X easily susceptible to
denial of service from simple port scans with nmap.
Fixes #3849.

R=golang-dev, adg, mikioh.mikioh
CC=golang-dev
https://golang.org/cl/6456045

»»»
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.