runtime: race in netpollinit vs netpoll #22606

rsc opened this issue Nov 7, 2017 · 2 comments



commented Nov 7, 2017

I received a private report of a crash in netpoll, running Go 1.8.3 on an old Linux kernel:

runtime: epollwait on fd -38 failed with 9
fatal error: epollwait failed

runtime stack:
runtime.throw(0x811258, 0x10)
        go/src/runtime/panic.go:596 +0x95
runtime.netpoll(0x14f49fe8c24db200, 0x0)
        go/src/runtime/netpoll_epoll.go:71 +0x13f
        go/src/runtime/proc.go:3820 +0x3a2
        go/src/runtime/proc.go:1179 +0x11e
        go/src/runtime/proc.go:1149 +0x64

This was at program startup during the program's first attempted net.Dial.

It looks to me like there is a race between netpollinit and netpoll. Specifically, netpollinit does:

epfd = epollcreate1(_EPOLL_CLOEXEC)
if epfd >= 0 {
	return
}
epfd = epollcreate(1024)
if epfd >= 0 {
	...

If epollcreate1 fails (as I assume it might on an old kernel, since epoll_create1 was only added in Linux 2.6.27), that first line assigns a negative errno to epfd. Netpoll only checks that epfd != -1, which is why it passed -38 (-ENOSYS) to epollwait, which then failed with 9 (EBADF). Presumably netpollinit was about to overwrite the -38 with the result of epollcreate(1024), which would probably have succeeded, but netpoll read epfd before that happened.
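
To make the interleaving concrete, here is a deterministic Go model of it. It is a sketch, not runtime code: epollcreate1Stub and epollcreateStub are hypothetical stubs returning -38 and 5 to imitate an old kernel, and the published slice stands in for the values a poller on another thread could read from epfd. Storing the first result before checking it exposes -38 to netpoll's != -1 test; computing the descriptor in a local variable (one possible fix, assumed here rather than taken from the runtime) only ever publishes a valid fd.

```go
package main

import "fmt"

// Hypothetical stubs modeling an old kernel: epoll_create1 is missing, so it
// returns -ENOSYS (-38), while the fallback epoll_create succeeds with fd 5.
func epollcreate1Stub() int32 { return -38 }
func epollcreateStub() int32  { return 5 }

// epfd models the runtime's shared descriptor; -1 means "not initialized".
var epfd int32 = -1

// published records every value stored into epfd, in order. A poller running
// concurrently on another thread could read any of these values.
var published []int32

func store(fd int32) {
	epfd = fd
	published = append(published, fd)
}

func reset() {
	epfd = -1
	published = nil
}

// buggyInit follows the pattern in the report: the result of the first call
// is stored before it is checked, so epfd briefly holds a negative errno.
func buggyInit() {
	store(epollcreate1Stub())
	if epfd >= 0 {
		return
	}
	store(epollcreateStub())
}

// fixedInit keeps intermediate results in a local and stores only a valid fd.
func fixedInit() {
	fd := epollcreate1Stub()
	if fd < 0 {
		fd = epollcreateStub()
	}
	if fd < 0 {
		panic("netpollinit: no epoll descriptor")
	}
	store(fd)
}

// pollerAccepts models netpoll's check, which rejects only the initial -1.
func pollerAccepts(fd int32) bool { return fd != -1 }

func main() {
	reset()
	buggyInit()
	fmt.Println("buggy:", published) // buggy: [-38 5]
	reset()
	fixedInit()
	fmt.Println("fixed:", published) // fixed: [5]
}
```

The -38 in the buggy sequence passes pollerAccepts, which matches the "epollwait on fd -38" message in the crash.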

More generally, epfd is not itself an atomic variable. Instead, there is an atomic netpollInited, checked by func netpollinited. It looks like every call to netpoll should be guarded by if netpollinited(). But the runtime has five calls, and only three are guarded:

proc.go:1089:	gp := netpoll(false) // non-blocking
proc.go:2247:		if gp := netpoll(false); gp != nil { // non-blocking
proc.go:2387:		gp := netpoll(true) // block until new work is available
proc.go:2422:		if gp := netpoll(false); gp != nil {
proc.go:4242:			gp := netpoll(false) // non-blocking - returns list of goroutines

1089 and 4242 need guards.
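
A minimal sketch of the guarded call site, assuming a simplified model of the runtime: netpollInited and netpollinited are the names from this report, while netpollinitModel, netpollModel, pollOnce and resetModel are hypothetical stand-ins (the real netpoll takes a block flag and returns a goroutine list). As in the runtime, epfd is written non-atomically, and the flag is set with an atomic store afterwards, so under the Go memory model a caller that observes the flag through an atomic load also observes the final epfd.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// epfd is a plain (non-atomic) variable, as in the runtime; -1 means unset.
var epfd int32 = -1

// netpollInited is set to 1 only after epfd holds its final value.
var netpollInited uint32

func netpollinited() bool { return atomic.LoadUint32(&netpollInited) != 0 }

// netpollinitModel (hypothetical) stores the descriptor, then publishes it by
// setting the flag. The atomic store orders the earlier write to epfd before
// any atomic load that observes the flag.
func netpollinitModel(fd int32) {
	epfd = fd
	atomic.StoreUint32(&netpollInited, 1)
}

// netpollModel (hypothetical) returns the fd netpoll would pass to epollwait.
func netpollModel() int32 { return epfd }

// pollOnce is the guarded call site: it calls netpollModel only when
// netpollinited reports true, and reports whether it polled.
func pollOnce() (int32, bool) {
	if !netpollinited() {
		return 0, false
	}
	return netpollModel(), true
}

// resetModel restores the uninitialized state (single-goroutine use only).
func resetModel() {
	atomic.StoreUint32(&netpollInited, 0)
	epfd = -1
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			if fd, ok := pollOnce(); ok {
				fmt.Println("polled fd", fd) // prints "polled fd 5"
				return
			}
			runtime.Gosched()
		}
	}()
	netpollinitModel(5)
	wg.Wait()
}
```

The poller never sees -1 or an errno: until the flag is set it skips polling entirely, which is the behavior the unguarded calls at proc.go:1089 and proc.go:4242 lack.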

This is probably not reproducible in a test, but we should probably fix it anyway.

/cc @aclements @ianlancetayor @dvyukov

rsc added this to the Go1.10 milestone Nov 7, 2017



commented Nov 7, 2017

Just a random ping here @rsc to /cc @ianlancetaylor not @ianlancetayor :)



commented Nov 7, 2017

Change mentions this issue: runtime: only call netpoll if netpollinited returns true
