runtime: race in netpollinit vs netpoll #22606

rsc opened this issue Nov 7, 2017 · 2 comments

@rsc rsc commented Nov 7, 2017

I received a private report of a crash in netpoll, running Go 1.8.3 on an old Linux kernel:

runtime: epollwait on fd -38 failed with 9
fatal error: epollwait failed

runtime stack:
runtime.throw(0x811258, 0x10)
        go/src/runtime/panic.go:596 +0x95
runtime.netpoll(0x14f49fe8c24db200, 0x0)
        go/src/runtime/netpoll_epoll.go:71 +0x13f
        go/src/runtime/proc.go:3820 +0x3a2
        go/src/runtime/proc.go:1179 +0x11e
        go/src/runtime/proc.go:1149 +0x64

This was at program startup during the program's first attempted net.Dial.

It looks to me like there is a race between netpollinit and netpoll. Specifically, netpollinit does:

epfd = epollcreate1(_EPOLL_CLOEXEC)
if epfd >= 0 {
	return
}
epfd = epollcreate(1024)
if epfd >= 0 {
	closeonexec(epfd)
	return
}

If epollcreate1 fails (as I assume it might on an old kernel) then that first line is assigning a negative errno to epfd. Netpoll only checks that epfd != -1, which is why it passed -38 (-ENOSYS) to epollwait. Presumably netpollinit was about to overwrite the -38 with the result of epollcreate(1024), which was probably going to work, but it didn't get a chance.
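To make the window concrete, here is a minimal, single-threaded sketch of the sequence described above. Nothing in it is runtime code: fakeEpollcreate1, fakeEpollcreate, passesCheck, racyInit, localInit and the observe callback are all hypothetical stand-ins, and observe simply marks the point where a concurrent netpoll call could run. localInit shows one possible way to avoid exposing the intermediate value (compute into a local, store once); it is an illustration, not the fix proposed in this issue.

```go
package main

import "fmt"

const enosys = 38 // ENOSYS on Linux

// Hypothetical stand-ins for the raw syscalls. Like the runtime wrappers
// described in the issue, they return a negative errno on failure.
func fakeEpollcreate1() int32 { return -enosys } // old kernel: syscall missing
func fakeEpollcreate() int32  { return 3 }       // fallback succeeds

// epfd is the shared descriptor; -1 means "not initialized".
var epfd int32 = -1

// passesCheck mimics a poller that only rejects epfd == -1.
func passesCheck(fd int32) bool { return fd != -1 }

// racyInit stores the failed result in the shared variable before trying
// the fallback. observe stands in for a concurrent poller that reads epfd
// between the two stores.
func racyInit(observe func(fd int32)) {
	epfd = fakeEpollcreate1()
	if epfd >= 0 {
		return
	}
	observe(epfd) // the window: epfd currently holds -ENOSYS
	epfd = fakeEpollcreate()
}

// localInit keeps intermediate results in a local and stores to the shared
// variable once, so the poller sees either -1 or the final value.
func localInit(observe func(fd int32)) {
	fd := fakeEpollcreate1()
	if fd < 0 {
		observe(epfd) // same point in time: epfd is still -1
		fd = fakeEpollcreate()
	}
	epfd = fd
}

func main() {
	report := func(name string) func(int32) {
		return func(fd int32) {
			fmt.Printf("%s: poller sees epfd=%d, passes the != -1 check: %v\n",
				name, fd, passesCheck(fd))
		}
	}
	epfd = -1
	racyInit(report("racy"))
	epfd = -1
	localInit(report("local"))
}
```

In the racy variant the observer sees -38 and the check lets it through; in the local variant it still sees -1. A single store alone would not make a plain variable safe to read from another thread, so a guard on an atomic flag would still be needed.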

More generally epfd is not itself an atomic variable. Instead there is an atomic netpollInited checked by func netpollinited. It looks like any call to netpoll should be guarded by an if netpollinited(). But the runtime has five calls and only three are guarded:

proc.go:1089:	gp := netpoll(false) // non-blocking
proc.go:2247:		if gp := netpoll(false); gp != nil { // non-blocking
proc.go:2387:		gp := netpoll(true) // block until new work is available
proc.go:2422:		if gp := netpoll(false); gp != nil {
proc.go:4242:			gp := netpoll(false) // non-blocking - returns list of goroutines

1089 and 4242 need guards.

This is probably not reproducible in a test, but we should probably fix it anyway.

/cc @aclements @ianlancetayor @dvyukov

@rsc rsc added this to the Go1.10 milestone Nov 7, 2017

@odeke-em odeke-em commented Nov 7, 2017

Just a random ping here @rsc to /cc @ianlancetaylor not @ianlancetayor :)


@gopherbot gopherbot commented Nov 7, 2017

Change mentions this issue: runtime: only call netpoll if netpollinited returns true
