runtime: race in netpollinit vs netpoll #22606

Closed
rsc opened this issue Nov 7, 2017 · 2 comments

@rsc
Contributor

commented Nov 7, 2017

I received a private report of a crash in netpoll, running Go 1.8.3 on an old Linux kernel:

runtime: epollwait on fd -38 failed with 9
fatal error: epollwait failed

runtime stack:
runtime.throw(0x811258, 0x10)
        go/src/runtime/panic.go:596 +0x95
runtime.netpoll(0x14f49fe8c24db200, 0x0)
        go/src/runtime/netpoll_epoll.go:71 +0x13f
runtime.sysmon()
        go/src/runtime/proc.go:3820 +0x3a2
runtime.mstart1()
        go/src/runtime/proc.go:1179 +0x11e
runtime.mstart()
        go/src/runtime/proc.go:1149 +0x64

This was at program startup during the program's first attempted net.Dial.

It looks to me like there is a race between netpollinit and netpoll. Specifically, netpollinit does:

epfd = epollcreate1(_EPOLL_CLOEXEC)
if epfd >= 0 {
	return
}
epfd = epollcreate(1024)
if epfd >= 0 {
	closeonexec(epfd)
	return
}

If epollcreate1 fails (as I assume it might on an old kernel) then that first line assigns a negative errno to epfd. netpoll only checks that epfd != -1, which is why it passed -38 (-ENOSYS) to epollwait. Presumably netpollinit was about to overwrite the -38 with the result of epollcreate(1024), which was probably going to work, but it didn't get a chance.
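The failure mode described above can be sketched in isolation. This is a minimal illustration, not the runtime's code: fakeEpollCreate1, fakeEpollCreate, usableSentinel, usableStrict, and initLocal are hypothetical names, and the syscalls are replaced by stubs that return fixed values. It shows why a check against -1 alone accepts the intermediate -38, and how doing the fallback in a local variable avoids exposing that intermediate value at all.

```go
package main

import "fmt"

// enosys is the Linux errno for "function not implemented".
const enosys = 38

// fakeEpollCreate1 is a hypothetical stub for epoll_create1 on an old
// kernel that lacks the syscall: it returns a negative errno.
func fakeEpollCreate1() int32 { return -enosys }

// fakeEpollCreate is a hypothetical stub for epoll_create succeeding.
func fakeEpollCreate() int32 { return 3 }

// usableSentinel mirrors the check described in the issue:
// only -1 is treated as "not initialized".
func usableSentinel(fd int32) bool { return fd != -1 }

// usableStrict treats every negative value as unusable.
func usableStrict(fd int32) bool { return fd >= 0 }

// initLocal performs the fallback in a local variable, so a shared
// variable assigned from its result never holds a negative errno
// from the first attempt.
func initLocal() int32 {
	fd := fakeEpollCreate1()
	if fd < 0 {
		fd = fakeEpollCreate()
	}
	return fd
}

func main() {
	// The value a concurrent reader could observe between the two calls.
	intermediate := fakeEpollCreate1()

	fmt.Println(usableSentinel(intermediate)) // true: -38 passes the != -1 check
	fmt.Println(usableStrict(intermediate))   // false
	fmt.Println(initLocal())                  // 3
}
```

Using a local only narrows the window; it does not make a plain shared variable safe to read concurrently, which is why the guard discussed next still matters.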

More generally epfd is not itself an atomic variable. Instead there is an atomic netpollInited checked by func netpollinited. It looks like any call to netpoll should be guarded by an if netpollinited(). But the runtime has five calls and only three are guarded:

proc.go:1089:	gp := netpoll(false) // non-blocking
proc.go:2247:		if gp := netpoll(false); gp != nil { // non-blocking
proc.go:2387:		gp := netpoll(true) // block until new work is available
proc.go:2422:		if gp := netpoll(false); gp != nil {
proc.go:4242:			gp := netpoll(false) // non-blocking - returns list of goroutines

1089 and 4242 need guards.
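As a rough sketch of the guard pattern, assuming a simplified stand-in for the runtime: the poller type and its methods below are hypothetical, not the runtime's API. The descriptor is a plain field and an atomically accessed flag is stored only after the descriptor has its final value, so a caller that sees the flag set also sees a valid descriptor.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// poller is a hypothetical stand-in for the runtime's poller state.
type poller struct {
	fd     int32  // plain field, playing the role of epfd
	inited uint32 // accessed atomically, playing the role of netpollInited
}

func newPoller() *poller { return &poller{fd: -1} }

// init sets the descriptor first and publishes the flag last.
func (p *poller) init() {
	p.fd = 3 // pretend the create call succeeded with fd 3
	atomic.StoreUint32(&p.inited, 1)
}

// initialized reports whether init has completed.
func (p *poller) initialized() bool {
	return atomic.LoadUint32(&p.inited) != 0
}

// poll stands in for netpoll; it must only run after init.
func (p *poller) poll() int32 { return p.fd }

// guardedPoll is the shape every call site would take:
// skip polling entirely until initialization is published.
func (p *poller) guardedPoll() (int32, bool) {
	if !p.initialized() {
		return 0, false
	}
	return p.poll(), true
}

func main() {
	p := newPoller()

	_, ok := p.guardedPoll()
	fmt.Println(ok) // false: not initialized, poll is skipped

	p.init()
	fd, ok := p.guardedPoll()
	fmt.Println(fd, ok) // 3 true
}
```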

This is probably not reproducible in a test, but we should probably fix it anyway.

/cc @aclements @ianlancetayor @dvyukov

rsc added this to the Go1.10 milestone Nov 7, 2017

@odeke-em

Member

commented Nov 7, 2017

Just a random ping here @rsc to /cc @ianlancetaylor not @ianlancetayor :)

@gopherbot


commented Nov 7, 2017

Change https://golang.org/cl/76319 mentions this issue: runtime: only call netpoll if netpollinited returns true
