Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: spinning CPU on NetBSD calling compat_60__lwp_park #22968

Closed
bradfitz opened this issue Dec 1, 2017 · 15 comments

Comments

Projects
None yet
4 participants
@bradfitz
Copy link
Member

commented Dec 1, 2017

The NetBSD builder's buildlet process boots up immediately spinning, using 100% CPU.

$ gomote create netbsd-amd64-8branch
$ gomote ssh user-bradfitz-netbsd-amd64-8branch-0
...
# dtruss -n buildlet 2>&1 | head -500 > x
# cat x
PID/LWP    SYSCALL(args)                 = return
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
   505/2:  compat_60__lwp_park(0xC42003DE58, 0x0, 0xC42002E398)          = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE60, 0x0)                 = 0 0
   505/2:  clock_gettime(0x3, 0xC42003DE00, 0x0)                 = 0 0
....

This seemed to start when I rebuilt the NetBSD-amd64 8.0-BETA VM image yesterday, but I'm not sure.

Filing while I investigate.

/cc @ianlancetaylor @aclements @bsiegert @krytarowski @zoulasc

@bradfitz bradfitz added this to the Go1.10 milestone Dec 1, 2017

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Dec 1, 2017

Okay, if I build the buildlet with Go 1.9, it doesn't spin.

So this appears to be some regression at Go master.

@krytarowski

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2017

While there, it looks like golang uses clock_gettime(), while it should use newer syscall __clock_gettime50().

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Dec 2, 2017

Update:

Even with the Go 1.9-built buildlet, we're still getting a column of red failing builds for netbsd because now the go_bootstrap binary is spinning, taking 100% CPU while the rest of the build tries to complete & pass its tests. But many tests fail while competing with a 100% CPU spinner. The spin is the same:

buildlet-netbsd-amd64-8branch-rnc9c727e# dtruss -n go_bootstrap 2>&1 | head -50 > x
buildlet-netbsd-amd64-8branch-rnc9c727e# cat x
PID/LWP    SYSCALL(args)                 = return
  2140/2:  clock_gettime(0x3, 0xC42003FE00, 0x0)                 = 0 0
  2140/2:  compat_60__lwp_park(0xC42003FE58, 0x0, 0xC420030398)          = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE60, 0x0)                 = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE00, 0x0)                 = 0 0
  2140/2:  compat_60__lwp_park(0xC42003FE58, 0x0, 0xC420030398)          = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE60, 0x0)                 = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE00, 0x0)                 = 0 0
  2140/2:  compat_60__lwp_park(0xC42003FE58, 0x0, 0xC420030398)          = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE60, 0x0)                 = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE00, 0x0)                 = 0 0
  2140/2:  compat_60__lwp_park(0xC42003FE58, 0x0, 0xC420030398)          = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE60, 0x0)                 = 0 0
  2140/2:  clock_gettime(0x3, 0xC42003FE00, 0x0)                 = 0 0
  2140/2:  compat_60__lwp_park(0xC42003FE58, 0x0, 0xC420030398)          = 0 0
...

Because this all worked the other day before I rebuilt the NetBSD image, I'm curious if any changes in NetBSD-8 happened recently that make Go unhappy.

@gopherbot

This comment has been minimized.

Copy link

commented Dec 2, 2017

Change https://golang.org/cl/81637 mentions this issue: dashboard: disable NetBSD builders

gopherbot pushed a commit to golang/build that referenced this issue Dec 2, 2017

dashboard: disable NetBSD builders
Until issues are fixed. No point in wasting resources on column of
red.

Updates golang/go#22968

Change-Id: I2de5a4d8aca6fcbdab2f43d87818c7d8f333f8bc
Reviewed-on: https://go-review.googlesource.com/81637
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

@bradfitz bradfitz modified the milestones: Go1.10, Unplanned Dec 2, 2017

@zoulasc

This comment has been minimized.

Copy link
Contributor

commented Dec 2, 2017

I don't know if/what changed on 8. I can reproduce this on HEAD in one of the regexp tests and with i386 binaries on amd64 in many tests. I am investigating.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Dec 2, 2017

Thanks, @zoulasc.

@zoulasc

This comment has been minimized.

Copy link
Contributor

commented Dec 2, 2017

Here's a patch to make go use the new lwp_park call on NetBSD (untested on arm) and to use CLOCK_MONOTONIC for nanotime(). This also saves a syscall each time we need to sleep for a semaphore...

https://www.netbsd.org/~christos/go-lwp-park-clock-monotonic.diff

With this fix and a tiny kernel patch (kern_lwp.c:1.191) go on amd64 passes all the tests.

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Dec 3, 2017

Thanks. Is there a pullup to the NetBSD-8 branch I should wait for for http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_lwp.c.diff?r1=1.190&r2=1.191&only_with_tag=MAIN ?

Btw, the nanotime change already happened in 2065685.

So were the processes spinning because lwp_park was using the wrong clock and the clock was going backwards sometimes, or ... ?

@zoulasc

This comment has been minimized.

Copy link
Contributor

commented Dec 3, 2017

206585 changes amd64/i386 but not arm. There is a different bug somewhere, perhaps in the compat32 code since amd64 now works. I don't think that processes were spinning because the clock went backwards, because before the changes both nanotime and lwp_park were using CLOCK_REALTIME. This change makes nanotime() to use CLOCK_MONOTONIC, but the lwp_park call is not using absolute time anymore, so it is not calling nanotime(), which is an improvement anyway.

The kern_lwp change fixes a different issue where we ended up having processes stuck exiting. I will ask for a pullup shortly.

@gopherbot

This comment has been minimized.

Copy link

commented Dec 4, 2017

Change https://golang.org/cl/81715 mentions this issue: runtime: make NetBSD lwp_park use monotonic time

@gopherbot gopherbot closed this in 66fcf45 Dec 4, 2017

@gopherbot

This comment has been minimized.

Copy link

commented Dec 4, 2017

Change https://golang.org/cl/81735 mentions this issue: dashboard: re-enable NetBSD/amd64 builder

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Dec 4, 2017

@zoulasc, thank you so much. The amd64 builder is back & happy. So the spin was in fact my fault when I submitted the incomplete 2065685. I had missed semasleep's use of nanotime.

I verified that 386 works too on my local VM. We won't have 386 builders until pullup 413 lands in the netbsd-8 branch (status: https://releng.netbsd.org/cgi-bin/req-8.cgi) so we can have virtio scsi/rnd on GCE, where we run most of our builders.

It's possible I might find time to get a NetBSD Raspberry Pi builder running, but it's a low priority. I filed #22979 for that.

The 386 binaries still spin (and thus don't pass all.bash) when running on an amd64 kernel, though. If you're interested, that might be another bug to hunt.

gopherbot pushed a commit to golang/build that referenced this issue Dec 4, 2017

dashboard: re-enable NetBSD/amd64 builder
Updates golang/go#22968
Updates golang/go#20852

Change-Id: Ida2bc7213d418735e897d0bae8e5fba9ae18fd6c
Reviewed-on: https://go-review.googlesource.com/81735
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@krytarowski

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2017

We have users of golang on NetBSD/evbarm so it is definitely worth the trouble.

@zoulasc

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2017

Thanks @bradfitz for keeping NetBSD/go going. I will keep looking for both the netbsd32 emulation bug and the 'message too long' issue on i386 -- I think that there is still a minor issue with i386 that needs to be fixed. The non-traditional syscall use by go makes it a good fuzzer :-)

@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Dec 4, 2017

@zoulasc, higher priority than the netbsd32 emulation is #22981 if you have time.

@golang golang locked and limited conversation to collaborators Dec 4, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.