Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: time management in OpenBSD port is wildly misconfigured #52475

Closed
mknyszek opened this issue Apr 21, 2022 · 5 comments
Closed

runtime: time management in OpenBSD port is wildly misconfigured #52475

mknyszek opened this issue Apr 21, 2022 · 5 comments
Labels
NeedsFix OS-OpenBSD

Comments

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 21, 2022

In the Go runtime, on most platforms, we generally assume that thread-level sleeps (and other time-based things) try to wake up in an amount of time that's somewhere close to what we pass. The one exception here, across the runtime, is Windows. For Windows, we query interrupt time because it's more efficient, and work around sleeps that rounded to a clock with coarse granularity.

OpenBSD has a coarse-granularity clock for sleeps, but the runtime doesn't act like it. (Thanks @prattmic for digging up these links!) That results in situations like this one:

gc 22 @0.471s 78%: 19+20+0.018 ms clock, 76+0.14/0.22/0+0.072 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 23 @0.511s 79%: 19+0.17+0.015 ms clock, 77+0.094/0.097/0.10+0.060 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 24 @0.531s 79%: 19+0.20+0.017 ms clock, 76+0.11/0.092/0.12+0.068 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 25 @0.551s 80%: 19+0.23+0.016 ms clock, 76+0.086/0.079/0.10+0.064 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 26 @0.571s 80%: 19+0.17+0.014 ms clock, 76+0.095/0.084/0.10+0.057 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 27 @0.591s 81%: 19+0.17+0.004 ms clock, 77+0.084/0.091/0.10+0.016 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 28 @0.611s 81%: 19+0.16+0.016 ms clock, 77+0.089/0.083/0.10+0.066 ms cpu, 4->4->1 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 29 @0.631s 79%: 19+20+0.003 ms clock, 77+0/0.16/2.1+0.014 ms cpu, 5->6->3 MB, 6 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 30 @0.671s 80%: 19+0.13+0.013 ms clock, 77+0.096/0.084/0.10+0.055 ms cpu, 5->5->2 MB, 6 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 31 @0.691s 78%: 19+19+0.015 ms clock, 77+0.060/0.30/0.17+0.060 ms cpu, 4->10->8 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 4 P

The above text is a GC trace for TestDecodeMemoryConsumption in the image/gif package. Note the very high GC stop-the-world times for the transition to the mark phase. This is time not spent in the pause, but actually trying to stop the world:

Screen Shot 2022-04-21 at 3 11 05 PM

Typically the runtime is waiting on one P, and it goes to sleep on a note for 100µs. If undisturbed, it takes between 10 and 20 ms to actually wake up. But that should be OK, because it's woken up early if the last P to stop actually stops. Except that we have thread-level sleeps in other parts of the runtime that have the same latency! For example, runqgrab might call usleep(3), but that too takes 10-20ms to return. This was a common one I discovered (and confirmed as a source of latency) and there are sure to be more.

The runtime needs to be overhauled for the OpenBSD port to treat sleeps as we would on Windows.

This is closely related to #14183.

@mknyszek
Copy link
Contributor Author

@mknyszek mknyszek commented Apr 21, 2022

Turns out, the only sleep this matters for is that runqgrab sleep. Fixing that seems to resolve this issue.

@mknyszek mknyszek added the NeedsFix label Apr 21, 2022
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 21, 2022

Change https://go.dev/cl/401638 mentions this issue: runtime: yield instead of sleeping in runqgrab on OpenBSD

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 25, 2022

Could this also (perhaps partially) explain #50189, or are the sleeping paths for net separate from the ones involved here?

@mknyszek
Copy link
Contributor Author

@mknyszek mknyszek commented Apr 25, 2022

I think that's plausible. This happens so deep in the runtime that it can basically happen any time. And it's on the work stealing path, so anything that's scheduler-heavy will likely exacerbate it.

@gopherbot
Copy link

@gopherbot gopherbot commented May 18, 2022

Change https://go.dev/cl/407139 mentions this issue: runtime: use osyield in runqgrab on netbsd

gopherbot pushed a commit that referenced this issue May 19, 2022
NetBSD appears to have the same issue OpenBSD had in runqgrab. See
issue #52475 for more details.

For #35166.

Change-Id: Ie53192d26919b4717bc0d61cadd88d688ff38bb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/407139
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix OS-OpenBSD
Projects
Status: Done
Development

No branches or pull requests

3 participants