
runtime: correct attribution of internal mutex contention #66999

Closed
rhysh opened this issue Apr 23, 2024 · 9 comments
Assignees: rhysh
Labels: compiler/runtime (Issues related to the Go compiler and/or runtime); NeedsFix (The path to resolution is known, but the work has not been done)
Milestone: Go1.23

Comments
rhysh (Contributor) commented Apr 23, 2024

As of Go 1.22, the mutex profile measures contention on internal mutex values, but attributes the contention to the function runtime._LostContendedRuntimeLock. This should be improved.

When running with GODEBUG=runtimecontentionstacks=1, the mutex profile attributes to the caller of runtime.unlock all the delay that the M encountered during its preceding runtime.lock call to obtain that lock. This is incorrect: the delay attributed to a runtime.unlock call should instead correspond to the total time that the other Ms were unable to proceed because this M was holding the lock.

I have the start of a plan to resolve this, so I'm seeking feedback from @mknyszek and @prattmic (or @golang/runtime or others) on whether I'm on the right track — both in the start of the plan, and on whether I'm working to the right set of constraints. Thank you!


The mutex profile for sync.Mutex uses fields on the sudog to track how many Gs have been waiting and for about how long. The sudogs form a linked list of waiters, and are dequeued under the control of the runtime. When waking a G, the caller of sync.Mutex.Unlock is able to claim responsibility for the delay that it specifically caused, and update the remaining waiters' start time so the next Unlocker can claim the right amount too. We'll need similar information for the Ms that hold runtime-internal locks.

I'd like to say a few assumed constraints out loud:

  • On Linux, futex(&l.key, FUTEX_WAKE, 1) can wake any M that was waiting on l.key.
  • We cannot change the dynamics of locks on Linux (such as by putting the runtime in charge of picking an M to wake, as it is with the semaphore-based implementation).
  • We cannot make interactions with an especially contended lock become O(N) with the number of waiters (any more than it is today, such as from the CPU's cache coherency work).
  • We cannot change the size of a runtime.mutex; it needs to remain a uintptr.
  • An M can hold multiple contended locks at the same time.
  • It's OK to grow the M struct by several words.
  • It's OK to approximate the total wait time (as we do with sync.Mutex).
  • The M struct has some alignment (8 bytes?), so we can use a few of the least significant bits of an muintptr as flags (as seen in lock_sema.go).
  • An M can be waiting for only a single lock at a time.
  • There's no appetite for a new sudom struct, to represent waiting Ms as the sudog represents waiting Gs.

For OSes where we use lock_sema.go, an M that needs to block on a runtime.mutex first pushes itself onto the head of a linked list (runtime.mutex.key, linked via runtime.m.nextwaitm), and then sleeps on its own private semaphore. When the current holder of the lock is done, it pops an M off of the list and wakes that specific M. We could add fields to the M for the number of Ms after it on the list, for the time it went to sleep, and for the time that the tail M went to sleep (or a pointer to the tail M). I think that would be sufficient information to match the approximation we do for sync.Mutex via sudog, and would need only a simple set of invariants to update when pushing and popping Ms to/from this list.


For OSes such as Linux where we use lock_futex.go, the kernel-controlled wakeup order means we'll need something more.

  • The contents of runtime.mutex.key will point to the waiting M at the head of the linked list (as in lock_sema.go).
  • The list must be doubly-linked so that an M in the middle (if it's the one the kernel chooses to wake) can remove itself.
  • Any number of Ms may be adding themselves to the list concurrently (in the top half of runtime.lock).
  • But only one M will be reading/editing wait start timestamps and element counts (in the top half of runtime.unlock) OR removing itself from the list (in the bottom half of runtime.lock) at a time.
  • Also, if there are waiting Ms the mutex must always reflect that fact, which means removing lock_futex.go's initial atomic.Xchg speculative grab.

This uses space on the waiting Ms to store the necessary information; they're not using it for anything else while they're blocked.

I'll prepare a CL to implement this plan, but if I've gotten some assumptions wrong or it's otherwise clear that this won't work, it may be easier to spot and to discuss it here rather than in the details of code review. Thanks!

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 23, 2024
@rhysh rhysh self-assigned this Apr 24, 2024
@rhysh rhysh added this to the Go1.23 milestone Apr 24, 2024
prattmic (Member) commented:

At a glance, this plan seems reasonable to me.

One note on assumed constraints:

We cannot change the size of a runtime.mutex, it needs to remain uintptr.

While it is nice to avoid increasing the size of a mutex, I don't think this needs to be a hard requirement. Taking a look at where we use mutex, it looks like timers and channels probably have the most objects and would thus have the biggest impact. But neither of those structures is tiny, so I don't think it would be the end of the world if they get bigger.

@gopherbot

Change https://go.dev/cl/585635 mentions this issue: runtime: prepare for extensions to waiting M list

@gopherbot

Change https://go.dev/cl/585639 mentions this issue: runtime: remove GODEBUG=runtimecontentionstacks

@gopherbot

Change https://go.dev/cl/585638 mentions this issue: runtime: profile mutex contention during unlock

@gopherbot

Change https://go.dev/cl/585636 mentions this issue: runtime: use semaphore structure for futex locks

@gopherbot

Change https://go.dev/cl/585637 mentions this issue: runtime: double-link list of waiting Ms

@gopherbot

Change https://go.dev/cl/586237 mentions this issue: runtime: improve runtime-internal mutex profile tests

@dmitshur dmitshur added the NeedsFix The path to resolution is known, but the work has not been done. label May 21, 2024
gopherbot pushed a commit that referenced this issue May 21, 2024
Move the nextwaitm field into a small struct, in preparation for
additional metadata to track how long Ms need to wait for locks.

For #66999

Change-Id: Ib40e43c15cde22f7e35922641107973d99439ecd
Reviewed-on: https://go-review.googlesource.com/c/go/+/585635
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
gopherbot pushed a commit that referenced this issue May 21, 2024
Prepare the futex-based implementation of lock2 to maintain a list of
waiting Ms. Beyond storing an muintptr in the mutex's key field, we now
must never overwrite that field (even for a moment) without taking its
current value into account.

The semaphore-based implementation of lock2 already has that behavior.
Reuse that structure.

For #66999

Change-Id: I23b6f6bacb276fe33c6aed5c0571161a7e71fe6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/585636
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue May 21, 2024
When an M unlocks a contended mutex, it needs to consult a list of the
Ms that had to wait during its critical section. This allows the M to
attribute the appropriate amount of blame to the unlocking call stack.

Mirroring the implementation for users' sync.Mutex contention (via
sudog), we can (in a future commit) use the time that the head and tail
of the wait list started waiting, and the number of waiters, to estimate
the sum of the Ms' delays.

When an M acquires the mutex, it needs to remove itself from the list of
waiters. Since the futex-based lock implementation leaves the OS in
control of the order of M wakeups, we need to be prepared for quickly
(constant time) removing any M from the list.

First, have each M add itself to a singly-linked wait list when it finds
that its lock call will need to sleep. This case is safe against
live-lock, since any delay to one M adding itself to the list would be
due to another M making durable progress.

Second, have the M that holds the lock (either right before releasing,
or right after acquiring) update metadata on the list of waiting Ms to
double-link the list and maintain a tail pointer and waiter count. That
work is amortized-constant: we'll avoid contended locks becoming
proportionally more contended and undergoing performance collapse.

For #66999

Change-Id: If75cdea915afb59ccec47294e0b52c466aac8736
Reviewed-on: https://go-review.googlesource.com/c/go/+/585637
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
gopherbot pushed a commit that referenced this issue May 21, 2024
Go 1.22 promised to remove the setting in a future release once the
semantics of runtime-internal lock contention matched that of
sync.Mutex. That work is done, remove the setting.

For #66999

Change-Id: I3c4894148385adf2756d8754e44d7317305ad758
Reviewed-on: https://go-review.googlesource.com/c/go/+/585639
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue May 21, 2024
Have the test use the same clock (cputicks) as the profiler, and use the
test's own measurements as hard bounds on the magnitude to expect in the
profile.

Compare the depiction of two users of the same lock: one where the
critical section is fast, one where it is slow. Confirm that the profile
shows the slow critical section as a large source of delay (with #66999
fixed), rather than showing the fast critical section as a large
recipient of delay.

For #64253
For #66999

Change-Id: I784c8beedc39de564dc8cee42060a5d5ce55de39
Reviewed-on: https://go-review.googlesource.com/c/go/+/586237
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@gopherbot

Change https://go.dev/cl/587175 mentions this issue: runtime/pprof: shorten TestGoroutineSwitch

@gopherbot

Change https://go.dev/cl/586796 mentions this issue: runtime: split mutex profile clocks

gopherbot pushed a commit that referenced this issue May 22, 2024
Mutex contention measurements work with two clocks: nanotime for use in
runtime/metrics, and cputicks for the runtime/pprof profile. They're
subject to different sampling rates: the runtime/metrics view is always
enabled, but the profile is adjustable and is turned off by default.
They have different levels of overhead: it can take as little as one
instruction to read cputicks while nanotime calls are more elaborate
(although some platforms implement cputicks as a nanotime call). The use
of the timestamps is also different: the profile's view needs to attach
the delay in some Ms' lock2 calls to another M's unlock2 call stack, but
the metric's view is only an int64.

Treat them differently. Don't bother threading the nanotime clock
through to the unlock2 call, measure and report it directly within
lock2. Sample nanotime at a constant gTrackingPeriod.

Don't consult any clocks unless the mutex is actually contended.

Continue liberal use of cputicks for now.

For #66999

Change-Id: I1c2085ea0e695bfa90c30fadedc99ced9eb1f69e
Reviewed-on: https://go-review.googlesource.com/c/go/+/586796
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
No branches or pull requests; 4 participants.