-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Description
The pprof mutex profile was meant to match the Google C++ (now Abseil) mutex profiler, originally designed and implemented by Mike Burrows. When we worked on the Go version, pjw and I missed that C++ counts the time each thread is blocked, even if multiple threads are blocked on a mutex. That is, if 100 threads are blocked on the same mutex for the same 10ms, that still counts as 1000ms of contention in C++. In Go, to date, /debug/pprof/mutex has counted that as only 10ms of contention. If 100 goroutines are blocked on one mutex and only 1 goroutine is blocked on another mutex, we probably do want to see the first mutex as being more contended, so the Abseil approach is the more useful one.
I propose we change the pprof mutex profile in Go 1.22 to scale contention by the number of blocked goroutines, matching Abseil's more useful behavior.
It is worth noting that if you have 100 goroutines contending a mutex and what happens is one of them holds it for 10ms at each acquisition while the others immediately drop it, the round-robin mutex granting combined with charging the Unlock stack that unblocks a goroutine will mean that only 1 of each 100 samples will be attributed to the 10ms goroutine. The others will be attributed to the previous goroutine in the chain. It is important to interpret such profiles as providing useful information about which locks are contended but not always providing clear-cut evidence about the slow Unlock stack. (See my next comment below for a way to avoid this problem while still scaling reported mutex contention by number of blocked goroutines.)