Right now, lock contention can only be debugged with pprof using the block profile. This is very useful once contention has been identified as the issue, but since the profile has to be turned on manually, it doesn't help in identifying that contention is an issue in the first place. Exporting the cumulative wait time via runtime/metrics would allow continuous monitoring of contention and help in debugging Go programs.
Interesting... Looks like we already measure time spent blocked for starvation purposes anyway. I suppose this wouldn't be too hard to expose. Is cumulative wait time the right metric? It's useful for comparison, but it doesn't really tell you much on its own.
An approximate distribution of latencies might be more useful for this, because you can correlate that with e.g. request latency, but it's slightly more expensive and it requires a bit more post-processing. Its other downside is that it's approximate -- I'm not sure if there's some special use case enabled by having a precise cumulative wait time.