runtime: high contention in mheap_.lock causing low CPU utilization on allocation-intensive workload #23182

Open
bryanpkc opened this Issue Dec 19, 2017 · 2 comments


bryanpkc commented Dec 19, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go 1.8.0

Does this issue reproduce with the latest release?

It is reproducible with 1.9.2. We haven't tested with tip.

What operating system and processor architecture are you using (go env)?

linux amd64

What did you do?

While tuning our application, we observed that on a system with 38 cores, we could not get more than 3000% CPU utilization. Investigation with go tool trace showed that all processors were busy executing goroutines, with no obvious idle time. Since the application is designed to be parallel, we suspected that the lower-than-expected CPU utilization was due to unforeseen contention and blocking. Block profiling did not show any significant user-level blocking, so we turned our attention to runtime-level contention.

We instrumented runtime.lock to count the number of times threads had to call futexsleep, and to record which locks they were waiting for (a rough sketch of the instrumentation is shown below). The hottest locks, and the number of times each was contended during a 1-minute run, were:

6174 mheap_.lock
1933 stackpoolmu
 296 timers.lock
 101 gcBitsArenas.lock
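
The instrumentation was roughly of the shape sketched below. This is an illustrative fragment of a change to runtime.lock in runtime/lock_futex.go, not the exact patch: the table size, the names lockWaits and recordLockWait, and the reporting mechanism are made up for the sketch, and the snippet uses runtime-internal types (mutex) and the runtime/internal/atomic package, so it is only meaningful as part of the runtime itself.

```go
// Illustrative sketch only: count, per lock address, how often a thread had
// to fall back to futexsleep while acquiring a runtime mutex. A real patch
// also needs a way to read the table back out (a debugger, a print at exit,
// or a small runtime API added for the experiment).
var lockWaits [512]struct {
	addr uintptr // address of the runtime mutex
	n    uint64  // number of futexsleep calls while waiting on it
}

// recordLockWait is called from runtime.lock just before it calls futexsleep.
func recordLockWait(l *mutex) {
	a := uintptr(unsafe.Pointer(l))
	for i := range lockWaits {
		e := &lockWaits[i]
		if atomic.Loaduintptr(&e.addr) == 0 {
			// Try to claim an empty slot; losing the race is fine,
			// the re-check below handles it.
			atomic.Casuintptr(&e.addr, 0, a)
		}
		if atomic.Loaduintptr(&e.addr) == a {
			atomic.Xadd64(&e.n, 1)
			return
		}
	}
}
```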

Sampling the call sites of runtime.lock revealed that mheap_.lock is contended most heavily in runtime.(*mheap).alloc_m, which is mostly called from runtime.(*mcentral).grow. By hacking grow to increase the number of pages reserved each time an mcentral needs more space for a size class, we were able to reduce contention on mheap_.lock significantly, and the average request latency of our application actually improved.
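
Roughly, the change was of the following shape (a simplified sketch based on mcentral.grow as it appears in the Go 1.9 sources, not the exact patch; the growFactor constant and its value are illustrative, and details of the surrounding function may differ between releases):

```go
// Hypothetical multiplier for the experiment; the real value was chosen by
// trial and error against our workload.
const growFactor = 4

// grow reserves several spans' worth of pages at once, so that mheap_.lock
// is taken less often when this size class needs more memory.
func (c *mcentral) grow() *mspan {
	npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * growFactor
	size := uintptr(class_to_size[c.spanclass.sizeclass()])
	n := (npages << _PageShift) / size

	s := mheap_.alloc(npages, c.spanclass, false, true)
	if s == nil {
		return nil
	}
	s.limit = s.base() + size*n
	heapBitsForSpan(s.base()).initSpan(s)
	return s
}
```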

bradfitz changed the title from "High contention in mheap_.lock causing low CPU utilization on allocation-intensive workload" to "runtime: high contention in mheap_.lock causing low CPU utilization on allocation-intensive workload" Dec 19, 2017

bradfitz modified the milestones: Unreleased, Unplanned Dec 19, 2017


bradfitz commented Dec 19, 2017

@aclements


aclements commented Dec 22, 2017

Thanks for the analysis. Could you describe your allocation pattern a bit? Usually many spans stay in the mcentrals, so they don't have to grow that much outside of initialization and phase changes. But it sounds like the allocations in your application have very closely tied lifetimes, so entire spans are often freed together and returned to the heap.

> By hacking grow to increase the number of pages reserved each time an mcentral needs more space for a size class, we were able to reduce contention on mheap_.lock significantly, and the average request latency of our application actually improved.

Interesting. With your modification, is mcentral.grow just allocating a span that's some multiple of the usual class_to_allocnpages? Did you also modify mcentral.cacheSpan's call to deductSweepCredit accordingly?
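
(For reference, the call in question, roughly as it appears near the top of mcentral.cacheSpan in the Go 1.9 sources, charges sweep credit for one span of the usual size for the class. If grow reserves a multiple of the usual pages, the charge would presumably need to scale by the same factor; the snippet below is a sketch of that idea, not verified code.)

```go
// Approximate shape of the existing code at the top of mcentral.cacheSpan:
// sweep credit is deducted for one span of the usual size for this class.
spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
deductSweepCredit(spanBytes, 0)

// If grow now reserves growFactor times as many pages, the charge would
// presumably need to grow with it, e.g. spanBytes *= growFactor.
```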

/cc @RLH
