runtime: significant performance improvement on single core machines #15395
1. What version of Go are you using (
2. What operating system and processor architecture are you using (
Hosts are running stock Ubuntu 15.10:
3. What did you do?
Looking into performance of running containers on Docker, we realized there's a significant difference when running on a single core machine versus a multi core machine. The results seem independent of the
We cannot attribute that difference to Go with 100% certainty, but we could use your help explaining some of the profiling results we've obtained so far.
Profiling / trace
We instrumented the code to find out where that difference materialized, and what comes out is that it is quite evenly distributed. However, syscalls are consistently taking much more time on the multi core machine (as made obvious with slower syscalls such as
Single core (link to the trace file)
Multi core with
The two hosts are virtual machines running on the same Digital Ocean zone, with the exact same CPU (
Please let us know if there's any more information we can provide, or if you need us to test with different builds of Go. Thanks for any help you can provide!
@icecrime, I suspect this is not a problem with Go. According to the syscall blocking profile in the multicore trace you attached to your original message, you're spending 50ms blocked in the unmount syscall (from github.com/docker/docker/daemon/graphdriver/aufs.Unmount). There are plenty of other syscalls in that trace that you don't spend appreciable time in, suggesting that the performance problem is in unmount (possibly AUFS unmount) itself.
It would be worth trying with Go 1.7, though if it is an issue in the kernel, that won't make a difference. As an experiment, you could also try invoking just that unmount syscall from, say, C, and time that. If it also takes a long time in C, then we know it's not a Go issue. Running this under "perf record" may also prove enlightening, though if the delay comes from blocking not much will show up.
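A minimal sketch of that C experiment, assuming Linux with glibc. The helper name `time_umount_ms` is hypothetical, the mount point has to be supplied by the caller, and the call needs root to succeed. It only measures the wall-clock time of a single `umount2` call, so it says nothing about where inside the kernel the time goes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes clock_gettime under strict -std= modes */
#endif
#include <errno.h>
#include <sys/mount.h>
#include <time.h>

/*
 * Hypothetical helper: time one umount2(2) call on `target`.
 *
 * *elapsed_ms receives the wall-clock duration of the call in
 * milliseconds, whether or not the unmount succeeded.
 * Returns 0 on success, or the errno value reported by umount2.
 */
int time_umount_ms(const char *target, int flags, double *elapsed_ms)
{
    struct timespec start, end;
    int rc, saved_errno;

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = umount2(target, flags);
    saved_errno = errno; /* capture before any other libc call */
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed_ms = (double)(end.tv_sec - start.tv_sec) * 1e3 +
                  (double)(end.tv_nsec - start.tv_nsec) / 1e6;

    return rc == 0 ? 0 : saved_errno;
}
```

Calling `time_umount_ms(path, 0, &ms)` from a small `main` on both the single core and the multi core host, against the same kind of mount, would give a number to compare with the blocking time reported in the Go trace.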
@aclements Thanks for your reply. Unmount here was just one example, as it is indeed one of the slower syscalls. However, we measured several times, and Unmount alone clearly isn't responsible for the difference.
As I recall, we instrumented multiple places in that code path, and no particular segment of it could explain the timing difference between single core and multi core machines. This is why we ended up thinking it might be a runtime issue.