runtime/pprof: details of Linux SIGPROF delivery may cause very skewed profiles #14434

Open
alk opened this Issue Feb 20, 2016 · 3 comments

alk commented Feb 20, 2016

I initially spotted this in gperftools, since it affects all users of SIGPROF. The problem is that SIGPROF is delivered to the process, which translates to "any thread that isn't blocking SIGPROF". Luckily for us, in practice that becomes "a thread that is running right now". But if several threads are running, something within the kernel makes it choose one thread more often than another.

The test program at https://gist.github.com/alk/568c0465f4f208196d8b makes it very easy to reproduce. This program spawns two goroutines that do nothing but burn CPU.

When profiled with perf:

$ perf record ./goprof-test; perf report

I see the correct 50/50 division of profiling ticks between the two goroutines: on a multicore machine the Go runtime runs the two goroutines on two OS threads, which the kernel runs in parallel on two different cores.

When profiling with runtime/pprof:

$ CPUPROFILE=goprof-test-prof ./goprof-test ; pprof --web ./goprof-test ./goprof-test-prof

I see skew as high as 80/20.

This is exactly the same behavior that I've seen with gperftools (and google3's profiler).

For most programs it apparently doesn't matter. But for programs that have distinct pools of threads doing very different work, this may cause real problems. In particular, I've seen this (with gperftools) cause very skewed profiles for Couchbase's memcached binary, which has a small pool of network worker threads and another pool of IO worker threads.

In gperftools I've implemented a workaround that creates per-thread timers that "tick" on the corresponding thread's CPU time. But I don't think it's scalable enough to be made the default (another problem, arguably specific to gperftools, is that all threads have to call ProfilerRegisterThread again). You can see my implementation at: https://github.com/gperftools/gperftools/blob/master/src/profile-handler.cc (the parts guarded by HAVE_LINUX_SIGEV_THREAD_ID)

I've seen this behavior on FreeBSD VMs too, but I don't know about other OSes.

Maybe there is a better way to avoid this skew, or maybe we should just ask the kernel folks to change SIGPROF delivery to avoid it. In any case, this is a bug worth tracking.

This is somewhat related to, but distinct from, #13841.

ianlancetaylor changed the title from Details of Linux SIGPROF delivery may cause very skewed profiles to runtime: details of Linux SIGPROF delivery may cause very skewed profiles Feb 21, 2016

ianlancetaylor added this to the Go1.7 milestone Feb 21, 2016

rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

quentinmit added the NeedsFix label Oct 11, 2016

rsc commented Oct 27, 2016

When I run this program on OS X I get exactly 50/50, which is nice. When I run it on Linux I do get much more skewed results, as you say.

The runtime is actually written as though setitimer were per-thread. I am not sure why it works as well as it does given that setitimer appears to be actually per-process. In any event if there is a new per-thread timer system call to use on Linux, it seems like that would be easy to slide in. Probably not for Go 1.8.

@aclements, you had figured out some other reason pprof profiles might be very skewed, right? I thought you filed an issue but I can't find it.

rsc modified the milestones: Go1.9Early, Go1.8 Oct 27, 2016

rsc commented Oct 27, 2016

To answer my question to @aclements, I think I was thinking of #13405 (see comment "This is a result of ARM's poor timer..."). But that was about sleeps avoiding profiling. There are no sleeps in @alk's test program.

rsc changed the title from runtime: details of Linux SIGPROF delivery may cause very skewed profiles to runtime/pprof: details of Linux SIGPROF delivery may cause very skewed profiles Oct 27, 2016

aclements commented Oct 27, 2016

> @aclements, you had figured out some other reason pprof profiles might be very skewed, right?

I don't recall anything outside of what's already in #13841.

> In any event if there is a new per-thread timer system call to use on Linux, it seems like that would be easy to slide in.

According to the man pages, timer_create with CLOCK_THREAD_CPUTIME_ID has been around since Linux 2.6.12. It's even kind of sort of an optional part of POSIX.