Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: details of Linux SIGPROF delivery may cause very skewed profiles #14434

Closed
alk opened this issue Feb 20, 2016 · 7 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@alk
Copy link

alk commented Feb 20, 2016

I've initially spotted this in gperftools as this affects all users of SIGPROF. The problem is that SIGPROF is delivered to process, which translates to "any thread that isn't blocking SIGROF". Luckily for us, in practice it becomes "thread that is running now". But if there are several running threads something within kernel is making it choose one thread more often than another.

The test program at https://gist.github.com/alk/568c0465f4f208196d8b makes it very easy to reproduce. This program spawns two goroutines that do nothing but burn CPU.

When profiled with perf:

$ perf record ./goprof-test; perf report

I see correct 50/50 division of profiling ticks between two goroutines, since on multicore machine go runtime runs two goroutines on two OS threads which kernel run in parallel on two different cores.

When profiling with runtime/pprof:

$ CPUPROFILE=goprof-test-prof ./goprof-test ; pprof --web ./goprof-test ./goprof-test-prof

I see as much skew as 80/20.

This is exactly same behavior that I've seen with gperftools (and google3's profiler).

For most programs it apparently doesn't matter. But for programs that have distinct pools of threads doing very different work, this may cause real problems. Particularly, I've seen this (with gperftools) to cause very skewed profiles for Couchbase's memcached binary where they have small pool of network worker threads and another pool of IO worker threads.

In gperftools I've implemented workaround which creates per-thread timers that "tick" on corresponding thread's cpu time. But I don't think it's scalable enough to be made default (and another problem but arguably specific for gperftools is that all threads have to call ProfilerRegisterThread again). You can see my implementation at: https://github.com/gperftools/gperftools/blob/master/src/profile-handler.cc (parts that are under HAVE_LINUX_SIGEV_THREAD_ID defined)

I've seen this behavior on FreeBSD VMs too, but don't know about other OSes.

Maybe there is better way to avoid this skew or maybe we should just ask kernel folks to change SIGPROF signal delivery to avoid this skew. In any case this is bug worth tracking.

This is somewhat related, but distinct issue from #13841

@ianlancetaylor ianlancetaylor changed the title Details of Linux SIGPROF delivery may cause very skewed profiles runtime: details of Linux SIGPROF delivery may cause very skewed profiles Feb 21, 2016
@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Feb 21, 2016
@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 11, 2016
@rsc
Copy link
Contributor

rsc commented Oct 27, 2016

When I run this program on OS X I get exactly 50/50, which is nice. When I run it on Linux I do get much more skewed results, as you say.

The runtime is actually written as though setitimer were per-thread. I am not sure why it works as well as it does given that setitimer appears to be actually per-process. In any event if there is a new per-thread timer system call to use on Linux, it seems like that would be easy to slide in. Probably not for Go 1.8.

@aclements, you had figured out some other reason pprof profiles might be very skewed, right? I thought you filed an issue but I can't find it.

@rsc rsc modified the milestones: Go1.9Early, Go1.8 Oct 27, 2016
@rsc
Copy link
Contributor

rsc commented Oct 27, 2016

To answer my question to @aclements, I think I was thinking of #13405 (see comment "This is a result of ARM's poor timer..."). But that was about sleeps avoiding profiling. There are no sleeps in @alk's test program.

@rsc rsc changed the title runtime: details of Linux SIGPROF delivery may cause very skewed profiles runtime/pprof: details of Linux SIGPROF delivery may cause very skewed profiles Oct 27, 2016
@aclements
Copy link
Member

@aclements, you had figured out some other reason pprof profiles might be very skewed, right?

I don't recall anything outside of what's already in #13841.

In any event if there is a new per-thread timer system call to use on Linux, it seems like that would be easy to slide in.

According to the man pages, timer_create with CLOCK_THREAD_CPUTIME_ID has been around since Linux 2.6.12. It's even kind of sort of an optional part of POSIX.

@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.9Early May 3, 2017
@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 29, 2017
@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 May 18, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Jun 1, 2018
@andybons andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019
@felixge
Copy link
Contributor

felixge commented May 18, 2021

Since this issue is pretty old now, I decided to retest this against newer versions of Go. I tested go 1.13 to 1.16 on my MBP running docker for mac. What I found is that:

  • In Go 1.13 the bias seems to be consistently high (95/5 division 🤯 )
  • In Go 1.14 and above the bias seems to have gotten better, jumping between 80/20 to 50/50.

The detailed results can be seen here: https://gist.github.com/felixge/9858a2412b61853343263a19968b54c0

Take these results with a grain of salt given the environment they were produced in, but I think it's fair to say that this is still an issue. If I had to guess why 1.14+ is doing better, I'd say async preemption.

I don't know if my skills will be sufficient, but I might be able to make some time for the next release cycle to take a closer look to see if I'm able to help fix this issue.

@bcmills
Copy link
Member

bcmills commented Nov 19, 2021

Is this a duplicate of #35057?

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 19, 2021
@prattmic
Copy link
Member

The problem this issue describes (SIGPROF delivery to the "wrong" thread skews results) is different from #35057 (SIGPROF dropped in a high CPU usage process, deflating total apparent CPU time), but the fix (using per-thread CPU timers) is the same, so this should be addressed by the fix to #35057.

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 19, 2021
@bcmills bcmills added this to the Go1.18 milestone Nov 19, 2021
@bcmills bcmills removed help wanted early-in-cycle A change that should be done early in the 3 month dev cycle. labels Nov 19, 2021
@bcmills bcmills closed this as completed Nov 19, 2021
@bcmills
Copy link
Member

bcmills commented Nov 19, 2021

Closing as fixed by CL 342054.

@bcmills bcmills reopened this Nov 19, 2021
@bcmills bcmills closed this as completed Nov 19, 2021
@golang golang locked and limited conversation to collaborators Nov 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests