New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openmp functions have incorrect counts #532

Open
alk opened this Issue Aug 23, 2015 · 5 comments

Comments

Projects
None yet
1 participant
@alk
Contributor

alk commented Aug 23, 2015

Originally reported on Google Code with ID 529

openmp functions have incorrect counts when using the cpu profiler. This seems to have
been a long standing issue: http://openmp.org/forum/viewtopic.php?f=3&t=1213 . Nearest
I can tell, this is because they are being improperly identified.

As an example:
$ cat temp.cpp
int main() {
    #pragma omp parallel
    {
        int x = 0;
    }
    return 0;
}
$ g++ -fopenmp -g -lprofiler temp.cpp
$ CPUPROFILE=out.prof ./a.out 
PROFILE: interrupts/evictions/bytes = 27/1/384
$ pprof --text a.out out.prof 
Using local file a.out.
Using local file out.prof.
Removing _L_unlock_16 from all stack traces.
Total: 27 samples
      27 100.0% 100.0%       27 100.0% omp_get_num_procs
       0   0.0% 100.0%       27 100.0% GOMP_taskwait
       0   0.0% 100.0%       27 100.0% __clone
       0   0.0% 100.0%       27 100.0% start_thread

What is the expected output? What do you see instead?
I do not expect omp_get_num_procs to dominate the traces.

What version of the product are you using? On what operating system?
$ md5sum gperftools-2.0.tar.gz 
13f6e8961bc6a26749783137995786b6  gperftools-2.0.tar.gz
$ uname -a 
Linux ghc33.ghc.andrew.cmu.edu 2.6.32-279.22.1.el6.x86_64 #1 SMP Sun Jan 13 09:21:40
EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Please provide any additional information below.
If I look at GOMP_taskwait (the functions that calls omp_get_num_procs), I see the
following output in the assembly that leads me to believe that the functions are not
getting properly identified:

$ pprof --disas=GOMP_taskwait a.out out.prof 
ROUTINE ====================== GOMP_taskwait
     0     27 samples (flat, cumulative) 100.0% of total
... <snip> ...
     .     16        84ef: callq  9ca0 <omp_get_num_procs+0x540>
     .      .        84f4: nopl   0x0(%rax)
     .      .        84f8: mov    %fs:0x10(%rbx),%r13
     .      .        84fd: mov    %r12,%rdi
     .      .        8500: callq  *%rbp
     .      .        8502: lea    0x80(%r13),%rdi
     .     11        8509: callq  9b40 <omp_get_num_procs+0x3e0>
     .      .        850e: mov    %r14,%rdi
     .      .        8511: callq  9ca0 <omp_get_num_procs+0x540>
... <snip> ...

Also, I have to confess that I'm compiling with -static-libstdc++, but I don't have
an easy way to avoid that at the moment and I don't believe it is related.  

Reported by awreece on 2013-05-11 02:38:20

@alk

This comment has been minimized.

Show comment
Hide comment
@alk

alk Aug 23, 2015

Contributor
It looks like I spoke too soon. It appears that the library doesn't have symbols, which
would explain why the cpu profiler was unable to find them.

$ ldd a.out | grep gomp
    libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f674c22f000)
$ nm /usr/lib64/libgomp.so.1
nm: /usr/lib64/libgomp.so.1: no symbols
$ objdump -d /usr/lib64/libgomp.so.1 
... <snip> ...
   8270:       e9 fb 16 00 00          jmpq   9970 <omp_get_num_procs+0x210>
    8275:       80 7f 5c 00             cmpb   $0x0,0x5c(%rdi)
    8279:       48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
    8280:       00 
    8281:       0f 84 f0 fe ff ff       je     8177 <GOMP_taskwait+0x3d7>
    8287:       48 83 c7 60             add    $0x60,%rdi
    828b:       44 89 e8                mov    %r13d,%eax
    828e:       f0 44 0f b1 3f          lock cmpxchg %r15d,(%rdi)
    8293:       0f 84 de fe ff ff       je     8177 <GOMP_taskwait+0x3d7>
    8299:       e8 b2 15 00 00          callq  9850 <omp_get_num_procs+0xf0>
    829e:       e9 d4 fe ff ff          jmpq   8177 <GOMP_taskwait+0x3d7>
... <snip> ...

I'd probably hope that if this were encountered, some sort of 'failure to identify
function name' or the function address would be emitted, but the lack of symbols does
not appear to the fault of the profiler.

Reported by awreece on 2013-05-11 03:11:05

Contributor

alk commented Aug 23, 2015

It looks like I spoke too soon. It appears that the library doesn't have symbols, which
would explain why the cpu profiler was unable to find them.

$ ldd a.out | grep gomp
    libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f674c22f000)
$ nm /usr/lib64/libgomp.so.1
nm: /usr/lib64/libgomp.so.1: no symbols
$ objdump -d /usr/lib64/libgomp.so.1 
... <snip> ...
   8270:       e9 fb 16 00 00          jmpq   9970 <omp_get_num_procs+0x210>
    8275:       80 7f 5c 00             cmpb   $0x0,0x5c(%rdi)
    8279:       48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
    8280:       00 
    8281:       0f 84 f0 fe ff ff       je     8177 <GOMP_taskwait+0x3d7>
    8287:       48 83 c7 60             add    $0x60,%rdi
    828b:       44 89 e8                mov    %r13d,%eax
    828e:       f0 44 0f b1 3f          lock cmpxchg %r15d,(%rdi)
    8293:       0f 84 de fe ff ff       je     8177 <GOMP_taskwait+0x3d7>
    8299:       e8 b2 15 00 00          callq  9850 <omp_get_num_procs+0xf0>
    829e:       e9 d4 fe ff ff          jmpq   8177 <GOMP_taskwait+0x3d7>
... <snip> ...

I'd probably hope that if this were encountered, some sort of 'failure to identify
function name' or the function address would be emitted, but the lack of symbols does
not appear to the fault of the profiler.

Reported by awreece on 2013-05-11 03:11:05

@alk

This comment has been minimized.

Show comment
Hide comment
@alk

alk Aug 23, 2015

Contributor
Your test program appears too short to identify any bottlenecks.

Can you please retry something longer ?

Reported by alkondratenko on 2013-07-06 20:20:41

Contributor

alk commented Aug 23, 2015

Your test program appears too short to identify any bottlenecks.

Can you please retry something longer ?

Reported by alkondratenko on 2013-07-06 20:20:41

@alk

This comment has been minimized.

Show comment
Hide comment
@alk

alk Aug 23, 2015

Contributor
The test program is intentionally short to demonstrate the issue and has no
bottlenecks.

Please see my previous comment: the issue is that the library doesn't have
symbols.

Reported by awreece on 2013-07-06 21:34:16

Contributor

alk commented Aug 23, 2015

The test program is intentionally short to demonstrate the issue and has no
bottlenecks.

Please see my previous comment: the issue is that the library doesn't have
symbols.

Reported by awreece on 2013-07-06 21:34:16

@alk

This comment has been minimized.

Show comment
Hide comment
@alk

alk Aug 23, 2015

Contributor
Not at all. nm is not supposed to work on shared libraries. And your test is too short
to provide any relevant profile. Please consider reading how sampling profiling works.

Running somebody's program at link you provided gives me this:


    2944  97.8%  97.8%     2944  97.8% CALC_VALS._omp_fn.0
      64   2.1% 100.0%       64   2.1% omp_get_num_procs
       1   0.0% 100.0%        1   0.0% INITIALISE

Which clearly shows that majority of time is spent in user program. omp_get_num_procs
indeed seems a bit off. But it appears that there's not much we can do about that in
gperftools.

Reported by alkondratenko on 2013-07-06 23:10:49

Contributor

alk commented Aug 23, 2015

Not at all. nm is not supposed to work on shared libraries. And your test is too short
to provide any relevant profile. Please consider reading how sampling profiling works.

Running somebody's program at link you provided gives me this:


    2944  97.8%  97.8%     2944  97.8% CALC_VALS._omp_fn.0
      64   2.1% 100.0%       64   2.1% omp_get_num_procs
       1   0.0% 100.0%        1   0.0% INITIALISE

Which clearly shows that majority of time is spent in user program. omp_get_num_procs
indeed seems a bit off. But it appears that there's not much we can do about that in
gperftools.

Reported by alkondratenko on 2013-07-06 23:10:49

@alk

This comment has been minimized.

Show comment
Hide comment
@alk

alk Aug 23, 2015

Contributor
omp_get_num_procs isn't a bit off, it *isn't* used by the program. Other functions are
being misattributed to it:

$ pprof --disas=GOMP_taskwait solver out.prof 
ROUTINE ====================== GOMP_taskwait
     1   1330 samples (flat, cumulative) 96.2% of total
... <snip> ...
     .     16        84ef: callq  9ca0 <omp_get_num_procs+0x540>
     .      .        84f4: nopl   0x0(%rax)
     .      .        84f8: mov    %fs:0x10(%rbx),%r13
     .      .        84fd: mov    %r12,%rdi
     .    695        8500: callq  *%rbp
     .      .        8502: lea    0x80(%r13),%rdi
     .    391        8509: callq  9b40 <omp_get_num_procs+0x3e0>
     .      .        850e: mov    %r14,%rdi
     .    156        8511: callq  9ca0 <omp_get_num_procs+0x540>
... <snip> ...

Also, gdb can't get the function name for omp either: https://gist.github.com/awreece/5941661
(it has ?? for the name). I could try to come up with a better example, but I think
the snippet above clearly demonstrates the problem. Sorry about not using the right
flag to nm, nm -D clearly shows some symbols. Its no longer clear to me the problem
is "it doesn't export symbols" (I'll have to update the blog post later).

For a longer explanation: http://codearcana.com/posts/2013/05/10/why-is-omp_get_num_procs-so-slow.html

Reported by awreece on 2013-07-07 00:05:04

Contributor

alk commented Aug 23, 2015

omp_get_num_procs isn't a bit off, it *isn't* used by the program. Other functions are
being misattributed to it:

$ pprof --disas=GOMP_taskwait solver out.prof 
ROUTINE ====================== GOMP_taskwait
     1   1330 samples (flat, cumulative) 96.2% of total
... <snip> ...
     .     16        84ef: callq  9ca0 <omp_get_num_procs+0x540>
     .      .        84f4: nopl   0x0(%rax)
     .      .        84f8: mov    %fs:0x10(%rbx),%r13
     .      .        84fd: mov    %r12,%rdi
     .    695        8500: callq  *%rbp
     .      .        8502: lea    0x80(%r13),%rdi
     .    391        8509: callq  9b40 <omp_get_num_procs+0x3e0>
     .      .        850e: mov    %r14,%rdi
     .    156        8511: callq  9ca0 <omp_get_num_procs+0x540>
... <snip> ...

Also, gdb can't get the function name for omp either: https://gist.github.com/awreece/5941661
(it has ?? for the name). I could try to come up with a better example, but I think
the snippet above clearly demonstrates the problem. Sorry about not using the right
flag to nm, nm -D clearly shows some symbols. Its no longer clear to me the problem
is "it doesn't export symbols" (I'll have to update the blog post later).

For a longer explanation: http://codearcana.com/posts/2013/05/10/why-is-omp_get_num_procs-so-slow.html

Reported by awreece on 2013-07-07 00:05:04

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment