runtime: pprof tests failing with "missing cpuHog in pprof output" #26369
Comments
Looks like PC values that are in C code are somehow being misattributed to Go code. Perhaps the C symbols are simply missing? But I don't understand why it would matter whether this is running in a VM or not. Is something about the VM environment causing the C symbol table to be stripped?
I got the Fedora version wrong. I thought the VM was running F27, same as my host, but it is actually rawhide/F29. Sorry for the confusion. It seems that this started happening with the latest binutils in F29, that is 2.30.90 (AFAIK a pre-release of 2.31). Still digging deeper.
Poking at the runtime/testdata/testprogcgo binary with objdump, I haven't noticed any obvious issue with it. The only difference that stands out to me is that with the latest binutils there are more segments; otherwise I don't see any notable difference (the C symbols are there). Could it be an issue in the (reverse) function name lookup?
With 2.31:
@ianlancetaylor This already exceeds my current knowledge (of pprof). Do you have any pointers on which direction to look in?
Having multiple Is it still true that this problem only arises when using a VM?
For the record, I tried using GNU binutils tip on my Debian system, and everything passed.
No, it happens every time I use the latest binutils (2.31/2.30.90 in Fedora). I will try to bisect which binutils version it started with.
Confirmed once more that this started with the pre-release 2.31 binutils.
I'm sorry, I have no idea at all. I guess the next step would be to try to find out whether the problem is in the profile created by the Go program or whether it's in the symbolization done by pprof. I would tentatively guess the latter because it seems more likely to be affected by a binutils change, but I don't actually understand how that could be.
To add another data point, I'm seeing this when running
Binutils version is
Just a note that I still can't recreate this. It would help a great deal if someone who can recreate the problem can find some pointer to a possible problem. If
That Thanks.
Seems that 100% of the exec time is assigned to main.init.16
Also only 20ms of samples? cpuHog is not there at all.
Thanks. I also see 20ms of samples. The problem is that pprof is reporting Look at the output of
Thanks. My only guess is that somehow pprof is unable to read the C symbol table, although that doesn't make any sense. Could you attach the testprogcgo and /tmp/profNNNN files here?
Binary and prof archive here: issue-26369.zip
Not only C symbol table. See the location list in the 'raw' command output. All symbol lookups are done incorrectly.

    (pprof) raw
    PeriodType: cpu nanoseconds
    Period: 10000000
    Time: 2018-08-02 04:31:31.975312051 -0400 EDT
    Duration: 201.
    Samples:
    samples/count cpu/nanoseconds
              2   20000000: 1 2 3 4 5 6
    Locations
         1: 0x4c50d1 M=1 main.init.16 /home/alberto/go/src/runtime/testdata/testprogcgo/threadpprof.go:81 s=0
         2: 0x40612d M=1 runtime.cgocall /home/alberto/go/src/runtime/cgocall.go:128 s=0
         3: 0x4bdcd0 M=1 os/exec.(*Cmd).Start /home/alberto/go/src/os/exec/exec.go:383 s=0
         4: 0x4c15fd M=1 _cgoexp_0c3b74faa2ea_GoCheckM :-1 s=0
         5: 0x4bd928 M=1 os/exec.(*Cmd).Run /home/alberto/go/src/os/exec/exec.go:309 s=0
         6: 0x42dc96 M=1 runtime.panicindex /home/alberto/go/src/runtime/panic.go:44 s=0
    Mappings
    1: 0x403000/0x4c7000/0x3000 /tmp/testprogcgo 3218b6a46c85f75188b2c89579e4cff936288bfa [FN][FL][LN][IN]

pprof's symbol lookup is done by Line 242 in 07bcfe5

Mappings section of the above output indicates non-zero offset. (Start:0x403000, Limit:0x4c7000, Offset:0x3000). The symbol lookup code tries to adjust the requested address with the offset Line 250 in 07bcfe5

But it looks like the addresses in the profile are already adjusted, so we ended up with garbage.

    $ addr2line -e /tmp/testprogcgo -f 0x4c50d1
    cpuHog
    /home/alberto/go/src/runtime/testdata/testprogcgo/pprof.go:19

    # 0x4c50d1 - 0x3000 = 0x4c20d1
    $ addr2line -e /tmp/testprogcgo -f 0x4c20d1
    main.init.16
    /home/alberto/go/src/runtime/testdata/testprogcgo/threadpprof.go:81

@aalexand @rauls5382
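The arithmetic in the comment above can be replayed in a few lines of Go. This is only an illustrative sketch, not pprof's actual lookup code: the `mapping` type and the `subtractOffset` helper are hypothetical names, and the numbers are copied from the raw profile output and the addr2line session.

```go
package main

import "fmt"

// mapping mirrors the three numbers pprof prints for a mapping
// (start/limit/offset). The type and its field names are hypothetical
// and exist only for this illustration.
type mapping struct {
	Start, Limit, Offset uint64
}

// subtractOffset replays the adjustment described in the comment above:
// the mapping's file offset is subtracted from an address that, per the
// analysis, was already adjusted.
func subtractOffset(m mapping, addr uint64) uint64 {
	return addr - m.Offset
}

func main() {
	// Values taken from the "Mappings" and "Locations" sections above.
	m := mapping{Start: 0x403000, Limit: 0x4c7000, Offset: 0x3000}
	addr := uint64(0x4c50d1) // location 1 in the raw profile

	fmt.Printf("profile address:  %#x\n", addr)
	fmt.Printf("after adjustment: %#x\n", subtractOffset(m, addr))
}
```

The second printed address is the one that addr2line resolves to main.init.16, while the unadjusted address resolves to cpuHog, which is consistent with the missing cpuHog in the test output.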
@hyangah Thanks for the analysis. It is indeed due to the additional
Change https://golang.org/cl/127895 mentions this issue:
@ALTree Thanks for sending the files.
@ianlancetaylor It's probably related to
, see https://fossies.org/linux/binutils/ld/NEWS. From this, sounds like the motivation for that change is "to avoid mixing code pages with data to improve cache performance as well as security".
Reportedly on some new Fedora systems the linker is producing extra load segments, basically making the dynamic section non-executable. We were assuming that the first load segment could be used to determine the program's load offset, but that is no longer true. Use the first executable load segment instead.

Fixes golang#26369

Change-Id: I5ee31ddeef2e8caeed3112edc5149065a6448456
Reviewed-on: https://go-review.googlesource.com/127895
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
What version of Go are you using (go version)?
master branch
devel +8a330454dc Fri Jul 13 03:53:00 2018 +0000
Does this issue reproduce with the latest release?
Not observed with go1.10, but observed with 1.11beta1
What operating system and processor architecture are you using (go env)?
amd64/linux
What did you do?
./all.bash build of Go
What did you expect to see?
All tests passing
in KVM VM (Fedora; host laptop/Fedora; also observed in the Fedora build system, x86_64 and armv7 VMs), although it seems that all tests pass fine when run on a bare-metal machine (laptop/Fedora). I'm still investigating what might be different and will do the bisect, etc.
Hm... this looks like a possible dup of #18856