Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: TestCPUProfileInlining failures with "got separate Location entries" #50996

Closed
bcmills opened this issue Feb 3, 2022 · 15 comments
Assignees
Labels
NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@bcmills
Copy link
Member

bcmills commented Feb 3, 2022

greplogs --dashboard -md -l -e 'FAIL: TestCPUProfileInlining .*(?:\n .*)* got separate Location entries' --since=2021-01-01

2022-02-03T16:58:29-f9b761a/windows-amd64-longtest

--- FAIL: TestCPUProfileInlining (5.13s)
    pprof_test.go:524: total 500 CPU profile samples collected:
        366: 0x94c524 (runtime/pprof.cpuHog0:64 runtime/pprof.inlinedCallee:304 runtime/pprof.inlinedCaller:299) 0x94add3 (runtime/pprof.cpuHogger:41) 0x94c189 (runtime/pprof.TestCPUProfileInlining.func1:278) 0x94cda2 (runtime/pprof.testCPUProfile:451) 0x94c2bb (runtime/pprof.TestCPUProfileInlining:277) 0x8dbf01 (testing.tRunner:1440) labels: map[]
        
        116: 0x94c51c (runtime/pprof.cpuHog0:64 runtime/pprof.inlinedCallee:304 runtime/pprof.inlinedCaller:299) 0x94add3 (runtime/pprof.cpuHogger:41) 0x94c189 (runtime/pprof.TestCPUProfileInlining.func1:278) 0x94cda2 (runtime/pprof.testCPUProfile:451) 0x94c2bb (runtime/pprof.TestCPUProfileInlining:277) 0x8dbf01 (testing.tRunner:1440) labels: map[]
        
        1: 0x857169 (runtime.nanotime:18) 0x8864b3 (time.Since:878) 0x94ae12 (runtime/pprof.cpuHogger:40) 0x94c189 (runtime/pprof.TestCPUProfileInlining.func1:278) 0x94cda2 (runtime/pprof.testCPUProfile:451) 0x94c2bb (runtime/pprof.TestCPUProfileInlining:277) 0x8dbf01 (testing.tRunner:1440) labels: map[]
        
        1: 0x94c501 (runtime/pprof.inlinedCaller:299) 0x94add3 (runtime/pprof.cpuHogger:41) 0x94c189 (runtime/pprof.TestCPUProfileInlining.func1:278) 0x94cda2 (runtime/pprof.testCPUProfile:451) 0x94c2bb (runtime/pprof.TestCPUProfileInlining:277) 0x8dbf01 (testing.tRunner:1440) labels: map[]
        
        14: 0x94c520 (runtime/pprof.cpuHog0:64 runtime/pprof.inlinedCallee:304) 0x94c501 (runtime/pprof.inlinedCaller:299) 0x94add3 (runtime/pprof.cpuHogger:41) 0x94c189 (runtime/pprof.TestCPUProfileInlining.func1:278) 0x94cda2 (runtime/pprof.testCPUProfile:451) 0x94c2bb (runtime/pprof.TestCPUProfileInlining:277) 0x8dbf01 (testing.tRunner:1440) labels: map[]
        
        1: 0x84722a (runtime.runqsteal:6108) 0x83fe91 (runtime.stealWork:3080) 0x83f375 (runtime.findrunnable:2788) 0x840918 (runtime.schedule:3376) 0x840e6c (runtime.park_m:3525) 0x865589 (runtime.mcall:425) labels: map[]
        
        1: 0x840b01 (runtime.checkTimers:3445) 0x8407d2 (runtime.schedule:3337) 0x840f64 (runtime.goschedImpl:3540) 0x841254 (runtime.gopreempt_m:3568) 0x84ee39 (runtime.newstack:1068) 0x8656f2 (runtime.morestack:547) labels: map[]
        
    pprof_test.go:595: runtime/pprof.inlinedCallee: 496
    pprof_test.go:595: runtime/pprof.inlinedCaller: 497
    pprof_test.go:293: want inlinedCallee followed by inlinedCaller, got separate Location entries:
        PeriodType: cpu nanoseconds
        Period: 10000000
        Time: 2022-02-03 17:20:19.3752399 +0000 GMT
        Duration: 5.1318594s
        Samples:
        samples/count cpu/nanoseconds
                366 3660000000: 1 2 3 4 5 6 
                116 1160000000: 7 2 3 4 5 6 
                  1   10000000: 8 9 10 3 4 5 6 
                  1   10000000: 11 2 3 4 5 6 
                 14  140000000: 12 11 2 3 4 5 6 
                  1   10000000: 13 14 15 16 17 18 
                  1   10000000: 19 20 21 22 23 24 
        Locations
             1: 0x94c524 M=1 runtime/pprof.cpuHog0 C:/workdir/go/src/runtime/pprof/pprof_test.go:64 s=0
                     runtime/pprof.inlinedCallee C:/workdir/go/src/runtime/pprof/pprof_test.go:304 s=0
                     runtime/pprof.inlinedCaller C:/workdir/go/src/runtime/pprof/pprof_test.go:299 s=0
             2: 0x94add3 M=1 runtime/pprof.cpuHogger C:/workdir/go/src/runtime/pprof/pprof_test.go:41 s=0
             3: 0x94c189 M=1 runtime/pprof.TestCPUProfileInlining.func1 C:/workdir/go/src/runtime/pprof/pprof_test.go:278 s=0
             4: 0x94cda2 M=1 runtime/pprof.testCPUProfile C:/workdir/go/src/runtime/pprof/pprof_test.go:451 s=0
             5: 0x94c2bb M=1 runtime/pprof.TestCPUProfileInlining C:/workdir/go/src/runtime/pprof/pprof_test.go:277 s=0
             6: 0x8dbf01 M=1 testing.tRunner C:/workdir/go/src/testing/testing.go:1440 s=0
             7: 0x94c51c M=1 runtime/pprof.cpuHog0 C:/workdir/go/src/runtime/pprof/pprof_test.go:64 s=0
                     runtime/pprof.inlinedCallee C:/workdir/go/src/runtime/pprof/pprof_test.go:304 s=0
                     runtime/pprof.inlinedCaller C:/workdir/go/src/runtime/pprof/pprof_test.go:299 s=0
             8: 0x857169 M=1 runtime.nanotime C:/workdir/go/src/runtime/time_nofake.go:18 s=0
             9: 0x8864b3 M=1 time.Since C:/workdir/go/src/time/time.go:878 s=0
            10: 0x94ae12 M=1 runtime/pprof.cpuHogger C:/workdir/go/src/runtime/pprof/pprof_test.go:40 s=0
            11: 0x94c501 M=1 runtime/pprof.inlinedCaller C:/workdir/go/src/runtime/pprof/pprof_test.go:299 s=0
            12: 0x94c520 M=1 runtime/pprof.cpuHog0 C:/workdir/go/src/runtime/pprof/pprof_test.go:64 s=0
                     runtime/pprof.inlinedCallee C:/workdir/go/src/runtime/pprof/pprof_test.go:304 s=0
            13: 0x84722a M=1 runtime.runqsteal C:/workdir/go/src/runtime/proc.go:6108 s=0
            14: 0x83fe91 M=1 runtime.stealWork C:/workdir/go/src/runtime/proc.go:3080 s=0
            15: 0x83f375 M=1 runtime.findrunnable C:/workdir/go/src/runtime/proc.go:2788 s=0
            16: 0x840918 M=1 runtime.schedule C:/workdir/go/src/runtime/proc.go:3376 s=0
            17: 0x840e6c M=1 runtime.park_m C:/workdir/go/src/runtime/proc.go:3525 s=0
            18: 0x865589 M=1 runtime.mcall C:/workdir/go/src/runtime/asm_amd64.s:425 s=0
            19: 0x840b01 M=1 runtime.checkTimers C:/workdir/go/src/runtime/proc.go:3445 s=0
            20: 0x8407d2 M=1 runtime.schedule C:/workdir/go/src/runtime/proc.go:3337 s=0
            21: 0x840f64 M=1 runtime.goschedImpl C:/workdir/go/src/runtime/proc.go:3540 s=0
            22: 0x841254 M=1 runtime.gopreempt_m C:/workdir/go/src/runtime/proc.go:3568 s=0
            23: 0x84ee39 M=1 runtime.newstack C:/workdir/go/src/runtime/stack.go:1068 s=0
            24: 0x8656f2 M=1 runtime.morestack C:/workdir/go/src/runtime/asm_amd64.s:547 s=0
        Mappings
        1: 0x0/0x0/0x0   [FN]
FAIL
FAIL	runtime/pprof	59.856s

2021-11-05T17:23:06-37951d8/linux-386-longtest
2021-10-31T02:24:29-0bef30d/linux-amd64-nocgo
2021-10-28T20:39:36-834e36e/windows-amd64-longtest
2021-08-06T20:44:03-63b968f/openbsd-mips64-jsing
2021-05-08T17:03:18-b38b1b2/linux-amd64-staticlockranking

May be related to #42862.

CC @cherrymui @prattmic @rhysh

@bcmills
Copy link
Member Author

bcmills commented Feb 3, 2022

(Note that most of these failures are on first-class ports.)

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 3, 2022
@bcmills bcmills added this to the Go1.19 milestone Feb 3, 2022
@rhysh
Copy link
Contributor

rhysh commented Feb 3, 2022

Looking at the linux-amd64-nocgo failure, https://build.golang.org/log/1d40b595779b94ddca0ff6d018d5378dea23f3f6 that log shows three call stacks, all with root ends of locations 2,3,4,5,6. The kind of leaf end the test is looking for is "1", PC 0x539604, encoded as inlinedCaller calling inlinedCallee calling cpuHog0. But the test fails because some of the samples have leaf ends of location 7 (PC 0x5395e1, inlinedCaller), and of location 7 calling location 8 (PC 0x5395fc, inlinedCallee calling cpuHog0).

        Samples:
        samples/count cpu/nanoseconds
                  7   70000000: 1 2 3 4 5 6 
                  1   10000000: 7 2 3 4 5 6 
                  2   20000000: 8 7 2 3 4 5 6 
        Locations
             1: 0x539604 M=1 runtime/pprof.cpuHog0 /workdir/go/src/runtime/pprof/pprof_test.go:63 s=0
                     runtime/pprof.inlinedCallee /workdir/go/src/runtime/pprof/pprof_test.go:269 s=0
                     runtime/pprof.inlinedCaller /workdir/go/src/runtime/pprof/pprof_test.go:264 s=0
             2: 0x537fb3 M=1 runtime/pprof.cpuHogger /workdir/go/src/runtime/pprof/pprof_test.go:40 s=0
             3: 0x539269 M=1 runtime/pprof.TestCPUProfileInlining.func1 /workdir/go/src/runtime/pprof/pprof_test.go:243 s=0
             4: 0x539f46 M=1 runtime/pprof.testCPUProfile /workdir/go/src/runtime/pprof/pprof_test.go:415 s=0
             5: 0x5393ab M=1 runtime/pprof.TestCPUProfileInlining /workdir/go/src/runtime/pprof/pprof_test.go:242 s=0
             6: 0x4c9c81 M=1 testing.tRunner /workdir/go/src/testing/testing.go:1409 s=0
             7: 0x5395e1 M=1 runtime/pprof.inlinedCaller /workdir/go/src/runtime/pprof/pprof_test.go:264 s=0
             8: 0x5395fc M=1 runtime/pprof.cpuHog0 /workdir/go/src/runtime/pprof/pprof_test.go:63 s=0
                     runtime/pprof.inlinedCallee /workdir/go/src/runtime/pprof/pprof_test.go:269 s=0

When I checked out the commit from that log and rebuilt the test command, I got the following disassembly. (Very very nice that the PCs match what's in the build log. Wow.)

$ go version
go version devel go1.18-0bef30d28a Sun Oct 31 02:24:29 2021 +0000 darwin/arm64
$ GOOS=linux GOARCH=amd64 go test -c -o /tmp/pprof.linux_amd64.0bef30d28a.test runtime/pprof
$ go tool objdump -s '^runtime/pprof.inlinedCaller$' /tmp/pprof.linux_amd64.0bef30d28a.test
TEXT runtime/pprof.inlinedCaller(SB) /usr/local/go/src/runtime/pprof/pprof_test.go
  pprof_test.go:269	0x5395e0		90			NOPL			
  pprof_test.go:264	0x5395e1		31c9			XORL CX, CX		
  pprof_test.go:59	0x5395e3		eb03			JMP 0x5395e8		
  pprof_test.go:59	0x5395e5		48ffc1			INCQ CX			
  pprof_test.go:59	0x5395e8		4881f9a0860100		CMPQ $0x186a0, CX	
  pprof_test.go:59	0x5395ef		7d15			JGE 0x539606		
  pprof_test.go:60	0x5395f1		4885c0			TESTQ AX, AX		
  pprof_test.go:60	0x5395f4		7e06			JLE 0x5395fc		
  pprof_test.go:61	0x5395f6		480fafc0		IMULQ AX, AX		
  pprof_test.go:61	0x5395fa		ebe9			JMP 0x5395e5		
  pprof_test.go:63	0x5395fc		488d5001		LEAQ 0x1(AX), DX	
  pprof_test.go:63	0x539600		480fafc2		IMULQ DX, AX		
  pprof_test.go:63	0x539604		ebdf			JMP 0x5395e5		
  pprof_test.go:265	0x539606		c3			RET			

Here's the part of the test that fails. It looks at the Location table, not the samples themselves.

// Check if inlined function locations are encoded correctly. The inlinedCalee and inlinedCaller should be in one location.
for _, loc := range p.Location {
hasInlinedCallerAfterInlinedCallee, hasInlinedCallee := false, false
for _, line := range loc.Line {
if line.Function.Name == "runtime/pprof.inlinedCallee" {
hasInlinedCallee = true
}
if hasInlinedCallee && line.Function.Name == "runtime/pprof.inlinedCaller" {
hasInlinedCallerAfterInlinedCallee = true
}
}
if hasInlinedCallee != hasInlinedCallerAfterInlinedCallee {
t.Fatalf("want inlinedCallee followed by inlinedCaller, got separate Location entries:\n%v", p)
}
}


  • How does the profile include samples that show a location stack of 8 7 2 3 4 5 6? Location 7 is PC 0x5395e1, and the address right before that is a NOPL instruction, not a CALL instruction.
  • How does the symbolization of location 8, PC 0x5395fc, not include inlinedCaller?

@rhysh
Copy link
Contributor

rhysh commented Feb 5, 2022

runtime.gentraceback expands inlined frames. A comment in runtime/pprof/proto.go refers to "sequence of PCs, including the fake PCs returned by the traceback to represent inlined functions". The functions around runtime/pprof.(*profileBuilder).appendLocsForStack try to re-compress those fake PCs back into the original call stack.

It looks like this gets into trouble when one of the "fake" PCs for an outer frame shows up in an actual profile sample. I don't understand all of the profileBuilder code yet, but it looks like the presence of that real PC in the under-construction location map prevents the code from interpreting that same PC as a "fake" PC when it appears mid-stack in a later sample.


I've reproduced the failure on linux/amd64 with with go version devel go1.18-0bef30d28a Sun Oct 31 02:24:29 2021 +0000. I added this patch to see the stacks that appendLocsForStack processes.

diff --git a/src/runtime/pprof/proto.go b/src/runtime/pprof/proto.go
index 54e7a80183..6d87329e24 100644
--- a/src/runtime/pprof/proto.go
+++ b/src/runtime/pprof/proto.go
@@ -388,6 +388,12 @@ func (b *profileBuilder) appendLocsForStack(locs []uint64, stk []uintptr) (newLo
        // The last frame might be truncated. Recover lost inline frames.
        stk = runtime_expandFinalInlineFrame(stk)
 
+       fmt.Printf("appendLocsForStack")
+       for _, pc := range stk {
+               fmt.Printf(" %#x", pc)
+       }
+       fmt.Printf("\n")
+
        for len(stk) > 0 {
                addr := stk[0]
                if l, ok := b.locs[addr]; ok {

Here are the failures I've seen with the patch:

$ cat /tmp/go-stress-20220204T234002-1660603581
appendLocsForStack 0x539725 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
appendLocsForStack 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
appendLocsForStack 0x53971d 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
--- FAIL: TestCPUProfileInlining (0.20s)
    pprof_test.go:505: total 10 CPU profile samples collected:
        7: 0x539724 (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269 runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
        1: 0x539701 (runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
        2: 0x53971c (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269) 0x539701 (runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
    pprof_test.go:540: runtime/pprof.inlinedCallee: 9
    pprof_test.go:540: runtime/pprof.inlinedCaller: 10
    pprof_test.go:258: want inlinedCallee followed by inlinedCaller, got separate Location entries:
        PeriodType: cpu nanoseconds
        Period: 10000000
        Time: 2022-02-04 23:44:26.390203172 +0000 UTC
        Duration: 200.502158ms
        Samples:
        samples/count cpu/nanoseconds
                  7   70000000: 1 2 3 4 5 6 
                  1   10000000: 7 2 3 4 5 6 
                  2   20000000: 8 7 2 3 4 5 6 
        Locations
             1: 0x539724 M=1 runtime/pprof.cpuHog0 /usr/local/go/src/runtime/pprof/pprof_test.go:63 s=0
                     runtime/pprof.inlinedCallee /usr/local/go/src/runtime/pprof/pprof_test.go:269 s=0
                     runtime/pprof.inlinedCaller /usr/local/go/src/runtime/pprof/pprof_test.go:264 s=0
             2: 0x5380d3 M=1 runtime/pprof.cpuHogger /usr/local/go/src/runtime/pprof/pprof_test.go:40 s=0
             3: 0x539389 M=1 runtime/pprof.TestCPUProfileInlining.func1 /usr/local/go/src/runtime/pprof/pprof_test.go:243 s=0
             4: 0x53a066 M=1 runtime/pprof.testCPUProfile /usr/local/go/src/runtime/pprof/pprof_test.go:415 s=0
             5: 0x5394cb M=1 runtime/pprof.TestCPUProfileInlining /usr/local/go/src/runtime/pprof/pprof_test.go:242 s=0
             6: 0x4c9c81 M=1 testing.tRunner /usr/local/go/src/testing/testing.go:1409 s=0
             7: 0x539701 M=1 runtime/pprof.inlinedCaller /usr/local/go/src/runtime/pprof/pprof_test.go:264 s=0
             8: 0x53971c M=1 runtime/pprof.cpuHog0 /usr/local/go/src/runtime/pprof/pprof_test.go:63 s=0
                     runtime/pprof.inlinedCallee /usr/local/go/src/runtime/pprof/pprof_test.go:269 s=0
        Mappings
        1: 0x400000/0x54c000/0x0 /home/ec2-user/pprof.mod.0bef30d28a.test  [FN]
        2: 0x7ffc335e5000/0x7ffc335e7000/0x0 [vdso]  
        3: 0xffffffffff600000/0xffffffffff601000/0x0 [vsyscall]  
FAIL


ERROR: exit status 1
appendLocsForStack 0x53971d 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
appendLocsForStack 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
appendLocsForStack 0x539725 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
--- FAIL: TestCPUProfileInlining (0.20s)
    pprof_test.go:505: total 10 CPU profile samples collected:
        5: 0x53971c (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269 runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
        1: 0x539701 (runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
        4: 0x539724 (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269) 0x539701 (runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
    pprof_test.go:540: runtime/pprof.inlinedCallee: 9
    pprof_test.go:540: runtime/pprof.inlinedCaller: 10
    pprof_test.go:258: want inlinedCallee followed by inlinedCaller, got separate Location entries:
        PeriodType: cpu nanoseconds
        Period: 10000000
        Time: 2022-02-04 23:58:55.08537043 +0000 UTC
        Duration: 200.519029ms
        Samples:
        samples/count cpu/nanoseconds
                  5   50000000: 1 2 3 4 5 6 
                  1   10000000: 7 2 3 4 5 6 
                  4   40000000: 8 7 2 3 4 5 6 
        Locations
             1: 0x53971c M=1 runtime/pprof.cpuHog0 /usr/local/go/src/runtime/pprof/pprof_test.go:63 s=0
                     runtime/pprof.inlinedCallee /usr/local/go/src/runtime/pprof/pprof_test.go:269 s=0
                     runtime/pprof.inlinedCaller /usr/local/go/src/runtime/pprof/pprof_test.go:264 s=0
             2: 0x5380d3 M=1 runtime/pprof.cpuHogger /usr/local/go/src/runtime/pprof/pprof_test.go:40 s=0
             3: 0x539389 M=1 runtime/pprof.TestCPUProfileInlining.func1 /usr/local/go/src/runtime/pprof/pprof_test.go:243 s=0
             4: 0x53a066 M=1 runtime/pprof.testCPUProfile /usr/local/go/src/runtime/pprof/pprof_test.go:415 s=0
             5: 0x5394cb M=1 runtime/pprof.TestCPUProfileInlining /usr/local/go/src/runtime/pprof/pprof_test.go:242 s=0
             6: 0x4c9c81 M=1 testing.tRunner /usr/local/go/src/testing/testing.go:1409 s=0
             7: 0x539701 M=1 runtime/pprof.inlinedCaller /usr/local/go/src/runtime/pprof/pprof_test.go:264 s=0
             8: 0x539724 M=1 runtime/pprof.cpuHog0 /usr/local/go/src/runtime/pprof/pprof_test.go:63 s=0
                     runtime/pprof.inlinedCallee /usr/local/go/src/runtime/pprof/pprof_test.go:269 s=0
        Mappings
        1: 0x400000/0x54c000/0x0 /home/ec2-user/pprof.mod.0bef30d28a.test  [FN]
        2: 0x7fffae1df000/0x7fffae1e1000/0x0 [vdso]  
        3: 0xffffffffff600000/0xffffffffff601000/0x0 [vsyscall]  
FAIL


ERROR: exit status 1

Here's one of the success cases with the patch:

$ ./pprof.mod.0bef30d28a.test -test.run=TestCPUProfileInlining -test.short -test.cpu=4 -test.v
=== RUN   TestCPUProfileInlining
appendLocsForStack 0x539725 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
appendLocsForStack 0x53971d 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
appendLocsForStack 0x539721 0x539701 0x539702 0x5380d4 0x53938a 0x53a067 0x5394cc 0x4c9c82 0x464da1
    pprof_test.go:505: total 9 CPU profile samples collected:
        6: 0x539724 (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269 runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
        2: 0x53971c (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269 runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
        1: 0x539720 (runtime/pprof.cpuHog0:63 runtime/pprof.inlinedCallee:269 runtime/pprof.inlinedCaller:264) 0x5380d3 (runtime/pprof.cpuHogger:40) 0x539389 (runtime/pprof.TestCPUProfileInlining.func1:243) 0x53a066 (runtime/pprof.testCPUProfile:415) 0x5394cb (runtime/pprof.TestCPUProfileInlining:242) 0x4c9c81 (testing.tRunner:1409)
        
    pprof_test.go:540: runtime/pprof.inlinedCallee: 9
    pprof_test.go:540: runtime/pprof.inlinedCaller: 9
--- PASS: TestCPUProfileInlining (0.20s)
PASS

Note that the failures include PC 0x539702 as a leaf, but the behavior of the successful case suggests that appendLocsForStack believes that gentraceback constructed it as a "fake" PC.

@gopherbot
Copy link

Change https://go.dev/cl/384239 mentions this issue: runtime/pprof: check if PC is reused for inlining

@prattmic
Copy link
Member

prattmic commented Feb 9, 2022

"The fake PCs that the runtime creates to represent inlined calls can also appear in the outer function as real PCs."

This sounds wrong to me. I think (though I'm not certain) that the compiler should insert enough NOPs as necessary to ensure that the tracepc for inlined functions will always be less than the PC of the next instruction in the calling function. @cherrymui does this sound right to you?

If that is correct, then it would seem the bug is either in:

  • The compiler not inserting enough NOPs.
  • The compiler generating incorrect (off-by-one?) inline tree funcdata.
  • The runtime reading the inline tree funcdata incorrectly (off-by-one?).

i.e., it looks like your CL would suppress the issue, but if I'm not misunderstanding it may be fixing it in the wrong place.

@rhysh
Copy link
Contributor

rhysh commented Feb 10, 2022

If all of those mechanisms work, it seems like the result is an outer function that contains a sequence of NOP instructions leading up to the instructions that form the body of the inlined function. What stops those NOP instructions from executing, from taking CPU time to execute, and from showing up as the leaf-most PC in a CPU profile?

(Edit: the example program I wrote to test the presentation of NOPs uses mid-stack inlining, so the inlined body is a CALL instruction, as reported by go tool objdump).

@prattmic
Copy link
Member

Nothing stops them from naturally appearing in a profile sample. When they do they should be recorded as part of the inlined function.

@rhysh
Copy link
Contributor

rhysh commented Feb 10, 2022

When I ran the test with the patch to print the arguments to appendLocsForStack, the inlinedCaller function started at address 0x539700 (in my second post). The failing tests included leaf stacks of 0x539725 0x539701 0x539702 and 0x53971d 0x539701 0x539702, and also of 0x539702 (all called by 0x5380d4). I've rebuilt the test program a few times since, and the machine code remains the same but the entry address has of course changed—so we'll have to recalculate some offsets.

I think at that point we're supposed to subtract one from each address passed in to appendLocsForStack. Doing so gives PCs that point to the start of instructions. (Is that the right way to interpret the addresses?)

Here's objdump from the program I have running now, CL 384239 PS 1.

$ objdump -d ./pprof.fix6.9e0de1fe7b.test | sed -n -e '/inlinedCaller>/,/\(^$\|int3\)/p'
000000000053a8a0 <runtime/pprof.inlinedCaller>:
  53a8a0:	90                   	nop
  53a8a1:	31 c9                	xor    %ecx,%ecx
  53a8a3:	eb 03                	jmp    53a8a8 <runtime/pprof.inlinedCaller+0x8>
  53a8a5:	48 ff c1             	inc    %rcx
  53a8a8:	48 81 f9 a0 86 01 00 	cmp    $0x186a0,%rcx
  53a8af:	7d 15                	jge    53a8c6 <runtime/pprof.inlinedCaller+0x26>
  53a8b1:	48 85 c0             	test   %rax,%rax
  53a8b4:	7e 06                	jle    53a8bc <runtime/pprof.inlinedCaller+0x1c>
  53a8b6:	48 0f af c0          	imul   %rax,%rax
  53a8ba:	eb e9                	jmp    53a8a5 <runtime/pprof.inlinedCaller+0x5>
  53a8bc:	48 8d 50 01          	lea    0x1(%rax),%rdx
  53a8c0:	48 0f af c2          	imul   %rdx,%rax
  53a8c4:	eb df                	jmp    53a8a5 <runtime/pprof.inlinedCaller+0x5>
  53a8c6:	c3                   	retq   
  53a8c7:	cc                   	int3   

And here is how those addresses show up with binutils' addr2line when I ask it to describe inlining:

$ echo 0x53a8a0 | addr2line -i -e ./pprof.fix6.9e0de1fe7b.test
/usr/local/go/src/runtime/pprof/pprof_test.go:304
/usr/local/go/src/runtime/pprof/pprof_test.go:299
$ echo 0x53a8a1 | addr2line -i -e ./pprof.fix6.9e0de1fe7b.test
/usr/local/go/src/runtime/pprof/pprof_test.go:299
$ echo 0x53a8bc | addr2line -i -e ./pprof.fix6.9e0de1fe7b.test
/usr/local/go/src/runtime/pprof/pprof_test.go:64
/usr/local/go/src/runtime/pprof/pprof_test.go:304
/usr/local/go/src/runtime/pprof/pprof_test.go:299
$ echo 0x53a8c4 | addr2line -i -e ./pprof.fix6.9e0de1fe7b.test
/usr/local/go/src/runtime/pprof/pprof_test.go:64
/usr/local/go/src/runtime/pprof/pprof_test.go:304
/usr/local/go/src/runtime/pprof/pprof_test.go:299

# and because I'm not certain when it's right to add/subtract 1
$ echo 0x53a8a2 | addr2line -i -e ./pprof.fix6.9e0de1fe7b.test
/usr/local/go/src/runtime/pprof/pprof_test.go:299
$ echo 0x53a8a3 | addr2line -i -e ./pprof.fix6.9e0de1fe7b.test
/usr/local/go/src/runtime/pprof/pprof_test.go:60
/usr/local/go/src/runtime/pprof/pprof_test.go:304
/usr/local/go/src/runtime/pprof/pprof_test.go:299

So it looks to me like a leaf of 0x539725 0x539701 0x539702 means "+0x1 calls +0x0 calls +0x24", and the level of inlining that binutils' addr2line reports for those offsets increases by one each time. ("line 299" calls "line 304/299" calls "line 64/304/299")

It looks like all of that works well. And appendLocsForStack is able to detect when a sequence of PCs should be collapsed into a single Location in the protobuf-formatted profile.

But when offset +0x1 shows up by itself at the leaf ("Nothing stops them from naturally appearing in a profile sample"), it gets its own Location and then appears in b.locs. It looks to me like the bug is that the presence of the +0x1 offset in b.locs breaks the ability of appendLocsForStack to correctly detect sequences of "+0x1 calls +0x0 calls +0x24" as a single Location. Once a PC ends up in that map, appendLocsForStack will no longer attempt to consider it as a candidate for the inlining deck.

@rhysh
Copy link
Contributor

rhysh commented Feb 14, 2022

That is to say, based on what I know, the behavior and the fix I proposed seem consistent with how I understand these parts. But I don't know enough about the funcdata info to consider whether this is how a bug in those parts would present itself.

It looks like each NOP instruction only gets used to show a single layer of inlining—so if a SIGPROF hitting on one of those should be reported as all of the layers of inlining (rather than only those up to that particular NOP), then yes, there are more pieces missing in order to make that work. Is that what you're describing @prattmic ?

@prattmic
Copy link
Member

That is what I am thinking, but I'll need to find time to think about this more deeply. I've been swamped dealing with other issues for the past few days. :)

@cherrymui
Copy link
Member

If I understand it correctly I think the NOPs have inlining information attached to them up to the level its line number represents, e.g. for

  pprof_test.go:269	0x5395e0		90			NOPL			
  pprof_test.go:264	0x5395e1		31c9			XORL CX, CX		

the NOPL is associated with inlinedCaller, the XORL is associated with inlinedCallee inlined into inlinedCaller.

I'd still need to understand what went wrong...

@cherrymui
Copy link
Member

runtime.gentraceback expands inlined frames. A comment in runtime/pprof/proto.go refers to "sequence of PCs, including the fake PCs returned by the traceback to represent inlined functions". The functions around runtime/pprof.(*profileBuilder).appendLocsForStack try to re-compress those fake PCs back into the original call stack.

I agree that expansion and re-compression is weird. Maybe ideally we could find a way to avoid the expansion then re-compression.

I think I understand the failure mode. The fix in the CL probably makes sense, at least for the short time if we cannot avoid the expansion and re-compression. Thanks.

@cherrymui
Copy link
Member

Maybe another idea is to have the compiler to insert the NOPs at the end of a function so they never get executed.

@bcmills
Copy link
Member Author

bcmills commented Mar 8, 2022

One more on darwin/amd64:
greplogs --dashboard -md -l -e 'FAIL: TestCPUProfileInlining .*(?:\n .*)* got separate Location entries' --since=2022-02-04

2022-03-08T15:18:16-0b76afc/darwin-amd64-race

I don't have a strong opinion about how to prioritize the fix, but if it's going to be more than a few more weeks I'd at least appreciate adding a test-skip in the meantime.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 8, 2022
@gopherbot
Copy link

Change https://go.dev/cl/404995 mentions this issue: runtime/pprof: fix allFrames cache

gopherbot pushed a commit that referenced this issue May 13, 2022
The compiler may choose to inline multiple layers of function call, such
that A calling B calling C may end up with all of the instructions for B
and C written as part of A's function body.

Within that function body, some PCs will represent code from function A.
Some will represent code from function B, and for each of those the
runtime will have an instruction attributable to A that it can report as
its caller. Others will represent code from function C, and for each of
those the runtime will have an instruction attributable to B and an
instruction attributable to A that it can report as callers.

When a profiling signal arrives at an instruction in B (as inlined in A)
that the runtime also uses to describe calls to C, the profileBuilder
ends up with an incorrect cache of allFrames results. That PC should
lead to a location record in the profile that represents the frames
B<-A, but the allFrames cache's view should expand the PC only to the B
frame.

Otherwise, when a profiling signal arrives at an instruction in C (as
inlined in B in A), the PC stack C,B,A can get expanded to the frames
C,B<-A,A as follows: The inlining deck starts empty. The first tryAdd
call proposes PC C and frames C, which the deck accepts. The second
tryAdd call proposes PC B and, due to the incorrect caching, frames B,A.
(A fresh call to allFrames with PC B would return the frame list B.) The
deck accepts that PC and frames. The third tryAdd call proposes PC A and
frames A. The deck rejects those because a call from A to A cannot
possibly have been inlined. This results in a new location record in the
profile representing the frames C<-B<-A (good), as called by A (bad).

The bug is the cached expansion of PC B to frames B<-A. That mapping is
only appropriate for the resulting protobuf-format profile. The cache
needs to reflect the results of a call to allFrames, which expands the
PC B to the single frame B.

For #50996
For #52693
Fixes #52764

Change-Id: I36d080f3c8a05650cdc13ced262189c33b0083b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/404995
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@rsc rsc unassigned rhysh Jun 22, 2022
@prattmic prattmic self-assigned this Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
Status: Done
Development

No branches or pull requests

6 participants