runtime/pprof: TestLabelSystemstack due to sample with no location #51550

Open
bcmills opened this issue Mar 8, 2022 · 8 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@bcmills
Member

bcmills commented Mar 8, 2022

#!watchflakes
post <- pkg == "runtime/pprof" && test == "TestLabelSystemstack"

This one is a doozy. The failure message comes from here:
https://cs.opensource.google/go/go/+/master:src/runtime/pprof/pprof_test.go;l=1535;drc=18c2033ba587ce63fc9f2d6f52b8bb2e395c561f

That seems to imply that the sample was labeled but its stack was empty(!!), which does seem to be the case in the dump of collected profiles:

        1: labels: map[key:[value]]

(attn @prattmic, CC @golang/runtime)

--- FAIL: TestLabelSystemstack (0.32s)
    pprof_test.go:524: total 85 CPU profile samples collected:
        1: 0x866eef (_ZN6__tsanL14MemoryRangeSetEPNS_11ThreadStateEyyyy.isra.39.part.40:0) 0x8e1ec0 (runtime._System:4432) labels: map[]
        
        2: 0x86a6bc (_ZN6__tsan7MetaMap9FreeRangeEPNS_9ProcessorEyy:0) 0x8e1ec0 (runtime._System:4432) labels: map[key:[value]]
        
        1: 0x8d2f04 (runtime.stdcall1:1090) 0x8d23a9 (runtime.semawakeup:871) 0x8a96ad (runtime.notewakeup:161) 0x8dce48 (runtime.startm:2324) 0x8ded1e (runtime.injectglist.func1:3076 runtime.injectglist:3100) 0x8bf5b6 (runtime.wakeScavenger:222) 0x8bf6d6 (runtime.bgscavenge.func1:268) 0x8f6d21 (runtime.runOneTimer:867) 0x8f6ad2 (runtime.runtimer:775) 0x8df2cf (runtime.checkTimers:3286) 0x8de484 (runtime.stealWork:2868) 0x8dda35 (runtime.findrunnable:2599) 0x8defd8 (runtime.schedule:3187) 0x8df52c (runtime.park_m:3336) 0x905f89 (runtime.mcall:425) labels: map[]
        
        13: 0x907ee6 (runtime.procyield:733) 0x8a9475 (runtime.lock2:69) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0x866ef7 (_ZN6__tsanL14MemoryRangeSetEPNS_11ThreadStateEyyyy.isra.39.part.40:0) 0x8e1ec0 (runtime._System:4432) labels: map[]
        
        1: 0x8bf0d0 (runtime.(*gcControllerState).commit:1089) 0x8bf195 (runtime.(*gcControllerState).setGCPercent:1246) 0x8bf204 (runtime.setGCPercent.func1:1256) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        5: 0x8a9402 (runtime.lock2:61) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        27: 0x8d2f84 (runtime.stdcall2:1099) 0x8d204e (runtime.semasleep:819) 0x8a94f7 (runtime.lock2:89) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) labels: map[key:[value]]
        
        4: 0x8a9419 (runtime.lock2:63) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        11: 0x8e1f00 (runtime._ExternalCode:4433) 0x8e1ec0 (runtime._System:4432) labels: map[key:[value]]
        
        10: 0x8d2f04 (runtime.stdcall1:1090) 0x8d23a9 (runtime.semawakeup:871) 0x8a95f5 (runtime.unlock2:117) 0x8bf238 (runtime.unlockWithRank:31 runtime.unlock:97 runtime.setGCPercent.func1:1259) 0x90600d (runtime.systemstack:469) labels: map[key:[value]]
        
        1: 0x8d3004 (runtime.stdcall3:1108) 0x8b5ba4 (runtime.sysUnused:33) 0x8c098a (runtime.(*pageAlloc).scavengeRangeLocked:775) 0x8c07cd (runtime.(*pageAlloc).scavengeOneFast:726) 0x8c0324 (runtime.(*pageAlloc).scavengeOne:637) 0x8bfcfc (runtime.(*pageAlloc).scavenge.func1:454) 0x90600d (runtime.systemstack:469) labels: map[]
        
        3: 0x907ee4 (runtime.procyield:732) 0x8a9475 (runtime.lock2:69) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0x8d2037 (runtime.semasleep:819) 0x8a94f7 (runtime.lock2:89) 0x8bf1f1 (runtime.lockWithRank:22 runtime.lock:36 runtime.setGCPercent.func1:1255) 0x90600d (runtime.systemstack:469) 0x9020a5 (runtime/debug.setGCPercent:1254) 0xa8de99 (runtime/debug.SetGCPercent:92 runtime/pprof.labelHog:1552) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0xa8de9a (runtime/pprof.labelHog:1549) 0xa8e152 (runtime/pprof.parallelLabelHog.func1:1565) labels: map[key:[value]]
        
        1: 0x8d2f04 (runtime.stdcall1:1090) 0x8d23a9 (runtime.semawakeup:871) 0x8a95f5 (runtime.unlock2:117) 0x8c07e7 (runtime.unlockWithRank:31 runtime.unlock:97 runtime.(*pageAlloc).scavengeOneFast:727) 0x8c0324 (runtime.(*pageAlloc).scavengeOne:637) 0x8bfcfc (runtime.(*pageAlloc).scavenge.func1:454) 0x90600d (runtime.systemstack:469) labels: map[]
        
        1: labels: map[key:[value]]
        
        1: 0x89a3a8 (.text$_ZN6__tsan14DenseSlabAllocINS_10ClockBlockELy65536ELy1024EE6RefillEPNS_19DenseSlabAllocCacheE:0) 0x8e1ec0 (runtime._System:4432) labels: map[]
        
    pprof_test.go:595: runtime.systemstack;key=value: 64
    pprof_test.go:1535: Sample labeled got true want false: 
FAIL
FAIL	runtime/pprof	19.664s
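
A minimal sketch of how entries like the `1: labels: map[key:[value]]` line could be picked out of a dump of this shape. It assumes each entry has the form `count: frames... labels: map[...]` as printed above; `dumpEntry`, `parseDumpLine`, and `labeledWithoutStack` are hypothetical names for illustration and are not part of runtime/pprof.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// dumpEntry is a hypothetical parsed form of one entry in the sample dump.
type dumpEntry struct {
	Count  int    // number of samples sharing this stack
	Frames string // text between the count and "labels:", empty if no frames
	Labels string // text after "labels: ", e.g. "map[key:[value]]"
}

// parseDumpLine splits one dump line into count, frames, and labels.
// It reports false for lines that are not sample entries.
func parseDumpLine(line string) (dumpEntry, bool) {
	countText, rest, found := strings.Cut(strings.TrimSpace(line), ": ")
	if !found {
		return dumpEntry{}, false
	}
	count, err := strconv.Atoi(countText)
	if err != nil {
		return dumpEntry{}, false
	}
	idx := strings.LastIndex(rest, "labels: ")
	if idx < 0 {
		return dumpEntry{}, false
	}
	return dumpEntry{
		Count:  count,
		Frames: strings.TrimSpace(rest[:idx]),
		Labels: rest[idx+len("labels: "):],
	}, true
}

// labeledWithoutStack reports whether the entry carries labels but no frames,
// which is the combination that trips the test.
func (e dumpEntry) labeledWithoutStack() bool {
	return e.Frames == "" && e.Labels != "map[]"
}

func main() {
	lines := []string{
		"    pprof_test.go:524: total 85 CPU profile samples collected:",
		"        11: 0x8e1f00 (runtime._ExternalCode:4433) 0x8e1ec0 (runtime._System:4432) labels: map[key:[value]]",
		"        1: labels: map[key:[value]]",
	}
	for _, line := range lines {
		if e, ok := parseDumpLine(line); ok && e.labeledWithoutStack() {
			fmt.Printf("labeled sample with empty stack: count=%d labels=%s\n", e.Count, e.Labels)
		}
	}
}
```

Only the third line is reported here: the first is not a sample entry and the second has frames.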

greplogs --dashboard -md -l -e '(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$'

2022-03-05T08:36:13-55a60ca/windows-amd64-race
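
The trailing `\s*$` in the greplogs pattern is what restricts matches to the empty-stack variant of the failure: nothing but whitespace may follow `want false:` on that line. A minimal sketch, assuming the pattern is interpreted with Go's `regexp` syntax; the second log, with a frame after the colon, is a hypothetical message made up for contrast.

```go
package main

import (
	"fmt"
	"regexp"
)

// failurePattern is the expression passed to greplogs -e in this issue.
// (?s) lets . span newlines; (?m) lets $ match at the end of any line.
var failurePattern = regexp.MustCompile(
	`(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$`)

// matchesEmptyStackFailure reports whether a build log contains the failure
// in which the offending sample printed no stack after the colon.
func matchesEmptyStackFailure(log string) bool {
	return failurePattern.MatchString(log)
}

func main() {
	// Shaped like the log in this issue: nothing follows "want false:".
	emptyStack := "--- FAIL: TestLabelSystemstack (0.32s)\n" +
		"    pprof_test.go:1535: Sample labeled got true want false: \n" +
		"FAIL\n"

	// Hypothetical variant in which a stack is printed after the colon.
	withStack := "--- FAIL: TestLabelSystemstack (0.32s)\n" +
		"    pprof_test.go:1535: Sample labeled got true want false: 0x1234 (hypothetical.frame:1)\n" +
		"FAIL\n"

	fmt.Println(matchesEmptyStackFailure(emptyStack))
	fmt.Println(matchesEmptyStackFailure(withStack))
}
```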

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 8, 2022
@bcmills bcmills added this to the Go1.19 milestone Mar 8, 2022
@prattmic
Member

prattmic commented Mar 8, 2022

This is indeed interesting. I suspect that a sample with no location is not a new issue; this test is just one of the few that care.

Unfortunately, from this information, we can't tell if this is an issue with the runtime internals failing to collect even a single stack frame, or with proto encoding failing to emit locations from frames.

For this test, the best workaround is to skip samples with no locations. But I would like a reproducer with which to investigate the underlying issue, and notably I'd like to know whether this is limited to Windows. So for now I'd like to leave the test enabled to gather more data. I am also running a stricter version of this test (fail on any sample with no location) on a windows-amd64-race gomote overnight to see if I get anything.
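
A minimal sketch of the two behaviors described above: skipping samples with no locations (the workaround) versus failing on any such sample (the stricter variant). The `sample` type and both helper names are hypothetical stand-ins, not the types or code used by the actual test in pprof_test.go.

```go
package main

import (
	"errors"
	"fmt"
)

// sample is a hypothetical stand-in for a parsed CPU profile sample.
type sample struct {
	Location []string            // function names, leaf first; empty if no frames were recorded
	Label    map[string][]string // pprof labels attached to the sample
}

// withLocations implements the workaround: it drops samples that have no
// locations and reports how many were skipped.
func withLocations(samples []sample) (kept []sample, skipped int) {
	for _, s := range samples {
		if len(s.Location) == 0 {
			skipped++
			continue
		}
		kept = append(kept, s)
	}
	return kept, skipped
}

var errNoLocation = errors.New("sample with no location")

// strictCheck implements the stricter variant: it fails on the first sample
// that has no locations, whether or not it is labeled.
func strictCheck(samples []sample) error {
	for i, s := range samples {
		if len(s.Location) == 0 {
			return fmt.Errorf("sample %d (labels %v): %w", i, s.Label, errNoLocation)
		}
	}
	return nil
}

func main() {
	samples := []sample{
		{Location: []string{"runtime.lock2", "runtime.systemstack"}, Label: map[string][]string{"key": {"value"}}},
		{Label: map[string][]string{"key": {"value"}}}, // labeled, but no frames
	}

	kept, skipped := withLocations(samples)
	fmt.Printf("kept %d, skipped %d\n", len(kept), skipped)

	if err := strictCheck(samples); err != nil {
		fmt.Println("strict check failed:", err)
	}
}
```

The skip variant hides the empty-stack sample from the label assertions, while the strict variant surfaces it immediately, which is the more useful behavior when hunting for a reproducer.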

@prattmic prattmic changed the title runtime/pprof: TestLabelSystemstack failure with " runtime/pprof: TestLabelSystemstack due to sample with no location Mar 8, 2022
@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. OS-Windows and removed release-blocker labels Mar 8, 2022
@gopherbot

gopherbot commented Apr 8, 2022

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@bcmills
Member Author

bcmills commented Apr 11, 2022

No hits since then, but no reason to suspect that it is fixed either.

greplogs --dashboard -md -l -e '(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$' --since=2022-03-08

@bcmills bcmills reopened this Apr 11, 2022
@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Apr 11, 2022
@cuonglm
Member

cuonglm commented Sep 2, 2022

Just hit this once today in https://storage.googleapis.com/go-build-log/d8188e3c/linux-amd64-nounified_2330af42.log when testing CL https://go-review.googlesource.com/c/go/+/426156. Maybe the CL is the culprit, but I'm not sure, so I'm posting it here and re-running the trybot to see if the problem disappears.

@cuonglm
Member

cuonglm commented Sep 2, 2022

Just hit this once today in https://storage.googleapis.com/go-build-log/d8188e3c/linux-amd64-nounified_2330af42.log when testing CL https://go-review.googlesource.com/c/go/+/426156. Maybe the CL is the culprit, but I'm not sure, so I'm posting it here and re-running the trybot to see if the problem disappears.

The trybot re-run succeeded, so it sounds like the test is still flaky somehow.

@bcmills
Member Author

bcmills commented Sep 6, 2022

greplogs -l -e '(?ms)FAIL: TestLabelSystemstack.*Sample labeled got true want false:\s*$' --since=2022-03-08
2022-09-06T15:49:33-e3885c4/linux-amd64-longtest
2022-04-29T02:01:27-e7c56fe/windows-amd64-longtest

@bcmills bcmills reopened this Sep 6, 2022
@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 6, 2022
@bcmills bcmills modified the milestones: Go1.19, Go1.20 Sep 6, 2022
@prattmic
Member

prattmic commented Sep 28, 2022

cc @felixge

@felixge
Contributor

felixge commented Sep 28, 2022

@prattmic thanks for the ping, and sorry for this test causing issues again :(. Your last comment indicated you were waiting for more data, but otherwise had an idea for a workaround/fix. If you'd like my help with submitting a patch for that I can take a look. Otherwise I'm not sure if I'll be able to shed further light on the root cause of the empty stack. But I could try to take a closer look at that as well.
