Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: MutexProfile has multiple records with equal stack #67548

Open
korniltsev opened this issue May 21, 2024 · 3 comments
Open

runtime: MutexProfile has multiple records with equal stack #67548

korniltsev opened this issue May 21, 2024 · 3 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@korniltsev
Copy link
Contributor

Go version

go1.23-f4494522dc

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/korniltsev/.cache/go-build'
GOENV='/home/korniltsev/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/korniltsev/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/korniltsev/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/korniltsev/github/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/korniltsev/github/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-f4494522dc Mon May 13 16:42:35 2024 +0000'
GODEBUG=''
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/korniltsev/github/go/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1551311124=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I run this program on gotip https://go.dev/play/p/YD8N3UVACna?v=gotip and I get different results on gotip and 1.22

I think the reason is before the change this line of code

b := stkbucket(which, 0, stk, true)

used stack from callers / gcallers which does skip logical frames and performs inline unwinding and looks like this

[
   0x48f931, //  sync.(*Mutex).Unlock
   0x48f918,  // main.main.func1
   0x46ba81,  // runtime.goexit
]

And after the change the same line uses stack from fpTracebackPCs
which looks like this:

[
   0x4,      // skip value
   0x467405, // sync.event 
   0x44ad65, // runtime.semrelease1
   0x468533, // sync.runtime_Semrelease 
   0x476b5b, // sync.(*Mutex).unlockSlow << this function has a branch, so value here depends on Mutex state
   0x48f931, // main.main.func1
   0x46ba81, // runtime.goexit
]

Because of that, runtime.stkbucket allocates two bucket s for the same logical stack and therefore later we get two BlockProfileRecord with equal stack.

I am not exactly sure this is a bug and how severe it is. It looks like it may lead to memory usage increase.

What did you see happen?

I see multiple BlockProfileRecord with equal stack.

=========================
4905d1  sync.(*Mutex).Unlock
4905b8  main.main.func1
46b241  runtime.goexit
=========================
4905d1  sync.(*Mutex).Unlock
4905b8  main.main.func1
46b241  runtime.goexit

What did you expect to see?

If I run on 1.22 I get only one BlockProfileRecord

=========================
482751  sync.(*Mutex).Unlock
482738  main.main.func1
4622c1  runtime.goexit
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 21, 2024
@korniltsev
Copy link
Contributor Author

cc @nsrip-dd

@nsrip-dd
Copy link
Contributor

Thanks for raising this, I hadn't considered that when sending the CL. Technically these are different stacks, as you observe, it's just that the choice of how much to skip makes them look the same. There is a similar issue with the heap profile, where map internals are elided from the end of call stacks and we can get multiple entries with the same stack in the pprof.

A few things come to mind for how to address this:

  • We could deduplicate/merge the records before returning them.
  • We could handle skipping/inlining at sample time again. This would somewhat defeat the purpose of switching to frame pointer unwinding, where we ideally would keep the recording overhead low and defer the more expensive stack handling logic to reporting time, after we've already aggregated the data.
  • We could skip less, maybe even just up to mutexevent where the sample is recorded, and these would no longer be duplicates. That might warrant a proposal, though.

@korniltsev
Copy link
Contributor Author

I'll just braindump some more ideas:

  • We can change (m *Mutex) unlockSlow implementation to call runtime_Semrelease only once, unconditionally as the last expression before exit. This way return address on the stack will be the same for both starving and non starving mutexes. Not sure about other cases (rwmutex, waitgroup)
  • We can mark (m *Mutex) unlockSlow as noinline and teach saveblockevent to skip until specific function instead of number frames

@mknyszek mknyszek added this to the Backlog milestone May 22, 2024
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

5 participants