-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Description
What version of Go are you using (go version)?
$ go version go version go1.15.7 darwin/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/Users/felix.geisendoerfer/Library/Caches/go-build" GOENV="/Users/felix.geisendoerfer/Library/Application Support/go/env" GOEXE="" GOFLAGS="-count=1" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOINSECURE="" GOMODCACHE="/Users/felix.geisendoerfer/go/pkg/mod" GONOPROXY="" GONOSUMDB="" GOOS="darwin" GOPATH="/Users/felix.geisendoerfer/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/local/Cellar/go/1.15.7_1/libexec" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/local/Cellar/go/1.15.7_1/libexec/pkg/tool/darwin_amd64" GCCGO="gccgo" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="/Users/felix.geisendoerfer/go/src/github.com/felixge/go-profiler-notes/examples/block-bias/go.mod" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mg/xwzs985x35789zk2l14nk8k80000gn/T/go-build460068460=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
I created a program with 2 goroutines. One is blocked on a slow time.NewTicker() loop, the other on one that is 10x faster. Then I observed the program using the block profiler with the duration of the slow ticker as the rate.
Code: https://gist.github.com/felixge/6f9701dd26707dbe7d351805860a4608
What did you expect to see?
Both goroutines are blocked 100% of their time, so I'd expect both of them to show up equally in the block profile like shown below:
What did you see instead?
The block profiler sampling seems to be biased, preferring infrequent long block events over frequent short events, even when they contribute to the same amount of overall blocking. This can be seen in the block profile result I actually got:
I've done a lot of additional research on this which can be found here: https://github.com/felixge/go-profiler-notes/blob/main/block.md#sampling-bias
Proposal
I propose to fix the profiler in a way that removes the sampling bias without changing the API or increasing overhead. Concretely this might be done with a patch like this:
felixge/go@master...debias-blockprofile-rate
I've described this in more detail along with a simulation here: https://github.com/felixge/go-profiler-notes/blob/main/block.md#simulation--proposal-for-improvement
The downside to this approach is that it will obscure the "average event duration" which could currently be determined by dividing the delay/nanoseconds by the contentions/count values in the profile. I have a few ideas around this that I'd be happy to discuss.
Additionally I propose to improve the documentation for the block profiler to give people a better idea what the rate value means and perhaps also make some recommendations about overhead.
I'm happy to work on all of the above and submit the patches for it.

