Running pprof on a program that has O(10^6) CPU-bound goroutines loses ~50% of the samples.
The code is at github.com/btracey/stackmc/examples/paper/rosen_unif
SVG of the pprof output: http://stanford.edu/~btracey/gobench/stackmc/graph.svg
I am working on a shorter reproducer.
[btracey@zion ~]$ go env
GOARCH="amd64"
GOBIN=""
GOCHAR="6"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/ADL/btracey/mygo"
GORACE=""
GOROOT="/ADL/btracey/gover/go_tip/go"
GOTOOLDIR="/ADL/btracey/gover/go_tip/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"