-
Notifications
You must be signed in to change notification settings - Fork 18.3k
Description
What version of Go are you using (go version
)?
$ go version go version go1.15rc1 darwin/amd64 $ uname -a Linux ip-172-31-18-196.us-west-2.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Does this issue reproduce with the latest release?
Yes this is present in go1.13.14 and go1.14.6, though with slightly different shapes. When I ran with Go 1.13, the profile contained zero samples. When I ran with Go 1.14, the process took about 3 seconds longer to run (7.8s vs 5s) and included 2.2s worth of samples for the function of interest. (Though if this worked as I expect, the profiles would cover the same amount of time as the "time" shell built-in reported.)
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/Users/rhys/Library/Caches/go-build" GOENV="/Users/rhys/Library/Application Support/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOINSECURE="" GOMODCACHE="/Users/rhys/go/pkg/mod" GONOPROXY="*" GONOSUMDB="*" GOOS="darwin" GOPATH="/Users/rhys/go" GOPRIVATE="*" GOPROXY="direct" GOROOT="/usr/local/go" GOSUMDB="off" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64" GCCGO="gccgo" AR="ar" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build464588215=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
On a linux/amd64 machine, I ran a program that does 5ms of work at a time on each of 1000 worker goroutines, and uses runtime.LockOSThread
and runtime.Goexit
to force each worker goroutine to be on a different OS thread.
package main
import (
"flag"
"fmt"
"log"
"os"
"runtime"
"runtime/pprof"
"sync"
"time"
)
func main() {
profileName := flag.String("profile", "./hops.pb.gz", "File name for CPU profile")
tickTime := flag.Duration("tick", 5*time.Millisecond, "How much work to run on each thread")
iterations := flag.Int("iterations", 1000, "Number of threads to use for work")
flag.Parse()
pf, err := os.Create(*profileName)
if err != nil {
log.Fatalf("Create; err = %v", err)
}
defer func() {
err := pf.Close()
if err != nil {
log.Fatalf("Close; err = %v", err)
}
}()
err = pprof.StartCPUProfile(pf)
if err != nil {
log.Fatalf("StartCPUProfile; err = %v", err)
}
defer pprof.StopCPUProfile()
tick := time.Tick(*tickTime)
var p int64 = 1
var wg sync.WaitGroup
for i := 0; i < *iterations; i++ {
wg.Add(1)
go func() {
defer wg.Done()
// Have this code run on its own thread, force the runtime to
// terminate the thread when this goroutine completes (so the next
// run will be on a different thread).
runtime.LockOSThread()
defer runtime.Goexit()
for {
select {
case <-tick:
return
default:
}
work(&p)
}
}()
wg.Wait()
}
fmt.Printf("%d\n", p)
}
func work(p *int64) {
for i := 0; i < 1e5; i++ {
*p *= 3
}
}
What did you expect to see?
I expected the CPU time reported by the shell's time
built-in and the CPU time reported by go tool pprof
to roughly match.
What did you see instead?
When doing 5ms of work on each of 1000 unique threads, the shell's time
built-in reports 5103ms of user-space CPU time, and go tool pprof
reports only 50ms of CPU time.
$ time ./hops.linux_amd64 -profile=./hops.pb.gz -tick=5ms -iterations=1000
-6036671839377369855
real 0m5.105s
user 0m5.103s
sys 0m0.085s
$ go tool pprof -top ./hops.pb.gz
File: hops.linux_amd64
Type: cpu
Time: Jul 27, 2020 at 2:04pm (PDT)
Duration: 5.10s, Total samples = 50ms ( 0.98%)
Showing nodes accounting for 50ms, 100% of 50ms total
flat flat% sum% cum cum%
30ms 60.00% 60.00% 30ms 60.00% main.work
10ms 20.00% 80.00% 10ms 20.00% runtime.(*randomEnum).next (inline)
10ms 20.00% 100% 10ms 20.00% runtime.futex
0 0% 100% 10ms 20.00% main.main
0 0% 100% 30ms 60.00% main.main.func2
0 0% 100% 10ms 20.00% runtime.convT64
0 0% 100% 10ms 20.00% runtime.findrunnable
0 0% 100% 10ms 20.00% runtime.futexsleep
0 0% 100% 10ms 20.00% runtime.gcBgMarkStartWorkers
0 0% 100% 10ms 20.00% runtime.gcStart
0 0% 100% 10ms 20.00% runtime.main
0 0% 100% 10ms 20.00% runtime.mallocgc
0 0% 100% 10ms 20.00% runtime.mstart
0 0% 100% 10ms 20.00% runtime.mstart1
0 0% 100% 10ms 20.00% runtime.notetsleep_internal
0 0% 100% 10ms 20.00% runtime.notetsleepg
0 0% 100% 10ms 20.00% runtime.schedule