Description
What version of Go are you using (go version)?
$ go1.15 version
go version go1.15 darwin/amd64
$ uname -a
Linux ip-172-31-18-196.us-west-2.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ cat /boot/config* | grep HIGH_RES_TIMERS
CONFIG_HIGH_RES_TIMERS=y
I'm cross-compiling to GOOS=linux GOARCH=amd64 for the test.
Does this issue reproduce with the latest release?
Yes, this is present in Go 1.15 which is currently the latest release.
What operating system and processor architecture are you using (go env)?
Here's the view from my laptop; the bug is on linux/amd64.
go env Output
$ go1.15 env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/Users/rhys/go/version/go1.15"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/Users/rhys/go/version/go1.15/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build016610480=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
This program performs work in a single goroutine and collects a CPU profile of itself. It periodically forces a new thread to be created to do a trivial amount of processing in a separate goroutine.
I varied the period between new thread creations to be less than the CPU profiling interval (100 Hz, so 10ms), somewhat longer than that interval, and much longer than it.
The code for the test is in the "details" immediately below:
bumps.go
package main

import (
	"context"
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	profileName := flag.String("profile", "./bumps.pb.gz", "File name for CPU profile")
	tickTime := flag.Duration("tick", 5*time.Millisecond, "How long to work before creating a new thread")
	duration := flag.Duration("duration", 5*time.Second, "Total duration of the test")
	flag.Parse()

	pf, err := os.Create(*profileName)
	if err != nil {
		log.Fatalf("Create; err = %v", err)
	}
	defer func() {
		err := pf.Close()
		if err != nil {
			log.Fatalf("Close; err = %v", err)
		}
	}()

	err = pprof.StartCPUProfile(pf)
	if err != nil {
		log.Fatalf("StartCPUProfile; err = %v", err)
	}
	defer pprof.StopCPUProfile()

	ctx, cancel := context.WithTimeout(context.Background(), *duration)
	defer cancel()

	tick := time.Tick(*tickTime)

	var p int64 = 1
	workUntilTick := func() {
		for {
			select {
			case <-tick:
				return
			default:
			}
			work(&p)
		}
	}

	// All work takes place on goroutine 1. Lock it to the current thread so
	// we're sure the work stays on a single OS thread.
	runtime.LockOSThread()

	var wg sync.WaitGroup
	for ctx.Err() == nil {
		workUntilTick()
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Have this code run on its own thread, force the runtime to
			// terminate the thread when this goroutine completes (so the next
			// run will be on a different thread).
			runtime.LockOSThread()
			defer runtime.Goexit()
			// Yield for a moment, try to trigger a call to runtime·execute on
			// this thread.
			runtime.Gosched()
		}()
		wg.Wait()
	}

	fmt.Printf("%d\n", p)
}

func work(p *int64) {
	for i := 0; i < 1e5; i++ {
		*p *= 3
	}
}
What did you expect to see?
I expected the profile to show the CPU cost of the program's work. I expected it to show that cost equally well as I varied the "tick" parameter, regardless of how often the program created new OS threads.
What did you see instead?
When the program creates threads infrequently, its CPU profile reports a number of samples (each representing 10ms of work) that corresponds to the OS's view of how much CPU time the program spent.
When the program creates threads every 20ms, the resulting CPU profile has about half as many 10ms samples as I'd expect.
When the program creates threads faster than it earns SIGPROF deliveries (more often than every 10ms), the CPU profile ends up with almost no samples.
I think the problem is that runtime.execute calls runtime.setThreadCPUProfiler whenever the current thread's profilehz differs from what the program requests, which leads to a setitimer syscall to adjust the profiling rate for the entire process. The test program creates new threads while it is being profiled, and because those threads have never before run Go code their profilehz values are still 0. The setitimer syscall they then make resets the kernel's accounting of how much CPU time the process has spent toward the next SIGPROF, so samples are lost.
$ time ./bumps -tick=1s
40153154464199553
real 0m5.118s
user 0m4.978s
sys 0m0.036s
$ go tool pprof -top /tmp/bumps.pb.gz
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:05pm (PDT)
Duration: 5.11s, Total samples = 4.97s (97.19%)
Showing nodes accounting for 4.92s, 98.99% of 4.97s total
Dropped 14 nodes (cum <= 0.02s)
flat flat% sum% cum cum%
4.92s 98.99% 98.99% 4.93s 99.20% main.work
0 0% 98.99% 4.93s 99.20% main.main
0 0% 98.99% 4.93s 99.20% main.main.func2
0 0% 98.99% 4.93s 99.20% runtime.main
0 0% 98.99% 0.04s 0.8% runtime.mstart
0 0% 98.99% 0.04s 0.8% runtime.mstart1
0 0% 98.99% 0.03s 0.6% runtime.sysmon
$ time ./bumps -tick=20ms
-5752685429589199231
real 0m5.135s
user 0m5.057s
sys 0m0.008s
$ go tool pprof -top ./bumps.pb.gz
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:06pm (PDT)
Duration: 5.13s, Total samples = 2.68s (52.25%)
Showing nodes accounting for 2.66s, 99.25% of 2.68s total
Dropped 8 nodes (cum <= 0.01s)
flat flat% sum% cum cum%
2.65s 98.88% 98.88% 2.65s 98.88% main.work
0.01s 0.37% 99.25% 0.02s 0.75% runtime.findrunnable
0 0% 99.25% 2.65s 98.88% main.main
0 0% 99.25% 2.65s 98.88% main.main.func2
0 0% 99.25% 2.65s 98.88% runtime.main
0 0% 99.25% 0.02s 0.75% runtime.mstart
0 0% 99.25% 0.02s 0.75% runtime.mstart1
0 0% 99.25% 0.03s 1.12% runtime.schedule
$ time ./bumps -tick=3ms
7665148222646782337
real 0m5.007s
user 0m4.977s
sys 0m0.344s
$ go tool pprof -top ./bumps.pb.gz
File: bumps
Type: cpu
Time: Aug 21, 2020 at 12:08pm (PDT)
Duration: 5s, Total samples = 0
Showing nodes accounting for 0, 0% of 0 total
flat flat% sum% cum cum%