Skip to content

runtime/pprof: under-samples work on short-lived threads #40435

@rhysh

Description

@rhysh

What version of Go are you using (go version)?

$ go version
go version go1.15rc1 darwin/amd64

$ uname -a
Linux ip-172-31-18-196.us-west-2.compute.internal 4.14.123-111.109.amzn2.x86_64 #1 SMP Mon Jun 10 19:37:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Does this issue reproduce with the latest release?

Yes this is present in go1.13.14 and go1.14.6, though with slightly different shapes. When I ran with Go 1.13, the profile contained zero samples. When I ran with Go 1.14, the process took about 3 seconds longer to run (7.8s vs 5s) and included 2.2s worth of samples for the function of interest. (Though if this worked as I expect, the profiles would cover the same amount of time as the "time" shell built-in reported.)

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/usr/local/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build464588215=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

On a linux/amd64 machine, I ran a program that does 5ms of work at a time on each of 1000 worker goroutines, and uses runtime.LockOSThread and runtime.Goexit to force each worker goroutine to be on a different OS thread.

package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	profileName := flag.String("profile", "./hops.pb.gz", "File name for CPU profile")
	tickTime := flag.Duration("tick", 5*time.Millisecond, "How much work to run on each thread")
	iterations := flag.Int("iterations", 1000, "Number of threads to use for work")
	flag.Parse()

	pf, err := os.Create(*profileName)
	if err != nil {
		log.Fatalf("Create; err = %v", err)
	}
	defer func() {
		err := pf.Close()
		if err != nil {
			log.Fatalf("Close; err = %v", err)
		}
	}()
	err = pprof.StartCPUProfile(pf)
	if err != nil {
		log.Fatalf("StartCPUProfile; err = %v", err)
	}
	defer pprof.StopCPUProfile()

	tick := time.Tick(*tickTime)

	var p int64 = 1
	var wg sync.WaitGroup
	for i := 0; i < *iterations; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()

			// Have this code run on its own thread, force the runtime to
			// terminate the thread when this goroutine completes (so the next
			// run will be on a different thread).
			runtime.LockOSThread()
			defer runtime.Goexit()

			for {
				select {
				case <-tick:
					return
				default:
				}

				work(&p)
			}
		}()
		wg.Wait()
	}

	fmt.Printf("%d\n", p)
}

func work(p *int64) {
	for i := 0; i < 1e5; i++ {
		*p *= 3
	}
}

What did you expect to see?

I expected the CPU time reported by the shell's time built-in and the CPU time reported by go tool pprof to roughly match.

What did you see instead?

When doing 5ms of work on each of 1000 unique threads, the shell's time built-in reports 5103ms of user-space CPU time, and go tool pprof reports only 50ms of CPU time.

$ time ./hops.linux_amd64 -profile=./hops.pb.gz -tick=5ms -iterations=1000
-6036671839377369855

real	0m5.105s
user	0m5.103s
sys	0m0.085s

$ go tool pprof -top ./hops.pb.gz
File: hops.linux_amd64
Type: cpu
Time: Jul 27, 2020 at 2:04pm (PDT)
Duration: 5.10s, Total samples = 50ms ( 0.98%)
Showing nodes accounting for 50ms, 100% of 50ms total
      flat  flat%   sum%        cum   cum%
      30ms 60.00% 60.00%       30ms 60.00%  main.work
      10ms 20.00% 80.00%       10ms 20.00%  runtime.(*randomEnum).next (inline)
      10ms 20.00%   100%       10ms 20.00%  runtime.futex
         0     0%   100%       10ms 20.00%  main.main
         0     0%   100%       30ms 60.00%  main.main.func2
         0     0%   100%       10ms 20.00%  runtime.convT64
         0     0%   100%       10ms 20.00%  runtime.findrunnable
         0     0%   100%       10ms 20.00%  runtime.futexsleep
         0     0%   100%       10ms 20.00%  runtime.gcBgMarkStartWorkers
         0     0%   100%       10ms 20.00%  runtime.gcStart
         0     0%   100%       10ms 20.00%  runtime.main
         0     0%   100%       10ms 20.00%  runtime.mallocgc
         0     0%   100%       10ms 20.00%  runtime.mstart
         0     0%   100%       10ms 20.00%  runtime.mstart1
         0     0%   100%       10ms 20.00%  runtime.notetsleep_internal
         0     0%   100%       10ms 20.00%  runtime.notetsleepg
         0     0%   100%       10ms 20.00%  runtime.schedule

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions