Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: cgo calls are way faster when enabling CPU profile #38325

Open
didrocks opened this issue Apr 9, 2020 · 1 comment
Open

runtime: cgo calls are way faster when enabling CPU profile #38325

didrocks opened this issue Apr 9, 2020 · 1 comment

Comments

@didrocks
Copy link

@didrocks didrocks commented Apr 9, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 linux/amd64

Does this issue reproduce with the latest release?

It was present with 1.13.4 and uprading to latest release confirms it

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/j-lallement/.cache/go-build"
GOENV="/home/j-lallement/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/j-lallement/.go/"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/j-lallement/Ubuntu/Dev/zfs/zsys/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build141739689=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. We started having a contention on some of our code (https://github.com/ubuntu/zsys) which is using go-libzfs when opening the ZFS pools.

This issues a lot of cgo calls to a library which then proceed to syscall. We expected some performance issues.

  1. The following code:
dsZFS, err := newZ.libzfs.DatasetOpenAll()
if err != nil {
    return fmt.Errorf(i18n.G("can't list datasets: %v"), err)
}

Was running for 13.5s in average in our system (with a lot of datasets)

  1. We thus wanted to start doing CPU profile and did:
f, _ := os.Create("/tmp/cpuprofile")
defer f.Close()
pprof.StartCPUProfile(f)
dsZFS, err := newZ.libzfs.DatasetOpenAll()
if err != nil {
    return fmt.Errorf(i18n.G("can't list datasets: %v"), err)
}
pprof.StopCPUProfile()

The surprise was that the whole call was then taking less than a second! This is fully reproducible on our machine: No change apart from enabling or disabling CPU profile goes from 13.5s without CPU profile to one second with it enabled.
cpu.pprof.txt

The result of DatasetOpenAll isn’t identical with and without CPU profile enabled, there are less objects returns (see the difference in call stack of number of openchildren)

  1. We started to trace without and with CPU profile enabled. This confirms the time difference:

a. Without CPU Profile enabled:

t, _ := os.Create("/tmp/trace")
defer t.Close()
//f, _ := os.Create("/tmp/cpuprofile")
//defer f.Close()
//pprof.StartCPUProfile(f)
trace.Start(t)
dsZFS, err := newZ.libzfs.DatasetOpenAll()
if err != nil {
    return fmt.Errorf(i18n.G("can't list datasets: %v"), err)
}
trace.Stop()
//pprof.StopCPUProfile()

Here is the trace (taking 15s): trace_without_cpuprofile.txt

b. With CPU Profile enabled:

t, _ := os.Create("/tmp/trace")
defer t.Close()
f, _ := os.Create("/tmp/cpuprofile")
defer f.Close()
pprof.StartCPUProfile(f)
trace.Start(t)
dsZFS, err := newZ.libzfs.DatasetOpenAll()
if err != nil {
    return fmt.Errorf(i18n.G("can't list datasets: %v"), err)
}
trace.Stop()
pprof.StopCPUProfile()

Here is the trace (taking 3s): trace_with_cpuprofile.txt

There is a little bit more on the 15s trace of GC calls, but nothing drastic that can explain the difference (38ms vs 4.4ms). We can see there is a lot more gaps though in Execution time.
The other surprise is that we are blocking way more in syscalls (2.3s vs 0.4s).

https://github.com/bicomsystems/go-libzfs doesn’t seem to do anything particular in case it detects it’s being under profile, is there anything CPU profiler is doing to influence the Cgo calls or am I missing anything obvious?

@didrocks didrocks changed the title Enabling CPU profile impacts (in a very positive way) Cgo performance Cgo calls are way faster when enabling CPU profile Apr 9, 2020
@ALTree ALTree changed the title Cgo calls are way faster when enabling CPU profile runtime: cgo calls are way faster when enabling CPU profile Apr 9, 2020
@navytux
Copy link
Contributor

@navytux navytux commented Apr 21, 2020

Another example where CGo things were also running much faster with tracing enabled: #21827 (comment) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.