Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: cgo performance tracking bug #9704

Open
minux opened this issue Jan 28, 2015 · 12 comments
Labels
Milestone

Comments

@minux
Copy link
Member

@minux minux commented Jan 28, 2015

Running this stupid microbenchmark on linux/amd64, with different version of Go.
http://play.golang.org/p/5U0i26sA8U

package main

// int rand() { return 42; }
import "C"

import "testing"

func BenchmarkCgo(b *testing.B) {
    for i := 0; i < b.N; i++ {
        C.rand()
    }
}

func main() {
    testing.Main(func(string, string) (bool, error) {
        return true, nil
    }, nil, []testing.InternalBenchmark{
        {"BenchmarkCgo", BenchmarkCgo},
    }, nil)
}
$ go1 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            30.8 ns/op
$ go112 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            40.9 ns/op
$ go121 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            46.1 ns/op
$ go133 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            48.3 ns/op
$ go141 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    10000000           160 ns/op
$ go run cgobench.go -test.bench=. # today's Go tip, f4a2617
testing: warning: no tests to run
PASS
BenchmarkCgo    10000000           203 ns/op

Why? Go 1.4 is much worse than any of the previous releases.
And Go tip is even worse than Go 1.4. This might be understandable,
but I wonder why Go 1.4 is that much slower than 1.3.3?

@minux minux added this to the Go1.5 milestone Jan 28, 2015
@dvyukov

This comment has been minimized.

Copy link
Member

@dvyukov dvyukov commented Jan 28, 2015

@RLH @randall77 @rsc

It is combined effect of C->Go conversion, write barriers and atomic manipulation of goroutine statuses:

10.45% cgo cgo [.] runtime.cas
8.23% cgo cgo [.] runtime.deferreturn
7.21% cgo cgo [.] runtime.writebarrierptr
6.60% cgo cgo [.] runtime.reentersyscall
6.29% cgo cgo [.] runtime.newdefer
4.07% cgo cgo [.] runtime.getg
3.77% cgo cgo [.] runtime.exitsyscall
3.53% cgo cgo [.] main.BenchmarkCgo
3.50% cgo cgo [.] runtime.systemstack
3.20% cgo cgo [.] runtime.cgocall_errno
3.16% cgo cgo [.] main._Cfunc_rand
2.94% cgo cgo [.] runtime.freedefer
2.93% cgo cgo [.] runtime.deferproc
2.84% cgo cgo [.] runtime.exitsyscallfast
2.69% cgo cgo [.] runtime.casgstatus
2.58% cgo cgo [.] runtime.func.042
2.53% cgo cgo [.] runtime.atomicstore
2.50% cgo cgo [.] runtime.acquirem
2.42% cgo cgo [.] runtime.releasem

@rsc rsc removed cgo labels Apr 10, 2015
@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Jun 8, 2015

Or maybe it's a sinister plot to encourage people to write Go code.

Either way, too late for Go 1.5.

@capnm

This comment has been minimized.

Copy link

@capnm capnm commented Aug 26, 2015

linux/amd64, linux/arm:

amd64  go-13      BenchmarkCgo  10000000               179 ns/op
amd64  go-14      BenchmarkCgo   3000000               494 ns/op
amd64  go-15      BenchmarkCgo   5000000               354 ns/op ~2x

arm go-13         BenchmarkCgo   2000000               821 ns/op
arm go-14         BenchmarkCgo   1000000              2359 ns/op
arm go-15         BenchmarkCgo    500000              2570 ns/op ~3x

@mwhudson

This comment has been minimized.

Copy link
Contributor

@mwhudson mwhudson commented Oct 2, 2016

Going by BenchmarkCgoCall in misc/cgo/test, on my linux/amd64 system, current tip is about twice as fast as go 1.7. Go 1.7, 1.6 and 1.5 are all about the same and Go 1.4 is slightly slower (I lack the energy to get 1.3 cgo working on this system). So the thing the bug specifically complains about ("decrease of cgocall performance") is probably fixed, but is this fast enough? (It's always hard to know when a bug like this can be closed).

@crawshaw

This comment has been minimized.

Copy link
Contributor

@crawshaw crawshaw commented Oct 2, 2016

http://golang.org/cl/30080 ~halved the cgo overhead between 1.7 and tip.

As reported this can be closed, but how about we keep it as a cgo performance tracking bug? I don't expect any more improvements for 1.8, but in the future a runtime accounting overhaul could make it cheaper.

@minux

This comment has been minimized.

Copy link
Member Author

@minux minux commented Oct 2, 2016

@minux minux changed the title runtime: steady decrease of cgocall performance runtime: cgo performance tracking bug Oct 2, 2016
@minux minux added the Performance label Oct 2, 2016
@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Oct 3, 2016

@minux, what benchmark were you running?

@dgryski

This comment has been minimized.

Copy link
Contributor

@dgryski dgryski commented Jan 3, 2018

@aclements I think @minux was running the Go program at the top of the bug that just calls into a C function returning an int.

@navytux

This comment has been minimized.

Copy link
Contributor

@navytux navytux commented Feb 18, 2018

Note that in addition to serial CGo slowness (i.e. ~ 60ns for 1 call) making several Cgo calls in sequence in presence of other goroutines can bring more slowdown: #19574 (comment).

@thepudds

This comment has been minimized.

Copy link

@thepudds thepudds commented Jun 4, 2019

Now that https://golang.org/cl/171758 is merged for 1.13 fixing #6980, is that expected to help here as well?

When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.
...
name     old time/op  new time/op  delta
Defer-4  52.2ns ± 5%  36.2ns ± 3%  -30.70%  (p=0.000 n=10+10)
@thepudds

This comment has been minimized.

Copy link

@thepudds thepudds commented Jun 4, 2019

FWIW, from a very quick test, using tip does not seem to be significantly faster than 1.12.5. Also, I don't know if this is already tracked elsewhere or perhaps not a meaningful result, but this quick test seems to show go1.10 improving, but go1.12 slowing down again.

===============
go1.7.6
BenchmarkCgo-8          100000000              197 ns/op
===============
go1.8.7
BenchmarkCgo-8          200000000               79.7 ns/op
===============
go1.9.4
BenchmarkCgo-8          200000000               81.3 ns/op
===============
go1.10.4
BenchmarkCgo-8          200000000               73.4 ns/op
===============
go1.11.4
BenchmarkCgo-8          200000000               73.2 ns/op
===============
go1.12.5
BenchmarkCgo-8          200000000               81.7 ns/op
===============
gotip
BenchmarkCgo-8          152019986               79.0 ns/op

This is running the benchmark from #9704 (comment) via
go run cgobench.go -test.bench=. -test.benchtime=10s on linux/amd64.

gotip is devel +fff4f59 Tue Jun 4 17:35:20 2019 +0000

@navytux

This comment has been minimized.

Copy link
Contributor

@navytux navytux commented Jun 5, 2019

I confirm what @thepudds reports - there is improvement with go1.10, but slowdown back with go1.12, and things are not getting faster with tip:

$ benchstat go17.txt go18.txt go19.txt go110.txt go111.txt go112.txt gotip.txt 
name \ time/op  go17.txt    go18.txt   go19.txt     go110.txt    go111.txt    go112.txt    gotip.txt
Cgo-4           186ns ± 1%  82ns ± 0%
goos:linux goarch:amd64
Cgo-4                                  84.5ns ± 0%  74.9ns ± 0%  76.6ns ± 0%  81.1ns ± 0%  81.4ns ± 0%

All measurements were done on unloaded machine with CPU frequency fixed and CPU idle states except C1 disabled (see http://navytux.spb.ru/~kirr/neo.html#measurements-stability for details)

$ ./neotest info-local
date:   Wed, 05 Jun 2019 11:44:55 +0300
xnode:  ...
uname:  Linux deco 4.19.0-5-amd64 #1 SMP Debian 4.19.37-3 (2019-05-15) x86_64 GNU/Linux
cpu:    Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
cpu/[0-3]/freq: intel_pstate/performance [2.60GHz - 2.60GHz]
cpu/[0-3]/idle: intel_idle/menu: POLL·0/0 C1·2/2 !C1E·10/20 !C3·70/100 !C6·85/200 !C7s·124/800 !C8·200/800 !C9·480/5000 !C10·890/5000 # elat/tres µs
...
$ gotip version
go version devel +5f509148b1 Wed Jun 5 00:53:25 2019 +0000 linux/amd64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.