runtime: unexpectedly large slowdown with runtime.LockOSThread #18023
Comments
The golang-nuts thread about this says that using 'perf stat' shows approximately 1000X more thread context switches with LockOSThread than without. Is that enough confirmation? |
Somehow I don't see that, but, yes, there is probably nothing that can be done here. Sorry for the noise. |
Let me chime in a bit. On Linux the context switch can happen, if my reading of futex_wake() is correct (which is probably not), because e.g. wake_up_q() via calling
for woken process. The Go runtime calls When LockOSThread is used an M is dedicated to G so when that G blocks, e.g. on chan send, that M, if I undestand correctly, has high chances to stop. And if it stops it goes to With this thinking the following patch: diff --git a/src/runtime/lock_futex.go b/src/runtime/lock_futex.go
index 9d55bd129c..418fe1b845 100644
--- a/src/runtime/lock_futex.go
+++ b/src/runtime/lock_futex.go
@@ -146,7 +157,13 @@ func notesleep(n *note) {
// Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
ns = 10e6
}
- for atomic.Load(key32(&n.key)) == 0 {
+ for spin := 0; atomic.Load(key32(&n.key)) == 0; spin++ {
+ // spin a bit hoping we'll get wakup soon
+ if spin < 10000 {
+ continue
+ }
+
+ // no luck -> go to sleep heavily to kernel
gp.m.blocked = true
futexsleep(key32(&n.key), 0, ns)
if *cgo_yield != nil { makes BenchmarkLocked much faster on my computer:
I also looked around and found: essentially at every CGo call lockOSThread is used: https://github.com/golang/go/blob/ab401077/src/runtime/cgocall.go#L107 With this in mind I modified the benchmark a bit so that no LockOSThread is explicitly used, but server performs 1 and 10 simple C calls for every request:
which shows the change brings quite visible speedup. This way I'm not saying my patch is right, but at least it shows that much can be improved. So I suggest to reopen the issue. Thanks beforehand, /cc @dvyukov, @aclements, @bcmills full benchmark source: ( package tmp
import (
"runtime"
"testing"
)
type in struct {
c chan *out
arg int
}
type out struct {
ret int
}
func client(c chan *in, arg int) int {
rc := make(chan *out)
c <- &in{
c: rc,
arg: arg,
}
ret := <-rc
return ret.ret
}
func _server(c chan *in, argadjust func(int) int) {
for r := range c {
r.c <- &out{ret: argadjust(r.arg)}
}
}
func server(c chan *in) {
_server(c, func(arg int) int {
return 3 + arg
})
}
func lockedServer(c chan *in) {
runtime.LockOSThread()
server(c)
runtime.UnlockOSThread()
}
// server with 1 C call per request
func cserver(c chan *in) {
_server(c, cargadjust)
}
// server with 10 C calls per request
func cserver10(c chan *in) {
_server(c, func(arg int) int {
for i := 0; i < 10; i++ {
arg = cargadjust(arg)
}
return arg
})
}
func benchmark(b *testing.B, srv func(chan *in)) {
inc := make(chan *in)
go srv(inc)
for i := 0; i < b.N; i++ {
client(inc, i)
}
close(inc)
}
func BenchmarkUnlocked(b *testing.B) { benchmark(b, server) }
func BenchmarkLocked(b *testing.B) { benchmark(b, lockedServer) }
func BenchmarkCGo(b *testing.B) { benchmark(b, cserver) }
func BenchmarkCGo10(b *testing.B) { benchmark(b, cserver10) } ( package tmp
// int argadjust(int arg) { return 3 + arg; }
import "C"
// XXX here because cannot use C in tests directly
func cargadjust(arg int) int {
return int(C.argadjust(C.int(arg)))
} |
@navytux This issue is closed. If you want to discuss your patch, please open a new issue or raise a topic on the golang-dev mailing list. Thanks. |
@ianlancetaylor thanks for feedback. I've opened #21827 and posted update to original thread on golang-nuts. |
The benchmark https://play.golang.org/p/nI4LO1wu17 shows a significant slowdown when using
runtime.LockOSThread
.On amd64 at current tip I see this output from
go test -test.cpu=1,2,4,8 -test.bench=.
:The problem may be entirely due to the extra thread context switches required by
runtime.LockOSThread
. When the thread is not locked, the channel communication may be done in practice by a simple goroutine switch. When usingruntime.LockOSThread
a full thread context switch is required.Still, we should make sure we have the tools to confirm that.
The text was updated successfully, but these errors were encountered: