runtime: possible contention on epollwait #48428

Open
zizon opened this issue Sep 17, 2021 · 3 comments

@zizon commented Sep 17, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build771282332=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I implemented a kind of redis proxy that does encode/decode on behalf of each command.
While running some benchmarks against this tiny project, I found something that may be interesting.

With redis pipelining disabled (memtier_benchmark --pipeline 1), throughput keeps increasing as the number of concurrent connections increases, but only up to a certain level, regardless of GOMAXPROCS.
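
For context, here is a minimal sketch of the kind of proxy and workload described above. This is not the actual project: the listen port, the upstream address, and the plain byte copying are assumptions (the real proxy additionally decodes/encodes each command). With memtier_benchmark --pipeline 1, every command is one small write followed by one small read, so at high connection counts most goroutines sit in network waits.

package main

import (
	"io"
	"log"
	"net"
)

func main() {
	const upstream = "127.0.0.1:6379" // assumed upstream redis address

	ln, err := net.Listen("tcp", ":7000") // assumed proxy port
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(client net.Conn) {
			defer client.Close()
			server, err := net.Dial("tcp", upstream)
			if err != nil {
				return
			}
			defer server.Close()
			// One goroutine per direction; the real proxy would parse the
			// protocol here instead of copying raw bytes.
			go io.Copy(server, client)
			io.Copy(client, server)
		}(client)
	}
}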

Taking some samples of /proc/<pid>/stack, I found all of the Ms sitting in a syscall (entry_SYSCALL_64_after_hwframe, to be precise).

Using eBPF to drill further into the sampled call stacks:

#!/usr/local/bin/bpftrace

// Count kernel stacks that hit osq_lock (the optimistic spin step of a
// contended kernel mutex) for the proxy process.
kprobe:osq_lock
{
  if (comm == "some_command") {
    @stack[kstack()] += 1;
  }
}

It appears there is contention on the single global epoll fd used by the netpoller.

@stack[
  osq_lock+1
  __mutex_lock_slowpath+19
  mutex_lock+47
  ep_scan_ready_list.constprop.17+497
  ep_poll+483
  sys_epoll_pwait+414
  do_syscall_64+115
  entry_SYSCALL_64_after_hwframe+61
] : 17325

My suspicion is that:

  1. when relatively small packets and a high number of concurrent connections are involved,
  2. most Gs are eager to do network I/O,
  3. so all Ps are eager to netpoll,
  4. which makes them epollwait on the single epfd,
  5. and contend on the kernel lock associated with it,
  6. and this should get worse as the number of CPU cores increases.

A possible solution might be to associate each M with its own private epfd?
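
As a user-space analogue of that idea (this is not the runtime's netpoller; the shard count, helper names, and the fd%shards hashing are purely illustrative), one could shard file descriptors across several private epoll instances, each serviced by its own goroutine, so the waiters do not all queue on a single epoll instance's internal mutex:

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

const shards = 4 // hypothetical: e.g. one poller per thread

// newPollers creates one private epoll instance per shard.
func newPollers() ([]int, error) {
	epfds := make([]int, shards)
	for i := range epfds {
		fd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
		if err != nil {
			return nil, err
		}
		epfds[i] = fd
	}
	return epfds, nil
}

// register adds fd to one shard's epoll instance, chosen by a trivial hash,
// so each instance only ever sees a subset of the connections.
func register(epfds []int, fd int) error {
	ev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(fd)}
	return unix.EpollCtl(epfds[fd%shards], unix.EPOLL_CTL_ADD, fd, &ev)
}

func main() {
	epfds, err := newPollers()
	if err != nil {
		log.Fatal(err)
	}
	for _, epfd := range epfds {
		go func(epfd int) {
			events := make([]unix.EpollEvent, 128)
			for {
				// Each poller blocks on its own epfd, so contention on any
				// one epoll-internal lock is limited to that shard's fds.
				n, err := unix.EpollWait(epfd, events, -1)
				if err != nil && err != unix.EINTR {
					log.Fatal(err)
				}
				_ = n // dispatch ready fds here
			}
		}(epfd)
	}
	select {} // registration of real fds via register() is omitted
}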

Note:
Running on Ubuntu Xenial(kernel 4.15.0)

What did you expect to see?

What did you see instead?

@ianlancetaylor (Contributor) commented Sep 17, 2021

That description doesn't really match how the Go runtime works. At most one M will sleep in epoll_pwait. When there is an M sleeping in epoll_pwait, other M's will not call the poller.

That said, if there is no M sleeping in epoll_pwait, then as each M looks for which goroutine to run next, it will call epoll_pwait with a zero timeout. It is possible that in this case multiple M's will compete for the poller descriptor.
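
As a standalone illustration of the two modes described above (this is not the runtime's code; it uses golang.org/x/sys/unix directly and a throwaway pipe that is never written to): a timeout of 0 makes epoll_wait return immediately, which is the cheap check an M can make while looking for work, whereas a timeout of -1 blocks until some registered fd becomes ready.

package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	epfd, err := unix.EpollCreate1(unix.EPOLL_CLOEXEC)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Close(epfd)

	// Register the read end of a pipe that is never written to,
	// so it never becomes ready.
	var p [2]int
	if err := unix.Pipe(p[:]); err != nil {
		log.Fatal(err)
	}
	ev := unix.EpollEvent{Events: unix.EPOLLIN, Fd: int32(p[0])}
	if err := unix.EpollCtl(epfd, unix.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		log.Fatal(err)
	}

	events := make([]unix.EpollEvent, 8)

	// Non-blocking poll: returns immediately with n == 0.
	n, err := unix.EpollWait(epfd, events, 0)
	fmt.Println("zero-timeout poll returned", n, err)

	// unix.EpollWait(epfd, events, -1) would instead sleep here
	// until p[0] became readable.
}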

Normally it doesn't help to set GOMAXPROCS to be much larger than the number of cores on the system. How many cores do you have, and what are you setting GOMAXPROCS to?
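
For reference, a quick way to confirm both numbers from inside the process, using only the standard library:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("cores:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // 0 queries the value without changing it
}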

@ianlancetaylor ianlancetaylor changed the title Possible contention on epollwait runtime: possible contention on epollwait Sep 17, 2021
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Sep 17, 2021
@zizon (Author) commented Sep 17, 2021

The benchmark ran on a 64-core CPU with GOMAXPROCS settings ranging from 4 to 64, in steps of 4.
It reaches ~640k QPS at GOMAXPROCS=40 and hardly improves when given more.

However, when running in multi-process mode, e.g. 20 cores per process but with 2 processes listening on the same port, it keeps scaling the way it does under GOMAXPROCS<40 (close to the projected numbers).
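
The multi-process setup presumably relies on SO_REUSEPORT so that several processes can bind the same port; this is a common way to do that in Go, sketched here with an assumed port and a placeholder connection handler (not taken from the project):

package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	// Every process runs this same code; the kernel then spreads incoming
	// connections across all listeners bound with SO_REUSEPORT.
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var serr error
			if err := c.Control(func(fd uintptr) {
				serr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return serr
		},
	}
	ln, err := lc.Listen(context.Background(), "tcp", ":7000")
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		conn.Close() // placeholder: the real proxy handles the connection here
	}
}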

@ianlancetaylor (Contributor) commented Sep 17, 2021
