
context: cancelCtx exclusive lock causes extreme contention #42564

Open
niaow opened this issue Nov 12, 2020 · 3 comments

niaow commented Nov 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niaow/.cache/go-build"
GOENV="/home/niaow/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/niaow/go/pkg/mod"
GONOPROXY="github.com/molecula"
GONOSUMDB="github.com/molecula"
GOOS="linux"
GOPATH="/home/niaow/go"
GOPRIVATE="github.com/molecula"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build497931594=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Passed a single cancellable context to many goroutines.
Each goroutine ran a cold-path compute task, interleaved with calls to ctx.Err() to detect cancellation.
The loop looks something like this:

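// collect and Iterator are illustrative names; Thing is the element type.
func collect(ctx context.Context, iterator Iterator) ([]Thing, error) {
	var out []Thing
	for iterator.Next() {
		// In Go 1.15, ctx.Err() locks the cancelCtx mutex on every call.
		if err := ctx.Err(); err != nil {
			// caller doesn't need a result anymore.
			return nil, err
		}

		// Fetch thing from iterator, apply some filtering functions, and append it to out.
	}
	return out, nil
}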

What did you expect to see?

A bit of a slowdown from the context check maybe?

What did you see instead?

Slightly over 50% of the CPU time was spent in runtime.findrunnable. The cancelCtx struct guards its state with a sync.Mutex, and the extreme lock contention (64 CPU threads hammering it) pushed lock acquisition into the slow path, sync.(*Mutex).lockSlow. From poking at pprof, about 86% of CPU time appears to have been spent in functions related to acquiring this lock.

I was able to work around this by adding a counter and checking the context less frequently. However, I do not think this is an intended performance degradation path. Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.

davecheney (Contributor) commented Nov 12, 2020

Thank you for raising this issue. Can you please provide more information, such as a sample program which demonstrates this or, if that is not possible, the pprof trace you used to identify the lock contention.

Thank you

bcmills (Member) commented Nov 13, 2020

> Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.

If you have 64 hardware threads, sync.RWMutex is unlikely to do any better than sync.Mutex: you'll just transform the lock contention to cache contention. (See #17973.)

bcmills added the Performance label Nov 13, 2020

muirdm commented Nov 13, 2020

As a workaround, you can guard the slow Err() call with a non-blocking receive from the Done() channel:

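done := ctx.Done() // call Done() once, outside the loop
for something {
	// Non-blocking receive: succeeds only once done has been closed.
	select {
	case <-done:
		return ctx.Err() // reached only after cancellation
	default:
	}
	// ... do work ...
}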