
context: cancelCtx exclusive lock causes extreme contention #42564

Closed
niaow opened this issue Nov 12, 2020 · 8 comments

Comments

@niaow commented Nov 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niaow/.cache/go-build"
GOENV="/home/niaow/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/niaow/go/pkg/mod"
GONOPROXY="github.com/molecula"
GONOSUMDB="github.com/molecula"
GOOS="linux"
GOPATH="/home/niaow/go"
GOPRIVATE="github.com/molecula"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build497931594=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Passed a single context with cancellation to a bunch of goroutines.
These goroutines had a cold-path compute task, interlaced with calls to ctx.Err() to detect cancellation.
The loop looks something like:

var out []Thing
for iterator.Next() {
	if err := ctx.Err(); err != nil {
		// caller doesn't need a result anymore.
		return nil, err
	}

	// Fetch thing from iterator, apply some filtering functions, and append it to out.
}

What did you expect to see?

A bit of a slowdown from the context check maybe?

What did you see instead?

Slightly over 50% of the CPU time was spent in runtime.findrunnable. The cancelCtx struct uses a sync.Mutex, and due to extreme lock contention (64 CPU threads hammering it), this was triggering lockSlow. From poking at pprof, it appears that about 86% of CPU time was spent in functions related to this lock acquisition.

I was able to work around this by adding a counter and checking it less frequently. However, I do not think that this is an intended performance degradation path. Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.

@davecheney (Contributor) commented Nov 12, 2020

Thank you for raising this issue. Can you please provide more information: a sample program that demonstrates this or, if that is not possible, the pprof trace you used to identify the lock contention.

Thank you


@bcmills (Member) commented Nov 13, 2020

Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.

If you have 64 hardware threads, sync.RWMutex is unlikely to do any better than sync.Mutex: you'll just transform the lock contention to cache contention. (See #17973.)


@muirdm commented Nov 13, 2020

As a workaround, you can guard the slow Err() call with a non-blocking read of Done():

done := ctx.Done()
for something {
  select {
  case <-done:
    return ctx.Err()
  default:
  }
}


@bradfitz (Contributor) commented Jan 16, 2021

FWIW, this just showed up in some of our profiles recently. It was only 1% of our overall CPU time, but I expected 0.

Here's a benchmark:

func BenchmarkContextCancelDone(b *testing.B) {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()
        b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                        select {
                        case <-ctx.Done():
                        default:
                        }
                }
        })
}

Profile:

dev:issue42564_cancelctx $ go tool pprof prof.cpu
File: issue42564_cancelctx.test
Type: cpu
Time: Jan 15, 2021 at 8:41pm (PST)
Duration: 1.51s, Total samples = 3.40s (225.88%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2630ms, 77.35% of 3400ms total
Dropped 12 nodes (cum <= 17ms)
Showing top 10 nodes out of 67
      flat  flat%   sum%        cum   cum%
     890ms 26.18% 26.18%     1280ms 37.65%  sync.(*Mutex).lockSlow
     590ms 17.35% 43.53%     2500ms 73.53%  context.(*cancelCtx).Done
     250ms  7.35% 50.88%      620ms 18.24%  sync.(*Mutex).Unlock (inline)
     220ms  6.47% 57.35%      220ms  6.47%  runtime.usleep
     150ms  4.41% 61.76%      150ms  4.41%  runtime.cansemacquire (inline)
     140ms  4.12% 65.88%      280ms  8.24%  runtime.lock2
     120ms  3.53% 69.41%      120ms  3.53%  runtime.futex
      90ms  2.65% 72.06%       90ms  2.65%  runtime.osyield
      90ms  2.65% 74.71%       90ms  2.65%  runtime.procyield
      90ms  2.65% 77.35%      120ms  3.53%  runtime.unlock2

If you cache the ctx.Done() value, though:

func BenchmarkContextCancelDone_Workaround(b *testing.B) {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()
        done := ctx.Done()
        b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                        select {
                        case <-done:
                        default:
                        }
                }
        })
}

Then it's much faster:

BenchmarkContextCancelDone
BenchmarkContextCancelDone-8                     8253436               164.1 ns/op
BenchmarkContextCancelDone_Workaround
BenchmarkContextCancelDone_Workaround-8         504514413                2.365 ns/op

Looks like this is due to @josharian's 986768d (context: lazily initialize cancelCtx done channel) 😄 which had good numbers, but didn't consider mutex contention.

@josharian elsewhere suggested we could switch cancelCtx to use an atomic.Value.

/cc @crawshaw


@bradfitz (Contributor) commented Jan 16, 2021

And fortunately #39351 (sync/atomic: add (*Value).Swap and (*Value).CompareAndSwap) was finally accepted after 5 years of requests (and has open CL https://go-review.googlesource.com/c/go/+/241678/), so this could use that.

I'll mark this as Go 1.17, with the assumption that that'll work out.


@bradfitz bradfitz removed this from the Backlog milestone Jan 16, 2021
@bradfitz bradfitz added this to the Go1.17 milestone Jan 16, 2021
@bradfitz (Contributor) commented Jan 16, 2021

(It'd probably need to CompareAndSwap with an old value of a package-unexported sentinel channel meaning "uninitialized", since CompareAndSwap per that proposal/CL doesn't accept a nil value for old.)


@gopherbot commented Jan 29, 2021

Change https://golang.org/cl/288193 mentions this issue: context: reduce contention in cancelCtx.Done


@josharian (Contributor) commented Jan 29, 2021

This doesn't need compare and swap; we already have a mutex available for other reasons to guard writes. I took a first stab in CL 288193, but apparently I screwed something up. Will aim to fix next week...
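The shape josharian describes (mutex-guarded writes, atomic fast path for readers) can be sketched as a standalone lazily initialized channel. This is an illustrative type (lazyChan), not the actual context package internals:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyChan creates its channel on first use. Readers hit an atomic.Value
// load on the fast path; the mutex serializes only the one-time
// initialization (and, in the real cancelCtx, other writes it already guards).
type lazyChan struct {
	mu   sync.Mutex   // guards stores into done
	done atomic.Value // holds a chan struct{} once initialized
}

func (l *lazyChan) Done() <-chan struct{} {
	// Fast path: already initialized, no lock taken.
	if d := l.done.Load(); d != nil {
		return d.(chan struct{})
	}
	// Slow path: lock, then re-check before storing (double-checked init).
	l.mu.Lock()
	defer l.mu.Unlock()
	if d := l.done.Load(); d != nil {
		return d.(chan struct{})
	}
	d := make(chan struct{})
	l.done.Store(d)
	return d
}

func main() {
	var l lazyChan
	a, b := l.Done(), l.Done()
	fmt.Println(a == b) // true: every caller sees the same channel
}
```

With this shape, concurrent Done() callers after initialization never touch the mutex, which is what removes the contention seen in the benchmark above.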


@gopherbot gopherbot closed this in ae1fa08 Feb 24, 2021