
context: cancelCtx exclusive lock causes extreme contention #42564

Closed
niaow opened this issue Nov 12, 2020 · 8 comments

Comments

@niaow commented Nov 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niaow/.cache/go-build"
GOENV="/home/niaow/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/niaow/go/pkg/mod"
GONOPROXY="github.com/molecula"
GONOSUMDB="github.com/molecula"
GOOS="linux"
GOPATH="/home/niaow/go"
GOPRIVATE="github.com/molecula"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build497931594=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Passed a single context with cancellation to a bunch of goroutines.
These goroutines had a cold-path compute task, interlaced with calls to ctx.Err() to detect cancellation.
The loop looks something like:

var out []Thing
for iterator.Next() {
	if err := ctx.Err(); err != nil {
		// caller doesn't need a result anymore.
		return nil, err
	}

	// Fetch thing from iterator, apply some filtering functions, and append it to out.
}

What did you expect to see?

A bit of a slowdown from the context check maybe?

What did you see instead?

Slightly over 50% of the CPU time was spent in runtime.findrunnable. The cancelCtx struct uses a sync.Mutex, and due to extreme lock contention (64 CPU threads hammering it), this was triggering lockSlow. From poking at pprof, it appears that about 86% of CPU time was spent in functions related to this lock acquisition.

I was able to work around this by adding a counter and checking it less frequently. However, I do not think that this is an intended performance degradation path. Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.

@davecheney (Contributor) commented Nov 12, 2020

Thank you for raising this issue. Can you please provide more information: a sample program that demonstrates this or, if that is not possible, the pprof trace you used to identify the lock contention.

Thank you


@bcmills (Member) commented Nov 13, 2020

Theoretically this could be made more efficient with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.

If you have 64 hardware threads, sync.RWMutex is unlikely to do any better than sync.Mutex: you'll just transform the lock contention to cache contention. (See #17973.)


@muirdm commented Nov 13, 2020

As a workaround, you can guard the slow Err() call with a non-blocking read of Done():

done := ctx.Done()
for something {
  select {
  case <-done:
    return ctx.Err()
  default:
  }
}


@bradfitz (Contributor) commented Jan 16, 2021

FWIW, this just showed up in some of our profiles recently. It was only 1% of our overall CPU time, but I expected 0.

Here's a benchmark:

func BenchmarkContextCancelDone(b *testing.B) {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()
        b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                        select {
                        case <-ctx.Done():
                        default:
                        }
                }
        })
}

Profile:

dev:issue42564_cancelctx $ go tool pprof prof.cpu
File: issue42564_cancelctx.test
Type: cpu
Time: Jan 15, 2021 at 8:41pm (PST)
Duration: 1.51s, Total samples = 3.40s (225.88%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2630ms, 77.35% of 3400ms total
Dropped 12 nodes (cum <= 17ms)
Showing top 10 nodes out of 67
      flat  flat%   sum%        cum   cum%
     890ms 26.18% 26.18%     1280ms 37.65%  sync.(*Mutex).lockSlow
     590ms 17.35% 43.53%     2500ms 73.53%  context.(*cancelCtx).Done
     250ms  7.35% 50.88%      620ms 18.24%  sync.(*Mutex).Unlock (inline)
     220ms  6.47% 57.35%      220ms  6.47%  runtime.usleep
     150ms  4.41% 61.76%      150ms  4.41%  runtime.cansemacquire (inline)
     140ms  4.12% 65.88%      280ms  8.24%  runtime.lock2
     120ms  3.53% 69.41%      120ms  3.53%  runtime.futex
      90ms  2.65% 72.06%       90ms  2.65%  runtime.osyield
      90ms  2.65% 74.71%       90ms  2.65%  runtime.procyield
      90ms  2.65% 77.35%      120ms  3.53%  runtime.unlock2

If you cache the ctx.Done() value, though:

func BenchmarkContextCancelDone_Workaround(b *testing.B) {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()
        done := ctx.Done()
        b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                        select {
                        case <-done:
                        default:
                        }
                }
        })
}

Then it's much faster:

BenchmarkContextCancelDone
BenchmarkContextCancelDone-8                     8253436               164.1 ns/op
BenchmarkContextCancelDone_Workaround
BenchmarkContextCancelDone_Workaround-8         504514413                2.365 ns/op

Looks like this is due to @josharian's 986768d (context: lazily initialize cancelCtx done channel) 😄 which had good numbers, but didn't consider mutex contention.

@josharian elsewhere suggested we could switch cancelCtx to use an atomic.Value.

/cc @crawshaw


@bradfitz (Contributor) commented Jan 16, 2021

And fortunately #39351 (sync/atomic: add (*Value).Swap and (*Value).CompareAndSwap) was finally accepted after 5 years of requests (and has open CL https://go-review.googlesource.com/c/go/+/241678/), so this could use that.

I'll mark this as Go 1.17, with the assumption that that'll work out.


@bradfitz bradfitz removed this from the Backlog milestone Jan 16, 2021
@bradfitz bradfitz added this to the Go1.17 milestone Jan 16, 2021
@bradfitz (Contributor) commented Jan 16, 2021

(It'd probably need to CompareAndSwap with an old value of a package-unexported sentinel channel meaning "uninitialized", since CompareAndSwap per that proposal/CL doesn't accept a nil value for old.)


@gopherbot commented Jan 29, 2021

Change https://golang.org/cl/288193 mentions this issue: context: reduce contention in cancelCtx.Done


@josharian (Contributor) commented Jan 29, 2021

This doesn't need compare and swap; we already have a mutex available for other reasons to guard writes. I took a first stab in CL 288193, but apparently I screwed something up. Will aim to fix next week...
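The shape josharian describes (mutex-guarded writes, atomic fast path for readers) can be sketched as a standalone lazily initialized channel. This is an illustrative type (lazyChan), not the actual context package internals:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyChan creates its channel on first use. Readers hit an atomic.Value
// load on the fast path; the mutex serializes only the one-time
// initialization (and, in the real cancelCtx, other writes it already guards).
type lazyChan struct {
	mu   sync.Mutex   // guards stores into done
	done atomic.Value // holds a chan struct{} once initialized
}

func (l *lazyChan) Done() <-chan struct{} {
	// Fast path: already initialized, no lock taken.
	if d := l.done.Load(); d != nil {
		return d.(chan struct{})
	}
	// Slow path: lock, then re-check before storing (double-checked init).
	l.mu.Lock()
	defer l.mu.Unlock()
	if d := l.done.Load(); d != nil {
		return d.(chan struct{})
	}
	d := make(chan struct{})
	l.done.Store(d)
	return d
}

func main() {
	var l lazyChan
	a, b := l.Done(), l.Done()
	fmt.Println(a == b) // true: every caller sees the same channel
}
```

With this shape, concurrent Done() callers after initialization never touch the mutex, which is what removes the contention seen in the benchmark above.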


@gopherbot gopherbot closed this in ae1fa08 Feb 24, 2021