sync: RWMutex scales poorly with CPU count

On a machine with many cores, the performance of `sync.RWMutex.R{Lock,Unlock}` degrades dramatically as `GOMAXPROCS` increases.

This test program:

```go
package benchmarks_test

import (
	"fmt"
	"sync"
	"testing"
)

func BenchmarkRWMutex(b *testing.B) {
	for ng := 1; ng <= 256; ng <<= 2 {
		b.Run(fmt.Sprint(ng), func(b *testing.B) {
			var mu sync.RWMutex
			mu.Lock()

			var wg sync.WaitGroup
			wg.Add(ng)

			n := b.N
			quota := n / ng

			for g := ng; g > 0; g-- {
				if g == 1 {
					quota = n
				}

				go func(quota int) {
					for i := 0; i < quota; i++ {
						mu.RLock()
						mu.RUnlock()
					}
					wg.Done()
				}(quota)

				n -= quota
			}

			if n != 0 {
				b.Fatalf("Incorrect quota assignments: %v remaining", n)
			}

			b.StartTimer()
			mu.Unlock()
			wg.Wait()
			b.StopTimer()
		})
	}
}
```

degrades by a factor of 8x as it saturates threads and cores, presumably due to cache contention on &rw.readerCount:

```
# ./benchmarks.test -test.bench . -test.cpu 1,4,16,64
testing: warning: no tests to run
BenchmarkRWMutex/1      20000000                72.6 ns/op
BenchmarkRWMutex/1-4    20000000                72.4 ns/op
BenchmarkRWMutex/1-16   20000000                72.8 ns/op
BenchmarkRWMutex/1-64   20000000                72.5 ns/op
BenchmarkRWMutex/4      20000000                72.6 ns/op
BenchmarkRWMutex/4-4    20000000               105 ns/op
BenchmarkRWMutex/4-16   10000000               130 ns/op
BenchmarkRWMutex/4-64   20000000               160 ns/op
BenchmarkRWMutex/16     20000000                72.4 ns/op
BenchmarkRWMutex/16-4   10000000               125 ns/op
BenchmarkRWMutex/16-16  10000000               263 ns/op
BenchmarkRWMutex/16-64   5000000               287 ns/op
BenchmarkRWMutex/64     20000000                72.6 ns/op
BenchmarkRWMutex/64-4   10000000               137 ns/op
BenchmarkRWMutex/64-16   5000000               306 ns/op
BenchmarkRWMutex/64-64   3000000               517 ns/op
BenchmarkRWMutex/256                    20000000                72.4 ns/op
BenchmarkRWMutex/256-4                  20000000               137 ns/op
BenchmarkRWMutex/256-16                  5000000               280 ns/op
BenchmarkRWMutex/256-64                  3000000               602 ns/op
PASS
```

A "control" test, calling a no-op function instead of `RWMutex` methods, displays no such degradation: the problem does not appear to be due to runtime scheduling overhead.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

sync: RWMutex scales poorly with CPU count #17973

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

sync: RWMutex scales poorly with CPU count #17973

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions