Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: RWMutex scales poorly with CPU count #17973

Open
bcmills opened this issue Nov 18, 2016 · 40 comments

Comments

@bcmills
Copy link
Member

commented Nov 18, 2016

On a machine with many cores, the performance of sync.RWMutex.R{Lock,Unlock} degrades dramatically as GOMAXPROCS increases.

This test program:

package benchmarks_test

import (
	"fmt"
	"sync"
	"testing"
)

func BenchmarkRWMutex(b *testing.B) {
	for ng := 1; ng <= 256; ng <<= 2 {
		b.Run(fmt.Sprint(ng), func(b *testing.B) {
			var mu sync.RWMutex
			mu.Lock()

			var wg sync.WaitGroup
			wg.Add(ng)

			n := b.N
			quota := n / ng

			for g := ng; g > 0; g-- {
				if g == 1 {
					quota = n
				}

				go func(quota int) {
					for i := 0; i < quota; i++ {
						mu.RLock()
						mu.RUnlock()
					}
					wg.Done()
				}(quota)

				n -= quota
			}

			if n != 0 {
				b.Fatalf("Incorrect quota assignments: %v remaining", n)
			}

			b.StartTimer()
			mu.Unlock()
			wg.Wait()
			b.StopTimer()
		})
	}
}

degrades by a factor of 8x as it saturates threads and cores, presumably due to cache contention on &rw.readerCount:

# ./benchmarks.test -test.bench . -test.cpu 1,4,16,64
testing: warning: no tests to run
BenchmarkRWMutex/1      20000000                72.6 ns/op
BenchmarkRWMutex/1-4    20000000                72.4 ns/op
BenchmarkRWMutex/1-16   20000000                72.8 ns/op
BenchmarkRWMutex/1-64   20000000                72.5 ns/op
BenchmarkRWMutex/4      20000000                72.6 ns/op
BenchmarkRWMutex/4-4    20000000               105 ns/op
BenchmarkRWMutex/4-16   10000000               130 ns/op
BenchmarkRWMutex/4-64   20000000               160 ns/op
BenchmarkRWMutex/16     20000000                72.4 ns/op
BenchmarkRWMutex/16-4   10000000               125 ns/op
BenchmarkRWMutex/16-16  10000000               263 ns/op
BenchmarkRWMutex/16-64   5000000               287 ns/op
BenchmarkRWMutex/64     20000000                72.6 ns/op
BenchmarkRWMutex/64-4   10000000               137 ns/op
BenchmarkRWMutex/64-16   5000000               306 ns/op
BenchmarkRWMutex/64-64   3000000               517 ns/op
BenchmarkRWMutex/256                    20000000                72.4 ns/op
BenchmarkRWMutex/256-4                  20000000               137 ns/op
BenchmarkRWMutex/256-16                  5000000               280 ns/op
BenchmarkRWMutex/256-64                  3000000               602 ns/op
PASS

A "control" test, calling a no-op function instead of RWMutex methods, displays no such degradation: the problem does not appear to be due to runtime scheduling overhead.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Nov 18, 2016

@josharian

This comment has been minimized.

Copy link
Contributor

commented Nov 18, 2016

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 18, 2016

It may be difficult to apply the algorithm described in that paper to our existing sync.RWMutex type. The algorithm requires an association between the read-lock operation and the read-unlock operation. It can be implemented by having read-lock/read-unlock always occur on the same thread or goroutine, or by having the read-lock operation return a pointer that is passed to the read-unlock operation. Basically the algorithm builds a tree to avoid contention, and requires each read-lock/read-unlock pair to operate on the same node of the tree.

It would be feasible to implement the algorithm as part of a new type, in which the read-lock operation returned a pointer to be passed to the read-unlock operation. I think that new type could be implemented entirely in terms of sync/atomic functions, sync.Mutex, and sync.Cond. That is, it doesn't seem to require any special relationship with the runtime package.

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 18, 2016

Locking per-P slot may be enough and is much simpler:
https://codereview.appspot.com/4850045/diff2/1:3001/src/pkg/co/distributedrwmutex.go

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 18, 2016

What happens when a goroutine moves to a different P between read-lock and read-unlock?

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 18, 2016

RLock must return a proxy object to unlock, that object must hold the locked P index.

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 18, 2016

The existing RWMutex API allows the RLock call to occur on a different goroutine from the RUnlock call. We can certainly assume that most RLock / RUnlock pairs occur on the same goroutine (and optimize for that case), but I think there needs to be a slow-path fallback for the general case.

(For example, you could envision an algorithm that attempts to unlock the slot for the current P, then falls back to a linear scan if the current P's slot wasn't already locked.)

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 18, 2016

At any rate: general application code can work around the problem (in part) by using per-goroutine or per-goroutine-pool caches rather than global caches shared throughout the process.

The bigger issue is that sync.RWMutex is used fairly extensively within the standard library for package-level locks (the various caches in reflect, http.statusMu, json.encoderCache, mime.mimeLock, etc.), so it's easy for programs to fall into contention traps and hard to apply workarounds without avoiding large portions of the standard library. For those use-cases, it might actually be feasible to switch to something with a different API (such as having RLock return an unlocker).

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 18, 2016

For these cases in std lib atomic.Value is much better fit. It is already used in json, gob and http. atomic.Value is perfectly scalable and virtually zero overhead for readers.

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 18, 2016

I agree in general, but it's not obvious to me how one could use atomic.Value to guard lookups in a map acting as a cache. (It's perfect for maps which do not change, but how would you add new entries to the caches with that approach?)

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 18, 2016

What kind of cache do you mean?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 18, 2016

E.g., the one in reflect.ptrTo.

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 18, 2016

See e.g. encoding/json/encode.go:cachedTypeFields

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Nov 18, 2016

Hmm... that trades a higher allocation rate (and O(N) insertion cost) in exchange for getting the lock out of the reader path. (It basically pushes the "read lock" out to the garbage collector.)

And since most of these maps are insert-only (never deleted from), you can at least suspect that the O(N) insert won't be a huge drag: if there were many inserts, the maps would end up enormously large.

It would be interesting to see whether the latency tradeoff favors the RWMutex overhead or the O(N) insert overhead for more of the standard library.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 18, 2016

@dvyukov Thanks. For the reflect.ptrTo case I wrote it up as https://golang.org/cl/33411. It needs some realistic benchmarks--microbenchmarks won't prove anything one way or another.

@quentinmit quentinmit added the NeedsFix label Nov 18, 2016

@quentinmit quentinmit added this to the Go1.8Maybe milestone Nov 18, 2016

@josharian

This comment has been minimized.

Copy link
Contributor

commented Nov 19, 2016

It would be feasible to implement the algorithm as part of a new type, in which the read-lock operation returned a pointer to be passed to the read-unlock operation. I think that new type could be implemented entirely in terms of sync/atomic functions, sync.Mutex, and sync.Cond.

Indeed. Seems like it might be worth an experiment. If nothing else, might end up being useful at go4.org.

For these cases in std lib atomic.Value is much better fit.

Not always. atomic.Value can end up being a lot more code and complication. See CL 2641 for a worked example. For low level performance critical things like reflect, I'm all for atomic.Value, but much of the rest of the standard library, it'd be nice to fix the scalability of RWMutex (or have a comparably easy to use alternative).

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 21, 2016

Note that in most of these cases insertions happen very, very infrequently. Only during server warmup when it receives a first request of a new type or something. While reads happen all the time. Also, no matter how scalable RWMutex is, it still blocks all readers during updates increasing latency and causing large overheads for blocking/unblocking.

For the reflect.ptrTo case I wrote it up as https://golang.org/cl/33411. It needs some realistic benchmarks--microbenchmarks won't prove anything one way or another.

Just benchmarked it on realistic benchmarks in my head. It is good :)

@dvyukov

This comment has been minimized.

Copy link
Member

commented Nov 21, 2016

See CL 2641 for a worked example.

I would not say that it is radically more code and complication. Provided that one does it right the first time, rather than do it non-scalable first and then refactor everything.

@rsc rsc modified the milestones: Go1.9Early, Go1.8Maybe Nov 21, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Nov 21, 2016

CL https://golang.org/cl/33411 mentions this issue.

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Dec 1, 2016

https://go-review.googlesource.com/#/c/33852/ has a draft for a more general API for maps of the sort used in the standard library; should I send that for review? (I'd put it in the x/sync repo for now so we can do some practical experiments.)

@davidlazar

This comment has been minimized.

Copy link
Member

commented Dec 2, 2016

@jonhoo built a more scalable RWMutex here: https://github.com/jonhoo/drwmutex/

@minux

This comment has been minimized.

Copy link
Member

commented Dec 3, 2016

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Dec 3, 2016

@minux That would be easy enough to fix by using runtime.NumCPU() instead.

@minux

This comment has been minimized.

Copy link
Member

commented Dec 3, 2016

@jonhoo

This comment has been minimized.

Copy link

commented Dec 3, 2016

@minux I did at some point have a benchmark running on a modified version of Go that used P instead of CPUID. Unfortunately, I can't find that code any more, but from memory it got strictly worse performance than the CPUID-based solution. The situation could of course be different now though.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/41930 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/41931 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/41990 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/41991 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 28, 2017
encoding/xml: replace tinfoMap RWMutex with sync.Map
This simplifies the code a bit and provides a modest speedup for
Marshal with many CPUs.

updates #17973
updates #18177

name          old time/op    new time/op    delta
Marshal         15.8µs ± 1%    15.9µs ± 1%   +0.67%  (p=0.021 n=8+7)
Marshal-6       5.76µs ±11%    5.17µs ± 2%  -10.36%  (p=0.002 n=8+8)
Marshal-48      9.88µs ± 5%    7.31µs ± 6%  -26.04%  (p=0.000 n=8+8)
Unmarshal       44.7µs ± 3%    45.1µs ± 5%     ~     (p=0.645 n=8+8)
Unmarshal-6     12.1µs ± 7%    11.8µs ± 8%     ~     (p=0.442 n=8+8)
Unmarshal-48    18.7µs ± 3%    18.2µs ± 4%     ~     (p=0.054 n=7+8)

name          old alloc/op   new alloc/op   delta
Marshal         5.78kB ± 0%    5.78kB ± 0%     ~     (all equal)
Marshal-6       5.78kB ± 0%    5.78kB ± 0%     ~     (all equal)
Marshal-48      5.78kB ± 0%    5.78kB ± 0%     ~     (all equal)
Unmarshal       8.58kB ± 0%    8.58kB ± 0%     ~     (all equal)
Unmarshal-6     8.58kB ± 0%    8.58kB ± 0%     ~     (all equal)
Unmarshal-48    8.58kB ± 0%    8.58kB ± 0%     ~     (p=1.000 n=8+8)

name          old allocs/op  new allocs/op  delta
Marshal           23.0 ± 0%      23.0 ± 0%     ~     (all equal)
Marshal-6         23.0 ± 0%      23.0 ± 0%     ~     (all equal)
Marshal-48        23.0 ± 0%      23.0 ± 0%     ~     (all equal)
Unmarshal          189 ± 0%       189 ± 0%     ~     (all equal)
Unmarshal-6        189 ± 0%       189 ± 0%     ~     (all equal)
Unmarshal-48       189 ± 0%       189 ± 0%     ~     (all equal)

https://perf.golang.org/search?q=upload:20170427.5

Change-Id: I4ee95a99540d3e4e47e056fff18357efd2cd340a
Reviewed-on: https://go-review.googlesource.com/41991
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Apr 28, 2017

CL https://golang.org/cl/42110 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 28, 2017
encoding/json: replace encoderCache RWMutex with a sync.Map
This provides a moderate speedup for encoding when using many CPU cores.

name                    old time/op    new time/op    delta
CodeEncoder               14.1ms ±10%    13.5ms ± 4%      ~     (p=0.867 n=8+7)
CodeEncoder-6             2.58ms ± 8%    2.72ms ± 6%      ~     (p=0.065 n=8+8)
CodeEncoder-48             629µs ± 1%     629µs ± 1%      ~     (p=0.867 n=8+7)
CodeMarshal               14.9ms ± 5%    14.9ms ± 5%      ~     (p=0.721 n=8+8)
CodeMarshal-6             3.28ms ±11%    3.24ms ±12%      ~     (p=0.798 n=8+8)
CodeMarshal-48             739µs ± 1%     745µs ± 2%      ~     (p=0.328 n=8+8)
CodeDecoder               49.7ms ± 4%    49.2ms ± 4%      ~     (p=0.463 n=7+8)
CodeDecoder-6             10.1ms ± 8%    10.4ms ± 3%      ~     (p=0.232 n=7+8)
CodeDecoder-48            2.60ms ± 3%    2.61ms ± 2%      ~     (p=1.000 n=8+8)
DecoderStream              352ns ± 5%     344ns ± 4%      ~     (p=0.077 n=8+8)
DecoderStream-6            485ns ± 8%     503ns ± 6%      ~     (p=0.123 n=8+8)
DecoderStream-48           522ns ± 7%     520ns ± 5%      ~     (p=0.959 n=8+8)
CodeUnmarshal             52.2ms ± 5%    54.4ms ±18%      ~     (p=0.955 n=7+8)
CodeUnmarshal-6           12.4ms ± 6%    12.3ms ± 6%      ~     (p=0.878 n=8+8)
CodeUnmarshal-48          3.46ms ± 7%    3.40ms ± 9%      ~     (p=0.442 n=8+8)
CodeUnmarshalReuse        48.9ms ± 6%    50.3ms ± 7%      ~     (p=0.279 n=8+8)
CodeUnmarshalReuse-6      10.3ms ±11%    10.3ms ±10%      ~     (p=0.959 n=8+8)
CodeUnmarshalReuse-48     2.68ms ± 3%    2.67ms ± 4%      ~     (p=0.878 n=8+8)
UnmarshalString            476ns ± 7%     474ns ± 7%      ~     (p=0.644 n=8+8)
UnmarshalString-6          164ns ± 9%     160ns ±10%      ~     (p=0.556 n=8+8)
UnmarshalString-48         181ns ± 0%     177ns ± 2%    -2.36%  (p=0.001 n=7+7)
UnmarshalFloat64           414ns ± 4%     418ns ± 4%      ~     (p=0.382 n=8+8)
UnmarshalFloat64-6         147ns ± 9%     143ns ±16%      ~     (p=0.457 n=8+8)
UnmarshalFloat64-48        176ns ± 2%     174ns ± 2%      ~     (p=0.118 n=8+8)
UnmarshalInt64             369ns ± 4%     354ns ± 1%    -3.85%  (p=0.005 n=8+7)
UnmarshalInt64-6           132ns ±11%     132ns ±10%      ~     (p=0.982 n=8+8)
UnmarshalInt64-48          177ns ± 3%     174ns ± 2%    -1.84%  (p=0.028 n=8+7)
Issue10335                 540ns ± 5%     535ns ± 0%      ~     (p=0.330 n=7+7)
Issue10335-6               159ns ± 8%     164ns ± 8%      ~     (p=0.246 n=8+8)
Issue10335-48              186ns ± 1%     182ns ± 2%    -1.89%  (p=0.010 n=8+8)
Unmapped                  1.74µs ± 2%    1.76µs ± 6%      ~     (p=0.181 n=6+8)
Unmapped-6                 414ns ± 5%     402ns ±10%      ~     (p=0.244 n=7+8)
Unmapped-48                226ns ± 2%     224ns ± 2%      ~     (p=0.144 n=7+8)
NumberIsValid             20.1ns ± 4%    19.7ns ± 3%      ~     (p=0.204 n=8+8)
NumberIsValid-6           20.4ns ± 8%    22.2ns ±16%      ~     (p=0.129 n=7+8)
NumberIsValid-48          23.1ns ±12%    23.8ns ± 8%      ~     (p=0.104 n=8+8)
NumberIsValidRegexp        629ns ± 5%     622ns ± 0%      ~     (p=0.148 n=7+7)
NumberIsValidRegexp-6      757ns ± 2%     725ns ±14%      ~     (p=0.351 n=8+7)
NumberIsValidRegexp-48     757ns ± 2%     723ns ±13%      ~     (p=0.521 n=8+8)
SkipValue                 13.2ms ± 9%    13.3ms ± 1%      ~     (p=0.130 n=8+8)
SkipValue-6               15.1ms ±10%    14.8ms ± 2%      ~     (p=0.397 n=7+8)
SkipValue-48              13.9ms ±12%    14.3ms ± 1%      ~     (p=0.694 n=8+7)
EncoderEncode              433ns ± 4%     410ns ± 3%    -5.48%  (p=0.001 n=8+8)
EncoderEncode-6            221ns ±15%      75ns ± 5%   -66.15%  (p=0.000 n=7+8)
EncoderEncode-48           161ns ± 4%      19ns ± 7%   -88.29%  (p=0.000 n=7+8)

name                    old speed      new speed      delta
CodeEncoder              139MB/s ±10%   144MB/s ± 4%      ~     (p=0.844 n=8+7)
CodeEncoder-6            756MB/s ± 8%   714MB/s ± 6%      ~     (p=0.065 n=8+8)
CodeEncoder-48          3.08GB/s ± 1%  3.09GB/s ± 1%      ~     (p=0.867 n=8+7)
CodeMarshal              130MB/s ± 5%   130MB/s ± 5%      ~     (p=0.721 n=8+8)
CodeMarshal-6            594MB/s ±10%   601MB/s ±11%      ~     (p=0.798 n=8+8)
CodeMarshal-48          2.62GB/s ± 1%  2.60GB/s ± 2%      ~     (p=0.328 n=8+8)
CodeDecoder             39.0MB/s ± 4%  39.5MB/s ± 4%      ~     (p=0.463 n=7+8)
CodeDecoder-6            189MB/s ±13%   187MB/s ± 3%      ~     (p=0.505 n=8+8)
CodeDecoder-48           746MB/s ± 2%   745MB/s ± 2%      ~     (p=1.000 n=8+8)
CodeUnmarshal           37.2MB/s ± 5%  35.9MB/s ±16%      ~     (p=0.955 n=7+8)
CodeUnmarshal-6          157MB/s ± 6%   158MB/s ± 6%      ~     (p=0.878 n=8+8)
CodeUnmarshal-48         561MB/s ± 7%   572MB/s ±10%      ~     (p=0.442 n=8+8)
SkipValue                141MB/s ±10%   139MB/s ± 1%      ~     (p=0.130 n=8+8)
SkipValue-6              131MB/s ± 3%   133MB/s ± 2%      ~     (p=0.662 n=6+8)
SkipValue-48             138MB/s ±11%   132MB/s ± 1%      ~     (p=0.281 n=8+7)

name                    old alloc/op   new alloc/op   delta
CodeEncoder               45.9kB ± 0%    45.9kB ± 0%    -0.02%  (p=0.002 n=7+8)
CodeEncoder-6             55.1kB ± 0%    55.1kB ± 0%    -0.01%  (p=0.002 n=7+8)
CodeEncoder-48             110kB ± 0%     110kB ± 0%    -0.00%  (p=0.030 n=7+8)
CodeMarshal               4.59MB ± 0%    4.59MB ± 0%    -0.00%  (p=0.000 n=8+8)
CodeMarshal-6             4.59MB ± 0%    4.59MB ± 0%    -0.00%  (p=0.000 n=8+8)
CodeMarshal-48            4.59MB ± 0%    4.59MB ± 0%    -0.00%  (p=0.001 n=7+8)
CodeDecoder               2.28MB ± 5%    2.21MB ± 0%      ~     (p=0.257 n=8+7)
CodeDecoder-6             2.43MB ±11%    2.51MB ± 0%      ~     (p=0.473 n=8+8)
CodeDecoder-48            2.93MB ± 0%    2.93MB ± 0%      ~     (p=0.554 n=7+8)
DecoderStream              16.0B ± 0%     16.0B ± 0%      ~     (all equal)
DecoderStream-6            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
DecoderStream-48           16.0B ± 0%     16.0B ± 0%      ~     (all equal)
CodeUnmarshal             3.28MB ± 0%    3.28MB ± 0%      ~     (p=1.000 n=7+7)
CodeUnmarshal-6           3.28MB ± 0%    3.28MB ± 0%      ~     (p=0.593 n=8+8)
CodeUnmarshal-48          3.28MB ± 0%    3.28MB ± 0%      ~     (p=0.670 n=8+8)
CodeUnmarshalReuse        1.87MB ± 0%    1.88MB ± 1%    +0.48%  (p=0.011 n=7+8)
CodeUnmarshalReuse-6      1.90MB ± 1%    1.90MB ± 1%      ~     (p=0.589 n=8+8)
CodeUnmarshalReuse-48     1.96MB ± 0%    1.96MB ± 0%    +0.00%  (p=0.002 n=7+8)
UnmarshalString             304B ± 0%      304B ± 0%      ~     (all equal)
UnmarshalString-6           304B ± 0%      304B ± 0%      ~     (all equal)
UnmarshalString-48          304B ± 0%      304B ± 0%      ~     (all equal)
UnmarshalFloat64            292B ± 0%      292B ± 0%      ~     (all equal)
UnmarshalFloat64-6          292B ± 0%      292B ± 0%      ~     (all equal)
UnmarshalFloat64-48         292B ± 0%      292B ± 0%      ~     (all equal)
UnmarshalInt64              289B ± 0%      289B ± 0%      ~     (all equal)
UnmarshalInt64-6            289B ± 0%      289B ± 0%      ~     (all equal)
UnmarshalInt64-48           289B ± 0%      289B ± 0%      ~     (all equal)
Issue10335                  312B ± 0%      312B ± 0%      ~     (all equal)
Issue10335-6                312B ± 0%      312B ± 0%      ~     (all equal)
Issue10335-48               312B ± 0%      312B ± 0%      ~     (all equal)
Unmapped                    344B ± 0%      344B ± 0%      ~     (all equal)
Unmapped-6                  344B ± 0%      344B ± 0%      ~     (all equal)
Unmapped-48                 344B ± 0%      344B ± 0%      ~     (all equal)
NumberIsValid              0.00B          0.00B           ~     (all equal)
NumberIsValid-6            0.00B          0.00B           ~     (all equal)
NumberIsValid-48           0.00B          0.00B           ~     (all equal)
NumberIsValidRegexp        0.00B          0.00B           ~     (all equal)
NumberIsValidRegexp-6      0.00B          0.00B           ~     (all equal)
NumberIsValidRegexp-48     0.00B          0.00B           ~     (all equal)
SkipValue                  0.00B          0.00B           ~     (all equal)
SkipValue-6                0.00B          0.00B           ~     (all equal)
SkipValue-48              15.0B ±167%      0.0B           ~     (p=0.200 n=8+8)
EncoderEncode              8.00B ± 0%     0.00B       -100.00%  (p=0.000 n=8+8)
EncoderEncode-6            8.00B ± 0%     0.00B       -100.00%  (p=0.000 n=8+8)
EncoderEncode-48           8.00B ± 0%     0.00B       -100.00%  (p=0.000 n=8+8)

name                    old allocs/op  new allocs/op  delta
CodeEncoder                 1.00 ± 0%      0.00       -100.00%  (p=0.000 n=8+8)
CodeEncoder-6               1.00 ± 0%      0.00       -100.00%  (p=0.000 n=8+8)
CodeEncoder-48              1.00 ± 0%      0.00       -100.00%  (p=0.000 n=8+8)
CodeMarshal                 17.0 ± 0%      16.0 ± 0%    -5.88%  (p=0.000 n=8+8)
CodeMarshal-6               17.0 ± 0%      16.0 ± 0%    -5.88%  (p=0.000 n=8+8)
CodeMarshal-48              17.0 ± 0%      16.0 ± 0%    -5.88%  (p=0.000 n=8+8)
CodeDecoder                89.6k ± 0%     89.5k ± 0%      ~     (p=0.154 n=8+7)
CodeDecoder-6              89.8k ± 0%     89.9k ± 0%      ~     (p=0.467 n=8+8)
CodeDecoder-48             90.5k ± 0%     90.5k ± 0%      ~     (p=0.533 n=8+7)
DecoderStream               2.00 ± 0%      2.00 ± 0%      ~     (all equal)
DecoderStream-6             2.00 ± 0%      2.00 ± 0%      ~     (all equal)
DecoderStream-48            2.00 ± 0%      2.00 ± 0%      ~     (all equal)
CodeUnmarshal               105k ± 0%      105k ± 0%      ~     (all equal)
CodeUnmarshal-6             105k ± 0%      105k ± 0%      ~     (all equal)
CodeUnmarshal-48            105k ± 0%      105k ± 0%      ~     (all equal)
CodeUnmarshalReuse         89.5k ± 0%     89.6k ± 0%      ~     (p=0.246 n=7+8)
CodeUnmarshalReuse-6       89.8k ± 0%     89.8k ± 0%      ~     (p=1.000 n=8+8)
CodeUnmarshalReuse-48      90.5k ± 0%     90.5k ± 0%      ~     (all equal)
UnmarshalString             2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalString-6           2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalString-48          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalFloat64            2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalFloat64-6          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalFloat64-48         2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalInt64              2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalInt64-6            2.00 ± 0%      2.00 ± 0%      ~     (all equal)
UnmarshalInt64-48           2.00 ± 0%      2.00 ± 0%      ~     (all equal)
Issue10335                  3.00 ± 0%      3.00 ± 0%      ~     (all equal)
Issue10335-6                3.00 ± 0%      3.00 ± 0%      ~     (all equal)
Issue10335-48               3.00 ± 0%      3.00 ± 0%      ~     (all equal)
Unmapped                    4.00 ± 0%      4.00 ± 0%      ~     (all equal)
Unmapped-6                  4.00 ± 0%      4.00 ± 0%      ~     (all equal)
Unmapped-48                 4.00 ± 0%      4.00 ± 0%      ~     (all equal)
NumberIsValid               0.00           0.00           ~     (all equal)
NumberIsValid-6             0.00           0.00           ~     (all equal)
NumberIsValid-48            0.00           0.00           ~     (all equal)
NumberIsValidRegexp         0.00           0.00           ~     (all equal)
NumberIsValidRegexp-6       0.00           0.00           ~     (all equal)
NumberIsValidRegexp-48      0.00           0.00           ~     (all equal)
SkipValue                   0.00           0.00           ~     (all equal)
SkipValue-6                 0.00           0.00           ~     (all equal)
SkipValue-48                0.00           0.00           ~     (all equal)
EncoderEncode               1.00 ± 0%      0.00       -100.00%  (p=0.000 n=8+8)
EncoderEncode-6             1.00 ± 0%      0.00       -100.00%  (p=0.000 n=8+8)
EncoderEncode-48            1.00 ± 0%      0.00       -100.00%  (p=0.000 n=8+8)

https://perf.golang.org/search?q=upload:20170427.2

updates #17973
updates #18177

Change-Id: I5881c7a2bfad1766e6aa3444bb630883e0be467b
Reviewed-on: https://go-review.googlesource.com/41931
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
gopherbot pushed a commit that referenced this issue Apr 28, 2017
mime: use sync.Map instead of RWMutex for type lookups
This provides a significant speedup for TypeByExtension and
ExtensionsByType when using many CPU cores.

updates #17973
updates #18177

name                                          old time/op    new time/op    delta
QEncodeWord                                      526ns ± 3%     525ns ± 3%     ~     (p=0.990 n=15+28)
QEncodeWord-6                                    945ns ± 7%     913ns ±20%     ~     (p=0.220 n=14+28)
QEncodeWord-48                                  1.02µs ± 2%    1.00µs ± 6%   -2.22%  (p=0.036 n=13+27)
QDecodeWord                                      311ns ±18%     323ns ±20%     ~     (p=0.107 n=16+28)
QDecodeWord-6                                    595ns ±12%     612ns ±11%     ~     (p=0.093 n=15+27)
QDecodeWord-48                                   592ns ± 6%     606ns ± 8%   +2.39%  (p=0.045 n=16+26)
QDecodeHeader                                    389ns ± 4%     394ns ± 8%     ~     (p=0.161 n=12+26)
QDecodeHeader-6                                  685ns ±12%     674ns ±20%     ~     (p=0.773 n=14+27)
QDecodeHeader-48                                 658ns ±13%     669ns ±14%     ~     (p=0.457 n=16+28)
TypeByExtension/.html                           77.4ns ±15%    55.5ns ±13%  -28.35%  (p=0.000 n=8+8)
TypeByExtension/.html-6                          263ns ± 9%      10ns ±21%  -96.29%  (p=0.000 n=8+8)
TypeByExtension/.html-48                         175ns ± 5%       2ns ±16%  -98.88%  (p=0.000 n=8+8)
TypeByExtension/.HTML                            113ns ± 6%      97ns ± 6%  -14.37%  (p=0.000 n=8+8)
TypeByExtension/.HTML-6                          273ns ± 7%      17ns ± 4%  -93.93%  (p=0.000 n=7+8)
TypeByExtension/.HTML-48                         175ns ± 4%       4ns ± 4%  -97.73%  (p=0.000 n=8+8)
TypeByExtension/.unused                          116ns ± 4%      90ns ± 4%  -22.89%  (p=0.001 n=7+7)
TypeByExtension/.unused-6                        262ns ± 5%      15ns ± 4%  -94.17%  (p=0.000 n=8+8)
TypeByExtension/.unused-48                       176ns ± 4%       3ns ±10%  -98.10%  (p=0.000 n=8+8)
ExtensionsByType/text/html                       630ns ± 5%     522ns ± 5%  -17.19%  (p=0.000 n=8+7)
ExtensionsByType/text/html-6                     314ns ±20%     136ns ± 6%  -56.80%  (p=0.000 n=8+8)
ExtensionsByType/text/html-48                    298ns ± 4%     104ns ± 6%  -65.06%  (p=0.000 n=8+8)
ExtensionsByType/text/html;_charset=utf-8       1.12µs ± 3%    1.05µs ± 7%   -6.19%  (p=0.004 n=8+7)
ExtensionsByType/text/html;_charset=utf-8-6      402ns ±11%     307ns ± 4%  -23.77%  (p=0.000 n=8+8)
ExtensionsByType/text/html;_charset=utf-8-48     422ns ± 3%     309ns ± 4%  -26.86%  (p=0.000 n=8+8)
ExtensionsByType/application/octet-stream        810ns ± 2%     747ns ± 5%   -7.74%  (p=0.000 n=8+8)
ExtensionsByType/application/octet-stream-6      289ns ± 9%     185ns ± 8%  -36.15%  (p=0.000 n=7+8)
ExtensionsByType/application/octet-stream-48     267ns ± 6%      94ns ± 2%  -64.91%  (p=0.000 n=8+7)

name                                          old alloc/op   new alloc/op   delta
QEncodeWord                                      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QEncodeWord-6                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QEncodeWord-48                                   48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QDecodeWord                                      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QDecodeWord-6                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QDecodeWord-48                                   48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QDecodeHeader                                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QDecodeHeader-6                                  48.0B ± 0%     48.0B ± 0%     ~     (all equal)
QDecodeHeader-48                                 48.0B ± 0%     48.0B ± 0%     ~     (all equal)
TypeByExtension/.html                            0.00B          0.00B          ~     (all equal)
TypeByExtension/.html-6                          0.00B          0.00B          ~     (all equal)
TypeByExtension/.html-48                         0.00B          0.00B          ~     (all equal)
TypeByExtension/.HTML                            0.00B          0.00B          ~     (all equal)
TypeByExtension/.HTML-6                          0.00B          0.00B          ~     (all equal)
TypeByExtension/.HTML-48                         0.00B          0.00B          ~     (all equal)
TypeByExtension/.unused                          0.00B          0.00B          ~     (all equal)
TypeByExtension/.unused-6                        0.00B          0.00B          ~     (all equal)
TypeByExtension/.unused-48                       0.00B          0.00B          ~     (all equal)
ExtensionsByType/text/html                        192B ± 0%      176B ± 0%   -8.33%  (p=0.000 n=8+8)
ExtensionsByType/text/html-6                      192B ± 0%      176B ± 0%   -8.33%  (p=0.000 n=8+8)
ExtensionsByType/text/html-48                     192B ± 0%      176B ± 0%   -8.33%  (p=0.000 n=8+8)
ExtensionsByType/text/html;_charset=utf-8         480B ± 0%      464B ± 0%   -3.33%  (p=0.000 n=8+8)
ExtensionsByType/text/html;_charset=utf-8-6       480B ± 0%      464B ± 0%   -3.33%  (p=0.000 n=8+8)
ExtensionsByType/text/html;_charset=utf-8-48      480B ± 0%      464B ± 0%   -3.33%  (p=0.000 n=8+8)
ExtensionsByType/application/octet-stream         160B ± 0%      160B ± 0%     ~     (all equal)
ExtensionsByType/application/octet-stream-6       160B ± 0%      160B ± 0%     ~     (all equal)
ExtensionsByType/application/octet-stream-48      160B ± 0%      160B ± 0%     ~     (all equal)

name                                          old allocs/op  new allocs/op  delta
QEncodeWord                                       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
QEncodeWord-6                                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
QEncodeWord-48                                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
QDecodeWord                                       2.00 ± 0%      2.00 ± 0%     ~     (all equal)
QDecodeWord-6                                     2.00 ± 0%      2.00 ± 0%     ~     (all equal)
QDecodeWord-48                                    2.00 ± 0%      2.00 ± 0%     ~     (all equal)
QDecodeHeader                                     2.00 ± 0%      2.00 ± 0%     ~     (all equal)
QDecodeHeader-6                                   2.00 ± 0%      2.00 ± 0%     ~     (all equal)
QDecodeHeader-48                                  2.00 ± 0%      2.00 ± 0%     ~     (all equal)
TypeByExtension/.html                             0.00           0.00          ~     (all equal)
TypeByExtension/.html-6                           0.00           0.00          ~     (all equal)
TypeByExtension/.html-48                          0.00           0.00          ~     (all equal)
TypeByExtension/.HTML                             0.00           0.00          ~     (all equal)
TypeByExtension/.HTML-6                           0.00           0.00          ~     (all equal)
TypeByExtension/.HTML-48                          0.00           0.00          ~     (all equal)
TypeByExtension/.unused                           0.00           0.00          ~     (all equal)
TypeByExtension/.unused-6                         0.00           0.00          ~     (all equal)
TypeByExtension/.unused-48                        0.00           0.00          ~     (all equal)
ExtensionsByType/text/html                        3.00 ± 0%      3.00 ± 0%     ~     (all equal)
ExtensionsByType/text/html-6                      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
ExtensionsByType/text/html-48                     3.00 ± 0%      3.00 ± 0%     ~     (all equal)
ExtensionsByType/text/html;_charset=utf-8         4.00 ± 0%      4.00 ± 0%     ~     (all equal)
ExtensionsByType/text/html;_charset=utf-8-6       4.00 ± 0%      4.00 ± 0%     ~     (all equal)
ExtensionsByType/text/html;_charset=utf-8-48      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
ExtensionsByType/application/octet-stream         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
ExtensionsByType/application/octet-stream-6       2.00 ± 0%      2.00 ± 0%     ~     (all equal)
ExtensionsByType/application/octet-stream-48      2.00 ± 0%      2.00 ± 0%     ~     (all equal)

https://perf.golang.org/search?q=upload:20170427.4

Change-Id: I35438be087ad6eb3d5da9119b395723ea5babaf6
Reviewed-on: https://go-review.googlesource.com/41990
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
gopherbot pushed a commit that referenced this issue Apr 28, 2017
expvar: replace RWMutex usage with sync.Map and atomics
Int and Float already used atomics.

When many goroutines on many CPUs concurrently update a StringSet or a
Map with different keys per goroutine, this change results in dramatic
steady-state speedups.

This change does add some overhead for single-CPU and ephemeral maps.
I believe that is mostly due to an increase in allocations per call
(to pack the map keys and values into interface{} values that may
escape into the heap). With better inlining and/or escape analysis,
the single-CPU penalty may decline somewhat.

There are still two RWMutexes in the package: one for the keys in the
global "vars" map, and one for the keys in individual Map variables.

Those RWMutexes could also be eliminated, but avoiding excessive
allocations when adding new keys would require care. The remaining
RWMutexes are only acquired in Do functions, which I believe are not
typically on the fast path.

updates #17973
updates #18177

name             old time/op    new time/op    delta
StringSet          65.9ns ± 8%    55.7ns ± 1%   -15.46%  (p=0.000 n=8+7)
StringSet-6         416ns ±22%     127ns ±19%   -69.37%  (p=0.000 n=8+8)
StringSet-48        309ns ± 8%      94ns ± 3%   -69.43%  (p=0.001 n=7+7)

name             old alloc/op   new alloc/op   delta
StringSet           0.00B         16.00B ± 0%     +Inf%  (p=0.000 n=8+8)
StringSet-6         0.00B         16.00B ± 0%     +Inf%  (p=0.000 n=8+8)
StringSet-48        0.00B         16.00B ± 0%     +Inf%  (p=0.000 n=8+8)

name             old allocs/op  new allocs/op  delta
StringSet            0.00           1.00 ± 0%     +Inf%  (p=0.000 n=8+8)
StringSet-6          0.00           1.00 ± 0%     +Inf%  (p=0.000 n=8+8)
StringSet-48         0.00           1.00 ± 0%     +Inf%  (p=0.000 n=8+8)

https://perf.golang.org/search?q=upload:20170427.3

name                           old time/op    new time/op    delta
IntAdd                           5.64ns ± 3%    5.58ns ± 1%      ~     (p=0.185 n=8+8)
IntAdd-6                         18.6ns ±32%    21.4ns ±21%      ~     (p=0.078 n=8+8)
IntAdd-48                        19.6ns ±13%    20.6ns ±19%      ~     (p=0.702 n=8+8)
IntSet                           5.50ns ± 1%    5.48ns ± 0%      ~     (p=0.222 n=7+8)
IntSet-6                         18.5ns ±16%    20.4ns ±30%      ~     (p=0.314 n=8+8)
IntSet-48                        19.7ns ±12%    20.4ns ±16%      ~     (p=0.522 n=8+8)
FloatAdd                         14.5ns ± 1%    14.6ns ± 2%      ~     (p=0.237 n=7+8)
FloatAdd-6                       69.9ns ±13%    68.4ns ± 7%      ~     (p=0.557 n=7+7)
FloatAdd-48                       110ns ± 9%     109ns ± 6%      ~     (p=0.667 n=8+8)
FloatSet                         7.62ns ± 3%    7.64ns ± 5%      ~     (p=0.939 n=8+8)
FloatSet-6                       20.7ns ±22%    21.0ns ±23%      ~     (p=0.959 n=8+8)
FloatSet-48                      20.4ns ±24%    20.8ns ±19%      ~     (p=0.899 n=8+8)
MapSet                           88.1ns ±15%   200.9ns ± 7%  +128.11%  (p=0.000 n=8+8)
MapSet-6                          453ns ±12%     202ns ± 8%   -55.43%  (p=0.000 n=8+8)
MapSet-48                         432ns ±12%     240ns ±15%   -44.49%  (p=0.000 n=8+8)
MapSetDifferent                   349ns ± 1%     876ns ± 2%  +151.08%  (p=0.001 n=6+7)
MapSetDifferent-6                1.74µs ±32%    0.25µs ±17%   -85.71%  (p=0.000 n=8+8)
MapSetDifferent-48               1.77µs ±10%    0.14µs ± 2%   -91.84%  (p=0.000 n=8+8)
MapSetString                     88.1ns ± 7%   205.3ns ± 5%  +132.98%  (p=0.001 n=7+7)
MapSetString-6                    438ns ±30%     205ns ± 9%   -53.15%  (p=0.000 n=8+8)
MapSetString-48                   419ns ±14%     241ns ±15%   -42.39%  (p=0.000 n=8+8)
MapAddSame                        686ns ± 9%    1010ns ± 5%   +47.41%  (p=0.000 n=8+8)
MapAddSame-6                      238ns ±10%     300ns ±11%   +26.22%  (p=0.000 n=8+8)
MapAddSame-48                     366ns ± 4%     483ns ± 3%   +32.06%  (p=0.000 n=8+8)
MapAddDifferent                  1.96µs ± 4%    3.24µs ± 6%   +65.58%  (p=0.000 n=8+8)
MapAddDifferent-6                 553ns ± 3%     948ns ± 8%   +71.43%  (p=0.000 n=7+8)
MapAddDifferent-48                548ns ± 4%    1242ns ±10%  +126.81%  (p=0.000 n=8+8)
MapAddSameSteadyState            31.5ns ± 7%    41.7ns ± 6%   +32.61%  (p=0.000 n=8+8)
MapAddSameSteadyState-6           239ns ± 7%     101ns ±30%   -57.53%  (p=0.000 n=7+8)
MapAddSameSteadyState-48          152ns ± 4%      85ns ±13%   -43.84%  (p=0.000 n=8+7)
MapAddDifferentSteadyState        151ns ± 5%     177ns ± 1%   +17.32%  (p=0.001 n=8+6)
MapAddDifferentSteadyState-6      861ns ±15%      62ns ±23%   -92.85%  (p=0.000 n=8+8)
MapAddDifferentSteadyState-48     617ns ± 2%      20ns ±14%   -96.75%  (p=0.000 n=8+8)
RealworldExpvarUsage             4.33µs ± 4%    4.48µs ± 6%      ~     (p=0.336 n=8+7)
RealworldExpvarUsage-6           2.12µs ±20%    2.28µs ±10%      ~     (p=0.228 n=8+6)
RealworldExpvarUsage-48          1.23µs ±19%    1.36µs ±16%      ~     (p=0.152 n=7+8)

name                           old alloc/op   new alloc/op   delta
IntAdd                            0.00B          0.00B           ~     (all equal)
IntAdd-6                          0.00B          0.00B           ~     (all equal)
IntAdd-48                         0.00B          0.00B           ~     (all equal)
IntSet                            0.00B          0.00B           ~     (all equal)
IntSet-6                          0.00B          0.00B           ~     (all equal)
IntSet-48                         0.00B          0.00B           ~     (all equal)
FloatAdd                          0.00B          0.00B           ~     (all equal)
FloatAdd-6                        0.00B          0.00B           ~     (all equal)
FloatAdd-48                       0.00B          0.00B           ~     (all equal)
FloatSet                          0.00B          0.00B           ~     (all equal)
FloatSet-6                        0.00B          0.00B           ~     (all equal)
FloatSet-48                       0.00B          0.00B           ~     (all equal)
MapSet                            0.00B         48.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSet-6                          0.00B         48.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSet-48                         0.00B         48.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetDifferent                   0.00B        192.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetDifferent-6                 0.00B        192.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetDifferent-48                0.00B        192.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetString                      0.00B         48.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetString-6                    0.00B         48.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetString-48                   0.00B         48.00B ± 0%     +Inf%  (p=0.000 n=8+8)
MapAddSame                         456B ± 0%      480B ± 0%    +5.26%  (p=0.000 n=8+8)
MapAddSame-6                       456B ± 0%      480B ± 0%    +5.26%  (p=0.000 n=8+8)
MapAddSame-48                      456B ± 0%      480B ± 0%    +5.26%  (p=0.000 n=8+8)
MapAddDifferent                    672B ± 0%     1088B ± 0%   +61.90%  (p=0.000 n=8+8)
MapAddDifferent-6                  672B ± 0%     1088B ± 0%   +61.90%  (p=0.000 n=8+8)
MapAddDifferent-48                 672B ± 0%     1088B ± 0%   +61.90%  (p=0.000 n=8+8)
MapAddSameSteadyState             0.00B          0.00B           ~     (all equal)
MapAddSameSteadyState-6           0.00B          0.00B           ~     (all equal)
MapAddSameSteadyState-48          0.00B          0.00B           ~     (all equal)
MapAddDifferentSteadyState        0.00B          0.00B           ~     (all equal)
MapAddDifferentSteadyState-6      0.00B          0.00B           ~     (all equal)
MapAddDifferentSteadyState-48     0.00B          0.00B           ~     (all equal)
RealworldExpvarUsage              0.00B          0.00B           ~     (all equal)
RealworldExpvarUsage-6            0.00B          0.00B           ~     (all equal)
RealworldExpvarUsage-48           0.00B          0.00B           ~     (all equal)

name                           old allocs/op  new allocs/op  delta
IntAdd                             0.00           0.00           ~     (all equal)
IntAdd-6                           0.00           0.00           ~     (all equal)
IntAdd-48                          0.00           0.00           ~     (all equal)
IntSet                             0.00           0.00           ~     (all equal)
IntSet-6                           0.00           0.00           ~     (all equal)
IntSet-48                          0.00           0.00           ~     (all equal)
FloatAdd                           0.00           0.00           ~     (all equal)
FloatAdd-6                         0.00           0.00           ~     (all equal)
FloatAdd-48                        0.00           0.00           ~     (all equal)
FloatSet                           0.00           0.00           ~     (all equal)
FloatSet-6                         0.00           0.00           ~     (all equal)
FloatSet-48                        0.00           0.00           ~     (all equal)
MapSet                             0.00           3.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSet-6                           0.00           3.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSet-48                          0.00           3.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetDifferent                    0.00          12.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetDifferent-6                  0.00          12.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetDifferent-48                 0.00          12.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetString                       0.00           3.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetString-6                     0.00           3.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapSetString-48                    0.00           3.00 ± 0%     +Inf%  (p=0.000 n=8+8)
MapAddSame                         6.00 ± 0%     11.00 ± 0%   +83.33%  (p=0.000 n=8+8)
MapAddSame-6                       6.00 ± 0%     11.00 ± 0%   +83.33%  (p=0.000 n=8+8)
MapAddSame-48                      6.00 ± 0%     11.00 ± 0%   +83.33%  (p=0.000 n=8+8)
MapAddDifferent                    14.0 ± 0%      31.0 ± 0%  +121.43%  (p=0.000 n=8+8)
MapAddDifferent-6                  14.0 ± 0%      31.0 ± 0%  +121.43%  (p=0.000 n=8+8)
MapAddDifferent-48                 14.0 ± 0%      31.0 ± 0%  +121.43%  (p=0.000 n=8+8)
MapAddSameSteadyState              0.00           0.00           ~     (all equal)
MapAddSameSteadyState-6            0.00           0.00           ~     (all equal)
MapAddSameSteadyState-48           0.00           0.00           ~     (all equal)
MapAddDifferentSteadyState         0.00           0.00           ~     (all equal)
MapAddDifferentSteadyState-6       0.00           0.00           ~     (all equal)
MapAddDifferentSteadyState-48      0.00           0.00           ~     (all equal)
RealworldExpvarUsage               0.00           0.00           ~     (all equal)
RealworldExpvarUsage-6             0.00           0.00           ~     (all equal)
RealworldExpvarUsage-48            0.00           0.00           ~     (all equal)

https://perf.golang.org/search?q=upload:20170427.1

Change-Id: I388b2e8a3cadb84fc1418af8acfc27338f799273
Reviewed-on: https://go-review.googlesource.com/41930
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot

This comment has been minimized.

Copy link

commented Apr 28, 2017

CL https://golang.org/cl/42113 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 29, 2017
archive/zip: replace RWMutex with sync.Map
This change replaces the compressors and decompressors maps with
instances of sync.Map, eliminating the need for Mutex locking in
NewReader and NewWriter.

The impact for encoding large payloads is miniscule, but as the
payload size decreases, the reduction in setup costs becomes
measurable.

updates #17973
updates #18177

name                        old time/op    new time/op    delta
CompressedZipGarbage          13.6ms ± 3%    13.8ms ± 4%    ~     (p=0.275 n=14+16)
CompressedZipGarbage-6        2.81ms ±10%    2.80ms ± 9%    ~     (p=0.616 n=16+16)
CompressedZipGarbage-48        606µs ± 4%     600µs ± 3%    ~     (p=0.110 n=16+15)
Zip64Test                     88.7ms ± 5%    87.5ms ± 5%    ~     (p=0.150 n=14+14)
Zip64Test-6                   88.6ms ± 8%    94.5ms ±13%    ~     (p=0.070 n=14+16)
Zip64Test-48                   102ms ±19%     101ms ±19%    ~     (p=0.599 n=16+15)
Zip64TestSizes/4096           21.7µs ±10%    23.0µs ± 2%    ~     (p=0.076 n=14+12)
Zip64TestSizes/4096-6         7.58µs ±13%    7.49µs ±18%    ~     (p=0.752 n=16+16)
Zip64TestSizes/4096-48        19.5µs ± 8%    18.0µs ± 4%  -7.74%  (p=0.000 n=16+15)
Zip64TestSizes/1048576        1.36ms ± 9%    1.40ms ± 8%  +2.79%  (p=0.029 n=24+25)
Zip64TestSizes/1048576-6       262µs ±11%     260µs ±10%    ~     (p=0.506 n=24+24)
Zip64TestSizes/1048576-48      120µs ± 7%     116µs ± 7%  -3.05%  (p=0.006 n=24+25)
Zip64TestSizes/67108864       86.8ms ± 6%    85.1ms ± 5%    ~     (p=0.149 n=14+17)
Zip64TestSizes/67108864-6     15.9ms ± 2%    16.1ms ± 6%    ~     (p=0.279 n=14+17)
Zip64TestSizes/67108864-48    4.51ms ± 5%    4.53ms ± 4%    ~     (p=0.766 n=15+17)

name                        old alloc/op   new alloc/op   delta
CompressedZipGarbage          5.63kB ± 0%    5.63kB ± 0%    ~     (all equal)
CompressedZipGarbage-6        15.4kB ± 0%    15.4kB ± 0%    ~     (all equal)
CompressedZipGarbage-48       25.5kB ± 3%    25.6kB ± 2%    ~     (p=0.450 n=16+16)
Zip64Test                     20.0kB ± 0%    20.0kB ± 0%    ~     (p=0.060 n=16+13)
Zip64Test-6                   20.0kB ± 0%    20.0kB ± 0%    ~     (p=0.136 n=16+14)
Zip64Test-48                  20.0kB ± 0%    20.0kB ± 0%    ~     (p=1.000 n=16+16)
Zip64TestSizes/4096           20.0kB ± 0%    20.0kB ± 0%    ~     (all equal)
Zip64TestSizes/4096-6         20.0kB ± 0%    20.0kB ± 0%    ~     (all equal)
Zip64TestSizes/4096-48        20.0kB ± 0%    20.0kB ± 0%  -0.00%  (p=0.002 n=16+13)
Zip64TestSizes/1048576        20.0kB ± 0%    20.0kB ± 0%    ~     (all equal)
Zip64TestSizes/1048576-6      20.0kB ± 0%    20.0kB ± 0%    ~     (all equal)
Zip64TestSizes/1048576-48     20.1kB ± 0%    20.1kB ± 0%    ~     (p=0.775 n=24+25)
Zip64TestSizes/67108864       20.0kB ± 0%    20.0kB ± 0%    ~     (all equal)
Zip64TestSizes/67108864-6     20.0kB ± 0%    20.0kB ± 0%    ~     (p=0.272 n=16+17)
Zip64TestSizes/67108864-48    20.1kB ± 0%    20.1kB ± 0%    ~     (p=0.098 n=14+15)

name                        old allocs/op  new allocs/op  delta
CompressedZipGarbage            44.0 ± 0%      44.0 ± 0%    ~     (all equal)
CompressedZipGarbage-6          44.0 ± 0%      44.0 ± 0%    ~     (all equal)
CompressedZipGarbage-48         44.0 ± 0%      44.0 ± 0%    ~     (all equal)
Zip64Test                       53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64Test-6                     53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64Test-48                    53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/4096             53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/4096-6           53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/4096-48          53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/1048576          53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/1048576-6        53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/1048576-48       53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/67108864         53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/67108864-6       53.0 ± 0%      53.0 ± 0%    ~     (all equal)
Zip64TestSizes/67108864-48      53.0 ± 0%      53.0 ± 0%    ~     (all equal)

https://perf.golang.org/search?q=upload:20170428.4

Change-Id: Idb7bec091a210aba833066f8d083d66e27788286
Reviewed-on: https://go-review.googlesource.com/42113
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

@bradfitz bradfitz added the Performance label May 3, 2017

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Early May 3, 2017

@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017

@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

@baryluk

This comment has been minimized.

Copy link

commented Mar 21, 2019

					for i := 0; i < quota; i++ {
						mu.RLock()
						mu.RUnlock()
					}

It is pointless to use RWMutex when you do nothing under the lock.

For RLock to be useful and performant/scalable, you really need to have some work between RLock and RUnlock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.