testing: add -benchsplit to get more data points #19128
CL https://golang.org/cl/37153 mentions this issue.

CL https://golang.org/cl/37342 mentions this issue.

/cc @aclements

Related: #7465
One possibility is to crank up N only far enough to eliminate clock measurement overhead, and then use caller-driven iteration to drive up the number of samples (total iterations) to the desired measurement interval. However, this may not work well with benchmarks that perform non-trivial setup and use B.ResetTimer.
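A minimal sketch of that caller-driven approach, with illustrative names (sample and the bench callback are assumptions for this sketch, not part of the testing API):

```go
import "time"

// sample is a sketch of caller-driven sampling: pick n just large enough
// that clock overhead is negligible, then collect many short samples of
// n iterations each instead of one long run, yielding a distribution.
func sample(bench func(n int), n, count int) []time.Duration {
	out := make([]time.Duration, 0, count)
	for i := 0; i < count; i++ {
		start := time.Now()
		bench(n)
		out = append(out, time.Since(start))
	}
	return out
}
```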
Also related: #10930. I bring it up because I learned there that you can in fact detect this case--both the case in which b.ResetTimer is called and the case in which b.ResetTimer is (erroneously) not called. Just look at how the total wall time to run the benchmark compares to the time as recorded by the timer.
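A sketch of that comparison, with hypothetical names (the real package testing internals track these durations differently):

```go
import "time"

// timerExcludedSetup reports whether the benchmark's timer covered
// noticeably less than its wall-clock span. A large gap implies
// b.ResetTimer (or StopTimer/StartTimer) excluded significant work;
// near-equality implies the timer ran for essentially the whole benchmark.
func timerExcludedSetup(wall, timed time.Duration) bool {
	return float64(timed) < 0.9*float64(wall)
}
```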
We can also tell whether b.ResetTimer is in use, because we control the implementation of b.ResetTimer.

One of the reasons we've moved from benchcmp to benchstat is that the latter allows multiple data points for a given benchmark. Today those data points are all for roughly the same large size N, but the eventual plan was to emit more lines about smaller N, so that benchstat can do more work with distributions. If anyone wants to experiment with that and let us know how it goes, please do.

I'd also like to have a check somewhere (in package testing or benchstat, probably the former) that b.N really is scaling linearly. If the time for b.N is 4X that of b.N/2 and 16X that of b.N/4, the benchmark function is using b.N incorrectly and shouldn't be reported at all. Doing this in package testing, without printing the smaller runs at all, would let benchstat continue to assume that all the reported runs are about the same size.
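A rough sketch of that linearity check, assuming a hypothetical helper timeN that runs the benchmark body once with the given iteration count and reports the elapsed time:

```go
import "time"

// scalesLinearly reports whether doubling the iteration count roughly
// doubles the measured time. If not, the benchmark body is not using
// b.N correctly and its results should not be reported.
func scalesLinearly(timeN func(n int) time.Duration, n int) bool {
	tN, tHalf, tQuarter := timeN(n), timeN(n/2), timeN(n/4)
	ok := func(r float64) bool { return r > 1.5 && r < 2.5 } // ~2x, with slack for noise
	return ok(float64(tN)/float64(tHalf)) && ok(float64(tHalf)/float64(tQuarter))
}
```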
Adding a correlation coefficient test to benchmarks seems like a good idea. We could also use the estimated slope of a (weighted?) linear regression as the ns/op instead of straight division, which in theory might yield better results. I tried hacking in a quick-and-dirty linear regression and got plausible slope numbers. (I didn't mail it because I used a naive rather than a numerically stable calculation.) I don't plan to work on this further at the moment.
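For the curious, a numerically stable version of that slope estimate is not much code. This is a sketch (not the unmailed CL): given (iteration count, total time) pairs from several runs, the mean-centered least-squares slope estimates ns/op while the intercept absorbs fixed overhead.

```go
import "time"

// slopeNsPerOp fits total time against iteration count and returns the
// slope in nanoseconds per operation. Centering on the means avoids the
// catastrophic cancellation of the naive sum-of-products formula.
func slopeNsPerOp(counts []int, totals []time.Duration) float64 {
	var meanX, meanY float64
	for i := range counts {
		meanX += float64(counts[i])
		meanY += float64(totals[i]) // time.Duration is already in ns
	}
	meanX /= float64(len(counts))
	meanY /= float64(len(counts))
	var num, den float64
	for i := range counts {
		dx := float64(counts[i]) - meanX
		num += dx * (float64(totals[i]) - meanY)
		den += dx * dx
	}
	return num / den
}
```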
A few years ago I spent a while looking at what Haskell's benchmark tool does. I convinced myself that trying to be clever about slope vs N=0 cost was probably not worthwhile - better to brute force enough iterations that the slope is dominant. The initial version of Haskell's tool (I forget the name) apparently did fancy things like bootstrapping too, but that seems like overkill as well: super subtle and easy to quietly get wrong. Later versions seem to have backed off some of the complexity.
Fair enough. Looking at the N=0 cost may still be worth doing, if only to catch bugs like dbf533a. (Of course, one could brute force enough iterations for that, but it seems inadvisable.)
In the previous version, Loads of keys with unchanged values would block on the Mutex if the map was dirty, leading to potentially long stalls while the read-only map is being copied to the dirty map.

This change adds a double pointer indirection to each entry and an extra atomic load on the fast path. In exchange, it removes the need to lock the map for nearly all operations on existing keys.

Each entry is represented by an atomically-updated pointer to an interface{} value. The same entries are stored in both the read-only and dirty maps. Keys deleted before the dirty map was last copied are marked as "expunged" and omitted from the dirty map until the next Store to that key. Newly-stored values exist only in the dirty map. The existence of new keys in the dirty map is indicated by an "amended" bit attached to the read-only map, allowing Load or Delete calls for nonexistent keys to sometimes return immediately (without acquiring the Mutex).

This trades off a bit of steady-state throughput in the "mostly hits" case in exchange for less contention (and lower Load tail latencies) for mixed Load/Store/Delete usage. Unfortunately, the latency impact is currently difficult to measure (#19128).

Updates golang/go#19128
Updates golang/go#18177

https://perf.golang.org/search?q=upload:20170315.5

name old time/op new time/op delta
LoadMostlyHits/*syncmap_test.DeepCopyMap 70.3ns ± 6% 70.2ns ± 4% ~ (p=0.886 n=4+4)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48 10.9ns ± 4% 13.6ns ±24% ~ (p=0.314 n=4+4)
LoadMostlyHits/*syncmap_test.RWMutexMap 86.0ns ± 9% 86.0ns ± 4% ~ (p=1.000 n=4+4)
LoadMostlyHits/*syncmap_test.RWMutexMap-48 140ns ± 3% 139ns ± 2% ~ (p=0.686 n=4+4)
LoadMostlyHits/*syncmap.Map 70.6ns ± 8% 70.3ns ± 1% ~ (p=0.571 n=4+4)
LoadMostlyHits/*syncmap.Map-48 10.3ns ± 7% 11.2ns ± 5% ~ (p=0.114 n=4+4)
LoadMostlyMisses/*syncmap_test.DeepCopyMap 59.4ns ± 4% 59.4ns ± 4% ~ (p=1.000 n=4+4)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48 10.4ns ±30% 12.6ns ±29% ~ (p=0.486 n=4+4)
LoadMostlyMisses/*syncmap_test.RWMutexMap 64.8ns ± 6% 63.8ns ± 4% ~ (p=0.686 n=4+4)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48 138ns ± 3% 139ns ± 2% ~ (p=0.800 n=4+4)
LoadMostlyMisses/*syncmap.Map 51.5ns ± 7% 50.4ns ± 2% ~ (p=0.371 n=4+4)
LoadMostlyMisses/*syncmap.Map-48 9.37ns ± 3% 9.40ns ± 3% ~ (p=0.886 n=4+4)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap 820ns ±17% 834ns ±14% ~ (p=1.000 n=4+4)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48 1.38µs ± 4% 1.38µs ± 3% ~ (p=0.886 n=4+4)
LoadOrStoreBalanced/*syncmap.Map 709ns ±13% 1085ns ± 9% +53.09% (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48 1.30µs ±13% 1.08µs ± 8% -17.40% (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap 1.26µs ±14% 1.16µs ±29% ~ (p=0.343 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48 1.82µs ±15% 1.98µs ± 6% ~ (p=0.143 n=4+4)
LoadOrStoreUnique/*syncmap.Map 1.06µs ± 7% 1.86µs ±11% +76.09% (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48 2.00µs ± 4% 1.62µs ± 4% -19.20% (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap 32.9ns ± 8% 32.6ns ± 9% ~ (p=0.686 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48 2.26ns ±136% 3.41ns ±62% ~ (p=0.114 n=4+4)
LoadOrStoreCollision/*syncmap_test.RWMutexMap 58.0ns ±20% 54.6ns ±17% ~ (p=0.343 n=4+4)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48 458ns ± 2% 445ns ± 6% ~ (p=0.229 n=4+4)
LoadOrStoreCollision/*syncmap.Map 35.8ns ± 5% 39.6ns ± 9% ~ (p=0.114 n=4+4)
LoadOrStoreCollision/*syncmap.Map-48 1.24ns ± 2% 1.58ns ± 4% +27.16% (p=0.029 n=4+4)
Range/*syncmap_test.DeepCopyMap 19.7µs ± 4% 19.8µs ± 3% ~ (p=0.686 n=4+4)
Range/*syncmap_test.DeepCopyMap-48 763ns ± 1% 864ns ± 3% +13.24% (p=0.029 n=4+4)
Range/*syncmap_test.RWMutexMap 20.9µs ± 3% 20.4µs ± 4% ~ (p=0.343 n=4+4)
Range/*syncmap_test.RWMutexMap-48 764ns ± 1% 870ns ± 1% +13.77% (p=0.029 n=4+4)
Range/*syncmap.Map 20.4µs ± 5% 22.7µs ± 5% +10.98% (p=0.029 n=4+4)
Range/*syncmap.Map-48 776ns ± 3% 954ns ± 4% +22.95% (p=0.029 n=4+4)
AdversarialAlloc/*syncmap_test.DeepCopyMap 206ns ± 5% 199ns ± 2% ~ (p=0.200 n=4+4)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48 8.94µs ± 5% 9.21µs ± 4% ~ (p=0.200 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap 63.4ns ± 4% 63.8ns ± 3% ~ (p=0.686 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap-48 184ns ±10% 198ns ±11% ~ (p=0.343 n=4+4)
AdversarialAlloc/*syncmap.Map 213ns ± 3% 264ns ± 3% +23.97% (p=0.029 n=4+4)
AdversarialAlloc/*syncmap.Map-48 556ns ± 5% 1389ns ± 8% +149.93% (p=0.029 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap 300ns ± 6% 304ns ± 7% ~ (p=0.886 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap-48 647ns ± 3% 646ns ± 3% ~ (p=0.943 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap 69.1ns ± 1% 69.2ns ± 6% ~ (p=0.686 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap-48 289ns ±15% 300ns ±17% ~ (p=0.829 n=4+4)
AdversarialDelete/*syncmap.Map 198ns ± 5% 264ns ± 2% +33.17% (p=0.029 n=4+4)
AdversarialDelete/*syncmap.Map-48 291ns ± 9% 173ns ± 8% -40.50% (p=0.029 n=4+4)

name old alloc/op new alloc/op delta
LoadMostlyHits/*syncmap_test.DeepCopyMap 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap-48 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyHits/*syncmap.Map 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyHits/*syncmap.Map-48 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyMisses/*syncmap.Map 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadMostlyMisses/*syncmap.Map-48 7.00B ± 0% 7.00B ± 0% ~ (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap 95.0B ± 0% 95.0B ± 0% ~ (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48 95.0B ± 0% 95.0B ± 0% ~ (all equal)
LoadOrStoreBalanced/*syncmap.Map 95.0B ± 0% 88.0B ± 0% -7.37% (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48 95.0B ± 0% 88.0B ± 0% -7.37% (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap 175B ± 0% 175B ± 0% ~ (all equal)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48 175B ± 0% 175B ± 0% ~ (all equal)
LoadOrStoreUnique/*syncmap.Map 175B ± 0% 161B ± 0% -8.00% (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48 175B ± 0% 161B ± 0% -8.00% (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap 0.00B 0.00B ~ (all equal)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48 0.00B 0.00B ~ (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap 0.00B 0.00B ~ (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48 0.00B 0.00B ~ (all equal)
LoadOrStoreCollision/*syncmap.Map 0.00B 0.00B ~ (all equal)
LoadOrStoreCollision/*syncmap.Map-48 0.00B 0.00B ~ (all equal)
Range/*syncmap_test.DeepCopyMap 0.00B 0.00B ~ (all equal)
Range/*syncmap_test.DeepCopyMap-48 0.00B 0.00B ~ (all equal)
Range/*syncmap_test.RWMutexMap 0.00B 0.00B ~ (all equal)
Range/*syncmap_test.RWMutexMap-48 0.00B 0.00B ~ (all equal)
Range/*syncmap.Map 0.00B 0.00B ~ (all equal)
Range/*syncmap.Map-48 0.00B 0.00B ~ (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap 74.0B ± 0% 74.0B ± 0% ~ (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48 3.15kB ± 0% 3.15kB ± 0% ~ (p=1.000 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap 8.00B ± 0% 8.00B ± 0% ~ (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap-48 8.00B ± 0% 8.00B ± 0% ~ (all equal)
AdversarialAlloc/*syncmap.Map 74.0B ± 0% 55.0B ± 0% -25.68% (p=0.029 n=4+4)
AdversarialAlloc/*syncmap.Map-48 8.00B ± 0% 56.25B ± 1% +603.12% (p=0.029 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap 155B ± 0% 155B ± 0% ~ (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap-48 156B ± 0% 156B ± 0% ~ (p=1.000 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap 8.00B ± 0% 8.00B ± 0% ~ (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap-48 8.00B ± 0% 8.00B ± 0% ~ (all equal)
AdversarialDelete/*syncmap.Map 81.0B ± 0% 65.0B ± 0% -19.75% (p=0.029 n=4+4)
AdversarialDelete/*syncmap.Map-48 23.0B ± 9% 15.2B ± 5% -33.70% (p=0.029 n=4+4)

name old allocs/op new allocs/op delta
LoadMostlyHits/*syncmap_test.DeepCopyMap 0.00 0.00 ~ (all equal)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48 0.00 0.00 ~ (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap 0.00 0.00 ~ (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap-48 0.00 0.00 ~ (all equal)
LoadMostlyHits/*syncmap.Map 0.00 0.00 ~ (all equal)
LoadMostlyHits/*syncmap.Map-48 0.00 0.00 ~ (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap 0.00 0.00 ~ (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48 0.00 0.00 ~ (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap 0.00 0.00 ~ (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48 0.00 0.00 ~ (all equal)
LoadMostlyMisses/*syncmap.Map 0.00 0.00 ~ (all equal)
LoadMostlyMisses/*syncmap.Map-48 0.00 0.00 ~ (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap 2.00 ± 0% 2.00 ± 0% ~ (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48 2.00 ± 0% 2.00 ± 0% ~ (all equal)
LoadOrStoreBalanced/*syncmap.Map 2.00 ± 0% 3.00 ± 0% +50.00% (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48 2.00 ± 0% 3.00 ± 0% +50.00% (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap 2.00 ± 0% 2.00 ± 0% ~ (all equal)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48 2.00 ± 0% 2.00 ± 0% ~ (all equal)
LoadOrStoreUnique/*syncmap.Map 2.00 ± 0% 4.00 ± 0% +100.00% (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48 2.00 ± 0% 4.00 ± 0% +100.00% (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap 0.00 0.00 ~ (all equal)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48 0.00 0.00 ~ (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap 0.00 0.00 ~ (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48 0.00 0.00 ~ (all equal)
LoadOrStoreCollision/*syncmap.Map 0.00 0.00 ~ (all equal)
LoadOrStoreCollision/*syncmap.Map-48 0.00 0.00 ~ (all equal)
Range/*syncmap_test.DeepCopyMap 0.00 0.00 ~ (all equal)
Range/*syncmap_test.DeepCopyMap-48 0.00 0.00 ~ (all equal)
Range/*syncmap_test.RWMutexMap 0.00 0.00 ~ (all equal)
Range/*syncmap_test.RWMutexMap-48 0.00 0.00 ~ (all equal)
Range/*syncmap.Map 0.00 0.00 ~ (all equal)
Range/*syncmap.Map-48 0.00 0.00 ~ (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap-48 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialAlloc/*syncmap.Map 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialAlloc/*syncmap.Map-48 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap-48 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap-48 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialDelete/*syncmap.Map 1.00 ± 0% 1.00 ± 0% ~ (all equal)
AdversarialDelete/*syncmap.Map-48 1.00 ± 0% 1.00 ± 0% ~ (all equal)

Change-Id: I93c9458505d84238cc8e9016a7dfd285868af236
Reviewed-on: https://go-review.googlesource.com/37342
Reviewed-by: Russ Cox <rsc@golang.org>
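For readers mapping the commit message's description onto code, the data structure from that CL looks roughly like this (an abbreviated sketch; the full version lives in the syncmap package):

```go
import (
	"sync"
	"sync/atomic"
	"unsafe"
)

// Map avoids the Mutex for most operations on existing keys, as the
// commit message above describes.
type Map struct {
	mu sync.Mutex

	// read holds a readOnly snapshot and is loaded atomically on the
	// fast path, with no locking.
	read atomic.Value // readOnly

	// dirty holds entries that require mu to access; it contains all
	// non-expunged entries from read plus any newly-stored keys.
	dirty map[interface{}]*entry

	misses int // loads that fell through to dirty since the last promotion
}

// readOnly is the immutable snapshot stored in Map.read.
type readOnly struct {
	m       map[interface{}]*entry
	amended bool // the "amended" bit: true if dirty has keys not in m
}

// expunged marks entries deleted before the last read-to-dirty copy;
// such entries are omitted from dirty until the next Store of that key.
var expunged = unsafe.Pointer(new(interface{}))

// entry is the extra level of indirection: p points to the value and is
// updated atomically, so Loads and Stores of existing keys avoid mu.
type entry struct {
	p unsafe.Pointer // *interface{}, nil, or expunged
}
```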
I'd like to move this along, assuming that we're not going to introduce a new API for writing benchmarks (so no equivalent of pb.Next) and that we can't emit extra output lines by default (preserving the current one benchmark = one output line). And when we do emit extra output lines, all the lines need to be about the same thing (no warmups to filter out from averaging calculations in benchstat or other tools).

That implies that we need to add some flag that causes a benchmark to be run multiple times with smaller counts, with one line of output for each of those runs. Assuming the default -benchtime=1s, if instead of printing output for one run of 1s we want to print output for 10 runs of 0.1s, this can be specified in two possible ways: by setting the target run duration with the count implicit (-benchtime2=0.1s), or by setting the target number of runs with the duration implicit (-benchsplit=10). We only want to add one of these to package testing. I don't have a great name for the target run duration (obviously), but I think benchsplit could work if we decide to specify the number of runs instead.

The split factor is subject to change to respect the overall time limit. If, say, one iteration of a benchmark takes 0.6 seconds, then with -benchtime=1s -benchsplit=10 we'll still only run ~2 of them, not 10. This makes it sensible to run with -benchsplit even when individual iterations are slow. So my proposal for this proposal is to add -benchsplit.

I don't know if this really helps @bcmills's original request. If not, I apologize for hijacking. We've been talking about getting this data out for years, and it does provide more fine-grained detail about the distribution of runs, just not individual runs. If there are rare high-latency individual runs, then hopefully you'd be able to split down to a small enough interval that some runs are lucky and some are not, and that would come out in the data as a bimodal distribution.
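To make the proposed semantics concrete, a hypothetical transcript might look like this (the flag was never shipped; the benchmark name and numbers are illustrative):

```
$ go test -bench=Decode -benchtime=1s -benchsplit=10
BenchmarkDecode-8    20000    5103 ns/op
BenchmarkDecode-8    20000    4978 ns/op
... eight more lines, one per ~0.1s run ...
```

Each line is an ordinary benchmark result, so benchstat could consume the ten lines exactly as it consumes ten runs from -count=10.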
I don't think that really addresses my original request, unfortunately: for the kinds of latency distributions I'm interested in, one output line per run would be far more data than I want to sift through. But it does at least seem like a step in the right direction.
Agreed: seems like a step in the right direction, and I would definitely use this. The interaction between slow benchmarks and -count has always been painful. To be clear on the (non-)interaction between -benchsplit and -count: if you did -benchtime=1s -benchsplit=10 -count=5, you'd get anywhere between 5 and 50 lines of output?
@bcmills, sifting through the output is for computers. @josharian, yes.
Objections to adding -benchsplit? /cc @aclements @mpvl
Sounds like there are no objections to -benchsplit. Anyone want to implement it?
(I've started working on this and plan on having a CL by early next week. Hopefully my recent …)
CL https://golang.org/cl/47411 mentions this issue.

Change https://golang.org/cl/80841 mentions this issue.
Has progress on this stalled?

Answering my own question from a year ago: yes, it seems like CL 47411 has stalled out, but the proposal is approved and the details seem to have been mostly ironed out in the extensive discussion on that CL. I think anyone could pick up these changes and take them across the finish line.
For #18177, we need to make tradeoffs between worst-case latency and average throughput for the functions in question.

It's relatively easy to measure the average throughput today using benchmarks with the testing package. It would be nice if I could use those same benchmarks to measure the distribution of timings across iterations. That isn't really feasible using (*B).N with caller-side iteration, but it might at least be possible to tap into (*PB).Next to get some idea of the latency between calls for parallel benchmarks.