testing: add -benchsplit to get more data points #19128

Open
bcmills opened this Issue Feb 16, 2017 · 20 comments

@bcmills (Member) commented Feb 16, 2017

For #18177, we need to make tradeoffs between worst-case latency and average throughput for the functions in question.

It's relatively easy to measure the average throughput today using benchmarks with the testing package.

It would be nice if I could use those same benchmarks to measure the distribution of timings across iterations. That isn't really feasible using (*B).N with caller-side iteration, but it might at least be possible to tap into (*PB).Next to get some idea of the latency between calls for parallel benchmarks.
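
One author-side workaround today is to time each pass between pb.Next calls yourself and summarize the distribution at the end of the run. The sketch below is purely illustrative and not part of the testing API: operationUnderTest is a hypothetical stand-in for the code being measured, and the percentile reporting is only one way to summarize the samples.

// Assumes: import ("sort"; "sync"; "testing"; "time")
func BenchmarkLatencyDistribution(b *testing.B) {
	// Each goroutine records its own samples to keep bookkeeping off the hot
	// path; the slices are merged and summarized after the parallel run ends.
	var mu sync.Mutex
	var samples []time.Duration
	b.RunParallel(func(pb *testing.PB) {
		local := make([]time.Duration, 0, 1024)
		for pb.Next() {
			start := time.Now()
			operationUnderTest() // hypothetical function under test
			local = append(local, time.Since(start))
		}
		mu.Lock()
		samples = append(samples, local...)
		mu.Unlock()
	})
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
	b.Logf("p50=%v p99=%v max=%v",
		samples[len(samples)/2],
		samples[len(samples)*99/100],
		samples[len(samples)-1])
}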

@gopherbot commented Feb 16, 2017

CL https://golang.org/cl/37153 mentions this issue.

@gopherbot commented Feb 23, 2017

CL https://golang.org/cl/37342 mentions this issue.

@rsc rsc added this to the Proposal milestone Mar 6, 2017

@josharian (Contributor) commented Mar 6, 2017

Related: #7465

@aclements (Member) commented Mar 6, 2017
That isn't really feasible using (*B).N with caller-side iteration

One possibility is to crank up N only far enough to eliminate clock measurement overhead and then use caller-driven iteration to drive up the number of samples/total iterations to the desired measurement interval. However, this may not work well with benchmarks that perform non-trivial setup and use B.ResetTimer.
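
A rough sketch of that idea, with made-up helper names (run is assumed to execute the benchmark body n times and return the measured duration; nothing here is the testing package's real implementation):

// Assumes: import "time"
func collectSamples(run func(n int) time.Duration, benchtime time.Duration) []time.Duration {
	// Grow n until a single sample is long enough that clock overhead is negligible.
	const minSample = 100 * time.Microsecond // arbitrary threshold, for illustration only
	n := 1
	for run(n) < minSample {
		n *= 2
	}
	// Then take as many samples of that fixed size as the time budget allows.
	var samples []time.Duration
	var total time.Duration
	for total < benchtime {
		d := run(n)
		samples = append(samples, d)
		total += d
	}
	return samples
}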

@josharian (Contributor) commented Mar 6, 2017
benchmarks that perform non-trivial setup and use B.ResetTimer

Also related: #10930. I bring it up because I learned there that you can in fact detect this case--both the case in which b.ResetTimer is called and the case in which b.ResetTimer is (erroneously) not called. Just look at how the total wall time to run the benchmark compares to the time as recorded by the timer.
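
A standalone sketch of that check (the names and the 10% threshold are illustrative, not the testing package's actual internals):

// Assumes: import "time"
// detectUntimedWork reports whether a significant fraction of a benchmark
// run's wall-clock time fell outside the benchmark timer, which indicates
// setup work that either was hidden by b.ResetTimer or erroneously was not.
func detectUntimedWork(wall, timed time.Duration) bool {
	return wall-timed > wall/10
}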

@rsc (Contributor) commented Mar 7, 2017
We can also tell whether b.ResetTimer is in use because we control the implementation of b.ResetTimer.

One of the reasons we've moved from benchcmp to benchstat is that the latter allows multiple data points for a given benchmark. Today those data points are all for roughly the same large size N, but the eventual plan was to emit more lines about smaller N, so that benchstat can do more work with distributions. If anyone wants to experiment with that and let us know how it goes, please do.

I'd also like to have a check somewhere (in package testing or benchstat, probably the former) that b.N really is scaling linearly. If the time for b.N is 4X that of b.N/2 and 16X that of b.N/4, the benchmark function is using b.N incorrectly and shouldn't be reported at all. Doing this in package testing, without printing the smaller runs at all, would let benchstat continue to assume that all the reported runs are about the same size.
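
A sketch of that linearity check, with made-up names (runOnce is assumed to run the benchmark body n times and return the measured duration; the tolerances are arbitrary and n is assumed to be reasonably large):

// Assumes: import "time"
func looksLinear(runOnce func(n int) time.Duration, n int) bool {
	tFull := runOnce(n)
	tHalf := runOnce(n / 2)
	tQuarter := runOnce(n / 4)
	// For a benchmark that uses b.N correctly, we expect
	// tFull to be about 2*tHalf and about 4*tQuarter.
	within2x := func(got, want time.Duration) bool {
		ratio := float64(got) / float64(want)
		return ratio > 0.5 && ratio < 2.0
	}
	return within2x(tFull, 2*tHalf) && within2x(tFull, 4*tQuarter)
}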

@josharian (Contributor) commented Mar 7, 2017
Adding a correlation coefficient test to benchmarks seems like a good idea. We could also use the estimated slope of a (weighted?) linear regression as the ns/op instead of straight division, which in theory might yield better results.

I tried hacking in a quick-and-dirty linear regression and got plausible slope numbers. (I didn't mail it because I used a naive rather than a numerically stable calculation.)

I don't plan to work on this further at the moment.
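
For concreteness, a sketch of the slope-as-ns/op idea (illustrative only; it assumes the runs cover a spread of iteration counts, and uses the mean-centred form to avoid the numerical instability of accumulating raw sums of squares):

// nsPerOpSlope returns the least-squares slope of measured nanoseconds
// against iteration counts, i.e. an estimate of ns/op that uses all runs
// instead of dividing one total by one N.
func nsPerOpSlope(iters, ns []float64) float64 {
	var meanX, meanY float64
	for i := range iters {
		meanX += iters[i]
		meanY += ns[i]
	}
	meanX /= float64(len(iters))
	meanY /= float64(len(iters))
	var num, den float64
	for i := range iters {
		dx := iters[i] - meanX
		num += dx * (ns[i] - meanY)
		den += dx * dx
	}
	return num / den // estimated ns per operation
}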

@rsc (Contributor) commented Mar 8, 2017
A few years ago I spent a while looking at what Haskell's benchmark tool does. I convinced myself that trying to be clever about slope vs N=0 cost was probably not worthwhile - better to brute force enough iterations that the slope is dominant.

The initial version of Haskell's tool (I forget the name) apparently did fancy things like bootstrapping too, but that seems like overkill: super subtle and easy to quietly get wrong. Later versions seemed to have backed off some of the complexity.

@josharian (Contributor) commented Mar 8, 2017
I convinced myself that trying to be clever about slope vs N=0 cost was probably not worthwhile - better to brute force enough iterations that the slope is dominant.

Fair enough. Looking at the N=0 cost may still be worth doing, if only to catch bugs like dbf533a. (Of course, one could brute force enough iterations for that, but it seems inadvisable.)

gopherbot pushed a commit to golang/sync that referenced this issue Mar 15, 2017

syncmap: remove blocking for all operations on existing keys
In the previous version, Loads of keys with unchanged values would
block on the Mutex if the map was dirty, leading to potentially long
stalls while the read-only map is being copied to the dirty map.

This change adds a double pointer indirection to each entry and an
extra atomic load on the fast path. In exchange, it removes the need
to lock the map for nearly all operations on existing keys.

Each entry is represented by an atomically-updated pointer to an
interface{} value. The same entries are stored in both the read-only
and dirty maps. Keys deleted before the dirty map was last copied are
marked as "expunged" and omitted from the dirty map until the next
Store to that key.

Newly-stored values exist only in the dirty map. The existence of new
keys in the dirty map is indicated by an "amended" bit attached to the
read-only map, allowing Load or Delete calls for nonexistent keys to
sometimes return immediately (without acquiring the Mutex).

This trades off a bit of steady-state throughput in the "mostly hits"
case in exchange for less contention (and lower Load tail latencies)
for mixed Load/Store/Delete usage.

Unfortunately, the latency impact is currently difficult to measure
(#19128).

updates golang/go#19128
updates golang/go#18177

https://perf.golang.org/search?q=upload:20170315.5

name                                               old time/op    new time/op    delta
LoadMostlyHits/*syncmap_test.DeepCopyMap             70.3ns ± 6%    70.2ns ± 4%      ~     (p=0.886 n=4+4)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48          10.9ns ± 4%    13.6ns ±24%      ~     (p=0.314 n=4+4)
LoadMostlyHits/*syncmap_test.RWMutexMap              86.0ns ± 9%    86.0ns ± 4%      ~     (p=1.000 n=4+4)
LoadMostlyHits/*syncmap_test.RWMutexMap-48            140ns ± 3%     139ns ± 2%      ~     (p=0.686 n=4+4)
LoadMostlyHits/*syncmap.Map                          70.6ns ± 8%    70.3ns ± 1%      ~     (p=0.571 n=4+4)
LoadMostlyHits/*syncmap.Map-48                       10.3ns ± 7%    11.2ns ± 5%      ~     (p=0.114 n=4+4)
LoadMostlyMisses/*syncmap_test.DeepCopyMap           59.4ns ± 4%    59.4ns ± 4%      ~     (p=1.000 n=4+4)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48        10.4ns ±30%    12.6ns ±29%      ~     (p=0.486 n=4+4)
LoadMostlyMisses/*syncmap_test.RWMutexMap            64.8ns ± 6%    63.8ns ± 4%      ~     (p=0.686 n=4+4)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48          138ns ± 3%     139ns ± 2%      ~     (p=0.800 n=4+4)
LoadMostlyMisses/*syncmap.Map                        51.5ns ± 7%    50.4ns ± 2%      ~     (p=0.371 n=4+4)
LoadMostlyMisses/*syncmap.Map-48                     9.37ns ± 3%    9.40ns ± 3%      ~     (p=0.886 n=4+4)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap          820ns ±17%     834ns ±14%      ~     (p=1.000 n=4+4)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48      1.38µs ± 4%    1.38µs ± 3%      ~     (p=0.886 n=4+4)
LoadOrStoreBalanced/*syncmap.Map                      709ns ±13%    1085ns ± 9%   +53.09%  (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48                  1.30µs ±13%    1.08µs ± 8%   -17.40%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap           1.26µs ±14%    1.16µs ±29%      ~     (p=0.343 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48        1.82µs ±15%    1.98µs ± 6%      ~     (p=0.143 n=4+4)
LoadOrStoreUnique/*syncmap.Map                       1.06µs ± 7%    1.86µs ±11%   +76.09%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48                    2.00µs ± 4%    1.62µs ± 4%   -19.20%  (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap       32.9ns ± 8%    32.6ns ± 9%      ~     (p=0.686 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48   2.26ns ±136%    3.41ns ±62%      ~     (p=0.114 n=4+4)
LoadOrStoreCollision/*syncmap_test.RWMutexMap        58.0ns ±20%    54.6ns ±17%      ~     (p=0.343 n=4+4)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48      458ns ± 2%     445ns ± 6%      ~     (p=0.229 n=4+4)
LoadOrStoreCollision/*syncmap.Map                    35.8ns ± 5%    39.6ns ± 9%      ~     (p=0.114 n=4+4)
LoadOrStoreCollision/*syncmap.Map-48                 1.24ns ± 2%    1.58ns ± 4%   +27.16%  (p=0.029 n=4+4)
Range/*syncmap_test.DeepCopyMap                      19.7µs ± 4%    19.8µs ± 3%      ~     (p=0.686 n=4+4)
Range/*syncmap_test.DeepCopyMap-48                    763ns ± 1%     864ns ± 3%   +13.24%  (p=0.029 n=4+4)
Range/*syncmap_test.RWMutexMap                       20.9µs ± 3%    20.4µs ± 4%      ~     (p=0.343 n=4+4)
Range/*syncmap_test.RWMutexMap-48                     764ns ± 1%     870ns ± 1%   +13.77%  (p=0.029 n=4+4)
Range/*syncmap.Map                                   20.4µs ± 5%    22.7µs ± 5%   +10.98%  (p=0.029 n=4+4)
Range/*syncmap.Map-48                                 776ns ± 3%     954ns ± 4%   +22.95%  (p=0.029 n=4+4)
AdversarialAlloc/*syncmap_test.DeepCopyMap            206ns ± 5%     199ns ± 2%      ~     (p=0.200 n=4+4)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48        8.94µs ± 5%    9.21µs ± 4%      ~     (p=0.200 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap            63.4ns ± 4%    63.8ns ± 3%      ~     (p=0.686 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap-48          184ns ±10%     198ns ±11%      ~     (p=0.343 n=4+4)
AdversarialAlloc/*syncmap.Map                         213ns ± 3%     264ns ± 3%   +23.97%  (p=0.029 n=4+4)
AdversarialAlloc/*syncmap.Map-48                      556ns ± 5%    1389ns ± 8%  +149.93%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap           300ns ± 6%     304ns ± 7%      ~     (p=0.886 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap-48        647ns ± 3%     646ns ± 3%      ~     (p=0.943 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap           69.1ns ± 1%    69.2ns ± 6%      ~     (p=0.686 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap-48         289ns ±15%     300ns ±17%      ~     (p=0.829 n=4+4)
AdversarialDelete/*syncmap.Map                        198ns ± 5%     264ns ± 2%   +33.17%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap.Map-48                     291ns ± 9%     173ns ± 8%   -40.50%  (p=0.029 n=4+4)

name                                               old alloc/op   new alloc/op   delta
LoadMostlyHits/*syncmap_test.DeepCopyMap              7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48           7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap               7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap-48            7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap.Map                           7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyHits/*syncmap.Map-48                        7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap            7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48         7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap             7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48          7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap.Map                         7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadMostlyMisses/*syncmap.Map-48                      7.00B ± 0%     7.00B ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap          95.0B ± 0%     95.0B ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48       95.0B ± 0%     95.0B ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap.Map                      95.0B ± 0%     88.0B ± 0%    -7.37%  (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48                   95.0B ± 0%     88.0B ± 0%    -7.37%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap             175B ± 0%      175B ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48          175B ± 0%      175B ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap.Map                         175B ± 0%      161B ± 0%    -8.00%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48                      175B ± 0%      161B ± 0%    -8.00%  (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap        0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48     0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap         0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48      0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map                     0.00B          0.00B           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map-48                  0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.DeepCopyMap                       0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.DeepCopyMap-48                    0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.RWMutexMap                        0.00B          0.00B           ~     (all equal)
Range/*syncmap_test.RWMutexMap-48                     0.00B          0.00B           ~     (all equal)
Range/*syncmap.Map                                    0.00B          0.00B           ~     (all equal)
Range/*syncmap.Map-48                                 0.00B          0.00B           ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap            74.0B ± 0%     74.0B ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48        3.15kB ± 0%    3.15kB ± 0%      ~     (p=1.000 n=4+4)
AdversarialAlloc/*syncmap_test.RWMutexMap             8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap-48          8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap.Map                         74.0B ± 0%     55.0B ± 0%   -25.68%  (p=0.029 n=4+4)
AdversarialAlloc/*syncmap.Map-48                      8.00B ± 0%    56.25B ± 1%  +603.12%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap_test.DeepCopyMap            155B ± 0%      155B ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap-48         156B ± 0%      156B ± 0%      ~     (p=1.000 n=4+4)
AdversarialDelete/*syncmap_test.RWMutexMap            8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap-48         8.00B ± 0%     8.00B ± 0%      ~     (all equal)
AdversarialDelete/*syncmap.Map                        81.0B ± 0%     65.0B ± 0%   -19.75%  (p=0.029 n=4+4)
AdversarialDelete/*syncmap.Map-48                     23.0B ± 9%     15.2B ± 5%   -33.70%  (p=0.029 n=4+4)

name                                               old allocs/op  new allocs/op  delta
LoadMostlyHits/*syncmap_test.DeepCopyMap               0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap_test.DeepCopyMap-48            0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap                0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap_test.RWMutexMap-48             0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap.Map                            0.00           0.00           ~     (all equal)
LoadMostlyHits/*syncmap.Map-48                         0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap             0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.DeepCopyMap-48          0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap              0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap_test.RWMutexMap-48           0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap.Map                          0.00           0.00           ~     (all equal)
LoadMostlyMisses/*syncmap.Map-48                       0.00           0.00           ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap           2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap_test.RWMutexMap-48        2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreBalanced/*syncmap.Map                       2.00 ± 0%      3.00 ± 0%   +50.00%  (p=0.029 n=4+4)
LoadOrStoreBalanced/*syncmap.Map-48                    2.00 ± 0%      3.00 ± 0%   +50.00%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap_test.RWMutexMap             2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap_test.RWMutexMap-48          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
LoadOrStoreUnique/*syncmap.Map                         2.00 ± 0%      4.00 ± 0%  +100.00%  (p=0.029 n=4+4)
LoadOrStoreUnique/*syncmap.Map-48                      2.00 ± 0%      4.00 ± 0%  +100.00%  (p=0.029 n=4+4)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap         0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.DeepCopyMap-48      0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap          0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap_test.RWMutexMap-48       0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map                      0.00           0.00           ~     (all equal)
LoadOrStoreCollision/*syncmap.Map-48                   0.00           0.00           ~     (all equal)
Range/*syncmap_test.DeepCopyMap                        0.00           0.00           ~     (all equal)
Range/*syncmap_test.DeepCopyMap-48                     0.00           0.00           ~     (all equal)
Range/*syncmap_test.RWMutexMap                         0.00           0.00           ~     (all equal)
Range/*syncmap_test.RWMutexMap-48                      0.00           0.00           ~     (all equal)
Range/*syncmap.Map                                     0.00           0.00           ~     (all equal)
Range/*syncmap.Map-48                                  0.00           0.00           ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.DeepCopyMap-48          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap              1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap_test.RWMutexMap-48           1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap.Map                          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialAlloc/*syncmap.Map-48                       1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap            1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.DeepCopyMap-48         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap_test.RWMutexMap-48          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap.Map                         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
AdversarialDelete/*syncmap.Map-48                      1.00 ± 0%      1.00 ± 0%      ~     (all equal)

Change-Id: I93c9458505d84238cc8e9016a7dfd285868af236
Reviewed-on: https://go-review.googlesource.com/37342
Reviewed-by: Russ Cox <rsc@golang.org>

@gopherbot gopherbot added the Proposal label Mar 20, 2017

@rsc (Contributor) commented Apr 10, 2017
I'd like to move this along, assuming that we're not going to introduce a new API for writing benchmarks (so no equivalent of pb.Next) and that we can't emit extra output lines by default (preserving the current one benchmark = one output line). And when we do emit extra output lines, all the lines need to be about the same thing (no warmups to filter out from averaging calculations in benchstat or other tools).

That implies that we need to add some flag that causes a benchmark to be run multiple times with smaller counts, printing one line of output for each of those runs.

Assuming the default -benchtime=1s, if instead of printing output for one run of 1s we want to print output for 10 runs of 0.1s, this can be specified in two possible ways: by setting the target run duration with the count implicit (-benchtime2=0.1s), or by setting the target number of runs with the duration implicit (-benchsplit=10). We only want to add one of these to package testing.

I don't have a great name for the target run duration (obviously), but I think benchsplit could work if we decide to specify the number of runs instead.

The split factor is subject to change to respect the overall time limit. If, say, one iteration of a benchmark takes 0.6 seconds, then with -benchtime=1s -benchsplit=10, we'll still only run ~2 of them, not 10. This makes it sensible to run go test -benchsplit=10 in the go1 directory, as compared to go test -benchtime=0.1s -count=10. The latter runs even very expensive benchmarks 10 times, while the former only runs them once.

So my proposal for this proposal is to add -benchsplit=N as described above.

I don't know if this really helps @bcmills's original request. If not, I apologize for hijacking. We've been talking about getting this data out for years, and it does provide more fine-grained detail about the distribution of runs, just not individual runs. If there are rare high-latency individual runs, then hopefully you'd be able to split down to a small enough interval that some runs are lucky and some are not, and that would show up in the data as a bimodal distribution.

@bcmills (Member) commented Apr 10, 2017
I don't think that really addresses my original request, unfortunately: for the kinds of latency distributions I'm interested in, one output line per run would be far more data than I want to sift through.

But it does at least seem like a step in the right direction.

@josharian (Contributor) commented Apr 10, 2017
Agreed: Seems like a step in the right direction, and I would definitely use this. The interaction between slow benchmarks and -count has always been painful.

To be clear on the (non-)interaction between -benchsplit and -count, if you did -benchtime=1s -benchsplit=10 -count=5, you'd get anywhere between 5 and 50 lines of output?

@rsc (Contributor) commented Apr 10, 2017
@bcmills, sifting through the output is for computers.

@josharian, yes.

for i := 0; i < count; i++ {
    var total time.Duration
    for total < benchtime {
        // Run one batch sized to take about benchtime/benchsplit,
        // print one output line for it, and return its measured duration.
        t := runAndReport(benchtime / benchsplit)
        total += t
    }
}

@rsc (Contributor) commented Apr 17, 2017
Objections to adding -benchsplit?

/cc @aclements @mpvl

@rsc (Contributor) commented Jun 12, 2017
Sounds like there are no objections to -benchsplit. Anyone want to implement it?

@rsc rsc changed the title from proposal: testing: measure benchmark latency distributions to testing: add -benchsplit to get more data points Jun 12, 2017

@rsc rsc modified the milestones: Go1.10, Proposal Jun 12, 2017

@meirf (Member) commented Jun 29, 2017
(I've started working on this and plan on having a CL by early next week. Hopefully my recent testing contributions grant me such a runway.)

@gopherbot commented Jul 5, 2017

CL https://golang.org/cl/47411 mentions this issue.

@gopherbot commented Nov 29, 2017
Change https://golang.org/cl/80841 mentions this issue: testing: remove claim that b.Run is safe for concurrent use

@cespare (Contributor) commented May 14, 2018
Has progress on this stalled?
