
proposal: allow `testing.B` to use real execution duration to predicate `b.N` #40227

Open
joesonw opened this issue Jul 15, 2020 · 4 comments
@joesonw joesonw commented Jul 15, 2020

When running something like the code below, the benchmark can hit a TIMEOUT, because the framework accounts only for the recorded (timed) duration, rather than the actual execution duration, when predicting the next b.N.

```go
func BenchmarkXXX(b *testing.B) {
	buf := NewSomeBuffer()

	b.Run("Write", func(b *testing.B) { // produced 1,000,000 samples (for example)
		for i := 0; i < b.N; i++ {
			buf.ExpansiveWrite()
		}
	})

	b.Run("Read", func(b *testing.B) {
		// Produced 10,000,000 samples (again, for example).
		// It should be somewhere slightly below 1,000,000 samples,
		// since the read operation is really cheap here.
		b.StopTimer()
		for i := 0; i < b.N; i++ {
			buf.ExpansiveWrite()
		}
		b.StartTimer()

		// Only trying to measure the read here.
		for i := 0; i < b.N; i++ {
			buf.CheapRead()
		}
	})
}
```

```go
// In StartTimer:
b.start = time.Now()
// In StopTimer:
b.duration += time.Since(b.start)
```

go/src/testing/benchmark.go

Lines 296 to 324 in c5d7f2f

```go
	d := b.benchTime.d
	for n := int64(1); !b.failed && b.duration < d && n < 1e9; {
		last := n
		// Predict required iterations.
		goalns := d.Nanoseconds()
		prevIters := int64(b.N)
		prevns := b.duration.Nanoseconds()
		if prevns <= 0 {
			// Round up, to avoid div by zero.
			prevns = 1
		}
		// Order of operations matters.
		// For very fast benchmarks, prevIters ~= prevns.
		// If you divide first, you get 0 or 1,
		// which can hide an order of magnitude in execution time.
		// So multiply first, then divide.
		n = goalns * prevIters / prevns
		// Run more iterations than we think we'll need (1.2x).
		n += n / 5
		// Don't grow too fast in case we had timing errors previously.
		n = min(n, 100*last)
		// Be sure to run at least one more than last time.
		n = max(n, last+1)
		// Don't run more than 1e9 times. (This also keeps n in int range on 32 bit platforms.)
		n = min(n, 1e9)
		b.runN(int(n))
	}
}
b.result = BenchmarkResult{b.N, b.duration, b.bytes, b.netAllocs, b.netBytes, b.extra}
```
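To make the failure mode concrete, here is a small stand-alone mirror of the prediction step quoted above (a sketch, not the real `testing` internals; `predictN` is a made-up name). Because only the timed duration feeds the prediction, a cheap timed loop keeps `prevns` tiny, so `n` climbs by the maximum 100x every round, while any untimed per-iteration setup cost grows right along with `n`:

```go
package main

import "fmt"

// predictN mirrors the iteration-prediction step from benchmark.go
// (hypothetical stand-alone copy, not the real testing package).
func predictN(goalns, prevIters, prevns, last int64) int64 {
	if prevns <= 0 {
		prevns = 1 // round up, to avoid div by zero
	}
	// Multiply first, then divide, so very fast benchmarks
	// don't collapse to 0 or 1 iterations.
	n := goalns * prevIters / prevns
	n += n / 5           // run 1.2x what we think we need
	n = min(n, 100*last) // don't grow too fast
	n = max(n, last+1)   // always at least one more than last time
	n = min(n, 1e9)      // hard cap
	return n
}

func main() {
	// With a 1s goal and only ~10ns of *timed* cost per iteration,
	// every round hits the 100x growth cap:
	fmt.Println(predictN(1e9, 1, 10, 1))             // 1 -> 100
	fmt.Println(predictN(1e9, 100, 1000, 100))       // 100 -> 10000
	fmt.Println(predictN(1e9, 10000, 100000, 10000)) // 10000 -> 1000000
}
```

If each of those iterations also carries an untimed ExpansiveWrite, the wall-clock cost per round grows 100x as well, which is how the reported TIMEOUT happens.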

Maybe there could be a b.ResetDuration() that resets only the reported duration, without polluting the prediction algorithm.

@gopherbot gopherbot added this to the Proposal milestone Jul 15, 2020
@gopherbot gopherbot added the Proposal label Jul 15, 2020

@martisch martisch commented Jul 15, 2020

I usually use https://golang.org/pkg/testing/#B.ResetTimer after setup code and before starting the benchmark loop. Does that work better for your use case?
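For the common case of one-time setup, the pattern martisch points to looks roughly like this (a sketch driven through testing.Benchmark; runWithReset and the sleep are illustrative stand-ins). ResetTimer zeroes b.duration, so neither the reported ns/op nor the next-b.N prediction includes the setup, but note that this only helps when the setup runs once, not once per iteration:

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// runWithReset (illustrative name) benchmarks a cheap operation that
// needs one-time setup, discarding the setup cost via b.ResetTimer.
func runWithReset() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		time.Sleep(2 * time.Millisecond) // stand-in for setup independent of b.N
		b.ResetTimer()                   // discard everything timed so far
		x := 0
		for i := 0; i < b.N; i++ {
			x += i // the cheap operation actually being measured
		}
		_ = x
	})
}

func main() {
	r := runWithReset()
	fmt.Println(r.N > 0) // the measured loop ran at least once
}
```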


@joesonw joesonw commented Jul 15, 2020

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jul 15, 2020

@rsc rsc commented Jul 22, 2020

I can't think of a scenario where this kind of per-size setup would lead to valid benchmark results. You're kind of lucky to be getting a timeout - the alternative would be completely bogus results.

If the setup for a loop of length b.N takes longer than the thing you're timing, benchmarks are going to have a hard time getting a precise read. The implication is that the actual operation you are doing is changing based on b.N, which invalidates the entire benchmark timing computation: a buffer of 1 GB is going to have very different cache performance than a buffer of 1 MB, but the benchmark routine depends fundamentally on b.N=1<<30 doing exactly the same operation as b.N=1<<20, just 1<<10 times more.
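rsc's constraint can be restated in code: the measured operation must be identical for every b.N. A sketch of the safe shape, with a working set whose size is fixed up front rather than derived from b.N (the size and names here are illustrative choices, not from the thread):

```go
package main

import (
	"fmt"
	"testing"
)

// runFixed (illustrative name) keeps the working set at a constant size,
// so b.N=1<<30 really does mean "the same operation, just more times"
// than b.N=1<<20 -- the property the prediction loop relies on.
func runFixed() testing.BenchmarkResult {
	const size = 1 << 16 // fixed, chosen up front; NOT derived from b.N
	return testing.Benchmark(func(b *testing.B) {
		buf := make([]byte, size)
		b.ResetTimer() // the setup above is constant-cost, safe to discard
		var sum byte
		for i := 0; i < b.N; i++ {
			sum += buf[i&(size-1)] // same per-iteration work for any b.N
		}
		_ = sum
	})
}

func main() {
	r := runFixed()
	fmt.Println(r.N > 0)
}
```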


@joesonw joesonw commented Jul 23, 2020

@rsc It's not per-size setup; it's what actually happened (and shouldn't have). I was merely trying to measure read and write performance separately. The current benchmark prediction algorithm was not able to handle this correctly when expensive operations are excluded from the timer.

I revised the code comment in the description; it should be more intuitive now.
