testing: benchmark iteration reports incorrectly #41637
Because the benchmark function is run more than once.
@davecheney Yes. Should the fix go to the doc or to the code?
First, what's the actual problem?
The test states the actual problem.
I don't think this is a bug, …
Maybe. But the doc left the impression that the target code runs b.N times. If I have a shared resource, running the benchmark multiple times leads to more manipulation of the shared resource than expected. The implementation could run the benchmark incrementally rather than making multiple attempts to predict the iteration count. In this way, the testing facility could offer a more consistent result. The other option would be to at least document this behavior.
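For illustration, a minimal sketch of the shared-resource situation described above (the names are made up; this is not the reporter's original code):

```go
package share_test

import "testing"

// sharedResource stands in for anything stateful the benchmark mutates:
// a file, a database table, a global cache, and so on.
var sharedResource []int

// BenchmarkAppendShared mutates the shared resource b.N times per call, but
// because the testing package invokes this function several times with
// increasing b.N, the resource ends up being touched more than the final
// b.N times in total.
func BenchmarkAppendShared(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sharedResource = append(sharedResource, i)
	}
}
```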
Change https://golang.org/cl/257647 mentions this issue: |
Doesn't seem like a bug. The benchmark function runs more than once, with varying numbers of iterations. It starts with a small number of iterations so that if the benchmark function is slow, it doesn't take too long. It then ramps up to a larger number of iterations, since shorter runs will be noisy. What would change in the documentation? It seems concise but complete. The sentence "During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably." describes this behavior.
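For reference, the ramp-up described here can be sketched roughly as follows; this paraphrases the iteration-prediction logic in the testing package, with simplified names, and is not the verbatim source:

```go
package main

import (
	"fmt"
	"time"
)

// predictNext sketches how testing.B chooses the next b.N after a round:
// scale the previous iteration count by the observed ns/op toward the
// benchtime goal, overshoot a little, and clamp the growth.
func predictNext(last int64, lastDuration, benchTime time.Duration) int64 {
	prevns := lastDuration.Nanoseconds()
	if prevns <= 0 {
		prevns = 1
	}
	n := benchTime.Nanoseconds() * last / prevns
	n += n / 5 // run ~1.2x more than predicted
	if n > 100*last {
		n = 100 * last // don't grow too fast after a noisy round
	}
	if n <= last {
		n = last + 1 // always run at least one more iteration than last time
	}
	if n > 1e9 {
		n = 1e9 // hard cap on iterations
	}
	return n
}

func main() {
	// Example: the previous round ran 100 iterations in 1µs total,
	// aiming for the default 1s benchtime.
	fmt.Println(predictNext(100, time.Microsecond, time.Second))
}
```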
@jayconrod Thanks for your response.
Is this the core reason that the current implementation iterates multiple times to predict the actual batch size? As mentioned above, the implementation could run the benchmark incrementally rather than making multiple attempts. Statistically speaking, the final average using …
Not really. The first impression could be: "b.N increases incrementally until it meets the time constraint." The confusing part is how "timed reliably" is defined in this case. There is no prior knowledge about the benchmark function, so it is entirely possible that the default time window (1s) is too short, and the developer is responsible for choosing the bench time. Thus the feasible way to produce comparable results is still to verify afterwards with significance testing. Would you be willing to review CL 257647? It suggests a code change rather than a doc change and also addresses #27217.
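As a rough sketch of what "running incrementally" could mean here, accumulating time and iterations across rounds instead of discarding earlier rounds (an illustration only, not the contents of CL 257647):

```go
package main

import (
	"fmt"
	"time"
)

// runIncremental keeps adding rounds of iterations and accumulates both the
// total elapsed time and the total iteration count, instead of throwing away
// earlier rounds and reporting only the last one.
func runIncremental(fn func(), benchTime time.Duration) (iters int64, total time.Duration) {
	round := int64(1)
	for total < benchTime && iters < 1e9 {
		start := time.Now()
		for i := int64(0); i < round; i++ {
			fn()
		}
		total += time.Since(start)
		iters += round
		round *= 2 // grow the next round; any growth policy would do here
	}
	return iters, total
}

func main() {
	var x int64
	iters, total := runIncremental(func() { x++ }, 10*time.Millisecond)
	fmt.Printf("%d iterations, %v total, %.1f ns/op\n",
		iters, total, float64(total.Nanoseconds())/float64(iters))
}
```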
I think so. I'm not super-familiar with this code, but that's probably how I would write it.
I think by "incrementally" you mean timing each batch and accumulating the results. That could add significant noise in fast benchmarks where the loop body only takes a few cycles to execute, i.e., fewer cycles than it takes to measure elapsed time and do the other benchmarking calculations. It's been a while since I did low-level benchmarking, but on older architectures, getting the time took a couple hundred cycles since it flushed CPU pipelines. That stuff adds a lot of noise.
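To put a rough number on that overhead concern, one can benchmark the clock read itself (a sketch; figures vary widely by platform):

```go
package clock_test

import (
	"testing"
	"time"
)

// BenchmarkTimeNow measures the cost of a single clock read. If this is not
// much cheaper than the code under test, timing every small batch (or every
// iteration) adds noticeable noise to the result.
func BenchmarkTimeNow(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = time.Now()
	}
}
```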
Indeed, that's the theoretical concern. But I ran a few benchmarks, for instance the go1 bench. Two executions of the go1 bench before CL 257647 (tip vs. tip):
Go 1 bench before and after the CL (tip vs. tip+CL):
which doesn't show significant noise in fast benchmarks (with the CL) on an i5-8500B. Instead, the overall execution time of the benchmark run is significantly improved.
These are slower benchmarks than I was thinking of. I'd expect to see more noise in something that runs in 10 ns or less, like acquiring an uncontended lock. The execution time improvement is nice, but precision and accuracy are the highest priority. Since I don't think the current methodology is actually wrong, any proposed change should at least hold those steady. There are some pretty sizable differences in the times above.
The entries that show a bigger difference are clearly not caused by the CL, as you can tell from the others. Since we are talking about microbenchmarks here, I am not fully convinced by this argument. Could you maybe suggest more counter-examples? As you noted, we are talking about code that runs in 10 ns or less. It is pretty easy to verify with atomics:
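The original snippet is not shown above; a micro-benchmark along those lines might look like this (hypothetical names):

```go
package atomics_test

import (
	"sync/atomic"
	"testing"
)

var counter int64

// BenchmarkAtomicAdd exercises an operation in the ~10 ns-or-less range,
// which is where measurement overhead and noise would be most visible.
func BenchmarkAtomicAdd(b *testing.B) {
	for i := 0; i < b.N; i++ {
		atomic.AddInt64(&counter, 1)
	}
}
```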
tip vs tip+CL:
What is the concrete problem you are trying to address here?
For improving the overall execution time of fast benchmarks, see #10930. |
Running the benchmark incrementally would potentially be faster, but not necessarily more consistent. Many benchmarks have a non-trivial transient at the start of each run (for example, due to CPU cache misses). Summing the results of multiple incremental runs would also sum the transient effects, whereas taking only the last run (with the final computed b.N) includes the transient only once.
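To make that concrete with made-up numbers: suppose each run pays a one-time warm-up cost of roughly T = 10 µs (cache misses, first allocations) and the steady-state cost is c = 10 ns per iteration. Summing k = 10 incremental runs that together cover N = 1,000,000 iterations reports about c + kT/N ≈ 10.1 ns/op, while the final run alone reports about c + T/N ≈ 10.01 ns/op; the summed figure carries the warm-up bias k times instead of once.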
Yes, I know. But the argument for running it incrementally is that the noise can be filtered out when the results are later verified with significance testing.
The original purpose is to address "I have a shared resource, and running the benchmark leads to more manipulation of the shared resource than expected." A subsequent investigation found that it is also one of the causes of #27217, as described in the CL 257647 message.
Note that the …
Understood. Closing, since no changes were suggested to proceed.
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?
Yes. 1.15.2.

What operating system and processor architecture are you using (`go env`)?

What did you do?
According to the doc, the benchmark function must run the target code b.N times. But the following test fails: the benchmark runs the target code more than b.N times.
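The test itself is not included above; a minimal reproduction along the same lines might look like this (hypothetical names, a sketch rather than the reporter's exact test):

```go
package repro_test

import "testing"

// calls accumulates across every invocation of the benchmark function.
var calls int

func BenchmarkCallCount(b *testing.B) {
	for i := 0; i < b.N; i++ {
		calls++
	}
	// The testing package calls this function several times while ramping up
	// b.N, so the accumulated count exceeds the final b.N and the benchmark
	// fails here.
	if calls > b.N {
		b.Fatalf("target code ran %d times, want at most b.N = %d", calls, b.N)
	}
}
```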
What did you expect to see?
PASS
What did you see instead?