Use binary search to pick bucket. #79
Conversation
Do you have a benchmark for this? Branch misprediction may cause this to be sub-optimal.
Yeah, would be interesting to see at what bucket counts this actually amortizes (very large, I suspect) and what the time overhead is for more normal bucket counts.
Benchmarks are in the code. The difference is within the statistical noise for very low bucket counts. So it is not measurably slower, but one day somebody will want to have 10,000 buckets, and that will be my moment of triumph...
Hmm, in a totally sequential use case like this:

```go
func BenchmarkHistogramSerial(b *testing.B) {
	b.StopTimer()
	s := NewHistogram(HistogramOpts{})
	b.StartTimer()
	for i := 0; i < b.N; i++ {
		s.Observe(<insert value here>)
	}
}
```

For the old code, I get 55ns/op for observing. For the new code, I get 88ns/op, pretty much no matter which value I observe in the bucket range.
Benchmarks are already provided in the code. I ran them and could not see any difference except noise. The reason is probably that the benchmarks in the code do a bit more (many goroutines in parallel, observing values from an array of samples rather than a constant), so more things add to the time that do not depend on the search strategy.

Extrapolating from your result above, the break-even point would be reached at about 40 buckets, which is still quite realistic. We can try that out. I also want to accommodate the case with 100 or 1000 buckets, even if I have to pay a 20ns penalty per observation in cases with few buckets. (We should try that out. Once I'm in...)

We were joking about the death spiral (latencies increase, and now even the time to Observe() those latencies increases...). But if those 20ns really matter, then we really don't want the observe time to depend on the observed value.
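For the record, a sweep over bucket counts could look roughly like this (just a sketch, assuming the usual `testing` and `fmt` imports; the bucket counts, the use of LinearBuckets, and the sub-benchmark layout are illustrative, not part of the existing benchmarks):

```go
// Sketch of a bucket-count sweep to locate the break-even point discussed
// above. The counts and the observed value are illustrative only.
func BenchmarkHistogramBucketCounts(b *testing.B) {
	for _, n := range []int{10, 40, 100, 1000} {
		b.Run(fmt.Sprintf("buckets=%d", n), func(b *testing.B) {
			h := NewHistogram(HistogramOpts{
				Buckets: LinearBuckets(0, 1, n),
			})
			for i := 0; i < b.N; i++ {
				// Observe a value that lands in a middle bucket,
				// the "fairer" case mentioned above.
				h.Observe(float64(n) / 2)
			}
		})
	}
}
```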
So there is actually a pretty "pure" microbenchmark in the code: BenchmarkHistogramObserve1. My results: 27.1 ns/op with linear search, 33.6 ns/op with binary search. If it were only up to me, I'd consider these differences irrelevant in practice and go for binary search just so observe times don't grow linearly with higher bucket counts. If you insist, I will run benchmarks with higher bucket counts (and 'fairer' conditions where most of the observations happen in the middle buckets). I'll find the break-even point and implement a switch to the most efficient search method depending on bucket count. (I just think I have more pressing things to do...)
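Very roughly, the switch I have in mind would look something like this (a minimal sketch assuming the standard sort package is imported; findBucket and the threshold of 40 are placeholders to be validated by benchmarks, not the current implementation):

```go
import "sort"

// findBucket returns the index of the first bucket whose upper bound is >= v,
// or len(upperBounds) for the implicit +Inf bucket. upperBounds must be
// sorted in increasing order. The break-even threshold is a placeholder.
func findBucket(v float64, upperBounds []float64) int {
	const breakEven = 40 // to be determined empirically
	if len(upperBounds) < breakEven {
		// Linear scan: cheap and branch-predictor friendly for few buckets.
		for i, ub := range upperBounds {
			if v <= ub {
				return i
			}
		}
		return len(upperBounds)
	}
	// Binary search: O(log n) comparisons, pays off for many buckets.
	// sort.SearchFloat64s returns the smallest index i with upperBounds[i] >= v.
	return sort.SearchFloat64s(upperBounds, v)
}
```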
Agreed, that's something we can do later. Let's just merge this for now, but keep in mind it's something that can be optimized later, if needed. 👍
I'll add a TODO for that.
I couldn't resist and ran a benchmark (BenchmarkHistogramNoLabels, which is almost exactly like Julius's benchmark above).
With the usual number of buckets, this doesn't really make a difference, but it should scale... See the added TODO for the precise numbers.