-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improvement(query): performance improvement for sorted merge iterator #17596
improvement(query): performance improvement for sorted merge iterator #17596
Conversation
b8c6606
to
22f297a
Compare
@jsternberg @jacobmarble |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @foobar thanks for this PR. Very cool!
There are a few things of note:
- I have a couple of small suggestions. The main one is to remove the use of
+1
, and just use1
instead, but I also made a simplification to the code and a possible short circuit. - I don't know if we will put this work into a
1.7
release. Can you change the base branch tomaster-1.x
? If accepted, we can manage the backport to the1.8
branch. - I would like to see some benchmarks showing the changes in performance for this change. I mainly concerned about whether the extra work to detect if the fast condition will be used will be significantly detrimental to other use-cases (queries).
- can you add some test coverage for
detectFast
in the form of unit tests?
@stuartcarnie would you please review this too.
Thanks again @foobar for the contribution!
22f297a
to
9eefd9f
Compare
9eefd9f
to
7aa6fde
Compare
@foobar hi, I'm happy with what's here so far. However, I still would like to see some benchmarking and testing of the new code-path. Further @stuartcarnie may have his own suggestions. Thanks |
78db9df
to
c7d360d
Compare
Thanks for your comments! @e-dard
I reworked the code based on your suggestion.
done
benchmark test added and here is the result on my machine:
added test cases |
Sorted merge iterator has cpu-intensive operations to sort the points from multiple inputs. Typical queries like `SELECT * FROM m GROUP BY *` do not behave well due to the comparison of points though in many cases it doesn't necessarily have to use the slow path. This patch adds a shortcut. If each input has a single and unique series we can just return the points input by input. The detection of the shortcut introduces slight overhead but the gains are significant in many slow queries.
1cd5a47
to
af8e66c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is a great improvement 🥇
May I suggest the following small change, which may be a little easier for a future maintainer to understand. It is also likely to be a little more efficient and avoids the use of strings.Compare
, which is not recommended by the Go documentation and the source itself:
var less func(i, j int) bool
if h.opt.Ascending {
less = func(i, j int) bool {
x, y := s[i].point, s[j].point
if x.Name != y.Name {
return x.Name < y.Name
}
if x.Tags.ID() != y.Tags.ID() {
return x.Tags.ID() < y.Tags.ID()
}
hasDup = true
return false
}
} else {
less = func(i, j int) bool {
x, y := s[i].point, s[j].point
if x.Name != y.Name {
return x.Name > y.Name
}
if x.Tags.ID() != y.Tags.ID() {
return x.Tags.ID() > y.Tags.ID()
}
hasDup = true
return false
}
}
sort.Slice(s, less)
As a follow on to #17596, performance of all merge operations can be improved by removing allocations when comparing tags. `benchstat` results: ``` name old time/op new time/op delta SortedMergeIterator-16 32.4ms ± 2% 5.2ms ± 3% -83.81% (p=0.000 n=10+10) name old alloc/op new alloc/op delta SortedMergeIterator-16 36.5MB ± 0% 5.8MB ± 0% -84.20% (p=0.000 n=10+9) name old allocs/op new allocs/op delta SortedMergeIterator-16 420k ± 0% 60k ± 0% -85.71% (p=0.000 n=9+10) ```
As a follow on to #17596, performance of all merge operations can be improved by removing allocations when comparing tags. `benchstat` results: ``` name old time/op new time/op delta SortedMergeIterator-16 32.4ms ± 2% 5.2ms ± 3% -83.81% (p=0.000 n=10+10) name old alloc/op new alloc/op delta SortedMergeIterator-16 36.5MB ± 0% 5.8MB ± 0% -84.20% (p=0.000 n=10+9) name old allocs/op new allocs/op delta SortedMergeIterator-16 420k ± 0% 60k ± 0% -85.71% (p=0.000 n=9+10) ```
As a follow on to #17596, performance of all merge operations can be improved by removing allocations when comparing tags. `benchstat` results: ``` name old time/op new time/op delta SortedMergeIterator-16 32.4ms ± 2% 5.2ms ± 3% -83.81% (p=0.000 n=10+10) name old alloc/op new alloc/op delta SortedMergeIterator-16 36.5MB ± 0% 5.8MB ± 0% -84.20% (p=0.000 n=10+9) name old allocs/op new allocs/op delta SortedMergeIterator-16 420k ± 0% 60k ± 0% -85.71% (p=0.000 n=9+10) ```
Thanks @stuartcarnie for reviewing this PR.
From the source it shouldn't have significant difference in term of performance; for readability, current code is shorter with @e-dard comments. I'm fine with either. |
@e-dard @stuartcarnie @rickspencer3 any other comments? |
@stuartcarnie can you run with this? I got a bunch of other review backed up :-) |
hi @dgnorton, had you got a chance to look at this ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgnorton everything looks good to me
@dgnorton could you get it merged? |
Please forgive me if I'm off base -- I haven't tested this thoroughly but I think the entire
this should be a bit faster. i haven't tested or benchmarked it though. |
s := make([]*floatSortedMergeHeapItem, len(h.items)) | ||
copy(s, h.items) | ||
|
||
less := func(i, j int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please investigate if something like the following would work as a comparator:
less := func(i, j int) bool {
x, y := s[i].point, s[j].point
hasDup = hasDup || x.Name == y.Name
return ((x.Name < y.Name) || (x.Tags.ID() < y.Tags.ID()) && h.opt.Ascending
}
i think that should be at least as fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayang64 the less
function is required to sort ascending or descending order depending on h.opt.Ascending
. It is unclear to me if the short version above achieves that. Also, your proposed version is setting hasDup
to true
only if x.Name == y.Name
, whereas the original function sets hasDup
to true iif both the Name
and Tags
are equal, which is a required property.
I didn't see any obvious performance issues with the existing code, and the overall improvement is significant.
hi @timhallinflux, could this catch 1.8.1 milestone? |
Sorted merge iterator has cpu-intensive operations to sort the points
from multiple inputs. Typical queries like
SELECT * FROM m GROUP BY *
do not behave well due to the comparison of points though in many cases
it doesn't necessarily have to use the slow path.
This patch adds a shortcut. If each input has a single and unique
series we can just return the points input by input.
The detection of the shortcut introduces slight overhead but the gains
are significant in many slow queries.
See #8304