improvement(query): performance improvement for sorted merge iterator #17596

Merged

Conversation

foobar
Contributor

@foobar foobar commented Apr 3, 2020

The sorted merge iterator performs CPU-intensive work to sort the points
from multiple inputs. Typical queries like SELECT * FROM m GROUP BY *
perform poorly because of the point comparisons, even though in many cases
the slow path is not needed.

This patch adds a shortcut. If each input has a single series that is unique
across inputs, we can just return the points input by input.
Detecting the shortcut introduces a slight overhead, but the gains
are significant for many slow queries.

See #8304

@foobar foobar force-pushed the optimize-sorted-merge-iterator branch from b8c6606 to 22f297a Compare April 3, 2020 07:20
@foobar foobar changed the title from "feat(query): Performance improvement for sorted merge iterator" to "feat(query): performance improvement for sorted merge iterator" Apr 3, 2020
@foobar
Contributor Author

foobar commented Apr 3, 2020

@jsternberg @jacobmarble
Could you take a look?

@foobar foobar changed the title from "feat(query): performance improvement for sorted merge iterator" to "improvement(query): performance improvement for sorted merge iterator" Apr 3, 2020
e-dard
e-dard previously requested changes Apr 17, 2020
Contributor

@e-dard e-dard left a comment

Hi @foobar thanks for this PR. Very cool!

There are a few things of note:

  1. I have a couple of small suggestions. The main one is to remove the use of +1, and just use 1 instead, but I also made a simplification to the code and a possible short circuit.
  2. I don't know if we will put this work into a 1.7 release. Can you change the base branch to master-1.x? If accepted, we can manage the backport to the 1.8 branch.
  3. I would like to see some benchmarks showing how this change affects performance. I am mainly concerned about whether the extra work to detect if the fast condition will be used will be significantly detrimental to other use cases (queries).
  4. Can you add some test coverage for detectFast in the form of unit tests?

@stuartcarnie would you please review this too?

Thanks again @foobar for the contribution!

query/iterator.gen.go.tmpl (three review comments, outdated and resolved)
@foobar foobar changed the base branch from 1.7 to master-1.x April 20, 2020 03:21
@foobar foobar changed the base branch from master-1.x to 1.7 April 20, 2020 03:36
@foobar foobar force-pushed the optimize-sorted-merge-iterator branch from 22f297a to 9eefd9f Compare April 20, 2020 04:01
@foobar foobar changed the base branch from 1.7 to master-1.x April 20, 2020 04:02
@foobar foobar force-pushed the optimize-sorted-merge-iterator branch from 9eefd9f to 7aa6fde Compare April 20, 2020 04:09
@e-dard
Contributor

e-dard commented Apr 20, 2020

@foobar hi, I'm happy with what's here so far. However, I would still like to see some benchmarking and testing of the new code path. Further, @stuartcarnie may have his own suggestions. Thanks

@foobar foobar force-pushed the optimize-sorted-merge-iterator branch from 78db9df to c7d360d Compare April 20, 2020 11:35
@foobar
Contributor Author

foobar commented Apr 20, 2020

Thanks for your comments! @e-dard

> 1. I have a couple of small suggestions. The main one is to remove the use of +1, and just use 1 instead, but I also made a simplification to the code and a possible short circuit.

I reworked the code based on your suggestion.

> 2. I don't know if we will put this work into a 1.7 release. Can you change the base branch to master-1.x? If accepted, we can manage the backport to the 1.8 branch.

Done.

> 3. I would like to see some benchmarks showing the changes in performance for this change. I am mainly concerned about whether the extra work to detect if the fast condition will be used will be significantly detrimental to other use-cases (queries).

I added benchmark tests; here are the results on my machine:

go test -bench=BenchmarkSorted ./query/...
goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/query
BenchmarkSortedMergeIterator_Fast-32                         166           6738013 ns/op
BenchmarkSortedMergeIterator_NotFast-32                       19          61152186 ns/op
BenchmarkSortedMergeIterator_FastCheckOverhead-32        1718110               700 ns/op

> 4. Can you add some test coverage for detectFast in the form of unit tests?

I added test cases.

@foobar foobar requested a review from e-dard April 20, 2020 13:06
@foobar foobar force-pushed the optimize-sorted-merge-iterator branch from 1cd5a47 to af8e66c Compare April 20, 2020 13:06
Contributor

@stuartcarnie stuartcarnie left a comment

Overall, this is a great improvement 🥇

May I suggest the following small change, which may be a little easier for a future maintainer to understand. It is also likely to be a little more efficient and avoids the use of strings.Compare, which is not recommended by the Go documentation and the source itself:

	var less func(i, j int) bool
	if h.opt.Ascending {
		less = func(i, j int) bool {
			x, y := s[i].point, s[j].point
			if x.Name != y.Name {
				return x.Name < y.Name
			}

			if x.Tags.ID() != y.Tags.ID() {
				return x.Tags.ID() < y.Tags.ID()
			}

			hasDup = true
			return false
		}
	} else {
		less = func(i, j int) bool {
			x, y := s[i].point, s[j].point
			if x.Name != y.Name {
				return x.Name > y.Name
			}

			if x.Tags.ID() != y.Tags.ID() {
				return x.Tags.ID() > y.Tags.ID()
			}

			hasDup = true
			return false
		}
	}
	sort.Slice(s, less)

stuartcarnie added a commit that referenced this pull request Apr 22, 2020
As a follow on to #17596, performance of all merge operations can
be improved by removing allocations when comparing tags.

`benchstat` results:

```
name                    old time/op    new time/op    delta
SortedMergeIterator-16    32.4ms ± 2%     5.2ms ± 3%  -83.81%  (p=0.000 n=10+10)

name                    old alloc/op   new alloc/op   delta
SortedMergeIterator-16    36.5MB ± 0%     5.8MB ± 0%  -84.20%  (p=0.000 n=10+9)

name                    old allocs/op  new allocs/op  delta
SortedMergeIterator-16      420k ± 0%       60k ± 0%  -85.71%  (p=0.000 n=9+10)
```
@foobar
Contributor Author

foobar commented Apr 23, 2020

Thanks @stuartcarnie for reviewing this PR.

> May I suggest the following small change, which may be a little easier for a future maintainer to understand. It is also likely to be a little more efficient and avoids the use of strings.Compare, which is not recommended by the Go documentation and the source itself

Judging from the source, there shouldn't be a significant difference in performance. As for readability, the current code, which incorporates @e-dard's comments, is shorter. I'm fine with either.
@e-dard what are your thoughts?

@foobar
Contributor Author

foobar commented May 4, 2020

@e-dard @stuartcarnie @rickspencer3 any other comments?

@e-dard
Contributor

e-dard commented May 6, 2020

@stuartcarnie can you run with this? I have a bunch of other reviews backed up :-)

@stuartcarnie
Contributor

I am ok, as-is. Thanks again, @foobar!

@dgnorton if you are happy, you are welcome to merge it

@foobar
Contributor Author

foobar commented May 11, 2020

Hi @dgnorton, have you had a chance to look at this?

Contributor

@stuartcarnie stuartcarnie left a comment

@dgnorton everything looks good to me

@foobar
Contributor Author

foobar commented May 15, 2020

@dgnorton could you get it merged?

@ayang64
Contributor

ayang64 commented May 18, 2020

@foobar, @stuartcarnie

Please forgive me if I'm off base -- I haven't tested this thoroughly but I think the entire less func could be simplified to something like this:

less := func(i, j int) bool {
  x, y := s[i].point, s[j].point
  hasDup = hasDup || x.Name == y.Name
  return ((x.Name < y.Name) || (x.Tags.ID() < y.Tags.ID())) && h.opt.Ascending
}

This should be a bit faster. I haven't tested or benchmarked it, though.

s := make([]*floatSortedMergeHeapItem, len(h.items))
copy(s, h.items)

less := func(i, j int) bool {
Contributor

@ayang64 ayang64 May 18, 2020

Please investigate whether something like the following would work as a comparator:

less := func(i, j int) bool {
  x, y := s[i].point, s[j].point
  hasDup = hasDup || x.Name == y.Name
  return ((x.Name < y.Name) || (x.Tags.ID() < y.Tags.ID())) && h.opt.Ascending
}

I think that should be at least as fast.

Contributor

@stuartcarnie stuartcarnie May 19, 2020

@ayang64 the less function is required to sort in ascending or descending order depending on h.opt.Ascending. It is unclear to me whether the short version above achieves that. Also, your proposed version sets hasDup to true whenever x.Name == y.Name, whereas the original function sets hasDup to true if and only if both the Name and Tags are equal, which is a required property.

I didn't see any obvious performance issues with the existing code, and the overall improvement is significant.
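
The two concerns raised here can be checked with a small, self-contained sketch. The `point` type, `shortLess`, and `branchLess` are hypothetical names; `shortLess` reads the proposed return expression as (name less OR tags less) AND ascending, and that grouping is an assumption.

```go
package main

import "fmt"

// point is a hypothetical stand-in: TagsID plays the role of Tags.ID().
type point struct {
	Name   string
	TagsID string
}

// shortLess mirrors the shortened comparator. It returns the comparison
// result and the value it would contribute to hasDup.
func shortLess(x, y point, ascending bool) (less, dup bool) {
	dup = x.Name == y.Name
	less = (x.Name < y.Name || x.TagsID < y.TagsID) && ascending
	return less, dup
}

// branchLess mirrors the comparator with one branch per sort direction.
func branchLess(x, y point, ascending bool) (less, dup bool) {
	if x.Name != y.Name {
		if ascending {
			return x.Name < y.Name, false
		}
		return x.Name > y.Name, false
	}
	if x.TagsID != y.TagsID {
		if ascending {
			return x.TagsID < y.TagsID, false
		}
		return x.TagsID > y.TagsID, false
	}
	return false, true // same name and same tags
}

func main() {
	a := point{"cpu", "host=a"}
	b := point{"cpu", "host=b"}

	// Descending order: b should sort before a.
	shortDesc, _ := shortLess(b, a, false)
	branchDesc, _ := branchLess(b, a, false)

	// Same name but different tags: not a duplicate series.
	_, shortDup := shortLess(a, b, true)
	_, branchDup := branchLess(a, b, true)

	fmt.Println(shortDesc, branchDesc, shortDup, branchDup)
	// prints "false true true false"
}
```

With ascending set to false the shortened form can never return true, so it cannot produce a descending order, and it flags a duplicate for two series that differ only in their tags.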

@e-dard e-dard removed their request for review May 22, 2020 09:13
@foobar foobar requested review from ayang64 and e-dard June 11, 2020 06:27
@foobar
Contributor Author

foobar commented Jun 11, 2020

Hi @timhallinflux, could this make the 1.8.1 milestone?

@dgnorton dgnorton dismissed stale reviews from ayang64 and e-dard June 23, 2020 16:47

See Stuart's response

@dgnorton dgnorton merged commit 78a05d1 into influxdata:master-1.x Jun 23, 2020
@foobar foobar deleted the optimize-sorted-merge-iterator branch June 24, 2020 03:51

6 participants