Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the math engine to compile the query and use eval #9563

Merged
merged 1 commit into from
Mar 20, 2018

Conversation

jsternberg
Copy link
Contributor

@jsternberg jsternberg commented Mar 13, 2018

This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.

It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.

This also makes use of the new eval functionality that was recently
added to the influxql package.

No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.

This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:

SELECT max(value) * 2, max(value) / 2 FROM cpu

This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.

Fixes #9511.

@hercules-influx
Copy link
Contributor

During a run of megacheck the following issues were discovered:

@hercules-influx
Copy link
Contributor

During a run of megacheck the following issues were discovered:

@hercules-influx
Copy link
Contributor

During a run of megacheck the following issues were discovered:

@hercules-influx
Copy link
Contributor

During a run of megacheck the following issues were discovered:

/tmp/309915216/src/github.com/influxdata/influxdb/query/iterator.go:1155:6: type integerFloatTransformIterator is unused (U1000)
/tmp/309915216/src/github.com/influxdata/influxdb/query/iterator.go:1180:6: type integerFloatTransformFunc is unused (U1000)
/tmp/309915216/src/github.com/influxdata/influxdb/query/iterator.go:1182:6: type integerFloatCastIterator is unused (U1000)
/tmp/309915216/src/github.com/influxdata/influxdb/query/iterator.go:1204:6: type unsignedFloatCastIterator is unused (U1000)

@jsternberg jsternberg force-pushed the js-9511-math-eval-iterators branch 5 times, most recently from 37b7a03 to acf48df Compare March 14, 2018 21:03
Copy link
Contributor

@stuartcarnie stuartcarnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. I have added some comments / question in context.

var SkipDefault = interface{}(0)

// NewIteratorScanner produces an IteratorScanner for the Iterator.
func NewIteratorScanner(input Iterator, keys []string, defaultValue interface{}) IteratorScanner {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the reduction in complexity and locking by removing the aux iterator, have you seen any perf improvements as a result of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to test it. I'm not sure how thorough our benchmarks are though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a change to reduce memory allocations. There was a regression in memory allocations because the time was being taken care of by eval which resulted in an extra allocation of an interface for every row. Here's the numbers for the one with the memory optimization that avoids the extra allocation:

benchmark                            old ns/op      new ns/op     delta
BenchmarkCountIterator_1K-8          42116          43269         +2.74%
BenchmarkCountIterator_100K-8        4191261        4113898       -1.85%
BenchmarkCountIterator_1M-8          41485285       39940193      -3.72%
BenchmarkSampleIterator_1k-8         464            461           -0.65%
BenchmarkSampleIterator_100k-8       410            412           +0.49%
BenchmarkSampleIterator_1M-8         321            306           -4.67%
BenchmarkDistinctIterator_1K-8       74546          72119         -3.26%
BenchmarkDistinctIterator_100K-8     6858535        6741477       -1.71%
BenchmarkDistinctIterator_1M-8       68455510       69030348      +0.84%
BenchmarkModeIterator_1K-8           103948         111675        +7.43%
BenchmarkModeIterator_100K-8         34101326       34786063      +2.01%
BenchmarkModeIterator_1M-8           408322854      232866597     -42.97%
BenchmarkSelect_Raw_1K-8             1724663        197585        -88.54%
BenchmarkSelect_Raw_100K-8           1808846670     192179458     -89.38%
BenchmarkSelect_Dedupe_1K-8          38699956       30059134      -22.33%

benchmark                            old allocs     new allocs     delta
BenchmarkCountIterator_1K-8          11             11             +0.00%
BenchmarkCountIterator_100K-8        11             11             +0.00%
BenchmarkCountIterator_1M-8          11             11             +0.00%
BenchmarkSampleIterator_1k-8         3              3              +0.00%
BenchmarkSampleIterator_100k-8       3              3              +0.00%
BenchmarkSampleIterator_1M-8         3              3              +0.00%
BenchmarkDistinctIterator_1K-8       18             18             +0.00%
BenchmarkDistinctIterator_100K-8     18             18             +0.00%
BenchmarkDistinctIterator_1M-8       18             18             +0.00%
BenchmarkModeIterator_1K-8           22             22             +0.00%
BenchmarkModeIterator_100K-8         42             42             +0.00%
BenchmarkModeIterator_1M-8           53             53             +0.00%
BenchmarkSelect_Raw_1K-8             2068           2091           +1.11%
BenchmarkSelect_Raw_100K-8           2000123        2000096        -0.00%
BenchmarkSelect_Dedupe_1K-8          302008         302032         +0.01%

benchmark                            old bytes     new bytes     delta
BenchmarkCountIterator_1K-8          928           928           +0.00%
BenchmarkCountIterator_100K-8        945           945           +0.00%
BenchmarkCountIterator_1M-8          1101          1032          -6.27%
BenchmarkSampleIterator_1k-8         400           400           +0.00%
BenchmarkSampleIterator_100k-8       400           400           +0.00%
BenchmarkSampleIterator_1M-8         400           400           +0.00%
BenchmarkDistinctIterator_1K-8       7052          7053          +0.01%
BenchmarkDistinctIterator_100K-8     7058          7054          -0.06%
BenchmarkDistinctIterator_1M-8       7032          7032          +0.00%
BenchmarkModeIterator_1K-8           197152        197152        +0.00%
BenchmarkModeIterator_100K-8         44999200      44999200      +0.00%
BenchmarkModeIterator_1M-8           529293856     529293856     +0.00%
BenchmarkSelect_Raw_1K-8             44086         53304         +20.91%
BenchmarkSelect_Raw_100K-8           40010976      48006068      +19.98%
BenchmarkSelect_Dedupe_1K-8          4209411       4210828       +0.03%

There's still more memory allocated in general for raw queries for number of bytes used, but it's not as large as the previous change. I'll keep trying to track it down.

CPU usage is generally better across the board with this technique though.

@@ -1206,6 +1206,20 @@ func (a Shards) MapType(measurement, field string) influxql.DataType {
return typ
}

func (a Shards) CallType(name string, args []influxql.DataType) (influxql.DataType, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is CallType added to the Shard type – I do not understand the relevance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is needed anymore.

The original purpose was to be similar to MapType. This function was supposed to say, "The shards implement these functions and here are the return types as reported by the storage engine." It still may be worth doing that, but it seems like this method is captured by another struct I created to update the type mapper.

I think I'm going to keep this for consistency and remove the other struct that intercepts this method call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I finally remember now.

So yes, this is meant to expose the return values for a function and eventually do some type checking too so we can error out earlier in the iterator creation process. The hookups for more user friendly error messages and validation aren't done, but they also aren't that important since later in the system it will find out the types are bad anyway.

The reason this is being added to shard is for consistency. The field mapper that is used in query/functions.go wraps a group of shards to expose all of the values from the query engine. Technically, that section of code can handle returning the types since both the storage engine and the query engine are capable of handling call iterators. I like this here though so there's no surprise that the Shards type can handle a mean() call, but doesn't implement CallType to tell the type checker what type it is. I've now refactored it though so they use the same underlying code base though so there is less confusion and less of a chance they will become disconnected.

@jsternberg jsternberg force-pushed the js-9511-math-eval-iterators branch 2 times, most recently from 20ea562 to 2494c79 Compare March 19, 2018 17:06
This change makes it so that we simplify the math engine so it doesn't
use a complicated set of nested iterators. That way, we have to change
math in one fewer place.

It also greatly simplifies the query engine as now we can create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and use eval to compute the result. It
makes it so the auxiliary iterators and all of their complexity can be
removed.

This also makes use of the new eval functionality that was recently
added to the influxql package.

No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.

This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:

    SELECT max(value) * 2, max(value) / 2 FROM cpu

This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
Copy link
Contributor

@stuartcarnie stuartcarnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@jsternberg jsternberg merged commit 49ed5f3 into master Mar 20, 2018
@jsternberg jsternberg deleted the js-9511-math-eval-iterators branch March 20, 2018 19:13
@ghost ghost removed the review label Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants