Support subquery execution in the query language #7646

Merged
merged 1 commit into from Jan 9, 2017

Conversation

jsternberg
Contributor

This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.
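To make the two-stage evaluation concrete, here is a minimal Go sketch of the same idea outside the query engine. It is not the engine's implementation; `Point`, `minByHost`, and `maxOf` are hypothetical names, and the data is made up. The inner step computes one minimum per host and the outer step takes the maximum over those results.

```go
package main

import "fmt"

// Point is a hypothetical, simplified stand-in for a row in the cpu measurement.
type Point struct {
	Host  string
	Value float64
}

// minByHost plays the role of the inner query:
//   SELECT min(value) FROM cpu GROUP BY host
func minByHost(points []Point) map[string]float64 {
	mins := make(map[string]float64)
	for _, p := range points {
		if cur, ok := mins[p.Host]; !ok || p.Value < cur {
			mins[p.Host] = p.Value
		}
	}
	return mins
}

// maxOf plays the role of the outer query: SELECT max(min) FROM (...).
// It reports false when there is no input.
func maxOf(values map[string]float64) (float64, bool) {
	var best float64
	found := false
	for _, v := range values {
		if !found || v > best {
			best, found = v, true
		}
	}
	return best, found
}

func main() {
	points := []Point{{"a", 3}, {"a", 1}, {"b", 5}, {"b", 4}}
	best, ok := maxOf(minByHost(points))
	fmt.Println(best, ok)
}
```

The inner result is fully materialized before the outer step consumes it, which mirrors the "temporary database in memory" description above.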

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)

have different performance characteristics. The first calculates
mean(value) at the shard level and is faster, especially in
clustered setups. The second processes the mean at the top level and does not
benefit from that optimization.
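The difference can be sketched in Go as follows. This is an illustration only, assuming a shard can be modeled as a slice of values; `partial`, `shardPartial`, `meanFromPartials`, and `meanFromRaw` are hypothetical names and not part of the query engine. In the first shape each shard sends back a small summary, while in the second every raw point travels to the top level before the mean is computed.

```go
package main

import "fmt"

// partial is a hypothetical per-shard summary: enough to combine means later.
type partial struct {
	sum   float64
	count int
}

// shardPartial summarizes one shard locally, so only two numbers leave the shard.
func shardPartial(values []float64) partial {
	var p partial
	for _, v := range values {
		p.sum += v
		p.count++
	}
	return p
}

// meanFromPartials combines per-shard summaries (the shape of the first query).
func meanFromPartials(parts []partial) float64 {
	var total partial
	for _, p := range parts {
		total.sum += p.sum
		total.count += p.count
	}
	if total.count == 0 {
		return 0
	}
	return total.sum / float64(total.count)
}

// meanFromRaw receives every raw point from every shard (the shape of the
// second query, where the subquery only selects the raw values).
func meanFromRaw(shards [][]float64) float64 {
	var sum float64
	var n int
	for _, s := range shards {
		for _, v := range s {
			sum += v
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

func main() {
	shards := [][]float64{{1, 2, 3}, {4, 5}}
	parts := make([]partial, 0, len(shards))
	for _, s := range shards {
		parts = append(parts, shardPartial(s))
	}
	fmt.Println(meanFromPartials(parts), meanFromRaw(shards))
}
```

Both paths produce the same mean; what changes is how much data has to move before the aggregate is applied.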

Fixes #4619.

@jsternberg
Contributor Author

Functioning subqueries in the open source version. Still needs to have a lot of tests added (I only ran some manual tests) and I also need to adapt the refactor in the query engine to also work in the closed source version.

So while this isn't ready to merge, this is a substantial step in the direction of having this implemented and merged.

@jsternberg jsternberg force-pushed the js-4619-subqueries branch 5 times, most recently from 11f10cb to b0a29fc Compare November 21, 2016 22:00
@jsternberg jsternberg force-pushed the js-4619-subqueries branch 3 times, most recently from 2f9a3c2 to 25b9512 Compare November 22, 2016 20:51
@jsternberg
Contributor Author

@jwilder found a problem with some queries. Listing them here so I make sure to fix them before we merge this.

  1. Any fill iterator in an inner query causes no results to be returned when there is another aggregate in the outer query. If you use fill(none) on an inner query everything works fine, but this may be due to some aggregators not properly handling null values.
  2. The following query will likely crash the query engine: SELECT mean(mean) FROM (SELECT mean(value) FROM cpu GROUP BY time(10s)) WHERE time > now() - 1m. This is because the outer query has the default end time set to the maximum time because there is no GROUP BY time(...) and then the inner query inherits that end time and the fill iterator goes crazy generating points until the end of time.

@jsternberg jsternberg force-pushed the js-4619-subqueries branch 4 times, most recently from 1224503 to 8deb045 Compare November 25, 2016 04:30
@jsternberg
Contributor Author

I've resolved the previous two issues. There is one more issue that came up. If you have something like GROUP BY host, time(...) in the inner query and don't have the host part in the outer query, the results are incorrect.

The reason is that we return all rows from each series in the inner query before continuing to the next series. The outer query then needs to merge these different series and can't, because they aren't ordered correctly. This is a more difficult problem. I'll include more notes on this later.

@jsternberg
Contributor Author

A larger explanation of the above problem.

So the problem arises because now we have more than two levels of grouping that we can use. The problem didn't show up when there was only one level of grouping because we could figure out the grouping at the beginning of the query and didn't have to worry about it being grouped in a different way later. The problem arises when you have intervals and group by at least one tag in the inner query.

When we output the points for the final time, we arrange for all of the points in one series to be output before the points in another series. But, when the points are grouped into the same bucket, we need to output all of the points within the same interval for the grouped series before continuing to the next interval. This fundamental difference in how the points are output causes the problem. After processing the first aggregate, we have merged all of the series that are being output into the same stream so we can't read concurrently from different streams to group them together again. We have to read the full stream.
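The two orderings described above can be sketched in Go. This is a simplified illustration, not the iterator code; `Point`, `seriesMajor`, and `intervalMajor` are hypothetical names, and timestamps are assumed to be non-negative so integer division yields the window index.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a hypothetical, simplified output row: a series key and a timestamp.
type Point struct {
	Series string
	Time   int64
}

// seriesMajor orders points the way final output is arranged:
// every point of one series before any point of the next series.
func seriesMajor(pts []Point) {
	sort.SliceStable(pts, func(i, j int) bool {
		if pts[i].Series != pts[j].Series {
			return pts[i].Series < pts[j].Series
		}
		return pts[i].Time < pts[j].Time
	})
}

// intervalMajor orders points the way an aggregate over merged series needs
// them: every point of one time window before any point of the next window.
func intervalMajor(pts []Point, interval int64) {
	sort.SliceStable(pts, func(i, j int) bool {
		wi, wj := pts[i].Time/interval, pts[j].Time/interval
		if wi != wj {
			return wi < wj
		}
		if pts[i].Series != pts[j].Series {
			return pts[i].Series < pts[j].Series
		}
		return pts[i].Time < pts[j].Time
	})
}

func main() {
	pts := []Point{{"b", 0}, {"a", 10}, {"a", 0}, {"b", 10}}

	seriesMajor(pts)
	fmt.Println(pts)

	intervalMajor(pts, 10)
	fmt.Println(pts)
}
```

An aggregate that consumes a single series-major stream sees window 0 of series `a`, then window 10 of `a`, and only later window 0 of `b`, so it cannot combine the two series within a window without buffering the whole stream.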

My current idea is to stop merging different series into the same iterator. That would resolve the issue, but I have concerns about how it would affect the closed source version since it might mean we need a socket per grouped series. If you do something like GROUP BY * on a high cardinality database, it would crash clustering and that's not acceptable.

My other idea is to have the aggregates start outputting points in a way that the next aggregate iterator can process, but I haven't worked out how to do that yet and I don't know if it will hold when you have an inner query within an inner query.

@jsternberg
Contributor Author

> My other idea is to have the aggregates start outputting points in a way that the next aggregate iterator can process, but I haven't worked out how to do that yet and I don't know if it will hold when you have an inner query within an inner query.

After taking a few days to clear my head by working on something else, this is my current favored idea and I actually think it will work.

So the important part is determining where the top level aggregate is located and determining how to structure the iterators based on that top level aggregate. I still need to think about SELECT mean FROM (SELECT mean(value) FROM cpu WHERE ... GROUP BY time(10s)) and make sure I understand that one correctly too. I also need to think about 3 levels rather than just 2, but hopefully this makes some progress.

I figure that if we have a query like this:

SELECT mean(mean) FROM (SELECT mean(value) FROM cpu GROUP BY *, time(10s)) WHERE ... GROUP BY time(10s), host

And the cpu measurement has host and region as tags, we have two different conditions for how the iterator is structured. The host tag key at the top level needs to be organized so all points from each output series are read completely before continuing to the next iterator. But, for the inner query, we need to read points from each interval. So we need to find all of the series and organize them into groups by their host and then use a combination of MergeIterator and CallIterator on the internal regions. I think we already have the code for this, but we are just using CallIterator incorrectly because we have never had a situation where the outputs were different from the internal groupings during processing.
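The grouping step can be sketched in Go like this. It only illustrates "organize the series into groups by their host"; it does not use MergeIterator or CallIterator, and `Series` and `groupByTag` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Series is a hypothetical stand-in for one inner-query output series.
type Series struct {
	Tags map[string]string
}

// groupByTag buckets series by the value of a single tag key, for example
// "host". Each bucket would then be merged and aggregated interval by
// interval, while the buckets themselves are emitted one after another.
// The keys are returned sorted so the output order is deterministic.
func groupByTag(series []Series, key string) (keys []string, groups map[string][]Series) {
	groups = make(map[string][]Series)
	for _, s := range series {
		v := s.Tags[key]
		if _, seen := groups[v]; !seen {
			keys = append(keys, v)
		}
		groups[v] = append(groups[v], s)
	}
	sort.Strings(keys)
	return keys, groups
}

func main() {
	series := []Series{
		{Tags: map[string]string{"host": "b", "region": "east"}},
		{Tags: map[string]string{"host": "a", "region": "east"}},
		{Tags: map[string]string{"host": "a", "region": "west"}},
	}
	keys, groups := groupByTag(series, "host")
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
}
```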

I think I'll start playing with this and see if it helps... Likely a lot of edge cases I haven't thought of. On that list is also what to do when GROUP BY time(...) intervals are different between the inner and outer query.

@jsternberg jsternberg force-pushed the js-4619-subqueries branch 2 times, most recently from b01f384 to 03c48d7 Compare December 5, 2016 19:23
@jsternberg
Contributor Author

I believe this to now be ready for further testing. I believe the second idea that I had worked. Basically, the only dimensions we have to order by are the last ones. If those dimensions get passed separate from the ones we need to group by, we can prepare the iterators at the lowest level to order data in an appropriate way.

I have not done exhaustive testing of this. I'm not even 100% certain that the output is correct yet, but I wanted to give a heads up for anybody who wants to try it and can give feedback. There's no more repeating timestamps so I think it's working correctly.

@jsternberg jsternberg force-pushed the js-4619-subqueries branch 2 times, most recently from 16d3f2a to 82f2576 Compare December 5, 2016 20:59
@jsternberg
Contributor Author

I've now included a test for a different GROUP BY in the inner query than the outer query. I've also included tests making sure SELECT ... FROM (SELECT ... GROUP BY time(2s)) GROUP BY time(4s) works and it does!
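For readers following along, here is a small Go sketch of what re-aggregating 2s buckets into 4s buckets means. It is not the test added in this PR; `Bucket` and `rebucket` are hypothetical names, the outer aggregate is assumed to be a sum, and the inner buckets are assumed to be sorted by non-negative start time (in seconds here for readability).

```go
package main

import "fmt"

// Bucket is a hypothetical inner-query result: one value per time window.
type Bucket struct {
	Start int64
	Value float64
}

// rebucket sums inner buckets into wider outer windows of the given width.
// Assumes inner is sorted by Start, Start >= 0, and outer > 0.
func rebucket(inner []Bucket, outer int64) []Bucket {
	var out []Bucket
	for _, b := range inner {
		start := b.Start - b.Start%outer // truncate to the outer window
		if n := len(out); n > 0 && out[n-1].Start == start {
			out[n-1].Value += b.Value
			continue
		}
		out = append(out, Bucket{Start: start, Value: b.Value})
	}
	return out
}

func main() {
	// Inner query output with GROUP BY time(2s).
	inner := []Bucket{{0, 1}, {2, 2}, {4, 3}, {6, 4}}
	// Outer query with GROUP BY time(4s).
	fmt.Println(rebucket(inner, 4))
}
```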

Looking much better now. This is ready for further testing.

@jsternberg jsternberg added this to the 1.2.0 milestone Dec 6, 2016
@desa
Contributor

desa commented Dec 7, 2016

@jsternberg Just started playing with this branch and I came across this panic

panic: runtime error: index out of range

goroutine 345 [running]:
panic(0x5e0040, 0xc42000c120)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/influxdb/influxql.auxIteratorFields.send(0xc42cb185b8, 0x1, 0x1, 0xa001c0, 0xc422c323c0, 0x180001)
	/Users/michaeldesa/go/src/github.com/influxdata/influxdb/influxql/iterator.go:484 +0x79f
github.com/influxdata/influxdb/influxql.(*integerAuxIterator).stream(0xc42fe14660)
	/Users/michaeldesa/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:3337 +0xf0
created by github.com/influxdata/influxdb/influxql.(*integerAuxIterator).Start
	/Users/michaeldesa/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:3312 +0x3f

for this query

select bottom(bottom,3),some from (select bottom(n, 10) from ctr where time > now() - 3m group by *)

To repro run influx-stress insert and then issue that query on the stress db.

Edit:

Just confirmed the following queries also cause the panic

select bottom(n,3),some from (select * from ctr where time > now() - 3m group by *)
select bottom(n,3),some from (select * from ctr where time > now() - 3m)

@desa
Contributor

desa commented Dec 7, 2016

I'm a little confused as to why

select * from (select * from ctr where time > now() - 10s group by *) limit 10

yields the error

ERR: SELECT * FROM (SELECT n::integer FROM stress.autogen.ctr WHERE time > '2016-12-07T18:58:55.33862792Z' GROUP BY some) LIMIT 10 [panic:unreachable]

but

select n,some from (select n,some from ctr where time > now() - 10s group by *) limit 10

yields the expected results

name: ctr
time				n	some
----				-	----
2016-12-07T19:07:26.201389237Z	2560	tag-99999
2016-12-07T19:07:26.201389237Z	2565	tag-99998
2016-12-07T19:07:26.201389237Z	2565	tag-99997
2016-12-07T19:07:26.201389237Z	2565	tag-99996
2016-12-07T19:07:26.201389237Z	2565	tag-99995
2016-12-07T19:07:26.201389237Z	2565	tag-99994
2016-12-07T19:07:26.201389237Z	2565	tag-99993
2016-12-07T19:07:26.201389237Z	2565	tag-99992
2016-12-07T19:07:26.201389237Z	2565	tag-99991
2016-12-07T19:07:26.201389237Z	2565	tag-99990

@desa
Contributor

desa commented Dec 7, 2016

Just discovered that slimit in the outer query isn't respected.

The query

select n,some from (select n,some from ctr where time > now() - 10s group by *) group by * slimit 1

yields all of the series available to the group by *

name: ctr
tags: some=tag-0
time				n	some
----				-	----
2016-12-07T19:14:59.909782426Z	123999	tag-0
2016-12-07T19:15:00.909617816Z	124999	tag-0
2016-12-07T19:15:01.909737258Z	125999	tag-0
2016-12-07T19:15:02.909665162Z	126999	tag-0
2016-12-07T19:15:03.909651512Z	127999	tag-0
2016-12-07T19:15:04.909512303Z	128999	tag-0
2016-12-07T19:15:05.909192904Z	129999	tag-0
2016-12-07T19:15:06.909494353Z	130999	tag-0
2016-12-07T19:15:07.909427926Z	131999	tag-0
2016-12-07T19:15:08.90937058Z	132999	tag-0

...


name: ctr
tags: some=tag-9
time				n	some
----				-	----
2016-12-07T19:14:59.909782426Z	123999	tag-9
2016-12-07T19:15:00.909617816Z	124999	tag-9
2016-12-07T19:15:01.909737258Z	125999	tag-9
2016-12-07T19:15:02.909665162Z	126999	tag-9
2016-12-07T19:15:03.909651512Z	127999	tag-9
2016-12-07T19:15:04.909512303Z	128999	tag-9
2016-12-07T19:15:05.909192904Z	129999	tag-9
2016-12-07T19:15:06.909494353Z	130999	tag-9
2016-12-07T19:15:07.909427926Z	131999	tag-9
2016-12-07T19:15:08.90937058Z	132999	tag-9

@desa
Contributor

desa commented Dec 7, 2016

Another thing I came across is consistent with how CQs work, but is technically kind of wrong. If you explicitly select a tag in the select statement of the subquery, then that tag is transformed into a field in the super query. For example

select n,some from (select n,some from ctr where time > now() - 10s) group by * limit 10 slimit 2

yields

name: ctr
time				n	some
----				-	----
2016-12-07T19:17:15.907797744Z	259999	tag-0
2016-12-07T19:17:15.907797744Z	259999	tag-9
2016-12-07T19:17:15.907797744Z	259999	tag-2
2016-12-07T19:17:15.907797744Z	259999	tag-6
2016-12-07T19:17:15.907797744Z	259999	tag-5
2016-12-07T19:17:15.907797744Z	259999	tag-1
2016-12-07T19:17:15.907797744Z	259999	tag-4
2016-12-07T19:17:15.907797744Z	259999	tag-3
2016-12-07T19:17:15.907797744Z	259999	tag-8
2016-12-07T19:17:15.907797744Z	259999	tag-7

whereas I would have expected

name: ctr
tags: some=tag-0
time				n	some
----				-	----
2016-12-07T19:23:54.908624858Z	658999	tag-0
2016-12-07T19:23:55.9104743Z	659999	tag-0
2016-12-07T19:23:56.912186802Z	660999	tag-0
2016-12-07T19:23:57.910213495Z	661999	tag-0
2016-12-07T19:23:58.912166277Z	662999	tag-0
2016-12-07T19:23:59.90794996Z	663999	tag-0
2016-12-07T19:24:00.907246247Z	664999	tag-0
2016-12-07T19:24:01.910092278Z	665999	tag-0
2016-12-07T19:24:02.912045077Z	666999	tag-0
2016-12-07T19:24:03.908263156Z	667999	tag-0

name: ctr
tags: some=tag-1
time				n	some
----				-	----
2016-12-07T19:23:54.908624858Z	658999	tag-1
2016-12-07T19:23:55.9104743Z	659999	tag-1
2016-12-07T19:23:56.912186802Z	660999	tag-1
2016-12-07T19:23:57.910213495Z	661999	tag-1
2016-12-07T19:23:58.912166277Z	662999	tag-1
2016-12-07T19:23:59.90794996Z	663999	tag-1
2016-12-07T19:24:00.907246247Z	664999	tag-1
2016-12-07T19:24:01.910092278Z	665999	tag-1
2016-12-07T19:24:02.912045077Z	666999	tag-1
2016-12-07T19:24:03.908263156Z	667999	tag-1

@jsternberg jsternberg force-pushed the js-4619-subqueries branch 4 times, most recently from fbd3794 to 4a08229 Compare December 20, 2016 18:39
func IsSelector(expr Expr) bool {
	if call, ok := expr.(*Call); ok {
		switch call.Name {
		case "first", "last", "min", "max", "percentile", "top", "bottom":
Contributor

sample is also a selector.

@jsternberg jsternberg force-pushed the js-4619-subqueries branch 2 times, most recently from 397d0d1 to 503ebd6 Compare December 20, 2016 19:33
@@ -2057,7 +2058,7 @@ func (p *Parser) peekRune() rune {
 	return r
 }

-func (p *Parser) parseSource() (Source, error) {
+func (p *Parser) parseSource(subqueries bool) (Source, error) {
Contributor

Feels like there should be two methods here parseSource and parseSourceWithSubqueries.

Contributor Author

The reason I did this is because otherwise it would have required copying code. Splitting them into two different functions was impossible to compose because the subqueries logic had to go in the middle of the function. It wasn't possible to have it at the beginning or end.

Contributor

Ah I see. Makes sense.

@@ -2028,11 +2029,11 @@ func (p *Parser) parseAlias() (string, error) {
 }

 // parseSources parses a comma delimited list of sources.
-func (p *Parser) parseSources() (Sources, error) {
+func (p *Parser) parseSources(subqueries bool) (Sources, error) {
Contributor

Same with these parseSources and parseSourcesWithSubqueries.

@desa
Contributor

desa commented Dec 20, 2016

Just finished a first pass read through. From what I can tell LGTM.

@sebito91
Contributor

sebito91 commented Dec 20, 2016

We've confirmed this branch is working nicely, renders ~24h of data in under 30s. Caveat was that we needed to add fill(none) to the inner query.

With commits before 4a08229 we saw:

[root@carf-metrics-influx03 ~]# time influx -database 'tg_udp' -host 'localhost' -port '8086' -execute 'select sum(non_negative_derivative) from (select non_negative_derivative(last(PortXmitData_bits), 1s) / 8 from ibstats where host =~ /^fpia-gpfs-jet.*/ and interface =~ /^p3p.*/ group by *,time(1s))  where time >= now() - 1h group by time(10s)' &>/dev/null

real    2m0.097s
user    0m0.006s
sys     0m0.004s

With commit 4a08229 we saw:

[root@carf-metrics-influx03 influxdb]# time influx -database 'tg_udp' -host 'localhost' -port '8086' -execute 'select sum(non_negative_derivative) from (select non_negative_derivative(last(PortXmitData_bits), 1s) / 8 from ibstats where host =~ /^fpia-gpfs-jet.*/ and interface =~ /^p3p.*/ group by *,time(1s) fill(none))  where time >= now() - 3h group by time(10s)' &>/dev/null

real    0m31.402s
user    0m0.008s
sys     0m0.006s

@jsternberg
Contributor Author

I've pushed an amended commit that will automatically switch the inner query to turn fill(null) into fill(none) since the null values produced by an inner query don't do anything to the outer query. So those speeds should now happen without adding fill(none).
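A minimal Go sketch of that kind of rewrite, under the assumption that a statement can be modeled as a fill option plus a list of subqueries. `FillOption`, `Statement`, and `rewriteSubqueryFill` are hypothetical and only loosely modeled on the influxql package; this is not the code in the amended commit.

```go
package main

import "fmt"

// FillOption is a hypothetical, reduced set of fill modes.
type FillOption int

const (
	NullFill     FillOption = iota // fill(null), the default
	NoFill                         // fill(none)
	PreviousFill                   // fill(previous)
)

// Statement is a hypothetical, stripped-down select statement.
type Statement struct {
	Fill       FillOption
	Subqueries []*Statement
}

// rewriteSubqueryFill switches fill(null) to fill(none) on every nested
// subquery, since nulls produced by an inner query do not affect the outer
// query. The top-level statement and explicit non-null fills are left alone.
func rewriteSubqueryFill(stmt *Statement) {
	for _, sub := range stmt.Subqueries {
		if sub.Fill == NullFill {
			sub.Fill = NoFill
		}
		rewriteSubqueryFill(sub)
	}
}

func main() {
	inner := &Statement{Fill: NullFill}
	outer := &Statement{Fill: NullFill, Subqueries: []*Statement{inner}}
	rewriteSubqueryFill(outer)
	fmt.Println(outer.Fill == NullFill, inner.Fill == NoFill)
}
```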

Contributor

@benbjohnson benbjohnson left a comment

Overall this is looking good. Just a couple questions and a nit.

}
evalTime(stmt)
Contributor

Can you extract this inline recursive function assignment out to a standalone function? It's confusing to follow within the function.

Contributor Author

Done. I made a Reduce function on SelectStatement that would call Reduce on the appropriate fields. I couldn't add it as part of the case statement for Reduce itself though because that only accepts an Expr and I didn't want to change the function signature.

}
}
}
}
Contributor

It might be cleaner to break this block out to an evalSubqueryType() function. The indentation is pretty far over and it would allow you to remove the label on the for.

Contributor Author

I updated this to use the FieldExprByName utility function. I think another wrapper function for this might be good, but tell me what you think about the new code first.

Contributor

👍

// used for executing queries.
type ShardMapper interface {
MapShards(sources influxql.Sources, opt *influxql.SelectOptions) (IteratorCreator, error)
}
Contributor

Why is ShardMapper necessary? Does the query engine's interface need to know about shards?

Contributor Author

Yes, I think. I originally took this from the lazy iterator evaluation work, where its purpose was to allow ordering of the shards. Here, it's used to organize the shards based on their sources and create a special IteratorCreator that can use FieldDimensions, rather than requiring FieldDimensions and the sources to be propagated everywhere.

So it maps the shards needed to the sources requested. Eventually, it will probably also implement lazy iterators if we can ever make that performant.

@benbjohnson
Contributor

@jsternberg Did you push your latest changes up? I tried searching the d7c8c7c diff and couldn't find Reduce on SelectStatement or FieldExprByName.

@jsternberg jsternberg merged commit 4a559c4 into master Jan 9, 2017
@jsternberg jsternberg deleted the js-4619-subqueries branch January 9, 2017 20:14
@ghost

ghost commented Jan 14, 2017

Hi

I'm trying to filter the result of a subquery, but the where condition doesn't work on the result of the subquery. The last result should list only sub_group=1 and sub_group=2, but it also includes sub_group=3.

create database temp

use temp

insert test,my_group=1,sub_group=1 val=1
insert test,my_group=1,sub_group=2 val=1
insert test,my_group=1,sub_group=3 val=0

select * from test
	name: test
	time                my_group sub_group val
	----                -------- --------- ---
	1484407176627242006 1        1         1
	1484407179866974593 1        2         1
	1484407185003410148 1        3         0

select * from (select * from test where my_group='1' group by sub_group order by desc limit 1) group by sub_group
	name: test
	tags: sub_group=3
	time                my_group val
	----                -------- ---
	1484407185003410148 1        0

	name: test
	tags: sub_group=2
	time                my_group val
	----                -------- ---
	1484407179866974593 1        1

	name: test
	tags: sub_group=1
	time                my_group val
	----                -------- ---
	1484407176627242006 1        1

select * from (select * from test where my_group='1' group by sub_group order by desc limit 1) where val > '0' group by sub_group
	name: test
	tags: sub_group=3
	time                my_group val
	----                -------- ---
	1484407185003410148 1        0

	name: test
	tags: sub_group=2
	time                my_group val
	----                -------- ---
	1484407179866974593 1        1

	name: test
	tags: sub_group=1
	time                my_group val
	----                -------- ---
	1484407176627242006 1        1

Am I missing something?

@jsternberg
Contributor Author

If val is an integer or float, you need to compare it with an integer or float. I don't know why that's not returning false for all of them, but that's most likely it. val > 0 rather than val > '0'.

For issues like this in the future, please open a new ticket or send a message to the mailing list.


5 participants