Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] subqueries / nested queries #4619

Closed
beckettsean opened this issue Oct 29, 2015 · 19 comments
Closed

[feature request] subqueries / nested queries #4619

beckettsean opened this issue Oct 29, 2015 · 19 comments

Comments

@beckettsean
Copy link
Contributor

I strongly suspect this is a duplicate but I cannot find another issue making the request.

@ivanscattergood
Copy link

+1

@shanielh
Copy link

shanielh commented Nov 1, 2015

I have some use cases for this issue:

  • Dividing Two Series (Number of errors / Number of requests) to achieve error%
  • Dividing Two Series with Time Shifting ("Number of Requests / Number of Requests 7 days ago") to achieve %traffic against last week

I think that Graphite-API addresses those issues, It just "forces" me to use lots of series instead of tags.

@aadrian
Copy link

aadrian commented Nov 1, 2015

+1

@adrianlzt
Copy link

+1

@jstoner
Copy link

jstoner commented Jun 14, 2016

+1

I would like to be able to bound queries based on queries. So

select measure from blah where time > (select time from blah2 limit 1)

@jackfischer
Copy link

+1 and second @jstoner

@kheraud
Copy link

kheraud commented Aug 2, 2016

+1. In my case summing the results of a mean(col) group by

@bedrin
Copy link

bedrin commented Aug 3, 2016

+1

1 similar comment
@wmarbut
Copy link

wmarbut commented Aug 8, 2016

+1

@har-ki
Copy link

har-ki commented Aug 9, 2016

+1
I would love a feature that lets me do whats mentioned here : #6218

@nktl
Copy link

nktl commented Aug 11, 2016

+1

2 similar comments
@nahumliran
Copy link

+1

@cloudnull
Copy link

+1

@lexmag
Copy link

lexmag commented Sep 6, 2016

Rather than +1 in comments it's way better to react 👍 under initial comment.
There are many people who have subscribed to this issue to get a notification when it will be closed (hopefully implemented), and those +1 only create unnecessary noise for everyone.

@jsternberg jsternberg self-assigned this Sep 30, 2016
@jsternberg
Copy link
Contributor

jsternberg commented Sep 30, 2016

Short write-up of this issue. I'm going to start mocking up an example starting with the syntax. If you are following this and have any specific examples that you want to advocate for, please comment so I can be sure we're solving the correct problem. Thanks.

Subqueries

Subqueries will act as an ad-hoc measurement and will be treated that
way. It will have to be the equivalent of doing the following:

SELECT derivative(value) INTO tmp_XXXXXX WHERE time >= now() - 1d FROM cpu;
SELECT sum(derivative) FROM tmp_XXXXXX WHERE time >= now() - 1d GROUP BY time(1h);
DROP MEASUREMENTS tmp_XXXXXX

This will be the equivalent of the following query:

SELECT sum(derivative) FROM (SELECT derivative(value) FROM cpu GROUP BY host) WHERE time >= now() - 1d GROUP BY time(1h)

The WHERE on the inner query will be considered irrelevant because of
the time on the outer query. I do not think we should allow time in the
inner-subqueries because I think it will end up being too complicated.

The source that gets created here in the FROM acts as a separate
measurement. The fields it creates and the tags that are kept are the
ones that are selected and the tags that are in the GROUP BY.

Any query can be used as a selector, but a target cannot be used and a
WHERE clause with a time cannot be used.

@Tomcat-Engineering
Copy link
Contributor

Looks good, but it would seem more intuitive to put the "WHERE time" clause on the inner query rather than the outer one? Presumably the database engine isn't going to run the inner query on "all time" and leave it to the outer query to limit the time span?

Also, will we be able to use the results of the inner query in a WHERE clause like this:

SELECT sum(derivative) FROM (SELECT derivative(value) FROM cpu) WHERE derivative > 5

?

@jkielbaey
Copy link

I'm also wondering how the inner query will be limited to fetch data for only a limited period of time.
Will the temporary measurement be stored only in memory? IMHO this would make sense.

My use case would be to calculate averages per group-by-time interval and then calculate the sum() over the averaged data. CQs can be a workaround, but I find they add a lot of processing.
Currently we have Telegraf configured to collect metrics every 30 sec. When displaying small periods of time (few hours) this works fine, but for longer periods of time (days/weeks) it's a lot of data to display and graphs become unreadable.
eg. to get the total of disk read operations per server, following query works fine as long as the group-by-interval is the same as the collection interval:

SELECT sum(reads) FROM diskio WHERE time > now() - 1h GROUP BY time(30s), "host"

When increasing the group-by-time (eg. to 15m), this breaks. We want to be able to calculate the average over the 15min intervals. This would basically come down to following 2 queries:

SELECT sum(reads) INTO tmp_diskio_reads FROM diskio WHERE AND time > now() - 24h GROUP BY time(30s), "host"
SELECT mean(sum) FROM tmp_diskio_reads WHERE time > now() - 24h GROUP BY time(15m), "host"

or in 1 query:

SELECT mean(sum) FROM (SELECT sum(reads) INTO tmp_diskio_reads FROM diskio WHERE AND time > now() - 24h GROUP BY time(30s), "host") GROUP BY time(15m), "host"

(The WHERE in the outer query should be no longer necessary as the inner query already limits the queried data.)

@toni-moreno
Copy link

+1

@jsternberg
Copy link
Contributor

For those following this issue and want to try out the feature, help with testing before we merge it, or ensure that the PR does what you think it should do, the current work for this is in #7646. Feedback is welcome and appreciated on that PR.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests