Support disparate time intervals and more advanced time in WHERE clauses #7530

Open
jsternberg opened this Issue Oct 27, 2016 · 9 comments

Comments

Projects
None yet
7 participants
@jsternberg
Contributor

jsternberg commented Oct 27, 2016

Feature Request

I'm calling this a feature request, but it's both a feature request and a bug in the existing query engine. Explained more below.

Proposal: Support disparate time intervals and more advanced time in WHERE clauses.

Current behavior: You can only select one continuous block of time. There is no way to select multiple blocks of time even though the query language implies it should be possible. In fact, we completely ignore any of the query language that would suggest that it's possible in favor of doing something unexpected. Given the following examples:

SELECT * FROM cpu WHERE time >= now() - 10m OR (time >= now() - 20m AND region = 'uswest')

The above query says you want any points within the last 10 minutes or within the last 20 minutes if the region is uswest. What actually happens is we take all points from the last 10 minutes because we naively look through every binary expression that has time as one of the arguments and assume that they were put together with AND.

Desired behavior: Parse the WHERE clause to retrieve the disparate time periods that it specifies correctly. The above example would create two different time intervals. One where the time was anything within the last 10 minutes and one that was within the last 20 minutes if the region was uswest.

Use case: It prevents confusion from people who are attempting to use the WHERE clause in this way and then they find out we do not parse the time clauses in the way that we imply we do. It also opens possibilities in the future of implementing new syntax for selecting only time points in the morning or the afternoon.

I'm not sure how useful this is so it will likely remain low priority. If this is important for your use case, please add a 👍 reaction to this issue and explain your use case so we can learn more about this issue and those it affects.

Thanks!

@lblt

This comment has been minimized.

Show comment
Hide comment
@lblt

lblt Aug 1, 2017

Hello!

I have a simple use case on that issue, where the implementation of OR would be very useful.

We are studying availability rates of devices, and in the calculation we need frequently to remove uninteresting periods of time. We use influxDB as a database, and queries are made from our python script.
Example: calculation of availability between 2017-05-20 and 2017-06-20, but the devices was in maintenance between 2017-05-24 and 2017-05-26, so its irrelevant to consider that period.

Without the OR, we need to make two queries.

Thus the implementation of OR would be awesome!! Is there any plan on that?

lblt commented Aug 1, 2017

Hello!

I have a simple use case on that issue, where the implementation of OR would be very useful.

We are studying availability rates of devices, and in the calculation we need frequently to remove uninteresting periods of time. We use influxDB as a database, and queries are made from our python script.
Example: calculation of availability between 2017-05-20 and 2017-06-20, but the devices was in maintenance between 2017-05-24 and 2017-05-26, so its irrelevant to consider that period.

Without the OR, we need to make two queries.

Thus the implementation of OR would be awesome!! Is there any plan on that?

@ahsanali

This comment has been minimized.

Show comment
Hide comment
@ahsanali

ahsanali Aug 31, 2017

We are trying detecting anomalies by comparing current hour with the same hour of the previous three four weeks. For that we need to specify different time ranges

ahsanali commented Aug 31, 2017

We are trying detecting anomalies by comparing current hour with the same hour of the previous three four weeks. For that we need to specify different time ranges

@jsternberg

This comment has been minimized.

Show comment
Hide comment
@jsternberg

jsternberg Aug 31, 2017

Contributor

I'm not sure how disparate intervals will help with that unless we also support time shift, which the current query engine does not. We are aware of time shift and want to support it in the future, but we are running into the limits of using an SQL-like language for things like that.

At the same time, you can just use two separate queries and you will get the data you want.

Contributor

jsternberg commented Aug 31, 2017

I'm not sure how disparate intervals will help with that unless we also support time shift, which the current query engine does not. We are aware of time shift and want to support it in the future, but we are running into the limits of using an SQL-like language for things like that.

At the same time, you can just use two separate queries and you will get the data you want.

@jsternberg

This comment has been minimized.

Show comment
Hide comment
@jsternberg

jsternberg Aug 31, 2017

Contributor

To make a note on this issue, we may not be able to do this with the current query engine because of a little weird aspect of the query parsing that too many people likely rely on.

The following condition is parsed in a way that many likely won't expect.

WHERE host = 'server01' OR host = 'server02' AND time > now() - 1d

This actually gets parsed as something equivalent to this:

WHERE host = 'server01' OR (host = 'server02' AND time > now() - 1d)

That would mean the time restriction should only be applied to host = 'server02', but the current query engine applies that time range to both conditions at the moment. If we fixed this bug/implemented this feature, we would have to correct this behavior and old queries would break. There is at least one high profile tool that wrote queries with this kind of error since I just fixed the query in Grafana with grafana/grafana#9131, so I'm not sure we can ever move from this for SQL.

Stay tuned for ifql 2.0 though where we will ensure we do not have this kind of oversight again.

Contributor

jsternberg commented Aug 31, 2017

To make a note on this issue, we may not be able to do this with the current query engine because of a little weird aspect of the query parsing that too many people likely rely on.

The following condition is parsed in a way that many likely won't expect.

WHERE host = 'server01' OR host = 'server02' AND time > now() - 1d

This actually gets parsed as something equivalent to this:

WHERE host = 'server01' OR (host = 'server02' AND time > now() - 1d)

That would mean the time restriction should only be applied to host = 'server02', but the current query engine applies that time range to both conditions at the moment. If we fixed this bug/implemented this feature, we would have to correct this behavior and old queries would break. There is at least one high profile tool that wrote queries with this kind of error since I just fixed the query in Grafana with grafana/grafana#9131, so I'm not sure we can ever move from this for SQL.

Stay tuned for ifql 2.0 though where we will ensure we do not have this kind of oversight again.

@abrandl70

This comment has been minimized.

Show comment
Hide comment
@abrandl70

abrandl70 Oct 19, 2017

I'm trying to filter the series with a regex (i.e. WHERE time =~ /regex/) in order to get only data during business hours, i.e. timestamps between 07:00 and 19:00 - regardless of the date. I'm getting no data points back because - I assume - that the query would create disparate time intervals with one data point each - just like with the unsupported "OR". I do not seem to find a way to filter data based on timestamp. The suggested workaround with separate queries does not work...

SELECT mean("perc") FROM "occupancy_avg" WHERE time =~ /'....-..-..T(0[7-9]|1[0-9]):..:..Z'/ GROUP BY time(1h) fill(null)

PS: I think that features like filtering data based on time are very important (business hours, maintenance windows, etc...)

PPS: any hint is appreciated. I'm using influx with Grafana

I'm trying to filter the series with a regex (i.e. WHERE time =~ /regex/) in order to get only data during business hours, i.e. timestamps between 07:00 and 19:00 - regardless of the date. I'm getting no data points back because - I assume - that the query would create disparate time intervals with one data point each - just like with the unsupported "OR". I do not seem to find a way to filter data based on timestamp. The suggested workaround with separate queries does not work...

SELECT mean("perc") FROM "occupancy_avg" WHERE time =~ /'....-..-..T(0[7-9]|1[0-9]):..:..Z'/ GROUP BY time(1h) fill(null)

PS: I think that features like filtering data based on time are very important (business hours, maintenance windows, etc...)

PPS: any hint is appreciated. I'm using influx with Grafana

@nathanielc nathanielc referenced this issue in influxdata/ifql Feb 1, 2018

Closed

Support Calendar Time Operations #201

@andyflatt

This comment has been minimized.

Show comment
Hide comment
@andyflatt

andyflatt Apr 24, 2018

This would be useful. I thought I would share the MUCH slower workaround. Since 1.2 you can use sub queries. This means you can use syntax like:

Select mean(*) from 
(select * from channel where (time > '2017-05-01 13:00:00' and time < '2017-05-01 15:00:00') ), 
(select * from channel where (time > '2017-05-01 18:00:00' and time < '2017-05-01 19:00:00') )
GROUP BY time(20m)fill(none);

andyflatt commented Apr 24, 2018

This would be useful. I thought I would share the MUCH slower workaround. Since 1.2 you can use sub queries. This means you can use syntax like:

Select mean(*) from 
(select * from channel where (time > '2017-05-01 13:00:00' and time < '2017-05-01 15:00:00') ), 
(select * from channel where (time > '2017-05-01 18:00:00' and time < '2017-05-01 19:00:00') )
GROUP BY time(20m)fill(none);
@jsternberg

This comment has been minimized.

Show comment
Hide comment
@jsternberg

jsternberg Apr 24, 2018

Contributor

That workaround is how we would likely implement this internally. The big problem right now is that we can't really change the behavior of how the time selectors work because they don't follow the normal rules you would expect when selecting disparate times. So to make this work as you would logically expect it to given the conditions, we would have to make existing queries break and at least some versions of Grafana use a method that would break. I think I submitted a fix some time ago to get Grafana to stop constructing the queries incorrectly, but it doesn't remove the fact that a lot of people likely rely on the incorrect behavior.

I'm sorry that's the case.

Contributor

jsternberg commented Apr 24, 2018

That workaround is how we would likely implement this internally. The big problem right now is that we can't really change the behavior of how the time selectors work because they don't follow the normal rules you would expect when selecting disparate times. So to make this work as you would logically expect it to given the conditions, we would have to make existing queries break and at least some versions of Grafana use a method that would break. I think I submitted a fix some time ago to get Grafana to stop constructing the queries incorrectly, but it doesn't remove the fact that a lot of people likely rely on the incorrect behavior.

I'm sorry that's the case.

@AndrewSav

This comment has been minimized.

Show comment
Hide comment
@AndrewSav

AndrewSav Apr 24, 2018

@jsternberg so you'll leave incorrect implementation in forever because of backward compatibility, instead of doing the right thing of fixing it?

Do not the project have guidelines how to deal with this type of situation? For example have some flags that would by default support old behaviour for a few releases and then would by default support the new behaviour for a few releases and then remove the incorrect behaviour altogether?

The timelime for phasing out the incorrect one can be years, so that everyone who depend on it has time to update.

To me that would seem logical, saying this is wrong but we are not fixing it sounds strange.

@jsternberg so you'll leave incorrect implementation in forever because of backward compatibility, instead of doing the right thing of fixing it?

Do not the project have guidelines how to deal with this type of situation? For example have some flags that would by default support old behaviour for a few releases and then would by default support the new behaviour for a few releases and then remove the incorrect behaviour altogether?

The timelime for phasing out the incorrect one can be years, so that everyone who depend on it has time to update.

To me that would seem logical, saying this is wrong but we are not fixing it sounds strange.

@jsternberg

This comment has been minimized.

Show comment
Hide comment
@jsternberg

jsternberg Apr 24, 2018

Contributor

The current plan is to fix this as part of ifql since ifql, by language design, won't have this issue. It's being worked on right now. By the time we would be able to phase out the incorrect behavior and find all of the places for it to be fixed in the wild, ifql will likely be mature and have more features than the current influxql.

I apologize that we are not able to fix this in a better way, but mistakes happen and that's the most I can say given the circumstances. I really wish I could have fixed this issue and I did try once in the past. Here's the commit where I fixed the behavior and here's the one where I had to revert the fix.

Contributor

jsternberg commented Apr 24, 2018

The current plan is to fix this as part of ifql since ifql, by language design, won't have this issue. It's being worked on right now. By the time we would be able to phase out the incorrect behavior and find all of the places for it to be fixed in the wild, ifql will likely be mature and have more features than the current influxql.

I apologize that we are not able to fix this in a better way, but mistakes happen and that's the most I can say given the circumstances. I really wish I could have fixed this issue and I did try once in the past. Here's the commit where I fixed the behavior and here's the one where I had to revert the fix.

@adamperlin adamperlin referenced this issue in influxdata/platform Jul 3, 2018

Open

Support Calendar Time Operations #317

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment