New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options to group by time to set where the time windows occur #1837

Closed
pauldix opened this Issue Mar 3, 2015 · 14 comments

Comments

Projects
None yet
10 participants
@pauldix
Member

pauldix commented Mar 3, 2015

Currently when you do a query and GROUP BY time(1h) it will bucket everything modulo the interval, directly on the hour markers. Users should be able to specify that the buckets occur either at the current time, or some offset, or the time of the first or last data point in a query, like so:

SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, now())

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, last())

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h and time < now() - 1h
GROUP BY time(10m, now() - 1h)

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, first())

These options to the group by time will only be valid for live queries. If someone is creating a continuous query we'll have to validate that they haven't put in these extra options.

@gergia

This comment has been minimized.

Show comment
Hide comment
@gergia

gergia Mar 4, 2015

Wouldn't something similar also be a good feature for continuous queries? The use case I have in mind is aggregating data from different users and doing this starting from 7am (but their timezone!) every 24 hours.

gergia commented Mar 4, 2015

Wouldn't something similar also be a good feature for continuous queries? The use case I have in mind is aggregating data from different users and doing this starting from 7am (but their timezone!) every 24 hours.

@pauldix

This comment has been minimized.

Show comment
Hide comment
@pauldix

pauldix Mar 5, 2015

Member

@gergia that would be much trickier to implement. The design of continuous queries won't support more than maybe a few hundred. Ideally they are able to downsample many thousands of series. Not sure how that would be done if each user has a timezone. Maybe one for each timezone you have?

Member

pauldix commented Mar 5, 2015

@gergia that would be much trickier to implement. The design of continuous queries won't support more than maybe a few hundred. Ideally they are able to downsample many thousands of series. Not sure how that would be done if each user has a timezone. Maybe one for each timezone you have?

@jnutzmann

This comment has been minimized.

Show comment
Hide comment
@jnutzmann

jnutzmann Mar 10, 2015

This could be a decent intermediate solution for #388 as well, without having to deal with the mess of timezones (let the higher level application manage that). I am in the same boat as a number of the other users that I need to run hourly, daily, weekly, and monthly aggregates (differences and means) in the timezone that my server is located in. I think this would be a hugely beneficial addition.

jnutzmann commented Mar 10, 2015

This could be a decent intermediate solution for #388 as well, without having to deal with the mess of timezones (let the higher level application manage that). I am in the same boat as a number of the other users that I need to run hourly, daily, weekly, and monthly aggregates (differences and means) in the timezone that my server is located in. I think this would be a hugely beneficial addition.

@pablo-tech

This comment has been minimized.

Show comment
Hide comment
@pablo-tech

pablo-tech Nov 6, 2015

this feature is essential for streaming machine learning. Systems like in finance, i.e. hedge funds, need to be able to train their AI with every possible window of time in data history, and with any group size, not just at preconceived markers.

pablo-tech commented Nov 6, 2015

this feature is essential for streaming machine learning. Systems like in finance, i.e. hedge funds, need to be able to train their AI with every possible window of time in data history, and with any group size, not just at preconceived markers.

@heythisisnate

This comment has been minimized.

Show comment
Hide comment
@heythisisnate

heythisisnate Nov 18, 2015

I have the same request. In my use case, I'm trying to aggregate by week using group by time(1w) but the intervals always start on Thursdays. Why thursdays? Ideally I want to measure weeks naturally from Sunday to Saturday (or in some cases I might want Monday to Sunday).

heythisisnate commented Nov 18, 2015

I have the same request. In my use case, I'm trying to aggregate by week using group by time(1w) but the intervals always start on Thursdays. Why thursdays? Ideally I want to measure weeks naturally from Sunday to Saturday (or in some cases I might want Monday to Sunday).

@royalaid

This comment has been minimized.

Show comment
Hide comment
@royalaid

royalaid commented Feb 1, 2016

+1

@jsternberg

This comment has been minimized.

Show comment
Hide comment
@jsternberg

jsternberg Apr 6, 2016

Contributor

@pauldix would it be good to just have the buckets start at the start time? It will end up making the output of queries using now() act more sane. Currently, I can't know how many results this query will return because of now().

SELECT mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)

This should always return 10 rows, but it will often return 11 because now() isn't exactly on a time interval. We would then also need an alternate syntax that allows truncating now() to the correct precision though in order to allow for the old behavior.

Contributor

jsternberg commented Apr 6, 2016

@pauldix would it be good to just have the buckets start at the start time? It will end up making the output of queries using now() act more sane. Currently, I can't know how many results this query will return because of now().

SELECT mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)

This should always return 10 rows, but it will often return 11 because now() isn't exactly on a time interval. We would then also need an alternate syntax that allows truncating now() to the correct precision though in order to allow for the old behavior.

@beckettsean

This comment has been minimized.

Show comment
Hide comment
@beckettsean

beckettsean May 27, 2016

Contributor

@jsternberg is this not already implemented? I like your idea in #1837 (comment) but I think we should close this issue and open a new one for controlling bucket numbers explicitly.

Contributor

beckettsean commented May 27, 2016

@jsternberg is this not already implemented? I like your idea in #1837 (comment) but I think we should close this issue and open a new one for controlling bucket numbers explicitly.

@jsternberg

This comment has been minimized.

Show comment
Hide comment
@jsternberg

jsternberg Jul 14, 2016

Contributor

This is already implemented. It got lost in the shuffle of things.

Contributor

jsternberg commented Jul 14, 2016

This is already implemented. It got lost in the shuffle of things.

@jsternberg jsternberg closed this Jul 14, 2016

@Reventon1993

This comment has been minimized.

Show comment
Hide comment
@Reventon1993

Reventon1993 Aug 26, 2016

I use the query "select ... from ... where ... group by time(30m) fill(0)" and the db returns nothing because there is no data . Can I get a series of '0' data when there is no data ?

Reventon1993 commented Aug 26, 2016

I use the query "select ... from ... where ... group by time(30m) fill(0)" and the db returns nothing because there is no data . Can I get a series of '0' data when there is no data ?

@beckettsean

This comment has been minimized.

Show comment
Hide comment
@beckettsean

beckettsean Aug 26, 2016

Contributor

@Reventon1993 currently that is not possible. Follow #6967 for progress on returning fill() even if there are no results.

Contributor

beckettsean commented Aug 26, 2016

@Reventon1993 currently that is not possible. Follow #6967 for progress on returning fill() even if there are no results.

@hariteritreee

This comment has been minimized.

Show comment
Hide comment
@hariteritreee

hariteritreee Nov 7, 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here. I am using 1.0.2 version

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

hariteritreee commented Nov 7, 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here. I am using 1.0.2 version

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

@hariteritreee

This comment has been minimized.

Show comment
Hide comment
@hariteritreee

hariteritreee Nov 7, 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

hariteritreee commented Nov 7, 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

@beckettsean

This comment has been minimized.

Show comment
Hide comment
@beckettsean

beckettsean Nov 7, 2016

Contributor

@hariteritreee please use influxdb@googlegroups.com for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax

Contributor

beckettsean commented Nov 7, 2016

@hariteritreee please use influxdb@googlegroups.com for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment