Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options to group by time to set where the time windows occur #1837

Closed
pauldix opened this issue Mar 3, 2015 · 16 comments
Labels
Milestone

Comments

@pauldix
Copy link
Member

@pauldix pauldix commented Mar 3, 2015

Currently when you do a query and GROUP BY time(1h) it will bucket everything modulo the interval, directly on the hour markers. Users should be able to specify that the buckets occur either at the current time, or some offset, or the time of the first or last data point in a query, like so:

SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, now())

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, last())

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h and time < now() - 1h
GROUP BY time(10m, now() - 1h)

-- or this
SELECT mean(value)
FROM cpu
WHERE time > now() - 10h
GROUP BY time(10m, first())

These options to the group by time will only be valid for live queries. If someone is creating a continuous query we'll have to validate that they haven't put in these extra options.

@gergia

This comment has been minimized.

Copy link

@gergia gergia commented Mar 4, 2015

Wouldn't something similar also be a good feature for continuous queries? The use case I have in mind is aggregating data from different users and doing this starting from 7am (but their timezone!) every 24 hours.

@pauldix

This comment has been minimized.

Copy link
Member Author

@pauldix pauldix commented Mar 5, 2015

@gergia that would be much trickier to implement. The design of continuous queries won't support more than maybe a few hundred. Ideally they are able to downsample many thousands of series. Not sure how that would be done if each user has a timezone. Maybe one for each timezone you have?

@jnutzmann

This comment has been minimized.

Copy link

@jnutzmann jnutzmann commented Mar 10, 2015

This could be a decent intermediate solution for #388 as well, without having to deal with the mess of timezones (let the higher level application manage that). I am in the same boat as a number of the other users that I need to run hourly, daily, weekly, and monthly aggregates (differences and means) in the timezone that my server is located in. I think this would be a hugely beneficial addition.

@pablo-tech

This comment has been minimized.

Copy link

@pablo-tech pablo-tech commented Nov 6, 2015

this feature is essential for streaming machine learning. Systems like in finance, i.e. hedge funds, need to be able to train their AI with every possible window of time in data history, and with any group size, not just at preconceived markers.

@heythisisnate

This comment has been minimized.

Copy link

@heythisisnate heythisisnate commented Nov 18, 2015

I have the same request. In my use case, I'm trying to aggregate by week using group by time(1w) but the intervals always start on Thursdays. Why thursdays? Ideally I want to measure weeks naturally from Sunday to Saturday (or in some cases I might want Monday to Sunday).

@royalaid

This comment has been minimized.

Copy link

@royalaid royalaid commented Feb 1, 2016

+1

@jsternberg

This comment has been minimized.

Copy link
Contributor

@jsternberg jsternberg commented Apr 6, 2016

@pauldix would it be good to just have the buckets start at the start time? It will end up making the output of queries using now() act more sane. Currently, I can't know how many results this query will return because of now().

SELECT mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)

This should always return 10 rows, but it will often return 11 because now() isn't exactly on a time interval. We would then also need an alternate syntax that allows truncating now() to the correct precision though in order to allow for the old behavior.

@beckettsean

This comment has been minimized.

Copy link
Contributor

@beckettsean beckettsean commented May 27, 2016

@jsternberg is this not already implemented? I like your idea in #1837 (comment) but I think we should close this issue and open a new one for controlling bucket numbers explicitly.

@jsternberg

This comment has been minimized.

Copy link
Contributor

@jsternberg jsternberg commented Jul 14, 2016

This is already implemented. It got lost in the shuffle of things.

@jsternberg jsternberg closed this Jul 14, 2016
@Reventon1993

This comment has been minimized.

Copy link

@Reventon1993 Reventon1993 commented Aug 26, 2016

I use the query "select ... from ... where ... group by time(30m) fill(0)" and the db returns nothing because there is no data . Can I get a series of '0' data when there is no data ?

@beckettsean

This comment has been minimized.

Copy link
Contributor

@beckettsean beckettsean commented Aug 26, 2016

@Reventon1993 currently that is not possible. Follow #6967 for progress on returning fill() even if there are no results.

@hariteritreee

This comment has been minimized.

Copy link

@hariteritreee hariteritreee commented Nov 7, 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here. I am using 1.0.2 version

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)
when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

@hariteritreee

This comment has been minimized.

Copy link

@hariteritreee hariteritreee commented Nov 7, 2016

when I group by time(1w). I still see week starting from Thursday . I want week to start from sunday or monday. Please let me know how to get it done. I spent 4,5 hours trying to figure out. mostly all issues retargeting here.

SELECT SUM(value) FROM measurement_name WHERE time >= '2016-10-09T18:30:00Z' AND time < '2016-11-07T18:29:59Z' GROUP BY time(1w)

@beckettsean

This comment has been minimized.

Copy link
Contributor

@beckettsean beckettsean commented Nov 7, 2016

@hariteritreee please use influxdb@googlegroups.com for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax

@mfernand0

This comment has been minimized.

Copy link

@mfernand0 mfernand0 commented Apr 30, 2019

@hariteritreee please use influxdb@googlegroups.com for questions. GitHub issues are for bugs or feature requests. The answer to your question is to use the GROUP BY time offset: http://docs.influxdata.com/influxdb/v1.0/query_language/data_exploration/#advanced-group-by-time-syntax

this googlegroups.com account is not working beck. do you have any other email ?

@ArtruneTranspara

This comment has been minimized.

Copy link

@ArtruneTranspara ArtruneTranspara commented Nov 8, 2019

is this no longer supported?

GROUP BY time(10m, last())

I get

time dimension offset must be duration or now()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.