Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation by time interval should support timezones #388

Closed
jvshahid opened this issue Apr 1, 2014 · 27 comments
Closed

Aggregation by time interval should support timezones #388

jvshahid opened this issue Apr 1, 2014 · 27 comments

Comments

@jvshahid
Copy link
Contributor

jvshahid commented Apr 1, 2014

This is related to #387, since we have to respect the natural boundaries of a day, week, etc. InfluxDB has to accept a timezone argument for each query which will be used to convert the stored timestamps, instead of assuming UTC all the time.

@consense
Copy link

consense commented Apr 5, 2014

+1

@tisba
Copy link

tisba commented Apr 7, 2014

Does it really make sense to provide a timezone on query time? That would change the interpretation of previously stored data. That is a very difficult topic and I'd like to hear, why you proposed to do that on query time.

@consense
Copy link

consense commented Apr 7, 2014

The point would not be to retrospectively change the interpretation of time stored inside influx (thats always UTC) but to adjust the meaning of keywords such as GROUP BY time(1d).
The (1d) grouping in this case is something not inherent to the data but specific to the viewpoint of the issuer of the query.

@timuckun
Copy link

Postgres has a weird syntax but deals with times zones using the "time zone at" syntax. PG has a "timestamp without time zone" and "timestamp with time zone" types but does not store the time zone which is a bit of an WTF.

If you say field time zone at 'utc' the result will depend on the field type. If the field type is timestamp without time zone then it will be presumed to be UTC if its of type timestamp with time zone then it gets converted to UTC.

Sometimes I have to do something like this time_stamp_filed at time zone "blah" at time zone "UTC" meaning convert the field which is at time zone 'blah' to UTC.

In my opinion it would be nice to include the time zone in the field so it's obvious when you are looking at it.

@jufemaiz
Copy link

Would like to +1 the support for timezone when querying using aggregates.

The storage of data with a timezone inherint is less of a concern for me and I'm yet to see suitable use cases for this - with the exception of local installation where they wish to force the use of a local timezone, which then has issues with regards to Daylight Savings Time (DST – ><) and then allow easy access directly to influxdb rather than through an API layer that deals with this for them (I'm thinking a replacement for OSIsoft PI and its interfaces).

The impact of DST, managing the dates of operation for example, needs to be well understood to be sure.

@elrosa
Copy link

elrosa commented May 26, 2014

A huge +1 for timezones in group by clauses.

@jufemaiz
Copy link

Another or edge case, where aggregate metrics are being used, the user may wish to have the rising edge or the trailing edge of the time period used.

How is it currently being implemented (I suspect rising edge)? Is there thoughts to support trailing edge aggregation if this is the case?

@RajanRastogi
Copy link

+1

Would like to see an option for time zone while querying with aggregates. Like consense mentioned, it should be specific to the viewpoint of the issuer of the query.

@meteozond
Copy link

+1

@nirvdrum
Copy link

nirvdrum commented Jul 6, 2014

I was looking at taking a crack at this. Is there a proposed syntax for specifying the timezone? It looks like this may require a grammar change, so I'd rather not sink time into work that may need to be redone there if there's already a syntax in mind.

@pauldix
Copy link
Member

pauldix commented Jul 6, 2014

This would definitely require a grammar change. Not sure about the best way to handle this. Maybe like this?

-- subtract 5 hours. -5:00 UTC offset, (eastern time)
select count(value) from foo
group by time(5m, -5)

-- or this. add 9.5 hours
select count(value) from foo
group by time(5m, 9:30)

This becomes a little tricky with continuous queries because the offsets change depending on the time of the year. You wouldn't want to specify a strict offset. You'd want to specify say Pacific Time and have it compute the offset when the continuous query is run.

Now that I'm thinking about this, it would probably be better to just have timezone as a function option on the end kind of like what we did in #662.

select count(value) from foo
group by time(5m)
where time > now() - 4h
timezone(+5)

Maybe it reads a little cleaner like this

select count(value) from foo
group by time(5m)
where time > now() - 4h
with timezone(+5)

Then you can key the parser off the with keyword and expect any number of functions after that modify the source series.

And then have the timezone function take +5:00, -4:00, or some text code. Not sure if there are shortcodes for timezones or some good way of translating. That way you could do timezone('US Eastern Time') and have it work depending on whether it's daylight savings or not.

@mboelstra
Copy link
Contributor

+1 on extending the query grammer with a timezone parameter.

In my opinion it would be cleanest and best to use the timezone names as specified in the TZ database. See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
Using the names of the TZ column (e.g. "Europe/Amsterdam") is also supported in many DateTime libraries such as JodaTime for Java or Qt for C/C++.

And as already mentioned using a name also helps handling DST. Especially when doing a query where the time range involves a transition from or to DST.

@nirvdrum
Copy link

I started work on this, but have gotten side-tracked since. It's available at https://github.com/nirvdrum/influxdb/tree/add_with_clause_support if someone wants to pick up from there (or use whatever for inspiration). I'll try to get back to this, but I did this before the code was reorganized and haven't rebased yet.

@mauidude
Copy link

I started working on this too a while back and finally updated it with master.
It seems to work with a simple query such as:

SELECT * from my_series GROUP BY time(1h) with timezone(-7h) 

I'm not sure if it works with a WHERE clause properly and it probably needs some tests.

https://github.com/mauidude/influxdb/tree/388-with-clause

@jufemaiz
Copy link

Timezone support needs to include daylight savings though, otherwise it should only be referred to as time-offset. Just a thought.

@fxstein
Copy link

fxstein commented Oct 6, 2014

Definitely needed together with #387. In my use case smart grid/smart home, aggregation of e.g. energy consumption only makes sense when applied to the local Timezone of the grid/property in question. In addition to the hard coded offset proposed here, it might even have to go a step further.
Here is an IoT example: Compare the power usage patterns of homes in New York with those in San Diego. All the data comes in as UTC.

@jufemaiz
Copy link

jufemaiz commented Oct 6, 2014

@fxstein I think we need to talk ;)

@NamPNQ
Copy link

NamPNQ commented Nov 19, 2014

I have some suggest

select count(value) from foo
group by time(5m)
where time > now() - 4h
with timezone(timezone,5) #first arg is column, second is default value

you can using column in database for timezone
ex:
+----------+-------------+---------------+------------------+
| time | timezone | somedata | somedata |
+----------+-------------+---------------+------------------+

because you have multidata different timezone

@jufemaiz
Copy link

@NamPNQ unfortunately if a timezone isn't fully supported then daylight savings times won't be considered… :\

@toddboom toddboom added the idea label Nov 25, 2014
@jessecascio
Copy link

I have a bunch of series for different accounts on advertising stats, like ad.account_123. Each account is for companies located in different parts of the world, and the time value is always the midnight time stamp of the current day: 1417546260, 1417459863, etc., so I can see spend, cpc, etc. for any of my accounts by day, by week, by month. I'm having an issue with the group by time() since each account is located in a different timezone. Running

select sum(spend) from ad.account_123 group by time(1d)

will not correctly group by day for the timezone of the accounts rather by the timezone of the server (at least that's what I'm assuming). The ability to set the timezone in the query would be a huge + for me:

select sum(spend) from ad.account_123 group by time(1d) with timezone('america/new_york')

@manusvs650
Copy link

+1

2 similar comments
@k2xl
Copy link

k2xl commented Feb 3, 2015

+1

@marko8n
Copy link

marko8n commented Feb 23, 2015

+1

@jnutzmann
Copy link

+1

Any thoughts of getting this into the 0.9 release?

@Glavin001
Copy link

+1 would be great to have this in 0.9.0 release, however please do not hold that release up. This is a must for the future, although we should get 0.9.0 out ASAP. Thanks!

@kayhman
Copy link

kayhman commented Mar 20, 2015

+1

@toddboom
Copy link
Contributor

This definitely needs to happen, but I'm moving this to #2074 to clean things up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests