Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support per-query timezone offsets #2074

Closed
toddboom opened this issue Mar 25, 2015 · 69 comments · Fixed by #6504
Closed

Support per-query timezone offsets #2074

toddboom opened this issue Mar 25, 2015 · 69 comments · Fixed by #6504
Milestone

Comments

@toddboom
Copy link
Contributor

Extending the discussion started in #2071 and re-starting the old discussion from #388, we should support time zones on a per-query basis.

A quick example might look like:

select mean(value) from cpu
where time >= today()
group by time(10m)
time_zone(PST)

Or you could also do time_zone(+8) or time_zone(-2).

Any other suggestions?

@jnutzmann
Copy link

Doing a simple +/- is not sufficient for full timezone support because of daylight savings (PST vs PDT depending on the time of year).

For example, if you have a GROUP BY time(1d), the size of the day/bucket can vary depending on it's time of year. This makes it extremely messy to implement with the current method. That said, it is exactly what I need for my application.

Right now, I am planning on implementing an index on top of influx for this purpose, but would love to not have to do that.

@toddboom toddboom added this to the Next Point Release milestone Mar 25, 2015
@XANi
Copy link

XANi commented Mar 25, 2015

IMO there should be both options.

Query-wide parameter covers most of the cases but function that just returns offset based on supplied timezone (which would be necessary for per-query operator anyway) is much more convenient than doing that client-side

@jufemaiz
Copy link

Regarding "any other questions", can support be provided for falling edge v rising edge groupings?

time_zone as a "number" should be a four number string too (some XX30 tz)

@rosieman
Copy link

Not sure if you're still debating format, but I'd recommend supporting the Olsen / tz database. See http://joda-time.sourceforge.net/timezones.html. It supports named timezones, as well as UTC offsets (which are confusingly backwards, ie. Etc/GMT+5 is actually GMT-5). This is the closest thing to a standard I've ever seen.

@jessecascio
Copy link

+1 for Olsen tz

... with timezone('america/phoenix')

@itxx00
Copy link

itxx00 commented Jun 15, 2015

why not write a timezone item in conf file?

@manusvs650
Copy link

+1

1 similar comment
@zredAlmeria
Copy link

+1

@ccutrer
Copy link

ccutrer commented Jun 18, 2015

another +1 for olsen db rather than literal offsets or any other named time zones

though I should also point out that timezones aren't the only problem. if I do a group by time(30m), I might want my groups to be between 15 and 45, so that the "middle" of the grouping is on the hour and half hour, when I expect my peak load due to user traffic patterns. and that has nothing to do with timezone - it's more "offset the grouping by a certain amount"

@namtuoc0202
Copy link

select mean(value) from cpu
where time >= today()
group by time(10m)
time_zone(+8)

It may be the best

@nemesifier
Copy link

I would use a configurable default timezone + query param to override it

@thetcc
Copy link

thetcc commented Jul 3, 2015

I eagerly awaited this feature.
Without this feature,I can not use group by time(1d) .

Make timezone can be configurable(in influxdb.conf) maybe is easier.
Can add this feature in the 0.9.2 release version first?

@jeremyfromearth
Copy link

+1
We are fetching sums and grouping by 1 day and can not get accurate stats because of this timezone issue.

@zredAlmeria
Copy link

+1 for setting default timezone in config file.

Currently, I'm using a quick workaround which consists in using a cron script or program to retrieve the average of the day (executed at 00:01 AM for example) of the desired series from InfluxDB, using the correct start and end timestamps in the WHERE clause of the query.

After that, insert the data in a new series with the timestamp of the beginning of the day and you'll be able to retrieve the correct data in your queries.

@CrazyJvm
Copy link
Contributor

CrazyJvm commented Aug 6, 2015

I really want to know which version will include this feature, is there a PR proposed already?

@beckettsean
Copy link
Contributor

There is no PR for this yet, and the work is currently unscheduled.

@beckettsean beckettsean modified the milestones: Next Point Release, Longer term Aug 6, 2015
@hotsnow77
Copy link

+1
I'm surprised such a simple and basic feature is unscheduled :( So far grouping by 1d is useless for me :(
...or at least a timezone setting in a config file. In my case it's always the same timezone.

@beckettsean
Copy link
Contributor

@hotsnow77 we will happily accept a PR for this feature, but I don't think you'll find it simple. It is definitely a highly desired addition but likely won't be tackled until clustering and other stability/availability concerns are considered mature.

@nemesifier
Copy link

This is pointless. If the devs are busy with other features that they consider more important why bother complaining? Either get together some golang developers who can send a patch or let them work on their priorities without pointless complaints that will only slow them down because they are nice and they'll find the time to answer politely.

We want this feature as bad as you so don't slow them down with useless comments.

PS: this is an open source project, "without warranty of any kind"

@Zolmeister
Copy link

+1 for per-query timezone

jsternberg added a commit that referenced this issue Apr 29, 2016
An offset of `time(1m, now())` will anchor the offset to the start time
of the query. The default offset is `0s` which is the current default
anyway.

This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.
jsternberg added a commit that referenced this issue May 2, 2016
An offset of `time(1m, now())` will anchor the offset to the start time
of the query. The default offset is `0s` which is the current default
anyway.

This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.
jsternberg added a commit that referenced this issue May 2, 2016
An offset of `time(1m, now())` will anchor the offset to the current
time of the query. The default offset is `0s` which is the current
default anyway.

This fixes #2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.
@giko45
Copy link

giko45 commented May 2, 2016

This does seem to add a constant offset to the comparison so it does not handle summer/winter time changes in the dataset, right?

@jufemaiz
Copy link

jufemaiz commented May 2, 2016

Yeah, not sure you can close out a timezone requirement with a time offset solution. Sorry guys. Only partly resolved.

#6504 (comment)

@jsternberg
Copy link
Contributor

@giko45 if you can explain a bit what you mean, we could add something more. What kind of output are you expecting? Times all get represented as UTC so the main concern is having the client interpreting the result. Even when you cross timezone boundaries due to going from something like PDT to PST, you can just do this:

SELECT value FROM cpu WHERE time > '2016-11-05T00:00:00Z-07:00' AND time <= '2016-11-07T00:00:00Z-08:00'

These times will work properly and only select data within those time periods. It's just the output will always be in UTC. If you have an idea for something that can't be done with the currently existing tools, can you open an issue describing your use case?

@ccutrer
Copy link

ccutrer commented May 3, 2016

SELECT SUM(value) FROM energy WHERE time>'2016-02-05T00:00:00Z-07:00' AND time<'2016-07-05T00:00:00Z-06:00' GROUP BY time(1d);

The offset bucket you added now allows me to account for the fact that I want my buckets at 06:00Z, but it does not account for the fact that in the specified interval for this query, one of the days has a bucket that is 23 hours, and that the UTC time of the boundaries between buckets will change from 07:00Z to 06:00Z on that day. This is not just a presentation problem, because the bucket size needs to dynamically adjust based on when the DST transition is. Also note that in the query as written right now, there's no way to know when the DST transition is. The offsets -0700 and -0600 don't tell you that - only the actual timezone ('America/Denver') has enough information to know when DST transitions are.

Also note that while the offset solution is enough to help half-hour timezones (like you mentioned - India) get by (ignoring DST problems), if you provide a way for a timezone to be provided, you shouldn't also need to specify an offset, since you can infer the offsets from the time zone (i.e. current offset % 60 will tell you if it's a 30 minute, or possibly even 15/45, or :04 -- I'm pretty sure one of those zones exist). In fact, I would expect something like "SELECT MEAN(value) FROM series GROUP BY time(1h) AT TIME ZONE 'Asia/Kolkata'" to automatically infer an offset of 30m from UTC times for the grouping.

@jsternberg
Copy link
Contributor

I've created #6541 and #6542 for tracking those two use cases. This implementation should work for the use case described in this ticket and will be a good jumping off point for implementing full timezone support.

Thank you for your feedback.

@giko45
Copy link

giko45 commented May 3, 2016

A syntax like #6541 or @ccutrer
SELECT mean(value) FROM cpu WHERE time > now() - 7d GROUP BY time(1d) TZ("America/New York")
would be my use case too.

However implementation requires
1: times in the database are in UTC
2: before the grouping: convert all "time"-s in to the required TZ
3: do the grouping (since "time" now is a local time grouping will handle DST, and yes this will imply one day per year with 23 h and one day with 25h but thats OK )
4: last step i am not totally sure: *Should the result be converted back to UTC? * in case of continues queries: yes, you don't want to save local times in you DB. In case of client queries: probably yes too. only return UTC times to the client. this just might seem a bit strange since you will get daily aggregated values att an offset in UTC. However its the clients responsibility to present this correct.

@henriquemaciel
Copy link

+1

@skarunkumar
Copy link

it would be nice if the time_zone option is available

@jufemaiz
Copy link

@mattbasta assuming 8h could be augmented with minute based offset, there's still the issue of DST which can't be dealt with like this :(

@jufemaiz
Copy link

@mattbasta all good and well to hold the opinion, however there are ample use cases and it's a fairly standard requirement for a database dealing with datetimes to be capable of dealing with the real world and its complexities. Imho this is a long outstanding issue that needs resolving by the Influx core team and I know I am not alone. Until that happens should we implement Influx to meet our tsdb needs we will be limited to aggregating at a half hour level in the database and then completing any aggregation in the application. This puts it at a disadvantage when comparing with other options.

@giko45
Copy link

giko45 commented Dec 17, 2016

If one saves data in the database in UTC then daily aggregates can simply be done by converting to the right TZ before aggregration, this will handel DST too.
Aggregation should be something like this:

select mean(value) from cpu group by time(1d,"CET")
And in the backend (i am not a go programmer):

t := [time read from DB, should be in UTC]
tz, _ := time.LoadLocation("CET") // get TZ identifier from SQL select clause
t.In(tz) // convert time before group by
... // after this do the normal GROUP BY.

Normal select queries will return data in UTC (as the data is saved in UTC) but Clients can convert to local TZ (which is good practice anyway since clients might be in diffrent TZ)

@XANi
Copy link

XANi commented Dec 17, 2016

@jufemaiz Does any other tsdb handle that better ? Genuinely curious, haven't played with other ones much

@ccutrer
Copy link

ccutrer commented Dec 17, 2016

@mattbasta PST and PDT are not time zones. They are simply defined offsets (-8 and -7). Pacific Time is a time zone, and includes the information about when it changes offsets (DST), as well as nicknames for those offsets. And it's not hard for software to support it. There is a well maintained database of time zone information that software can use to know what the DST rules are for any given time zone in any given year (within practicality): https://en.wikipedia.org/wiki/Tz_database. Usually software will support the tzinfo data files, and rely on the operating system to keep them up to date.

@mattbasta
Copy link

I deleted my comment. I don't want to be involved in this discussion. Feel free to argue all you want.

@timhallinflux timhallinflux modified the milestones: 0.13.0, Longer term Dec 20, 2016
@greggwon
Copy link

greggwon commented Jan 24, 2017

Influx should store data with a UTC timestamp. Formatting timestamps should be a display detail.
The complete solution to timezone handling is to provide a timezone formatting mechanism. It should be possible to just use something like this with time: format(time,'Central Standard Time'). There can not be any solution based on using "server time" or "client time" etc. Queries simply need to be able to format time. Any query inferring time, such as group by and others, should be using the timezone of the locale that influx can detect at the server to shift time relative to UTC.

@jufemaiz
Copy link

@greggwon you're correct except for the rather large use case of aggregation methods by calendar days (weeks, months, years etc), then InfluxDB needs to be able to manage timezone offsets. In this case it is not merely a display detail but a functional requirement :)

@kushagraSahu
Copy link

+1

1 similar comment
@svetlina-scaleforce
Copy link

+1

@greggwon
Copy link

Aggregation becomes a "display detail", because you are now visualizing. The functional definition as shown in the sample query is not my point. My point is that there should be a common time reference that is automatically known for all data. Data should not float around in varied Timezones and then have to be coerced to another in random fashion. Instead, there should be a single base timezone reference and then "visualization" can specify what timezone is important.

In the sample query, the 'time' comparison to "today()" also implies a visualization (extraction of the data for use) and so today() should also be offset by a timezone offset. today() needs the parameter, not a new function which will randomly interject problems to all other 'time' values in an expression. Each use of a time function, should by default use the server timezone offset but allow a parameter which specifies the timezone offset applicable.

Yes, that becomes more places that the offset must be specified, but it allows the developer/application the explicit control needed to make sure that we don't suddenly have offset math strewn about the query with various +21600000, and -21600000 etc.

@jsternberg
Copy link
Contributor

jsternberg commented Jul 21, 2017

I will be locking this issue. Litigating the result of a closed issue that was opened two years ago and resolved six months ago is unfair to the people subscribed to the issue and I think it is unfair to forcibly involve others in the discussion.

With that said, I am happy to share the rationale for why the issue was resolved in the way it was resolved. But, the proper location for that question is on our community forums and not the issue tracker. @greggwon if you want to continue discussion, please create an account on that website and post a new topic. I will keep an eye out for the next day to see if there is any discussion about time zones there.

If you have issues with the way it was resolved that pertain to a valid use case of time zone math, please open an issue. I do not want to fail to address valid use cases since I want the database to be as useful to as many people as possible. If there are problems with the existing implementation, I want to address them so that everyone benefits from a better database.

One clarification that might be needed for most people (since I have seen this confusion before), the TZ parameter does not affect any time values in the actual query. It affects time boundaries. The time 2000-01-01T00:00:00Z already has a time listed. It's UTC and the time zone parameter cannot and will not affect that.

Time zones are a disabled feature by default. If you don't include the TZ() parameter, no time zone math will be done. And the previous group by offset feature introduced in 1.1 or 1.2 is still there for other use cases.

I hope that clarification helps and look forward to your continued engagement with the InfluxData community.

@influxdata influxdata locked and limited conversation to collaborators Jul 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.