Timezones and Datetime64 #3290

Open
PythonCHB opened this Issue · 15 comments

6 participants

@PythonCHB

A new version of datetime64 dtype was added to numpy recently, and as of 1.7, is considered experimental.

As of 1.7, datetime64 attempts to handle timezones by:

  • Assuming all datetime64 objects are in UTC
  • Applying timezone offsets when parsing ISO 8601 strings
  • Applying the Locale timezone offset when the ISO string does not specify a TZ.
  • Applying the Locale timezone offset when printing, etc.

(note that no timezone handling is done when converting to/from the stdlib datetime object)

However, this behavior makes it very difficult to use datetime64 in a timezone agnostic way, or if you need to work with a timezone that is not the system Locale one.

Complete and proper timezone management could be very useful, but the current system is only useful in a few small cases, and makes common use cases very difficult. Good timezone handling is a large and difficult problem -- it is unclear whether numpy wants to take it on.

There has been much discussion on the numpy mailing list, and a new datetime64 NEP is in progress, but in the meantime, I propose we make the following changes to make datetime64 far more useful and less prone to error:

  • Assume that datetime64 is timezone "naive", as the standard library datetime object does -- it is up to the user to do any time zone offset conversions.
  • Do not apply a timezone offset when converting to an ISO string
  • Never use the Locale settings for anything, without the user explicitly asking for it.
  • Raise an exception when parsing an ISO 8601 string that has a timezone offset other than 0 (Z)
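
As a rough illustration of the last point, here is a minimal sketch using only the standard library rather than numpy. The helper name `parse_naive_iso` is hypothetical, and this is not how datetime64 parses strings; it only shows the proposed policy: accept no offset or a zero offset, and raise on anything else.

```python
from datetime import datetime, timedelta


def parse_naive_iso(text):
    """Parse an ISO 8601 string into a naive datetime (hypothetical helper).

    Strings with no offset, or with a zero offset, are accepted.
    Any non-zero offset raises ValueError instead of being applied.
    """
    # Older Python versions do not accept a trailing "Z", so spell it out.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    offset = parsed.utcoffset()
    if offset is not None and offset != timedelta(0):
        raise ValueError("non-zero timezone offset not allowed: %r" % text)
    # Drop the (zero) tzinfo so the result is always naive.
    return parsed.replace(tzinfo=None)
```

With this policy, `parse_naive_iso("2013-05-03T23:00:00Z")` and `parse_naive_iso("2013-05-03T23:00:00")` give the same naive value, while `"2013-05-03T23:00:00-05:00"` is rejected.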
@njsmith
Owner
@PythonCHB

Well, dateutil returns a time-zone aware datetime object when you give it a time zone -- so it CAN return different values with and without a TZ offset.

I just figured it wouldn't change the value, so why not. But sure, perhaps it would be less likely to cause errors later on to not accept ANY timezone offset in an ISO string.

@njsmith
Owner
@njsmith
Owner
@PythonCHB

@njsmith ,

Yes. Naive and tz-aware datetimes are quite different. But doing tz-awareness right is really hard, even if we can agree on what we want it to do. Thus my proposal to stick with naive, which, indeed, it currently is -- the current code simply uses some unfortunate defaults. I'd like to see that change to something far more useful.

If we go with naive, then really the only open question is when to raise errors when parsing ISO strings. If numpy only supports naive datetimes, then I see little chance of confusion if it accepts a tz offset of 0 -- but consistency is good, so no-offset-allowed is good too.

@wesm

I think that datetime64 should always be implicitly a UTC timestamp, and can be interpreted as being in some time zone as the user desires. That's the approach we're taking in pandas right now, e.g.:

In [2]: stamp = pd.Timestamp('2013-05-03T23:00-0500')

In [3]: stamp
Out[3]: <Timestamp: 2013-05-03 23:00:00-0500 , tz=tzoffset(None, -18000)>

In [4]: stamp.u
stamp.utcfromtimestamp  stamp.utcoffset         
stamp.utcnow            stamp.utctimetuple      

In [4]: stamp.value
Out[4]: 1367640000000000000

In [5]: stamp.tz_convert('utc')
Out[5]: <Timestamp: 2013-05-04 04:00:00+0000 UTC, tz=UTC>

In [6]: stamp.tz_convert('utc').value
Out[6]: 1367640000000000000

The Timestamp object is simply a user-friendly "box" and subclass of datetime.datetime that holds a UTC datetime64 timestamp (in nanoseconds).
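
The idea that the stored value is the same whatever timezone it is viewed in can be checked with the standard library alone. This is a sketch of the concept, not pandas' implementation; `utc_nanoseconds` is a hypothetical helper, and the fixed -05:00 offset stands in for the offset in the example above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_nanoseconds(aware):
    """Nanoseconds since the Unix epoch for a tz-aware datetime (hypothetical helper)."""
    delta = aware - EPOCH
    # Use integer arithmetic to avoid float rounding on large values.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


minus_five = timezone(timedelta(hours=-5))
stamp = datetime(2013, 5, 3, 23, 0, tzinfo=minus_five)

# Same instant, viewed in UTC: the wall-clock fields change, the stored value does not.
as_utc = stamp.astimezone(timezone.utc)

same_value = utc_nanoseconds(stamp) == utc_nanoseconds(as_utc)
```

Both calls return 1367640000000000000, matching `stamp.value` in the session above.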

@wesm

I should also add that unfortunately we had to kind of do everything ourselves in pandas because the NumPy datetime64 stuff has been in flux (I wrote earlier "mess" but that was putting it a bit strongly, sorry). I also disagreed with some of the decisions that Mark made in the API. I think interpreting a timestamp as naive and not localtime is a good idea; I didn't like the "convert this naive string to UTC given local timezone" behavior, which is how I recall numpy 1.7 working.

@charris
Owner

@mwiebe Just a heads up to the discussion on the datetime issue.

@PythonCHB

Pandas proves the point: a naive datetime may not be as useful as a full-featured tz-aware one, but it is very useful, and users can patch their own TZ handling on top of it. Half-baked TZ support is next to useless. I'm using numpy 1.7 datetime64, but frankly, I don't know that I'm buying anything over simply using a regular array that I interpret as seconds-since-the-epoch.

Maybe we should borrow pandas' code?

As for Naive vs. UTC -- they are almost exactly the same -- the only difference may be how you handle an ISO string as input: add/subtract the offset, or don't allow an offset. As long as you don't apply the locale offset without the user specifically asking for it, I'm good either way, though I think calling it "UTC" buys very little over naive, but does add a tiny bit of extra room for error.
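
To make the "add/subtract the offset" option concrete, here is a small standard-library sketch. The function name `parse_as_utc` is hypothetical, and treating a string with no offset as already being UTC is an assumption of this sketch, not something numpy does.

```python
from datetime import datetime, timezone


def parse_as_utc(text):
    """Parse an ISO 8601 string and normalize it to UTC (hypothetical helper).

    The result is a naive datetime whose fields are UTC wall-clock time.
    """
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: a string without an offset is already in UTC.
        return parsed
    # Apply the offset from the string, then drop the tzinfo.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)
```

Under this policy `"2013-05-03T23:00:00-05:00"` becomes 2013-05-04 04:00:00. The system locale is never consulted in either case.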

So: I propose either Naive or UTC--implementer's choice.

-Chris

@notbanker

Wes, I am curious about the motivation for allowing pd.to_datetime( ) to apparently select a timezone from the local system when supplied only a long. That is:

stamp = pd.Timestamp('2013-05-03T23:00')

pd.to_datetime( stamp, box = False )
Timestamp('2013-05-03 23:00:00', tz=None)

versus

pd.to_datetime( stamp.value, box = False )
numpy.datetime64('2013-05-03T19:00:00.000000000-0400')

Took me by surprise, but I'm sure you've thought it through.

@charris
Owner

This is still under discussion/development for 1.9

@PythonCHB
@njsmith
Owner
@mangecoeur

Never use the Locale settings for anything, without the user explicitly asking for it.

Please can this be introduced! Numpy amuses itself by displaying timezone offsets for values which purposefully don't have timezones - but it makes it look like they do. I happen to travel between time zones quite often (with the machine updating the locale accordingly) and it would be nice if my program would output the same thing twice rather than trying to be "helpful" in this way.

Also, strongly disagree with

Naive vs. UTC -- they are almost exactly the same

A naive datetime is one for which no timezone information is known or desired. It could be for instance you are running a theoretical model with no relation to a real location, so a timezone is meaningless.
A UTC datetime is one where a timezone has been defined and is desired, and is chosen to match an international convention on the "baseline" time zone. A naive datetime should never, ever be presented with a "timezone offset" or anything like it because that makes no sense, while for a UTC datetime this does make sense.
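
The standard library's datetime already draws this line, which may be a useful reference point. The sketch below uses only `datetime` from the stdlib; `can_order` is a hypothetical helper added for illustration.

```python
from datetime import datetime, timezone

# A naive value: no timezone is known or wanted.
naive = datetime(2013, 5, 3, 23, 0)

# A UTC value: the timezone is defined and is part of the value.
aware_utc = datetime(2013, 5, 3, 23, 0, tzinfo=timezone.utc)

# The naive value prints without any offset; the UTC value carries one.
naive_text = naive.isoformat()
aware_text = aware_utc.isoformat()


def can_order(a, b):
    """Return True if a < b can be evaluated at all (hypothetical helper)."""
    try:
        a < b
        return True
    except TypeError:
        # Python refuses to order a naive datetime against an aware one.
        return False
```

Here `naive_text` has no offset suffix while `aware_text` ends in `+00:00`, and `can_order(naive, aware_utc)` is False because the two kinds cannot be ordered against each other.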

It's precisely this kind of vagueness that makes dealing with times such a pain, because you end up with data where it's never clear if the value you have is in UTC, naive, includes DST, etc - and part of that is because tools (like numpy) don't really help you out and everyone ends up throwing unix timestamps around till no one has a clue what's going on...

@PythonCHB