
Timezones and Datetime64 #3290

Closed
PythonCHB opened this Issue Apr 29, 2013 · 16 comments

7 participants

A new version of the datetime64 dtype was added to numpy recently; as of 1.7, it is considered experimental.

As of 1.7, datetime64 attempts to handle timezones by:

  • Assuming all datetime64 objects are in UTC
  • Applying timezone offsets when parsing ISO 8601 strings
  • Applying the Locale timezone offset when the ISO string does not specify a TZ.
  • Applying the Locale timezone offset when printing, etc.

(Note that no timezone handling is done when converting to/from the stdlib datetime object.)

However, this behavior makes it very difficult to use datetime64 in a timezone-agnostic way, or if you need to work with a timezone that is not the system Locale one.

Complete and proper timezone management could be very useful, but the current system is only useful in a few small cases and makes common use cases very difficult. Good timezone handling is a large and difficult problem, and it is unclear whether numpy wants to take it on.

There has been much discussion on the numpy mailing list, and a new datetime64 NEP is in progress, but in the meantime, I propose we make the following changes to make datetime64 far more useful and less prone to error:

  • Assume that datetime64 is timezone "naive", as the standard library datetime object is by default; it is up to the user to do any time zone offset conversions.
  • Do not apply a timezone offset when converting to an ISO string.
  • Never use the Locale settings for anything, without the user explicitly asking for it.
  • Raise an exception when parsing an ISO 8601 string that has a timezone offset other than 0 (Z).
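As a rough illustration of the parsing rules proposed above, here is a minimal sketch using only the Python standard library. `parse_naive_iso` is a hypothetical helper, not a numpy API, and the sketch assumes Python 3.7+ for `datetime.fromisoformat()`:

```python
from datetime import datetime, timedelta


def parse_naive_iso(text: str) -> datetime:
    """Parse an ISO 8601 string into a naive datetime (hypothetical helper).

    No locale offset is ever applied, a zero offset ("Z" or "+00:00")
    is accepted and dropped, and any other offset raises ValueError.
    """
    # datetime.fromisoformat() only understands a trailing "Z" on
    # Python 3.11+, so normalize it to "+00:00" first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        return dt  # already naive: returned exactly as written
    if dt.utcoffset() != timedelta(0):
        raise ValueError(f"non-zero timezone offset not allowed: {text!r}")
    # Zero offset: strip the tzinfo so the result stays naive.
    return dt.replace(tzinfo=None)


print(parse_naive_iso("2010-01-01T00:00:00Z"))  # 2010-01-01 00:00:00
```

A string such as "2010-01-01T00:00:00-05:00" raises `ValueError` instead of being silently shifted, and no code path consults the machine's locale.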

njsmith commented Apr 29, 2013

If we're naive, shouldn't we also refuse to process a timezone offset of 0,
and only accept ISO 8601 strings that leave the offset unspecified?

"Naive" and "UTC" are different timezones... note that these return
different values:
dateutil.parser.parse("2010-01-01T00:00:00")
dateutil.parser.parse("2010-01-01T00:00:00Z")
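The same distinction shows up in the standard library without dateutil. In this sketch (plain `datetime`, nothing numpy-specific), the two objects carry identical wall-clock fields, yet they do not compare equal and cannot be subtracted:

```python
from datetime import datetime, timezone

# Identical wall-clock fields; only the tzinfo differs.
naive = datetime(2010, 1, 1, 0, 0)
aware_utc = datetime(2010, 1, 1, 0, 0, tzinfo=timezone.utc)

# Equality between a naive and an aware datetime is simply False.
print(naive == aware_utc)  # False

# Arithmetic between them is refused outright.
try:
    naive - aware_utc
    can_subtract = True
except TypeError as exc:
    can_subtract = False
    print("TypeError:", exc)
```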


Well, dateutil returns a timezone-aware datetime object when you give it a time zone, so it CAN return different values with and without a TZ offset.

I just figured it wouldn't change the value, so why not. But sure, perhaps it would be less likely to cause errors later on to not accept ANY timezone offset in an ISO string.


njsmith commented Apr 30, 2013

I mean that it is a different value precisely in the sense that time-zone
aware datetimes and time-zone naive datetime objects are very different,
incommensurate things, even if they have some of the same numbers stashed
in them:

In [6]: dateutil.parser.parse("2010-01-01T00:00:00")
Out[6]: datetime.datetime(2010, 1, 1, 0, 0)

In [7]: dateutil.parser.parse("2010-01-01T00:00:00Z")
Out[7]: datetime.datetime(2010, 1, 1, 0, 0, tzinfo=tzutc())

In [8]: dateutil.parser.parse("2010-01-01T00:00:00") - dateutil.parser.parse("2010-01-01T00:00:00Z")
TypeError: can't subtract offset-naive and offset-aware datetimes

Saying "oh well they're not that different it might be convenient and who
will really notice anyway" is exactly the kind of hand-wavy reasoning that
got us into this mess :-).


njsmith commented Apr 30, 2013

@wesm: Do you have any opinion on the proposed changes? I'm extremely
reluctant to mess with this stuff further without pandas signing off.


@njsmith,

Yes, naive and TZ-aware datetimes are quite different. But doing TZ-awareness right is really hard, even if we can agree on what we want it to do. Hence my proposal to stick with naive, which, indeed, it currently is; the current code simply uses some unfortunate defaults. I'd like to see that change to something far more useful.

If we go with naive, then really the only open question is when to raise errors when parsing ISO strings. If numpy only supports naive datetimes, then I see little chance of confusion if it accepts a TZ offset of 0, but consistency is good, so allowing no offset at all is good too.

wesm commented May 6, 2013

I think that datetime64 should always be implicitly a UTC timestamp, and can be interpreted as being in some time zone as the user desires. That's the approach we're taking in pandas right now, e.g.:

In [2]: stamp = pd.Timestamp('2013-05-03T23:00-0500')

In [3]: stamp
Out[3]: <Timestamp: 2013-05-03 23:00:00-0500 , tz=tzoffset(None, -18000)>


In [4]: stamp.value
Out[4]: 1367640000000000000

In [5]: stamp.tz_convert('utc')
Out[5]: <Timestamp: 2013-05-04 04:00:00+0000 UTC, tz=UTC>

In [6]: stamp.tz_convert('utc').value
Out[6]: 1367640000000000000

The Timestamp object is simply a user-friendly "box" and subclass of datetime.datetime that holds a UTC datetime64 timestamp (in nanoseconds).
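The storage model described here can be sketched with the standard library: keep one integer of UTC nanoseconds and attach a zone only when displaying. This illustrates the idea only and is not pandas' actual implementation; `to_utc_nanoseconds` and `from_utc_nanoseconds` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_nanoseconds(dt: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the Unix epoch (UTC)."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Floor-dividing two timedeltas gives an exact int, so no float rounding creeps in.
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def from_utc_nanoseconds(value: int, tz: timezone) -> datetime:
    """Interpret a stored UTC value as wall-clock time in the chosen zone."""
    # Python datetimes resolve only microseconds, so sub-microsecond digits are dropped.
    return (EPOCH + timedelta(microseconds=value // 1000)).astimezone(tz)


stamp = datetime.fromisoformat("2013-05-03T23:00:00-05:00")
value = to_utc_nanoseconds(stamp)
print(value)  # 1367640000000000000
print(from_utc_nanoseconds(value, timezone.utc))  # 2013-05-04 04:00:00+00:00
print(from_utc_nanoseconds(value, timezone(timedelta(hours=-5))))  # 2013-05-03 23:00:00-05:00
```

The integer matches the `stamp.value` shown above and is the same whichever zone the input was written in; only the display step needs a zone.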

wesm commented May 6, 2013

I should also add that, unfortunately, we had to do more or less everything ourselves in pandas because the NumPy datetime64 stuff has been in flux (I wrote "mess" earlier, but that was putting it a bit strongly, sorry). I also disagreed with some of the decisions that Mark made in the API. I think interpreting a timestamp as naive rather than as local time is a good idea; I didn't like the "convert this naive string to UTC given the local timezone" behavior, which is how I recall numpy 1.7 working.


charris commented May 6, 2013

@mwiebe Just a heads up to the discussion on the datetime issue.

pandas proves the point: a naive datetime may not be as useful as a full-featured TZ-aware one, but it is very useful, and users can patch their own TZ handling on top of it. Half-baked TZ support is next to useless. I'm using numpy 1.7 datetime64, but frankly, I don't know that I'm buying anything over simply using a regular array that I interpret as seconds-since-the-epoch.

Maybe we should borrow pandas' code?

As for Naive vs. UTC -- they are almost exactly the same. The only difference may be how you handle an ISO string as input: add/subtract the offset, or don't allow an offset. As long as you don't apply the locale offset without the user specifically asking for it, I'm good either way, though I think calling it "UTC" buys very little over naive but does add a tiny bit of extra room for error.

So: I propose either Naive or UTC--implementer's choice.

-Chris
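To make the "implementer's choice" concrete, the two options can be written as two parsing policies. This is a hypothetical sketch (the `parse_iso` name and its `policy` argument are illustrative only); under both policies a string with no offset is returned unchanged and the locale is never consulted:

```python
from datetime import datetime, timezone


def parse_iso(text: str, policy: str) -> datetime:
    """Parse ISO 8601 text under one of two hypothetical policies.

    "naive": reject any explicit offset, including "Z".
    "utc":   apply the offset, then return the UTC wall time (tzinfo dropped).
    """
    if policy not in ("naive", "utc"):
        raise ValueError(f"unknown policy: {policy!r}")
    original = text
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        return dt  # no offset given: taken as-is under either policy
    if policy == "naive":
        raise ValueError(f"timezone offset not allowed: {original!r}")
    # policy == "utc": shift to UTC, then drop the tzinfo.
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


print(parse_iso("2013-05-03T23:00:00-05:00", "utc"))  # 2013-05-04 04:00:00
```

Both policies store the same kind of value; they differ only when the input carries an explicit offset.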

Wes, I am curious about the motivation for allowing pd.to_datetime() to apparently select a timezone from the local system when supplied only a long integer. That is:

stamp = pd.Timestamp('2013-05-03T23:00')

pd.to_datetime(stamp, box=False)
Timestamp('2013-05-03 23:00:00', tz=None)

versus

pd.to_datetime(stamp.value, box=False)
numpy.datetime64('2013-05-03T19:00:00.000000000-0400')

Took me by surprise, but I'm sure you've thought it through.


charris commented Feb 22, 2014

This is still under discussion/development for 1.9


Great, any kind of timeframe on that? I think we have a little more
consensus building to do, but really the key blocker is someone who has
the time and skills to sit down and do the work. It would be nice not to
leave it to the last minute, like right before 1.8...

-Chris

Christopher Barker, PhD

Python Language Consulting

  • Teaching
  • Scientific Software Development
  • Desktop GUI and Web Development
  • wxPython, numpy, scipy, Cython

njsmith commented Feb 25, 2014

The timeframe is "whenever someone with the time/skills/interest sits down
to do the work". If you can find them and ask them then that'd be great :-).


Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

> Never use the Locale settings for anything, without the user explicitly asking for it.

Please can this be introduced! Numpy amuses itself by displaying timezone offsets for values which purposefully don't have timezones, which makes it look like they do. I happen to travel between time zones quite often (with the machine updating its locale accordingly), and it would be nice if my program would output the same thing twice rather than trying to be "helpful" in this way.

Also, strongly disagree with

> Naive vs. UTC -- they are almost exactly the same

A naive datetime is one for which no timezone information is known or desired. For instance, you might be running a theoretical model with no relation to a real location, so a timezone is meaningless.
A UTC datetime is one where a timezone has been defined and is desired, and is chosen to match an international convention on the "baseline" time zone. A naive datetime should never, ever be presented with a "timezone offset" or anything like it because that makes no sense, while for a UTC datetime this does make sense.

It's precisely this kind of vagueness that makes dealing with times such a pain, because you end up with data where it's never clear if the value you have is in UTC, naive, includes DST, etc., and part of that is because tools (like numpy) don't really help you out and everyone ends up throwing Unix timestamps around until no one has a clue what's going on...
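The complaint about output changing as the machine moves between time zones can be sketched as follows. Two explicit fixed offsets stand in for two machine locales (an assumption for illustration; both helpers are hypothetical): rendering the same instant in a zone yields different strings, while a naive rendering is identical everywhere.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_NAIVE = datetime(1970, 1, 1)


def render_in_zone(seconds: int, tz: timezone) -> str:
    """Format seconds-since-the-epoch as wall time in an explicitly given zone."""
    return (EPOCH_UTC + timedelta(seconds=seconds)).astimezone(tz).isoformat()


def render_naive(seconds: int) -> str:
    """Format the same count as a naive wall time: no zone, no offset suffix."""
    return (EPOCH_NAIVE + timedelta(seconds=seconds)).isoformat()


value = 1367640000  # 2013-05-04T04:00:00 UTC

# Stand-ins for two machines whose locales are set to different zones.
for tz in (timezone(timedelta(hours=-4)), timezone(timedelta(hours=1))):
    print(render_in_zone(value, tz))
# 2013-05-04T00:00:00-04:00
# 2013-05-04T05:00:00+01:00

# The naive rendering is the same wherever the program runs.
print(render_naive(value))  # 2013-05-04T04:00:00
```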

On Thu, Mar 12, 2015 at 11:42 AM, mangecoeur notifications@github.com
wrote:

> Never use the Locale settings for anything, without the user explicitly
> asking for it.
>
> Please can this be introduced!

Well, it takes someone to get in and fix the code...

> Numpy amuses itself by displaying timezone offsets for values which
> purposefully don't have time-zones - but it makes it look like they do.

Well, they sort of do: currently numpy datetimes are ALWAYS UTC, and they
use the Locale to set a time zone. This is quite broken, as you know, but
it's logical.

> I happen to travel between time zones quite often (with the machine
> updating its locale accordingly) and it would be nice if my program would
> output the same thing twice rather than trying to be "helpful" in this way.
>
> Also, strongly disagree with
>
> Naive vs. UTC -- they are almost exactly the same

Fine, conceptually they are very different. But implementation-wise,
they are almost exactly the same. As you say:

> A naive datetime should never, ever be presented with a "timezone
> offset" or anything like it because that makes no sense, while for a UTC
> datetime this does make sense.

And that, in fact, is the only difference implementation-wise: with naive,
numpy wouldn't add a TZ offset; with UTC, it might. That's it. And user code
could treat a naive DT as UTC and add the offset itself. That was my point.
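Treating a naive value as UTC and applying an offset in user code might look like the following standard-library sketch; the fixed UTC-05:00 zone is an arbitrary example, and the naive value stands in for any zone-less timestamp:

```python
from datetime import datetime, timedelta, timezone

# Stand-in for a naive timestamp: wall-clock fields, no zone attached.
naive = datetime(2013, 5, 4, 4, 0)

# User code decides, explicitly, that this value means UTC...
as_utc = naive.replace(tzinfo=timezone.utc)

# ...and converts it to whatever zone it needs (a fixed UTC-05:00 here).
utc_minus_5 = timezone(timedelta(hours=-5))
local = as_utc.astimezone(utc_minus_5)
print(local)  # 2013-05-03 23:00:00-05:00
```

The stored naive value is never modified; the offset is applied only at the point where the user asks for it.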

> It's precisely this kind of vagueness

It's only vague if numpy doesn't make it clear which is in use, naive or
UTC. My point was that either is just fine.

> that makes dealing with times such a pain, because you end up with data
> where it's never clear if the value you have is in UTC, naive, includes DST,
> etc

I see it as kind of like text encodings: you need to know on IO, and you
need to be clear in your data files. It would be nice if the tools took care
of all that for you, but that is very, very hard to do. So something useful
is better than nothing.

-CHB


shoyer commented Oct 12, 2015

Just FYI, I'm working on implementing this in #6453.

charris closed this in #6453 on Jan 17, 2016
