BUG: Constructing Timestamp from local time + offset + time zone yields wrong offset #7833

Closed
ischwabacher opened this Issue Jul 24, 2014 · 8 comments

Comments

Projects
None yet
3 participants
Contributor

ischwabacher commented Jul 24, 2014

xref #7825

In [1]: import pandas as pd

In [2]: pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')
Out[2]: Timestamp('2013-10-31 23:09:00-0551', tz='America/Chicago')

This bug is inherited from datetime, which is using local mean time (the first entry for America/Chicago in tzinfo).

Contributor

jreback commented Jul 24, 2014

this is actually a bit tricky, because you are passing an already localized tz, then what is the tz='....' doing? a conversion? (has to be, right?) or is this reallying localizing the UTC of the original value?

jreback added the Timezones label Jul 24, 2014

jreback added this to the 0.15.0 milestone Jul 24, 2014

Contributor

jreback commented Jul 24, 2014

Contributor

ischwabacher commented Jul 24, 2014

But -0500 isn't a time zone-- it's just an offset. It tells you how to convert this time to UTC, but not this time plus an hour. How do we want to handle Timestamp('2013-11-3 01:30:00', tz='America/Chicago')? That time is ambiguous, but if we add -0500 or -0600, it's not. We also can't rely on time zone abbreviations-- Timestamp('2013-11-3 01:30:00 CDT') could be in America/Havana, and even Timestamp('2013-11-3 01:30:00 CDT-0500') could be in one of the Indiana time zones. To make matters worse, not all local time discontinuities come from DST.

I really think the only unambiguous way to construct an aware datetime from local time is if both the time zone and the offset are also present. Offset lets you convert to UTC, where you look up the correct offset for the time zone at that UTC time and compare it to the offset again. If they match, great. If they don't match, you can either correct the offset or give a NonExistentTime error, and which of these you do should probably be determined by a strict= kwarg. Maybe you accept only the correct offset, or Z to construct from UTC + time zone.

EDIT: Another benefit of fixing this:

In [1]: from pandas import Timestamp

In [2]: Timestamp('2013-11-3', tz='America/Chicago')
Out[2]: Timestamp('2013-11-03 00:00:00-0500', tz='America/Chicago')

In [3]: eval(repr(_))
Out[3]: Timestamp('2013-11-02 23:09:00-0551', tz='America/Chicago')

jreback added the Bug label Jul 24, 2014

Member

sinhrks commented Jul 24, 2014

It looks to be caused by tslib._localize_tso (in Timestamp.__new__) ignores current tz. May better to replace _localize_tso to tz_convert? And I think -05:00 is timezone with fixed offset, by definition.

result = pd.Timestamp('2013-11-01 00:00:00-05:00')
result, result.tz
# Timestamp('2013-11-01 00:00:00-0500'), tzoffset(None, -18000)

result.tz_convert('America/Chicago')
# 2013-11-01 00:00:00-05:00
result.tz_convert('America/Chicago')

ischwabacher changed the title from BUG: Constructing `Timestamp` from local time + offset + time zone yields wrong offset to BUG: Constructing Timestamp from local time + offset + time zone yields wrong offset Jul 24, 2014

Contributor

jreback commented Jul 24, 2014

@sinhrks yep

yes I think if the current tz is not None then tz_convert it. But I might only do it if its a fixed offset (or a UTC). Not sure that you want implicit conversion just by passing a tz, so maybe raise ValueError if that is the case (and obviously its innocuous if you pass in the SAME tz as inferred from the stamp).

Contributor

ischwabacher commented Jul 25, 2014

It might be best if the conversion only happens when ts_input is a string object, isinstance(ts_input.tzinfo, (dateutil.tz.tzoffset, dateutil.tz.tzutc)), and tz is not None. This test is awkward, but it lets us make sure we're only doing as much in the constructor as we need to.

Then there's this problem:

In [20]: dateutil.parser.parse('2013-11-3 00:00:00 CDT').tzinfo
Out[20]: tzlocal()   # local TZ is America/Chicago

In [21]: dateutil.parser.parse('2013-11-3 00:00:00 EDT').tzinfo
# None

So I'm not sure we even can do the right thing in all cases without messing with dateutil.parser.

Also, tz_convert is a a Timestamp method. What are the consequences of calling methods on an object while it is still in __new__? Are there any pitfalls we should be watching out for?

EDIT:

And I think -05:00 is timezone with fixed offset, by definition.

I agree that this is how it's defined, but I think it's better to think about what information we need about a time in order to do what the user wants. Ultimately, I think the user (and here I'm generalizing from a sample size of 'me') wants to be able to relate times in a particular place. Fixed offset time zones are a tool for doing this (and I use them because I collect data using time-zone-naive devices for which the offset is only guaranteed to be correct at the start of the collection window), but in general I suspect that most of the time that a user gets a Timestamp with a fixed offset, the user is either going to 1) convert it to UTC, 2) convert it to the real time zone it was supposed to be in in the first place, or 3) write buggy DST-handling code. Not every user can be saved from case 3), but I think we should make case 2) as attractive as possible.

Contributor

jreback commented Jul 25, 2014

@ischwabacher, @sinhrks is talking about a much lower-level object, called a TSObject.

dateutil is not used here, its parsed by the internal c-routines (as this is a standard format)

Contributor

ischwabacher commented Jul 25, 2014

Timestamp.__new__ calls dateutil.parser.parse (for which parse_date is an alias) on its first argument if it's a string. So in the string input case, the first argument to the low-level convert_to_tsobject call is a datetime.datetime with a dateutil.tz time zone. It will never be a string object.

EDIT: But I did fail to notice that there's a tz_convert that takes an ndarray[int64_t] as its first argument. Is that what you were referring to?

jreback closed this in #7907 Aug 11, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment