date_range does not capture right timezone from input dates #7901

Closed
rockg opened this Issue Aug 2, 2014 · 19 comments

Contributor

rockg commented Aug 2, 2014

Example is below. I would expect that if dates have a timezone on them, date_range would then use that timezone to fill in the rest of the period. However, something goes awry (notice the 01:00 below). If I have dates with a timezone and pass a timezone (case 2), it still doesn't work. Only when I remove the timezone from the dates does it work (case 3). I would expect all these to work the same.

from datetime import datetime
import pandas as pd
import pytz
tz = pytz.timezone('US/Eastern')
sd = tz.localize(datetime(2014, 3, 6))
ed = tz.localize(datetime(2014, 3, 12))
list(pd.date_range(sd, ed, freq='D'))
Out[41]: 
[Timestamp('2014-03-06 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-08 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-09 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-10 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-11 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-12 01:00:00-0400', tz='US/Eastern', offset='D')]

list(pd.date_range(sd, ed, freq='D', tz='US/Eastern'))
Out[42]: 
[Timestamp('2014-03-06 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-08 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-09 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-10 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-11 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-12 01:00:00-0400', tz='US/Eastern', offset='D')]

list(pd.date_range(sd.replace(tzinfo=None), ed.replace(tzinfo=None), freq='D', tz='US/Eastern'))
Out[43]: 
[Timestamp('2014-03-06 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-08 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-09 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-10 00:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-11 00:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-12 00:00:00-0400', tz='US/Eastern', offset='D')]
Contributor

jreback commented Aug 2, 2014

is this the same as #7835 ?

Contributor

rockg commented Aug 2, 2014

Now that I look more closely, it probably is. I'd prefer to leave it open for some additional test cases at least.

jreback added this to the 0.15.0 milestone Aug 2, 2014

Contributor

rockg commented Aug 4, 2014

The Python 3.4 test failed because Timestamp and datetime produce different offsets. I recall seeing this a few days ago in one of the issues but can't easily find it. Any ideas what's going on?

from datetime import datetime
import pandas as pd
from pytz import timezone as tz
pd.Timestamp('1/1/2011', tz='US/Eastern')
# 2011-01-01 00:00:00-05:00
datetime(2011, 1, 1, tzinfo=tz('US/Eastern'))
# 2011-01-01 00:00:00-04:56
Contributor

seth-p commented Aug 4, 2014

FWIW, this is what I see using 64-bit Python 3.4.1 on Windows, with pytz 2014.4:

In [105]: from datetime import datetime

In [106]: datetime(2011, 1, 1, tzinfo=tz('US/Eastern'))
Out[106]: datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>)

I don't know if it has anything to do with anything, but I just noticed that there's no egg for 3.4 on https://pypi.python.org/pypi/pytz.

Contributor

rockg commented Aug 4, 2014

And I don't know how this particular test test_daterange.py(TestDateRange.test_range_tz_pytz) ever passes on 3.4 (I didn't add it and my change doesn't impact it).

Contributor

jreback commented Aug 4, 2014

@rockg pytz will fall back to a generic installer for 3.4, and since it's pure Python, this works.

Contributor

rockg commented Aug 4, 2014

I don't understand completely what that means. Is this a bug in itself (why does the Timestamp have a different offset than the datetime) or are they doing something different with pytz?

Member

sinhrks commented Aug 4, 2014

You should use pytz's localize for timezones with DST to get the correct localized offset.

import datetime
import pytz

datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))
# 2011-01-01 00:00:00-04:56

pytz.timezone('US/Eastern').localize(datetime.datetime(2011, 1, 1))
# 2011-01-01 00:00:00-05:00

Before your fix, both date_range and datetime had the non-localized offset <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>, so the test passed.
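
One way to tell which of the two forms a given value is, as a minimal sketch assuming pytz 2014.3 or later is installed (the variable names are illustrative and not taken from pandas): the `tzname()` and `utcoffset()` of the datetime show which zone definition was attached.

```python
from datetime import datetime

import pytz

eastern = pytz.timezone('US/Eastern')

# Passing the pytz zone straight to the constructor attaches the zone's
# default definition, which is LMT in recent pytz releases.
attached = datetime(2011, 1, 1, tzinfo=eastern)

# localize() picks the definition that is actually in effect on that date.
localized = eastern.localize(datetime(2011, 1, 1))

print(attached.tzname(), attached.utcoffset())
print(localized.tzname(), localized.utcoffset())
```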

Contributor

rockg commented Aug 4, 2014

@sinhrks Why would this only have to happen for 3.4? All other versions of tests passed fine. And I swear that I tried localize to the same effect, but you show otherwise...I will confirm later this evening.

Member

sinhrks commented Aug 4, 2014

In my understanding, the issue is caused by the pytz 2014.4 used in the 3.4 test and is unrelated to the Python version. I checked the above behaviour on 2.7.6, but it may be better to confirm with 3.4 as well.

http://stackoverflow.com/questions/24188060/in-pandas-why-does-tz-convert-change-the-timezone-used-from-est-to-lmt

Contributor

jreback commented Aug 4, 2014

If it's with pytz 2014.4, then the problem is with the comparison itself. See for example a fix here: https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_timezones.py#L386

You have to be very explicit with the expected case; in other words, you have to normalize it correctly. There were a few cases that 'worked' because US/Eastern was the same through all pytz versions, but that changed in pytz 2014.3 (when the actual timezone definition changed to be LMT).

3.4 fails because it uses the current pytz; the other builds use pytz < 2014.3.
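
For the test side, here is a minimal sketch of building the expected values explicitly, assuming pytz is installed. `expected_local_midnights` is a hypothetical helper, not something from the pandas test suite; it localizes each naive value so the offsets come out right on both sides of the DST change.

```python
from datetime import datetime, timedelta

import pytz


def expected_local_midnights(start, periods, tz):
    """Hypothetical helper: localize each naive midnight individually."""
    return [tz.localize(start + timedelta(days=i)) for i in range(periods)]


eastern = pytz.timezone('US/Eastern')

# Same window as the original report: 2014-03-06 through 2014-03-12.
expected = expected_local_midnights(datetime(2014, 3, 6), 7, eastern)

for value in expected:
    print(value.isoformat())
```

A date_range result could then be compared against `expected` element by element.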

Contributor

rockg commented Aug 4, 2014

Okay, now this is all making sense. I thought all travis tests were using the latest pytz version. I will update my test. Thanks @sinhrks, @jreback. I will add to the release note that simply passing in tzinfo is not enough and that localize is the right way to create localized times.

Contributor

jreback commented Aug 4, 2014

@rockg what do you mean, passing in 'tzinfo' is not enough? You ALWAYS have to localize.

Contributor

rockg commented Aug 4, 2014

I know, but the pandas tests themselves don't localize (and I'm sure other people have done the same thing and it was fine until the latest release of pytz).

Contributor

jreback commented Aug 4, 2014

no, the pandas tests are WRONG if they don't localize. I appreciate that you want to fix the docs, ok. But pandas tests themselves need to be fixed if they are wrong (as some were when we shifted to 2014.3)

Contributor

jreback commented Aug 4, 2014

Contributor

rockg commented Aug 4, 2014

That's what I'm saying...the pandas tests are wrong. Of course I'm going to fix the tests in addition to the docs.

Contributor

jreback commented Aug 4, 2014

@rockg perfect, thanks!

and a doc-warning (actually in the timezone section) might not be a bad idea as well.

Contributor

ischwabacher commented Aug 4, 2014

I don't think the user should ever have to call localize or normalize. Those are datetime.datetime implementation details that somehow datetime.datetime failed to implement, so pytz had to graft them on somewhere in order to work. (There's more detail on this in my Stack Overflow answer.) But Timestamp is a datetime.datetime subclass that does know about them, so it should take care of calling localize and normalize as appropriate and leave the user none the wiser.

Again, I am strongly in favor of a view of time zones as immutable objects, at least modulo the ability of governments to screw up our predictions of the future.
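
To make the localize/normalize bookkeeping concrete, here is a minimal sketch of what pytz asks the caller to do after arithmetic across a DST change, assuming pytz is installed (variable names are illustrative). The normalized value lands on 01:00, which resembles the shift in the original report, though that resemblance is only an observation.

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone('US/Eastern')

# Midnight on the day DST starts (2014-03-09), still on standard time.
start = eastern.localize(datetime(2014, 3, 9))

# Plain datetime arithmetic keeps the old tzinfo, so the offset is stale.
shifted = start + timedelta(days=1)

# normalize() repairs the offset but keeps the instant, moving the wall clock.
normalized = eastern.normalize(shifted)

# Re-localizing the naive wall-clock value keeps local midnight instead.
relocalized = eastern.localize(shifted.replace(tzinfo=None))

for label, value in [('shifted', shifted), ('normalized', normalized),
                     ('relocalized', relocalized)]:
    print(label, value.isoformat())
```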

jreback closed this in #7909 Aug 5, 2014

jreback added a commit that referenced this issue Aug 5, 2014

Merge pull request #7909 from rockg/master
Remove from start/end dates if tz is not None (#7901, #7835)
7568018