Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: DatetimeIndex ignores timezones #11736

Closed
hcarvalhoalves opened this issue Dec 1, 2015 · 6 comments

Comments

Projects
None yet
3 participants
@hcarvalhoalves
Copy link

commented Dec 1, 2015

In 0.16.2, DatetimeIndex functions used to handle the timezone correctly:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'x': ["2015-05-27 19:05:17.178000-02:00"]})
In [3]: pd.DatetimeIndex(df.x)
Out[3]: DatetimeIndex(['2015-05-27 19:05:17.178000-02:00'], dtype='datetime64[ns]', freq=None, tz='tzoffset(None, -7200)')

In 0.17.0, it's now ignoring timezones:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': ["2015-05-27 19:05:17.178000-02:00"]})

In [3]: pd.DatetimeIndex(df.x)
Out[3]: DatetimeIndex(['2015-05-27 21:05:17.178000'], dtype='datetime64[ns]', freq=None)

This slipped thru since the tests are very simple and don't cover many cases:

https://github.com/pydata/pandas/blob/cda9da82a68aed42421f7f788fb0c1e949e7a60c/pandas/tests/test_format.py#L3342

I'll attempt a patch later but if someone can give pointers where the breaking change happened I appreciate.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2015

so there are actually quite a number of tests, but not in formatting; in fact this has nothing to do with formatting.

In 0.17.1, this 'works'

In [2]: from dateutil.tz import tzoffset

In [3]: pd.DatetimeIndex(["2015-05-27 19:05:17.178000-02:00"],tz=tzoffset(None, -7200))
Out[3]: DatetimeIndex(['2015-05-27 21:05:17.178000-02:00'], dtype='datetime64[ns, tzoffset(None, -7200)]', freq=None)

but is actually 'wrong'

In [4]: pd.DatetimeIndex(["2015-05-27 19:05:17.178000-02:00"]).tz_localize('UTC').tz_convert(tz)
Out[4]: DatetimeIndex(['2015-05-27 19:05:17.178000-02:00'], dtype='datetime64[ns, tzoffset(None, -7200)]', freq=None)

is the correct result. However, this parsing of a tz-aware then 'setting' its value via the tz arg is very very odd.

This is why pd.to_datetime doesn't even have a tz argument, because its too error prone.

E.g. what exactly do you mean by having a tz-aware date AND providing a tz? should it be localized, renamed, or converted?

I think the DatetimeIndex constructor should raise on this actually, rather than let this slip thru.

@hcarvalhoalves

This comment has been minimized.

Copy link
Author

commented Dec 1, 2015

E.g. what exactly do you mean by having a tz-aware date AND providing a tz? should it be localized, renamed, or converted?

Basically, pre 0.17.x behavior was correct - it doesn't throw away the tz info neither tries to covert when fed a TZ-aware ISO string. So, if I had analogous function using standard datetime API, it would be equivalent to:

some_parser('2015-05-27 19:05:17.178000-02:00') == datetime.datetime(2015, 5, 27, 19, 5, 17, 178, tzinfo=pytz.FixedOffset(-120)) # offset is in minutes

Which is sensible behavior for such ISO datetime parser [1].

0.17.0 behavior is to ignore the TZ and create a naive datetime series, which is bad because now we've lost tz information from the string. And I've now learned 0.17.1 behavior is even worse, since it's converting and keeping the tz info. Either you convert and turn into naive (drop tz info) or you localize (add the tz info), but never a mix of both.

so there are actually quite a number of tests, but not in formatting; in fact this has nothing to do with formatting.

I see. There are tests here (https://github.com/pydata/pandas/blob/master/pandas/tests/test_series.py#L112) but are not covering parsing an ISO datetime string either.

The parsing seems to be happening here (https://github.com/pydata/pandas/blob/master/pandas/tseries/tools.py#L279), I'll come up with some test cases.

[1] https://tools.ietf.org/html/rfc3339#section-4.2

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2015

@hcarvalhoalves no, the pre 0.17.x behavior is wrong. It is VERY confusing. This is why the main entry point is pd.to_datetime.

I don't know if passing a tz directly is really tested at all, look at tseries/test_timeseries, tseries/test_timezones, tseries/test_base and tests/test_index as locations for tests.

Since I have never seen this I doubt there are tests (or I would have had to resolve this when I added this feature).

You are essentially 'naming' a tz. This is not a normal things to do. Either you have a string that has a tz in it with an offset, in which case you convert to UTC, THEN convert it, or you construct it in local time, then localize it.

There are NO other operations.

@hcarvalhoalves

This comment has been minimized.

Copy link
Author

commented Dec 1, 2015

I see. You mean the expected usage is this?

In [16]: pd.to_datetime(['2015-05-27 19:05:17.178000-02:00'], utc=True)
Out[16]: DatetimeIndex(['2015-05-27 21:05:17.178000+00:00'], dtype='datetime64[ns, UTC]', freq=None)

If this is really intended and not a regression I'll close this specific issue - and another issue could be opened to not have DatetimeIndex fail silently when there's an offset in the string.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2015

no I would leave this open. I think this path is not very well tested and needs exercising / clarification.

My point though is if you have text data, pd.to_datetime is the way (you can also set parse_dates=True when reading from a csv). But DatetimeIndex 'happens' to be able to parse strings and accept a tz which IMHO is just plain confusing.

@jreback jreback added this to the 0.18.0 milestone Dec 1, 2015

@jreback jreback added the Testing label Dec 1, 2015

@jreback jreback modified the milestones: Next Major Release, 0.18.0 Jan 30, 2016

jreback added a commit to jreback/pandas that referenced this issue Feb 1, 2016

jreback added a commit that referenced this issue Feb 1, 2016

BUG: concat of tz series with NaT
closes #11693     xref #11736

Author: Varun <k.varun@outlook.com>
Author: Jeff Reback <jeff@reback.net>

Closes #12195 from jreback/concat_ts and squashes the following commits:

eaad93a [Jeff Reback] BUG: DatetimeIndex constructor to handle dtype & tz with conflicts
90d71f6 [Varun] BUG GH11693 Support NaT series concatenation
@mroeschke

This comment has been minimized.

Copy link
Member

commented Jul 7, 2018

xref #17697, keeping this issue open since it demonstrates the behavior with using DatetimeIndex instead of explicitly to_datetime

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.