Broken roundtrip DatetimeIndex -> CategoricalIndex -> DatetimeIndex #18664

Closed
toobaz opened this issue Dec 6, 2017 · 3 comments

@toobaz
Member

commented Dec 6, 2017

Code Sample, a copy-pastable example if possible

In [2]: pd.DatetimeIndex(pd.CategoricalIndex(pd.DatetimeIndex(['2015-10-10'], tz='US/Eastern')))
Out[2]: DatetimeIndex(['2015-10-10 04:00:00'], dtype='datetime64[ns]', freq=None)

Problem description

Out[2] has no timezone information. Related to this comment

Notice that it makes sense to lose freq, since it is a property of a specific collection of dates, such as a DatetimeIndex. tz, by contrast, is a property of each individual date, and should therefore be kept.

This is (I think) the reason why

In [2]: pd.core.dtypes.cast.maybe_cast_to_datetime(pd.Timestamp('2015-10-10'), None)
Out[2]: array(['2015-10-10T00:00:00.000000000'], dtype='datetime64[ns]')

but

In [3]: pd.core.dtypes.cast.maybe_cast_to_datetime(pd.Timestamp('2015-10-10', tz='US/Eastern'), None)
Out[3]: DatetimeIndex(['2015-10-10 00:00:00-04:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
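The distinction drawn above between freq (a property of the collection) and tz (a property of each date) can be seen with a small sketch. The variable names are illustrative only, and the sketch assumes a pandas build with timezone data for US/Eastern available.

```python
import pandas as pd

# A regular daily range carries both a freq and a tz.
daily = pd.date_range("2015-10-10", periods=3, freq="D", tz="US/Eastern")

# Picking the first and last element leaves an irregular collection,
# so the freq cannot be kept, while every element still knows its tz.
picked = daily[[0, 2]]

print("freq before:", daily.freq)
print("freq after: ", picked.freq)
print("index tz:   ", picked.tz)
print("element tz: ", picked[0].tz)
```

Losing freq here is expected; losing tz would change what each individual timestamp means.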

Certainly related to #13783, #14052, #13238 and others, but probably requires a separate fix (as long as we don't have a real tz-aware dtype).

Expected Output

Out[2]: DatetimeIndex(['2015-10-10 00:00:00-04:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
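The expected output can also be phrased as a small regression check. This is only a sketch: `roundtrip_through_categorical` is a hypothetical helper name, not a pandas function, and the assertions describe the requested behaviour, so they are expected to fail on the build reported in this issue.

```python
import pandas as pd


def roundtrip_through_categorical(dti):
    """Hypothetical helper: DatetimeIndex -> CategoricalIndex -> DatetimeIndex."""
    return pd.DatetimeIndex(pd.CategoricalIndex(dti))


original = pd.DatetimeIndex(["2015-10-10"], tz="US/Eastern")
result = roundtrip_through_categorical(original)

# Requested behaviour: the timezone and the represented instants survive.
assert str(result.tz) == str(original.tz)
assert result.equals(original)
```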

Output of pd.show_versions()

INSTALLED VERSIONS

commit: fdba133
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+301.gfdba13333
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jbrockmendel

Member

commented Nov 8, 2018

@TomAugspurger any idea what's going on here? Seems like something to address as long as we're all focused on the DTA/TDA/PA constructors

@TomAugspurger

Contributor

commented Nov 9, 2018

Seems like the DTI constructor goes CategoricalIndex -> ndarray[datetime64[ns]], which loses the tzinfo.

In [2]: a = pd.CategoricalIndex(pd.DatetimeIndex(['2015-01-01'], tz='US/Eastern'))
In [3]: pd.DatetimeIndex(a)
> /Users/taugspurger/sandbox/pandas/pandas/core/indexes/datetimes.py(244)__new__()
-> if isinstance(data, Index):
(Pdb) data
CategoricalIndex(['2015-01-01 00:00:00-05:00'], categories=[2015-01-01 00:00:00-05:00], ordered=False, dtype='category')
(Pdb) c
> /Users/taugspurger/sandbox/pandas/pandas/core/indexes/datetimes.py(244)__new__()
-> if isinstance(data, Index):
(Pdb) data
array(['2015-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

Ideally this check would send us down the datetimetz path, but is_datetimetz(CategoricalIndex[datetime64[ns, tz]]) is False.

279  ->         if not (is_datetime64_dtype(data) or is_datetimetz(data) or
280                     is_integer_dtype(data) or lib.infer_dtype(data) == 'integer'):
(Pdb) data
CategoricalIndex(['2015-01-01 00:00:00-05:00'], categories=[2015-01-01 00:00:00-05:00], ordered=False, dtype='category')
(Pdb) is_datetimetz(data)
False

I would recommend unboxing an array from the Series / Index as early as possible. Alternatively, we could update is_datetimetz to look into index classes.
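As a rough illustration of the second option, a check that looks through a categorical dtype to the dtype of its categories might look like the following. `holds_datetimetz` is a hypothetical name used only for this sketch; it is not the internal is_datetimetz function, and it relies only on the public `pd.CategoricalDtype` and `pd.DatetimeTZDtype` classes.

```python
import pandas as pd


def holds_datetimetz(data):
    """Hypothetical check: True if data is tz-aware, directly or via its categories."""
    dtype = getattr(data, "dtype", None)
    # A categorical container reports dtype 'category'; the tz-aware dtype,
    # if there is one, lives on the categories one level down.
    if isinstance(dtype, pd.CategoricalDtype):
        dtype = dtype.categories.dtype
    return isinstance(dtype, pd.DatetimeTZDtype)


aware = pd.DatetimeIndex(["2015-01-01"], tz="US/Eastern")
naive = pd.DatetimeIndex(["2015-01-01"])

print(holds_datetimetz(aware))
print(holds_datetimetz(pd.CategoricalIndex(aware)))
print(holds_datetimetz(pd.CategoricalIndex(naive)))
```

Whether such a look-through belongs in the dtype check itself or in the constructor is a design question this sketch does not settle.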

@TomAugspurger

Contributor

commented Nov 9, 2018

This is also fishy:

(Pdb) tools.to_datetime(data)
DatetimeIndex(['2015-01-01 05:00:00'], dtype='datetime64[ns]', freq=None)
(Pdb) tools.to_datetime(data.categories)
DatetimeIndex(['2015-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

Those should probably both be datetime64[ns, US/Eastern]. We'd want to fix that for user code which may hit it, but again it would be solved by unboxing arrays early in the index constructor (which may have to wait until we have lossless arrays for everything).
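In user code, one way to sidestep the lossy conversion is to unbox by hand: take the tz-aware categories and reindex them by the integer codes. This is a minimal sketch of that idea under the assumption that the categories are themselves a tz-aware DatetimeIndex; it is not presented as the constructor's actual code path.

```python
import pandas as pd

dti = pd.DatetimeIndex(["2015-01-01", None, "2015-01-02"], tz="US/Eastern")
cat_idx = pd.CategoricalIndex(dti)

# The categories keep the tz-aware dtype; the codes are plain integers,
# with -1 marking missing values.
categories = cat_idx.categories
codes = cat_idx.codes

# Passing a fill_value makes take() treat -1 as "missing" rather than
# "last element", so NaT entries are restored as NaT.
restored = categories.take(codes, fill_value=pd.NaT)

print(restored)
```

Without the fill_value argument, a code of -1 would be read as an ordinary negative position, so missing entries would silently become the last category.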

@jbrockmendel jbrockmendel added this to Orthogonal Blockers in DatetimeArray Refactor Nov 16, 2018

@jbrockmendel jbrockmendel removed this from Orthogonal Blockers in DatetimeArray Refactor Nov 16, 2018

@jbrockmendel jbrockmendel added this to DatetimeIndex Bugs in DatetimeArray Refactor Nov 16, 2018

@jbrockmendel jbrockmendel referenced this issue Dec 8, 2018

Merged

BUG: Assorted DatetimeIndex bugfixes #24157

1 of 1 task complete

@jreback jreback added this to the 0.24.0 milestone Dec 13, 2018

@jbrockmendel jbrockmendel moved this from DTI/DTA Constructor Issues to Done in DatetimeArray Refactor Dec 15, 2018

@jbrockmendel jbrockmendel removed this from Done in DatetimeArray Refactor Jan 4, 2019
