BUG/API: Categorical doesn't support categories with tz #10713

Closed
sinhrks opened this Issue Jul 31, 2015 · 4 comments

Member

sinhrks commented Jul 31, 2015

Creating a Categorical from a DatetimeIndex with a tz results in values shown in GMT.

import pandas as pd
idx = pd.date_range('2011-01-01', periods=5, freq='M', tz='US/Eastern')
idx
# DatetimeIndex(['2011-01-31 00:00:00-05:00', '2011-02-28 00:00:00-05:00',
#                '2011-03-31 00:00:00-04:00', '2011-04-30 00:00:00-04:00',
#                '2011-05-31 00:00:00-04:00'],
#               dtype='datetime64[ns]', freq='M', tz='US/Eastern')

pd.Categorical(idx)
# [2011-01-31 05:00:00, 2011-02-28 05:00:00, 2011-03-31 04:00:00, 2011-04-30 04:00:00, 2011-05-31 04:00:00]
# Categories (5, datetime64[ns]): [2011-01-31 05:00:00, 2011-02-28 05:00:00, 2011-03-31 04:00:00
#                                 , 2011-04-30 04:00:00, 2011-05-31 04:00:00]

@sinhrks only the repr is incorrect; the actual categories are still tz-aware:

In [95]: pd.Categorical(idx)
Out[95]:
[2011-01-31 05:00:00, 2011-02-28 05:00:00, 2011-03-31 04:00:00, 2011-04-30 04:00:00, 2011-05-31 04:00:00]
Categories (5, datetime64[ns]): [2011-01-31 05:00:00, 2011-02-28 05:00:00, 2011-03-31 04:00:00
                                , 2011-04-30 04:00:00, 2011-05-31 04:00:00]

In [96]: cat = pd.Categorical(idx)

In [97]: cat.categories
Out[97]:
DatetimeIndex(['2011-01-31 00:00:00-05:00', '2011-02-28 00:00:00-05:00',
               '2011-03-31 00:00:00-04:00', '2011-04-30 00:00:00-04:00',
               '2011-05-31 00:00:00-04:00'],
              dtype='datetime64[ns]', freq='M', tz='US/Eastern')

In [98]: cat[0]
Out[98]: Timestamp('2011-01-31 00:00:00-0500', tz='US/Eastern', offset='M')
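
A minimal sketch of the same check done programmatically, so it does not depend on how the repr is printed. It assumes a pandas version where `pd.Categorical` keeps a tz-aware `DatetimeIndex` as its categories, and it builds the index from explicit timestamps rather than `freq='M'` purely for illustration:

```python
import pandas as pd

# Explicit month-end timestamps (instead of a freq alias) keep the sketch
# independent of alias spelling across pandas versions.
idx = pd.DatetimeIndex(["2011-01-31", "2011-02-28", "2011-03-31"], tz="US/Eastern")
cat = pd.Categorical(idx)

# Inspect the underlying objects rather than the printed repr.
categories_tz = str(cat.categories.tz)
first = cat[0]

print(categories_tz)
print(first.utcoffset())
```

If the categories were really converted to GMT, `categories_tz` would not name the original zone and the offset of the first element would be zero.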
Member

sinhrks commented Aug 1, 2015

@jorisvandenbossche Thanks, I think I understand the issue now. Also, PeriodIndex seems to show incorrect category values (or is that intended?).

import pandas as pd
idx = pd.period_range('2011-01-01 09:00', freq='H', periods=5)
c1 = pd.Categorical(idx)
# [2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
# Categories (5, int64): [359409, 359410, 359411, 359412, 359413]
Contributor

jreback commented Aug 1, 2015

@sinhrks https://github.com/pydata/pandas/blob/master/pandas/core/categorical.py#L1328
should use self.categories. The .get_values() call converts to a more basic form, which is why tz and period are messed up.

Also, please check the Series repr as well.
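
A rough illustration of what "converts to a more basic form" can mean here. This is only a sketch: it uses `.values` and `.asi8` as stand-ins for the raw conversion (an assumption, not the code path at the linked line), and daily periods instead of the hourly ones above to keep the numbers easy to verify:

```python
import pandas as pd

# tz-aware index: the raw numpy values are UTC instants with no tz attached,
# so midnight US/Eastern shows up as 05:00.
idx = pd.DatetimeIndex(["2011-01-31", "2011-02-28"], tz="US/Eastern")
raw = idx.values
raw_first = pd.Timestamp(raw[0])
print(raw.dtype, raw_first)

# PeriodIndex: the raw form is the integer ordinal of each period
# (for daily frequency, days since 1970-01-01).
pidx = pd.PeriodIndex(["2011-01-01", "2011-01-02"], freq="D")
ordinals = list(pidx.asi8)
print(ordinals)
```

Formatting from the index itself (as with `self.categories`) keeps the tz and period information; formatting from the raw values produces the GMT timestamps and bare integers reported above.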

sinhrks referenced this issue Aug 1, 2015

Merged

BUG: Categorical doesn't show tzinfo properly #10718

6 of 6 tasks complete
Member

sinhrks commented Aug 1, 2015

Thanks. I've prepared #10718, and will look into Series with Categorical also.

jreback added this to the 0.17.0 milestone Aug 1, 2015

sinhrks closed this in #10718 Aug 8, 2015
