BUG: tz info lost by set_index and reindex #7092

Merged
merged 1 commit into from May 12, 2014

Member

@sinhrks sinhrks commented May 10, 2014

Closes #6631. Closes #5878. Regarding #3950, this fixes the original problem but not the groupby problem.

This also fixes MultiIndex.get_level_values not retaining tz and freq, since set_index uses that method.

# current master
>>> import pandas as pd
>>> didx = pd.DatetimeIndex(start='2013/01/01', freq='H', periods=4, tz='Asia/Tokyo')
>>> pidx = pd.PeriodIndex(start='2013/01/01', freq='H', periods=4)

>>> midx = pd.MultiIndex.from_arrays([didx, pidx])
>>> midx.get_level_values(0)
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-12-31 15:00:00, ..., 2012-12-31 18:00:00]
Length: 4, Freq: None, Timezone: None

>>> midx.get_level_values(1)
Int64Index([376944, 376945, 376946, 376947], dtype='int64')
# after fix
>>> midx.get_level_values(0)
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00+09:00, ..., 2013-01-01 03:00:00+09:00]
Length: 4, Freq: H, Timezone: Asia/Tokyo

>>> midx.get_level_values(1)
PeriodIndex([u'2013-01-01 00:00', u'2013-01-01 01:00', u'2013-01-01 02:00', u'2013-01-01 03:00'], freq='H')
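
For readers on a recent pandas, here is a rough equivalent of the check above. It is a sketch, not part of the PR: it assumes a pandas version where the start/periods keywords of the index constructors have been replaced by pd.date_range and pd.period_range, and it passes pd.offsets.Hour() instead of a frequency string. It only checks the tz and the index type, not the freq.

```python
import pandas as pd

# date_range/period_range stand in for the constructor keywords used in the
# PR description; this substitution is an assumption about the reader's version.
hour = pd.offsets.Hour()
didx = pd.date_range(start="2013-01-01", periods=4, freq=hour, tz="Asia/Tokyo")
pidx = pd.period_range(start="2013-01-01", periods=4, freq=hour)

midx = pd.MultiIndex.from_arrays([didx, pidx])
level0 = midx.get_level_values(0)
level1 = midx.get_level_values(1)

# set_index goes through the same kind of round trip for a tz-aware column.
df = pd.DataFrame({"when": didx, "x": range(4)}).set_index("when")

print(level0.tz)              # timezone of the datetime level
print(type(level1).__name__)  # index type of the period level
print(df.index.tz)            # timezone after set_index
```

If the tz were dropped, level0.tz and df.index.tz would print None, which is the behavior shown under "current master".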

fill_value=unique_vals._na_value))
values.name = self.names[num]
filled = com.take_1d(unique_vals.values, labels, fill_value=unique_vals._na_value)
if isinstance(unique_vals, DatetimeIndex):
Contributor

I think you can simply do:

values = Index(filled, freq=getattr(unique_vals, 'freq', None), tz=getattr(unique_vals, 'tz', None), name=self.names[num])

instead of the if/elif (and you don't need the import).

Index will infer the type, which is more pythonic here (and the additional args will not be passed on).

Member Author

Thanks. This works for PeriodIndex but not for DatetimeIndex. Because the values have already been converted to an ndarray, Index performs the tz conversion a second time, which produces incorrect output.
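
The double conversion can be reproduced with public API, as a sketch of one plausible reading of this comment rather than the actual code path in the PR. The names naive_utc, wrong and right are illustrative, and naive_utc is assumed to stand in for the ndarray of underlying UTC values.

```python
import pandas as pd

didx = pd.date_range("2013-01-01", periods=4, freq=pd.offsets.Hour(), tz="Asia/Tokyo")

# Naive UTC instants, standing in for the ndarray of underlying values.
naive_utc = didx.tz_convert("UTC").tz_localize(None)

# Treating the UTC values as Tokyo wall times applies the offset a second time.
wrong = naive_utc.tz_localize("Asia/Tokyo")

# Marking them as UTC first and then converting restores the original index.
right = naive_utc.tz_localize("UTC").tz_convert("Asia/Tokyo")

print(wrong[0])
print(right[0])
```

With Asia/Tokyo the wrong result is off by the UTC offset of nine hours, while the second construction matches the original index.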

Contributor

hmm
worth creating a _simple_new for Index?

Member Author

Yeah, that would be an option. Otherwise, make DatetimeIndex support np.datetime64 timezones...

Contributor

No, don't use the tz of np.datetime64; it's totally broken.

@jreback jreback added this to the 0.14.0 milestone May 10, 2014
@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 12, 2014
Member Author

sinhrks commented May 12, 2014

OK, added _simple_new to Index. Also, I noticed #5878 can be solved by the same fix.
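
For context, the idea behind a _simple_new style constructor can be sketched with a toy class. This is not the pandas implementation: TzAwareValues, to_wall_times and the fixed +09:00 offset are all hypothetical, chosen only to show why a constructor that skips conversion avoids the problem discussed in the review.

```python
from datetime import datetime, timedelta, timezone


class TzAwareValues:
    """Toy container (hypothetical, not pandas code): naive UTC values plus a tz."""

    def __init__(self, wall_times, tz):
        # Public constructor: treats naive input as wall-clock times in `tz`
        # and converts them to naive UTC for storage.
        self.tz = tz
        self._utc = [
            t.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)
            for t in wall_times
        ]

    @classmethod
    def _simple_new(cls, utc_values, tz):
        # Fast path: the caller promises the values are already naive UTC,
        # so attach them as-is and skip the conversion done in __init__.
        obj = object.__new__(cls)
        obj.tz = tz
        obj._utc = list(utc_values)
        return obj

    def to_wall_times(self):
        # Convert the stored UTC values back to aware datetimes in `tz`.
        return [t.replace(tzinfo=timezone.utc).astimezone(self.tz) for t in self._utc]


tokyo = timezone(timedelta(hours=9))  # fixed-offset stand-in for Asia/Tokyo
original = TzAwareValues([datetime(2013, 1, 1, 0)], tokyo)

# Rebuilding from the internal values through __init__ converts a second time.
double_converted = TzAwareValues(original._utc, tokyo)
# Rebuilding through _simple_new leaves the stored values untouched.
rebuilt = TzAwareValues._simple_new(original._utc, tokyo)
```

The trade-off is that the fast path trusts its caller: it is only safe for internal use where the values are known to be in the stored representation already.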

@jreback jreback modified the milestones: 0.14.0, 0.14.1 May 12, 2014
Contributor

jreback commented May 12, 2014

gr8! thanks

jreback added a commit that referenced this pull request May 12, 2014
BUG: tz info lost by set_index and reindex
@jreback jreback merged commit c59b217 into pandas-dev:master May 12, 2014
@sinhrks sinhrks deleted the appendtz branch May 13, 2014 12:38
Labels: Bug, Dtype Conversions, Period