
BUG: index name lost with timedelta ops #9926

Closed
sinhrks opened this issue Apr 18, 2015 · 4 comments · Fixed by #10158
Labels: Bug, Timedelta (Timedelta data type)

Comments

@sinhrks
Member

sinhrks commented Apr 18, 2015

import pandas as pd 

dtidx = pd.DatetimeIndex(['2011-01-01'], freq='D', name='dtidx')
(dtidx + 1).name
# dtidx

# NG
(dtidx + pd.Timedelta('1 day')).name
# None

tdidx = pd.TimedeltaIndex(['1 day'], freq='D', name='tdidx')
(tdidx + 1).name
# tdidx

# NG
(tdidx + pd.Timedelta('1 day')).name
# None
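A minimal pure-Python sketch of the behavior the report expects (not pandas code): adding a scalar to an index should carry the index name through unchanged, whether the scalar is an integer or a timedelta. The helper `add_scalar` is hypothetical, and plain lists of stdlib `datetime`/`timedelta` values stand in for DatetimeIndex and TimedeltaIndex data.

```python
from datetime import datetime, timedelta


def add_scalar(values, name, scalar):
    """Hypothetical helper: elementwise Index + scalar that keeps the name.

    `values` and `name` stand in for an index's data and its `.name`;
    they are plain Python objects, not pandas objects.
    """
    return [v + scalar for v in values], name


# DatetimeIndex-like data plus a Timedelta-like scalar: name should survive
dt_values, dt_name = add_scalar([datetime(2011, 1, 1)], "dtidx", timedelta(days=1))

# TimedeltaIndex-like data plus a Timedelta-like scalar: name should survive
td_values, td_name = add_scalar([timedelta(days=1)], "tdidx", timedelta(days=1))

print(dt_name, td_name)
```

The point is that the scalar's type plays no part in choosing the result name; only the index operand has a name to propagate.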

ref: #9862

@sinhrks
Member Author

sinhrks commented Apr 18, 2015

I would like to define the expected behavior. Based on the normal Index, I understand that the first (left-side) index name should be prioritized.

idx1 = pd.Index([1], name='idx1')
idx2 = pd.Index([1], name='idx2')
(idx1 + 1).name
# idx1
(idx1 + idx1).name
# idx1
(idx1 + idx2).name
# idx1

But for datetime-likes, I feel it is natural to prioritize the name of the DatetimeIndex. May I prepare a fix based on the following behavior?

| Left side | Right side | Prioritized name |
|---|---|---|
| DatetimeIndex | DatetimeIndex | name of left side |
| TimedeltaIndex | TimedeltaIndex | name of left side |
| DatetimeIndex | TimedeltaIndex | name of left side (DatetimeIndex) |
| TimedeltaIndex | DatetimeIndex | name of right side (DatetimeIndex) |

dtidx = pd.DatetimeIndex(['2011-01-01'], freq='D', name='dtidx')
tdidx = pd.TimedeltaIndex(['1 day'], freq='D', name='tdidx')

dtidx + tdidx
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2011-01-02]
# Length: 1, Freq: None, Timezone: None

(dtidx + tdidx).name
# dtidx

tdidx + dtidx
# <class 'pandas.tseries.index.DatetimeIndex'>
# [2011-01-02]
# Length: 1, Freq: None, Timezone: None

(tdidx + dtidx).name
# dtidx

@jreback
Contributor

jreback commented Apr 18, 2015

No priority. The result of an append / op on indexes keeps the name only if all inputs have the same name; otherwise the name is set to None.

See Index.append.

It may be a bug if the add ops don't follow this pattern.
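A minimal sketch of the naming rule described above, assuming names are compared with plain equality. `consensus_name` is a hypothetical helper written for illustration, not a pandas function.

```python
def consensus_name(first, *rest):
    """Hypothetical helper (not a pandas API).

    Returns the name shared by every input index, or None as soon as
    any input's name differs (including when one side has no name).
    """
    return first if all(name == first for name in rest) else None


print(consensus_name("a", "a"))       # a
print(consensus_name("a", "b"))       # None
print(consensus_name("a", None))      # None
print(consensus_name("a", "a", "b"))  # None
```

Taking a variable number of names mirrors appending several indexes at once: a single mismatch anywhere drops the name.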

@sinhrks
Member Author

sinhrks commented Apr 18, 2015

Thanks. Could you check whether the following understanding is correct for both set ops (intersection, etc.) and arithmetic ops (addition, etc.)?

  • For Index + Index, the name is preserved if the left and right indexes have the same name. Otherwise, the name is reset to None.
  • For Index + scalar or scalar + Index, the name of the index is preserved.

Based on the above understanding, the normal Index behaves incorrectly: all of the ops below should reset the name to None.

idx1 = pd.Index([1, 2, 3], name='idx1')
idx2 = pd.Index([1, 2, 3], name='idx2')

result = idx1 + idx2
result, result.name
# (Int64Index([2, 4, 6], dtype='int64'), 'idx1')

result = idx1.__add__(idx2)
result, result.name
# (Int64Index([2, 4, 6], dtype='int64'), 'idx1')

result = idx1.intersection(idx2)
result, result.name
# (Int64Index([1, 2, 3], dtype='int64'), 'idx1')
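A minimal pure-Python model of the two rules above, applied to the same three operations. `NamedIndex` is a hypothetical stand-in for a pandas Index (a list of values plus a name); addition is positional and assumes equal lengths, and `intersection` keeps the left side's order. It only illustrates the naming rule, not pandas internals.

```python
class NamedIndex:
    """Hypothetical stand-in for a pandas Index: values plus an optional name."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    def _result_name(self, other):
        # Index op Index: keep the name only if both sides agree, else None.
        if isinstance(other, NamedIndex):
            return self.name if self.name == other.name else None
        # Index op scalar: keep this index's name.
        return self.name

    def __add__(self, other):
        if isinstance(other, NamedIndex):
            # Elementwise by position (assumes equal lengths).
            values = [a + b for a, b in zip(self.values, other.values)]
        else:
            values = [v + other for v in self.values]
        return NamedIndex(values, name=self._result_name(other))

    # scalar + Index falls back here and follows the same rule
    __radd__ = __add__

    def intersection(self, other):
        # Values present in both, in this index's order.
        common = [v for v in self.values if v in other.values]
        return NamedIndex(common, name=self._result_name(other))


idx1 = NamedIndex([1, 2, 3], name="idx1")
idx2 = NamedIndex([1, 2, 3], name="idx2")

print((idx1 + idx2).name)                # None (names differ)
print(idx1.intersection(idx2).name)      # None (names differ)
print((idx1 + idx1).name)                # idx1 (names match)
print((idx1 + 1).name, (1 + idx1).name)  # idx1 idx1 (scalar keeps the name)
```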

@shoyer
Member

shoyer commented Apr 19, 2015

@sinhrks Yes, I think you correctly understand this now.

Here are some notes on this from @cpcloud: blaze/blaze#458 (comment)


4 participants