Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why is pd.Index.union not commutative? #23525

Closed
ArtinSarraf opened this issue Nov 6, 2018 · 9 comments

Comments

@ArtinSarraf
Copy link
Contributor

commented Nov 6, 2018

Code Sample, a copy-pastable example if possible

>>> ix = pd.Index([1,2])
>>> eix = pd.Index([])
>>> pi = pd.PeriodIndex(['19910905', '19910906'], freq='D')

# Pair 1
>>> pi.union(eix)
ValueError: can only call with other PeriodIndex-ed objects
>>> eix.union(pi)
PeriodIndex(['1991-09-05', '1991-09-06'], dtype='period[D]', freq='D')

# Pair 2
>>> pi.union(ix)
ValueError: can only call with other PeriodIndex-ed objects
>>> ix.union(pi)
Index([1, 2, 1991-09-05, 1991-09-06], dtype='object')

Problem description

Conceptually I would imagine a union operation to be commutative. I was just wondering if there was an deliberate rationale behind not implementing pd.Index._assert_can_do_setop to only fail if the complementary self._assert_can_do_setop also fails.

This behavior also leads to some unexpected behaviors in pd.concat. For example:

>>> df1 = df1 = pd.DataFrame([[1,2,3],[1,2,3]], index=pd.PeriodIndex(['19910905', '19910906'], freq='D'))
>>> df2 = pd.DataFrame()
>>> pd.concat([df1, df2], axis=1, keys=['a', 'b'])
ValueError: can only call with other PeriodIndex-ed objects
>>> pd.concat([df2, df1], axis=1, keys=['a', 'b'])
Works!

Additionally (and perhaps this should be raised as a separate issue) should the specific implementation of pd.PeriodIndex._assert_can_do_setop not raise if the other index is empty? Since pd.Index([]).union(<instance of pd.PeriodIndex>) results in an instance of pd.PeriodIndex.

@gfyoung

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

we don't ignore empties. actually these should all convert to object dtype (and work).

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

Agreed.

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Nov 6, 2018

@TomAugspurger TomAugspurger added the Dtypes label Nov 6, 2018

@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

@jreback - just to clarify - are you suggesting that:

  1. the union of an empty index with any other index should not result in an index of the same type as the "other" index.
    e.g.
>>> pd.Index([]).union(pd.period_range('19910905', periods=2))
Current: PeriodIndex(['1991-09-05', '1991-09-06'], dtype='period[D]', freq='D')
Desired: Index([1991-09-05, 1991-09-06], dtype='object')
>>> pd.Index([]).union(pd.interval_range(start=0, end=5))
Current: IntervalIndex([(0, 1], (1, 2]]
                       closed='right',
                       dtype='interval[int64]')
Desired: Index([(0, 1], (1, 2]], dtype='object')
  1. Any index should be able to form a union with any other type, as long as the resultant dtype is object?
    e.g.
>>> pd.period_range('19910905', periods=2).union(pd.Index([1,2,3]))
Current: ValueError
Desired: Index([1991-09-05, 1991-09-06, 1, 2, 3], dtype='object')

2b) And should this only be the case for which the _assert_can_do_setop is valid for at least one of the indexes?
e.g.
Consider the following pairs:
Index/PeriodIndex - Index can do setop with PeriodIndex, but not vice-versa.
IntervalIndex/PeriodIndex - neither index can do a setop with the other - should this still work and result in an object dtype?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

yes to both

@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

Thoughts on 2b?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 7, 2018

no

like indexes are combined to form the same type
unlike are coerced to object and then combine

it’s likely that the coercion logic is not fully handling things

we should not have any special cases - these are general ops

@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 9, 2018

@jreback - how should the union of Int64Index and RangeIndex behave ideally? Should they also result in an object dtype or behave the same as they do now?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Nov 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.