Why is pd.Index.union not commutative? #23525

Closed
opened this issue Nov 6, 2018 · 9 comments

Contributor

ArtinSarraf commented Nov 6, 2018 • edited

Code Sample, a copy-pastable example if possible

```
>>> import pandas as pd
>>> ix = pd.Index([1,2])
>>> eix = pd.Index([])
>>> pi = pd.PeriodIndex(['19910905', '19910906'], freq='D')

# Pair 1
>>> pi.union(eix)
ValueError: can only call with other PeriodIndex-ed objects
>>> eix.union(pi)
PeriodIndex(['1991-09-05', '1991-09-06'], dtype='period[D]', freq='D')

# Pair 2
>>> pi.union(ix)
ValueError: can only call with other PeriodIndex-ed objects
>>> ix.union(pi)
Index([1, 2, 1991-09-05, 1991-09-06], dtype='object')
```

Problem description

Conceptually I would imagine a union operation to be commutative. I was just wondering if there was a deliberate rationale behind not implementing `pd.Index._assert_can_do_setop` so that it only fails if the complementary `other._assert_can_do_setop` also fails.

This behavior also leads to some unexpected behaviors in `pd.concat`. For example:

```
>>> df1 = pd.DataFrame([[1,2,3],[1,2,3]], index=pd.PeriodIndex(['19910905', '19910906'], freq='D'))
>>> df2 = pd.DataFrame()
>>> pd.concat([df1, df2], axis=1, keys=['a', 'b'])
ValueError: can only call with other PeriodIndex-ed objects
>>> pd.concat([df2, df1], axis=1, keys=['a', 'b'])
Works!
```

Additionally (and perhaps this should be raised as a separate issue) should the specific implementation of `pd.PeriodIndex._assert_can_do_setop` not raise if the `other` index is empty? Since `pd.Index([]).union(<instance of pd.PeriodIndex>)` results in an instance of `pd.PeriodIndex`.
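For anyone who wants to check the asymmetry on their own pandas version, here is a minimal sketch. The helper names `union_outcome` and `union_is_symmetric` are hypothetical and not part of pandas; they only call `Index.union` in both orders and compare what comes back.

```python
import pandas as pd


def union_outcome(left, right):
    """Summarise left.union(right) as ('ok', dtype, elements) or ('error', name)."""
    try:
        result = left.union(right)
    except (TypeError, ValueError) as exc:
        return ("error", type(exc).__name__)
    return ("ok", str(result.dtype), frozenset(result))


def union_is_symmetric(a, b):
    """True when both orders fail, or both succeed with the same dtype and elements."""
    first, second = union_outcome(a, b), union_outcome(b, a)
    if first[0] == "error" or second[0] == "error":
        return first[0] == second[0]
    return first == second


pi = pd.period_range("1991-09-05", periods=2, freq="D")
print(union_is_symmetric(pd.Index([1, 2]), pd.Index([2, 3])))
# The PeriodIndex case is the one reported above; the result depends on the pandas version.
print(union_is_symmetric(pi, pd.Index([1, 2])))
```

Element order is deliberately ignored here, since the question is only whether both orders succeed and agree on dtype and contents.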

Member

gfyoung commented Nov 6, 2018

 cc @jreback
Contributor

jreback commented Nov 6, 2018

 we don't ignore empties. actually these should all convert to object dtype (and work).
Contributor

TomAugspurger commented Nov 6, 2018

 Agreed.

TomAugspurger added the Dtypes label Nov 6, 2018

Contributor Author

ArtinSarraf commented Nov 7, 2018 • edited

@jreback - just to clarify - are you suggesting that:

1) The union of an empty index with any other index should not result in an index of the same type as the "other" index? e.g.

```
>>> pd.Index([]).union(pd.period_range('19910905', periods=2))
Current: PeriodIndex(['1991-09-05', '1991-09-06'], dtype='period[D]', freq='D')
Desired: Index([1991-09-05, 1991-09-06], dtype='object')

>>> pd.Index([]).union(pd.interval_range(start=0, end=5))
Current: IntervalIndex([(0, 1], (1, 2]] closed='right', dtype='interval[int64]')
Desired: Index([(0, 1], (1, 2]], dtype='object')
```

2) Any index should be able to form a union with any other type, as long as the resultant dtype is `object`? e.g.

```
>>> pd.period_range('19910905', periods=2).union(pd.Index([1,2,3]))
Current: ValueError
Desired: Index([1991-09-05, 1991-09-06, 1, 2, 3], dtype='object')
```

2b) And should this only be the case when `_assert_can_do_setop` is valid for at least one of the indexes? e.g. consider the following pairs:

- Index/PeriodIndex - Index can do a setop with PeriodIndex, but not vice-versa.
- IntervalIndex/PeriodIndex - neither index can do a setop with the other - should this still work and result in an object dtype?
Contributor

jreback commented Nov 7, 2018

 yes to both
Contributor Author

ArtinSarraf commented Nov 7, 2018

 Thoughts on 2b?
Contributor

jreback commented Nov 7, 2018

 no. like indexes are combined to form the same type; unlike indexes are coerced to object and then combined. it's likely that the coercion logic is not fully handling things. we should not have any special cases - these are general ops
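
A rough sketch of that rule, for illustration only: `union_general` is a hypothetical helper, not the code from the linked PR, and it assumes that comparing dtypes with `pandas.api.types.is_dtype_equal` is an adequate test for "like" indexes.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal


def union_general(left, right):
    """Union with no per-type special cases (hypothetical helper)."""
    if is_dtype_equal(left.dtype, right.dtype):
        # Like indexes: combine directly and keep the shared type.
        return left.union(right)
    # Unlike indexes: coerce both sides to object, then combine.
    # sort=False avoids ordering values of unrelated types.
    return left.astype(object).union(right.astype(object), sort=False)


pi = pd.period_range("1991-09-05", periods=2, freq="D")
ix = pd.Index([1, 2])

print(union_general(pi, ix).dtype)  # object dtype in either order
print(union_general(ix, pi).dtype)
print(union_general(pi, pi).dtype)  # like indexes keep their own dtype
```

With this shape neither operand decides whether the operation is allowed, so the result no longer depends on which side `union` is called on (element order aside).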

ArtinSarraf referenced this issue Nov 7, 2018

Merged

ENH - Index set operation modifications to address issue #23525 #23538

Contributor Author

ArtinSarraf commented Nov 9, 2018

 @jreback - how should the union of Int64Index and RangeIndex behave ideally? Should they also result in an object dtype or behave the same as they do now?
Contributor

TomAugspurger commented Nov 9, 2018

 RangeIndex should always be just an optimization of Int64Index. So you would return a RangeIndex if possible, else an Int64Index.
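
A small sketch of what that looks like from the user's side. Which concrete class comes back may vary between pandas versions, so the snippet prints the type instead of assuming it; the variable names are only illustrative.

```python
import pandas as pd

r1 = pd.RangeIndex(0, 3)

# Two adjacent ranges: the combined values 0..5 can still be described as a range.
contiguous = r1.union(pd.RangeIndex(3, 6))

# A range plus scattered integers: the values cannot be described by start/stop/step.
gappy = r1.union(pd.Index([10, 20]))

for name, result in [("contiguous", contiguous), ("gappy", gappy)]:
    print(name, type(result).__name__, list(result))
```

Either way the values stay integers, and nothing here should fall back to object dtype.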

TomAugspurger added a commit that referenced this issue May 21, 2019

``` ENH - Index set operation modifications to address issue #23525 (#23538) ```
``` 20d0ad1 ```
