API: Index should support __inverse__ ops #8875

Closed
1 of 2 tasks
sinhrks opened this issue Nov 22, 2014 · 9 comments · Fixed by #45006
Labels: Enhancement, Index (Related to the Index class or subclasses)

@sinhrks
Member

sinhrks commented Nov 22, 2014

Related to #7979

Because Index is no longer a subclass of np.ndarray, Index.duplicated returns an Index with dtype=object, and Index doesn't accept logical not (~).

idx = pd.Index([1, 2, 1, 3])

idx.duplicated()
# Index([False, False, True, False], dtype='object')

~idx.duplicated()
# TypeError: bad operand type for unary ~: 'Index'

As a result, it is impossible to drop rows that have a duplicated index using an expression like df[~df.index.duplicated()]. This expression worked at the time of #7979.

Should Index.duplicated return an np.array with dtype=bool? Or should Index accept logical not?
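
A minimal sketch of a workaround for the expression above, assuming only that the result of duplicated() can be converted to a NumPy array. The DataFrame and its "value" column are made up for illustration. Casting the mask to bool before applying ~ makes the expression behave the same whether duplicated() returns an ndarray or an object-dtype Index:

```python
import numpy as np
import pandas as pd

# Illustrative data: the index label 1 appears twice.
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[1, 2, 1, 3])

# Coerce the mask to a bool ndarray first, so that ~ is a logical NOT
# regardless of what type duplicated() hands back.
is_dup = np.asarray(df.index.duplicated(), dtype=bool)

# Keep only the first occurrence of each index label.
deduped = df[~is_dup]
print(deduped)
```

The explicit cast is redundant when duplicated() already returns a bool array, but it costs little and avoids the TypeError shown above.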

@jtratner
Contributor

maybe index should allow ~? any reason why it shouldn't respond to that? (might not have been implemented previously)

@jreback
Contributor

jreback commented Nov 22, 2014

need to define __invert__ in core/base.py (it's defined in core/generic.py so Series works), but Index is left out
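
For readers unfamiliar with the hook, here is a toy sketch of how Python dispatches ~ to __invert__. BoolLabels is a hypothetical stand-in written for this example; it is not pandas code and does not reflect how core/base.py or core/generic.py implement the method:

```python
class BoolLabels:
    """Hypothetical index-like container holding boolean values."""

    def __init__(self, values):
        self._values = [bool(v) for v in values]

    def __invert__(self):
        # Python calls this method to evaluate ~obj. Without it, ~obj
        # raises "TypeError: bad operand type for unary ~".
        return BoolLabels(not v for v in self._values)

    def tolist(self):
        return list(self._values)

    def __repr__(self):
        return f"BoolLabels({self._values})"


labels = BoolLabels([False, False, True, False])
inverted = ~labels
print(inverted)
```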

@jreback jreback added API Design Indexing Related to indexing on series/frames, not to indexes themselves labels Nov 22, 2014
@jreback jreback added this to the 0.16.0 milestone Nov 22, 2014
@sinhrks sinhrks changed the title API: Index.duplicated() should return np.array ? API: Index should support __inverse__ ops Nov 22, 2014
@sinhrks
Member Author

sinhrks commented Nov 22, 2014

OK. Changed the title.

Currently Index doesn't support bool dtype, so the values are stored as object. To invert them as expected, the values must first be converted to bool explicitly. This conversion is specific to Index, so adding __invert__ to Index looks better (until Index can support bool)?

~np.array([True, False], dtype=object)
# [-2 -1]
~np.array([True, False], dtype=bool)
# [False  True]
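
The explicit conversion described above could be sketched as follows. logical_not is a hypothetical helper name chosen for this example, not an existing NumPy or pandas function:

```python
import numpy as np


def logical_not(values):
    """Hypothetical helper: cast to bool first, then invert.

    Casting avoids the bitwise NOT that ~ performs on object-dtype
    arrays, where each element is a Python bool treated as an int.
    """
    return ~np.asarray(values, dtype=bool)


# Object-dtype mask, as an Index without bool support would store it.
obj_mask = np.array([False, False, True, False], dtype=object)
result = logical_not(obj_mask)
print(result)
```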

@shoyer
Member

shoyer commented Nov 23, 2014

Actually, I think idx.duplicated() should indeed return an np.ndarray, as you originally suggested. There's not much use for Boolean indexes, since they can only take on two values.

That said, it's probably a good idea to support ~ for index objects, too. I think this was just overlooked in the removal of ndarray subclassing.

@jreback jreback changed the title API: Index should support __inverse__ ops API: Index should support __inverse__ ops Nov 24, 2014
@jreback
Contributor

jreback commented Nov 24, 2014

@sinhrks I agree with both of @shoyer's suggestions. Currently boolness is detected in the constructor, so I think .duplicated() is just not creating the boolean Index correctly, and once Index supports the __invert__ op this should work.

@sinhrks
Member Author

sinhrks commented Nov 27, 2014

OK. Then I'll prepare 2 separate fixes:

  • Add __invert__ to Index.
  • Change duplicated to return np.array

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 28, 2019
@mroeschke
Member

This looks to work on master now. Could use a test for the inverse behavior

In [2]: idx = pd.Index([1, 2, 1, 3])
   ...:
   ...: idx.duplicated()
Out[2]: array([False, False,  True, False])

In [3]: ~idx.duplicated()
Out[3]: array([ True,  True, False,  True])

In [4]: pd.__version__
Out[4]: '1.1.0.dev0+1466.ga3477c769.dirty'
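
A regression test for this behavior might look like the sketch below. The test name is hypothetical, and the sketch assumes, as in the session above, that Index.duplicated() returns a boolean ndarray:

```python
import numpy as np
import pandas as pd


def test_invert_duplicated_mask():
    # Hypothetical test name; mirrors the interactive session above.
    idx = pd.Index([1, 2, 1, 3])

    result = ~idx.duplicated()

    expected = np.array([True, True, False, True])
    np.testing.assert_array_equal(result, expected)


# Run directly when not using a test runner such as pytest.
test_invert_duplicated_mask()
```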

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Index Related to the Index class or subclasses labels May 5, 2020
@simonjayhawkins
Member

This looks to work on master now. Could use a test for the inverse behavior

see #8875 (comment)

Index.duplicated() has been changed to return an array, but the other part of the issue is to add __invert__ to Index. This has not yet been done.

>>> pd.__version__
'1.1.0.dev0+1748.g0bd1f6f6f'
>>>
>>> idx = pd.Index([1, 2, 1, 3])
>>>
>>> pd.Index(idx.duplicated())
Index([False, False, True, False], dtype='object')
>>>
>>> ~pd.Index(idx.duplicated())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for unary ~: 'Index'
>>>

@YuRenee

YuRenee commented Nov 9, 2020

take

@mroeschke mroeschke removed Needs Tests Unit test(s) needed to prevent regressions good first issue labels Apr 11, 2021
@mroeschke mroeschke added Enhancement Index Related to the Index class or subclasses labels Apr 11, 2021
@jreback jreback modified the milestones: Contributions Welcome, 1.4 Dec 23, 2021
8 participants