Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: .equals for Extension Arrays #27081

Closed
jreback opened this issue Jun 27, 2019 · 8 comments · Fixed by #30652
Closed

API: .equals for Extension Arrays #27081

jreback opened this issue Jun 27, 2019 · 8 comments · Fixed by #30652
Labels
API Design ExtensionArray Extending pandas with custom dtypes or arrays. good first issue
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jun 27, 2019

We don't have a good method of testing for equality between EA's

In [19]: from pandas.tests.extension.decimal.array import DecimalArray                                                                                                                                                                                           

In [20]: from decimal import Decimal                                                                                                                                                                                                                             

In [21]: x = DecimalArray([Decimal('1'),Decimal('Nan')])                                                                                                                                                                                                         

In [22]: x == x                                                                                                                                                                                                                                                  
Out[22]: array([ True, False])

In [23]: x = pd.Series([1,np.nan], dtype='Int64').array                                                                                                                                                                                                          

In [24]: x == x                                                                                                                                                                                                                                                  
Out[24]: array([ True, False])

These happen to work with Series now because the null checks are handled at a higher level.

In [26]: x.equals(x)                                                                                                                                                                                                                                             
Out[26]: True

In [27]: x = pd.Series(DecimalArray([Decimal('1'),Decimal('Nan')]))                                                                                                                                                                                              

In [28]: x.equals(x)                                                                                                                                                                                                                                             
Out[28]: True

we could provide a default implementation that should just work in the presence of NA.

def equals(self, other):
    return ((self == other) | (self.isna() == other.isna())).all()

actually we should also implement a default __eq__

@jreback jreback added API Design ExtensionArray Extending pandas with custom dtypes or arrays. labels Jun 27, 2019
@jreback jreback added this to the 0.25.0 milestone Jun 27, 2019
@jreback
Copy link
Contributor Author

jreback commented Jun 27, 2019

@jreback
Copy link
Contributor Author

jreback commented Jun 27, 2019

actually we define __eq__ if you subclass from ExtensionArrayScalarMixin, but we need to change to use the above (and __ne__).

@jorisvandenbossche
Copy link
Member

+1 on adding equals and such a default implementation.

We might only need to add __eq__ as a required abstract method, to ensure people implement it (otherwise the default python one will always return True or False based on identity, and thus give wrong results for such an equals method)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 27, 2019 via email

@jorisvandenbossche
Copy link
Member

Removing the milestone as I think this is not 0.25.0 critical?

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.25.0, Contributions Welcome Jun 30, 2019
@jreback
Copy link
Contributor Author

jreback commented Jun 30, 2019

no this is slightly non trivial

@zys5945
Copy link
Contributor

zys5945 commented Sep 1, 2019

Should the __eq__ on ExtensionScalarOpsMixin be changed to take care nan, or should ExtensionArray expose a .equals() that does that?

I believe DecimalArray is the only class that inherits ExtensionScalarOpsMixin so to cover all EA's I think a .equals() on EA is more appropriate?

@jreback

@VijayantSoni
Copy link
Contributor

Hi, As this is labeled good first issue, I would like to take this on. Is this still needed ?

dwhu added a commit to dwhu/pandas that referenced this issue Jan 3, 2020
Adding __eq__ to ExtensionArray Abstract method doc string.

Adding ne implementation to EA base class.

Also removing other implementations.

Updating EA equals method and adding tests.

pandas-devGH-27081
dwhu added a commit to dwhu/pandas that referenced this issue Jan 3, 2020
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, 1.1 May 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design ExtensionArray Extending pandas with custom dtypes or arrays. good first issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants