Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STYLE: fix pylint unneeded-not warnings #49382

Merged
merged 4 commits into from
Nov 17, 2022

Conversation

Moisan
Copy link
Contributor

@Moisan Moisan commented Oct 29, 2022

Related to #48855

I added # pylint: disable=unneeded-not for tests where the intention was clearly to test the not operator.

@mzeitlin11 mzeitlin11 added the Code Style Code style, linting, code_checks label Oct 29, 2022
Copy link
Member

@mzeitlin11 mzeitlin11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Moisan, nice catch disabling those purposeful not usages instead of removing!

@mzeitlin11 mzeitlin11 added this to the 2.0 milestone Oct 29, 2022
Comment on lines 259 to 261
assert not p1 == p3
assert not p1 == p3 # pylint: disable=unneeded-not
assert hash(p1) == hash(p2)
assert not hash(p1) == hash(p3)
assert hash(p1) != hash(p3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my understanding, why does the first one need the check turning off, but the second one can be changed to LSH != RHS?

Copy link
Contributor Author

@Moisan Moisan Oct 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have been a bit too quick on that one. The first one seems to be a bug either on Pandas or Pyarrow side:

>>> p1 = ArrowIntervalType(pa.int64(), "left")
>>> p3 = ArrowIntervalType(pa.int64(), "right")
>>> p1 == p3
False
>>> p1 != p3
False
>>> p1.equals(p3)
False

I looked trough the ArrowIntervalType implementation and the pyarrow.ExtensionType source, and I cannot pinpoint why p1 != p3 would return False.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 @jorisvandenbossche is this a bug? Does ArrowIntervalType need something like

    def __ne__(self, other):
        return not self.__eq__(other)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @WillAyd in case you have comments on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems a bug in ArrowIntervalType, and probably needs such a __ne__ (for example the base ExtensionDtype also has such a method)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea very interesting. My guess is that this stems from pyarrow somewhere

>>> getattr(pandas.core.arrays.arrow.extension_types.ArrowIntervalType, "__ne__")
<slot wrapper '__ne__' of 'pyarrow.lib.ExtensionType' objects>

If you traverse the mro its interesting to see this change through the inheritance hierarchy:

for cls in pandas.core.arrays.arrow.extension_types.ArrowIntervalType.__mro__: 
    print(cls, getattr(cls, '__ne__'))

<class 'pandas.core.arrays.arrow.extension_types.ArrowIntervalType'> <slot wrapper '__ne__' of 'pyarrow.lib.ExtensionType' objects>
<class 'pyarrow.lib.ExtensionType'> <slot wrapper '__ne__' of 'pyarrow.lib.ExtensionType' objects>
<class 'pyarrow.lib.BaseExtensionType'> <slot wrapper '__ne__' of 'pyarrow.lib.DataType' objects>
<class 'pyarrow.lib.DataType'> <slot wrapper '__ne__' of 'pyarrow.lib.DataType' objects>
<class 'pyarrow.lib._Weakrefable'> <slot wrapper '__ne__' of 'object' objects>
<class 'object'> <slot wrapper '__ne__' of 'object' objects>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly enough I didn't see anything in Arrow's build system that would set the language_level for the Cython extension to 3. We do this in setup.py, I think they would ideally do it in their CMake file.

So if its a matter of 2/3 compat with the generated source file could be a nice upstream patch:

https://github.com/apache/arrow/blob/8e3a1e1b7c17213eb8b79fba49c496de6199fd50/cpp/cmake_modules/UseCython.cmake#L138

@lithomas1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said for the scope of this PR I think should just ignore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly enough I didn't see anything in Arrow's build system that would set the language_level for the Cython extension to 3.

I don't think that matters; if you run Python 3, it has the effect that the default __ne__ will use __eq__ automatically.

But what is happening here is that pyarrow.lib.ExtensionType is a cython class that implements __eq__, and cython defaults to implement a richcompare slot that also adds __ne__ based on the __eq__ of that class. So just like when subclassing a builtin, if you override __eq__ you also need to override __ne__ when subclassing a cython class with a __eq__.
This also happens in pandas, for example we have PeriodDtypeBase in cython with __eq__, and that initially caused the same issue (#37265), and that was fixed by adding __ne__ in the python subclass

Now, we could maybe fix this on the PyArrow side as well by adding an explicit __ne__ that is defined to be return not self == other, which might ensure that subclasses would only need to override __eq__, because the parent __ne__ might then work correctly (something to test).
(but regardless of that we should fix this on the pandas side)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your investigations

If this can be resolved as simply as adding

    def __ne__(self, other):
        return not self.__eq__(other)

(with type annotations), then I'd suggest we do it directly in this PR

The primary motivation for using linters and all it to help catch issues, and I'm glad pylint has helped find this one - I'd suggest it just be fixed in this PR

Comment on lines 259 to 261
assert not p1 == p3
assert not p1 == p3 # pylint: disable=unneeded-not
assert hash(p1) == hash(p2)
assert not hash(p1) == hash(p3)
assert hash(p1) != hash(p3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @WillAyd in case you have comments on this

Comment on lines 259 to 261
assert not p1 == p3
assert not p1 == p3 # pylint: disable=unneeded-not
assert hash(p1) == hash(p2)
assert not hash(p1) == hash(p3)
assert hash(p1) != hash(p3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your investigations

If this can be resolved as simply as adding

    def __ne__(self, other):
        return not self.__eq__(other)

(with type annotations), then I'd suggest we do it directly in this PR

The primary motivation for using linters and all it to help catch issues, and I'm glad pylint has helped find this one - I'd suggest it just be fixed in this PR

@@ -35,6 +35,9 @@ def __eq__(self, other):
else:
return NotImplemented

def __ne__(self, other) -> bool:
return not self.__eq__(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return not self.__eq__(other)
return not self == other

Based on the links I read about this, they recommended using this instead of __eq__ (https://stackoverflow.com/questions/4352244/should-ne-be-implemented-as-the-negation-of-eq)

@@ -91,6 +94,9 @@ def __eq__(self, other):
else:
return NotImplemented

def __ne__(self, other) -> bool:
return not self.__eq__(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return not self.__eq__(other)
return not self == other

@MarcoGorelli MarcoGorelli mentioned this pull request Nov 7, 2022
5 tasks
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me pending approval from Will and Joris, thanks @Moisan !

@mroeschke mroeschke merged commit 694c0cd into pandas-dev:main Nov 17, 2022
@mroeschke
Copy link
Member

Awesome, thanks again @Moisan

@Moisan Moisan deleted the pylint-unneeded-not branch November 17, 2022 20:36
MarcoGorelli pushed a commit to MarcoGorelli/pandas that referenced this pull request Nov 18, 2022
* STYLE: fix pylint unneeded-not warnings

* fixup! STYLE: fix pylint unneeded-not warnings

* Add __ne__ method to ArrowPeriodType and ArrowIntervalType

* fixup! Add __ne__ method to ArrowPeriodType and ArrowIntervalType
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
* STYLE: fix pylint unneeded-not warnings

* fixup! STYLE: fix pylint unneeded-not warnings

* Add __ne__ method to ArrowPeriodType and ArrowIntervalType

* fixup! Add __ne__ method to ArrowPeriodType and ArrowIntervalType
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants