BUG: improve future warning for boolean operations with misaligned indexes #62367
base: 2.3.x
This change makes the warning more explicit about data types: it clearly states that it refers to the NumPy-backed boolean dtype or the object dtype, and that the operation returns a NumPy boolean Series.
"Operation between non boolean Series with different "
"indexes will no longer return a boolean result in "
"a future version. Cast both Series to object type "
"Operation between Series with different indexes "
Just add the nullable bools to the check on L6161
I feel like I am misunderstanding something, should I add BooleanDtype() and "bool[pyarrow]" to the right.dtype not in check?
Correct
But this would dismiss the warnings for these datatypes, which is not intended. The if block just contains the warning call.
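To make the concern concrete, here is a minimal sketch of a guard whose body contains only a warning call. The function names, the string dtype labels, and the contents of the excluded tuple are hypothetical stand-ins, not the actual pandas code at L6161; the point is only that adding a dtype to the `not in` tuple silences the warning for that dtype.

```python
import warnings

# Hypothetical dtype labels standing in for the tuple used in the real check.
BASE_EXCLUDED = ("object", "bool")


def warn_if_needed(right_dtype: str, excluded: tuple) -> None:
    # Same shape as a `right.dtype not in (...)` guard: the body is only
    # the warning call, so an excluded dtype never triggers it.
    if right_dtype not in excluded:
        warnings.warn("illustrative message", FutureWarning, stacklevel=2)


def count_warnings(right_dtype: str, excluded: tuple) -> int:
    # Record every warning raised by the guard and return how many there were.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warn_if_needed(right_dtype, excluded)
    return len(caught)


# Nullable boolean is not excluded, so the guard warns.
before = count_warnings("boolean", BASE_EXCLUDED)
# After adding the nullable dtypes to the tuple, the guard stays silent.
after = count_warnings("boolean", BASE_EXCLUDED + ("boolean", "bool[pyarrow]"))
print(before, after)
```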
On v2.3.x, boolean operations on misaligned indexes are converted to bool (non-nullable) and this is the main reason for this warning. For example:
import pandas as pd
a = pd.Series([False], index=['a'], dtype=pd.BooleanDtype())
b = pd.Series([True], index=['b'], dtype=pd.BooleanDtype())
# Emits the warning; the result is converted to non-nullable bool
print(a & b)
# a False
# b False
# dtype: bool
# Emits the warning; the result is converted to non-nullable bool
print(a | b)
# a False
# b False
# dtype: bool
# Emits the warning; the result is converted to non-nullable bool
print(a ^ b)
# a False
# b False
# dtype: bool
When indexes are aligned, the data type is kept:
import pandas as pd
a = pd.Series([False, None], index=['a', 'b'], dtype=pd.BooleanDtype())
b = pd.Series([None, True], index=['a', 'b'], dtype=pd.BooleanDtype())
# No warning, keep datatype
print(a & b)
# a False
# b <NA>
# dtype: boolean
# No warning, keep datatype
print(a | b)
# a <NA>
# b True
# dtype: boolean
# No warning, keep datatype
print(a ^ b)
# a <NA>
# b <NA>
# dtype: boolean
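As a related sketch (not something this PR proposes), the two misaligned Series from the first example can be aligned explicitly with `Series.align` before the operation. Assuming the default outer join, both operands then share the same index, so the operation goes through the aligned path shown above and the nullable dtype is kept.

```python
import pandas as pd

a = pd.Series([False], index=["a"], dtype=pd.BooleanDtype())
b = pd.Series([True], index=["b"], dtype=pd.BooleanDtype())

# Align explicitly (outer join by default); labels missing from one side
# are filled with <NA>, and both results keep the nullable boolean dtype.
a_aligned, b_aligned = a.align(b)

# Both operands now share the index ['a', 'b'], so no implicit alignment
# is needed when combining them.
result_and = a_aligned & b_aligned
result_or = a_aligned | b_aligned
print(result_and.dtype, result_or.dtype)
```

The values follow the same three-valued logic as the aligned example: `False & <NA>` is `False`, while `<NA> & True` stays `<NA>`.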
None of those cases should be converting to object. For me they don't on main.
> None of those cases should be converting to object
The behavior changed on the main branch, and the warning is specific to v2.3.x to alert users to the change.
On the main branch, they no longer convert to object, but on v2.3.x they do.
I really don't think that the behavior should change on a patch or minor release.
Woops, didn't see this was going into 2.3.x, never mind.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Note
This change is for the v2.3.3 release.
This change makes the warning more explicit about data types, stating that it refers to the NumPy-backed boolean dtype and the object dtype.
This change only addresses the future warning; it does not change the behavior to ensure parity between the results of the Arrow backend and the NumPy backend, mainly because of #62260 (comment):