
Alvaro-Kothe
Contributor


Note

This change is for the v2.3.3 release.

This change makes the warning more explicit about data types: it now states that it refers to the NumPy-backed boolean dtype and the object dtype.


This change only addresses the FutureWarning. It does not change the behavior to ensure parity between the results of the Arrow and NumPy backends, mainly because of #62260 (comment):

For the result on main / pandas 3.0, this actually seems correct to me.

This change makes the warning more explicit about data types,
where it clearly states that it refers to the NumPy-backed boolean
or object dtype.

It also clearly states that it returns a NumPy boolean Series.
"Operation between non boolean Series with different "
"indexes will no longer return a boolean result in "
"a future version. Cast both Series to object type "
"Operation between Series with different indexes "
Member


Just add the nullable bools to the check on L6161.

Contributor Author


I feel like I am misunderstanding something. Should I add BooleanDtype() and "bool[pyarrow]" to the right.dtype not in check?

Member


Correct

Contributor Author


But this would suppress the warning for these data types, which is not intended. The if block only contains the warning call.
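A hypothetical, simplified model of this point may help. It assumes, as described in this thread, that on v2.3.x both Series are cast to object for logical operations whenever their indexes differ, and that the dtype check only decides whether the warning is emitted. The function and constant names and the string dtype labels are illustrative assumptions, not pandas internals.

```python
# Hypothetical, simplified model of the check discussed above; NOT the pandas source.
# Dtypes are represented by plain strings purely for illustration.
EXEMPT = ("object", "bool")  # NumPy bool and object dtypes do not trigger the warning
EXEMPT_WITH_NULLABLE = EXEMPT + ("boolean", "bool[pyarrow]")  # the proposed extension


def misaligned_bool_op(
    left_dtype: str, right_dtype: str, exempt: tuple[str, ...] = EXEMPT
) -> tuple[bool, str]:
    """Return (warning_emitted, dtype_used_for_the_op) when indexes differ."""
    # The dtype check only guards the warning call...
    warning_emitted = left_dtype not in exempt or right_dtype not in exempt
    # ...while the cast to object happens either way.
    cast_to = "object"
    return warning_emitted, cast_to


print(misaligned_bool_op("boolean", "boolean"))                        # (True, 'object')
print(misaligned_bool_op("boolean", "boolean", EXEMPT_WITH_NULLABLE))  # (False, 'object')
```

Under these assumptions, extending the exemption tuple only changes whether the warning fires; the cast to object, and with it the loss of the nullable dtype, would still happen, just silently.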

On v2.3.x, the results of boolean operations between Series with misaligned indexes are converted to non-nullable bool, and this is the main reason for this warning. For example:

import pandas as pd

a = pd.Series([False], index=['a'], dtype=pd.BooleanDtype())
b = pd.Series([True], index=['b'], dtype=pd.BooleanDtype())

# Emits the warning; the result is converted to non-nullable bool
print(a & b)
# a    False
# b    False
# dtype: bool

# Emits the warning; the result is converted to non-nullable bool
print(a | b)
# a    False
# b    False
# dtype: bool

# Emits the warning; the result is converted to non-nullable bool
print(a ^ b)
# a    False
# b    False
# dtype: bool

When indexes are aligned, the data type is kept:

import pandas as pd

a = pd.Series([False, None], index=['a', 'b'], dtype=pd.BooleanDtype())
b = pd.Series([None, True], index=['a', 'b'], dtype=pd.BooleanDtype())

# No warning; the nullable boolean dtype is kept
print(a & b)
# a    False
# b     <NA>
# dtype: boolean

# No warning; the nullable boolean dtype is kept
print(a | b)
# a    <NA>
# b    True
# dtype: boolean

# No warning; the nullable boolean dtype is kept
print(a ^ b)
# a    <NA>
# b    <NA>
# dtype: boolean
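A possible workaround on v2.3.x, sketched under the assumption that the goal is to keep the nullable dtype for misaligned Series (this is not something proposed in the PR): align both Series explicitly with Series.align before the logical operation, so the operator sees matching indexes and takes the aligned path shown above.

```python
import pandas as pd

a = pd.Series([False], index=["a"], dtype=pd.BooleanDtype())
b = pd.Series([True], index=["b"], dtype=pd.BooleanDtype())

# Outer-join the indexes up front; labels missing from one side are filled with <NA>,
# and both Series keep the nullable "boolean" dtype.
a_aligned, b_aligned = a.align(b)

# The indexes now match, so the operator does not need to realign the operands.
result = a_aligned & b_aligned
print(result)
# a    False
# b     <NA>
# dtype: boolean
```

Because the missing labels become <NA> instead of being coerced, the result follows Kleene logic (False & <NA> is False, <NA> & True is <NA>), matching the aligned examples above.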

Member


None of those cases should be converting to object. For me they don't on main.

Contributor Author


None of those cases should be converting to object

The behavior changed on the main branch, and this warning exists only on v2.3.x to alert users to that change.

On the main branch they no longer convert to object, but on v2.3.x they do.

I really don't think that the behavior should change on a patch or minor release.

Member


Woops, didn't see this was going into 2.3.x, never mind.
