API: DataFrame.__and__ vs Series.__and__ fillna behavior mismatch #28466

Closed
jbrockmendel opened this issue Sep 16, 2019 · 5 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), Enhancement, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Needs Discussion (Requires discussion from core team before further action), Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@jbrockmendel
Member

Based on test_logical_ops_df_compat in tests.series.test_operators:

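import numpy as np
import pandas as pd
import pandas._testing as tm  # pandas' internal test helpers, as used in its test suite

# GH#1134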
s1 = pd.Series([True, False, True], index=list("ABC"))
s2 = pd.Series([True, True, False], index=list("ABD"))

exp_ser = pd.Series([True, False, False, False], index=list("ABCD"))
res_ser = s1 & s2
tm.assert_series_equal(res_ser, exp_ser)

# DataFrame doesn't fill nan with False
exp_frame = pd.DataFrame({"x": [True, False, np.nan, np.nan]}, index=list("ABCD"))
res_frame = s1.to_frame() & s2.to_frame()
tm.assert_frame_equal(res_frame, exp_frame)

Is there still a compelling reason to keep this mismatched behavior?

@TomAugspurger
Contributor

Which, if any, would you propose to accept?

Filling with NaN is consistent with other ops I suppose.

jorisvandenbossche added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) label Sep 17, 2019
@jorisvandenbossche
Member

jorisvandenbossche commented Sep 17, 2019

I don't think there is a good reason to keep the mismatch, but I also don't think there is a clear correct expected result right now, as we basically do not support (or have defined behaviour for) boolean data with missing values.

This relates to the discussion on missing values in #28095 and specifically what the behaviour of NA should be in logical operations.

The | operator is inconsistent as well, and if you follow the "three-valued logic" (mentioned in that issue), both the Series and the DataFrame behaviour are wrong.

And if you do it on the underlying numpy values, that just raises (that might actually be a safer default for now, making it easier to choose whatever behaviour we want later):

In [47]: s1 = pd.Series([True, False, True], index=list("ABC")) 
    ...: s2 = pd.Series([True, True, False], index=list("ABD")) 

In [50]: s1a, s2a = s1.align(s2)  

In [51]: s1a 
Out[51]: 
A     True
B    False
C     True
D      NaN
dtype: object

In [52]: s2a
Out[52]: 
A     True
B     True
C      NaN
D    False
dtype: object


In [53]: s1a & s2a   
Out[53]: 
A     True
B    False
C    False
D    False
dtype: bool

In [54]: s1a.values & s2a.values
...
TypeError: unsupported operand type(s) for &: 'bool' and 'float'

In [59]: s1a | s2a 
Out[59]: 
A     True
B     True
C     True
D    False
dtype: bool

In [60]: pd.DataFrame(s1a) | pd.DataFrame(s2a) 
Out[60]: 
      0
A  True
B  True
C   NaN
D   NaN
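For reference, the "three-valued logic" mentioned above can be sketched in plain Python. This is only an illustration, not pandas code: `None` stands in for a missing value, and the helper names `kleene_and` and `kleene_or` are made up for this example. The inputs are the aligned values of s1 and s2 for labels A to D.

```python
from typing import List, Optional

# None stands in for a missing value in this sketch (an assumption, not pandas API).
Tri = Optional[bool]


def kleene_and(a: Tri, b: Tri) -> Tri:
    # False dominates: the result is False even if the other operand is missing.
    if a is False or b is False:
        return False
    # Otherwise a missing operand makes the result unknown.
    if a is None or b is None:
        return None
    return True


def kleene_or(a: Tri, b: Tri) -> Tri:
    # True dominates: the result is True even if the other operand is missing.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


# Aligned values for labels A, B, C, D (NaN replaced by None).
left: List[Tri] = [True, False, True, None]
right: List[Tri] = [True, True, None, False]

and_result = [kleene_and(a, b) for a, b in zip(left, right)]
or_result = [kleene_or(a, b) for a, b in zip(left, right)]

print(and_result)  # [True, False, None, False]
print(or_result)   # [True, True, True, None]
```

Under these rules, Series & is off at C (False instead of missing) and DataFrame & is off at D (NaN instead of False); Series | is off at D (False instead of missing) and DataFrame | is off at C (NaN instead of True).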


jorisvandenbossche added the API Design and Needs Discussion (Requires discussion from core team before further action) labels Sep 17, 2019
@jbrockmendel
Member Author

@jorisvandenbossche are you suggesting anything in particular for the short-term to bring these into alignment?

@jorisvandenbossche
Member

Nope, except for raising an error instead, but that's probably a bit too drastic.

jbrockmendel added the API - Consistency (Internal Consistency of API/Behavior) label and removed the API Design label Dec 18, 2019
jbrockmendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Oct 11, 2020
@jbrockmendel
Member Author

Closing as a duplicate of #22724.
