Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate fill_value argument for df.add #32608

Open
WillAyd opened this issue Mar 11, 2020 · 9 comments
Open

Deprecate fill_value argument for df.add #32608

WillAyd opened this issue Mar 11, 2020 · 9 comments
Labels
Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@WillAyd
Copy link
Member

WillAyd commented Mar 11, 2020

This came up in the submission for #13488

This is a documented argument but strangely not universally implemented for all argument types. Both @jbrockmendel and I agreed it is probably better to deprecate and push users towards using something like what @jreback suggested in #13488 (comment)

@WillAyd WillAyd added the Deprecate Functionality to remove in pandas label Mar 11, 2020
@WillAyd WillAyd added this to the Contributions Welcome milestone Mar 11, 2020
@jorisvandenbossche
Copy link
Member

Some questions / possible counter-arguments:

  • It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well? (I suppose so, but not fully clear from your description) Then we are deprecating something that works fine and is documented, just because it is not implemented for Series.
  • The alternative that you link to is df.fillna(0).add(s). But, this is actually not fully equivalent (for better or worse, fill_value will keep the ouput NaN if both left and right are NaN)
  • You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

@jorisvandenbossche jorisvandenbossche added the Needs Discussion Requires discussion from core team before further action label Mar 11, 2020
@TomAugspurger
Copy link
Contributor

+1 to all of Joris' questions.

fill_value will keep the ouput NaN if both left and right are NaN)

FWIW, I've found this behavior useful in the past (if not immediately obvious).

@jbrockmendel
Copy link
Member

Strength of my opinion: I agree with Will, but wouldn't have brought it up on my own.

FWIW1: We have a total of 6 tests that pass fill_value to any of the DataFrame flex methods.

FWIW2: the place I've found this most troublesome is with complex dtypes, where numpy's behavior doesnt match the builtin complex. (xref #28050)

It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well?
You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

I think that is the idea, yes, on both.

fill_value will keep the ouput NaN if both left and right are NaN

Do we have an idea how often this is used? I agree that if a user wants this, there isnt a simple "do X instead".

@WillAyd
Copy link
Member Author

WillAyd commented Mar 11, 2020

  • It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well? (I suppose so, but not fully clear from your description) Then we are deprecating something that works fine and is documented, just because it is not implemented for Series.

Yea I was thinking across the board deprecation

  • The alternative that you link to is df.fillna(0).add(s). But, this is actually not fully equivalent (for better or worse, fill_value will keep the ouput NaN if both left and right are NaN)

Maybe @TomAugspurger can offer more insight, but what use cases are there for this? Would have never thought to use this as a quasi-XOR fillna

  • You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

Yes I think would make sense to do across the board if we wanted to do at all

@jbrockmendel

This comment has been minimized.

@osvill
Copy link

osvill commented Mar 18, 2020

Do we have an idea how often this is used?

I thought I'd jump in as I'm using this feature a lot at the moment. Maybe it is a very special case, but there it is important if operating (add, sub, mul, div) with two dataframes with same index and columns that the result distinguishes whether both values are NaN (resulting in NaN), only one (use fill_value) or neither.

On this topic I found the following behavior curious:
import pandas as pd
df = pd.DataFrame( {'a': [None, 1.0, 2.0], 'b': [None, None, 4.0]})
#THIS WORKS!
df.sub(df.rename(columns={'a':'b', 'b':'a'}), fill_value=0)
#THIS DOESN'T WORK!
df.sub(df['a'], axis=0, fill_value=0)

I know in the second example I subtract a pd.Series but I find it strange.

@jbrockmendel
Copy link
Member

@WillAyd shoould we add this to the agenda for tomorrow's call?

@WillAyd
Copy link
Member Author

WillAyd commented May 13, 2020 via email

@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Sep 17, 2020
@jbrockmendel
Copy link
Member

Another option is to enable the case that currently raises NotImplementedError (i.e. frame.add(series, fill_value=foo))

One way to do that would be to tile the series to the same shape as the frame and then use the frame path. Easy+cheap for non-EA dtypes. Would be the same for EA with 2D EAs. For EAs we can cobble together a non-copying solution for axis=1, but that doesnt work for axis=0.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

No branches or pull requests

6 participants