
ENH: nansum for all nan arrays #22764

Open
ustcfdm opened this issue Dec 9, 2022 · 2 comments

Comments

@ustcfdm

ustcfdm commented Dec 9, 2022

Proposed new feature or change:

When every element is NaN, nansum returns 0. In some cases, however, I want it to return NaN instead. Would it be possible to add an option that lets the user choose which value (0 or NaN) is returned? For example:

a = np.array([[np.nan, np.nan], [1,np.nan]])

np.nansum(a, axis=1)    # returns array([0., 1.])

# This is what I want (keep_nan is the proposed option; it does not exist yet)
np.nansum(a, axis=1, keep_nan=True)    # would return array([nan, 1.])

I need this because, when processing image data, I have to distinguish pixels with value 0 from pixels with value NaN. Zero means the pixel has a meaningful value, which happens to be 0; NaN means the pixel has no meaningful value.

I noticed that there has already been some discussion about this (#6549), i.e. whether nansum should return 0 or NaN. Since the question is controversial, I suggest giving users an option to choose which one they want.

@LeonaTaric
Contributor

LeonaTaric commented Dec 12, 2022

For now, you can do it like this:

>>> np.where(np.all(np.isnan(a), axis=1), np.nan, np.nansum(a, axis=1))
array([nan,  1.])

I also think adding a boolean flag to choose the behavior is a good idea.
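The workaround above can be wrapped in a small helper so it works for any axis. This is only a minimal sketch: the name `nansum_or_nan` is hypothetical and not part of NumPy, and it assumes floating-point input (integer arrays cannot hold NaN, so they are cast to float first).

```python
import numpy as np

def nansum_or_nan(a, axis=None):
    """Hypothetical helper (not a NumPy function): behaves like np.nansum,
    except that slices consisting only of NaN yield NaN instead of 0."""
    a = np.asarray(a, dtype=float)
    # True wherever every element along `axis` is NaN.
    all_nan = np.all(np.isnan(a), axis=axis)
    # Keep the regular nansum result elsewhere.
    return np.where(all_nan, np.nan, np.nansum(a, axis=axis))

a = np.array([[np.nan, np.nan], [1, np.nan]])
result = nansum_or_nan(a, axis=1)
print(result)  # first row is all NaN, so it stays NaN; second row sums to 1.0
```

Note that `np.where` always returns an array, so with `axis=None` the result is a 0-d array rather than a Python scalar.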

@seberg
Member

seberg commented Dec 12, 2022

I am not too deep into the nan functions. From a general perspective, for sum etc. (the non-NaN versions), what would make most sense to me would be default= which would complement/replace initial= with a slightly different meaning.

default= would only be used if there are no values along the axes. It would be tricky to implement for where=, though. At first sight such an addition seems fine, but possibly very niche.
For example, mean() naturally "distinguishes" the two cases, since it divides by the number of non-NaN values.
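To make the idea concrete, here is a rough sketch of what a `default=` fallback could look like when emulated on top of the existing nan functions. The helper `nansum_with_default` and its `default` parameter are assumptions made for illustration; no such argument exists in NumPy today, and "no values" is interpreted here as "no non-NaN values along the axis". The last part shows how `np.nanmean` already separates the two cases.

```python
import warnings
import numpy as np

def nansum_with_default(a, axis=None, default=0.0):
    """Hypothetical emulation of a `default=` argument: `default` is used
    only where there are no non-NaN values along `axis`."""
    a = np.asarray(a, dtype=float)
    # Number of usable (non-NaN) values in each slice.
    valid_count = np.sum(~np.isnan(a), axis=axis)
    return np.where(valid_count == 0, default, np.nansum(a, axis=axis))

a = np.array([[np.nan, np.nan], [1, np.nan]])

sums_zero = nansum_with_default(a, axis=1)                 # same as np.nansum
sums_nan = nansum_with_default(a, axis=1, default=np.nan)  # all-NaN row -> NaN

# np.nanmean divides by the count of non-NaN values, so an all-NaN slice
# gives NaN (with a RuntimeWarning, silenced here) rather than 0.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    means = np.nanmean(a, axis=1)
```

Unlike `initial=`, which takes part in every sum, the fallback in this sketch never affects slices that contain at least one valid value.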
