
numpy.nanmean() does not skip nan±… or …±nan #59

Open
ricleal opened this issue Sep 7, 2016 · 6 comments

Comments

@ricleal

ricleal commented Sep 7, 2016

Hello!

First of all, great piece of work! It's saving me a lot of time :)

I'm having issues with numpy.nanmean, which should ignore nan values when calculating the mean.

Here is some test code:

from uncertainties import unumpy
import numpy as np
v = np.arange(16,dtype=np.float64)
e = np.sqrt(v)
v[1:3] = np.nan
print(v)
print(np.isnan(v[1:3]))
un = unumpy.uarray(v,e)
print(un)
print(un.mean())
print(np.nanmean(un))
print(v.mean())
print(np.nanmean(v))

Here is the output:

[  0.  nan  nan   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.]
[ True  True]
[0.0+/-0 nan+/-1.0 nan+/-1.4142135623730951 3.0+/-1.7320508075688772
 4.0+/-2.0 5.0+/-2.23606797749979 6.0+/-2.449489742783178
 7.0+/-2.6457513110645907 8.0+/-2.8284271247461903 9.0+/-3.0
 10.0+/-3.1622776601683795 11.0+/-3.3166247903554 12.0+/-3.4641016151377544
 13.0+/-3.605551275463989 14.0+/-3.7416573867739413
 15.0+/-3.872983346207417]
nan+/-0.6846531968814576
nan+/-0.6846531968814576
nan
8.35714285714

From the output, you can see that both mean and nanmean return nan+/-error. I'd say that the latter should return the mean, ignoring the nan values.

I hope you can help with that!
Thanks

@lebigot
Collaborator

lebigot commented Sep 8, 2016

Thanks.

Strictly speaking, this is the expected behavior: nan±… is not nan, and NumPy skips nan (only).

Now, unumpy.isnan() works as you want and could be used as a mask, or for boolean indexing.

I will check whether there is any way to make NumPy understand that nan±… should be treated like nan by nanmean().

@rth
Contributor

rth commented Sep 8, 2016

Wouldn't it be preferable to make ufloat(np.nan, 2) return a np.nan directly? As nan+/-2.0 doesn't really make sense anyway (same as 2.0+/-nan)?

@lebigot
Collaborator

lebigot commented Sep 8, 2016

The general idea of never producing nan±… but producing nan instead seems reasonable, since we basically have no information about the number (with uncertainty) in question. Implementing this goes beyond changing the creation of nan±… with ufloat(), as there are many other ways of creating a number with uncertainty. I guess that this is quite doable, though. So, something to be implemented, probably.

±inf±… seems like it could be handled in a similar way.

Now, I would have to think about 2±nan a bit more: the nominal value is still relevant (it is the same as in a calculation without uncertainties), and the nan just shows that calculating the uncertainty with linear error propagation theory does not give a good result. The mean of numbers that include this one could thus have a relevant nominal value, with an uncertainty of nan indicating that the uncertainty is not to be trusted. That is an important piece of information, but it does not invalidate the relevance of the nominal value.

@lebigot lebigot changed the title np.nanmean numpy.nanmean() does not skip nan±… or …±nan Feb 14, 2017
@thriveth

First, thanks a lot for this extremely useful module!

I have just been playing around with this, and discovered that if I convert all occurrences of nan+/-nan to simply be NaN, and then run np.nanmean(), I get values of nan+/-23.4 etc.

So apparently, there is no way to do a nanmean with uncertainties...?

@lebigot
Collaborator

lebigot commented Jan 13, 2018

Thanks!

It is actually possible to compute a NaN-mean even when you are using uncertainties. With

>>> import uncertainties as unc
>>> from uncertainties import unumpy
>>> import numpy as np

>>> nan = float("nan")
>>> arr = np.array([nan, unc.ufloat(nan, 1), unc.ufloat(1, nan), 2])
>>> arr
array([nan, nan+/-1.0, 1.0+/-nan, 2], dtype=object)

you can get the NaN-mean by selecting only the values with a non-NaN nominal value:

>>> arr[~unumpy.isnan(arr)].mean()
1.5+/-nan

or more directly by asking NumPy to skip them:

>>> np.ma.array(arr, mask=unumpy.isnan(arr))
masked_array(data=[--, --, 1.0+/-nan, 2],
             mask=[ True,  True, False, False],
       fill_value='?',
            dtype=object)
>>> _.mean()
1.5+/-nan

In this case the uncertainty is NaN as it should be, because one of the numbers does have an undefined uncertainty, which makes the final uncertainty undefined (but not the average). In general, uncertainties are not NaN and you obtain the mean of the non-NaN values.

(Edited to reflect the fact that the uncertainties module already provides uncertainties.umath.isnan() and uncertainties.unumpy.isnan().)

@lebigot
Collaborator

lebigot commented Jan 13, 2018

PS: I added all the information (and more) from my post above to the documentation: http://uncertainties-python-package.readthedocs.io/en/latest/genindex.html#N. Thank you for your feedback!
