
numpy.nanmean() does not skip nan±… or …±nan #59

Open
ricleal opened this issue Sep 7, 2016 · 6 comments

Comments

@ricleal

ricleal commented Sep 7, 2016

Hello!

First of all, great piece of work! It's saving me a lot of time :)

I'm having issues with numpy.nanmean, which should ignore nan values when calculating the mean.

Here is some test code:

from uncertainties import unumpy
import numpy as np
v = np.arange(16,dtype=np.float64)
e = np.sqrt(v)
v[1:3] = np.nan
print(v)
print(np.isnan(v[1:3]))
un = unumpy.uarray(v,e)
print(un)
print(un.mean())
print(np.nanmean(un))
print(v.mean())
print(np.nanmean(v))

Here is the output:

[  0.  nan  nan   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.]
[ True  True]
[0.0+/-0 nan+/-1.0 nan+/-1.4142135623730951 3.0+/-1.7320508075688772
 4.0+/-2.0 5.0+/-2.23606797749979 6.0+/-2.449489742783178
 7.0+/-2.6457513110645907 8.0+/-2.8284271247461903 9.0+/-3.0
 10.0+/-3.1622776601683795 11.0+/-3.3166247903554 12.0+/-3.4641016151377544
 13.0+/-3.605551275463989 14.0+/-3.7416573867739413
 15.0+/-3.872983346207417]
nan+/-0.6846531968814576
nan+/-0.6846531968814576
nan
8.35714285714

From the output, you can see that both mean and nanmean return nan+/-error. I'd say that the latter should return the mean, ignoring the nan values.

I hope you can help with that!
Thanks

@lebigot
Collaborator

lebigot commented Sep 8, 2016

Thanks.

Strictly speaking, this is the expected behavior: nan±… is not nan, and NumPy skips nan (only).

Now, unumpy.isnan() works as you want and could be used as a mask, or for boolean indexing.

I will check whether there is any way to make NumPy understand that nan±… should be treated like nan by nanmean().

@rth
Contributor

rth commented Sep 8, 2016

Wouldn't it be preferable to make ufloat(np.nan, 2) return a np.nan directly? As nan+/-2.0 doesn't really make sense anyway (same as 2.0+/-nan)?

@lebigot
Collaborator

lebigot commented Sep 8, 2016

The general idea of never producing nan±… but producing nan instead seems reasonable, since we basically have no information about the number (with uncertainty) in question. Implementing this goes beyond changing the creation of nan±… with ufloat(), as there are many other ways of creating a number with uncertainty. I guess that this is quite doable, though. So, something to be implemented, probably.

±inf±… seems like it could be handled in a similar way.

Now, I would have to think about 2±nan a bit more: the nominal value is still relevant (it is the same as in a calculation without uncertainties), and the nan just shows that calculating the uncertainty with linear error propagation theory does not give a good result. The mean of numbers that include this one could thus have a relevant nominal value, with an uncertainty of nan indicating that the uncertainty is not to be trusted. That is an important piece of information, but it does not invalidate the relevance of the nominal value.

@lebigot lebigot changed the title np.nanmean numpy.nanmean() does not skip nan±… or …±nan Feb 14, 2017
@thriveth

First, thanks a lot for this extremely useful module!

I have just been playing around with this, and discovered that if I convert all occurrences of nan+/-nan to simply be NaN, and then run np.nanmean(), I get values of nan+/-23.4 etc.

So apparently, there is no way to do a nanmean with uncertainties...?

@lebigot
Collaborator

lebigot commented Jan 13, 2018

Thanks!

It is actually possible to compute a NaN-mean even when you are using uncertainties. With

>>> import uncertainties as unc
>>> from uncertainties import unumpy
>>> import numpy as np

>>> nan = float("nan")
>>> arr = np.array([nan, unc.ufloat(nan, 1), unc.ufloat(1, nan), 2])
>>> arr
array([nan, nan+/-1.0, 1.0+/-nan, 2], dtype=object)

you can get the NaN-mean by selecting only the values with a non-NaN nominal value:

>>> arr[~unumpy.isnan(arr)].mean()
1.5+/-nan

or more directly by asking NumPy to skip them:

>>> np.ma.array(arr, mask=unumpy.isnan(arr))
masked_array(data=[--, --, 1.0+/-nan, 2],
             mask=[ True,  True, False, False],
       fill_value='?',
            dtype=object)
>>> _.mean()
1.5+/-nan

In this case the uncertainty is NaN as it should be, because one of the numbers does have an undefined uncertainty, which makes the final uncertainty undefined (but not the average). In general, uncertainties are not NaN and you obtain the mean of the non-NaN values.

(Edited to reflect the fact that the uncertainties module already provides uncertainties.umath.isnan() and uncertainties.unumpy.isnan().)

@lebigot
Collaborator

lebigot commented Jan 13, 2018

PS: I added all the information (and more) from my post above to the documentation: http://uncertainties-python-package.readthedocs.io/en/latest/genindex.html#N. Thank you for your feedback!
