
ENH: add handling of NaN in RuntimeWarning except clause #10140

Merged (4 commits) on Jul 3, 2019

Conversation

oleksandr-pavlyk (Contributor)

In the Intel Distribution for Python, numpy.log raises a RuntimeWarning when the input contains np.nan, and gstd in that case returns None. This happens on macOS.

```
import numpy as np
from scipy.stats import gstd

a = np.array([[1, 1, 1, 16], [np.nan, 1, 2, 3]])
print(gstd(a, axis=1))  # prints None, instead of np.array([4., np.nan])
```

Also, when it does, np.less_equal throws another warning when processing np.nan:

```
In [4]: gstd(np.array([[1, 1, 1, 16], [np.nan, 1, 2, 3]]), axis=1)
env_root/t_scipy_rc/bin/ipython:23: RuntimeWarning: invalid value encountered in less_equal
env_root/t_scipy_rc/bin/ipython:31: RuntimeWarning: invalid value encountered in log
```

@WarrenWeckesser (Member)

Thanks! But in scipy functions, we generally don't ignore nan by default, and doing so here creates an inconsistent API. The correct behavior here is that, for example, gstd([1, 2, np.nan]) returns nan.

We have been adding a parameter called nan_policy to many of the functions to control how nan is handled. nan is ignored when nan_policy is set to "omit".
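As an aside, a minimal sketch of the nan_policy convention, using scipy.stats.skew, which already takes the parameter (gstd itself did not accept nan_policy at the time of this PR):

```
import numpy as np
from scipy import stats

a = np.array([1.0, 2.0, np.nan, 4.0])

print(stats.skew(a))                      # nan: the default policy is 'propagate'
print(stats.skew(a, nan_policy='omit'))   # nan is ignored, statistic computed on the rest
# stats.skew(a, nan_policy='raise')       # would raise a ValueError
```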

@oleksandr-pavlyk (Contributor, Author)

@WarrenWeckesser Sure, my changes do not ignore them; they simply make TestGeometricStandardDeviation::test_propogates_nan_values pass.

@WarrenWeckesser (Member)

@oleksandr-pavlyk, ah right. I read the change too quickly.

@pvanmulbregt (Contributor) left a comment

I think that the treatment of nan should go after the treatment of values less than 0 and the check on ddof. Otherwise the type of exception raised for negative data depends on whether a nan is also present in the data.

@Kai-Striega (Member) commented May 9, 2019

@oleksandr-pavlyk thanks for taking the time to bring this up.

The warning occurs in np.less_equal(a, 0), so if we are to avoid the warning we must check for nan before it is called. @pvanmulbregt is correct that doing the nan check first will introduce new issues w.r.t. negatives/ddof, so I agree that's probably not the best way forward.

np.less_equal's handling of nan values is an open issue on NumPy (see: numpy/numpy#8945) where the warning is treated as a bug. Looking at the referenced issues it seems to be platform/version dependent and difficult to reproduce. I'm not really sure how we should go forward from here.

Also possibly related: numpy/numpy#9513, numpy/numpy#8230
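For anyone trying to reproduce this, a minimal sketch; whether the comparison actually warns depends on the platform and NumPy build, per the issues above:

```
import warnings
import numpy as np

a = np.array([np.nan, 1.0, 2.0, 3.0])

with warnings.catch_warnings():
    # Escalate RuntimeWarning to an error so a warning platform is easy to spot.
    warnings.simplefilter('error', RuntimeWarning)
    try:
        np.less_equal(a, 0).any()
    except RuntimeWarning as w:
        print('less_equal warned:', w)  # e.g. invalid value encountered in less_equal
```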

In the Intel Distribution for Python, numpy.log raises a RuntimeWarning when the input contains np.nan, and gstd in that case returns None.

```
import numpy as np
from scipy.stats import gstd

a = np.array([[1, 1, 1, 16], [np.nan, 1, 2, 3]])
print(gstd(a, axis=1)) # prints None, instead of np.array([4., np.nan])
```
Also, the last clause (warning is harmless) was returning None (I suspect this was unintentional), so I added a return of the actual computation.
@oleksandr-pavlyk (Contributor, Author)

@Kai-Striega @pvanmulbregt I have made a change to address your concerns. Please look it over.

```
a_nan = np.isnan(a)
a_nan_any = a_nan.any()
if a_nan.any():
    return np.exp(np.std(log(a, where=~a_nan), axis=axis, ddof=ddof))
```
Member

The check for nan should go after the less equal 0 check. Otherwise the type of exception for negative data will depend on whether a nan is also present.

```
a_nan_any = a_nan.any()
if a_nan.any():
    return np.exp(np.std(log(a, where=~a_nan), axis=axis, ddof=ddof))
elif ((a_nan_any and np.less_equal(a[~a_nan], 0).any()) or
```
Member

With the above if statement, a_nan_any will always be False when the elif is reached, so the condition becomes:

```
(False and np.less_equal(a[~a_nan], 0).any()) or (not False and np.less_equal(a, 0).any())
<==> (False) or (np.less_equal(a, 0).any())
<==> np.less_equal(a, 0).any()
```

Without the above if statement it looks like a good solution.

@oleksandr-pavlyk (Contributor, Author)

I am not sure I follow. The conditional reads:

```
((a_nan_any and np.less_equal(a[~a_nan], 0).any()) or
 (not a_nan_any and np.less_equal(a, 0).any()))
```

If a_nan_any is True, we apply the mask selector and only compare the non-nan numbers with zero.

If a_nan_any is False, we avoid applying the mask (and the associated extra copying) and compare all elements with zero.

I thought, however, that I had removed the previous if statement, but evidently not.

@oleksandr-pavlyk (Contributor, Author)

I pushed the change deleting the leading if branch.

Member

I could have made that clearer.

The code goes something like this:

```
a_nan = np.isnan(a)
a_nan_any = a_nan.any()
if a_nan.any():
    # do stuff....
# If we get to here, a_nan.any() is False.
# As a_nan_any = a_nan.any(), a_nan_any will also be False.
elif ((a_nan_any and np.less_equal(a[~a_nan], 0).any()) or
      (not a_nan_any and np.less_equal(a, 0).any())):
    # handle negativity check
```

If a_nan.any() is True the first block is run, so, to reach the second statement, a_nan.any() must be False. As a_nan_any = a_nan.any(), a_nan_any must also be False, otherwise the second would not be reached.

@oleksandr-pavlyk (Contributor, Author)

Yes, thank you. I realized that moments after I posted the comment, and pushed changes removing the `if a_nan_any:` branch of the if statement.

```
    raise ValueError(
        'Non positive value encountered. The geometric standard '
        'deviation is defined for strictly positive values only.')
elif 'Degrees of freedom <= 0 for slice' == str(w):
    raise ValueError(w)
else:
    # Remaining warnings don't need to be exceptions.
    warnings.warn(w)
return np.exp(np.std(log(a, where=~a_nan), axis=axis, ddof=ddof))
```
Member

Just a quick confirmation: this avoids log dealing with the nan value that caused the spurious warning we're trying to avoid.

@oleksandr-pavlyk (Contributor, Author)

Yes.

Note that in current master, having reached this branch, we reissue the warning and return None, which is unexpected. I was expecting the result of gstd instead.

With this change, with or without the use of where, an actual result is returned.
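As a minimal illustration of the where= mechanism being relied on here (with an explicit out= buffer added for clarity; the call in the PR does not pass out=), np.log is only evaluated at the unmasked positions, so the nan entries never reach the ufunc and cannot trigger the warning:

```
import numpy as np

a = np.array([np.nan, 1.0, 2.0, 3.0])
mask = ~np.isnan(a)

# log is computed only where mask is True; the masked positions keep
# whatever is in the out buffer (initialised to nan here).
out = np.full_like(a, np.nan)
np.log(a, where=mask, out=out)
print(out)   # [       nan 0.         0.69314718 1.09861229]
```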

@Kai-Striega (Member)

@oleksandr-pavlyk I'm happy with the PR now.

@pvanmulbregt would you please take a look at the PR again?

```
a_nan = np.isnan(a)
a_nan_any = a_nan.any()
if ((a_nan_any and np.less_equal(a[~a_nan], 0).any()) or
        (not a_nan_any and np.less_equal(a, 0).any())):
```
Contributor

Do the two cases (a contains NaNs, a does not contain NaNs) need to be treated separately in the if clause? I.e., is the following snippet equivalent? (It would also remove the need for the a_nan_any variable.)

```
a_nan = np.isnan(a)
if np.less_equal(a[~a_nan], 0).any():
    raise ValueError(
        'Non positive value encountered. The geometric standard '
        'deviation is defined for strictly positive values only.')
```

@oleksandr-pavlyk (Contributor, Author)

Yes, for efficiency's sake.

```
In [1]: import numpy as np

In [3]: a = np.take([1.0, 2.0, 3.0], np.random.choice(3, 10**6))

In [4]: %timeit np.less(a, 0).any()
418 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: a_nan = np.isnan(a)

In [7]: %timeit np.less(a[~a_nan], 0).any()
2.86 ms ± 89.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

If there are no nans present, we do not want to run mask selection.

Contributor

Thanks. The difference in time is small when a has a thousand elements but is quite substantial by the time a has a million elements. Can you add a comment in the code indicating that's why the calculation is being structured in this manner?

@oleksandr-pavlyk (Contributor, Author)

It is actually even better to avoid computing np.less(a[~a_nan], 0) at all, and do
np.less(np.nanmin(a), 0) instead:

```
In [6]: a = np.take([1.0, np.nan, 3.0], np.random.choice(3, 10**6))

In [7]: %timeit np.less(np.nanmin(a), 0)
4.75 ms ± 9.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: a_nan = np.isnan(a)

In [9]: %timeit np.less(a[~a_nan], 0).any()
8.23 ms ± 7.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: (np.less(a[~a_nan], 0).any(), np.less(np.nanmin(a), 0))
Out[10]: (False, False)
```

But np.nanmin is a little slower than bare np.less for an array of ordinary (non-nan) reals:

```
In [11]: a = np.take([1.0, 2.0, 3.0], np.random.choice(3, 10**6))

In [12]: %timeit np.less(np.nanmin(a), 0)
777 µs ± 922 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [13]: %timeit np.less(a, 0).any()
429 µs ± 363 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
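For concreteness, a minimal sketch of what a nanmin-based check could look like (a hypothetical helper, not the exact code that was merged; the gstd check uses less-equal-zero semantics, so zero is rejected as well):

```
import numpy as np

def _check_strictly_positive(a):
    # Hypothetical helper. np.nanmin ignores nan entries, so no boolean mask
    # or copy of `a` is needed, and the comparison never sees a nan, which is
    # what triggered the "invalid value encountered in less_equal" warning.
    if np.less_equal(np.nanmin(a), 0):
        raise ValueError(
            'Non positive value encountered. The geometric standard '
            'deviation is defined for strictly positive values only.')

_check_strictly_positive(np.array([1.0, np.nan, 3.0]))   # passes silently
```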

Contributor

I like that.
np.less_equal(a, 0).any() warns if it encounters np.nan, and expecting less_equal to correctly decide that np.nan <= 0 is False may be asking a bit much on all platforms.
np.nanmin(a) avoids the warning. It's definitely slower than np.min(a), so separating the two paths seems reasonable, even in an except block, as an array containing np.nan may be quite a common occurrence.
[This does raise the question as to which arrays might generate a RuntimeWarning yet contain no negative or nan values, and hence get computed a second time at the end of the except block with the same arguments as the first time.]

…nan], 0).any() since the former is about twice faster
@Kai-Striega (Member)

As @pvanmulbregt and I have approved the changes, I'm happy to merge this unless anyone voices any other objections.

@Kai-Striega merged commit 1238d2d into scipy:master on Jul 3, 2019
@Kai-Striega (Member)

Merged. Thanks, @oleksandr-pavlyk.

@pv added this to the 1.4.0 milestone on Jul 8, 2019