Throw error on non-finite input for binned_statistic_dd() #10664

rlucas7 · 2019-08-15T03:58:31Z

Reference issue

(partially) Closes gh-9010

What does this implement/fix?

Check whether the input contains non-finite data; if any np.nan or np.inf is present, raise a ValueError.
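
A minimal sketch of the kind of check described above, assuming NumPy is available and the input can be converted to a float array. The helper name `_raise_if_non_finite` is hypothetical and is not the code in this PR; it only illustrates how `np.isfinite` flags both `np.nan` and `np.inf`.

```python
import numpy as np


def _raise_if_non_finite(arr, name):
    """Raise ValueError if `arr` contains nan or +/-inf (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    if not np.isfinite(arr).all():
        raise ValueError("%r contains non-finite values." % name)
    return arr


# Finite input passes through unchanged.
checked = _raise_if_non_finite([1.0, 2.0, 3.0], "sample")

# nan, +inf and -inf are all rejected.
rejected = []
for bad in ([1.0, np.nan], [1.0, np.inf], [-np.inf, 0.0]):
    try:
        _raise_if_non_finite(bad, "sample")
    except ValueError:
        rejected.append(True)
    else:
        rejected.append(False)
```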

Additional information

There is a possibility that the dedges.min() here:
https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L583
is a 0.0 value. However, I've been trying for a couple of hours to construct data that triggers that scenario, and it seems rather difficult. If someone has a reproducing example I'm happy to add it to the PR, but at this time I cannot generate one. I'm opening this PR sooner to get the change in front of people in case they have suggestions or another edge case to add.

rlucas7 · 2019-08-15T15:00:03Z

It looks like the Travis builds failed for Python 3.5 and 3.6. Two of the failures look like flaky tests. One shows the build failing the test that I add in this PR. This test passes on my local machine, so I'm not sure what the issue is here.

scipy/stats/tests/test_binned_statistic.py

rlucas7 · 2019-08-22T03:36:08Z

@peterbell10 I got a chance to look at the 2 failing builds tonight. Can you read my comment here and let me know if this makes sense to you? (Please let me know if you have a better idea).

One of the 2 looks like something flaky that might resolve on a restart. However, the second one is a failing doctest in the doc for _binned_statistic_dd. When I try to run it on my laptop I also get a failure (abort trap 6), with a long stack trace. It looks like there may be some error in the matplotlib plotting in the docstring, and some doctests are not passing because of abort trap 6 issues. Here is the stack trace:

(scipy-dev) Lucass-MacBook:scipy rlucas$ python -m doctest scipy/stats/_binned_statistic.py
**********************************************************************
File "scipy/stats/_binned_statistic.py", line 108, in _binned_statistic.binned_statistic
Failed example:
    stats.binned_statistic([1, 1, 2, 5, 7], values, 'sum', bins=2)
Expected:
    (array([ 4. ,  4.5]), array([ 1.,  4.,  7.]), array([1, 1, 1, 2, 2]))
Got:
    BinnedStatisticResult(statistic=array([4. , 4.5]), bin_edges=array([1., 4., 7.]), binnumber=array([1, 1, 1, 2, 2]))
**********************************************************************
File "scipy/stats/_binned_statistic.py", line 115, in _binned_statistic.binned_statistic
Failed example:
    stats.binned_statistic([1, 1, 2, 5, 7], values, 'sum', bins=2)
Expected:
    (array([[ 4. ,  4.5], [ 8. ,  9. ]]), array([ 1.,  4.,  7.]),
        array([1, 1, 1, 2, 2]))
Got:
    BinnedStatisticResult(statistic=array([[4. , 4.5],
           [8. , 9. ]]), bin_edges=array([1., 4., 7.]), binnumber=array([1, 1, 1, 2, 2]))
**********************************************************************
File "scipy/stats/_binned_statistic.py", line 119, in _binned_statistic.binned_statistic
Failed example:
    stats.binned_statistic([1, 2, 1, 2, 4], np.arange(5), statistic='mean',
                           bins=3)
Expected:
    (array([ 1.,  2.,  4.]), array([ 1.,  2.,  3.,  4.]),
        array([1, 2, 1, 2, 3]))
Got:
    BinnedStatisticResult(statistic=array([1., 2., 4.]), bin_edges=array([1., 2., 3., 4.]), binnumber=array([1, 2, 1, 2, 3]))
2019-08-21 23:15:29.732 python[19348:1050799] -[NSApplication _setup:]: unrecognized selector sent to instance 0x7fbecb8f2280
2019-08-21 23:15:29.736 python[19348:1050799] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSApplication _setup:]: unrecognized selector sent to instance 0x7fbecb8f2280'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff32d4132b __exceptionPreprocess + 171
	1   libobjc.A.dylib                     0x00007fff5a3cac76 objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff32dd9e04 -[NSObject(NSObject) doesNotRecognizeSelector:] + 132
	3   CoreFoundation                      0x00007fff32cb7870 ___forwarding___ + 1456
	4   CoreFoundation                      0x00007fff32cb7238 _CF_forwarding_prep_0 + 120
	5   libtk8.6.dylib                      0x000000011ffc231d TkpInit + 413
	6   libtk8.6.dylib                      0x000000011ff1a17e Initialize + 2622
	7   _tkinter.cpython-36m-darwin.so      0x000000011d3c9a16 _tkinter_create + 1174
	8   python                              0x000000010aea1088 _PyCFunction_FastCallDict + 200
	9   python                              0x000000010af77f4f call_function + 143
	10  python                              0x000000010af75abf _PyEval_EvalFrameDefault + 46847
	11  python                              0x000000010af69209 _PyEval_EvalCodeWithName + 425
	12  python                              0x000000010af78b1c _PyFunction_FastCallDict + 364
	13  python                              0x000000010ae1f8b0 _PyObject_FastCallDict + 320
	14  python                              0x000000010ae46fe8 method_call + 136
	15  python                              0x000000010ae26efe PyObject_Call + 62
	16  python                              0x000000010aec8385 slot_tp_init + 117
	17  python                              0x000000010aecc8c1 type_call + 241
	18  python                              0x000000010ae1f821 _PyObject_FastCallDict + 177
	19  python                              0x000000010ae27a67 _PyObject_FastCallKeywords + 327
	20  python                              0x000000010af78048 call_function + 392
	21  python                              0x000000010af75b6f _PyEval_EvalFrameDefault + 47023
	22  python                              0x000000010af7830c fast_function + 188
	23  python                              0x000000010af77fac call_function + 236
	24  python                              0x000000010af75abf _PyEval_EvalFrameDefault + 46847
	25  python                              0x000000010af69209 _PyEval_EvalCodeWithName + 425
	26  python                              0x000000010af78b1c _PyFunction_FastCallDict + 364
	27  python                              0x000000010ae1f8b0 _PyObject_FastCallDict + 320
	28  python                              0x000000010ae46fe8 method_call + 136
	29  python                              0x000000010ae26efe PyObject_Call + 62
	30  python                              0x000000010af75cc0 _PyEval_EvalFrameDefault + 47360
	31  python                              0x000000010af69209 _PyEval_EvalCodeWithName + 425
	32  python                              0x000000010af783ba fast_function + 362
...
	66  python                              0x000000010afe8946 RunModule + 182
	67  python                              0x000000010afe7ad4 Py_Main + 3076
	68  python                              0x000000010ae17929 main + 313
	69  libdyld.dylib                       0x00007fff5afe4015 start + 1
	70  ???                                 0x0000000000000004 0x0 + 4
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6
(scipy-dev) Lucass-MacBook:scipy rlucas$

The way I was able to get it to pass (locally) was to pass skip directives to the plotting commands. Example hunk:

@@ -129,11 +126,11 @@ def binned_statistic(x, values, statistic='mean',
     >>> boatspeed = .3 * windspeed**.5 + .2 * np.random.rand(500)
     >>> bin_means, bin_edges, binnumber = stats.binned_statistic(windspeed,
     ...                 boatspeed, statistic='median', bins=[1,2,3,4,5,6,7])
-    >>> plt.figure()
-    >>> plt.plot(windspeed, boatspeed, 'b.', label='raw data')
+    >>> plt.figure() # doctest: +SKIP
+    >>> plt.plot(windspeed, boatspeed, 'b.', label='raw data') # doctest: +SKIP
     >>> plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], colors='g', lw=5,
-    ...            label='binned statistic of data')
-    >>> plt.legend()
+    ...            label='binned statistic of data') # doctest: +SKIP
+    >>> plt.legend() # doctest: +SKIP
 
     Now we can use ``binnumber`` to select all datapoints with a windspeed
     below 1:

As I recall, #9785 tried that and it didn't solve the issue.
Also, several of the doctests fail locally, mostly because the return type isn't a plain tuple anymore. The return types are now namedtuple() instances, but the doctests haven't been updated since 2015, which is when git blame says the change was implemented.
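
A minimal sketch of why those doctests fail, using only the standard library. `Result`, `compute`, and `compute_updated` are made-up stand-ins, not SciPy code. A namedtuple compares equal to a plain tuple, but doctest compares the printed repr, so an expected output written for a plain tuple no longer matches.

```python
import doctest
from collections import namedtuple

# Stand-in for the result type; the real BinnedStatisticResult lives in SciPy.
Result = namedtuple("Result", ["statistic", "bin_edges"])


def compute():
    """Return a namedtuple, while the docstring still expects a plain tuple.

    >>> compute()
    (4.0, [1.0, 4.0, 7.0])
    """
    return Result(4.0, [1.0, 4.0, 7.0])


def compute_updated():
    """Same function with the expected output copied from the actual repr.

    >>> compute_updated()
    Result(statistic=4.0, bin_edges=[1.0, 4.0, 7.0])
    """
    return Result(4.0, [1.0, 4.0, 7.0])


def count_failures(func):
    """Run the doctests in func's docstring and return how many fail."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report so only the counts are used.
    return runner.run(test, out=lambda text: None).failed


stale_failures = count_failures(compute)            # stale expected output
updated_failures = count_failures(compute_updated)  # expected output refreshed
```

Here the stale docstring produces one failure and the refreshed one produces none, which mirrors the Expected/Got pairs in the output above.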

Were the Travis builds changed recently to run the doctests?

I can change the doctests in this file too but would like a second pair of eyes to confirm what I'm proposing makes sense (and to include in this PR) before I go to make those changes to the namedtuple and the skip directives for the matplotlib plots, unless you have a better solution for the matplotlib issues.

ev-br · 2019-08-22T04:04:16Z

Just copy-paste the doctest output into the doctests. This can in principle be fixed in the refguide-checker, but updating the example is simpler.
(The other fix would be to update the regex https://github.com/scipy/scipy/blob/master/tools/refguide_check.py#L596 to make it understand multiple arrays in namedtuples)

rlucas7 · 2019-08-22T11:11:34Z

That will fix the issues with the output changing to namedtuple but will not fix the abort trap 6 issue with matplotlib.

rlucas7 · 2019-08-23T02:58:13Z

Just copy-paste the doctest output into the doctests.

I did that and this fixed the namedtuple output changes. Updating my conda virtual env via:
conda update matplotlib brought me from:

The following packages will be UPDATED:

  libpng                                  1.6.35-ha441bb4_0 --> 1.6.37-ha441bb4_0
  matplotlib                           3.0.2-py36h54f8f79_0 --> 3.1.0-py36h54f8f79_0

which resolved the abort trap 6 error. However, when I omit the # doctest: +SKIP directives (locally) on lines that have matplotlib plotting commands the doctests still fail. I'm not sure if this is an issue with my development env being improperly configured or if something else. Any guidance on the dev setup for the doctests to pass w/matplotlib plots is appreciated.
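
For reference, a small standard-library sketch of what the `# doctest: +SKIP` directive does. The function `open_a_gui_window` is deliberately undefined and only stands in for a plotting call that cannot run in a given environment; it is not taken from the SciPy docstrings.

```python
import doctest

# `open_a_gui_window` is deliberately undefined: it stands in for a plotting
# call that cannot run in a headless or misconfigured environment.
DOCSTRING = """
>>> total = 1 + 2
>>> total
3
>>> open_a_gui_window()  # doctest: +SKIP
>>> total * 2
6
"""


def run(docstring):
    """Run the examples in `docstring` and return doctest's TestResults."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


skipped_run = run(DOCSTRING)
unskipped_run = run(DOCSTRING.replace("  # doctest: +SKIP", ""))
```

With the directive in place the problematic line is never executed and nothing fails; with it removed, that one example fails while the others still pass.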

I'll push the changes to the PR omitting the skip directive and see if that fixes the issue in the CI build.

rlucas7 · 2019-08-23T18:30:34Z

@ev-br thanks for your help. I adjusted the docstring/doctests, and that seems to resolve the build issue(s).
The matplotlib abort trap 6 issue was resolved by updating to 3.1.0 in my conda virtual env.

rgommers · 2019-08-25T19:47:25Z

However, when I omit the # doctest: +SKIP directives (locally) on lines that have matplotlib plotting commands the doctests still fail. I'm not sure if this is an issue with my development env being improperly configured or if something else. Any guidance on the dev setup for the doctests to pass w/matplotlib plots is appreciated.

I ran into an issue as well with latest matplotlib and binned_statistic_dd, see gh-10720. Probably doesn't help for you looking at the above failures, but something is fishy it seems.

rlucas7 · 2019-08-27T00:11:53Z

Thanks Ralf, I don’t think it helps either, good to know though thanks. I see the test harness in #10724 is getting a doc build/test option. I’ll update my local repo and see if that fixes the issue. Given that the Travis builds are fine I’m guessing this is some sort of local setup that is mismatched with Travis.

rgommers · 2019-10-21T11:38:29Z

this needs a rebase now

MAINT: throw value error in binned_statistic_dd for non-finite DOC: update doctest examples with namedtuple returns

rlucas7 · 2019-10-22T01:11:45Z

Always a little scary doing a rebase :)
@rgommers I rebased off 81994b1d46ab5ec8fa51f26d8e3de754bf7fe900 (current tip of master) let's see if there are any failing tests.

rlucas7 · 2019-10-22T03:23:59Z

all the travis builds succeeded except 1 and the lone failure is an unrelated error when one of the underlying C lang libraries is free()ing allocated memory.

Error starts here:
https://travis-ci.org/scipy/scipy/jobs/601043434#L985

rgommers · 2019-10-26T20:11:56Z

LGTM now, merged. Thanks @rlucas7

rlucas7 · 2020-01-15T02:02:07Z

@courcelm I'm just seeing your post now, and an issue related to this PR was just opened. Can you check #11365 and, if your issue is not the same one, open an issue with a reproducible example of the error you're getting?

Once I have an error message I can take a look but github doesn't notify me about comments on merged PRs.

WarrenWeckesser · 2020-03-10T20:27:56Z

@rlucas7, #11382 fixes one problem, but the issue reported by @courcelm is different. The docstrings for binned_statistic_2d and binned_statistic_dd explain that when statistic is 'count', the values argument is not referenced. This PR broke that promise, because it now always references values to check for finiteness.

The finiteness check should be changed to something like

    if statistic != 'count' and not np.isfinite(values).all():
        raise ValueError("The 'values' argument contains non-finite values.")
    if not np.isfinite(sample).all():
        raise ValueError("The 'sample' argument contains non-finite values.")
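
A runnable sketch of the guarded check, assuming NumPy is available. The wrapper `check_inputs` and the sample data are illustrative assumptions, not part of SciPy. It shows that `values=None` goes through untouched when `statistic` is `'count'`, while an unguarded `np.isfinite(None)` raises a `TypeError`.

```python
import numpy as np


def check_inputs(sample, values, statistic):
    """Hypothetical wrapper around the check proposed above."""
    # Only inspect `values` when the statistic actually uses it.
    if statistic != 'count' and not np.isfinite(values).all():
        raise ValueError("The 'values' argument contains non-finite values.")
    if not np.isfinite(sample).all():
        raise ValueError("The 'sample' argument contains non-finite values.")


# Illustrative 2-D sample: four points with (x, y) coordinates.
sample = np.array([[0.1, 2.1], [0.1, 2.6], [0.1, 2.1], [0.6, 2.1]])

# With statistic='count', values=None is never touched, so this passes.
check_inputs(sample, None, 'count')

# An unguarded check would call np.isfinite(None), which raises TypeError.
try:
    np.isfinite(None)
    unguarded_error = None
except TypeError as exc:
    unguarded_error = type(exc).__name__

# For any other statistic, non-finite values are still rejected.
try:
    check_inputs(sample, [1.0, np.nan, 3.0, 4.0], 'mean')
    mean_error = None
except ValueError as exc:
    mean_error = str(exc)
```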

WarrenWeckesser · 2020-03-10T20:32:47Z

For the record: the problem was reported in a stackoverflow question: https://stackoverflow.com/questions/60623899/why-is-binned-statistic-2d-now-throwing-typeerror

WarrenWeckesser · 2020-03-10T20:38:10Z

scipy/stats/_binned_statistic.py

@@ -310,10 +309,10 @@ def binned_statistic_2d(x, y, values, statistic='mean',
    >>> y = [2.1, 2.6, 2.1, 2.1]
    >>> binx = [0.0, 0.5, 1.0]
    >>> biny = [2.0, 2.5, 3.0]
-    >>> ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx,biny])
+    >>> ret = stats.binned_statistic_2d(x, y, x, 'count', bins=[binx,biny])


This change should not have been necessary, and should be undone when the issue noted in my comment is fixed.

yes. I'll add a change to revert this one from x -> None

rlucas7 · 2020-03-10T22:01:24Z

@rlucas7, #11382 fixes one problem, but the issue reported by @courcelm is different. The docstrings for binned_statistic_2d and binned_statistic_dd explain that when statistic is 'count', the values argument is not referenced. This PR broke that promise, because it now always references values to check for finiteness.

The finiteness check should be changed to something like
    if statistic != 'count' and not np.isfinite(values).all():
        raise ValueError("The 'values' argument contains non-finite values.")
    if not np.isfinite(sample).all():
        raise ValueError("The 'sample' argument contains non-finite values.")

I think the issue is fixed in PR #11382: the check no longer references values, and the PR fixes the underlying issue that occurs when the number of bins is specified.

TST: add a nan test case for the backport fix MAINT: revert doctest changes from scipygh-10664

…es or samples (#11382)
* MAINT: remove error throw on non-finite values or samples
  TST: add a nan test case for the backport fix
  MAINT: revert doctest changes from gh-10664
* TST: add a test for numpy int64 type
* MAINT: add support for non python based int types
* any int type that does not come directly from python but implements __index__ needs to be caught with the isinstance(bins, int) logic but would not. This changeset adds support for other int types.
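
The last bullet can be illustrated with a small standard-library sketch. `FakeInt64` and `as_bin_count` are hypothetical names, and this is not necessarily how #11382 implements the fix. It only shows why `isinstance(bins, int)` misses integer-like objects that implement `__index__`, and how `operator.index` accepts them.

```python
import operator


class FakeInt64:
    """Stand-in for an integer type that is not a subclass of Python's int
    but implements __index__ (as NumPy integer scalars do on Python 3)."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


def as_bin_count(bins):
    """Return `bins` as a Python int if it is integer-like, else None."""
    try:
        return operator.index(bins)
    except TypeError:
        # Not integer-like (for example a sequence of bin edges).
        return None


bins = FakeInt64(10)
passes_isinstance = isinstance(bins, int)  # the naive check misses this type
as_int = as_bin_count(bins)                # integer-like, so it converts
edges = as_bin_count([0.0, 0.5, 1.0])      # a list of edges is not an integer
```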

rlucas7 added maintenance Items related to regular maintenance tasks scipy.stats labels Aug 15, 2019

peterbell10 reviewed Aug 15, 2019

scipy/stats/tests/test_binned_statistic.py

rlucas7 added 2 commits October 21, 2019 20:55

TST: add test for non-finite values input to binned_statistic_dd()

c509c0f

MAINT: throw value error in binned_statistic_dd for non-finite DOC: update doctest examples with namedtuple returns

rlucas7 force-pushed the gh_issue_9010 branch from 2527efb to fe045bd Compare October 22, 2019 01:08

rgommers added this to the 1.4.0 milestone Oct 26, 2019

rgommers merged commit ba207e9 into scipy:master Oct 26, 2019

barentsen mentioned this pull request Jan 15, 2020

scipy.stats.binned_statistic regressed in v1.4.0 #11365

Closed

rlucas7 mentioned this pull request Jan 17, 2020

MAINT: remove error throw in binned_statistic_dd() on non-finite values or samples #11382

Merged

WarrenWeckesser reviewed Mar 10, 2020

rlucas7 added a commit to rlucas7/scipy that referenced this pull request Mar 11, 2020

MAINT: revert doctest changes from scipygh-10664

26f9619

rlucas7 added a commit to rlucas7/scipy that referenced this pull request Mar 11, 2020

MAINT: remove error throw on non-finite values or samples

3275865

TST: add a nan test case for the backport fix MAINT: revert doctest changes from scipygh-10664

rlucas7 added a commit to rlucas7/scipy that referenced this pull request Mar 20, 2020

MAINT: remove error throw on non-finite values or samples

363d377

TST: add a nan test case for the backport fix MAINT: revert doctest changes from scipygh-10664

rlucas7 added a commit to rlucas7/scipy that referenced this pull request May 10, 2020

MAINT: remove error throw on non-finite values or samples

2201703

TST: add a nan test case for the backport fix MAINT: revert doctest changes from scipygh-10664
