Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw error on non-finite input for binned_statistic_dd() #10664

Merged
merged 2 commits into from Oct 26, 2019

Conversation

rlucas7
Copy link
Member

@rlucas7 rlucas7 commented Aug 15, 2019

Reference issue

(partially) Closes gh-9010

What does this implement/fix?

check if input contains non-finite data, if there are np.nan or np.inf throw a ValueError

Additional information

There is a possibility that the dedges.min() here:
https://github.com/scipy/scipy/blob/master/scipy/stats/_binned_statistic.py#L583
is a 0.0 value. However, I'm been actively trying to a couple hours to make data on that scenario and it seems rather difficult. If someone has a reproducing example I'm happy to add that to PR but at this time I cannot seem to generate a reproducing example for this scenario. Opening this PR sooner to get the change out in front of people in case they have suggestions or another edge case to add.

@rlucas7 rlucas7 added maintenance Items related to regular maintenance tasks scipy.stats labels Aug 15, 2019
@rlucas7
Copy link
Member Author

rlucas7 commented Aug 15, 2019

it looks like travis builds failed for python 3.5 and 3.6, two of them look like flaky test failures. One is showing the build failing the test that I add in this PR. This test is passing on my local machine not sure what the issue is here.

@rlucas7
Copy link
Member Author

rlucas7 commented Aug 22, 2019

@peterbell10 I got a chance to look at the 2 failing builds tonight. Can you read my comment here and let me know if this makes sense to you? (Please let me know if you have a better idea).

One of the 2 looks like something flaky that might resolve on a restart. However, the second one is a failing doctest in the doc for _binned_statistic_dd. When I try to run on my laptop I also get a failure (abort trap 6), w/a long stack trace. It looks like there may be some error in the matplotlib plotting in the docstring and some doctests not passing because of abort trap 6 issues. Here is stacktrace:

(scipy-dev) Lucass-MacBook:scipy rlucas$ python -m doctest scipy/stats/_binned_statistic.py
**********************************************************************
File "scipy/stats/_binned_statistic.py", line 108, in _binned_statistic.binned_statistic
Failed example:
    stats.binned_statistic([1, 1, 2, 5, 7], values, 'sum', bins=2)
Expected:
    (array([ 4. ,  4.5]), array([ 1.,  4.,  7.]), array([1, 1, 1, 2, 2]))
Got:
    BinnedStatisticResult(statistic=array([4. , 4.5]), bin_edges=array([1., 4., 7.]), binnumber=array([1, 1, 1, 2, 2]))
**********************************************************************
File "scipy/stats/_binned_statistic.py", line 115, in _binned_statistic.binned_statistic
Failed example:
    stats.binned_statistic([1, 1, 2, 5, 7], values, 'sum', bins=2)
Expected:
    (array([[ 4. ,  4.5], [ 8. ,  9. ]]), array([ 1.,  4.,  7.]),
        array([1, 1, 1, 2, 2]))
Got:
    BinnedStatisticResult(statistic=array([[4. , 4.5],
           [8. , 9. ]]), bin_edges=array([1., 4., 7.]), binnumber=array([1, 1, 1, 2, 2]))
**********************************************************************
File "scipy/stats/_binned_statistic.py", line 119, in _binned_statistic.binned_statistic
Failed example:
    stats.binned_statistic([1, 2, 1, 2, 4], np.arange(5), statistic='mean',
                           bins=3)
Expected:
    (array([ 1.,  2.,  4.]), array([ 1.,  2.,  3.,  4.]),
        array([1, 2, 1, 2, 3]))
Got:
    BinnedStatisticResult(statistic=array([1., 2., 4.]), bin_edges=array([1., 2., 3., 4.]), binnumber=array([1, 2, 1, 2, 3]))
2019-08-21 23:15:29.732 python[19348:1050799] -[NSApplication _setup:]: unrecognized selector sent to instance 0x7fbecb8f2280
2019-08-21 23:15:29.736 python[19348:1050799] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSApplication _setup:]: unrecognized selector sent to instance 0x7fbecb8f2280'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff32d4132b __exceptionPreprocess + 171
	1   libobjc.A.dylib                     0x00007fff5a3cac76 objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff32dd9e04 -[NSObject(NSObject) doesNotRecognizeSelector:] + 132
	3   CoreFoundation                      0x00007fff32cb7870 ___forwarding___ + 1456
	4   CoreFoundation                      0x00007fff32cb7238 _CF_forwarding_prep_0 + 120
	5   libtk8.6.dylib                      0x000000011ffc231d TkpInit + 413
	6   libtk8.6.dylib                      0x000000011ff1a17e Initialize + 2622
	7   _tkinter.cpython-36m-darwin.so      0x000000011d3c9a16 _tkinter_create + 1174
	8   python                              0x000000010aea1088 _PyCFunction_FastCallDict + 200
	9   python                              0x000000010af77f4f call_function + 143
	10  python                              0x000000010af75abf _PyEval_EvalFrameDefault + 46847
	11  python                              0x000000010af69209 _PyEval_EvalCodeWithName + 425
	12  python                              0x000000010af78b1c _PyFunction_FastCallDict + 364
	13  python                              0x000000010ae1f8b0 _PyObject_FastCallDict + 320
	14  python                              0x000000010ae46fe8 method_call + 136
	15  python                              0x000000010ae26efe PyObject_Call + 62
	16  python                              0x000000010aec8385 slot_tp_init + 117
	17  python                              0x000000010aecc8c1 type_call + 241
	18  python                              0x000000010ae1f821 _PyObject_FastCallDict + 177
	19  python                              0x000000010ae27a67 _PyObject_FastCallKeywords + 327
	20  python                              0x000000010af78048 call_function + 392
	21  python                              0x000000010af75b6f _PyEval_EvalFrameDefault + 47023
	22  python                              0x000000010af7830c fast_function + 188
	23  python                              0x000000010af77fac call_function + 236
	24  python                              0x000000010af75abf _PyEval_EvalFrameDefault + 46847
	25  python                              0x000000010af69209 _PyEval_EvalCodeWithName + 425
	26  python                              0x000000010af78b1c _PyFunction_FastCallDict + 364
	27  python                              0x000000010ae1f8b0 _PyObject_FastCallDict + 320
	28  python                              0x000000010ae46fe8 method_call + 136
	29  python                              0x000000010ae26efe PyObject_Call + 62
	30  python                              0x000000010af75cc0 _PyEval_EvalFrameDefault + 47360
	31  python                              0x000000010af69209 _PyEval_EvalCodeWithName + 425
	32  python                              0x000000010af783ba fast_function + 362
...
	66  python                              0x000000010afe8946 RunModule + 182
	67  python                              0x000000010afe7ad4 Py_Main + 3076
	68  python                              0x000000010ae17929 main + 313
	69  libdyld.dylib                       0x00007fff5afe4015 start + 1
	70  ???                                 0x0000000000000004 0x0 + 4
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6
(scipy-dev) Lucass-MacBook:scipy rlucas$ 

The way I was able to get it to pass (locally) was to pass skip directives to the plotting commands. Example hunk:

tage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? n
@@ -129,11 +126,11 @@ def binned_statistic(x, values, statistic='mean',
     >>> boatspeed = .3 * windspeed**.5 + .2 * np.random.rand(500)
     >>> bin_means, bin_edges, binnumber = stats.binned_statistic(windspeed,
     ...                 boatspeed, statistic='median', bins=[1,2,3,4,5,6,7])
-    >>> plt.figure()
-    >>> plt.plot(windspeed, boatspeed, 'b.', label='raw data')
+    >>> plt.figure() # doctest: +SKIP
+    >>> plt.plot(windspeed, boatspeed, 'b.', label='raw data') # doctest: +SKIP
     >>> plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], colors='g', lw=5,
-    ...            label='binned statistic of data')
-    >>> plt.legend()
+    ...            label='binned statistic of data') # doctest: +SKIP
+    >>> plt.legend() # doctest: +SKIP
 
     Now we can use ``binnumber`` to select all datapoints with a windspeed
     below 1:
Stage this hunk [y,n,q,a,d,K,j,J,g,/,s,e,?]? 

I recall -> #9785 tried that-didn't solve the issue.
Also there are several changes in the doctests that have them failing locally. These are mostly because the return type isn't a tuple anymore. The return types are now namedtuple() instances but the doctests weren't updated since 2015 when git blame says that the changes were implemented.

We the travis builds changed recently to run the doctests?

I can change the doctests in this file too but would like a second pair of eyes to confirm what I'm proposing makes sense (and to include in this PR) before I go to make those changes to the namedtuple and the skip directives for the matplotlib plots, unless you have a better solution for the matplotlib issues.

@ev-br
Copy link
Member

ev-br commented Aug 22, 2019

Just copy-paste the doctest output into the doctests. This can in principle be fixed in the refguide-checker, but updating the example is simpler.
(The other fix would be to update the regex https://github.com/scipy/scipy/blob/master/tools/refguide_check.py#L596 to make understand multiple arrays in namedtuples)

@rlucas7
Copy link
Member Author

rlucas7 commented Aug 22, 2019 via email

@rlucas7
Copy link
Member Author

rlucas7 commented Aug 23, 2019

Just copy-paste the doctest output into the doctests.

I did that and this fixed the namedtuple output changes. Updating my conda virtual env via:
conda update matplotlib brought me from:

The following packages will be UPDATED:

  libpng                                  1.6.35-ha441bb4_0 --> 1.6.37-ha441bb4_0
  matplotlib                           3.0.2-py36h54f8f79_0 --> 3.1.0-py36h54f8f79_0

which resolved the abort trap 6 error. However, when I omit the # doctest: +SKIP directives (locally) on lines that have matplotlib plotting commands the doctests still fail. I'm not sure if this is an issue with my development env being improperly configured or if something else. Any guidance on the dev setup for the doctests to pass w/matplotlib plots is appreciated.

I'll push the changes to the PR omitting the skip directive and see if that fixes the issue in the CI build.

@rlucas7
Copy link
Member Author

rlucas7 commented Aug 23, 2019

@ev-br thanks for your help. I adjusted docstring/doctests and seems to resolve the build issue(s).
the matplotlib abort trap 6 issue was resolved by updating to 3.1.0 in conda virtual env

@rgommers
Copy link
Member

However, when I omit the # doctest: +SKIP directives (locally) on lines that have matplotlib plotting commands the doctests still fail. I'm not sure if this is an issue with my development env being improperly configured or if something else. Any guidance on the dev setup for the doctests to pass w/matplotlib plots is appreciated.

I ran into an issue as well with latest matplotlib and binned_statistic_dd, see gh-10720. Probably doesn't help for you looking at the above failures, but something is fishy it seems.

@rlucas7
Copy link
Member Author

rlucas7 commented Aug 27, 2019 via email

@rgommers
Copy link
Member

this needs a rebase now

MAINT: throw value error in binned_statistic_dd for non-finite

DOC: update doctest examples with namedtuple returns
MAINT: throw value error in binned_statistic_dd for non-finite
DOC: update doctest examples with namedtuple returns

MAINT: throw value error in binned_statistic_dd for non-finite

DOC: update doctest examples with namedtuple returns
@rlucas7
Copy link
Member Author

rlucas7 commented Oct 22, 2019

Always a little scary doing a rebase :)
@rgommers I rebased off 81994b1d46ab5ec8fa51f26d8e3de754bf7fe900 (current tip of master) let's see if there are any failing tests.

@rlucas7
Copy link
Member Author

rlucas7 commented Oct 22, 2019

all the travis builds succeeded except 1 and the lone failure is an unrelated error when one of the underlying C lang libraries is free()ing allocated memory.

Error starts here:
https://travis-ci.org/scipy/scipy/jobs/601043434#L985

@rgommers rgommers added this to the 1.4.0 milestone Oct 26, 2019
@rgommers rgommers merged commit ba207e9 into scipy:master Oct 26, 2019
@rgommers
Copy link
Member

LGTM now, merged. Thanks @rlucas7

@rlucas7
Copy link
Member Author

rlucas7 commented Jan 15, 2020

@courcelm I'm just seeing your post now and there is an opened issue related to this PR that was just opened. Can you check #11365 and if your issue is not the same one, open an issue with a reproducible example of what you error you're getting?

Once I have an error message I can take a look but github doesn't notify me about comments on merged PRs.

@WarrenWeckesser
Copy link
Member

@rlucas7, #11382 fixes one problem, but the issue reported by @courcelm is different. The docstrings for binned_statistic_2d and binned_statistic_dd explain that when statistic is 'count', the values argument is not referenced. This PR broke that promise, because it now always references values to check for finiteness.

The finiteness check should be changed to something like

    if statistic != 'count' and not np.isfinite(values).all():
        raise ValueError('The 'values' argument contains non-finite values.')
    if not np.isfinite(sample).all():
        raise ValueError('The 'sample' argument contains non-finite values.')

@WarrenWeckesser
Copy link
Member

For the record: the problem was reported in a stackoverflow question: https://stackoverflow.com/questions/60623899/why-is-binned-statistic-2d-now-throwing-typeerror

@@ -310,10 +309,10 @@ def binned_statistic_2d(x, y, values, statistic='mean',
>>> y = [2.1, 2.6, 2.1, 2.1]
>>> binx = [0.0, 0.5, 1.0]
>>> biny = [2.0, 2.5, 3.0]
>>> ret = stats.binned_statistic_2d(x, y, None, 'count', bins=[binx,biny])
>>> ret = stats.binned_statistic_2d(x, y, x, 'count', bins=[binx,biny])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should not have been necessary, and should be undone when the issue noted in my comment is fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I'll add a change to revert this one from x -> None

@rlucas7
Copy link
Member Author

rlucas7 commented Mar 10, 2020

@rlucas7, #11382 fixes one problem, but the issue reported by @courcelm is different. The docstrings for binned_statistic_2d and binned_statistic_dd explain that when statistic is 'count', the values argument is not referenced. This PR broke that promise, because it now always references values to check for finiteness.

The finiteness check should be changed to something like

    if statistic != 'count' and not np.isfinite(values).all():
        raise ValueError('The 'values' argument contains non-finite values.')
    if not np.isfinite(sample).all():
        raise ValueError('The 'sample' argument contains non-finite values.')

I think the issue is fixed in the #11382 PR, the check no longer references values and the PR fixes the underlying issue if number of bins is specified.

rlucas7 added a commit to rlucas7/scipy that referenced this pull request Mar 11, 2020
rlucas7 added a commit to rlucas7/scipy that referenced this pull request Mar 11, 2020
TST: add a nan test case for the backport fix

MAINT: revert doctest changes from scipygh-10664
rlucas7 added a commit to rlucas7/scipy that referenced this pull request Mar 20, 2020
TST: add a nan test case for the backport fix

MAINT: revert doctest changes from scipygh-10664
rlucas7 added a commit to rlucas7/scipy that referenced this pull request May 10, 2020
TST: add a nan test case for the backport fix

MAINT: revert doctest changes from scipygh-10664
WarrenWeckesser pushed a commit that referenced this pull request May 11, 2020
…es or samples (#11382)

* MAINT: remove error throw on non-finite values or samples

TST: add a nan test case for the backport fix

MAINT: revert doctest changes from gh-10664

* TST: add a test for numpy int64 type

* MAINT: add support for non python based int types

  * any int type that does not come
    directly from python but implements
    __index__ needs to be caught with
    the isinstance(bins, int) logic
    but would not. This changeset adds
    support for other int types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Items related to regular maintenance tasks scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overflow Error from binned_statistic_dd
6 participants