median of empty array now produces IndexError #6462

ethankruse · 2015-10-12T22:25:56Z

Recently upgraded from 1.9.3 to 1.10.0 in Python 2.7 (but I get the same behavior in Python 3.5 on both Mac and Linux).

In 1.9.3, running np.median(np.array([])) returns NaN and emits a RuntimeWarning ("Mean of empty slice.").

Meanwhile, in 1.10.0, I'm now getting

IndexError: index -1 is out of bounds for axis 0 with size 0

The traceback points to the line in
python2.7/site-packages/numpy/lib/function_base.pyc in _median(a, axis, out, overwrite_input):

-> 3138         n = np.isnan(part[..., -1])

Is this intended behavior now in 1.10.0? I could switch all my code over to using np.nanmedian, which still produces the NaN result on empty arrays from 1.9.3, but this looks to be more of an accident/bug based on the tracebacks.
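For readers who want to check which behavior their installed NumPy has, here is a minimal sketch of the reproduction. It assumes a NumPy version where the regression is not present (on 1.10.0 the np.median call is the line that raises the IndexError quoted above); the variable names are illustrative only.

```python
import warnings

import numpy as np

empty = np.array([])

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # On an affected release (1.10.0) this call raises IndexError instead.
    median_result = np.median(empty)
    # nanmedian is the workaround mentioned above; it also returns NaN.
    nanmedian_result = np.nanmedian(empty)

print("median:", median_result)
print("nanmedian:", nanmedian_result)
print("warnings:", sorted({w.category.__name__ for w in caught}))
```

On an unaffected version both results should be NaN, accompanied by at least one RuntimeWarning.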

lesteve · 2015-10-14T11:13:28Z

Is this intended behavior now in 1.10.0? ... but this looks to be more of an accident/bug based on the tracebacks.

I am wondering about the exact same thing. The context if you want to know: scikit-learn tests (on master) currently fail with numpy 1.10 and this is the source of one of the errors.

charris · 2015-10-14T14:30:12Z

Looks like a regression to me. @empeeu @juliantaylor?

njsmith · 2015-10-14T15:52:02Z

@lesteve: I don't think this is anyone's fault in particular or anything, but it also sounds like beyond this issue with median there's some deeper failure of process or coordination, if it's only now after 2 months of prereleases and two actual releases (!) that anyone is running the scikit-learn test suite against numpy 1.10. Any thoughts on how we could avoid this kind of situation in the future?

Cc: @amueller @ogrisel

amueller · 2015-10-14T16:02:57Z

@njsmith I agree. Sorry, I've been way behind on all my open source stuff. I was a bit shocked to see it yesterday. We had a similar thing with scipy. But 1.10 was released 8 days ago, right?

It's hard to make downstream people use release candidates. I think we never get any useful feedback from ours.

It requires reading another project's mailing list, seeing if they announce a release candidate, and then actually testing it ^^.

Clearly it's impossible for numpy to run downstream test suites, so this one is really on us. I don't know if there is a better mechanism than just "andy and olivier should pay more attention to the numpy and scipy mailing lists".

argriffing · 2015-10-14T16:13:49Z

Clearly it's impossible for numpy to run downstream test suites, so this one is really on us.

I vaguely recall something about @matthew-brett having assembled an automated testing system for a collection of downstream packages. Maybe it was this: https://github.com/MacPython/scipy-stack-osx-testing.

amueller · 2015-10-14T16:15:49Z

It would be cool if we could easily run the test-suites for all packages in anaconda. @teoliphant ?

njsmith · 2015-10-14T16:31:49Z

Anaconda is kinda a mess right now for this purpose, because (a) there's no public numpy build recipe, and (b) the way they encode numpy versions into their platform tag means that if you do rebuild numpy then you are also forced to rebuild everything else too, e.g. scipy (and I don't think there's any public build recipe for scipy either). It would be awesome if there were any easy way to get an Anaconda environment using numpy prereleases, but it's difficult for anyone outside of Continuum to take the next step there...

But I don't really like the "Andy and Olivier feel extra guilty" strategy either. It doesn't feel very scalable :-).

If the problem is that no one sees the prerelease announcements, then would some other notification mechanism work better? numpy-prerelease-announce@...? Filing a bug on scikit-learn saying "numpy 1.12b1 is released, please test it"? (The latter could even be automated for all interested downstream projects using a little script against the GitHub API.)
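
The "little script" idea could look roughly like the sketch below. It only builds the requests for GitHub's "create an issue" REST endpoint (POST /repos/{owner}/{repo}/issues) and does not send anything. The repository list, the token value, the issue body wording, and the helper names are all assumptions made for illustration; nothing here is taken from an actual numpy release script.

```python
import json
import urllib.request

# Hypothetical list of downstream repositories that asked to be notified.
DOWNSTREAM_REPOS = ["scikit-learn/scikit-learn", "pandas-dev/pandas"]


def build_issue_payload(project, version):
    """Return the JSON body for a 'please test this prerelease' issue."""
    return {
        "title": f"{project} {version} is released, please test it",
        "body": (
            f"A prerelease of {project} ({version}) is available. "
            "Please run your test suite against it and report regressions."
        ),
    }


def build_request(repo, payload, token):
    """Build (but do not send) the POST request that would open the issue."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",  # placeholder token
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_issue_payload("numpy", "1.12b1")
requests_to_send = [
    build_request(repo, payload, token="<your-token>")
    for repo in DOWNSTREAM_REPOS
]
```

Passing each request to urllib.request.urlopen would actually file the issues, so that step is deliberately left out of the sketch.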

amueller · 2015-10-14T16:35:05Z

Opening an issue on the issue tracker would be helpful, I think. It is more likely to be picked up by someone than the announcement on the numpy mailing list.

Maintaining a list of downstream packages would be an additional burden on you, but I guess it wouldn't be that bad?

juliantaylor · 2015-10-14T17:00:10Z

we should put into our how to release docs a step to test at least:
scipy
pandas
sklearn
skimage
astropy
(that's the set I always tested for 1.9)

those typically already catch a lot of issues and are easily available. One can even just use the Debian packages, which helps find backward compatibility issues
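
A release-checklist step like this could be scripted along the following lines. This is only a sketch under assumptions: it presumes the five packages are already installed next to the numpy build under test, and it uses pytest's --pyargs option as the test runner, which is a choice made here and not something stated in the thread. The helper names are hypothetical.

```python
import subprocess
import sys

# Import names of the downstream packages listed above.
DOWNSTREAM_PACKAGES = ["scipy", "pandas", "sklearn", "skimage", "astropy"]


def build_test_command(package):
    """Command that runs the installed package's test suite via pytest."""
    return [sys.executable, "-m", "pytest", "--pyargs", package, "-q"]


def run_downstream_tests(packages=DOWNSTREAM_PACKAGES):
    """Run each suite in turn and map package name -> True if it passed."""
    results = {}
    for package in packages:
        completed = subprocess.run(build_test_command(package))
        results[package] = completed.returncode == 0
    return results
```

Calling run_downstream_tests() against a prerelease build would give a quick pass/fail overview before the release goes out; the full suites can take a long time to run.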

ogrisel · 2015-10-14T17:32:07Z

We could have numpy master branch builds generate Linux test wheels for the Ubuntu 12.04 platform and upload them to a Rackspace container, so that downstream projects can use them to test their own master branch against it in their CI, with an allow_failure option for that entry in their build matrix (to avoid noisy messages):

http://docs.travis-ci.com/user/customizing-the-build/#Rows-that-are-Allowed-to-Fail

amueller · 2015-10-14T17:35:04Z

But do we want to test against master? Or just RCs? Who would get the failures from the CI?

ogrisel · 2015-10-14T17:40:55Z

We could also test against RCs (we can pip install --pre numpy to get the latest beta / RC), but we would still need to automate building Linux wheels to avoid rebuilding numpy (and scipy) from source in our own CI.

charris · 2015-10-14T18:13:09Z

Testing against the master branch would give you early notice of changes/regressions. For instance, a lot of DeprecationWarnings are going to raise errors in 1.11. Master breaks occasionally, but not often.
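
One way a downstream project can get ahead of such changes is to promote DeprecationWarning to an error in its own test runs, so that deprecated usage fails before the upstream release turns it into a hard error. The sketch below uses only the standard warnings module; deprecated_api and call_strict are hypothetical stand-ins, not numpy functions.

```python
import warnings


def deprecated_api():
    """Stand-in for a library call that is scheduled for removal."""
    warnings.warn("deprecated_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_strict(func):
    """Call func with DeprecationWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func()


try:
    call_strict(deprecated_api)
    raised = False
except DeprecationWarning:
    # Under the strict filter the deprecated call fails loudly.
    raised = True

print("deprecated call raised under strict filter:", raised)
```

Running the whole test suite with python -W error::DeprecationWarning applies the same filter without code changes.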

charris · 2015-10-14T18:14:57Z

I wouldn't mind tweeting releases if there are developers who follow tweets.

shoyer · 2015-10-14T18:32:10Z

pandas already has tests against NumPy master running on Travis-CI as "allowed failures". I think I'll set this up for xray, too, and I would recommend it for scikit-learn as well. Even if you're building from source, NumPy only takes a few minutes to install, which is not unreasonable.

@charris Yes, there are loads of us on Twitter talking about open source software :).

charris · 2015-10-14T18:51:40Z

#numpy, #python ?

ogrisel · 2015-10-14T18:59:55Z

numpy builds quickly enough but this is not the case for scipy and ideally we would like to test scikit-learn against the master branch of both of them.

njsmith · 2015-10-14T19:19:10Z

Maybe someone could set up a wheelhouse on Rackspace (or wherever), and add
code to numpy, scipy, etc.'s Travis builds to upload a new wheel whenever
the tests pass?

ogrisel · 2015-10-14T19:26:35Z

Maybe someone could set up a wheelhouse on Rackspace (or wherever), and add code to numpy, scipy, etc.'s Travis builds to upload a new wheel whenever the tests pass?

I can do that. I have automated most of it for scikit-learn for Windows on AppVeyor, and it should work for Linux on Travis as well. However, we are trying to get the scikit-learn release out this week and we will do a sprint on scikit-learn next week, so I am not sure I can find the time to do it in the very short term.

ethankruse · 2015-10-15T00:01:11Z

I'll leave the discussion of broader implications and catching things like this sooner to everyone seriously involved.

But as far as fixing this particular issue, maybe @empeeu or @juliantaylor can help as they seemed involved in this particular change over on Issue #586.

The problem occurs in line 3346 (currently) with the part[...,-1] throwing an error if part has size 0.

3341        # Check if the array contains any nan's
3342        if np.issubdtype(a.dtype, np.inexact):
3343            # warn and return nans like mean would
3344            rout = mean(part[indexer], axis=axis, out=out)
3345            part = np.rollaxis(part, axis, part.ndim)
3346            n = np.isnan(part[..., -1])

One possible simple fix is just to skip this block if the array is of size 0 (local variable sz in the function), which effectively reverts to the previous behavior of version 1.9.3.

3341        # Check if the array contains any nan's
3342        if np.issubdtype(a.dtype, np.inexact) and sz > 0:
3343            # warn and return nans like mean would
3344            rout = mean(part[indexer], axis=axis, out=out)
3345            part = np.rollaxis(part, axis, part.ndim)
3346            n = np.isnan(part[..., -1])
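
To see the failure and the effect of a size guard in isolation, here is a small standalone sketch. It does not reproduce numpy's _median; last_is_nan is a hypothetical helper that only mirrors the part[..., -1] lookup and skips it when the array is empty.

```python
import numpy as np

part = np.array([], dtype=float)

# Indexing the last element along a zero-length axis is what fails.
try:
    part[..., -1]
    index_error = None
except IndexError as exc:
    index_error = str(exc)

print("unguarded lookup:", index_error)


def last_is_nan(part):
    """Hypothetical helper: NaN check on the last element, guarded for size 0."""
    if part.size == 0:
        # Nothing to inspect, so report "no NaN" with the reduced shape.
        return np.zeros(part.shape[:-1], dtype=bool)
    return np.isnan(part[..., -1])


print("guarded, empty input:", last_is_nan(part))
print("guarded, NaN last:", last_is_nan(np.array([1.0, np.nan])))
```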

njsmith · 2015-10-17T05:58:19Z

Okay, summarized the conclusions from the general discussion in #6493, scipy/scipy#5379, #6494. Sorry for derailing this issue for that, but it sounds like there's some clear steps we can take to do better next time!

@ethankruse: regarding the issue with median, it sounds like you have a pretty good handle on what's needed here -- want to just submit a PR with the fix?

matthew-brett · 2015-10-17T06:40:19Z

It would be pretty trivial to set up travis wheel uploading to http://travis-wheels.scikit-image.org when all the tests pass.

njsmith · 2015-10-17T06:59:14Z

@matthew-brett: if you want to jump in and do it before @ogrisel gets
around to it then I'm sure no one would object :-)

ethankruse · 2015-10-19T20:57:08Z

Ok, @njsmith, I submitted my simple fix. As far as I can tell, the median function now acts appropriately, even with empty arrays of various shapes.

I'm not used to contributing to large projects though, so I hope I did things right.

njsmith · 2015-10-19T22:04:06Z

Thanks Ethan! And don't worry, we all started somewhere :-)

Potential fix for #6462

np.median([]) returns NaN. Fixes bug/regression that raised an IndexError. Added tests to ensure continued support of empty arrays.
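
A regression check for empty arrays of various shapes might look like the sketch below. It is an illustration written for this thread, not the test that was added in the PR, and it assumes a NumPy version that contains the fix. The function name is hypothetical.

```python
import warnings

import numpy as np


def check_median_of_empty():
    """Assert that np.median handles empty input without raising."""
    with warnings.catch_warnings():
        # Empty reductions emit RuntimeWarning; silence it for the check.
        warnings.simplefilter("ignore", RuntimeWarning)

        # 1-D empty array: result is NaN.
        assert np.isnan(np.median(np.array([], dtype=float)))

        # 3-D empty array of shape (1, 1, 0).
        a = np.array([], dtype=float, ndmin=3)
        assert np.isnan(np.median(a))

        # Reducing a non-empty axis leaves an empty result.
        assert np.median(a, axis=0).shape == (1, 0)
        assert np.median(a, axis=1).shape == (1, 0)

        # Reducing the zero-length axis yields NaN for each slice.
        last = np.median(a, axis=2)
        assert last.shape == (1, 1)
        assert np.isnan(last).all()
    return True


print("empty-array median checks passed:", check_median_of_empty())
```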

charris · 2015-10-24T16:56:13Z

This is fixed, but I wonder if a more targeted warning would be better.

empeeu · 2015-10-25T19:58:13Z

Sorry for being late to the party. @ethankruse thanks for fixing my error. Seems like this could have been avoided if there had been a unit test in place to test this scenario. When I was working on the nan update I only added tests to check the new nan behavior, and checking for empty arrays did not occur to me. Perhaps a checklist of scenarios that should be tested would have been helpful?

njsmith · 2015-10-25T20:48:18Z

@empeeu: Don't worry, it could have happened to anyone :-). We try to test all the cases we can think of, but there are inevitably gaps in the existing tests (many of which date back to the old days when we were less thorough about writing tests), and even when we're careful stuff will sometimes slip through. Nothing really to be done about it except to test pre-releases thoroughly, and patch up any test suite gaps when they're discovered. (Think of it this way: you've successfully discovered a bug in our test suite, and guaranteed that no-one else will ever make the same mistake again ;-).)


charris added 06 - Regression component: numpy.lib labels Oct 14, 2015

charris added this to the 1.10.2 release milestone Oct 14, 2015

amueller mentioned this issue Oct 14, 2015

[MRG + 2] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 scikit-learn/scikit-learn#5398

Merged

rgommers mentioned this issue Oct 14, 2015

TST: do one of the test runs on TravisCI with latest numpy master. scipy/scipy#4729

Closed

ethankruse mentioned this issue Oct 19, 2015

Potential fix for #6462 #6527

Merged

kwgoodman mentioned this issue Oct 20, 2015

FAIL: Test median. with numpy-1.10 pydata/bottleneck#105

Closed

charris added a commit that referenced this issue Oct 21, 2015

Merge pull request #6527 from ethankruse/6462fix

bf28b44

Potential fix for #6462

charris pushed a commit to charris/numpy that referenced this issue Oct 21, 2015

BUG: Make median work for empty arrays (issue numpy#6462)

04b16fd

np.median([]) returns NaN. Fixes bug/regression that raised an IndexError. Added tests to ensure continued support of empty arrays.

charris mentioned this issue Oct 21, 2015

backport-6527 #6547

Merged

charris closed this as completed Oct 24, 2015

jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016

BUG: Make median work for empty arrays (issue numpy#6462)

7e6959e

np.median([]) returns NaN. Fixes bug/regression that raised an IndexError. Added tests to ensure continued support of empty arrays.
