Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

median of empty array now produces IndexError #6462

Closed
ethankruse opened this issue Oct 12, 2015 · 28 comments
Closed

median of empty array now produces IndexError #6462

ethankruse opened this issue Oct 12, 2015 · 28 comments

Comments

@ethankruse
Copy link
Contributor

Recently upgraded from 1.9.3 to 1.10.0 in Python 2.7 (but I get the same behavior in Python 3.5 on both Mac and Linux).

In 1.9.3, running np.median(np.array([])) produces a NaN and a warning of warnings.warn("Mean of empty slice.", RuntimeWarning).

Meanwhile, in 1.10.0, I'm now getting

IndexError: index -1 is out of bounds for axis 0 with size 0

The traceback points to the line in
python2.7/site-packages/numpy/lib/function_base.pyc in _median(a, axis, out, overwrite_input):

-> 3138         n = np.isnan(part[..., -1])

Is this intended behavior now in 1.10.0? I could switch all my code over to using np.nanmedian, which still produces the NaN result on empty arrays from 1.9.3, but this looks to be more of an accident/bug based on the tracebacks.

@lesteve
Copy link
Contributor

lesteve commented Oct 14, 2015

Is this intended behavior now in 1.10.0? ... but this looks to be more of an accident/bug based on the tracebacks.

I am wondering about the exact same thing. The context if you want to know: scikit-learn tests (on master) currently fail with numpy 1.10 and this is the source of one of the errors.

@charris
Copy link
Member

charris commented Oct 14, 2015

Looks like a regression to me. @empeeu @juliantaylor?

@charris charris added this to the 1.10.2 release milestone Oct 14, 2015
@njsmith
Copy link
Member

njsmith commented Oct 14, 2015

@lesteve: I don't think this is anyone's fault in particular or anything, but it also sounds like beyond this issue with median there's some deeper failure of process or coordination, if it's only now after 2 months of prereleases and two actual releases (!) that anyone is running the scikit-learn test suite against numpy 1.10. Any thoughts on how we could avoid this kind of situation in the future?

Cc: @amueller @ogrisel

@amueller
Copy link

@njsmith I agree. Sorry, I've been way behind on all my open source stuff. I was a bit shocked to see it yesterday. We had a similar thing with scipy. But 1.10 was released 8 days ago, right?

It's hard to make downstream people use release candidates. I think we never get any useful feedback from ours.

It requires reading another projects mailing list and seeing if they announce a release candidate, and then actually test it ^^.

Clearly it's impossible for numpy to run downstream test suites, so this one is really on us. I don't know if there is a better mechanism than just "andy and olivier should pay more attention to the numpy and scipy mailing lists".

@argriffing
Copy link
Contributor

Clearly it's impossible for numpy to run downstream test suites, so this one is really on us.

I vaguely recall something about @matthew-brett having assembled an automated testing system for a collection of downstream packages. Maybe it was this: https://github.com/MacPython/scipy-stack-osx-testing.

@amueller
Copy link

It would be cool if we could easily run the test-suites for all packages in anaconda. @teoliphant ?

@njsmith
Copy link
Member

njsmith commented Oct 14, 2015

Anaconda is kinda a mess right now for this purpose, because (a) there's no
public numpy build recipe, (b) the way they encode numpy versions into
their platform tag means that if you do rebuild numpy then you are also
forced to rebuild everything else too, e.g. scipy (and I don't think
there's any public build recipe for scipy either). It would be awesome if
there were any easy way to get an Anaconda environment using numpy
prereleases, but it's difficult for anyone outside of Continuum to take the
next step there...
.
But I don't really like the "Andy and Olivier feel extra guilty" strategy
either. It doesn't feel very scalable :-).
.
If the problem is that no one sees the prerelease announcements, then would
some other notification mechanism work better? numpy-prelease-announce@...?
Filing a bug on scikit-learn saying "numpy 1.12b1 is released, please test
it"? (The latter could even be automated for all interested downstream
projects using a little script against the github api.)
On Oct 14, 2015 9:15 AM, "Andreas Mueller" notifications@github.com wrote:

It would be cool if we could have the test-suites for all packages in
anaconda.


Reply to this email directly or view it on GitHub
#6462 (comment).

@amueller
Copy link

Opening an issue on the issue tracker would be helpful, I think. It is more likely to be picked up by someone than the announcement on the numpy mailing list.

Maintaining a list of downstream packages would be an additional burden on you, but I guess it wouldn't be that bad?

@juliantaylor
Copy link
Contributor

we should put into our how to release docs a step to test at least:
scipy
pandas
sklearn
skimage
astropy
(thats the set I always tested for 1.9)

those typically already catch a lot of issues and are easily available. One can even just use the debian packages which helps finding backward compatibility issues

@ogrisel
Copy link
Contributor

ogrisel commented Oct 14, 2015

We could have numpy master branch builds generate linux test wheels for the Ubuntu 12.04 platform and upload them to a rackspace containers so that downstream projects can use them to test their own master branch against it in their CI with an allow_failure option for that entry in their build matrix (to avoid noisy messages):

http://docs.travis-ci.com/user/customizing-the-build/#Rows-that-are-Allowed-to-Fail

@amueller
Copy link

But do we want to test against master? Or just RCs? Who would get the failures from the CI?

@ogrisel
Copy link
Contributor

ogrisel commented Oct 14, 2015

We could also tests against RC (we can pip install --pre numpy to get the latest beta / RC) but to avoid we would still need to automate build linux wheels to avoid rebuilding numpy (and scipy) from source in our own CI.

@charris
Copy link
Member

charris commented Oct 14, 2015

Testing against the master branch would give you early notice of changes/regressions. For instance, a lot of DeprecationWarnings are going to raise errors in 1.11. Master breaks occasionally, but not often.

@charris
Copy link
Member

charris commented Oct 14, 2015

I wouldn't mind tweeting releases if there are developers who follow tweets.

@shoyer
Copy link
Member

shoyer commented Oct 14, 2015

pandas already has tests against NumPy master running Travis-CI as "allowed failures". I think I'll set this up for xray, too, and I would recommend it for scikit learn as well. Even if you're building from source NumPy only takes a few minutes to install, which is not unreasonable.

@charris Yes, there are loads of us on Twitter talking about open source software :).

@charris
Copy link
Member

charris commented Oct 14, 2015

#numpy, #python ?

@ogrisel
Copy link
Contributor

ogrisel commented Oct 14, 2015

numpy builds quickly enough but this is not the case for scipy and ideally we would like to test scikit-learn against the master branch of both of them.

@njsmith
Copy link
Member

njsmith commented Oct 14, 2015

Maybe someone could set up a wheelhouse on Rackspace (or wherever), and add
code to numpy, scipy, etc.'s Travis builds to upload a new wheel whenever
the tests pass?
On Oct 14, 2015 11:59 AM, "Olivier Grisel" notifications@github.com wrote:

numpy builds quickly enough but this is not the case for scipy and ideally
we would like to test scikit-learn against the master branch of both of
them.


Reply to this email directly or view it on GitHub
#6462 (comment).

@ogrisel
Copy link
Contributor

ogrisel commented Oct 14, 2015

Maybe someone could set up a wheelhouse on Rackspace (or wherever), and add code to numpy, scipy, etc.'s Travis builds to upload a new wheel whenever the tests pass?

I can do that, I have automated most of it for scikit-learn for windows on appveyor but it should for linux on travis. However we are trying to get the scikit-learn release out this week and we will do a sprint on scikit-learn next week so I am not sure I can find the time to do it in the very short term.

@ethankruse
Copy link
Contributor Author

I'll leave the discussion of broader implications and catching things like this sooner to everyone seriously involved.

But as far as fixing this particular issue, maybe @empeeu or @juliantaylor can help as they seemed involved in this particular change over on Issue #586.

The problem occurs in line 3346 (currently) with the part[...,-1] throwing an error if part has size 0.

3341        # Check if the array contains any nan's
3342        if np.issubdtype(a.dtype, np.inexact):
3343            # warn and return nans like mean would
3344            rout = mean(part[indexer], axis=axis, out=out)
3345            part = np.rollaxis(part, axis, part.ndim)
3346            n = np.isnan(part[..., -1])

One possible simple fix is to just to skip this block if the array is of size 0 (local variable sz in the function), which effectively reverts back to the previous functionality of version 1.9.3.

3341        # Check if the array contains any nan's
3342        if np.issubdtype(a.dtype, np.inexact) and sz > 0:
3343            # warn and return nans like mean would
3344            rout = mean(part[indexer], axis=axis, out=out)
3345            part = np.rollaxis(part, axis, part.ndim)
3346            n = np.isnan(part[..., -1])

@njsmith
Copy link
Member

njsmith commented Oct 17, 2015

Okay, summarized the conclusions from the general discussion in #6493, scipy/scipy#5379, #6494. Sorry for derailing this issue for that, but it sounds like there's some clear steps we can take to do better next time!

@ethankruse: regarding the issue with median, it sounds like you have a pretty good handle on what's needed here -- want to just submit a PR with the fix?

@matthew-brett
Copy link
Contributor

It would be pretty trivial to set up travis wheel uploading to http://travis-wheels.sckicit-image.org when all the tests pass.

@njsmith
Copy link
Member

njsmith commented Oct 17, 2015

@matthew-brett: if you want to jump in and do it before @ogrisel gets
around to it then I'm sure no one would object :-)
On Oct 16, 2015 23:40, "Matthew Brett" notifications@github.com wrote:

It would be pretty trivial to set up travis wheel uploading to
http://travis-wheels.sckicit-image.org when all the tests pass.


Reply to this email directly or view it on GitHub
#6462 (comment).

@ethankruse
Copy link
Contributor Author

Ok, @njsmith, I submitted my simple fix. As far as I can tell, the median function now acts appropriately, even with empty arrays of various shapes.

I'm not used to contributing to large projects though, so I hope I did things right.

@njsmith
Copy link
Member

njsmith commented Oct 19, 2015

Thanks Ethan! And don't worry, we all started somewhere :-)
On Oct 19, 2015 1:57 PM, "Ethan Kruse" notifications@github.com wrote:

Ok, @njsmith https://github.com/njsmith, I submitted my simple fix. As
far as I can tell, the median function now acts appropriately, even with
empty arrays of various shapes.

I'm not used to contributing to large projects though, so I hope I did
things right.


Reply to this email directly or view it on GitHub
#6462 (comment).

charris added a commit that referenced this issue Oct 21, 2015
charris pushed a commit to charris/numpy that referenced this issue Oct 21, 2015
np.median([]) returns NaN. Fixes bug/regression that raised an IndexError.
Added tests to ensure continued support of empty arrays.
@charris charris mentioned this issue Oct 21, 2015
@charris
Copy link
Member

charris commented Oct 24, 2015

This is fixed, but I wonder if a better targeted warning would be better.

@charris charris closed this as completed Oct 24, 2015
@empeeu
Copy link
Contributor

empeeu commented Oct 25, 2015

Sorry for being late to the party. @ethankruse thanks for fixing my error. Seems like this could have been avoided if there had been a unit test in place to test this scenario. When I was working on the nan update I only added tests to check the new nan behavior, and checking for empty arrays did not occur to me. Perhaps a checklist of scenarios that should be tested would have been helpful?

@njsmith
Copy link
Member

njsmith commented Oct 25, 2015

@empeeu: Don't worry, it could have happened to anyone :-). We try to test all the cases we can think of, but there are inevitably gaps in the existing tests (many of which date back to the old days when we were less thorough about writing tests), and even when we're careful stuff will sometimes slip through. Nothing really to be done about it except to test pre-releases thoroughly, and patch up any test suite gaps when they're discovered. (Think of it this way: you've successfully discovered a bug in our test suite, and guaranteed that no-one else will ever make the same mistake again ;-).)

tpn added a commit to pyparallel/numpy that referenced this issue Oct 30, 2015
* 'master' of https://github.com/numpy/numpy: (384 commits)
  BUG: fix MANIFEST.in for removal of a file in numpygh-8047.
  DOC: Release notes for Numpy 1.10.2.
  MAINT: remove useless files with outdated info from repo root and doc/.
  MAINT: fix mistake in doc upload rule
  TST: attempt to make test_load_refcount deterministic
  BUG: Fix for numpy#6569, allowing build_ext --inplace
  TST: Added regression test empty percentile, in ref to numpy#6530 and numpy#6553
  TST: Added tests for empty partition and argpartition
  BUG: revert view safety checks
  TST: Remove tests of view safety checks (see next commit)
  BUG: Revert some import * fixes in f2py.
  BUG: Fixed partition errors on empty input. Closes numpy#6530
  DOC: import "numpy for matlab users" from the wiki
  DOC: reorganize user guide a bit + import "tentative numpy tutorial" from wiki
  DOC: remove placeholders and incompleteness warnings
  MAINT: minor update to "make upload" doc build command.
  BUG: error in broadcast_arrays with as_strided array
  BUG: fix inner() by copying if needed to enforce contiguity
  DOC: clarify usage of 'argparse' return value.
  BUG: Make median work for empty arrays (issue numpy#6462)
  ...
jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016
np.median([]) returns NaN. Fixes bug/regression that raised an IndexError.
Added tests to ensure continued support of empty arrays.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests