New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
median of empty array now produces IndexError #6462
Comments
I am wondering about the exact same thing. The context if you want to know: scikit-learn tests (on master) currently fail with numpy 1.10 and this is the source of one of the errors. |
Looks like a regression to me. @empeeu @juliantaylor? |
@lesteve: I don't think this is anyone's fault in particular or anything, but it also sounds like beyond this issue with median there's some deeper failure of process or coordination, if it's only now after 2 months of prereleases and two actual releases (!) that anyone is running the scikit-learn test suite against numpy 1.10. Any thoughts on how we could avoid this kind of situation in the future? |
@njsmith I agree. Sorry, I've been way behind on all my open source stuff. I was a bit shocked to see it yesterday. We had a similar thing with scipy. But 1.10 was released 8 days ago, right? It's hard to make downstream people use release candidates. I think we never get any useful feedback from ours. It requires reading another projects mailing list and seeing if they announce a release candidate, and then actually test it ^^. Clearly it's impossible for numpy to run downstream test suites, so this one is really on us. I don't know if there is a better mechanism than just "andy and olivier should pay more attention to the numpy and scipy mailing lists". |
I vaguely recall something about @matthew-brett having assembled an automated testing system for a collection of downstream packages. Maybe it was this: https://github.com/MacPython/scipy-stack-osx-testing. |
It would be cool if we could easily run the test-suites for all packages in anaconda. @teoliphant ? |
Anaconda is kinda a mess right now for this purpose, because (a) there's no
|
Opening an issue on the issue tracker would be helpful, I think. It is more likely to be picked up by someone than the announcement on the numpy mailing list. Maintaining a list of downstream packages would be an additional burden on you, but I guess it wouldn't be that bad? |
we should put into our how to release docs a step to test at least: those typically already catch a lot of issues and are easily available. One can even just use the debian packages which helps finding backward compatibility issues |
We could have numpy master branch builds generate linux test wheels for the Ubuntu 12.04 platform and upload them to a rackspace containers so that downstream projects can use them to test their own master branch against it in their CI with an http://docs.travis-ci.com/user/customizing-the-build/#Rows-that-are-Allowed-to-Fail |
But do we want to test against master? Or just RCs? Who would get the failures from the CI? |
We could also tests against RC (we can |
Testing against the master branch would give you early notice of changes/regressions. For instance, a lot of |
I wouldn't mind tweeting releases if there are developers who follow tweets. |
pandas already has tests against NumPy master running Travis-CI as "allowed failures". I think I'll set this up for xray, too, and I would recommend it for scikit learn as well. Even if you're building from source NumPy only takes a few minutes to install, which is not unreasonable. @charris Yes, there are loads of us on Twitter talking about open source software :). |
#numpy, #python ? |
numpy builds quickly enough but this is not the case for scipy and ideally we would like to test scikit-learn against the master branch of both of them. |
Maybe someone could set up a wheelhouse on Rackspace (or wherever), and add
|
I can do that, I have automated most of it for scikit-learn for windows on appveyor but it should for linux on travis. However we are trying to get the scikit-learn release out this week and we will do a sprint on scikit-learn next week so I am not sure I can find the time to do it in the very short term. |
I'll leave the discussion of broader implications and catching things like this sooner to everyone seriously involved. But as far as fixing this particular issue, maybe @empeeu or @juliantaylor can help as they seemed involved in this particular change over on Issue #586. The problem occurs in line 3346 (currently) with the
One possible simple fix is to just to skip this block if the array is of size 0 (local variable
|
Okay, summarized the conclusions from the general discussion in #6493, scipy/scipy#5379, #6494. Sorry for derailing this issue for that, but it sounds like there's some clear steps we can take to do better next time! @ethankruse: regarding the issue with median, it sounds like you have a pretty good handle on what's needed here -- want to just submit a PR with the fix? |
It would be pretty trivial to set up travis wheel uploading to http://travis-wheels.sckicit-image.org when all the tests pass. |
@matthew-brett: if you want to jump in and do it before @ogrisel gets
|
Ok, @njsmith, I submitted my simple fix. As far as I can tell, the median function now acts appropriately, even with empty arrays of various shapes. I'm not used to contributing to large projects though, so I hope I did things right. |
Thanks Ethan! And don't worry, we all started somewhere :-)
|
np.median([]) returns NaN. Fixes bug/regression that raised an IndexError. Added tests to ensure continued support of empty arrays.
This is fixed, but I wonder if a better targeted warning would be better. |
Sorry for being late to the party. @ethankruse thanks for fixing my error. Seems like this could have been avoided if there had been a unit test in place to test this scenario. When I was working on the nan update I only added tests to check the new nan behavior, and checking for empty arrays did not occur to me. Perhaps a checklist of scenarios that should be tested would have been helpful? |
@empeeu: Don't worry, it could have happened to anyone :-). We try to test all the cases we can think of, but there are inevitably gaps in the existing tests (many of which date back to the old days when we were less thorough about writing tests), and even when we're careful stuff will sometimes slip through. Nothing really to be done about it except to test pre-releases thoroughly, and patch up any test suite gaps when they're discovered. (Think of it this way: you've successfully discovered a bug in our test suite, and guaranteed that no-one else will ever make the same mistake again ;-).) |
* 'master' of https://github.com/numpy/numpy: (384 commits) BUG: fix MANIFEST.in for removal of a file in numpygh-8047. DOC: Release notes for Numpy 1.10.2. MAINT: remove useless files with outdated info from repo root and doc/. MAINT: fix mistake in doc upload rule TST: attempt to make test_load_refcount deterministic BUG: Fix for numpy#6569, allowing build_ext --inplace TST: Added regression test empty percentile, in ref to numpy#6530 and numpy#6553 TST: Added tests for empty partition and argpartition BUG: revert view safety checks TST: Remove tests of view safety checks (see next commit) BUG: Revert some import * fixes in f2py. BUG: Fixed partition errors on empty input. Closes numpy#6530 DOC: import "numpy for matlab users" from the wiki DOC: reorganize user guide a bit + import "tentative numpy tutorial" from wiki DOC: remove placeholders and incompleteness warnings MAINT: minor update to "make upload" doc build command. BUG: error in broadcast_arrays with as_strided array BUG: fix inner() by copying if needed to enforce contiguity DOC: clarify usage of 'argparse' return value. BUG: Make median work for empty arrays (issue numpy#6462) ...
np.median([]) returns NaN. Fixes bug/regression that raised an IndexError. Added tests to ensure continued support of empty arrays.
Recently upgraded from 1.9.3 to 1.10.0 in Python 2.7 (but I get the same behavior in Python 3.5 on both Mac and Linux).
In 1.9.3, running
np.median(np.array([]))
produces a NaN and a warning ofwarnings.warn("Mean of empty slice.", RuntimeWarning)
.Meanwhile, in 1.10.0, I'm now getting
The traceback points to the line in
python2.7/site-packages/numpy/lib/function_base.pyc in _median(a, axis, out, overwrite_input):
Is this intended behavior now in 1.10.0? I could switch all my code over to using
np.nanmedian
, which still produces the NaN result on empty arrays from 1.9.3, but this looks to be more of an accident/bug based on the tracebacks.The text was updated successfully, but these errors were encountered: