Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.10 regression: partition errors out on empty input #6530

Closed
njsmith opened this issue Oct 20, 2015 · 8 comments
Closed

1.10 regression: partition errors out on empty input #6530

njsmith opened this issue Oct 20, 2015 · 8 comments

Comments

@njsmith
Copy link
Member

njsmith commented Oct 20, 2015

maintenance/1.9.x:

In [8]: ap = np.array([0, 2, 4, 6, 8, 10])

# Passing an empty 'k' array to partition is a no-op:
In [9]: ap.partition(np.array([], dtype=np.int64))

In [10]: ap
Out[10]: array([ 0,  2,  4,  6,  8, 10])

1.10.1:

In [2]: ap = np.array([0, 2, 4, 6, 8, 10])

In [3]: ap.partition(np.array([], dtype=np.int64))
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-3-c708b3f943a6> in <module>()
----> 1 ap.partition(np.array([], dtype=np.int64))

MemoryError: 

Noticed this because it's breaking some tests in patsy.

@njsmith njsmith added this to the 1.10.2 release milestone Oct 20, 2015
@njsmith
Copy link
Member Author

njsmith commented Oct 20, 2015

git bisect says:

3937a87 is the first bad commit
commit 3937a87
Author: jaimefrio jaime.frio@gmail.com
Date: Wed Jan 21 17:15:00 2015 -0800

MAINT: Refactor _new_sortlike and _new_argsortlike

Simplify the signature of both functions, remove code duplication
inside, add proper handling of byte-swapping for compound dtypes
(like complex, fixes #5441), and add functionality to deal with
dtypes containing Python object references. This last enhancement is
not needed for this PR, but supports subsequent ones.

@jaimefrio

@jaimefrio
Copy link
Member

The error being returned is terribly misleading, but this should be easy to fix.

What's happening is that _new_sortlike takes a "guilty unless proven innocent" approach to life. It initializes its return value to -1 (see here), expecting it to be turned into a 0 by the calls to either the sort or part functions.

Failure of the sort and part functions can only happen if there is a comparison error (in which case the Python error would be set), or if there is a memory allocation error in mergesort (in which case there is no error set). Assuming those are the only possibilities, _new_sortlike sets the error to a MemoryError if the return value indicates failure but no error is set.

So, after this very long explanation, and unless I am missing something else, it would seem to me that the fix is simple to be civilized, and go to an "innocent unless proven guilty" approach, i.e. simply changing the line linked above to int ret = 0; should fix this issue, and I don't think it will create any new ones.

The exact same issue should be affecting _new_argsortlike, (see here), and hence argpartition.

Plus tests, of course...

If no one beats me to it, I'll try to send a PR during this week. But anyone, please feel free to beat me to it!

@njsmith
Copy link
Member Author

njsmith commented Oct 22, 2015

I haven't had time to go properly spelunking, but is there some deep reason
why it is impossible for mergesort to raise an error properly? It seems
like that would make this much easier to reason about.

On Tue, Oct 20, 2015 at 9:21 AM, Jaime notifications@github.com wrote:

The error being returned is terribly misleading, but this should be easy
to fix.

What's happening is that _new_sortlike takes a "guilty unless proven
innocent" approach to life. It initializes its return value to -1 (see
here
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L812),
expecting it to be turned into a 0 by the calls to either the sort or part
functions.

Failure of the sort and part functions can only happen if there is a
comparison error (in which case the Python error would be set), or if there
is a memory allocation error in mergesort (in which case there is no error
set). Assuming those are the only possibilities, _new_sortlike sets the
error to a MemoryError if the return value indicates failure but no error
is set.

So, after this very long explanation, and unless I am missing something
else, it would seem to me that the fix is simple to be civilized, and go to
an "innocent unless proven guilty" approach, i.e. simply changing the line
linked above to int ret = 0; should fix this issue, and I don't think it
will create any new ones.

The exact same issue should be affecting _new_argsortlike, (see here
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L950),
and hence argpartition.

Plus tests, of course...

If no one beats me to it, I'll try to send a PR during this week. But
anyone, please feel free to beat me to it!


Reply to this email directly or view it on GitHub
#6530 (comment).

@jaimefrio
Copy link
Member

The GIL is the main reason.

@njsmith
Copy link
Member Author

njsmith commented Oct 22, 2015

Raising an exception from GIL-less code is pretty straightforward -- you write something like

void my_func_that_is_called_without_the_GIL() {
    /* Has to be at the top with variable definitions */
    NPY_ALLOW_C_API_DEF
    ...
    if (something_went_wrong) {
        NPY_ALLOW_C_API;
        /* In here we have the GIL, and are allowed to use Python C API like PyErr_Format */
        NPY_DISABLE_C_API;
        return -1;
    }
    ...
}

@yashmehrotra
Copy link
Contributor

@jaimefrio I would like to fix this issue. Can you assign it to me ?

@njsmith
Copy link
Member Author

njsmith commented Oct 23, 2015

We don't really use the assignment feature in github, just posting to say
you're working on it is generally good enough to warn people :-). Go for it!
On Oct 22, 2015 3:38 PM, "Yash Mehrotra" notifications@github.com wrote:

@jaimefrio https://github.com/jaimefrio I would like to fix this issue.
Can you assign it to me ?


Reply to this email directly or view it on GitHub
#6530 (comment).

@yashmehrotra
Copy link
Contributor

@njsmith Cool. I'll start working on it. :)

charris added a commit that referenced this issue Oct 27, 2015
BUG: Fix partition and argpartition error for empty input. Closes #6530
charris pushed a commit to charris/numpy that referenced this issue Oct 27, 2015
@charris charris mentioned this issue Oct 27, 2015
tpn added a commit to pyparallel/numpy that referenced this issue Oct 30, 2015
* 'master' of https://github.com/numpy/numpy: (384 commits)
  BUG: fix MANIFEST.in for removal of a file in numpygh-8047.
  DOC: Release notes for Numpy 1.10.2.
  MAINT: remove useless files with outdated info from repo root and doc/.
  MAINT: fix mistake in doc upload rule
  TST: attempt to make test_load_refcount deterministic
  BUG: Fix for numpy#6569, allowing build_ext --inplace
  TST: Added regression test empty percentile, in ref to numpy#6530 and numpy#6553
  TST: Added tests for empty partition and argpartition
  BUG: revert view safety checks
  TST: Remove tests of view safety checks (see next commit)
  BUG: Revert some import * fixes in f2py.
  BUG: Fixed partition errors on empty input. Closes numpy#6530
  DOC: import "numpy for matlab users" from the wiki
  DOC: reorganize user guide a bit + import "tentative numpy tutorial" from wiki
  DOC: remove placeholders and incompleteness warnings
  MAINT: minor update to "make upload" doc build command.
  BUG: error in broadcast_arrays with as_strided array
  BUG: fix inner() by copying if needed to enforce contiguity
  DOC: clarify usage of 'argparse' return value.
  BUG: Make median work for empty arrays (issue numpy#6462)
  ...
jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants