Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Use the same exception for all bad axis requests #8584

Merged
merged 5 commits into from
Feb 21, 2017

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Feb 8, 2017

Lots of functions accept an axis argument that can be negative, implying wraparound. The same few lines of code are repeated all over the place to implement this.

When the axis is just plain invalid, an error is raised. However:

  • Some things raise IndexError, others raise ValueError. This PR somehwat arbitrarily assumes that IndexError is the right type of exception to raise.
  • Some messages are more useful than others
  • Some functions don't check for axis < -ndim

It seems it would be better to put all this in one place.

This was previously filed as an issue at #8583. Function name and location inspired by check_and_adjust_index. check_and_adjust_index(...) < 0 is used to match check_and_adjust_axis(...) < 0.

This was previously discussed on the mailing list by @charris and @seberg

@eric-wieser eric-wieser force-pushed the resolve_axis branch 2 times, most recently from 730f87c to 7ec4d8f Compare February 8, 2017 18:48
* 0 <= *axis < ndim, and returns 0.
*/
static NPY_INLINE int
check_and_adjust_axis(int *axis, int ndim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust seems to imply something always happens; how about check_and_normalize_axis. Or just normalize_axis?

PyErr_Format(PyExc_ValueError,
"'axis' entry %d is out of bounds [-%d, %d)",
axis_orig, ndim, ndim);
if (check_and_adjust_axis(&axis, ndim) < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below: check for status using != 0? Or just omit the comparison altogether (as above in error_converting(axis)

axis_orig, ndim, ndim);
return NPY_FAIL;
if (check_and_adjust_axis(&axis, ndim) < 0) {
return NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't one still return NPY_FAIL here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

PyErr_Format(PyExc_ValueError, "axis(=%d) out of bounds", axis_orig);
return -1;
if (check_and_adjust_axis(&axis, n) < 0) {
return NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: I think the return argument shouldn't change, i.e., return -1 here?

PyErr_Format(PyExc_ValueError, "axis(=%d) out of bounds", axis_orig);
return -1;
if (check_and_adjust_axis(&axis, n) < 0) {
return NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem

"axis(=%d) out of bounds", axis);
goto fail;
if (check_and_adjust_axis(&axis, nd) < 0) {
return NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely should be goto fail here, since several DECREFs need to be done.

@@ -4109,6 +4103,24 @@ array_may_share_memory(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *
return array_shares_memory_impl(args, kwds, NPY_MAY_SHARE_BOUNDS, 0);
}

static PyObject *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this is for follow up in python code? If so, I think it should be part of a follow-up PR, not be done here (if only to give a chance to discuss whether it is best to go through C for something like this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the python stuff to this PR now...

@mhvk
Copy link
Contributor

mhvk commented Feb 8, 2017

This makes sense to me, but note that I think you'll have to ensure the code keeps behaving the same in case of failure. As you will have seen, I suggest postponing the method to a possible python follow-up PR.

There are also a few more places where the code could be used:

  1. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L1832
  2. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/shape.c#L708
  3. https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L3983

@mhvk
Copy link
Contributor

mhvk commented Feb 8, 2017

Actually, looking more closely at anything that includes axis.*<.*0, I find there is already a method which would seem to do what you want... check_and_adjust_index -- https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/common.h#L110

@mhvk
Copy link
Contributor

mhvk commented Feb 8, 2017

p.s. Well, the function is very similar but doesn't exactly do the same thing, since that one checks an index rather than an axis, but with the same logic. I'm not sure I understand the threading stuff...

@mhvk
Copy link
Contributor

mhvk commented Feb 8, 2017

p.p.s. And I really should learn to read commit messages -- you of course refer to check_and_adjust_index already...

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 8, 2017

Wow, that was a clumsy bunch of mistakes. Thanks for catching me out!

you of course refer to check_and_adjust_index already...

Yep - I've addressed your concerns in an edit to the original post.

There are also a few more places where the code could be used:

Great, thanks for spotting those - I'll add a fixup commit for those shortly

you'll have to ensure the code keeps behaving the same in case of failure.

Meaning what? Produce the same error message?

@eric-wieser eric-wieser changed the title MAINT: Use the same error message for most C code handling negative axes MAINT: Use the same exception for all bad axis requests Feb 8, 2017
@mhvk
Copy link
Contributor

mhvk commented Feb 8, 2017

you'll have to ensure the code keeps behaving the same in case of failure.

Meaning what? Produce the same error message?

No, just that what the code returns (error code or otherwise) remains the same (see in-line comments)

@eric-wieser eric-wieser force-pushed the resolve_axis branch 2 times, most recently from a9c76e6 to 353107a Compare February 9, 2017 00:12
@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

I like how much the code cleans up using this! A few things I'm uncertain about:

  1. For the python side, a pure python implementation would have the advantage that it is easier to see what it does (this is partially why I thought it might be good to do it in a separate PR);
  2. For the python side, I'd prefer a name that makes more obvious what it does, e.g., normalize_axis);
  3. Am not quite sure what the appropriate error is; people may count on getting ValueError, though I would agree that IndexError is more logical.

@eric-wieser
Copy link
Member Author

@mhvk

  1. I don't like the idea of implementing the same code in both C and python, leaving the only alternative as having the C code call python - but that would be a performance hit
  2. I've deliberately not exposed it publically at np.resolve_axis because I was not sure of the name - I'd welcome some more input - I don't have a good argument for one or the other
  3. I'd be in favor of making a special AxisError type, because that would make the tests more explicit. Of course, the question then becomes "should this subclass ValueError, IndexError, or just Exception"...

@charris
Copy link
Member

charris commented Feb 9, 2017

Grrr, github is now notifying me of commits without referencing a PR. It's an "improvement" too far.

@eric-wieser
Copy link
Member Author

@charris : that's my fault for doing some url mangling in order to comment on a commit and not a diff line

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

Hmm, possibly one needs class AxisError(IndexError, ValueError)...

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

More seriously: which MaskedArray test is actually broken by the change from ValueError to IndexError -- I tried looking at the travis results, but only noticed 2 failures, which got me confused (maybe I just looked badly).

@eric-wieser
Copy link
Member Author

@mhvk: Look in the test results before the commit that claims to fix them ;)

@@ -1527,15 +1527,12 @@ def rollaxis(a, axis, start=0):

"""
n = a.ndim
if axis < 0:
axis += n
axis = resolve_axis(axis, n)
if start < 0:
start += n
Copy link
Member Author

@eric-wieser eric-wieser Feb 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has always struck me as a bug - IMO, rollaxis(np.zeros(3,4,5), 0, -1).shape == (4, 5, 3), ie -1 should mean "move this axis to the end".

This would be consistent with the way that np.stack(..., axis=-1) means insert an axis at the end. The fact that this is the only place that can't use resolve_axis sort of backs up the fact that this is non-intuitive and breaks the pattern.

I would like this to be written start = resolve_axis(axis, n+1), but that would be a breaking change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have been caught by that too, realising only slowly that I had to put array.ndim if I wanted something moved to the last axis. I think this is partially why we now have moveaxis -- in any case, I think it is fine to leave this one untouched!

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

I did... E.g., https://travis-ci.org/numpy/numpy/jobs/199817411 (2 errors only). But now I look at the appveyor results, I see that there are also 3 errors in test_recfunctions. Most bizarre. Maybe travis deletes jobs sooner? Or the github link is incorrect?

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 9, 2017

This appveyor page shows the bug

travis now fails due to warnings.

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

@eric-wieser - with #8590, you should not need to fix ma/core.py any more.

@eric-wieser
Copy link
Member Author

you should not need to fix ma/core.py any more.

I still need to swap ValueError for IndexError for the tests to pass. But yes, the change will be overwritten when you merge that PR

@shoyer
Copy link
Member

shoyer commented Feb 9, 2017

I don't think IndexError is quite right here, because that exception is quite specifically for subscripting a sequence. I would lean towards class AxisError(ValueError), or maybe class AxisError(IndexError, ValueError) if we're concerned about backwards compatibility.

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 9, 2017

because that exception is quite specifically for subscripting a sequence

Are we not essentially subscripting arr.shape[axis] here though? I suspect that was the argument being used in the past for IndexError

What would be an appropriate name/prefix for AxisError in the C code, and which file would it be best defined in?

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

I think I'd also pick IndexError over ValueError as we're indexing the array's dimensions. But I worry about breaking code in either case... That said, the fact that so little broke in numpy by doing this is a good sign (the MaskedArray one I'd chalk up to pure coincidence).

@eric-wieser
Copy link
Member Author

Ok, renamed to np.core.multiarray.normalize_axis_index, and a docstring added.

@seberg
Copy link
Member

seberg commented Feb 21, 2017

I did not follow this much, but I have a believe that someone will find any weird function and think its fine to use it. So, unless we are fine with people possibly starting to use it, adding an underscore to underscore ;) that its considered private just seems like a save bet.

@eric-wieser
Copy link
Member Author

@seberg: I'd argue that this is something we actually want people to use - for anyone writing their own function taking an axis argument that can't just forward it, this is a simple way to make it behave as people expect.

@seberg
Copy link
Member

seberg commented Feb 21, 2017

Ok, sounds fine to me, it was more of an "if we are not sure its stable API" thing, to be honest.

@mhvk
Copy link
Contributor

mhvk commented Feb 21, 2017

I think I agree with @eric-wieser - it is good if this gets used (but would not quite yet put it at the top level)

@mhvk mhvk added this to the 1.13.0 release milestone Feb 21, 2017
@mhvk
Copy link
Contributor

mhvk commented Feb 21, 2017

Thanks for adding the docstring; I think all is set now, so will merge.

@mhvk mhvk merged commit 2aabeaf into numpy:master Feb 21, 2017
@eric-wieser
Copy link
Member Author

Should this have had a release note, or will that be handled during the release stage?

@mhvk
Copy link
Contributor

mhvk commented Feb 21, 2017

Hmm, yes, probably big enough to have a release note... Now best to do separately?

@charris
Copy link
Member

charris commented Feb 22, 2017

Yes, should have a release note as it is a behavior change. The type warning for the check_and_adjust_axis call also needs a fix that would be good to get in before it is forgotten.

@charris
Copy link
Member

charris commented Feb 22, 2017

@mhvk It's in, so will need a separate PR for the release note.

@pv
Copy link
Member

pv commented Feb 23, 2017

This appears to cause a crash in scipy, gh-8671

if (axis < 0 || axis >= PyArray_NDIM(ap)) {
PyErr_SetString(PyExc_ValueError,
"invalid axis for this array");
if (check_and_adjust_axis(&axis, PyArray_NDIM(ap)) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axis is npy_intp, not int.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe useful to double check this everywhere.

Copy link
Member Author

@eric-wieser eric-wieser Feb 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which do you think I should accept in check_and_adjust_axis? Sometimes axis can be int, sometimes it is intp. Obviously I can introduce some locals for conversion, but I still need to choose which is right

@charris
Copy link
Member

charris commented Feb 23, 2017

@eric-wieser Time to get going on that fix.

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 23, 2017

@charris: Will do that tomorrow now #8672

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants