Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Use AxisError in more places #8843

Merged
merged 9 commits into from
Mar 29, 2017
Merged

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Mar 26, 2017

Behavioural changes:

  • Lots of code throws AxisError instead of IndexError or ValueError - but this is backwards-compatible, since AxisError derives from both (MAINT: Use the same exception for all bad axis requests #8584)
  • _ureduce detects duplicate arguments correctly - previously, np.median([1, 2], axis=(0,0)) would fail, but np.median([1, 2], axis=(-1,-1)) and np.median([1, 2], axis=(0,-1)) would be fine. Now all three raise ValueError.
  • np.gradient now accepts axis=[1,2] as well as axis=(1,2). This might not be desirable, but is consistent with the majority of other functions. Now that this all goes through one place, we could deprecate it in future.
  • np.core.numeric._validate_axis is slightly more public at np.core.numeric.normalize_axis_tuple, to complement np.core.multiarray.normalize_axis_index
  • Functions taking a single axis argument now all call operator.index, rather than just a random subset of them.

}

/* Invoke the AxisError constructor */
exc = PyObject_CallObject(AxisError_cls, args);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing the string formatting in C seemed like way too much effort for little gain - no need for performance once something starts going wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait I should just use PyObject_CallFunction here

assert_raises(ValueError, np.percentile, d, axis=(1, 1), q=25)
assert_raises(ValueError, np.percentile, d, axis=(-1, -1), q=25)
assert_raises(ValueError, np.percentile, d, axis=(3, -1), q=25)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test the better duplicate-axis detection

assert_raises(ValueError, gradient, x, axis=3)
assert_raises(ValueError, gradient, x, axis=-3)
assert_raises(TypeError, gradient, x, axis=[1,])
assert_raises(np.AxisError, gradient, x, axis=3)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These exceptions were found by temporarily removing the base classes from AxisError, so that tests for ValueError and IndexError would not catch them.

@@ -4764,6 +4748,8 @@ def delete(arr, obj, axis=None):
else:
return arr.copy(order=arrorder)

axis = normalize_axis_index(axis, ndim)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guards against IndexError

np.moveaxis, x, -4, 0)
assert_raises_regex(np.AxisError, 'invalid axis .* `destination`',
assert_raises_regex(np.AxisError, 'destination.*out of bounds',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message comparison:

ValueError: invalid axis for this array in `destination` argument
AxisError: destination: axis 5 is out of bounds for array of dimension 3

assert_raises(TypeError, gradient, x, axis=[1,])
assert_raises(np.AxisError, gradient, x, axis=3)
assert_raises(np.AxisError, gradient, x, axis=-3)
# assert_raises(TypeError, gradient, x, axis=[1,])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to error, and ideally it would still error for consistency with ufunc.reduce. But all the functions using _ureduce do not error in this situation, so we've already failed at consistency.

Perhaps we should add a deprecation warning (in a future commit) about passing lists of axes?

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! only rather minor comments.


# do the string formatting here, to save work in the C code
else:
msg = "axis {} is out of bounds for array of dimension {}".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, but I find the following much more readable

            msg = ("axis {} is out of bounds for array of dimension {}"
                   .format(axis, ndim))

def _validate_axis(axis, ndim, argname):
def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):
"""
Normalizes a tuple of axes, such that they are valid positive indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a single line as "title" (with a full stop at the end)

A prefix to put before the error message, typically the name of the
argument
allow_duplicate : bool, optional
If True, the default, disallow an axis from being specified twice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If False!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops!

Parameters
----------
axis : int, tuple of int, or list of int
The un-normalized index or indices of the axis.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation should have a full stop at the end (here and below); I think this actually matters for sphinx rendering...

axis = [normalize_axis_index(ax, ndim, argname) for ax in axis]
if not allow_duplicate and len(set(axis)) != len(axis):
if argname:
raise ValueError('repeated axis in `%s` argument' % argname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well change to new-style formatting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sensible

@@ -1544,17 +1542,59 @@ def rollaxis(a, axis, start=0):
return a.transpose(axes)


def _validate_axis(axis, ndim, argname):
def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly wonder -- mostly because of #8819 -- whether it would be useful to have this in C...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whether it would be useful to have this in C

Yes, I think it would - and I think we even already have it for ufunc.reduce. I think this PR is a good stepping stone to implementing that, since it pulls all the python codepaths through one point for parsing axes - that point can now be moved to C later.

Copy link
Member Author

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, all fixed up

@@ -631,4 +631,17 @@ class TooHardError(RuntimeError):
pass

class AxisError(ValueError, IndexError):
pass
""" Axis supplied was invalid. """
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now with a docstring, which Ipython shows when you type an exception name. This is in a similar style to the builtin ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in numpy/_globals.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And imported into numpy/__init__.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it here in #8584 because it needs to be imported from C code, and that's what _internals seems to be used for. Is it safe to load _globals.py from C?

AxisError is already visible at the global scope as np.AxisError, as it goes through the __all__ chain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can import from practically anything. We also import from multiarray itself into C code. I'd import this from '"numpy"', The _globals is for singletons that may also be used by python code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should TooHardError move there too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now, numeric.py has from ._internal import TooHardError, AxisError

core/__init__.py also contains from . import _internal # for freeze programs, for some reason

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but it can be done in a separate PR as the imports also need to be changed. Maybe _globals.py should be renamed _global_singletons or _global_exceptions ;) There are other errors scattered about that maybe we should move at some point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe _numpy_exceptions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to do both in a separate PR? AxisError being in that file is not new to this PR.

(either way, fix coming for something else, so don't merge yet!)

@eric-wieser eric-wieser added this to the 1.13.0 release milestone Mar 27, 2017
@homu
Copy link
Contributor

homu commented Mar 28, 2017

☔ The latest upstream changes (presumably #8861) made this pull request unmergeable. Please resolve the merge conflicts.

Moving the string formatting to python makes this a lot easier
They were only separate before due to a requirement of a better error message
Previously we did not, as this wouldn't say which axis argument was the cause
of the problem.
This also means that its axis argument invokes operator.index like
others do.

_validate_axis currently accepts lists of axes, which is a bug.
Interestingly np.roll allows axes to be repeated.
This fixes an omission where duplicate axes would only be detected when positive
*axis, ndim);
/* Invoke the AxisError constructor */
exc = PyObject_CallFunction(AxisError_cls, "iiO",
*axis, ndim, msg_prefix);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the way I should have written this the first time...

@eric-wieser
Copy link
Member Author

eric-wieser commented Mar 28, 2017

Ok, this is rebased over merge conflicts, using more appropriate bits of the python API to construct the exception, and passes.

As @charris points out, this exception might be in the wrong place. But that:

  • Was already true before this PR
  • Is not the only exception with this problem

So should probably be moved to another issue/PR?

argname)
if len(set(axis)) != len(axis):
raise ValueError('repeated axis in `%s` argument' % argname)
axis = [normalize_axis_index(ax, ndim, argname) for ax in axis]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the docstring, you write that you'll return a tuple, but you actually construct a list. Just replace [...] with tuple(...)` (also in the try/except above, for completeness.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - fixed to return a tuple. This is actually pretty important, given that things like sum require axis to be a tuple, so would previously not work for any kind of normalize-and-forward usage.

Thanks for the catch!

static NPY_INLINE int
check_and_adjust_axis(int *axis, int ndim)
{
return check_and_adjust_axis_msg(axis, ndim, Py_None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to Py_INCREF(Py_None) before passing it on. Since you then would have to Py_DECREF it as well, I suggest passing in NULL here, and doing the translation to Py_None above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_None is garantueed to exist and as long as it doesn't escape to the user and it isn't decref'd everything should be fine. PyObject_CallFunction increments it's reference count for the exception so what the user gets is a new reference.

Using NULL seems cleaner though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need to here, because check_and_adjust_axis_msg takes a borrowed reference. It doesn't actually need a refcount at all until it makes it to the python code, and PyObject_CallFunction() increments the refcount for me, AFAICT.

It took a lot of iteration to get this working, and attempts at doing what you describe only caused segfaults

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using NULL seems cleaner though.

Why? We already need to handle Py_None being passed in from user code. Furthermore, when we get Py_None from usercode, it's a borrowed reference - so like here, it doesn't need an incref unless it escapes to python code.

The only thing that's unclear to me is whether I need to INCREF msg_prefix on the line immediately before PyObject_CallFunction. I think the answer is no, (it was no when I used Py_Buildvalue here previously), but that's not clear from the documentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser How would it cause segfaults? You could just initialize it as NULL and when you call the "exception" you do PyObject_CallFunction(AxisError_cls, "iiO", *axis, ndim, msg_prefix?msg_prefix:Py_None);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Segfaults came from other attempts at refcounting. I wasn't aiming to be passing null originally. Sorry if that was unclear. You're correct, what you have would add support for passing null, and would work correctly assuming that this works correctly - but I see no purpose in adding that as a special case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the reason why it's generally not a good idea to pass borrowed references around. But I didn't check the numpy source code, maybe it's common here. In that case please ignore my comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my argument would be that if I manually inlined the contents of check_and_adjust_axis_msg into normalize_axis_index, you would not expect me to add any additional Py_INCREFs

I'm not sure what you mean there - can you elaborate on exactly why it is a bad idea? (and perhaps link me to some recommendations for working with python refcounting)

Counter-point to it generally not being a good idea - every function in a PyMethodDef object is passed borrowed references - so python seems to favor it internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, this goes a bit off-topic. I still think the code is correct wrt reference counting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some confusion added here by my first comment being written before yours appeared (and not directed at you), I think :).

int axis;
int ndim;
PyObject *msg_prefix = Py_None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, again, I think you cannot just use Py_None. Also solve by initlializing to NULL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely correct. PyArg_ParseTuple does not INCREF O arguments, so this line is exactly equivalent to passing None from python.

The question remains if I should INCREF before calling check_and_adjust_axis_msg, but let's discuss that above.

@mhvk
Copy link
Contributor

mhvk commented Mar 28, 2017

@eric-wieser - I think you convinced me that your approach works ref-counting wise, but I'm still not quite convinced it is the handiest. My suggestion would be to simply pass string around in the C facing routines, i.e., make check_and_adjust_axis_msg more easily useful also from within C and create a corresponding object only when setting the AxisError. I realise this implies possibly getting the string from a Python object, and then recreating the object for the error message, but think this is OK.

Anyway, it is not a big deal.

@eric-wieser
Copy link
Member Author

eric-wieser commented Mar 28, 2017

would be to simply pass string around in the C facing routines, i.e., make check_and_adjust_axis_msg more easily useful also from within C

Seems reasonable. Although I didn't find myself needing any extra info in the C code, since everything there seems to be working with a single axis argument. So I think we do that if we need it, and otherwise don't bother.

and create a corresponding object only when setting the AxisError

Alternatively, the C code could just have a static pre-allocated error-message string

I think you convinced me that your approach works ref-counting wise

You were right to question it - it's something I'm not too experienced in, so the default there should be to assume I'm wrong!

@juliantaylor
Copy link
Contributor

looks good to me

@mhvk
Copy link
Contributor

mhvk commented Mar 29, 2017

Seems reasonable. Although I didn't find myself needing any extra info in the C code, since everything there seems to be working with a single axis argument. So I think we do that if we need it, and otherwise don't bother.

The only issue with this is that if you leave this in place, if anyone in the C code is going to use check_and_adjust_axis_msg, i.e., they want to make the error that might get raised more descriptive, they'll set up a PyObject and then it just becomes harder to undo. So, personally, I'd prefer to get it right now (but I realise I may be thinking too far ahead, so if you don't agree, that's OK, and we can just merge as is).

@eric-wieser
Copy link
Member Author

I think that if anyone doing that themselves finds they do want to be able to pass a c string, they'll be in a more capable decision to decide if they should just do it in place. We already create python strings in lots of places for error messages.

Besides, to support this, I think we'd need a ..._msg_or_cmsg(axis, ndim, PyObject *msg, char* cmsg) which could support both. This seems like a messy addition when it is not even needed yet.

@mhvk
Copy link
Contributor

mhvk commented Mar 29, 2017

OK, fair enough. Since both @juliantaylor and I approved this, let's just go ahead and merge.

@mhvk mhvk merged commit ab9b15e into numpy:master Mar 29, 2017
@eric-wieser
Copy link
Member Author

Thanks @mhvk!

@eric-wieser eric-wieser deleted the more-AxisError branch March 29, 2017 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants