MAINT: Use AxisError in more places #8843

eric-wieser · 2017-03-26T10:42:38Z

Behavioural changes:

Lots of code throws AxisError instead of IndexError or ValueError - but this is backwards-compatible, since AxisError derives from both (MAINT: Use the same exception for all bad axis requests #8584)
_ureduce detects duplicate arguments correctly - previously, np.median([1, 2], axis=(0,0)) would fail, but np.median([1, 2], axis=(-1,-1)) and np.median([1, 2], axis=(0,-1)) would be fine. Now all three raise ValueError.
np.gradient now accepts axis=[1,2] as well as axis=(1,2). This might not be desirable, but is consistent with the majority of other functions. Now that this all goes through one place, we could deprecate it in future.
np.core.numeric._validate_axis is slightly more public at np.core.numeric.normalize_axis_tuple, to complement np.core.multiarray.normalize_axis_index
Functions taking a single axis argument now all call operator.index, rather than just a random subset of them.

eric-wieser · 2017-03-26T10:45:41Z

numpy/core/src/multiarray/common.h

+        }
+
+        /* Invoke the AxisError constructor */
+        exc = PyObject_CallObject(AxisError_cls, args);


Implementing the string formatting in C seemed like way too much effort for little gain - no need for performance once something starts going wrong

Wait I should just use PyObject_CallFunction here

eric-wieser · 2017-03-26T10:46:53Z

numpy/lib/tests/test_function_base.py

        assert_raises(ValueError, np.percentile, d, axis=(1, 1), q=25)
+        assert_raises(ValueError, np.percentile, d, axis=(-1, -1), q=25)
+        assert_raises(ValueError, np.percentile, d, axis=(3, -1), q=25)


These test the better duplicate-axis detection

eric-wieser · 2017-03-26T10:47:33Z

numpy/lib/tests/test_function_base.py

-        assert_raises(ValueError, gradient, x, axis=3)
-        assert_raises(ValueError, gradient, x, axis=-3)
-        assert_raises(TypeError, gradient, x, axis=[1,])
+        assert_raises(np.AxisError, gradient, x, axis=3)


These exceptions were found by temporarily removing the base classes from AxisError, so that tests for ValueError and IndexError would not catch them.

eric-wieser · 2017-03-26T10:47:51Z

numpy/lib/function_base.py

@@ -4764,6 +4748,8 @@ def delete(arr, obj, axis=None):
        else:
            return arr.copy(order=arrorder)

+    axis = normalize_axis_index(axis, ndim)


This guards against IndexError

eric-wieser · 2017-03-26T10:50:39Z

numpy/core/tests/test_numeric.py

                            np.moveaxis, x, -4, 0)
-        assert_raises_regex(np.AxisError, 'invalid axis .* `destination`',
+        assert_raises_regex(np.AxisError, 'destination.*out of bounds',


Error message comparison:

ValueError: invalid axis for this array in `destination` argument AxisError: destination: axis 5 is out of bounds for array of dimension 3

eric-wieser · 2017-03-26T14:45:49Z

numpy/lib/tests/test_function_base.py

-        assert_raises(TypeError, gradient, x, axis=[1,])
+        assert_raises(np.AxisError, gradient, x, axis=3)
+        assert_raises(np.AxisError, gradient, x, axis=-3)
+        # assert_raises(TypeError, gradient, x, axis=[1,])


This used to error, and ideally it would still error for consistency with ufunc.reduce. But all the functions using _ureduce do not error in this situation, so we've already failed at consistency.

Perhaps we should add a deprecation warning (in a future commit) about passing lists of axes?

mhvk

This looks great! only rather minor comments.

mhvk · 2017-03-26T21:14:51Z

numpy/core/_internal.py

+
+        # do the string formatting here, to save work in the C code
+        else:
+            msg = "axis {} is out of bounds for array of dimension {}".format(


Nitpicking, but I find the following much more readable

msg = ("axis {} is out of bounds for array of dimension {}" .format(axis, ndim))

mhvk · 2017-03-26T21:16:04Z

numpy/core/numeric.py

-def _validate_axis(axis, ndim, argname):
+def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):
+    """
+    Normalizes a tuple of axes, such that they are  valid positive indices


Should have a single line as "title" (with a full stop at the end)

mhvk · 2017-03-26T21:16:43Z

numpy/core/numeric.py

+        A prefix to put before the error message, typically the name of the
+        argument
+    allow_duplicate : bool, optional
+        If True, the default, disallow an axis from being specified twice


If False!

mhvk · 2017-03-26T21:18:28Z

numpy/core/numeric.py

+    Parameters
+    ----------
+    axis : int, tuple of int, or list of int
+        The un-normalized index or indices of the axis.


The explanation should have a full stop at the end (here and below); I think this actually matters for sphinx rendering...

mhvk · 2017-03-26T21:19:48Z

numpy/core/numeric.py

+    axis = [normalize_axis_index(ax, ndim, argname) for ax in axis]
+    if not allow_duplicate and len(set(axis)) != len(axis):
+        if argname:
+            raise ValueError('repeated axis in `%s` argument' % argname)


May as well change to new-style formatting.

mhvk · 2017-03-27T00:08:04Z

numpy/core/numeric.py

@@ -1544,17 +1542,59 @@ def rollaxis(a, axis, start=0):
    return a.transpose(axes)


-def _validate_axis(axis, ndim, argname):
+def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):


I slightly wonder -- mostly because of #8819 -- whether it would be useful to have this in C...

whether it would be useful to have this in C

Yes, I think it would - and I think we even already have it for ufunc.reduce. I think this PR is a good stepping stone to implementing that, since it pulls all the python codepaths through one point for parsing axes - that point can now be moved to C later.

eric-wieser

Ok, all fixed up

eric-wieser · 2017-03-27T01:27:22Z

numpy/core/_internal.py

@@ -631,4 +631,17 @@ class TooHardError(RuntimeError):
    pass

 class AxisError(ValueError, IndexError):
-    pass
+    """ Axis supplied was invalid. """


Now with a docstring, which Ipython shows when you type an exception name. This is in a similar style to the builtin ones.

This should be in numpy/_globals.py.

And imported into numpy/__init__.py.

I put it here in #8584 because it needs to be imported from C code, and that's what _internals seems to be used for. Is it safe to load _globals.py from C?

AxisError is already visible at the global scope as np.AxisError, as it goes through the __all__ chain.

You can import from practically anything. We also import from multiarray itself into C code. I'd import this from '"numpy"', The _globals is for singletons that may also be used by python code.

Should TooHardError move there too?

right now, numeric.py has from ._internal import TooHardError, AxisError

core/__init__.py also contains from . import _internal # for freeze programs, for some reason

Probably, but it can be done in a separate PR as the imports also need to be changed. Maybe _globals.py should be renamed _global_singletons or _global_exceptions ;) There are other errors scattered about that maybe we should move at some point.

Or maybe _numpy_exceptions

Do you want me to do both in a separate PR? AxisError being in that file is not new to this PR.

(either way, fix coming for something else, so don't merge yet!)

homu · 2017-03-28T15:03:46Z

☔ The latest upstream changes (presumably #8861) made this pull request unmergeable. Please resolve the merge conflicts.

Moving the string formatting to python makes this a lot easier

They were only separate before due to a requirement of a better error message

Previously we did not, as this wouldn't say which axis argument was the cause of the problem.

This also means that its axis argument invokes operator.index like others do. _validate_axis currently accepts lists of axes, which is a bug.

Interestingly np.roll allows axes to be repeated.

This fixes an omission where duplicate axes would only be detected when positive

eric-wieser · 2017-03-28T20:23:01Z

numpy/core/src/multiarray/common.h

-                     *axis, ndim);
+        /* Invoke the AxisError constructor */
+        exc = PyObject_CallFunction(AxisError_cls, "iiO",
+                                    *axis, ndim, msg_prefix);


This was the way I should have written this the first time...

eric-wieser · 2017-03-28T20:24:36Z

Ok, this is rebased over merge conflicts, using more appropriate bits of the python API to construct the exception, and passes.

As @charris points out, this exception might be in the wrong place. But that:

Was already true before this PR
Is not the only exception with this problem

So should probably be moved to another issue/PR?

mhvk · 2017-03-28T20:34:05Z

numpy/core/numeric.py

-                         argname)
-    if len(set(axis)) != len(axis):
-        raise ValueError('repeated axis in `%s` argument' % argname)
+    axis = [normalize_axis_index(ax, ndim, argname) for ax in axis]


In the docstring, you write that you'll return a tuple, but you actually construct a list. Just replace [...] with tuple(...)` (also in the try/except above, for completeness.)

Yep, that's a bug.

Done - fixed to return a tuple. This is actually pretty important, given that things like sum require axis to be a tuple, so would previously not work for any kind of normalize-and-forward usage.

Thanks for the catch!

mhvk · 2017-03-28T20:37:55Z

numpy/core/src/multiarray/common.h

+static NPY_INLINE int
+check_and_adjust_axis(int *axis, int ndim)
+{
+    return check_and_adjust_axis_msg(axis, ndim, Py_None);


I think you have to Py_INCREF(Py_None) before passing it on. Since you then would have to Py_DECREF it as well, I suggest passing in NULL here, and doing the translation to Py_None above.

Py_None is garantueed to exist and as long as it doesn't escape to the user and it isn't decref'd everything should be fine. PyObject_CallFunction increments it's reference count for the exception so what the user gets is a new reference.

Using NULL seems cleaner though.

I don't need to here, because check_and_adjust_axis_msg takes a borrowed reference. It doesn't actually need a refcount at all until it makes it to the python code, and PyObject_CallFunction() increments the refcount for me, AFAICT.

It took a lot of iteration to get this working, and attempts at doing what you describe only caused segfaults

Using NULL seems cleaner though.

Why? We already need to handle Py_None being passed in from user code. Furthermore, when we get Py_None from usercode, it's a borrowed reference - so like here, it doesn't need an incref unless it escapes to python code.

The only thing that's unclear to me is whether I need to INCREF msg_prefix on the line immediately before PyObject_CallFunction. I think the answer is no, (it was no when I used Py_Buildvalue here previously), but that's not clear from the documentation

@eric-wieser How would it cause segfaults? You could just initialize it as NULL and when you call the "exception" you do PyObject_CallFunction(AxisError_cls, "iiO", *axis, ndim, msg_prefix?msg_prefix:Py_None);

Segfaults came from other attempts at refcounting. I wasn't aiming to be passing null originally. Sorry if that was unclear. You're correct, what you have would add support for passing null, and would work correctly assuming that this works correctly - but I see no purpose in adding that as a special case

Yeah, that's the reason why it's generally not a good idea to pass borrowed references around. But I didn't check the numpy source code, maybe it's common here. In that case please ignore my comments.

I guess my argument would be that if I manually inlined the contents of check_and_adjust_axis_msg into normalize_axis_index, you would not expect me to add any additional Py_INCREFs

I'm not sure what you mean there - can you elaborate on exactly why it is a bad idea? (and perhaps link me to some recommendations for working with python refcounting)

Counter-point to it generally not being a good idea - every function in a PyMethodDef object is passed borrowed references - so python seems to favor it internally.

ok, this goes a bit off-topic. I still think the code is correct wrt reference counting.

Some confusion added here by my first comment being written before yours appeared (and not directed at you), I think :).

mhvk · 2017-03-28T20:38:55Z

numpy/core/src/multiarray/multiarraymodule.c

    int axis;
    int ndim;
+    PyObject *msg_prefix = Py_None;


Here, again, I think you cannot just use Py_None. Also solve by initlializing to NULL?

This is definitely correct. PyArg_ParseTuple does not INCREF O arguments, so this line is exactly equivalent to passing None from python.

The question remains if I should INCREF before calling check_and_adjust_axis_msg, but let's discuss that above.

mhvk · 2017-03-28T23:37:34Z

@eric-wieser - I think you convinced me that your approach works ref-counting wise, but I'm still not quite convinced it is the handiest. My suggestion would be to simply pass string around in the C facing routines, i.e., make check_and_adjust_axis_msg more easily useful also from within C and create a corresponding object only when setting the AxisError. I realise this implies possibly getting the string from a Python object, and then recreating the object for the error message, but think this is OK.

Anyway, it is not a big deal.

eric-wieser · 2017-03-28T23:40:09Z

would be to simply pass string around in the C facing routines, i.e., make check_and_adjust_axis_msg more easily useful also from within C

Seems reasonable. Although I didn't find myself needing any extra info in the C code, since everything there seems to be working with a single axis argument. So I think we do that if we need it, and otherwise don't bother.

and create a corresponding object only when setting the AxisError

Alternatively, the C code could just have a static pre-allocated error-message string

I think you convinced me that your approach works ref-counting wise

You were right to question it - it's something I'm not too experienced in, so the default there should be to assume I'm wrong!

juliantaylor · 2017-03-29T12:43:31Z

looks good to me

mhvk · 2017-03-29T13:33:50Z

Seems reasonable. Although I didn't find myself needing any extra info in the C code, since everything there seems to be working with a single axis argument. So I think we do that if we need it, and otherwise don't bother.

The only issue with this is that if you leave this in place, if anyone in the C code is going to use check_and_adjust_axis_msg, i.e., they want to make the error that might get raised more descriptive, they'll set up a PyObject and then it just becomes harder to undo. So, personally, I'd prefer to get it right now (but I realise I may be thinking too far ahead, so if you don't agree, that's OK, and we can just merge as is).

eric-wieser · 2017-03-29T14:06:33Z

I think that if anyone doing that themselves finds they do want to be able to pass a c string, they'll be in a more capable decision to decide if they should just do it in place. We already create python strings in lots of places for error messages.

Besides, to support this, I think we'd need a ..._msg_or_cmsg(axis, ndim, PyObject *msg, char* cmsg) which could support both. This seems like a messy addition when it is not even needed yet.

mhvk · 2017-03-29T14:44:11Z

OK, fair enough. Since both @juliantaylor and I approved this, let's just go ahead and merge.

eric-wieser · 2017-03-29T14:50:57Z

Thanks @mhvk!

eric-wieser added 03 - Maintenance component: numpy._core component: numpy.lib component: numpy.ma masked arrays labels Mar 26, 2017

eric-wieser commented Mar 26, 2017

View reviewed changes

eric-wieser force-pushed the more-AxisError branch from c850df2 to 2847ada Compare March 26, 2017 10:56

eric-wieser commented Mar 26, 2017

View reviewed changes

mhvk reviewed Mar 26, 2017

View reviewed changes

mhvk reviewed Mar 27, 2017

View reviewed changes

eric-wieser force-pushed the more-AxisError branch from 2847ada to cb8f713 Compare March 27, 2017 01:19

eric-wieser commented Mar 27, 2017

View reviewed changes

eric-wieser added this to the 1.13.0 release milestone Mar 27, 2017

eric-wieser force-pushed the more-AxisError branch from cb8f713 to 17eec08 Compare March 28, 2017 15:15

eric-wieser closed this Mar 28, 2017

eric-wieser reopened this Mar 28, 2017

eric-wieser added 8 commits March 28, 2017 20:44

ENH: Allow AxisErrors to have a prefix on the message

7bf9115

Moving the string formatting to python makes this a lot easier

MAINT: have _validate_axis delegate to normalize_axis_index

4feb97e

They were only separate before due to a requirement of a better error message

MAINT: Use normalize_axis_index in cross

09d4d35

Previously we did not, as this wouldn't say which axis argument was the cause of the problem.

MAINT: Remove manual error message

cd01f01

MAINT: Reuse _validate_axis in np.gradient

efa1bd2

This also means that its axis argument invokes operator.index like others do. _validate_axis currently accepts lists of axes, which is a bug.

MAINT: use normalize_axis_index in np.roll

b6850e9

Interestingly np.roll allows axes to be repeated.

MAINT: Reuse _validate_axis in ma

539d4f7

MAINT: Use _validate_axis inside _ureduce

e3ed705

This fixes an omission where duplicate axes would only be detected when positive

eric-wieser force-pushed the more-AxisError branch from 17eec08 to 23e4cce Compare March 28, 2017 19:46

eric-wieser commented Mar 28, 2017

View reviewed changes

mhvk requested changes Mar 28, 2017

View reviewed changes

MAINT: Rename _validate_axis, and document it

17466ad

eric-wieser force-pushed the more-AxisError branch from 23e4cce to 17466ad Compare March 28, 2017 21:23

mhvk approved these changes Mar 28, 2017

View reviewed changes

mhvk merged commit ab9b15e into numpy:master Mar 29, 2017

eric-wieser deleted the more-AxisError branch March 29, 2017 14:51

MAINT: Use AxisError in more places #8843

MAINT: Use AxisError in more places #8843

Conversation

eric-wieser commented Mar 26, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhvk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Mar 27, 2017 • edited

Choose a reason for hiding this comment

eric-wieser left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Mar 28, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Mar 28, 2017 • edited

Choose a reason for hiding this comment

homu commented Mar 28, 2017

Choose a reason for hiding this comment

eric-wieser commented Mar 28, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Mar 28, 2017 • edited

Choose a reason for hiding this comment

eric-wieser Mar 28, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Mar 28, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Mar 28, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhvk commented Mar 28, 2017

eric-wieser commented Mar 28, 2017 • edited

juliantaylor commented Mar 29, 2017

mhvk commented Mar 29, 2017

eric-wieser commented Mar 29, 2017

mhvk commented Mar 29, 2017

eric-wieser commented Mar 29, 2017

eric-wieser commented Mar 26, 2017 •

edited

eric-wieser Mar 27, 2017 •

edited

eric-wieser Mar 28, 2017 •

edited

eric-wieser Mar 28, 2017 •

edited

eric-wieser commented Mar 28, 2017 •

edited

eric-wieser Mar 28, 2017 •

edited

eric-wieser Mar 28, 2017 •

edited

eric-wieser Mar 28, 2017 •

edited

eric-wieser Mar 28, 2017 •

edited

eric-wieser commented Mar 28, 2017 •

edited