
DOC: revision of NEP-18 (__array_function__) #11303

Merged · 3 commits · Jun 15, 2018

Conversation

@shoyer (Member) commented Jun 11, 2018

I would particularly like to highlight a draft implementation (in pure Python
for now). Hopefully this will be useful for driving the discussion forward:
https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006

TODOs before merging:

- [ ] review from mrocklin

Other TODOs:

- [ ] decide if we want to change what is passed on to `__array_function__`
      implementations: should we include `overloaded_args` and/or
      `relevant_args` as well as or instead of `types`?
- [ ] add some discussion about static typing / PEP-484?
@mhvk @hameerabbasi @ngoldbaum @mattip please leave a note if there's anything I missed from our previous discussion that you would like to see addressed in the NEP itself. Detailed discussion should of course be saved for the mailing list.

@hameerabbasi (Contributor) left a comment:

Two changes, mainly typos.

One thing: (I'll repeat this on the mailing list) It'd be nice to have some form of super if we do decide to implement np.NotImplementedButCoercible. Probably another helper function that calls the implementation for a certain class, without doing the dance, but raising TypeError and following np.NotImplementedButCoercible rules? This would address @mhvk's concerns. The reason I think this is acceptable is that __array_function__ is a protocol, not a finished method to be used as-is. Something like the following:

def do_partial_dance(func, cls, args, kwargs):
    # Call cls's __array_function__ directly, skipping the full dispatch
    # dance, but still honoring the NotImplemented /
    # np.NotImplementedButCoercible return conventions.
    if hasattr(cls, '__array_function__'):
        retval = cls.__array_function__(...)
        if retval is np.NotImplementedButCoercible:
            return func(*args, **kwargs)
        if retval is NotImplemented:
            raise TypeError(...)
        return retval

    return func(*args, **kwargs)

Then, something inheriting from ndarray would just do do_partial_dance(func, ndarray, ...) instead of super(cls, self).__array_function__(...).

Also, I fully support making __array_function__ a class method as suggested, which would make the above possible. In that case, we can simply remove ndarray.__array_function__ entirely.
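
For concreteness, a sketch of how an ndarray subclass might use such a helper (do_partial_dance is the hypothetical helper above, and MY_HANDLED_FUNCTIONS is an invented name for illustration):

import numpy as np

MY_HANDLED_FUNCTIONS = {}  # hypothetical table of functions MyArray overrides

class MyArray(np.ndarray):
    def __array_function__(self, func, types, args, kwargs):
        if func in MY_HANDLED_FUNCTIONS:
            return MY_HANDLED_FUNCTIONS[func](*args, **kwargs)
        # fall back to plain-ndarray behaviour without a super() call
        return do_partial_dance(func, np.ndarray, args, kwargs)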

or universal functions (like ``np.exp``). The semantics are very similar
to ``__array_ufunc__``, except the operation is specified by an
arbitrary callable object rather than a ufunc instance and method.
is not covered by the ``__array_func__`` protocol for universal functions

Contributor:

Typo: __array_func__ -> __array_ufunc__

# overloaded function again.
return func(*args, **kwargs)

To avoid recursion, the dispatch rules for ``__array_function__`` need also

Contributor:

To avoid recursion -> To avoid infinite recursion

@mhvk (Contributor) left a comment:

Mostly points of clarification, though not all minor.

And in particular I would like to discard the idea of ndarray returning NotImplementedButCoercible up front. There is no need to break expectations for super chains like that.

function call that will be checked for an ``__array_function__``
implementation.
- The tuple ``args`` and dict ``**kwargs`` are directly passed on from the
- ``types`` is a list of argument types from the original NumPy

Contributor:

The name has now changed from types to possibly_overloaded - I should add that I think this is a terrible name! Just overloaded would be better.

(Separate: still think an OrderedDict would be better - but have to look at implementation.)

Member Author:

Oops, switched back to types. But maybe overloaded_types would be better/more descriptive.

@@ -145,7 +149,9 @@ This will require two changes within the Numpy codebase:
1. A function to inspect available inputs, look for the
``__array_function__`` attribute on those inputs, and call those
methods appropriately until one succeeds. This needs to be fast in the
common all-NumPy case.
common all-NumPy case, and have acceptable performance (no worse than

Contributor:

I'd argue that "It should also be possible to skip the test altogether".

Member Author:

I'm not sure I agree here. This isn't the case for __array_ufunc__.

Contributor:

It would be if adding subok=False didn't cause a bigger delay than checking for __array_ufunc__. But perhaps this also suggests that the current phrasing of "it should be fast" is good enough.

functions to define how that function operates on them. This will allow
using NumPy as a high level API for efficient multi-dimensional array
operations, even with array implementations that differ greatly from
``numpy.ndarray``.

Detailed description
--------------------

Contributor:

Could we include the mention of astropy as well? See comments on original NEP.

`current behavior <https://bugs.python.org/issue30140>`_ of Python.
- Implementations of ``__array_function__`` indicate that they can
handle the operation by returning any value other than
``NotImplemented``.
- If all ``__array_function__`` methods return ``NotImplemented``,
NumPy will raise ``TypeError``.

One deviation from the current behavior of ``__array_ufunc__`` is that NumPy

Contributor:

Hmm, that sounds like something we should do in __array_ufunc__ as well. See #11306.

Member Author:

Agreed.

Member:

If this is desirable, then I'd argue that passing self is undesirable, as the function should never have a reason to know which object it was called on.

return func(*args, **kwargs)

To avoid recursion, the dispatch rules for ``__array_function__`` need also
the same special case they have for ``__array_ufunc__``: any arguments with an

Contributor:

OR the non-wrapped function can be called. That would give a substantial speed advantage. (For __array_ufunc__, part of #8892)
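
(One way to get that speed advantage, sketched here as an assumption rather than anything in the NEP: functools.wraps records the original function as __wrapped__, so performance-critical callers could bypass the dispatch check entirely.)

import functools

def overridable(func):
    # stand-in for the hypothetical dispatch decorator
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # ... __array_function__ dispatch checks would go here ...
        return func(*args, **kwargs)
    return wrapper

@overridable
def concatenate(arrays, axis=0):
    return sum(arrays, [])  # toy implementation for the demo

# functools.wraps exposes the undecorated function, so callers that know
# they only have plain arrays can call it directly:
assert concatenate.__wrapped__([[1], [2]]) == [1, 2]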

``inspect`` module. NumPy won't be supporting Python 2 for
`very much longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_, but a bigger issue is
performance: ``inspect`` is written in pure Python, so our prototype
decorator is pretty slow, adding about 15 microseconds of overhead.

Contributor:

I don't think the wrapper would be an issue if we use annotations, since then the wrapper can do all the work while wrapping the function, creating the relevant try_array_function_override. It also has the benefit of generically not having to duplicate names in the wrapper.

If we want to do this, we should consider exposing the helper function
``do_array_function_dance()`` above as a public API.
If we want to do this, we should expose the helper function
``try_array_function_override()`` as a public API.

Contributor:

More importantly, the wrapper should be exposed, and the annotation API agreed upon.

Alternatively, a separate namespace, e.g., ``numpy.array_only``, could be
created for a non-overloaded version of NumPy's high level API, for cases
where performance with NumPy arrays is a critical concern. This has most
of the same downsides as the separate namespace.

Contributor:

I don't think the downsides are quite the same because if you use the wrappers, the separate "namespace" can be created automatically.


This approach also suggests an intriguing possibility: default implementations
of ``__array_function__`` and ``__array_ufunc__`` on ``numpy.ndarray`` could
change to simply return ``NotImplementedButCoercible``, i.e.,

Contributor:

I would find it extremely unexpected behaviour for ndarray itself to return this: the semantics of __array_function__ is that it provides an answer if it can, and in this case ndarray of course can provide an answer. To me, this strongly suggests this would be a design mistake.

Instead, please do discuss the alternative of ndarray.__array_function__ simply calling the unwrapped function - that would also remove the need to treat its __array_function__ differently, and has the benefit of actually doing what it is supposed to do.

special cases for ``numpy.ndarray`` in the NumPy's dispatch logic. It would
break `some existing implementations <https://mail.python.org/pipermail/numpy-discussion/2018-June/078175.html>`_
of ``__array_ufunc__`` on ndarray subclasses (e.g., in astropy), but this
would be acceptable given the still experimental status of ``__array_ufunc__``.

Contributor:

I might as well object loudly now, and be quite clear that it will need a much more convincing case than this to override my objections.

@hameerabbasi (Contributor):

@mhvk Could you look at the text of my review for a possible alternative? Maybe you'll find it useful. In particular, I think the mistake you may be making is to treat __array_function__ like a method, when it is, in fact, a protocol.

@mhvk (Contributor) commented Jun 11, 2018

@hameerabbasi - I fear I don't like the suggestion much at all. Having had to work around the idiotic behaviour of __array_wrap__, which decided it should "help" subclasses by doing things for them that it has no business of doing, I really want to avoid the similar traps where methods on ndarray do not do what a sensible method on another class would do. Please trust me that for ndarray subclasses it is really horrible to break super expectations!

Just as one concrete example, consider the case of two subclasses, my Quantity and a possible Variable class that adds uncertainties (astropy/astropy#3715). Now I want to combine them in a class QuantityVariable(Quantity, Variable). In this class, I do need Quantity to super so that it gets passed on to Variable, but now suddenly the behaviour of __array_function__ would have the quite different expectation of actually returning a result, not NotImplementedButCoercible. This means I need to add sanity checks in my __array_function__ implementation, which could be easily avoided if ndarray.__array_function__ just does what every other implementation is supposed to do. (And, yes, this is a problem with __array_wrap__.)

@mhvk (Contributor) commented Jun 11, 2018

Thinking a bit more about NotImplementedButCoercible, it would seem there is already a good way for a class to tell that its instances are coercible, which is to provide an __array__ method. So, one could let the combination of __array_function__ and __array__ mean that NotImplemented equals NotImplementedButCoercible.
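
A minimal sketch of that rule, assuming a dispatch loop along the lines of the notebook draft (the function name and error message are illustrative, not proposed API):

def try_override_allowing_coercion(func, overloaded_args, types, args, kwargs):
    coercible = True
    for arg in overloaded_args:
        result = type(arg).__array_function__(arg, func, types, args, kwargs)
        if result is not NotImplemented:
            return result
        # NotImplemented from a class that also defines __array__ would be
        # treated as NotImplementedButCoercible under this rule
        coercible = coercible and hasattr(type(arg), '__array__')
    if coercible:
        # the plain implementation will coerce everything via np.asarray()
        return func(*args, **kwargs)
    raise TypeError('no implementation found for {!r}'.format(func))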

@mhvk (Contributor) commented Jun 11, 2018

p.s. On the notebook: I'm not sure I actually understand why the inspect and apply defaults are needed, but I haven't got the time right now to investigate.

@shoyer (Member Author) commented Jun 11, 2018

I'm going to remove the suggestion that ndarray could implement __array_function__ as returning NotImplementedButCoercible. By design, it's exactly equivalent to not defining ndarray.__array_function__ at all, but it's not any more useful to subclasses, so the real question is whether to define the method at all (which obviously would also be contentious).

@shoyer (Member Author) commented Jun 11, 2018

> I'm not sure I actually understand why the inspect and apply defaults are needed, but I haven't got the time right now to investigate.

This is needed for handling arguments that might be passed either by position or by keyword, e.g., np.broadcast(array, shape) vs np.broadcast(array=array, shape=shape).
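
For illustration, a sketch of that normalization using the standard library (not the notebook's exact code):

import inspect

def broadcast(array, shape, subok=False):
    pass

sig = inspect.signature(broadcast)

bound = sig.bind('ARRAY', (2, 3))  # called by position
bound.apply_defaults()             # fills in subok=False
print(bound.arguments)             # array, shape, subok in signature order

bound = sig.bind(shape=(2, 3), array='ARRAY')  # called by keyword
bound.apply_defaults()
print(bound.arguments)  # identical mapping, so dispatch sees the same arguments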

@mhvk (Contributor) commented Jun 11, 2018

Ah, I see. But then the code could be made much more efficient if the wrapper created a function with the right signature and made the args from that, i.e., the wrapped function would look like

def broadcast(array, shape):
    override = _check_override(broadcast, (array,), (array, shape), {})
    # above, trailing {} is for kwargs that are not positional.
    if override is not NotImplemented:
        return override
    ...  # normal code

and for something like concatenate

def concatenate(arrays, axis=0, out=None):
    override = _check_override(concatenate, tuple(arrays) + (out,), (arrays, axis, out), {})
    ...

I.e., your wrapped-function creator should do as much as possible of the work up-front, so you don't lose time in the actual call.

@shoyer (Member Author) commented Jun 11, 2018

> the code could be made much more efficient if the wrapper created a function with the right signature and made the args from that

I think this would require using some sort of dynamic code generation, perhaps inspired by or using the decorator module. Implementation wise this would be a little tricky, but in principle I agree it could work.

@mhvk (Contributor) commented Jun 11, 2018

Yes, that is what I'm thinking of. I thought in fact that was what was typically done, but looking at the one piece of code that I know that does wrapping and inspecting of arguments (quantity_input), I see that it follows the logic you had in your notebook.

Anyway, perhaps for the NEP all we need to note is that working with the decorator could be just as fast, and thus worries about performance should not prevent us from going for the most readable way to tell which arguments would be looked at - if we can do the type annotation and use it for the wrapping, we get two benefits in one go!

@shoyer (Member Author) commented Jun 11, 2018

> Anyway, perhaps for the NEP all we need to note is that working with the decorator could be just as fast, and thus worries about performance should not prevent us from going for the most readable way to tell which arguments would be looked at - if we can do the type annotation and use it for the wrapping, we get two benefits in one go!

Just to be clear, you're suggesting that we might be able to write overloads like:

@overload_for_array_function('numpy.broadcast_to')
def broadcast_to(array: ArrayLike, shape: Tuple[int, ...], subok: bool = False):
    ...  # current definition of broadcast_to

The overload_for_array_function decorator would automatically rewrite broadcast_to to support __array_function__ overloads on all ArrayLike arguments based on the type annotation, and could potentially even handle more sophisticated overloads like List[ArrayLike].

I agree that this would be pretty awesome! There are some potential issues (e.g., Python 3 only, potentially slow imports), but it should definitely be mentioned.

You'll note that I added the fully-qualified name 'numpy.broadcast_to' to the decorator. I realize now that one issue for any decorator based solution is getting access to the final decorated function for the func argument. Some import magic would be necessary to get this out.

@mhvk (Contributor) commented Jun 11, 2018

Yes, that's what I would hope; nice to see it written out! I'd bet one could get rid of that fully-qualified name...

array libraries (e.g., scipy.sparse) really don't want implicit conversions
to NumPy arrays, and often avoid implementing ``__array__`` for exactly this
reason. Implicit conversions can result in silent bugs and performance
degradation.

Contributor:

So what would happen if the policy were "if everyone returned NotImplemented then do what we used to do, and try coercing with numpy.asarray, if that fails, then we fail as we did before"?

Member Author:

Yes, we could do this. I think this would be a slightly inferior version of @mhvk's proposal to only try coercing classes that return NotImplemented if they implement __array__. The problem is that classes like scipy.sparse.matrix technically can be converted with np.asarray(), but you end up with a 0-dimensional array containing a sparse matrix (scipy/scipy#4239). (That's probably a bug that could be fixed by defining an __array__ method that raises TypeError.)

@shoyer (Member Author) commented Jun 12, 2018

The main reason why I don't like returning NotImplemented to mean "try to coerce this argument instead" (at least if it defines __array__), is that it will make it challenging to write bug-free code that relies on this overloading.

For example, dask.array would certainly want to implement __array_function__, but also implements __array__. Implementing __array__ is convenient for most end users, at least in the original scenario of running dask arrays using multiple threads on a single machine. However, if I'm writing library code with dask or running it in a distributed context, I really want to know exactly what's overloaded and what isn't.

Maybe there are other ways to achieve this goal. In principle we could even remove __array__ from dask.

In the long term, my preference would be to stop using np.asarray()/__array__ everywhere for coercion to arrays that follow NumPy's high level API, and add another function/protocol for that. In this world, maybe __array__ would even be safe to implement on classes like sparse arrays. (@njsmith and I wrote most of another NEP on this, but it's not quite done yet.)
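
(Purely hypothetical sketch of what such a coercion protocol might look like; the names duck_array and __duck_array__ are invented here and not part of any NEP:)

import numpy as np

def duck_array(value):
    # coerce to any object implementing NumPy's high level API, without
    # forcing conversion to a concrete np.ndarray
    if hasattr(type(value), '__duck_array__'):
        return value.__duck_array__()
    return np.asarray(value)  # ordinary coercion as a fallback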

@shoyer (Member Author) commented Jun 12, 2018

Still to do after my latest edits:

  • Move discussion of the hypothetical overload_for_array_function decorator to "Alternatives"?
  • Under "Alternatives", mention why __array_function__ should not be a classmethod
  • Mention actually implementing numpy functions via ndarray.__array_function__ under "Alternatives". I agree with the noble goal of eliminating special case logic for np.ndarray, but there is a long list of reasons why I don't like this (I'll write at least a short paragraph in the NEP).

@mhvk (Contributor) commented Jun 12, 2018

👍 to (for now at least) not coercing anything that returns NotImplemented - I think for that route the NotImplementedButCoercible is the right way to go (explicit is better than implicit and so on) -- though I do not prefer it, at least not as a way to be lazy in __array_function__ implementations.

@mhvk (Contributor) commented Jun 12, 2018

@shoyer - on the Should __array_function__ be a classmethod? Reading #11306 (comment) and the following, I guess it does make sense, but I think we should certainly do it only if we also change __array_ufunc__.

Overall, I think there's little point in making it a class method beyond a sense of purity. And one has to balance that against confusion for people who are used to regular operator overloading or to __array_prepare__ etc. Also, I at least thought up to recently that class methods were just there to allow different ways of instantiating an instance (and that is still exclusively how I use them).

@shoyer (Member Author) commented Jun 12, 2018

> on the Should __array_function__ be a classmethod? Reading #11306 (comment) and the following, I guess it does make sense, but I think we should certainly do it only if we also change __array_ufunc__.
>
> Overall, I think there's little point in making it a class method beyond a sense of purity. And one has to balance that against confusion for people who are used to regular operator overloading or to __array_prepare__ etc. Also, I at least thought up to recently that class methods were just there to allow different ways of instantiating an instance (and that is still exclusively how I use them).

I agree with all of this.

@eric-wieser (Member) commented Jun 12, 2018

> There are some potential issues (e.g., Python 3 only, potentially slow imports), but it should definitely be mentioned.

How about

@np.dispatch_on
def concatenate(arrs, axis=0, out=None):
    # yield every operand that can be overloaded on
    # probably faster to return a list, but this is more readable
    for arr in arrs:
        yield arr
    yield out

@np.concatenate.register(np.ndarray)
def concatenate(arrs, axis=0, out=None):
    # the current implementation of concatenate

This eliminates the need for Python 3 support, or to write a typing parser.

@hameerabbasi (Contributor) commented Jun 12, 2018

I like @eric-wieser's logic, with one difference: The @np.dispatch_on decorator should be used to determine relevant_arguments, but the @np.concatenate.register should be changed to @np.concatenate.default (this would register it in ndarray.__array_function__ and make it the default implementation). @np.concatenate.register should be available for library authors to use, in any case.

@mhvk (Contributor) commented Jun 12, 2018

@eric-wieser - that seems super-beautiful (and can still be auto-generated using annotations if someone wanted to do that).

@eric-wieser (Member):

There's still the question of how to expose the other available types to the implementation. Maybe as simple as a __types__ argument coming first, which is unlikely to clash with any kwargs?

@mhvk (Contributor) commented Jun 12, 2018

@eric-wieser - maybe still just types as in the NEP draft? In __array_ufunc__, we also have method.

@shoyer - it would seem nicest if the types were ordered in the way they are going to be tried (i.e., the sub-class put before the class).

@shoyer (Member Author) commented Jun 12, 2018

I really like the explicit decorator solution! It's low-boilerplate but still fully explicit, with a clean separation of code for generating dispatch arguments and implementation.

I've updated my notebook with a draft implementation, which I've called dispatch_with.

I'll note one important difference from what @eric-wieser wrote: I'm only decorating the implementation function, and don't pass the type np.ndarray anywhere into the dispatcher. My dispatched functions look like:

def _concatenate_dispatcher(arrs, axis=0, out=None):
    for arr in arrs:
        yield arr
    yield out

@dispatch_with(_concatenate_dispatcher)
def concatenate(arrs, axis=0, out=None):
    # the current implementation of concatenate

I like the idea of a dispatch mechanism with a register method for adding new implementations of higher level array functions (like functools.singledispatch), but I don't think that a simple decorator would suffice. At the very least, we would need support for registering with multiple types, so NumPy knows which implementation to use for mixed cases like adding xarray/dask arrays. Eventually we would basically need a multipledispatch library.

One of the major advantages of using protocols rather than multiple dispatch is that we don't need to figure this all out ahead of time, and we can push this choice on implementations instead. The protocol is the simplest interface, but it doesn't restrict what implementers can do.

Instead, I think I'll update the NEP to recommend that implementers of __array_function__ define their own decorator helper functions for writing implementations of NumPy functions. The decorators themselves can be written in only a few lines of code, and don't need any awareness of how NumPy implements the override checking, e.g.,

class MyArray:
    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

HANDLED_FUNCTIONS = {}

def implements(numpy_function):
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

@implements(np.concatenate)
def concatenate(arrays, axis=0, out=None):
    ...  # implementation of concatenate for MyArray objects

@implements(np.broadcast_to)
def broadcast_to(array, shape, subok=False):
    ...  # implementation of broadcast_to for MyArray objects

@shoyer (Member Author) commented Jun 12, 2018

> it would seem nicest if the types were ordered in the way they are going to be tried (i.e., the sub-class put before the class)

In principle, I agree. In practice, I'm not sure it actually matters. If a class defines override behavior differently based upon the order in which overrides were checked, it probably doesn't have a well-defined type casting hierarchy.

The only reason why I'm not doing this in my current implementation is that numpy.ndarray is included in types but not overloaded_args on which to call __array_function__, so the book-keeping got trickier. I'm open to alternative suggestions, especially given that eventually we'll need to rewrite try_array_function_override/get_overloaded_types_and_args in C anyways.
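
For reference, the sub-class-before-class ordering could be implemented with bookkeeping roughly like this (a sketch, not the notebook's code; the numpy.ndarray special case mentioned above is omitted):

def get_overloaded_types_and_args(relevant_args):
    overloaded_types = []
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if (arg_type not in overloaded_types and
                hasattr(arg_type, '__array_function__')):
            overloaded_types.append(arg_type)
            # insert subclasses before superclasses so they are tried first
            index = len(overloaded_args)
            for i, old_arg in enumerate(overloaded_args):
                if issubclass(arg_type, type(old_arg)):
                    index = i
                    break
            overloaded_args.insert(index, arg)
    return overloaded_types, overloaded_args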

@shoyer (Member Author) commented Jun 13, 2018

I'm getting really excited about the explicit decorator for internal use in NumPy. Although there are still a few edge cases where we would need to explicitly call try_array_function_override (e.g., np.vectorize and np.einsum), the decorator not only looks nicer but also results in a better working solution. Both the "Drawbacks of this approach" in the current NEP ("Future difficulty extending NumPy’s API" and "Difficulty adding implementation specific arguments") actually go away!

Here's why: with the explicit decorator solution, we don't need to awkwardly reassign parsed function arguments into *args and **kwargs for calling into try_array_function_override(). Instead, __array_function__ can get called with exactly the same arguments as were used to call the high-level NumPy function.

This means that if an array implementation doesn't need to or can't support some optional keyword arguments, e.g., the out argument for a library with immutable arrays, they don't need to include it in the function signature.

For example, you could write a simplified version of sum without worrying about dtype, axis, keepdims and out:

@implements(np.sum)
def sum(array, axis=None):
    ...  # implementation of sum for MyArray objects

This would just work, as long as nobody explicitly passes in keepdims. You don't have to pretend that you can handle everything and manually raise errors for non-supported arguments, like what we see on the sum method for libraries like xarray or dask that can't do in-place operations. There's no temptation to throw in a **kwargs to silently swallow errors, and you don't need to manually catch deviations from out=None. If you pass in out= when doing an operation on a dask array, you just get the usual TypeError: sum() got an unexpected keyword argument: 'out'.

This also automatically gives us a safe expansion path for new keyword arguments. If NumPy adds a new optional keyword argument (e.g., keepdims), existing code that uses NumPy's high level API on non-NumPy arrays continues to work.

Finally, if we make a point to write our dispatching functions as accepting **ignored_kwargs, we can also accept optional implementation-specific keyword arguments, like np.sum(tensor, name='foo') for TensorFlow or np.sum(array, split_every=2) for dask, which are simply skipped for dispatch. Example:

def _sum_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None,
                    **ignored_kwargs):
    yield a
    yield out

@dispatch_with(_sum_dispatcher)
def sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    ...  # current definition of np.sum

Adding the unused **ignored_kwargs to a function call does make things slightly slower, but the difference is truly negligible (~50 nanoseconds).

@mattip (Member) commented Jun 13, 2018

> you just get the usual TypeError: sum() got an unexpected keyword argument: 'out'.

Could you add this to the notebook? I am curious what stack trace you get with the exception on Python 3

@eric-wieser (Member):

How do we want to handle default argument values in the dispatcher? Do we set them all to np.NoValue / None?

@hameerabbasi (Contributor) commented Jun 13, 2018

@eric-wieser If you see @shoyer's example, the idea is to use the same signature as the function. If you mean the implementation, for that, we do things exactly as before, with the exception of the added decorator.

Edit: The idea is to not pass them in at all (via *args, **kwargs)

@eric-wieser (Member):

@hameerabbasi: My point is that _sum_dispatcher and sum have the default argument values repeated, which is a recipe for divergence.

@hameerabbasi (Contributor) commented Jun 13, 2018

Ah. To maintain compatibility with older code, NumPy will only add arguments at the end, so we can add *ignored_args, **ignored_kwargs there, and avoid any irrelevant args. This has the potential pitfall of not including any future array_like args into the relevant_args.

@mrocklin (Contributor):

Would the decorator solution introduce challenges when trying to pickle numpy functions?

@shoyer (Member Author) commented Jun 13, 2018

> Would the decorator solution introduce challenges when trying to pickle numpy functions?

It looks like functools.wraps makes pickle work:

import functools
import pickle

def f():
    pass

print(pickle.dumps(f))


def good_wrapper(func):
  @functools.wraps(func)
  def new_func(*args, **kwargs):
    return func(*args, **kwargs)
  return new_func

@good_wrapper
def f():
    pass

print(pickle.dumps(f))


def bad_wrapper(func):
  def new_func(*args, **kwargs):
    return func(*args, **kwargs)
  return new_func

@bad_wrapper
def f():
    pass

print(pickle.dumps(f))

Results in:

b'\x80\x03c__main__\nf\nq\x00.'
b'\x80\x03c__main__\nf\nq\x00.'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-0c33e20e4a4e> in <module>()
     27     pass
     28 
---> 29 print(pickle.dumps(f))

AttributeError: Can't pickle local object 'bad_wrapper.<locals>.new_func'

@shoyer (Member Author) commented Jun 13, 2018

> How do we want to handle default argument values in the dispatcher? Do we set them all to np.NoValue / None?

It really doesn't matter what the default argument values in the dispatcher are, as long as they aren't objects that implement __array_function__. So we might as well stick with a convention that they should all be None. It's only important that they have a default value so the function signature gets parsed identically.
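
Illustrating the convention (clip is just an example; dispatch_with here is a trivial stand-in for the notebook's draft decorator):

def dispatch_with(dispatcher):
    # trivial stand-in for the notebook's decorator, for illustration only
    def decorator(func):
        return func
    return decorator

def _clip_dispatcher(a, a_min=None, a_max=None, out=None):
    # defaults are all None: their values never matter for dispatch, they
    # only make this signature parse calls identically to clip() below
    return (a, out)

@dispatch_with(_clip_dispatcher)
def clip(a, a_min, a_max, out=None):
    ...  # the real implementation keeps its own defaults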

@shoyer (Member Author) commented Jun 13, 2018

> Ah. To maintain compatibility with older code, NumPy will only add arguments at the end, so we can add *ignored_args, **ignored_kwargs there, and avoid any irrelevant args. This has the potential pitfall of not including any future array_like args into the relevant_args.

My inclination is that we would only want to include **ignored_kwargs. I don't think we should encourage or support using implementation-specific positional arguments with high level NumPy functions.

  1. First, it's confusing: at least if you use a different keyword name, it's obvious that something is being done that isn't in the NumPy function.
  2. Second, it's likely to cause conflicts when NumPy itself adds new arguments, including potential bugs if the new argument participates in dispatching. For example, suppose np.concatenate only had its first two arguments, and we now want to add the new out argument, which should participate in dispatching. If an implementation specific argument is being passed by position, then it would immediately be interpreted as the out argument for dispatch purposes.

@shoyer (Member Author) commented Jun 13, 2018

> you just get the usual TypeError: sum() got an unexpected keyword argument: 'out'.
>
> Could you add this to the notebook? I am curious what stack trace you get with the exception on Python 3

Done.

From the notebook, here's what a traceback looks like by default:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-f2617d7f2870> in <module>()
     25 assert sum(my_array) == (my_array, False)
     26 assert sum(my_array, something_new=True) == (my_array, True)
---> 27 sum(my_array, axis=0)  # not supported

<ipython-input-27-3a4647245cc8> in new_func(*args, **kwargs)
     79             relevant_arguments = get_relevant_args(*args, **kwargs)
     80             success, value = try_array_function_override(
---> 81                 new_func, relevant_arguments, args, kwargs)
     82             if success:
     83                 return value

<ipython-input-27-3a4647245cc8> in try_array_function_override(func, relevant_arguments, args, kwargs)
     63         # check all argument types.
     64         result = overloaded_arg.__array_function__(
---> 65             func, types, args, kwargs)
     66         if result is not NotImplemented:
     67             return True, result

<ipython-input-38-f2617d7f2870> in __array_function__(self, func, types, args, kwargs)
      5         if not all(issubclass(t, MyArray) for t in types):
      6             return NotImplemented
----> 7         return HANDLED_FUNCTIONS[func](*args, **kwargs)
      8 
      9 

TypeError: _() got an unexpected keyword argument 'axis'

We probably should catch such a TypeError in try_array_function_override and re-raise with a more informative message that mentions the offending type (MyArray in this case).
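
A sketch of such re-raising (assumed code, not the notebook's exact draft; the message format follows the example in the next comment):

def call_array_function_with_context(overloaded_arg, func, types, args, kwargs):
    try:
        return overloaded_arg.__array_function__(func, types, args, kwargs)
    except TypeError as error:
        arg_type = type(overloaded_arg)
        # re-raise with the offending type and function named explicitly
        raise TypeError(
            '{} [while calling {!r} implementation of {!r}]'.format(
                error,
                '{}.{}'.format(arg_type.__module__, arg_type.__name__),
                '{}.{}'.format(func.__module__, func.__name__)))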

@mattip (Member) commented Jun 13, 2018

Thanks. The re-raised (?) error should also mention the offending function (sum). This prototyping is a great way to work these things out.

@shoyer (Member Author) commented Jun 14, 2018

OK, I have a draft version of exception wrapping in the notebook, e.g., the error message is now:
TypeError: _() got an unexpected keyword argument 'axis' [while calling '__main__.MyArray' implementation of '__main__.sum']

In practice, for a function like np.sum, it would probably look something like:
TypeError: sum() got an unexpected keyword argument 'axis' [while calling 'mymodule.core.MyArray' implementation of 'numpy.core.fromnumeric.sum']

I'm not sure this is necessary in practice because the traceback will include the file/location of the offending class and numpy function, but easier to understand error messages are always a win. If we do this, it might not be a bad idea for __array_ufunc__, too.

@mhvk (Contributor) commented Jun 14, 2018

@shoyer - yes, I definitely feel things here should go to __array_ufunc__ as well!

@mhvk (Contributor) commented Jun 14, 2018

@shoyer - it might make sense to merge the changes you've made so far and start a new iteration...

@shoyer (Member Author) commented Jun 15, 2018

I haven't had the time to rewrite things yet, but I added a big "Warning" to the in-progress section so I think we can merge this in its current, updated draft state.
@mrocklin any objections?

@mrocklin (Contributor) commented Jun 15, 2018 via email

@shoyer shoyer merged commit 16f19fc into numpy:master Jun 15, 2018
@shoyer shoyer deleted the nep-18-v2 branch June 15, 2018 16:15
@charris charris changed the title DOC: revision of NEP-18 (__array_function__) Nov 10, 2018