Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: additional revision for NEP-18 (__array_function__) #11374

Merged
merged 12 commits into from Jun 27, 2018
Merged

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Jun 18, 2018

Adjusts the text to describe a variant of @eric-wieser's "explicit decorator" proposal from #11303 (comment)

There are a still a number of TODOs from the last PR, and I haven't even gotten to revising the "Alternatives" or "Drawbacks" sections yet, but I thought I might as well put this up. I promise we'll send this back to the mailing list for discussion soon!

CCing the usual @mrocklin @mhvk @hameerabbasi @ngoldbaum @mattip

Copy link
Contributor

@mhvk mhvk left a comment

Looks great. Only a persistent note about types. Given what you write, a set is more logical, but I still would favour a dict with types as the keys and list of arguments of that type as a value... I.e., in your notebook, I would do

from collections import defaultdict
import numpy as np

ndarray = np.ndarray
ndarray_array_function = ndarray.ndarray.__array_function__


def get_overloaded_types_and_args(relevant_args):
    """Returns a list of arguments on which to call __array_function__.

    __array_function__ implementations should be called in order on the return
    values from this function.
    """
    # Runtime is O(num_arguments * num_unique_types)
    overloaded_types = defaultdict(list)
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if arg_type is ndarray or (getattr(arg_type, '__array_function__',
                                           ndarray_array_function) is
                                   ndarray_array_function):
            continue

        if arg_type not in overloaded_types:
            # Are we a subclass of something already present?
            for i, old_arg in enumerate(overloaded_args):
                if issubclass(arg_type, type(old_arg)):
                    # We are a true subclass here, given upper if statement.
                    overloaded_args.insert(i, arg)
                    break
            else:
                overloaded_args.append(arg)

        overloaded_types[arg_type].append(arg)

    return overloaded_types, overloaded_args

As a convenience for ``__array_function__`` implementors, ``types`` contains a
list of argument types with an ``'__array_function__'`` attribute. This allows
downstream implementations to quickly determine if they are likely able to
support the operation. There are no guarantees about the order of argument types in ``types``: ``__array_funtion__`` implementations that rely on such behavior
Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should types be a set if we don't guarantee order? Would be easier to implement (no checking needed for whether a given class was already seen).

Copy link
Member Author

@shoyer shoyer Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that eventually we'll implement try_array_function_override in C, so we're definitely not making a types a Python set/list object until we're sure we need to call __array_function__. So convenient implementation isn't really a concern, but I agree that it's a better for our interface to enforce invariants, as long as we don't pay a performance penalty. And indeed, the speed difference for creating a small set vs a list vs a tuple is pretty small:

class A:
  pass

class B:
  pass

x = [A, B]
%timeit tuple(x)  # 158 ns per loop
%timeit list(x)  # 186 ns per loop
%timeit set(x)  # 225 ns per loop

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 18, 2018

Only a persistent note about types. Given what you write, a set is more logical

Purely for speed of construction of Python objects, a tuple (158 ns) is a faster than a list (186 ns) which is faster than a set (225 ns). But we're really in nanobenchmarks at this point, quite unlikely to matter in practice. So indeed enforcing this lack of order in the interface might make sense. I don't have a strong opinion here.

but I still would favour a dict with types as the keys and list of arguments of that type as a value...

I agree, this would definitely be maximally useful to __array_function__ authors -- basically providing all the information you could hope for.

My concern is that this would be difficult to implement without a (small) performance penalty:

  • If we start building the dict of lists when iterating through relevant arguments to look for overloads, we would need to include an entry for numpy.ndarray, because we don't know if there will be an __array_function__ implementation to call until reaching the end. This would probably add an unacceptable level of overhead for the no overloads case, especially because building something like a defaultdict(list) is hard to do in C without using Python's C API.
  • Alternatively, we could call the dispatcher function twice: first to see if anything implements an override, and second to collect the types mapping. But in contrast to collecting types, which I'll claim is basically free alongside collecting array arguments, this is unnecessary overhead for typical __array_function__ implementations that don't need this full information.

So instead, I'll propose that we should pass dispatcher functions as part of the __array_function__ protocol, which now becomes __array_function__(self, func, dispatcher, types, args, kwargs). You can simply re-call dispatcher(*args, **kwargs) to collect relevant arguments in whichever way you see fit.

@mhvk
Copy link
Contributor

@mhvk mhvk commented Jun 18, 2018

Hmm, perhaps I was overthinking this; I'd guess in most implementations one can would call one's own routine and things are in the right place already.

My sense would be not to pass on the dispatcher, though it might be good if there was a programmatic way to retrieve it for those that would like it. For instance, perhaps the decorated function could have a dispatcher attribute? (It will already have args, keywords and func from wraps anyway).

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 18, 2018

My sense would be not to pass on the dispatcher, though it might be good if there was a programmatic way to retrieve it for those that would like it. For instance, perhaps the decorated function could have a dispatcher attribute? (It will already have args, keywords and func from wraps anyway).

If implementations need access to the disptacher (it would be good to see use-cases), then my inclination would be to include it directly. Most but not all functions will use the decorator (see the updated NEP for details), so it may not always be easy to attach a dispatcher attribute.

Separate question: should I rename the dispatch decorator to array_function_dispatch? That's probably a better name for something we'll be exposing externally.

if success:
return value
return func(*args, **kwargs)
return new_func
Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To attach the dispatcher, wouldn't one just do

new_func.dispatcher = dispatcher
return new_func

Copy link
Member Author

@shoyer shoyer Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work for "builtin" functions written in C like np.concatenate:

>>> np.concatenate.foo = 1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-a0d69ae14b06> in <module>()
----> 1 np.concatenate.foo = 1

AttributeError: 'builtin_function_or_method' object has no attribute 'foo'

Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in __array_ufunc__, aren't we passing on a wrapped concatenate? One can definitely assign to the wrapped function which would become the np.concatenate that people actually see:

def new_concat(*args, **kwargs):
    return np.concatenate(*args, **kwargs)

nc = functools.wraps(np.concatenate)(new_concat)
nc.dispatch = 'a'

(obviously, the same would be possible in C)

Copy link
Member Author

@shoyer shoyer Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how we choose to implement np.concatenate with dispatching. Potentially we could do it either way.

I'm sure it's also possible to define a normal Python function from C directly rather than a builtin. I don't know the trade-offs there.

Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be wise to keep the wrapped and unwrapped versions separate, so that, e.g., __array_function__ can directly call the unwrapped one. (I still hope to do the same for the ufuncs).

Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. one of the points of the wrapping in the example is to have access to dispatch without actually having to pass that in; it therefore must always be possible to expose it!

Copy link
Member Author

@shoyer shoyer Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. one of the points of the wrapping in the example is to have access to dispatch without actually having to pass that in; it therefore must always be possible to expose it!

I'm not sure I follow here: what is "it"? :)

Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatch: your wrapped function must be able to call dispatch and therefore has access to it, and must thus be able to expose it to the outside. Of course, this does rely on having the wrapped function be the one in the numpy namespace and identical to the one passed on in __array_function__.

Maybe most relevant to the NEP: maybe good to state a preference for C functions being wrapped, not having dispatch cooked inside.

Copy link
Member Author

@shoyer shoyer Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By dispatch do you mean the operation specific function that returns arguments to check (what I call dispatcher) or the decorator @array_function_dispatch?

Copy link
Contributor

@mhvk mhvk Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes, I meant the dispatcher used inside new_func

@mhvk
Copy link
Contributor

@mhvk mhvk commented Jun 18, 2018

See in-line comment about attaching (and, no, I'm not sure about use cases either - I'd need to start writing a Quantity.__array_ufunc__ implementation to be sure it is any use).

About the name: yes, array_function_dispatch does seem better.

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 18, 2018

and, no, I'm not sure about use cases either - I'd need to start writing a Quantity.__array_ufunc__ implementation to be sure it is any use

OK, in that case I'll probably put a note about passing dispatcher into __array_function__ in some way under the "Alternatives" section for now. I don't think it would change the proposal substantially to add this in the future.

resembling the `decorator library <https://github.com/micheles/decorator>`_
to automatically write an overloaded function like the manually written
implemenations above with the exact same signature as the original.
Unfortunately, using the ``inspect`` module instead of code generation would
Copy link
Member

@eric-wieser eric-wieser Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the inspect module would be needed for code generation anyway, so these are not mutually exclusive. Perhaps "Using the inspect module at runtime instead of at code-generation time"?

Copy link
Member Author

@shoyer shoyer Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I haven't gotten to revising this section yet (I just moved it unchanged down to "Alternatives")

@mattip
Copy link
Member

@mattip mattip commented Jun 18, 2018

Is there a clear need for dispatcher in the __array_function__ interface where we have neither dispatcher nor types in the __array_ufunc__ interface? The timings will probably change as we move this to C. Maybe we could leave that part of the proposal in Alternatives.

@mhvk
Copy link
Contributor

@mhvk mhvk commented Jun 18, 2018

@mattip - the difference with __array_ufunc__ is that it is exactly known where to look for the overrides: in *inputs or kwargs['out'] - with __array_function__, the arguments can be anywhere.

But I'm not sure the dispatcher is necessarily all that helpful indeed, since it will still not tell me where the argument came from and hence I cannot easily, say, replace it with a properly adjusted ndarray.

As I don't have a good use case, let's just mention it somewhere and move on - I'm quite excited by the idea of actually being able to try this!

@charris
Copy link
Member

@charris charris commented Jun 18, 2018

CC @skrah also.

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 18, 2018

@mhvk surfacing your inline comment, we could potentially add the "non-dispatching" version of NumPy functions to the protocol, e.g., __array_function__(self, func, non_dispatching_func, types, args, kwargs). Or these could be included as attributes on func (e.g., np.sum.ndarray_only or np.sum.no_overrides), similar to what you proposed for dispatcher.

I'm inclined to put these in the same category of "possible future extensions" as exposing dispatcher functions. I do like function attributes that can be automatically set by a decorator more than the separate numpy.array_only namespace. I'm not sure there are use cases beyond defining ndarray.__array_function__ itself, which itself would be a bit of a lie because we still don't want to call ndarray.__array_function__ at all in the case of all numpy arrays for performance reasons.

@mhvk
Copy link
Contributor

@mhvk mhvk commented Jun 19, 2018

@shoyer - thanks for resurfacing - I think the non-override function has clearer use than the dispatcher - there should be a fair group of classes that work via ndarray, processing inputs before and outputs after. For ndarray subclasses, this is nicely covered by ndarray.__array_ufunc__, but for others that might not make most sense. It would seem most naturally exposed on the wrapped function (I.e., I would not put it among the arguments).

Maybe things like these can be in a section "to be decided based on actual implementation" or "possible enhancements"?

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 19, 2018

I think the non-override function has clearer use than the dispatcher - there should be a fair group of classes that work via ndarray, processing inputs before and outputs after.

I agree, but just like case for ufuncs, it also works to call the numpy API function directly again, after handling all inputs to remove arguments handled by a particular __array_function__ method.

Maybe things like these can be in a section "to be decided based on actual implementation" or "possible enhancements"?

Will do.


We propose to do so by writing "dispatcher" functions for each overloaded
NumPy function:
- These functions will be called with the exact same arguments that were passed
Copy link
Member

@mattip mattip Jun 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line needed before first list item

Copy link
Contributor

@hameerabbasi hameerabbasi Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs fixing.

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Might be worth sending to the mailing list soon-ish.

docstring to note which arguments are checked for overloads.

It's particularly worth calling out the decorator's use of
``functools.wraps``:
Copy link
Contributor

@hameerabbasi hameerabbasi Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth mentioning that this isn't a catch-all thing, it doesn't always work on arbitrary callables if they aren't Python functions. I ran into this trying to nicely wrap ufuncs. Here's a quick demo, it shows that repr and string don't work, but __doc__ does. It's worth noting that this can be worked around, see https://github.com/pydata/sparse/blob/b51d74924d62ff6537b15ce4e1dd4e56080a3b6f/sparse/utils.py#L187-L191

It might be worth investigating this as a Python issue, but I'm not well-versed enough in Python internals to say if this is actually a bug.

Copy link
Member Author

@shoyer shoyer Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this only works well Python functions. For classes/methods, we would need another solution.

But I think functions are mostly what we want to cover in this NEP.

to work well with static analysis tools, which could report invalid arguments.
Likewise, there is a price in readability: these optional arguments won't be
included in the docstrings for NumPy functions. These downsides mean that users
should make use of this feature with care.
Copy link
Member Author

@shoyer shoyer Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this section to justify passing on unrecognized **kwargs, but now that I've considered the downsides I no longer think it's a good idea! I will probably move this to the section on "Alternatives".

Copy link
Contributor

@hameerabbasi hameerabbasi Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually agree with you. If someone needs implementation-specific kwargs, they can just call the library's implementation directly, instead of NumPy's.

Copy link
Contributor

@mhvk mhvk Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also agreed - better to keep it simple.

Copy link
Member Author

@shoyer shoyer Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

~~~~~~~~~~~~~

Within NumPy
''''''''''''
Copy link
Contributor

@hameerabbasi hameerabbasi Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're skipping a heading level here, which is making CircleCI fail. Above, you use ~, then ^ and then '. Here, you're skipping ^. The order of the heading levels must be consistent.

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 22, 2018

After the latest revisions, I think this is basically ready to be sent back to the list.

Did I miss anything?

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Minor typo/clarity comment. Feel free to ignore. This is good to go from my side.

optional arguments to NumPy functions without breaking working code already
relying on ``__array_function__``.
optional arguments to NumPy functions without breaking code that already
relyies on ``__array_function__``.
Copy link
Contributor

@hameerabbasi hameerabbasi Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/relyies/relies

@@ -146,6 +146,9 @@ for registering ``__array_function__`` implementations.
def __array_function__(self, func, types, args, kwargs):
if func not in HANDLED_FUNCTIONS:
return NotImplemented
# Note: we use MyArray instead of type(self) for issubclass to allow
Copy link
Contributor

@hameerabbasi hameerabbasi Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of type(self) -> instead of comparison with type(self)


An important virtue of this approach is that it allows for adding new
optional arguments to NumPy functions without breaking code that already
relyies on ``__array_function__``.
Copy link
Contributor

@hameerabbasi hameerabbasi Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/relyies/relies

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 23, 2018

@mrocklin please take a look when you have the chance

@mrocklin
Copy link
Contributor

@mrocklin mrocklin commented Jun 24, 2018

@mrocklin please take a look when you have the chance

Everything here looks good to me.

My apologies for being absent for much of this conversation. I would be fine if you all wanted to replace my name in the authors list with the names of @hameerabbasi and @mhvk, both of whom have been much more active here than I have been.

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 24, 2018

@hameerabbasi @mhvk @eric-wieser I would love to include all of you as co-authors on this proposal! I did most of the writing, but the design has been a real collaborative effort. Please let me know if you're OK with that (e.g., 👍on this comment).

@mrocklin You made significant contributions to the first draft of this, so I would prefer to keep you on, too.

@hameerabbasi
Copy link
Contributor

@hameerabbasi hameerabbasi commented Jun 24, 2018

For a future PR, I'd be willing to do the writing.

@mattip
Copy link
Member

@mattip mattip commented Jun 26, 2018

@shoyer waiting for additional author attributions change before merging

@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 26, 2018

@eric-wieser are you OK with being included as an author? Please let me know either way :).

@eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Jun 26, 2018

Sure!

@shoyer shoyer merged commit 7dace1d into numpy:master Jun 27, 2018
0 of 6 checks passed
@shoyer
Copy link
Member Author

@shoyer shoyer commented Jun 27, 2018

OK, merging. I'll send another email to the list shortly :).

@shoyer shoyer deleted the nep18-v3 branch Jun 27, 2018
@hameerabbasi
Copy link
Contributor

@hameerabbasi hameerabbasi commented Jun 27, 2018

I had an idea of how to handle array construction methods in this NEP. Just commenting here to be reminded on the list later.

@charris charris changed the title WIP: additional revision for NEP-18 (__array_function__) WIP: additional revision for NEP-18 (__array_function__) Nov 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants