Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: additional revision for NEP-18 (`__array_function__`) #11374

Merged
merged 12 commits into from Jun 27, 2018

Conversation

Projects
None yet
7 participants
@shoyer
Copy link
Member

commented Jun 18, 2018

Adjusts the text to describe a variant of @eric-wieser's "explicit decorator" proposal from #11303 (comment)

There are a still a number of TODOs from the last PR, and I haven't even gotten to revising the "Alternatives" or "Drawbacks" sections yet, but I thought I might as well put this up. I promise we'll send this back to the mailing list for discussion soon!

CCing the usual @mrocklin @mhvk @hameerabbasi @ngoldbaum @mattip

@mhvk
Copy link
Contributor

left a comment

Looks great. Only a persistent note about types. Given what you write, a set is more logical, but I still would favour a dict with types as the keys and list of arguments of that type as a value... I.e., in your notebook, I would do

from collections import defaultdict
import numpy as np

ndarray = np.ndarray
ndarray_array_function = ndarray.ndarray.__array_function__


def get_overloaded_types_and_args(relevant_args):
    """Returns a list of arguments on which to call __array_function__.

    __array_function__ implementations should be called in order on the return
    values from this function.
    """
    # Runtime is O(num_arguments * num_unique_types)
    overloaded_types = defaultdict(list)
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if arg_type is ndarray or (getattr(arg_type, '__array_function__',
                                           ndarray_array_function) is
                                   ndarray_array_function):
            continue

        if arg_type not in overloaded_types:
            # Are we a subclass of something already present?
            for i, old_arg in enumerate(overloaded_args):
                if issubclass(arg_type, type(old_arg)):
                    # We are a true subclass here, given upper if statement.
                    overloaded_args.insert(i, arg)
                    break
            else:
                overloaded_args.append(arg)

        overloaded_types[arg_type].append(arg)

    return overloaded_types, overloaded_args
As a convenience for ``__array_function__`` implementors, ``types`` contains a
list of argument types with an ``'__array_function__'`` attribute. This allows
downstream implementations to quickly determine if they are likely able to
support the operation. There are no guarantees about the order of argument types in ``types``: ``__array_funtion__`` implementations that rely on such behavior

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

Should types be a set if we don't guarantee order? Would be easier to implement (no checking needed for whether a given class was already seen).

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 18, 2018

Author Member

I'm pretty sure that eventually we'll implement try_array_function_override in C, so we're definitely not making a types a Python set/list object until we're sure we need to call __array_function__. So convenient implementation isn't really a concern, but I agree that it's a better for our interface to enforce invariants, as long as we don't pay a performance penalty. And indeed, the speed difference for creating a small set vs a list vs a tuple is pretty small:

class A:
  pass

class B:
  pass

x = [A, B]
%timeit tuple(x)  # 158 ns per loop
%timeit list(x)  # 186 ns per loop
%timeit set(x)  # 225 ns per loop
@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 18, 2018

Only a persistent note about types. Given what you write, a set is more logical

Purely for speed of construction of Python objects, a tuple (158 ns) is a faster than a list (186 ns) which is faster than a set (225 ns). But we're really in nanobenchmarks at this point, quite unlikely to matter in practice. So indeed enforcing this lack of order in the interface might make sense. I don't have a strong opinion here.

but I still would favour a dict with types as the keys and list of arguments of that type as a value...

I agree, this would definitely be maximally useful to __array_function__ authors -- basically providing all the information you could hope for.

My concern is that this would be difficult to implement without a (small) performance penalty:

  • If we start building the dict of lists when iterating through relevant arguments to look for overloads, we would need to include an entry for numpy.ndarray, because we don't know if there will be an __array_function__ implementation to call until reaching the end. This would probably add an unacceptable level of overhead for the no overloads case, especially because building something like a defaultdict(list) is hard to do in C without using Python's C API.
  • Alternatively, we could call the dispatcher function twice: first to see if anything implements an override, and second to collect the types mapping. But in contrast to collecting types, which I'll claim is basically free alongside collecting array arguments, this is unnecessary overhead for typical __array_function__ implementations that don't need this full information.

So instead, I'll propose that we should pass dispatcher functions as part of the __array_function__ protocol, which now becomes __array_function__(self, func, dispatcher, types, args, kwargs). You can simply re-call dispatcher(*args, **kwargs) to collect relevant arguments in whichever way you see fit.

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Jun 18, 2018

Hmm, perhaps I was overthinking this; I'd guess in most implementations one can would call one's own routine and things are in the right place already.

My sense would be not to pass on the dispatcher, though it might be good if there was a programmatic way to retrieve it for those that would like it. For instance, perhaps the decorated function could have a dispatcher attribute? (It will already have args, keywords and func from wraps anyway).

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 18, 2018

My sense would be not to pass on the dispatcher, though it might be good if there was a programmatic way to retrieve it for those that would like it. For instance, perhaps the decorated function could have a dispatcher attribute? (It will already have args, keywords and func from wraps anyway).

If implementations need access to the disptacher (it would be good to see use-cases), then my inclination would be to include it directly. Most but not all functions will use the decorator (see the updated NEP for details), so it may not always be easy to attach a dispatcher attribute.

Separate question: should I rename the dispatch decorator to array_function_dispatch? That's probably a better name for something we'll be exposing externally.

if success:
return value
return func(*args, **kwargs)
return new_func

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

To attach the dispatcher, wouldn't one just do

new_func.dispatcher = dispatcher
return new_func

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 18, 2018

Author Member

This doesn't work for "builtin" functions written in C like np.concatenate:

>>> np.concatenate.foo = 1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-a0d69ae14b06> in <module>()
----> 1 np.concatenate.foo = 1

AttributeError: 'builtin_function_or_method' object has no attribute 'foo'

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

But in __array_ufunc__, aren't we passing on a wrapped concatenate? One can definitely assign to the wrapped function which would become the np.concatenate that people actually see:

def new_concat(*args, **kwargs):
    return np.concatenate(*args, **kwargs)

nc = functools.wraps(np.concatenate)(new_concat)
nc.dispatch = 'a'

(obviously, the same would be possible in C)

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 18, 2018

Author Member

It depends on how we choose to implement np.concatenate with dispatching. Potentially we could do it either way.

I'm sure it's also possible to define a normal Python function from C directly rather than a builtin. I don't know the trade-offs there.

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

I think it would be wise to keep the wrapped and unwrapped versions separate, so that, e.g., __array_function__ can directly call the unwrapped one. (I still hope to do the same for the ufuncs).

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

p.s. one of the points of the wrapping in the example is to have access to dispatch without actually having to pass that in; it therefore must always be possible to expose it!

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 18, 2018

Author Member

p.s. one of the points of the wrapping in the example is to have access to dispatch without actually having to pass that in; it therefore must always be possible to expose it!

I'm not sure I follow here: what is "it"? :)

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

dispatch: your wrapped function must be able to call dispatch and therefore has access to it, and must thus be able to expose it to the outside. Of course, this does rely on having the wrapped function be the one in the numpy namespace and identical to the one passed on in __array_function__.

Maybe most relevant to the NEP: maybe good to state a preference for C functions being wrapped, not having dispatch cooked inside.

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 18, 2018

Author Member

By dispatch do you mean the operation specific function that returns arguments to check (what I call dispatcher) or the decorator @array_function_dispatch?

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 18, 2018

Contributor

Sorry, yes, I meant the dispatcher used inside new_func

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Jun 18, 2018

See in-line comment about attaching (and, no, I'm not sure about use cases either - I'd need to start writing a Quantity.__array_ufunc__ implementation to be sure it is any use).

About the name: yes, array_function_dispatch does seem better.

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 18, 2018

and, no, I'm not sure about use cases either - I'd need to start writing a Quantity.__array_ufunc__ implementation to be sure it is any use

OK, in that case I'll probably put a note about passing dispatcher into __array_function__ in some way under the "Alternatives" section for now. I don't think it would change the proposal substantially to add this in the future.

resembling the `decorator library <https://github.com/micheles/decorator>`_
to automatically write an overloaded function like the manually written
implemenations above with the exact same signature as the original.
Unfortunately, using the ``inspect`` module instead of code generation would

This comment has been minimized.

Copy link
@eric-wieser

eric-wieser Jun 18, 2018

Member

Presumably the inspect module would be needed for code generation anyway, so these are not mutually exclusive. Perhaps "Using the inspect module at runtime instead of at code-generation time"?

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 18, 2018

Author Member

Thanks. I haven't gotten to revising this section yet (I just moved it unchanged down to "Alternatives")

@mattip

This comment has been minimized.

Copy link
Member

commented Jun 18, 2018

Is there a clear need for dispatcher in the __array_function__ interface where we have neither dispatcher nor types in the __array_ufunc__ interface? The timings will probably change as we move this to C. Maybe we could leave that part of the proposal in Alternatives.

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Jun 18, 2018

@mattip - the difference with __array_ufunc__ is that it is exactly known where to look for the overrides: in *inputs or kwargs['out'] - with __array_function__, the arguments can be anywhere.

But I'm not sure the dispatcher is necessarily all that helpful indeed, since it will still not tell me where the argument came from and hence I cannot easily, say, replace it with a properly adjusted ndarray.

As I don't have a good use case, let's just mention it somewhere and move on - I'm quite excited by the idea of actually being able to try this!

@charris

This comment has been minimized.

Copy link
Member

commented Jun 18, 2018

CC @skrah also.

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 18, 2018

@mhvk surfacing your inline comment, we could potentially add the "non-dispatching" version of NumPy functions to the protocol, e.g., __array_function__(self, func, non_dispatching_func, types, args, kwargs). Or these could be included as attributes on func (e.g., np.sum.ndarray_only or np.sum.no_overrides), similar to what you proposed for dispatcher.

I'm inclined to put these in the same category of "possible future extensions" as exposing dispatcher functions. I do like function attributes that can be automatically set by a decorator more than the separate numpy.array_only namespace. I'm not sure there are use cases beyond defining ndarray.__array_function__ itself, which itself would be a bit of a lie because we still don't want to call ndarray.__array_function__ at all in the case of all numpy arrays for performance reasons.

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Jun 19, 2018

@shoyer - thanks for resurfacing - I think the non-override function has clearer use than the dispatcher - there should be a fair group of classes that work via ndarray, processing inputs before and outputs after. For ndarray subclasses, this is nicely covered by ndarray.__array_ufunc__, but for others that might not make most sense. It would seem most naturally exposed on the wrapped function (I.e., I would not put it among the arguments).

Maybe things like these can be in a section "to be decided based on actual implementation" or "possible enhancements"?

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 19, 2018

I think the non-override function has clearer use than the dispatcher - there should be a fair group of classes that work via ndarray, processing inputs before and outputs after.

I agree, but just like case for ufuncs, it also works to call the numpy API function directly again, after handling all inputs to remove arguments handled by a particular __array_function__ method.

Maybe things like these can be in a section "to be decided based on actual implementation" or "possible enhancements"?

Will do.


We propose to do so by writing "dispatcher" functions for each overloaded
NumPy function:
- These functions will be called with the exact same arguments that were passed

This comment has been minimized.

Copy link
@mattip

mattip Jun 19, 2018

Member

Empty line needed before first list item

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 21, 2018

Contributor

This still needs fixing.

@hameerabbasi
Copy link
Contributor

left a comment

Might be worth sending to the mailing list soon-ish.

docstring to note which arguments are checked for overloads.

It's particularly worth calling out the decorator's use of
``functools.wraps``:

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 21, 2018

Contributor

It might be worth mentioning that this isn't a catch-all thing, it doesn't always work on arbitrary callables if they aren't Python functions. I ran into this trying to nicely wrap ufuncs. Here's a quick demo, it shows that repr and string don't work, but __doc__ does. It's worth noting that this can be worked around, see https://github.com/pydata/sparse/blob/b51d74924d62ff6537b15ce4e1dd4e56080a3b6f/sparse/utils.py#L187-L191

It might be worth investigating this as a Python issue, but I'm not well-versed enough in Python internals to say if this is actually a bug.

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 22, 2018

Author Member

Yeah, I think this only works well Python functions. For classes/methods, we would need another solution.

But I think functions are mostly what we want to cover in this NEP.

shoyer added some commits Jun 21, 2018

to work well with static analysis tools, which could report invalid arguments.
Likewise, there is a price in readability: these optional arguments won't be
included in the docstrings for NumPy functions. These downsides mean that users
should make use of this feature with care.

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 21, 2018

Author Member

I wrote this section to justify passing on unrecognized **kwargs, but now that I've considered the downsides I no longer think it's a good idea! I will probably move this to the section on "Alternatives".

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 21, 2018

Contributor

I actually agree with you. If someone needs implementation-specific kwargs, they can just call the library's implementation directly, instead of NumPy's.

This comment has been minimized.

Copy link
@mhvk

mhvk Jun 21, 2018

Contributor

Also agreed - better to keep it simple.

This comment has been minimized.

Copy link
@shoyer

shoyer Jun 22, 2018

Author Member

Done

~~~~~~~~~~~~~

Within NumPy
''''''''''''

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 21, 2018

Contributor

You're skipping a heading level here, which is making CircleCI fail. Above, you use ~, then ^ and then '. Here, you're skipping ^. The order of the heading levels must be consistent.

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 22, 2018

After the latest revisions, I think this is basically ready to be sent back to the list.

Did I miss anything?

@hameerabbasi
Copy link
Contributor

left a comment

Minor typo/clarity comment. Feel free to ignore. This is good to go from my side.

optional arguments to NumPy functions without breaking working code already
relying on ``__array_function__``.
optional arguments to NumPy functions without breaking code that already
relyies on ``__array_function__``.

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 22, 2018

Contributor

s/relyies/relies

@@ -146,6 +146,9 @@ for registering ``__array_function__`` implementations.
def __array_function__(self, func, types, args, kwargs):
if func not in HANDLED_FUNCTIONS:
return NotImplemented
# Note: we use MyArray instead of type(self) for issubclass to allow

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 22, 2018

Contributor

instead of type(self) -> instead of comparison with type(self)


An important virtue of this approach is that it allows for adding new
optional arguments to NumPy functions without breaking code that already
relyies on ``__array_function__``.

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 22, 2018

Contributor

s/relyies/relies

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 23, 2018

@mrocklin please take a look when you have the chance

@mrocklin

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2018

@mrocklin please take a look when you have the chance

Everything here looks good to me.

My apologies for being absent for much of this conversation. I would be fine if you all wanted to replace my name in the authors list with the names of @hameerabbasi and @mhvk, both of whom have been much more active here than I have been.

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 24, 2018

@hameerabbasi @mhvk @eric-wieser I would love to include all of you as co-authors on this proposal! I did most of the writing, but the design has been a real collaborative effort. Please let me know if you're OK with that (e.g., 👍on this comment).

@mrocklin You made significant contributions to the first draft of this, so I would prefer to keep you on, too.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2018

For a future PR, I'd be willing to do the writing.

@mattip

This comment has been minimized.

Copy link
Member

commented Jun 26, 2018

@shoyer waiting for additional author attributions change before merging

@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 26, 2018

@eric-wieser are you OK with being included as an author? Please let me know either way :).

@eric-wieser

This comment has been minimized.

Copy link
Member

commented Jun 26, 2018

Sure!

@shoyer shoyer merged commit 7dace1d into numpy:master Jun 27, 2018

0 of 6 checks passed

LGTM analysis: C/C++ Running analyses for revisions
Details
LGTM analysis: JavaScript Running analyses for revisions
Details
LGTM analysis: Python Running analyses for revisions
Details
ci/circleci CircleCI is running your tests
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@shoyer

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2018

OK, merging. I'll send another email to the list shortly :).

@shoyer shoyer deleted the shoyer:nep18-v3 branch Jun 27, 2018

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2018

I had an idea of how to handle array construction methods in this NEP. Just commenting here to be reminded on the list later.

@charris charris changed the title WIP: additional revision for NEP-18 (__array_function__) WIP: additional revision for NEP-18 (`__array_function__`) Nov 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.