Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP: Initial draft for NEP 43 for extensible ufuncs #16723

Merged
merged 22 commits into from
Oct 13, 2020
Merged

Conversation

seberg
Copy link
Member

@seberg seberg commented Jul 1, 2020

There is still quite a bit to be done here. However, some of the important things for discussion are there I think, so putting it out here to work on here and hope for feedback.

One of the thoughts I had with respect to #12518 is to pass in a struct (which could grow) with all of the important information. The main tricky thing is to decide on:

  1. Promotion (but I think multiple dispatched promotion is a good solution)
  2. How to best wrap existing loops, and how to make the whole API as limited as possible for now.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - before anything else, a very nice write-up! Some initial comments, some just details, some perhaps too much focussed on implementation, and one real confusion, leading to the perhaps the main suggestion of some type of python mock-up.

doc/neps/nep-0043-extensible-ufuncs.rst Outdated Show resolved Hide resolved
doc/neps/nep-0043-extensible-ufuncs.rst Outdated Show resolved Hide resolved
setting the desired output descriptors, and possibly modifying the input
ones to conform with the inner-loops requirements.

* Define a new structure to be passed in the inner-loop, which can be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the struct idea, but along the lines of the previous comment wonder whether the fixed and variable parts should be explicitly separated. I think that makes the call signature for the below function clearer too.

Though for the call to the inner loop one would have to combine them in a new struct with two parts if one wants to stick to one argument there. So, all depends a bit... (Note that numpy does use that void *data internally, and there may well be uses externally.)

Copy link
Member Author

@seberg seberg Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to think about this, thanks. Yes, we need to wrap legacy loops in any case I think (or, hidden from the user, have a flag which causes the old inner-loop convention (depending on what is easier probably, I don't mind the indirection for loops defined outside of NumPy, and loops inside of NumPy should move ASAP as well).

EDIT: Only added a TODO here for now.

* A flag to ask NumPy to perform floating point error checking (after custom
setup and before user teardown).

* Instead of a user-setup function,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one confuses me: Instead of what?

doc/neps/nep-0043-extensible-ufuncs.rst Show resolved Hide resolved
to error reporting, by extending the signature of the core inner-loop
functions.
Finally, one of the most important steps in the ufunc call is finding the
correct function for a specific set of input dtypes. We propose to use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct inner loop function? Or just "correct inner loop"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "correct ufunc implementation"?

implementation only.


Bound UFuncImpl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section ended up confusing me no end. It is not obvious to me that it is like a bound method where the ufunc and dtypes combine to some form of self, unless one thinks of each UFuncImpl as some type of class that has __init__(self, ufunc, dtypes), so that all the methods can just refer to self.ufunc and self.dtypes.

Indeed, I'm sufficiently confused that it makes me wonder whether it might not be helpful to have a mock-up of a python implementation of a UFuncImpl and on how one would use it for an existing function (the units case would be a good one), as well as how one would write a new UFunc taking perhaps an existing UFuncImpl (say, a sincos implementation). (This echoes a point made by @mattip in the original thread...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess you could see it that way with an __init__(self, ufunc, dtypes), although, I do not think such an init has to technically be exposed to the user. Anyway, I have to think about trying to clear this up, I was not sure I like the mental model around having an __init__, it somewhat implies there is a full blown (subclass-able?) class, something that I would like to avoid at least for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the way it is exposed to a python user does not have to be decided now. My main comment here was really that I found the present text confusing (unlike the rest). Might it make sense to move this piece to a separate very preliminary discussion of how things might be exposed in python?

Or, not as good but perhaps more practical, remove it and just have a note up front that a python implementation is not yet fully considered but that it is attempted to make sure nothing would stop it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have expanded/refactored this section, and more importantly tried to add a super-highlevel-get-the-feel for things section at the start, which happens to include resolve_impl (I don't know how much that helps, and I guess you are the wrong person to ask by now).

In any case, this section is restructured, and I tried to point out that it is a bit nitty-gritty for the interested reader (I am unsure about it, but it helped me be clear about who has what information)

inputs and tries again. This is well defined for most mathematical
functions, but can be disabled or customized if necessary.

2. Users can *register* new Promoters just as they can register new UFuncImpl.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, registration of UFuncImpl has not really been discussed much.

return NotImplemented

# Could check that res[1] is actually Int64
return ufunc.resolve_impl(tuple(res))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock-up is very useful, but would be even better if resolve_impl had already been introduced! (perhaps via the larger mock-up I wondered about above).


* How ``CastingImpl`` lookup, and thus the decision whether a cast is possible,
is defined. (This is speed relevant, although mainly during a transition
phase where UFuncs where NEP 43 is not yet implemented).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double where

PyUFuncObject *ufunc, PyArray_DTypeMeta *DTypes[nargs],
PyArray_Descr *in_dtypes[nargs], PyArray_Descr *out_dtypes[nargs]);

setting the desired output descriptors, and possibly modifying the input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - the "and possibly modifying the input" here was what made me think that changes were made in-place. Maybe rephrase this?

@seberg seberg marked this pull request as draft July 2, 2020 16:50
flagging (e.g. warnings) if desired.

* A flag to ask NumPy to perform floating point error checking (after custom
setup and before user teardown).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt this error checking always happen ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many integer loops or loops working with Python objects only, do not really have to. And apparently floating point error checking can be a bit slow on some hardware, so I think we can create such an option. Although, I guess it would be fair to add that later as well.

user input descriptors with descriptors matching the wrapped loop.
It must be possible to *view* the inputs as the output.
* ``wrap_outputs(Tuple[DType]: input_descr) -> Tuple[DType]`` replacing the
resolved descriptors with with the desired actual loop descriptors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering how this handles a case like np.add(UnitFloat64, Unit[Float64["km"]) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good point, I should explain that/give an example. For Units this uses .view(np.float64) in the inner loop effectively. In the outer loop, the descriptors will be set to (Unit[Float64]("m"), Unit[Float64["m"], Unit[Float64]("m")), so the casting machinery has to convert the "km" to "m".

@seberg
Copy link
Member Author

seberg commented Aug 12, 2020

@hameerabbasi (and quite honestly everyone). While I will iterate on it in the next days, I think the big picture stuff is there and should be fine to read, so please do whenever you have time, independently of me updating it soon. It will require more iterations in any case, delaying the start of the discussion just means that my iterations are more annoying, slower and less thorough.

was already set previously it will be stored. Any previous error
will be cleared before calling ``teardown``.
The return value of the inner-loop is passed in to allow simple error
flagging (e.g. warnings) if desired.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for those who already read this. I think I will modify this. I am starting to be convinced that teardown does not need to do any error-related work, and that it may be an anti-pattern. Floating point errors are special, since they require this. As such, we could allow an error return, but should discourage it, since its much like deleting a Python object.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - I like your example up front! Just a few super-nitpicky comments as I was reading through.


1. For the above to work, we must also define::

np.positive.resolve_impl(MyDateTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just MyDType here and below? Or in the sentence above, "1. For the above to work for, e.g., a MyDateTime DType, we must also define"


res = np.positive(arr)

could be implemented (conceptional) as:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conceptionally


add_impl = np.add.resolve_impl(type(arr1.dtype), type(arr2.dtype))

This NEP strives to define the minimal set of API necessary to achieve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete "set of" (i.e., just "minimal API")


This is described in the "Promotion and Dispatching" section and
step 2 in the list included in the next section
(The Steps involved in a UFunc call).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just link "next section" in rst?

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only read this partway through as of yet.

Detailed Description
--------------------

NEP 41 and NEP 42 layed out the general proposed structure into DType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NEP 41 and NEP 42 layed out the general proposed structure into DType
NEP 41 and NEP 42 laid out the general proposed structure into DType


4. Preparing the actual iteration. This step is largely handled by ``NpyIter`` (the iterator).

* Allocate all ouputs and potentially copies (or buffers).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would be a copy of an output be needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted it. For example its the implementation for overlap protection and also for very small arrays more efficient (dunno if we use it for anything else but scalars).

But: It just doesn't matter here.

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'd front-load the use-cases for this NEP instead of front-loading the implementation. I realize that separating the two is hard; and also that as developers who implement things, we tend to focus more on implementation.

However, this makes the end-user zone out if they don't understand, and we lose their interest. See this discussion for details.

Comment on lines 93 to 98
Although, this ``positive_impl`` and ``resolve_impl`` is required for certain
functionality, the following code::

res = positive_impl(arr)

is not part of this NEP, as it is not central to the proposal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly move to a non-goals section.

Comment on lines 73 to 186
could be implemented (conceptionally) as:

positive_impl = arr.dtype.positive
res = positive_impl(arr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't what happens, and not something that the NEP will modify (as you say below), then I'd remove it completely or replace it by a more accurate statement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hameerabbasi, I am trying to give the reader an idea/concept of what a multi-method is (and how it comes to into play with ufuncs). Starting with normal methods, seemed like a fair way to do that to me? Maybe the sentence afterwards can say that this is not how it can be done in NumPy for reasons 1 and 2. (Although, I have only semi appetite to discuss the possibility of an __array_ufunc__ like approach, unless someone insists on it being listed as a rejected alternative.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following:

# find_actual_implementation is what we want to achieve for custom dtypes
# and ufuncs
positive_impl = find_actual_implementation(np.positive, arr.dtype)
res = positive_impl(arr)

I just don't want to give users who gloss over this the perception that positive is a dtype slot. It's best to choose an example which doesn't require you to explain that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wanted the concept of "method" in there I guess, but maybe that is just not necessary.

doc/neps/nep-0043-extensible-ufuncs.rst Outdated Show resolved Hide resolved
Comment on lines 213 to 214
*TODO:* we need to briefly mention registration, even if the details of
how to register it are in the specs or even later!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should. API decisions should come before implementation details. 😄

Its not quite settled, I don't quite like the `npy_intp *` as a default
"user-data" yet, but I think I may warm up to it, and I think some kind
of default like that is really very convenient.

Split out `user_data *`, since I think Marten has a point that
separating "constant" and volatile data is conceptionally easier.
@seberg seberg marked this pull request as ready for review October 7, 2020 02:33
Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a link to the renedered document https://766-5652112-gh.circle-artifacts.com/0/doc/neps/_build/html/nep-0043-extensible-ufuncs.html.

I think we should merge this soon, there is an open typo suggestion and I made a suggestion around strided_loop, but feel free to ignore and we can iterate after this is merged.

# data: Pointers to the array data.
# strides: strides to iterate all elements
num_chars1 = context.descriptors[0].itemsize
num_chars2 = context.descriptors[0].itemsize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need dimension as well as strides. If dimensions are in the context, then strides could be as well. Also # n: Number of seems like a fragment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is a 1-D loop, so I only need the number of elements to visit in the loop (which is missing). One outstanding small question is whether we should pass the user_data with the context or explicitly. (There is a downside that some fields of context shouldn't be used in resolve_descriptors since it makes sense to call that without doing any actual operation.)

I will give it a pass today, and Ross as well and then plan to merge this to have a first rendered draft.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously quite the undertaking - thanks @seberg for taking the time to compile all this information, and for taking a lot of care in the communication of the concepts!

The only high-level comment that I have is that I would've found it useful if the "steps in a ufunc call" were higher up in the document, as this provides a nice scaffolding for all of the subsequent pieces that are addressed individually. It's not a show-stopper as there are forward references to that section in some places.

doc/neps/nep-0043-extensible-ufuncs.rst Outdated Show resolved Hide resolved

A UFunc call is split into the following steps:

1. Handling of ``__array_ufunc__`` for container types, such as a Dask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the other enumerated points start with a italicized summary, maybe you could do the same here for consistency, e.g. 1. *Handle ``__array_ufunc__`` protocol*

For example, adding a ``float32`` and a ``float64`` is implemented by
first casting the ``float32`` to ``float64``.

3. *``dtype`` resolution:*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this step relevant only for parametric dtypes? If so then maybe consider renaming this step to something like e.g. "dtype parameter resolution". Otherwise conceptual boundary(ies) between "promotion" and "dtype resolution" are not very clear to me.

We use the ``class`` syntax to describe the information required to create
a new ``ArrayMethod`` object:

.. code-block:: python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the type hints for the arguments are backwards in this code block: I think it's supposed to be arg_name : type. Not really a big deal since this is mostly for the purposes of illustration!

doc/neps/nep-0043-extensible-ufuncs.rst Outdated Show resolved Hide resolved
Comment on lines 1010 to 1011
While dispatching requires looking up the ``ArrayMethod`` registered for
the matching DTypes, requires additional definitions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there's a problem with the wording here

Comment on lines 1232 to 1233
Additionally, to changes in the resulting dtype, do not add for example
faster but less precise loops/promoter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wording here is a bit unclear - I'm not sure I'm understanding this correctly, but what I got out of it was: "For example, do not add faster but less-precise promotions in addition to changing the resulting dtype".

seberg and others added 2 commits October 13, 2020 17:18
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
@seberg seberg changed the title NEP: Initial draft for NEP 43 NEP: Initial draft for NEP 43 for extensible ufuncs Oct 13, 2020
@seberg
Copy link
Member Author

seberg commented Oct 13, 2020

Thanks for the review especially Matti, Ross and of course Marten! Going to put this in so that we have an easy to read draft, but of course this is will not be the last iteration.

@seberg seberg merged commit dbc4f5d into numpy:master Oct 13, 2020
@seberg seberg deleted the NEP43 branch October 21, 2020 16:14
@guidocalvano
Copy link

What is the state of this? The draft of this is now in master, but it is not considered an accepted NEP yet?

https://numpy.org/neps/nep-0043-extensible-ufuncs.html

@seberg
Copy link
Member Author

seberg commented Jan 10, 2024

I am not sure we ever marked it as accepted (which we should do with the note below, probably).

The API probably moved a bit since then, but yes, I would like most of this to be available in NumPy 2.0. We are using the API internally quite a bit. I will note that there is still a chance of doing some changes, or omitting some functions to allow easier changes in the near future.

I am still considering one larger change mainly: Re-adding the static void * pointer as part of the context struct as well as moving the new auxdata into the context struct. If you have any thoughts, please let us know, this is the last few weeks were changes are trivial.
(The new API is designed to be able to deprecate older versions, but especially for the ufunc inner-loop signature that seems tedious.)

The new API is available for experimentation for a long time via extermintal_dtype_api.h.

@guidocalvano
Copy link

Is there a python api already? I don't expect to need c code.

Some more context should you be interested:
What I want to do is implement two things: decimal types and multivectors from geometric algebra. I expect both to be implementable fairly easily and quickly by a python ufunc: i.e. you create a dtype that imposes a size on an element, and then you can create a view of a different dtype, to access the data. Then you can do your math by operating on the view and return whatever the operator requires. Decimal add can be done by simply viewing the values as ints and adding them. Multiplication can be done by promoting to a bigger int data type, doing the multiplication, then shifting values two times the numbers behind the decimal point. Multivectors are a bit more complicated... it would take at least half an hour to explain the concept... but well worth watching a few videos on.

@seberg
Copy link
Member Author

seberg commented Jan 10, 2024

I have no plans for Python API for the foreseeable future, it would need a higher level hook to do well probably. Yes, you could add that, but no it doesn't exist.

@mhvk
Copy link
Contributor

mhvk commented Jan 11, 2024

@seberg - I think an ability to pass on arbitrary information to the inner loop remains a good idea. E.g., using old-style I have used it to create "ufunc chains" where the inner loop executes other inner loops in order - for speed-ups comparable to numexpr without having to describe operations in strings (see https://github.com/mhvk/chain_ufunc). And of course, you saw that it was somewhat handy for the gufunc fft/ifft routines in #25536...

But the main problem in commenting is that while I reviewed quite a bit, I haven't yet used the new API, so I don't really know how it would work in practice... So take this with a pinch of salt!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants