-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEP: Initial draft for NEP 43 for extensible ufuncs #16723
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - before anything else, a very nice write-up! Some initial comments, some just details, some perhaps too much focussed on implementation, and one real confusion, leading to the perhaps the main suggestion of some type of python mock-up.
setting the desired output descriptors, and possibly modifying the input | ||
ones to conform with the inner-loops requirements. | ||
|
||
* Define a new structure to be passed in the inner-loop, which can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the struct idea, but along the lines of the previous comment wonder whether the fixed and variable parts should be explicitly separated. I think that makes the call signature for the below function clearer too.
Though for the call to the inner loop one would have to combine them in a new struct with two parts if one wants to stick to one argument there. So, all depends a bit... (Note that numpy does use that void *data
internally, and there may well be uses externally.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have to think about this, thanks. Yes, we need to wrap legacy loops in any case I think (or, hidden from the user, have a flag which causes the old inner-loop convention (depending on what is easier probably, I don't mind the indirection for loops defined outside of NumPy, and loops inside of NumPy should move ASAP as well).
EDIT: Only added a TODO here for now.
* A flag to ask NumPy to perform floating point error checking (after custom | ||
setup and before user teardown). | ||
|
||
* Instead of a user-setup function, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one confuses me: Instead of what?
to error reporting, by extending the signature of the core inner-loop | ||
functions. | ||
Finally, one of the most important steps in the ufunc call is finding the | ||
correct function for a specific set of input dtypes. We propose to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct inner loop function? Or just "correct inner loop"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "correct ufunc implementation"?
implementation only. | ||
|
||
|
||
Bound UFuncImpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section ended up confusing me no end. It is not obvious to me that it is like a bound method where the ufunc and dtypes combine to some form of self
, unless one thinks of each UFuncImpl
as some type of class that has __init__(self, ufunc, dtypes)
, so that all the methods can just refer to self.ufunc
and self.dtypes
.
Indeed, I'm sufficiently confused that it makes me wonder whether it might not be helpful to have a mock-up of a python implementation of a UFuncImpl
and on how one would use it for an existing function (the units case would be a good one), as well as how one would write a new UFunc
taking perhaps an existing UFuncImpl
(say, a sincos
implementation). (This echoes a point made by @mattip in the original thread...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess you could see it that way with an __init__(self, ufunc, dtypes)
, although, I do not think such an init has to technically be exposed to the user. Anyway, I have to think about trying to clear this up, I was not sure I like the mental model around having an __init__
, it somewhat implies there is a full blown (subclass-able?) class, something that I would like to avoid at least for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the way it is exposed to a python user does not have to be decided now. My main comment here was really that I found the present text confusing (unlike the rest). Might it make sense to move this piece to a separate very preliminary discussion of how things might be exposed in python?
Or, not as good but perhaps more practical, remove it and just have a note up front that a python implementation is not yet fully considered but that it is attempted to make sure nothing would stop it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have expanded/refactored this section, and more importantly tried to add a super-highlevel-get-the-feel for things section at the start, which happens to include resolve_impl
(I don't know how much that helps, and I guess you are the wrong person to ask by now).
In any case, this section is restructured, and I tried to point out that it is a bit nitty-gritty for the interested reader (I am unsure about it, but it helped me be clear about who has what information)
inputs and tries again. This is well defined for most mathematical | ||
functions, but can be disabled or customized if necessary. | ||
|
||
2. Users can *register* new Promoters just as they can register new UFuncImpl. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, registration of UFuncImpl
has not really been discussed much.
return NotImplemented | ||
|
||
# Could check that res[1] is actually Int64 | ||
return ufunc.resolve_impl(tuple(res)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mock-up is very useful, but would be even better if resolve_impl
had already been introduced! (perhaps via the larger mock-up I wondered about above).
|
||
* How ``CastingImpl`` lookup, and thus the decision whether a cast is possible, | ||
is defined. (This is speed relevant, although mainly during a transition | ||
phase where UFuncs where NEP 43 is not yet implemented). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double where
PyUFuncObject *ufunc, PyArray_DTypeMeta *DTypes[nargs], | ||
PyArray_Descr *in_dtypes[nargs], PyArray_Descr *out_dtypes[nargs]); | ||
|
||
setting the desired output descriptors, and possibly modifying the input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - the "and possibly modifying the input" here was what made me think that changes were made in-place. Maybe rephrase this?
flagging (e.g. warnings) if desired. | ||
|
||
* A flag to ask NumPy to perform floating point error checking (after custom | ||
setup and before user teardown). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt this error checking always happen ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many integer loops or loops working with Python objects only, do not really have to. And apparently floating point error checking can be a bit slow on some hardware, so I think we can create such an option. Although, I guess it would be fair to add that later as well.
user input descriptors with descriptors matching the wrapped loop. | ||
It must be possible to *view* the inputs as the output. | ||
* ``wrap_outputs(Tuple[DType]: input_descr) -> Tuple[DType]`` replacing the | ||
resolved descriptors with with the desired actual loop descriptors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering how this handles a case like np.add(UnitFloat64, Unit[Float64["km"]) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, good point, I should explain that/give an example. For Units this uses .view(np.float64)
in the inner loop effectively. In the outer loop, the descriptors will be set to (Unit[Float64]("m"), Unit[Float64["m"], Unit[Float64]("m"))
, so the casting machinery has to convert the "km"
to "m"
.
@hameerabbasi (and quite honestly everyone). While I will iterate on it in the next days, I think the big picture stuff is there and should be fine to read, so please do whenever you have time, independently of me updating it soon. It will require more iterations in any case, delaying the start of the discussion just means that my iterations are more annoying, slower and less thorough. |
was already set previously it will be stored. Any previous error | ||
will be cleared before calling ``teardown``. | ||
The return value of the inner-loop is passed in to allow simple error | ||
flagging (e.g. warnings) if desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for those who already read this. I think I will modify this. I am starting to be convinced that teardown
does not need to do any error-related work, and that it may be an anti-pattern. Floating point errors are special, since they require this. As such, we could allow an error return, but should discourage it, since its much like deleting a Python object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - I like your example up front! Just a few super-nitpicky comments as I was reading through.
|
||
1. For the above to work, we must also define:: | ||
|
||
np.positive.resolve_impl(MyDateTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just MyDType
here and below? Or in the sentence above, "1. For the above to work for, e.g., a MyDateTime
DType, we must also define"
|
||
res = np.positive(arr) | ||
|
||
could be implemented (conceptional) as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conceptionally
|
||
add_impl = np.add.resolve_impl(type(arr1.dtype), type(arr2.dtype)) | ||
|
||
This NEP strives to define the minimal set of API necessary to achieve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete "set of" (i.e., just "minimal API")
|
||
This is described in the "Promotion and Dispatching" section and | ||
step 2 in the list included in the next section | ||
(The Steps involved in a UFunc call). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just link "next section" in rst?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only read this partway through as of yet.
Detailed Description | ||
-------------------- | ||
|
||
NEP 41 and NEP 42 layed out the general proposed structure into DType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEP 41 and NEP 42 layed out the general proposed structure into DType | |
NEP 41 and NEP 42 laid out the general proposed structure into DType |
|
||
4. Preparing the actual iteration. This step is largely handled by ``NpyIter`` (the iterator). | ||
|
||
* Allocate all ouputs and potentially copies (or buffers). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would be a copy of an output be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted it. For example its the implementation for overlap protection and also for very small arrays more efficient (dunno if we use it for anything else but scalars).
But: It just doesn't matter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I'd front-load the use-cases for this NEP instead of front-loading the implementation. I realize that separating the two is hard; and also that as developers who implement things, we tend to focus more on implementation.
However, this makes the end-user zone out if they don't understand, and we lose their interest. See this discussion for details.
Although, this ``positive_impl`` and ``resolve_impl`` is required for certain | ||
functionality, the following code:: | ||
|
||
res = positive_impl(arr) | ||
|
||
is not part of this NEP, as it is not central to the proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly move to a non-goals section.
could be implemented (conceptionally) as: | ||
|
||
positive_impl = arr.dtype.positive | ||
res = positive_impl(arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this isn't what happens, and not something that the NEP will modify (as you say below), then I'd remove it completely or replace it by a more accurate statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hameerabbasi, I am trying to give the reader an idea/concept of what a multi-method is (and how it comes to into play with ufuncs). Starting with normal methods, seemed like a fair way to do that to me? Maybe the sentence afterwards can say that this is not how it can be done in NumPy for reasons 1 and 2. (Although, I have only semi appetite to discuss the possibility of an __array_ufunc__
like approach, unless someone insists on it being listed as a rejected alternative.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the following:
# find_actual_implementation is what we want to achieve for custom dtypes
# and ufuncs
positive_impl = find_actual_implementation(np.positive, arr.dtype)
res = positive_impl(arr)
I just don't want to give users who gloss over this the perception that positive
is a dtype
slot. It's best to choose an example which doesn't require you to explain that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I wanted the concept of "method" in there I guess, but maybe that is just not necessary.
*TODO:* we need to briefly mention registration, even if the details of | ||
how to register it are in the specs or even later! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should. API decisions should come before implementation details. 😄
Its not quite settled, I don't quite like the `npy_intp *` as a default "user-data" yet, but I think I may warm up to it, and I think some kind of default like that is really very convenient. Split out `user_data *`, since I think Marten has a point that separating "constant" and volatile data is conceptionally easier.
…romotion was more fleshed out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a link to the renedered document https://766-5652112-gh.circle-artifacts.com/0/doc/neps/_build/html/nep-0043-extensible-ufuncs.html.
I think we should merge this soon, there is an open typo suggestion and I made a suggestion around strided_loop
, but feel free to ignore and we can iterate after this is merged.
# data: Pointers to the array data. | ||
# strides: strides to iterate all elements | ||
num_chars1 = context.descriptors[0].itemsize | ||
num_chars2 = context.descriptors[0].itemsize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need dimension as well as strides. If dimensions are in the context, then strides could be as well. Also # n: Number of
seems like a fragment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is a 1-D loop, so I only need the number of elements to visit in the loop (which is missing). One outstanding small question is whether we should pass the user_data
with the context or explicitly. (There is a downside that some fields of context
shouldn't be used in resolve_descriptors
since it makes sense to call that without doing any actual operation.)
I will give it a pass today, and Ross as well and then plan to merge this to have a first rendered draft.
Co-authored-by: Dave Hirschfeld <dave.hirschfeld@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is obviously quite the undertaking - thanks @seberg for taking the time to compile all this information, and for taking a lot of care in the communication of the concepts!
The only high-level comment that I have is that I would've found it useful if the "steps in a ufunc call" were higher up in the document, as this provides a nice scaffolding for all of the subsequent pieces that are addressed individually. It's not a show-stopper as there are forward references to that section in some places.
|
||
A UFunc call is split into the following steps: | ||
|
||
1. Handling of ``__array_ufunc__`` for container types, such as a Dask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the other enumerated points start with a italicized summary, maybe you could do the same here for consistency, e.g. 1. *Handle ``__array_ufunc__`` protocol*
For example, adding a ``float32`` and a ``float64`` is implemented by | ||
first casting the ``float32`` to ``float64``. | ||
|
||
3. *``dtype`` resolution:* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this step relevant only for parametric dtypes? If so then maybe consider renaming this step to something like e.g. "dtype parameter resolution". Otherwise conceptual boundary(ies) between "promotion" and "dtype resolution" are not very clear to me.
We use the ``class`` syntax to describe the information required to create | ||
a new ``ArrayMethod`` object: | ||
|
||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type hints for the arguments are backwards in this code block: I think it's supposed to be arg_name : type
. Not really a big deal since this is mostly for the purposes of illustration!
While dispatching requires looking up the ``ArrayMethod`` registered for | ||
the matching DTypes, requires additional definitions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there's a problem with the wording here
Additionally, to changes in the resulting dtype, do not add for example | ||
faster but less precise loops/promoter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wording here is a bit unclear - I'm not sure I'm understanding this correctly, but what I got out of it was: "For example, do not add faster but less-precise promotions in addition to changing the resulting dtype".
Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
Thanks for the review especially Matti, Ross and of course Marten! Going to put this in so that we have an easy to read draft, but of course this is will not be the last iteration. |
What is the state of this? The draft of this is now in master, but it is not considered an accepted NEP yet? |
I am not sure we ever marked it as accepted (which we should do with the note below, probably). The API probably moved a bit since then, but yes, I would like most of this to be available in NumPy 2.0. We are using the API internally quite a bit. I will note that there is still a chance of doing some changes, or omitting some functions to allow easier changes in the near future. I am still considering one larger change mainly: Re-adding the static The new API is available for experimentation for a long time via |
Is there a python api already? I don't expect to need c code. Some more context should you be interested: |
I have no plans for Python API for the foreseeable future, it would need a higher level hook to do well probably. Yes, you could add that, but no it doesn't exist. |
@seberg - I think an ability to pass on arbitrary information to the inner loop remains a good idea. E.g., using old-style I have used it to create "ufunc chains" where the inner loop executes other inner loops in order - for speed-ups comparable to numexpr without having to describe operations in strings (see https://github.com/mhvk/chain_ufunc). And of course, you saw that it was somewhat handy for the gufunc fft/ifft routines in #25536... But the main problem in commenting is that while I reviewed quite a bit, I haven't yet used the new API, so I don't really know how it would work in practice... So take this with a pinch of salt! |
There is still quite a bit to be done here. However, some of the important things for discussion are there I think, so putting it out here to work on here and hope for feedback.
One of the thoughts I had with respect to #12518 is to pass in a struct (which could grow) with all of the important information. The main tricky thing is to decide on: