MAINT: Refactor reductions to use NEP 43 style dispatching/promotion #18905
(Note that I should "backport" some fixups for the dispatching code from my continuation on the reductions here.)

@mhvk I have one question for you, unfortunately. Running the astropy tests on the branch main...seberg:ufunc-refactor-2021-reductions (which includes this, but also fixed-up reductions, and probably fixes some smaller issues here)... I am running into problems with astropy, and I think the problem is that previously the

Instead, the ufunc machinery finds out the exact loop/operation dtypes using the type resolver and then checks against casting. Now I think
A similar one is:
So in the For |
@seberg - I guess there are two questions here:
Just to be sure, I use my own type resolver because (see https://github.com/liberfa/pyerfa/blob/c67e478c231faa1b54b52c995fc96877b1271062/erfa/ufunc.c.templ#L540; note the copied over code before...):
From what you write, the problem is related to the second item, though not so much the python int to int32, but rather casting of strings. Currently, it is essentially impossible to create a ufunc for string input, so instead I have structured dtypes that contain just a single field with a fixed-length string, and then I rely on unsafe casting to convert string input to that structured dtype (see
Maybe it is good to look at
With the new scheme, can one separately override the casting? Should that be done by the dtype, or by the ufunc? More generally, are string dtypes now possible as input for ufuncs?
p.s. I'm sorry, but I've lost track a bit. Is there a description in some NEP or some particular place in the code that would help me understand how the new scheme should work?
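As a hypothetical sketch of the structured-dtype workaround described above (the field name, string length, and values here are made up; the real registration lives in pyerfa's `ufunc.c.templ`), the idea is to wrap a fixed-length string in a single-field structured dtype and rely on unsafe casting to get plain string input into it:

```python
import numpy as np

# Hypothetical illustration: a single-field structured dtype wrapping
# a fixed-length string, since ufunc loops cannot be registered for
# plain string dtypes directly.
dt_wrapped = np.dtype([("s", "S12")])

strings = np.array([b"2000-01-01", b"2020-06-15"], dtype="S12")
# Casting from an unstructured dtype into a structured one is only
# permitted as an "unsafe" cast, which is what pyerfa relies on.
wrapped = strings.astype(dt_wrapped, casting="unsafe")

assert wrapped.dtype.names == ("s",)
assert wrapped["s"][0] == b"2000-01-01"
```

A ufunc loop registered for `dt_wrapped` can then operate on the single field, while callers pass ordinary string arrays.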
Thanks for the points! I have to let it sink in a bit and think a bit more. Please don't feel compelled to work through all of the below! One thing I can offer is to try and port one or both of these ufuncs to the experimental-user-dtypes. That will force me to know more clearly what
Hmmm, I am confused why python ints are a problem. But I guess they are, at least when there are only Python scalars involved? From a casting perspective those should be same-kind though, which is the default casting.

About what to do here: It seems the loops that have problems with casting are always defined with structured voids. For those loops we always fall back to the old type resolver (it is impossible to skip it, as the fields might be modified/checked). So, I can probably get away with always skipping the casting check. (If it's enough to skip the input casts, that might be nice, but it doesn't really matter.) If that is not good enough for some reason, expanding the old API slightly to allow you to set the "force input cast" flags should probably work. This should give you time to move to the new API without worrying about it.

About the new system: It's part of NEP 43, but it's a bit long maybe, and I am slightly changing it. (Some things just become clearer closer to the finish line...) I will try to give an example below of how it might look. But I am not sure if the casting, promotion, and type resolution are clear enough from this:
There are some limitations, mainly with string and void, though:
Again, since I am not quite sure how well this works for you, I may have to try it for you if that helps. For now, I expect I can make the "old" mechanism work.
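One way to picture the NEP 43 style scheme is as a hypothetical pure-Python sketch (illustrative names only; the actual machinery is implemented in C inside NumPy): loops are registered per tuple of DType classes, and promoters map unmatched combinations onto registered ones.

```python
# Hypothetical sketch of NEP 43 style dispatching; all names here are
# illustrative, not the real NumPy API.
class ArrayMethod:
    def __init__(self, name, dtypes):
        self.name = name
        self.dtypes = dtypes  # tuple of DType "classes" (inputs + output)

_loops = {}      # exact DType tuple -> ArrayMethod
_promoters = {}  # DType tuple -> function returning a registered tuple

def register_loop(method):
    _loops[method.dtypes] = method

def resolve(dtypes):
    """Exact match first; otherwise let a promoter rewrite the tuple."""
    if dtypes in _loops:
        return _loops[dtypes]
    promoter = _promoters.get(dtypes)
    if promoter is not None:
        return resolve(promoter(dtypes))
    raise TypeError(f"no matching loop for {dtypes}")

# Example: a logical-style ufunc with only a bool loop, plus a promoter
# sending integer inputs there (logical ops can always cast their
# inputs to boolean without changing the result).
register_loop(ArrayMethod("bool_loop", ("Bool", "Bool", "Bool")))
_promoters[("Int64", "Int64", "Int64")] = lambda dts: ("Bool",) * 3

assert resolve(("Bool", "Bool", "Bool")).name == "bool_loop"
assert resolve(("Int64", "Int64", "Int64")).name == "bool_loop"
```

The logical-function promoters discussed later in this PR follow a similar idea: whatever the (non-object) inputs, promote to the boolean loop.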
To confirm: Skipping the casting check for loops that have flexible/parametric legacy dtypes (structured ones) makes the astropy test suite pass. I guess the main issue in the future is probably the
@seberg - thanks for the examples. That new process looks really good, I'm quite excited to try to use it! I also checked your branch with
It is obviously totally fine if this no longer works, as long as I have a work-around!
@seberg - on getting review for this PR: for me at least it would be really helpful to update the documentation as things are implemented and have some concrete examples of ufuncs that use the new type resolution. Or is this not yet possible? E.g., a new
I'm not sure what the plan for review is, but if at all possible, do split this PR in pieces! Even with your nice & liberal comments, it is very difficult to get an overall picture. Some suggestions for making this PR more manageable, noted as I went through trying to get a sense of it:
Should add that I think this really is quite exciting! I think (g)ufuncs are such a nice idea and it will be wonderful to have them be easier/more logical.
@seberg - a note with probably insufficient thought: if you are using the tuple hash, might it be possible to avoid duplicating the hashing by actually creating tuples and calculating their hashes, and then for finding use the well-optimized standard
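A rough sketch of the suggestion above (hypothetical helper names; the real cache stores resolved ArrayMethods, not strings): key the dispatch cache with an ordinary Python tuple, so the built-in tuple hashing and dict lookup do the work instead of a re-implemented hash in C.

```python
import numpy as np

# Hypothetical tuple-keyed dispatch cache; names are illustrative.
_dispatch_cache = {}

def cached_resolve(ufunc, dtypes, resolve):
    key = (ufunc, dtypes)  # an ordinary tuple key; hashed by CPython
    try:
        return _dispatch_cache[key]
    except KeyError:
        result = _dispatch_cache[key] = resolve(dtypes)
        return result

calls = []
def fake_resolve(dtypes):
    calls.append(dtypes)   # count how often full resolution runs
    return "int64_loop"    # stand-in for a resolved loop/ArrayMethod

dts = (np.dtype(np.int64), np.dtype(np.int64))
r1 = cached_resolve(np.add, dts, fake_resolve)
r2 = cached_resolve(np.add, dts, fake_resolve)
assert r1 == r2 == "int64_loop"
assert len(calls) == 1     # the second call was a pure cache hit
```

The trade-off discussed below is whether the custom C hash is worth the overhead saved over this tuple-creation-plus-dict approach.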
About the many small changes: There are probably quite a lot that crept in that should not have / could be split out pretty well :/.

Masked inner strided-loops: It seemed weird to split off, but it may be possible (that would effectively mean that any masked ufunc must keep using the old machinery).

The hashing, dispatching/promotion, and legacy-method things can maybe be partially broken out. It probably means using them solely in new unit tests, but that is fine with me. The hashing itself was based on a try from one year ago. IIRC it was around 5-10% overhead difference, which isn't super much (mainly skipping tuple creation, and the "tuple hash" inlines the identity hash, after all). But I am really not sure what (if any) speed advantage exists in this code.
@seberg - thanks for considering the splitting up. If hashing did not have a large impact, postponing that would be an easy gain for simpler review here! Otherwise, just splitting off the move of things to
This should be pretty much ready for review, now only with the changes for reductions left. That said, it is bigger than I thought, so I am considering splitting out things again (i.e. can do the logical promoters first, even if I only really need them for reductions).
This looks very nice indeed - even just the few added tests show the big improvement!
My comments are all tiny, and may have suffered from really being a little too tired for a proper review.... But I like how it is becoming even clearer that the DType change was a really good idea!
This should be ready for a round of review. I think the main problem is probably now for me to see if I have a serious flaw in the caching mechanism.
This also implements new promoters for the logical functions. The reasoning is that any logical function can always be handled by first casting to boolean without changing the meaning. This is true (unless the output is object), and already effectively used for `np.all`/`np.any`.
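The reasoning can be checked directly with the existing ufuncs, since a logical reduction over integers must agree with `np.all`/`np.any`:

```python
import numpy as np

# Logical ufuncs can cast any (non-object) input to bool first without
# changing the result - which is what np.all/np.any already rely on.
a = np.array([2, 0, 5])          # nonzero -> True, zero -> False
res = np.logical_and.reduce(a)   # equivalent to np.all(a)
assert res == np.all(a)
assert bool(res) is False        # the zero makes the conjunction False

b = np.array([2, 3, 5])
assert np.logical_or.reduce(b) == np.any(b)
```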
Mainly to see if this is indeed the problem, but the assert is not necessary, and it might be we end up with the wrong dtype when the input is a compatible (but different) integer already?
This did not actually trigger any problems, probably because the reduce fallback mode later just catches it anyway... On the upside, this probably fixes most of the caching issues ;)
At least the reduceat path seems to have been untested before. The slight change in code layout (added assert) is just to make the code slightly easier to read, since otherwise it looks like there is an additional `else` branch when `out` is given.
More fixups are coming, the biggest change is that the error message is now improved when a reduction makes no sense such as `np.subtract.reduce(np.array([1, 2, 3], dtype="M8[s]"))` where input and output cannot have the same descriptor. (Some more fixups still to go)
In particular, this covers a casting error that currently cannot be hit for normal ufuncs, because they already check casting during the legacy dtype resolution (which is called first).
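For illustration (current NumPy behavior; the exact error message may differ), the reduction from the example above fails because subtracting two datetimes yields a timedelta, so the output descriptor can never match the datetime input:

```python
import numpy as np

arr = np.array([1, 2, 3], dtype="M8[s]")

# The reduction cannot reuse the datetime64 descriptor for its
# accumulated result, so it must raise.
raised = False
try:
    np.subtract.reduce(arr)
except TypeError:
    raised = True
assert raised

# The element-wise operation itself is fine; only the reduction is not:
assert (arr[1:] - arr[:-1]).dtype == np.dtype("m8[s]")
```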
I would love to move this ahead. While I think the caching is not ideal, especially for some reduction cases, the things that I tried (logical ufuncs) slowed down so little that it just does not matter. Partially probably because reductions just have huge overhead, but basically I don't think it is a problem right now even if it is not ideal and could be refined...
The code in `resolve_implementation_info`. I wonder if there is a way we could write specific unit tests for it, and document what it is meant to do in pseudo-code (or a pure python implementation?).
* private for now (also it is very limited anyway).
* There is one "exception". NA aware dtypes cannot cast to bool
* (hopefully), so the `??->?` loop should error even with this flag.
* But a second NA fallback loop will be necessary.
Could you add to the comment "the `??->?` loop should error even with this flag" which test hits this code path?
@@ -267,8 +267,39 @@ resolve_implementation_info(PyUFuncObject *ufunc,
 * the subclass should be considered a better match
 * (subclasses are always more specific).
 */
/* Whether this (normally output) dtype was specified at all */
Is there a case where `op_dtypes[i]` is `NULL` for input? Maybe assert `i >= nin`?
Yeah, I am using it currently for reductions. But that is the only possibility, so I could assert that fairly constrained case: `i >= nin || (i == 1 && nin == 2 && nout == 1)`.
strcmp(ufunc->name, "logical_or") == 0 ||
strcmp(ufunc->name, "logical_and") == 0 ||
strcmp(ufunc->name, "logical_xor") == 0)) {
This seems fragile, but it will be refactored and removed at some point, correct?
Yes, although I admit "at some point" is possibly not very soon. Although, maybe these could be some of the first functions to be moved to the new API.
(The prime use-case would be moving `np.equal` to be full-featured enough that `==` needs no – or almost no – special cases.)
I think this is a nice cleanup. Besides
Thanks @seberg
Refactor reductions to use NEP 43 style dispatching/promotion

This also implements new promoters for the logical functions. The reasoning is that any logical function can always be handled by first casting to boolean without changing the meaning. This is true (unless the output is object), and already effectively used for `np.all`/`np.any`.

TODOs:

- `np.fmod.reduce([True, False])` used to fail, unlike the non-reduce version (just check briefly, probably OK).

OLD versions:
This is now a "work towards this" PR to show all the changes to refactor ufuncs (or most of them) to implement NEP 43 style dispatching. This includes reductions.

- The cache still grows indefinitely and does not do reference counting. Both seem OK to me for now.
- `ufunc.at` is not implemented here (but not very important).

Original notes:

I marked this as a work in progress; at this point it has some problems but works fine. Good chunks won't really be tested much yet, for the simple reason that we only have the old dtypes available and those do not have any special `isinstance()` checks or similar.

The long list of things that are still a bit shaky (but maybe not all that bad):

- `reductions`, `accumulation`, `ufunc.at` and `reduceat` do not use the new machinery. I will probably do that next. I will keep it this way to not blow up the diff even more.

In case someone will ask: The new code is typically faster, but if it has to fall back to legacy behaviour (mostly because there is value-based casting involved), it will be slower. The slowdown seems smaller or similar to the speed-up that 1.20.x has over 1.19.x due to my buffer changes, so I am NOT worried about it. Let's get rid of the unnecessary value-based complexity instead.