fix: implement explicit translation for NEP-18 #2089
Thanks for defining this translation layer. You mention performance from indirection, but that ought to be pretty minimal, while the benefits of a translation layer are pretty significant.
We'd like to synchronize the Awkward arguments and the NumPy arguments as much as possible, though, because we want to minimize user surprise. For instance, NumPy's `kind` vs Awkward's `stable` for `ak.sort` should eventually match NumPy's arguments better, though for sorting, I was under the impression that their arguments are in flux. (They don't want to explicitly say what algorithm they're using, such as "heapsort".)
In this PR, it looks like you added a translation layer for every NEP-18 overload, not just the ones that have different arguments. No, I don't see any examples that have exactly the same arguments (and maybe we'd always have `highlevel` and `behavior`, while NumPy would never have these two arguments).
Yes, I like to make it clear that this does have an impact, but only if users are doing things in a loop. It's more of a "take note" rather than "this is a regression".
Definitely (in the long run). It should be a goal that we avoid un-motivated divergences. Sorting might be an aberration here, as you point out.
The idea with these translation functions is that we're not directly coupled to NumPy's interface, which may have legacy arguments or other features that we don't support. If we do this for all functions, we'll keep some symmetry on our end. I think you're right that we should use this predominantly for compatibility rather than for needlessly deviating from the NumPy API.
Yes, and the main culprit here is
Should we make a change here (in deprecation cycles)?
It doesn't have to be done now. Eventually, I'd like to replace the sorting kernels, which currently defer to C++ sorting algorithms, with something that would be more like what we'll need to do in CUDA, but CUDA sorting is something that will require extra thought in itself.
This PR closes #2088 by defining explicit NEP-18 translations ("implementations"). These map the NumPy argument spec onto the Awkward function, and translate incompatible arguments (e.g. NumPy's `kind` vs Awkward's `stable` for `ak.sort`).
The TLDR of the motivation for this PR is that some NumPy functions like `np.std` differ incompatibly from our implementations, e.g. `ak.std`. By explicitly defining a translation, we can decouple the two APIs.
This adds a slight runtime cost; now a call to `np.sort()` jumps through two additional functions. However, this is not an area of performance that we should care about (and CPython 3.11 speeds function calls up slightly, so it's a solved problem /s).
We could also use this mechanism to define the translations for non-Awkward overloaded NumPy functions, which just need better translations than our default heuristic-based approach provides.
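For readers who want a concrete picture of what an explicit translation could look like, here is a minimal sketch. It is not the code in this PR: `implements`, `_translations`, `dispatch`, `ak_sort`, and `_nep_18_sort` are hypothetical names, `ak_sort` is a plain-list stand-in for `ak.sort`, and the mapping from `kind` to `stable` is an assumption made for illustration. To stay runnable without NumPy or Awkward installed, translations are keyed on a function name rather than on the real NumPy function object.

```python
# Hypothetical sketch; none of these names are Awkward Array's actual internals.

_translations = {}  # maps a NumPy function name to its explicit translation


def implements(numpy_name):
    """Register an explicit translation for the named NumPy function."""
    def decorator(func):
        _translations[numpy_name] = func
        return func
    return decorator


def ak_sort(array, axis=-1, *, ascending=True, stable=True):
    """Stand-in for ak.sort that only handles a flat Python list."""
    # sorted() is always stable, so `stable` has no effect in this toy version.
    return sorted(array, reverse=not ascending)


@implements("sort")
def _nep_18_sort(a, axis=-1, kind=None, order=None):
    """Accept NumPy's argument spec for sort and translate it for ak_sort."""
    if order is not None:
        raise NotImplementedError("'order' is not supported in this sketch")
    # Assumption: NumPy's `kind` names an algorithm, while the Awkward-style
    # function only needs to know whether the sort must be stable.
    stable = kind in ("stable", "mergesort")
    return ak_sort(a, axis=axis, stable=stable)


def dispatch(numpy_name, args, kwargs):
    """Roughly what an __array_function__ hook would do with the registry."""
    translation = _translations.get(numpy_name)
    if translation is None:
        return NotImplemented
    return translation(*args, **kwargs)


result = dispatch("sort", ([3, 1, 2],), {"kind": "stable"})
print(result)
```

The two extra hops described above are visible here: `dispatch` looks up the translation, and the translation calls the Awkward-style function. Because the translation owns NumPy's signature, arguments such as `order` can be rejected explicitly instead of leaking into the Awkward API.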