New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Expose ufunc.resolve_dtypes
and strided loop access
#22422
Conversation
8183edf
to
d25c71c
Compare
OK, this should be done now. Test coverage may not be perfect yet (and I may follow up) – but it is niche API (and besides The API proposed here is intentionally minimal (i.e. only |
e8f5f5c
to
c752aba
Compare
The API here designed is "future" in the sense that it implementes NEP 50 and exposes loop specializations as per NEP 43 which is barely used by NumPy itself at this point. Due to the fact that NEP 50 is not implemented (or rather finalized) this API is not ideal and creates dummy-arrays internally so that it is not much faster than if the user created dummy arrays to probe the correct result.
c752aba
to
9c82567
Compare
@seberg Many thanks for working on this functionality. I'll raise this PR with the Numba folks at the Numba triage meeting and/or public meeting tomorrow.
I think that this is fine as it's not clear yet if anything else is needed. Numba (which will consume this API) has to version code against NumPy versions already so the impact isn't huge. It'd obviously be good at some point to make the API stable, but it should be fine for the Numba use case in mind.
This seems reasonable and reflects what is happening internally. From a Numba perspective, the stride information is not directly part of the type system so probably cannot be used to work out a stride-specialised loop to run unless this is done at runtime along with a GIL acquisition penalty.
I'd hope for this to be the case! Thanks again! |
@stuartarchibald Any update? |
I've taken a closer look at the inner loop behaviours. I think from a Numba point of view there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve (and it probably indicates that it ought to generate its own implementation). I think the proof will be in the testing of this, I've opened numba/numba#8538 to track on the Numba side. Is there a preference with regards to providing feedback from consuming this API? Would it be more useful for the Numba folks to build out and test this patch ahead of merge, or for it to be reviewed as-is and then Numba folks test against |
Even if I intend things to be mostly underscored because it is not generally useful, I would be fine to risk that we need to deprecate and replace the two underscored functions again. My main concern for moving forward is the API of
Hmmm, note that the capsule needs to be held on to on a per-call level (since it holds some per-call data). I am not surprised if that requires new infrastructue in Numba, but would hope that it is not particularly difficult to make happen. |
From a Numba point of view, I think the main use will be the
Indeed, I think this is essentially another variant on closing over addresses and working out how to handle them being baked in at compile time. I am not sure it's actually going to be needed though as I seem to recall there being somewhere between zero and a few cases of Numba using NumPy's inner loops. This change is probably motivation to ensure that it is zero. |
*/ | ||
if (signature[0] == NULL && out == NULL) { | ||
/* | ||
* For integer types --- make sure at least a long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long
changes size depending on the platform, would it not be better to be explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this code is just moved. I.e. changing it is probably tied to thinking about changing the default integer type (although this path for sum
and prod
could be changed separately maybe)/
@@ -2802,7 +2836,7 @@ reducelike_promote_and_resolve(PyUFuncObject *ufunc, | |||
* (although this should possibly happen through a deprecation) | |||
*/ | |||
if (resolve_descriptors(3, ufunc, ufuncimpl, | |||
ops, out_descrs, signature, NPY_UNSAFE_CASTING) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here an below, you are making this a choice that can be made at a higher level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Right now reductions always use unsafe casting. They still do here, but I thought that resolve_dtypes
should maybe not.
The downside is that if someone now hardcodes resolve_dtypes(..., casting="unsafe", reduction=True)
and NumPy eventually changes the "same_kind"
, they also have to update.
Maybe I should do it casting=None
and if not given default to "unsafe" (our default) when reduction=True
? It did seem OK to support casting
for resolve_dtypes
in reductions, which is why this is needed/useful.
goto fail; | ||
} | ||
|
||
return ufuncimpl; | ||
|
||
fail: | ||
for (int i = 0; i < 3; ++i) { | ||
Py_DECREF(out_descrs[i]); | ||
Py_CLEAR(out_descrs[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because the input may be NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can't be NULL here, but I want to make sure they are NULL'ed for the caller to not clean up a second time. Py_CLEAR
is just one way to do it (I think you are right and it also does a NULL check).
@@ -3094,7 +3127,8 @@ PyUFunc_Accumulate(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out, | |||
|
|||
PyArray_Descr *descrs[3]; | |||
PyArrayMethodObject *ufuncimpl = reducelike_promote_and_resolve(ufunc, | |||
arr, out, signature, NPY_TRUE, descrs, "accumulate"); | |||
arr, out, signature, NPY_TRUE, descrs, NPY_UNSAFE_CASTING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of style, I'd tend to put the first argument on the second line as well. I'm curious as to what clang_format
does in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for a time I had the habit of keeping self
on its own (due to the idea of formatting errors like that).
I don't mind changing, but maybe not here since it didn't really change?
@@ -3511,7 +3545,8 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind, | |||
|
|||
PyArray_Descr *descrs[3]; | |||
PyArrayMethodObject *ufuncimpl = reducelike_promote_and_resolve(ufunc, | |||
arr, out, signature, NPY_TRUE, descrs, "reduceat"); | |||
arr, out, signature, NPY_TRUE, descrs, NPY_UNSAFE_CASTING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the choice of NPY_UNSAFE_CASTING
do here? Is it likely to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It keeps the current default. Yes, I think it would be nice to change eventually. IIRC the main reason for "unsafe" was probably that logic ufuncs needed it, but I have a different solution for them now.
npy_bool no_floatingpoint_errors; | ||
PyArrayMethod_Context _full_context; | ||
PyArray_Descr *_descrs[]; | ||
} ufunc_call_info; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a local structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is local and partially exposed via the capsule and documented in the function docstring.
I was considering putting the public part into experimental_dtype_api.h
, but it did not seem very useful it the API is expected to still change a bit with versions (and unlike experimental_dtype_api.h
I don't want to expect compiling against a specific NumPy version).
free_ufunc_call_info(PyObject *self) | ||
{ | ||
ufunc_call_info *call_info = PyCapsule_GetPointer( | ||
self, "numpy_1.24_ufunc_call_info"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a searchable comment help here? I ask because of the explicit versioning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind if this is not updated on every release, so long that it still warns that it might be updated on any release? Adding a comment, but not sure it is searchable...
PyObject *capsule = PyCapsule_New( | ||
call_info, "numpy_1.24_ufunc_call_info", &free_ufunc_call_info); | ||
if (capsule == NULL) { | ||
PyObject_Free(call_info); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the capsule will handle the free when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will call free_ufunc_call_info
once created, which includes this.
numpy/core/src/umath/ufunc_object.c
Outdated
context->descriptors[i] = operation_descrs[i]; | ||
} | ||
|
||
Py_SETREF(result, PyTuple_Pack(2, result, capsule)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, reusing result here feels a bit too clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good nose, not that it is much prettier now, but result
cannot be reused like that due to the "finish" goto. It was a bug.
Just completed the first pass through this large PR, I probably missed some things. |
Let's put this in. I suspect it isn't yet in final form, but this is a step on the way. |
Thanks Sebastian. |
I am happy to just put it in (will make a small pass today on the other comments). The main thing I would like to get right is the API of OTOH, its super new API that won't be used massively tomorrow, so if numba finds something we probably can get away with calling it a bug-fix. |
This patch numba/numba#8544 removes the last few places where Numba was reliant on NumPy's ufunc inner loops. It prevents the "baking in addresses from/lifetime of PyCapsule" problem in Numba noted above. Once merged, should be in a position to try this API out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given this a review with most attention paid to the documentation and the ufunc.resolve_dtypes()
API. Most comments are just a few "typo" level things. Thanks for working on this!
numpy/core/src/umath/ufunc_object.c
Outdated
if (fixed_strides_obj == Py_None) { | ||
for (int i = 0; i < ufunc->nargs; i++) { | ||
fixed_strides[i] = NPY_MAX_INTP; | ||
} | ||
} | ||
if (PyTuple_CheckExact(fixed_strides_obj) | ||
&& PyTuple_Size(fixed_strides_obj) == ufunc->nargs) { | ||
for (int i = 0; i < ufunc->nargs; i++) { | ||
PyObject *stride = PyTuple_GET_ITEM(fixed_strides_obj, i); | ||
if (PyLong_CheckExact(stride)) { | ||
fixed_strides[i] = PyLong_AsSsize_t(stride); | ||
if (error_converting(fixed_strides[i])) { | ||
return NULL; | ||
} | ||
} | ||
else if (stride == Py_None) { | ||
fixed_strides[i] = NPY_MAX_INTP; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be if...else if.... else error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wooops, also the outer if had it missing, added (and a test).
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
Also add tests (including for a bad capsule)
Reusing `result` doesn't work with a single "finish" goto, since result must be NULL on error then. This copies the result over for continuation, which is maybe also a bit awkward, but at least not buggy...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @seberg, I've taken a look and there's a couple more minor things to perhaps consider. Many thanks.
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
8cc7b29
to
c973fc9
Compare
Thanks @seberg |
The API here designed is "future" in the sense that it implementes NEP 50 and exposes loop specializations as per NEP 43 which is barely used by NumPy itself at this point.
Due to the fact that NEP 50 is not implemented (or rather finalized) this API is not ideal and creates dummy-arrays internally so that it is not much faster than if the user created dummy arrays to probe the correct result.
Marking as WIP, since it should get very basic tests and I also intended to expose reductions (which is subtly different and should not require the first argument as input to the type resolution).(Sorry, had to be off today mostly, so did not finish it as hoped but wanted to put this out here.)
I don't think I can guarantee fulls table ABI at this point (nicely). It would be possible, but we still need to evolve, so some version passing would be needed. Thus, the version is instead simply encoded in the PyCapsule and anyone adopting will have to update their code (mildly) when the version is changed.
@stuartarchibald this is the API that I can provide right now. It might seem clumsy to have two calls, but fixed strides may become more of thing in the future and other kwargs may also be relevant.
(NumPy cannot decide on fixed strides before knowing whether casts are necessary, so it has to be split into two calls internally, NEP 43 details the steps.)
resolve_dtypes
however may well widely useful beyond Numba.