ENH: Expose `ufunc.resolve_dtypes` and strided loop access #22422

seberg · 2022-10-10T17:19:58Z

The API designed here is "future" in the sense that it implements NEP 50 and exposes loop specializations as per NEP 43, which is barely used by NumPy itself at this point.

Because NEP 50 is not implemented (or rather, not finalized), this API is not ideal: it creates dummy arrays internally, so it is not much faster than if the user created dummy arrays to probe the correct result.

Marking as WIP, since it should get very basic tests and I also intended to expose reductions (which is subtly different and should not require the first argument as input to the type resolution).
(Sorry, had to be off today mostly, so did not finish it as hoped but wanted to put this out here.)

I don't think I can guarantee a full stable ABI at this point (nicely). It would be possible, but we still need to evolve, so some version passing would be needed. Thus, the version is instead simply encoded in the PyCapsule and anyone adopting will have to update their code (mildly) when the version is changed.

@stuartarchibald this is the API that I can provide right now. It might seem clumsy to have two calls, but fixed strides may become more of a thing in the future and other kwargs may also be relevant.
(NumPy cannot decide on fixed strides before knowing whether casts are necessary, so it has to be split into two calls internally, NEP 43 details the steps.)

resolve_dtypes, however, may well be widely useful beyond Numba.

seberg · 2022-10-11T10:38:54Z

OK, this should be done now. Test coverage may not be perfect yet (and I may follow up), but it is a niche API (and, apart from resolve_dtypes, clearly "take care when using this" anyway).

The API proposed here is intentionally minimal (i.e. only signature and not the more convenient dtype), because I thought it is good to expose a bit of how NumPy works internally.
Always passing a tuple (including out) may often be a bit inconvenient, so I could imagine tweaking that a little if others prefer that.

stuartarchibald · 2022-10-17T10:51:12Z

@seberg Many thanks for working on this functionality. I'll raise this PR with the Numba folks at the Numba triage meeting and/or public meeting tomorrow.

> I don't think I can guarantee a full stable ABI at this point (nicely). It would be possible, but we still need to evolve, so some version passing would be needed. Thus, the version is instead simply encoded in the PyCapsule and anyone adopting will have to update their code (mildly) when the version is changed.

I think that this is fine as it's not clear yet if anything else is needed. Numba (which will consume this API) has to version code against NumPy versions already so the impact isn't huge. It'd obviously be good at some point to make the API stable, but it should be fine for the Numba use case in mind.

> @stuartarchibald this is the API that I can provide right now. It might seem clumsy to have two calls, but fixed strides may become more of a thing in the future and other kwargs may also be relevant. (NumPy cannot decide on fixed strides before knowing whether casts are necessary, so it has to be split into two calls internally, NEP 43 details the steps.)

This seems reasonable and reflects what is happening internally. From a Numba perspective, the stride information is not directly part of the type system so probably cannot be used to work out a stride-specialised loop to run unless this is done at runtime along with a GIL acquisition penalty.

> resolve_dtypes, however, may well be widely useful beyond Numba.

I'd hope for this to be the case!

Thanks again!

charris · 2022-10-23T22:38:04Z

@stuartarchibald Any update?

stuartarchibald · 2022-10-25T09:48:20Z

> @stuartarchibald Any update?

I've taken a closer look at the inner loop behaviours. I think from a Numba point of view there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve (and it probably indicates that it ought to generate its own implementation).

I think the proof will be in the testing of this; I've opened numba/numba#8538 to track it on the Numba side. Is there a preference with regard to providing feedback from consuming this API? Would it be more useful for the Numba folks to build out and test this patch ahead of merge, or for it to be reviewed as-is and then Numba folks test against main and report issues (if any)?

seberg · 2022-10-25T10:21:06Z

Even if I intend things to be mostly underscored because it is not generally useful, I would be fine to risk that we need to deprecate and replace the two underscored functions again.

My main concern for moving forward is the API of resolve_dtypes(). Although the actual use of the other two functions is more interesting :).

> there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve

Hmmm, note that the capsule needs to be held on to on a per-call level (since it holds some per-call data). I am not surprised if that requires new infrastructure in Numba, but would hope that it is not particularly difficult to make happen.

stuartarchibald · 2022-10-25T10:39:19Z

> Even if I intend things to be mostly underscored because it is not generally useful, I would be fine to risk that we need to deprecate and replace the two underscored functions again.
>
> My main concern for moving forward is the API of resolve_dtypes(). Although the actual use of the other two functions is more interesting :).

From a Numba point of view, I think the main use will be the resolve_dtypes API. As noted above, if there's use of the "underscored" functions then Numba already versions code against NumPy versions so it would be acceptable. Worst case, Numba's current ufunc signature resolution is not ideal but is sufficient and will still be present.

> > there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve
>
> Hmmm, note that the capsule needs to be held on to on a per-call level (since it holds some per-call data). I am not surprised if that requires new infrastructure in Numba, but would hope that it is not particularly difficult to make happen.

Indeed, I think this is essentially another variant on closing over addresses and working out how to handle them being baked in at compile time. I am not sure it's actually going to be needed though as I seem to recall there being somewhere between zero and a few cases of Numba using NumPy's inner loops. This change is probably motivation to ensure that it is zero.

charris · 2022-10-25T18:04:43Z

numpy/core/src/umath/ufunc_object.c

+      */
+    if (signature[0] == NULL && out == NULL) {
+        /*
+         * For integer types --- make sure at least a long


long changes size depending on the platform, would it not be better to be explicit?

seberg · 2022-10-26

Yes, but this code was just moved. Changing it is probably tied to thinking about changing the default integer type (although this path for sum and prod could maybe be changed separately).

charris · 2022-10-25T18:07:19Z

numpy/core/src/umath/ufunc_object.c

@@ -2802,7 +2836,7 @@ reducelike_promote_and_resolve(PyUFuncObject *ufunc,
     * (although this should possibly happen through a deprecation)
     */
    if (resolve_descriptors(3, ufunc, ufuncimpl,
-            ops, out_descrs, signature, NPY_UNSAFE_CASTING) < 0) {


So here and below, you are making this a choice that can be made at a higher level?

seberg · 2022-10-26

Yeah. Right now reductions always use unsafe casting. They still do here, but I thought that resolve_dtypes should maybe not.
The downside is that if someone now hardcodes resolve_dtypes(..., casting="unsafe", reduction=True) and NumPy eventually changes the default to "same_kind", they also have to update.

Maybe I should make it casting=None and, if not given, default to "unsafe" (our default) when reduction=True? It did seem OK to support casting for resolve_dtypes in reductions, which is why this is needed/useful.

charris · 2022-10-25T18:10:55Z

numpy/core/src/umath/ufunc_object.c

        goto fail;
    }

    return ufuncimpl;

  fail:
    for (int i = 0; i < 3; ++i) {
-        Py_DECREF(out_descrs[i]);
+        Py_CLEAR(out_descrs[i]);


Is this because the input may be NULL?

seberg · 2022-10-26

They can't be NULL here, but I want to make sure they are NULL'ed so that the caller does not clean up a second time. Py_CLEAR is just one way to do it (I think you are right and it also does a NULL check).

charris · 2022-10-25T18:12:58Z

numpy/core/src/umath/ufunc_object.c

@@ -3094,7 +3127,8 @@ PyUFunc_Accumulate(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out,

    PyArray_Descr *descrs[3];
    PyArrayMethodObject *ufuncimpl = reducelike_promote_and_resolve(ufunc,
-            arr, out, signature, NPY_TRUE, descrs, "accumulate");
+            arr, out, signature, NPY_TRUE, descrs, NPY_UNSAFE_CASTING,


As a matter of style, I'd tend to put the first argument on the second line as well. I'm curious as to what clang_format does in this case.

seberg · 2022-10-27

Yeah, for a time I had the habit of keeping self on its own (due to the idea of formatting errors like that).
I don't mind changing, but maybe not here since it didn't really change?

charris · 2022-10-25T18:14:17Z

numpy/core/src/umath/ufunc_object.c

@@ -3511,7 +3545,8 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind,

    PyArray_Descr *descrs[3];
    PyArrayMethodObject *ufuncimpl = reducelike_promote_and_resolve(ufunc,
-            arr, out, signature, NPY_TRUE, descrs, "reduceat");
+            arr, out, signature, NPY_TRUE, descrs, NPY_UNSAFE_CASTING,


What does the choice of NPY_UNSAFE_CASTING do here? Is it likely to change?

seberg · 2022-10-26

It keeps the current default. Yes, I think it would be nice to change eventually. IIRC the main reason for "unsafe" was probably that logic ufuncs needed it, but I have a different solution for them now.

charris · 2022-10-25T18:15:31Z

numpy/core/src/umath/ufunc_object.c

+    npy_bool no_floatingpoint_errors;
+    PyArrayMethod_Context _full_context;
+    PyArray_Descr *_descrs[];
+} ufunc_call_info;


This is a local structure?

seberg · 2022-10-26

Yes, it is local, partially exposed via the capsule, and documented in the function docstring.
I was considering putting the public part into experimental_dtype_api.h, but it did not seem very useful if the API is expected to still change a bit with versions (and unlike experimental_dtype_api.h I don't want to expect compiling against a specific NumPy version).

charris · 2022-10-25T18:17:46Z

numpy/core/src/umath/ufunc_object.c

+free_ufunc_call_info(PyObject *self)
+{
+    ufunc_call_info *call_info = PyCapsule_GetPointer(
+            self, "numpy_1.24_ufunc_call_info");


Would a searchable comment help here? I ask because of the explicit versioning.

seberg · 2022-10-27

I don't mind if this is not updated on every release, so long as it still warns that it might be updated on any release? Adding a comment, but not sure it is searchable...
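
The effect of putting a version into the capsule name can be shown with a stand-in capsule, without touching NumPy at all. This is only a sketch: the names `example_1.24_call_info` and `example_1.25_call_info` and the integer payload are hypothetical, and it relies on `ctypes.pythonapi`, which is not available on every Python implementation.

```python
import ctypes

api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_IsValid.restype = ctypes.c_int
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical name. The capsule stores only a pointer to its name, so this
# bytes object has to stay alive for as long as the capsule does.
PRODUCER_NAME = b"example_1.24_call_info"
payload = ctypes.c_int(42)  # stand-in for the real per-call struct

# No destructor is registered here (third argument is NULL).
capsule = api.PyCapsule_New(ctypes.addressof(payload), PRODUCER_NAME, None)


def accepts(capsule, expected_name):
    """Return True if the capsule carries exactly the expected name."""
    return bool(api.PyCapsule_IsValid(capsule, expected_name))


print(accepts(capsule, b"example_1.24_call_info"))  # True: names match
print(accepts(capsule, b"example_1.25_call_info"))  # False: name was bumped
```

A consumer that hardcodes the old name therefore fails cleanly at lookup time after a rename, instead of reading a struct whose layout may have changed.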

charris · 2022-10-25T18:26:52Z

numpy/core/src/umath/ufunc_object.c

+    PyObject *capsule = PyCapsule_New(
+            call_info, "numpy_1.24_ufunc_call_info", &free_ufunc_call_info);
+    if (capsule == NULL) {
+        PyObject_Free(call_info);


I assume the capsule will handle the free when needed.

seberg · 2022-10-27

Yes, it will call free_ufunc_call_info once created, which includes this.

charris · 2022-10-25T18:31:59Z

numpy/core/src/umath/ufunc_object.c

+        context->descriptors[i] = operation_descrs[i];
+    }
+
+    Py_SETREF(result, PyTuple_Pack(2, result, capsule));


Hmm, reusing result here feels a bit too clever.

seberg · 2022-10-27

Good nose, not that it is much prettier now, but result cannot be reused like that due to the "finish" goto. It was a bug.

charris · 2022-10-25T19:01:06Z

Just completed the first pass through this large PR; I probably missed some things.

charris · 2022-10-26T23:16:18Z

Let's put this in. I suspect it isn't yet in final form, but this is a step on the way.

charris · 2022-10-26T23:16:33Z

Thanks Sebastian.

seberg · 2022-10-27T07:20:55Z

I am happy to just put it in (will make a small pass today on the other comments). The main thing I would like to get right is the API of ufunc.resolve_dtypes().

OTOH, it's a super new API that won't be used massively tomorrow, so if Numba finds something we can probably get away with calling it a bug-fix.

stuartarchibald · 2022-10-27T11:12:42Z

> I am happy to just put it in (will make a small pass today on the other comments). The main thing I would like to get right is the API of ufunc.resolve_dtypes().
>
> OTOH, it's a super new API that won't be used massively tomorrow, so if Numba finds something we can probably get away with calling it a bug-fix.

This patch numba/numba#8544 removes the last few places where Numba was reliant on NumPy's ufunc inner loops. It prevents the "baking in addresses from/lifetime of PyCapsule" problem in Numba noted above. Once merged, we should be in a position to try this API out.

stuartarchibald

I've given this a review with most attention paid to the documentation and the ufunc.resolve_dtypes() API. Most comments are just a few "typo" level things. Thanks for working on this!

numpy/core/_add_newdocs.py

numpy/core/src/umath/ufunc_object.c

stuartarchibald · 2022-10-27T11:05:54Z

numpy/core/src/umath/ufunc_object.c

+    if (fixed_strides_obj == Py_None) {
+        for (int i = 0; i < ufunc->nargs; i++) {
+            fixed_strides[i] = NPY_MAX_INTP;
+        }
+    }
+    if (PyTuple_CheckExact(fixed_strides_obj)
+            && PyTuple_Size(fixed_strides_obj) == ufunc->nargs) {
+        for (int i = 0; i < ufunc->nargs; i++) {
+            PyObject *stride = PyTuple_GET_ITEM(fixed_strides_obj, i);
+            if (PyLong_CheckExact(stride)) {
+                fixed_strides[i] = PyLong_AsSsize_t(stride);
+                if (error_converting(fixed_strides[i])) {
+                    return NULL;
+                }
+            }
+            else if (stride == Py_None) {
+                fixed_strides[i] = NPY_MAX_INTP;
+            }
+        }
+    }


Should this be `if ... else if ... else error`?

seberg · 2022-10-27

Whoops, the outer if was also missing it. Added (and a test).
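
The control flow being asked for can be sketched in Python. This is not NumPy's C code, only an illustration of the `if ... else if ... else error` shape at both levels; the function name `parse_fixed_strides`, the use of `sys.maxsize` as a stand-in for `NPY_MAX_INTP`, and the choice of `TypeError` are assumptions made for the example.

```python
import sys

UNKNOWN_STRIDE = sys.maxsize  # stand-in for the NPY_MAX_INTP sentinel


def parse_fixed_strides(fixed_strides, nargs):
    """Return one stride per operand, or raise TypeError on malformed input."""
    if fixed_strides is None:
        # No information given: every stride is unknown.
        return [UNKNOWN_STRIDE] * nargs
    elif type(fixed_strides) is tuple and len(fixed_strides) == nargs:
        parsed = []
        for stride in fixed_strides:
            if type(stride) is int:
                parsed.append(stride)
            elif stride is None:
                parsed.append(UNKNOWN_STRIDE)
            else:
                # Inner "else error": reject anything but an exact int or None.
                raise TypeError("each fixed stride must be an int or None")
        return parsed
    else:
        # Outer "else error": neither None nor a tuple of the right length.
        raise TypeError(
            f"fixed_strides must be None or a tuple of length {nargs}")


print(parse_fixed_strides(None, 2))
print(parse_fixed_strides((8, None), 2))
```

Without the two `else` branches, a wrong-length tuple or a non-integer entry would fall through silently and leave some strides unset, which is the gap the review comment points at.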

stuartarchibald

Thanks for the updates @seberg, I've taken a look and there's a couple more minor things to perhaps consider. Many thanks.

numpy/core/src/umath/ufunc_object.c

numpy/core/tests/test_ufunc.py

mattip · 2022-11-16T16:19:20Z

Thanks @seberg

github-actions bot added the 01 - Enhancement label Oct 10, 2022

seberg force-pushed the expose-dtype-resolution-get-loop branch 2 times, most recently from 8183edf to d25c71c Compare October 11, 2022 08:53

seberg marked this pull request as ready for review October 11, 2022 10:32

stuartarchibald mentioned this pull request Oct 11, 2022

fmin and fmax incorrect output dtype numba/numba#8478

Open

2 tasks

seberg force-pushed the expose-dtype-resolution-get-loop branch from e8f5f5c to c752aba Compare October 11, 2022 12:19

seberg added 6 commits October 12, 2022 10:05

TST: Add basic tests for lowlevel access (including direct loop call)

5304031

MAINT: Move add/multiple reduction special case into promotion helper

ba69ac6

ENH: Allow reductions in np.add.resolve_dtypes

2179be2

BUG: ufunc.resolve_dtypes expects descrs to be valid even on error

f6f73c0

TST: Skip ufunc loop access if ctypes.pythonapi is unavailable

9c82567

seberg force-pushed the expose-dtype-resolution-get-loop branch from c752aba to 9c82567 Compare October 12, 2022 08:07

DOC: Add examples for ufunc.resolve_dtypes

36d7538

stuartarchibald mentioned this pull request Oct 25, 2022

Update ufunc loop signature resolution to use NumPy #22422 numba/numba#8538

Open

charris reviewed Oct 25, 2022

stuartarchibald reviewed Oct 27, 2022

seberg and others added 4 commits October 27, 2022 16:01

DOC: Apply Stuarts suggestions from code review

7a1f194

Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>

BUG: Fix error checking of _get_strided_Loop fixed_strides

1c64f9b

Also add tests (including for a bad capsule)

BUG: Fix _resolve_dtypes_and_context refcounting error returns

64b2d5a

Reusing `result` doesn't work with a single "finish" goto, since result must be NULL on error then. This copies the result over for continuation, which is maybe also a bit awkward, but at least not buggy...

DOC: Add comment for capsule which includes name (mainly to point to …

e18a236

…docs)

stuartarchibald reviewed Oct 27, 2022

numpy/core/src/umath/ufunc_object.c (resolved)

numpy/core/src/umath/ufunc_object.c (resolved)

numpy/core/tests/test_ufunc.py (resolved)

numpy/core/tests/test_ufunc.py (outdated, resolved)

stuartarchibald reviewed Oct 27, 2022

numpy/core/tests/test_ufunc.py (outdated, resolved)

MAINT: Adopt changes from Stuart's review

c973fc9

Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>

seberg force-pushed the expose-dtype-resolution-get-loop branch from 8cc7b29 to c973fc9 Compare October 27, 2022 16:47

seberg added this to the 1.24.0 release milestone Nov 16, 2022

mattip merged commit d428d45 into numpy:main Nov 16, 2022

seberg deleted the expose-dtype-resolution-get-loop branch November 16, 2022 19:23

stuartarchibald mentioned this pull request Nov 21, 2022

NumPy 1.24 support numba/numba#8464

Closed

6 tasks
