Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Expose ufunc.resolve_dtypes and strided loop access #22422

Merged
merged 12 commits into from Nov 16, 2022

Conversation

seberg
Copy link
Member

@seberg seberg commented Oct 10, 2022

The API here designed is "future" in the sense that it implementes NEP 50 and exposes loop specializations as per NEP 43 which is barely used by NumPy itself at this point.

Due to the fact that NEP 50 is not implemented (or rather finalized) this API is not ideal and creates dummy-arrays internally so that it is not much faster than if the user created dummy arrays to probe the correct result.


Marking as WIP, since it should get very basic tests and I also intended to expose reductions (which is subtly different and should not require the first argument as input to the type resolution).
(Sorry, had to be off today mostly, so did not finish it as hoped but wanted to put this out here.)

I don't think I can guarantee fulls table ABI at this point (nicely). It would be possible, but we still need to evolve, so some version passing would be needed. Thus, the version is instead simply encoded in the PyCapsule and anyone adopting will have to update their code (mildly) when the version is changed.

@stuartarchibald this is the API that I can provide right now. It might seem clumsy to have two calls, but fixed strides may become more of thing in the future and other kwargs may also be relevant.
(NumPy cannot decide on fixed strides before knowing whether casts are necessary, so it has to be split into two calls internally, NEP 43 details the steps.)

resolve_dtypes however may well widely useful beyond Numba.

@seberg seberg force-pushed the expose-dtype-resolution-get-loop branch 2 times, most recently from 8183edf to d25c71c Compare October 11, 2022 08:53
@seberg seberg marked this pull request as ready for review October 11, 2022 10:32
@seberg
Copy link
Member Author

seberg commented Oct 11, 2022

OK, this should be done now. Test coverage may not be perfect yet (and I may follow up) – but it is niche API (and besides resolve_dtypes clearly "take care when using this" anyway).

The API proposed here is intentionally minimal (i.e. only signature and not the more convenient dtype), because I thought it is good to expose a bit of how NumPy works internally.
Passing always a tuple (including out) may be a bit inconvenient often, so I could imagine tweaking that a little, though, if others prefer that.

The API here designed is "future" in the sense that it implementes
NEP 50 and exposes loop specializations as per NEP 43 which is barely
used by NumPy itself at this point.

Due to the fact that NEP 50 is not implemented (or rather finalized)
this API is not ideal and creates dummy-arrays internally so that
it is not much faster than if the user created dummy arrays to
probe the correct result.
@seberg seberg force-pushed the expose-dtype-resolution-get-loop branch from c752aba to 9c82567 Compare October 12, 2022 08:07
@stuartarchibald
Copy link
Contributor

@seberg Many thanks for working on this functionality. I'll raise this PR with the Numba folks at the Numba triage meeting and/or public meeting tomorrow.

I don't think I can guarantee fulls table ABI at this point (nicely). It would be possible, but we still need to evolve, so some version passing would be needed. Thus, the version is instead simply encoded in the PyCapsule and anyone adopting will have to update their code (mildly) when the version is changed.

I think that this is fine as it's not clear yet if anything else is needed. Numba (which will consume this API) has to version code against NumPy versions already so the impact isn't huge. It'd obviously be good at some point to make the API stable, but it should be fine for the Numba use case in mind.

@stuartarchibald this is the API that I can provide right now. It might seem clumsy to have two calls, but fixed strides may become more of thing in the future and other kwargs may also be relevant. (NumPy cannot decide on fixed strides before knowing whether casts are necessary, so it has to be split into two calls internally, NEP 43 details the steps.)

This seems reasonable and reflects what is happening internally. From a Numba perspective, the stride information is not directly part of the type system so probably cannot be used to work out a stride-specialised loop to run unless this is done at runtime along with a GIL acquisition penalty.

resolve_dtypes however may well widely useful beyond Numba.

I'd hope for this to be the case!

Thanks again!

@charris
Copy link
Member

charris commented Oct 23, 2022

@stuartarchibald Any update?

@stuartarchibald
Copy link
Contributor

@stuartarchibald Any update?

I've taken a closer look at the inner loop behaviours. I think from a Numba point of view there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve (and it probably indicates that it ought to generate its own implementation).

I think the proof will be in the testing of this, I've opened numba/numba#8538 to track on the Numba side. Is there a preference with regards to providing feedback from consuming this API? Would it be more useful for the Numba folks to build out and test this patch ahead of merge, or for it to be reviewed as-is and then Numba folks test against main and report issues (if any)?

@seberg
Copy link
Member Author

seberg commented Oct 25, 2022

Even if I intend things to be mostly underscored because it is not generally useful, I would be fine to risk that we need to deprecate and replace the two underscored functions again.

My main concern for moving forward is the API of resolve_dtypes(). Although the actual use of the other two functions is more interesting :).

there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve

Hmmm, note that the capsule needs to be held on to on a per-call level (since it holds some per-call data). I am not surprised if that requires new infrastructue in Numba, but would hope that it is not particularly difficult to make happen.

@stuartarchibald
Copy link
Contributor

Even if I intend things to be mostly underscored because it is not generally useful, I would be fine to risk that we need to deprecate and replace the two underscored functions again.

My main concern for moving forward is the API of resolve_dtypes(). Although the actual use of the other two functions is more interesting :).

From a Numba point of view, I think the main use will be the resolves_dtypes API. As noted above, if there's use of the "underscored" functions then Numba already versions code against NumPy versions so it would be acceptable. Worse case, Numba's current ufunc signature resolution is not ideal but is sufficient and will still be present.

there's going to be issues with baking in addresses from PyCapsules if that becomes a necessity, but that's really an issue for Numba to resolve

Hmmm, note that the capsule needs to be held on to on a per-call level (since it holds some per-call data). I am not surprised if that requires new infrastructue in Numba, but would hope that it is not particularly difficult to make happen.

Indeed, I think this is essentially another variant on closing over addresses and working out how to handle them being baked in at compile time. I am not sure it's actually going to be needed though as I seem to recall there being somewhere between zero and a few cases of Numba using NumPy's inner loops. This change is probably motivation to ensure that it is zero.

*/
if (signature[0] == NULL && out == NULL) {
/*
* For integer types --- make sure at least a long
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long changes size depending on the platform, would it not be better to be explicit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this code is just moved. I.e. changing it is probably tied to thinking about changing the default integer type (although this path for sum and prod could be changed separately maybe)/

@@ -2802,7 +2836,7 @@ reducelike_promote_and_resolve(PyUFuncObject *ufunc,
* (although this should possibly happen through a deprecation)
*/
if (resolve_descriptors(3, ufunc, ufuncimpl,
ops, out_descrs, signature, NPY_UNSAFE_CASTING) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here an below, you are making this a choice that can be made at a higher level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Right now reductions always use unsafe casting. They still do here, but I thought that resolve_dtypes should maybe not.
The downside is that if someone now hardcodes resolve_dtypes(..., casting="unsafe", reduction=True) and NumPy eventually changes the "same_kind", they also have to update.

Maybe I should do it casting=None and if not given default to "unsafe" (our default) when reduction=True? It did seem OK to support casting for resolve_dtypes in reductions, which is why this is needed/useful.

goto fail;
}

return ufuncimpl;

fail:
for (int i = 0; i < 3; ++i) {
Py_DECREF(out_descrs[i]);
Py_CLEAR(out_descrs[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because the input may be NULL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can't be NULL here, but I want to make sure they are NULL'ed for the caller to not clean up a second time. Py_CLEAR is just one way to do it (I think you are right and it also does a NULL check).

@@ -3094,7 +3127,8 @@ PyUFunc_Accumulate(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out,

PyArray_Descr *descrs[3];
PyArrayMethodObject *ufuncimpl = reducelike_promote_and_resolve(ufunc,
arr, out, signature, NPY_TRUE, descrs, "accumulate");
arr, out, signature, NPY_TRUE, descrs, NPY_UNSAFE_CASTING,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of style, I'd tend to put the first argument on the second line as well. I'm curious as to what clang_format does in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for a time I had the habit of keeping self on its own (due to the idea of formatting errors like that).
I don't mind changing, but maybe not here since it didn't really change?

@@ -3511,7 +3545,8 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind,

PyArray_Descr *descrs[3];
PyArrayMethodObject *ufuncimpl = reducelike_promote_and_resolve(ufunc,
arr, out, signature, NPY_TRUE, descrs, "reduceat");
arr, out, signature, NPY_TRUE, descrs, NPY_UNSAFE_CASTING,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the choice of NPY_UNSAFE_CASTING do here? Is it likely to change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It keeps the current default. Yes, I think it would be nice to change eventually. IIRC the main reason for "unsafe" was probably that logic ufuncs needed it, but I have a different solution for them now.

npy_bool no_floatingpoint_errors;
PyArrayMethod_Context _full_context;
PyArray_Descr *_descrs[];
} ufunc_call_info;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a local structure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is local and partially exposed via the capsule and documented in the function docstring.
I was considering putting the public part into experimental_dtype_api.h, but it did not seem very useful it the API is expected to still change a bit with versions (and unlike experimental_dtype_api.h I don't want to expect compiling against a specific NumPy version).

free_ufunc_call_info(PyObject *self)
{
ufunc_call_info *call_info = PyCapsule_GetPointer(
self, "numpy_1.24_ufunc_call_info");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a searchable comment help here? I ask because of the explicit versioning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if this is not updated on every release, so long that it still warns that it might be updated on any release? Adding a comment, but not sure it is searchable...

PyObject *capsule = PyCapsule_New(
call_info, "numpy_1.24_ufunc_call_info", &free_ufunc_call_info);
if (capsule == NULL) {
PyObject_Free(call_info);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the capsule will handle the free when needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will call free_ufunc_call_info once created, which includes this.

context->descriptors[i] = operation_descrs[i];
}

Py_SETREF(result, PyTuple_Pack(2, result, capsule));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, reusing result here feels a bit too clever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good nose, not that it is much prettier now, but result cannot be reused like that due to the "finish" goto. It was a bug.

@charris
Copy link
Member

charris commented Oct 25, 2022

Just completed the first pass through this large PR, I probably missed some things.

@charris
Copy link
Member

charris commented Oct 26, 2022

Let's put this in. I suspect it isn't yet in final form, but this is a step on the way.

@charris
Copy link
Member

charris commented Oct 26, 2022

Thanks Sebastian.

@seberg
Copy link
Member Author

seberg commented Oct 27, 2022

I am happy to just put it in (will make a small pass today on the other comments). The main thing I would like to get right is the API of ufunc.resolve_dtypes().

OTOH, its super new API that won't be used massively tomorrow, so if numba finds something we probably can get away with calling it a bug-fix.

@stuartarchibald
Copy link
Contributor

I am happy to just put it in (will make a small pass today on the other comments). The main thing I would like to get right is the API of ufunc.resolve_dtypes().

OTOH, its super new API that won't be used massively tomorrow, so if numba finds something we probably can get away with calling it a bug-fix.

This patch numba/numba#8544 removes the last few places where Numba was reliant on NumPy's ufunc inner loops. It prevents the "baking in addresses from/lifetime of PyCapsule" problem in Numba noted above. Once merged, should be in a position to try this API out.

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this a review with most attention paid to the documentation and the ufunc.resolve_dtypes() API. Most comments are just a few "typo" level things. Thanks for working on this!

numpy/core/_add_newdocs.py Outdated Show resolved Hide resolved
numpy/core/_add_newdocs.py Show resolved Hide resolved
numpy/core/_add_newdocs.py Outdated Show resolved Hide resolved
numpy/core/_add_newdocs.py Outdated Show resolved Hide resolved
numpy/core/src/umath/ufunc_object.c Outdated Show resolved Hide resolved
numpy/core/src/umath/ufunc_object.c Outdated Show resolved Hide resolved
Comment on lines 6634 to 6653
if (fixed_strides_obj == Py_None) {
for (int i = 0; i < ufunc->nargs; i++) {
fixed_strides[i] = NPY_MAX_INTP;
}
}
if (PyTuple_CheckExact(fixed_strides_obj)
&& PyTuple_Size(fixed_strides_obj) == ufunc->nargs) {
for (int i = 0; i < ufunc->nargs; i++) {
PyObject *stride = PyTuple_GET_ITEM(fixed_strides_obj, i);
if (PyLong_CheckExact(stride)) {
fixed_strides[i] = PyLong_AsSsize_t(stride);
if (error_converting(fixed_strides[i])) {
return NULL;
}
}
else if (stride == Py_None) {
fixed_strides[i] = NPY_MAX_INTP;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be if...else if.... else error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wooops, also the outer if had it missing, added (and a test).

seberg and others added 4 commits October 27, 2022 16:01
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
Also add tests (including for a bad capsule)
Reusing `result` doesn't work with a single "finish" goto, since
result must be NULL on error then.
This copies the result over for continuation, which is maybe also a bit
awkward, but at least not buggy...
Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @seberg, I've taken a look and there's a couple more minor things to perhaps consider. Many thanks.

numpy/core/src/umath/ufunc_object.c Show resolved Hide resolved
numpy/core/src/umath/ufunc_object.c Show resolved Hide resolved
numpy/core/tests/test_ufunc.py Show resolved Hide resolved
numpy/core/tests/test_ufunc.py Outdated Show resolved Hide resolved
Co-authored-by: stuartarchibald <stuartarchibald@users.noreply.github.com>
@seberg seberg force-pushed the expose-dtype-resolution-get-loop branch from 8cc7b29 to c973fc9 Compare October 27, 2022 16:47
@seberg seberg added this to the 1.24.0 release milestone Nov 16, 2022
@mattip mattip merged commit d428d45 into numpy:main Nov 16, 2022
@mattip
Copy link
Member

mattip commented Nov 16, 2022

Thanks @seberg

@seberg seberg deleted the expose-dtype-resolution-get-loop branch November 16, 2022 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants