Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: combine string ufuncs by passing on auxilliary data #25796

Merged
merged 7 commits into from
Feb 15, 2024

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Feb 8, 2024

This is just a proof of concept to show shows how one can reduce the code by using loops for multiple functions. trying maximum and minimum EDIT: now all useful functions done.

Right now, it is rather inelegant since I do not know how to pass on arbitrary data in the new array method framework... (@seberg?), so instead I base it on the name of the function. That said, at least that name was immediately useful! EDIT: it changes the array method so that it allows passing on aux data; this should really be a different PR!

cc @ngoldbaum

p.s. Shaves 1500 55000 bytes off the (enormous) _multiarray_umath.cpython-311-x86_64-linux-gnu.so, though that is with debug symbols, etc.

@seberg
Copy link
Member

seberg commented Feb 9, 2024

Bytes, hehe :). So the thing is the method object is completely opaque right now (which could hold static data). The other options would be to expand context with a static data slot that can be set, I suppose.
Somehow, I didn't think static data was all that important, but folding it into auxdata isn't that much easier of course. But for this type of tihng it is useful (i.e. you don't care about the singleton function call overhead at all).

@mhvk
Copy link
Contributor Author

mhvk commented Feb 9, 2024

the method object is completely opaque

But how would I set the method inside, say, init_ufunc?

Also, for these purposes, it really should be extra data that goes with the specific loop, which makes auxdata a bit more logical. Could be part of the spec, I guess (which is effectively what I'm doing by using the name).

Note that in the FFT additions, I similarly made use of auxdata to avoid creating duplicate pieces of code...

@seberg
Copy link
Member

seberg commented Feb 9, 2024

So, we could add it to the self as a slot and create accessor (or make it public to the extend needed for that). It is currently opaque (so even if you add it, the user can't get it).

Of course to some degree you could also prefer tagging it on to the ufunc. But than you need to extend the way the ufuncs store loops now (since ufuncs assume evertying is in the ArrMethod object now).

@mhvk
Copy link
Contributor Author

mhvk commented Feb 9, 2024

Not sure what is the best approach, but you are right that for this use case, something associated with the ufunc would be good enough, maybe like a "default auxdata" (right now, I see that in npy_default_get_strided_loop, the data just gets set to NULL). Though the loop-specific case is what would be needed for, say, np.exp using a generic inner loop that just calls a given function on each element (where the function likely is specific to the input dtype). I feel an ability to set auxdata as part of adding a given loop would be the most generally useful, i.e., it feels to me easiest would be to add a user_data entry to PyArrayMethod_Spec.

I saw that you had a note in _dtype_api.h that perhaps is relevant:

* TODO: Due to the fact that `resolve_descriptors` is also used for `can_cast`
* there is no way to "pass out" the result of this function. This means
* it will be called twice for every ufunc call.
* (I am considering including `auxdata` as an "optional" parameter to
* `resolve_descriptors`, so that it can be filled there if not NULL.)

This refers to filling it if not NULL, but how in the present scheme can auxdata/transferdata/loop_data be anything but NULL?

@seberg
Copy link
Member

seberg commented Feb 9, 2024

You can set auxdata just fine, it is done in the get_loop function. And auxdata is in fact what is used internally to pass the static void * into legacy ufuncs.

I.e. you can do:

struct {
   NpyAuxdata base;
   void *func;
} static_auxdata_struct;

/* Do nothing free, it's static */
static void
no_free(NpyAuxData *aux) {}

static_auxdata_struct min_auxdata = {
    .base={
        .free = no_free;
        // Don't bother about clone for now
    }
    func = &string_min;
}

get_loop(...) {
   if (ufunc == ...) {
       out_auxdata = min_auxdata;
   }
}

Which would mean you move the ufunc inspection into get_loop, but that's about it.

@mhvk
Copy link
Contributor Author

mhvk commented Feb 9, 2024

That's possible, I guess, but feels wrong; get_loop really should not have to worry about what ufunc it is attached to; at most, it should move some bits of data from its context to auxdata (I guess that is also possible, by defining one's own PyArrayMethod just like is done for the legacy ufuncs...).

Note that making this a bit more general may be useful: for strings, the duplication of inner loops which essentially do exactly the same thing just with a different function was my main complaint (see two-but-last to-do item in gh-25693). It would really help to make this something declarative (as it was for the the legacy ufuncs). Since I think this is one of the first in-depth uses of the new array method mechanism, maybe it is still a good time to address?

@seberg
Copy link
Member

seberg commented Feb 9, 2024

I am very happy to add a slot to the ArrayMethod creation that allows adding static data. And probably exposing it through self. Internally, you could just do that. Making it public (i.e. exposing some of the ArrayMethod struct), is also totally fine for obvious stuff (like this one).

Attaching things to the ufunc is plausible, but right now the ufunc has a map of dtypes -> arraymethod, and making that dtypes -> (arraymethod, void *) is a bit meeh in my eyes.

In other words, I am very happy if we add it to the ArrayMethod and also if we make it public. We could pass it as part of context rather than context->meth->static_data, but attaching it the ufunc (which at least "feels" leaner) seems tedious.

@mhvk
Copy link
Contributor Author

mhvk commented Feb 9, 2024

Yes, agreed, attached to the ArrayMethod makes most sense.

Looking at https://docs.python.org/3/c-api/type.html#c.PyType_Slot, typically a slot contains a function, so do you mean adding a slot for a function that provides the data? Or a slot that provides the data directly? (I think you imply the function, but just in case). And would you then use that slot (if defined), in npy_default_get_strided_loop (which is where currently, out_transferdata is set to NULL)?

@seberg
Copy link
Member

seberg commented Feb 9, 2024

Or a slot that provides the data directly?

In almost all cases, I like functions, but I was meaning a void *. A function might make sense to me if and only if:

  • You use it to fill a new slot in context, and
  • the function gets some input.

And I think at least the second just won't be the case (or isn't worthwhile).

@mhvk
Copy link
Contributor Author

mhvk commented Feb 9, 2024

OK, so closer to the existing auxdata then!

@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch from e051304 to 0373679 Compare February 9, 2024 21:27
*out_transferdata = NULL;
}
else {
static_auxdata_struct *transferdata = (static_auxdata_struct *)malloc(sizeof(static_auxdata_struct));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - I followed your example above, but defining a .clone might actually be nicer. But, really, I'm just messing around...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - I now realize there is really no point creating this transferdata here since in the loop I can just access context->method->aux_data. Is that what you had in mind? If so, I'll remove these changes in array_method.*...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right we shouldn't build an AuxData struct at all of course, and this second iteration is what what I had in mind. Either context->method->const_data or context->const_data, and if all we do is copy it, the first seems just as well.

(Moving it to the context could be done to make it a true scratch space, but we can also just add scratch space to context for that.)

I think we should rename aux_data, though. Since there already is the other auxdata and it has a per ufunc run lifetime! I can see static_data, const_data or even using context/ctx again.

One thing I was mentioning (but let's defer that) is that I don't want the Method objects struct fully public, so on the public side it would make sense to move this high up, so we can decide to make parts public (I think it might be fully opaque right now).

@@ -482,20 +427,24 @@ minimum_strided_loop(PyArrayMethod_Context *context, char *const data[],
const npy_packed_static_string *sin1 = (npy_packed_static_string *)in1;
const npy_packed_static_string *sin2 = (npy_packed_static_string *)in2;
npy_packed_static_string *sout = (npy_packed_static_string *)out;
if (_compare(in1, in2, in1_descr, in2_descr) < 0) {
int cmp = _compare(in1, in2, in1_descr, in2_descr);
if (cmp == 0 && (in1 == out || in2 == out)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngoldbaum - this I think is actually a small bug in the current implementation: if the strings compare equal, then no copy should happen if either side is in-place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this!!

@mhvk
Copy link
Contributor Author

mhvk commented Feb 9, 2024

@seberg - I used this PR for a proof of concept of adding aux data - does it make any sense? Obviously, if it does, we should take it out of this PR...

@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch from 0373679 to b36b2c3 Compare February 10, 2024 02:53
@mhvk mhvk changed the title MAINT: combine minimum and maximum string ufuncs MAINT: combine string ufuncs by passing on auxilliary data Feb 10, 2024
@mhvk
Copy link
Contributor Author

mhvk commented Feb 10, 2024

I went ahead and did all functions that could be fairly easily factored together... About 1000 fewer lines of code.

@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch from b36b2c3 to 41b5a5a Compare February 10, 2024 03:04
@mhvk
Copy link
Contributor Author

mhvk commented Feb 10, 2024

p.s. The overall diff of stringdtype_ufuncs.cpp is rather dreadful. May be best to view individual commits.

@ngoldbaum
Copy link
Member

Awesome, thanks for taking this on! I will look this over next week.

@ngoldbaum
Copy link
Member

@lysnikolaou you probably want to look at this too

@@ -55,6 +55,7 @@ typedef struct PyArrayMethodObject_tag {
PyArrayMethod_StridedLoop *unaligned_strided_loop;
PyArrayMethod_StridedLoop *unaligned_contiguous_loop;
PyArrayMethod_StridedLoop *contiguous_indexed_loop;
void *aux_data;
Copy link
Member

@seberg seberg Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the fact that it should be renamed, I think, the below is only relevant for followup or maybe if splitting out.

Just FYI: I think I would be OK to move this up (maybe very high) and if we make things public, include the existing ones up to, and including, flags.

The "methods/functions" should not be public, in order to ensure we can deprecate and replace them (it would be nice to add an accessors for some so that a non-NumPy library could use our functions).
(We could also move down name, I really added it for debugging only.)

After adding this, it could probably also simplify the legacy arraymethod as it currently (ab)uses aux-data for static data, effectively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks! I now call it static_data and moved it just beyond flags. It is still on the method, which is most logical if we allow filling it via fill_array_method_from_slots.

I can see the logic of something on context itself, though. But maybe we need to think through a bit where exactly it will be useful...

@ngoldbaum
Copy link
Member

By the way, I'm making the DType API public in another PR, so maybe wait on that one being merged for a final round on this PR because there will definitely be conflicts.

@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch 3 times, most recently from 97503ae to 75b5242 Compare February 10, 2024 19:05
@mhvk
Copy link
Contributor Author

mhvk commented Feb 10, 2024

OK, this seems to work and I'm fairly happy with it. A general question is whether the dtype C API addition should be split off, and, if so, a specific one whether it might still be made part of gh-25754 (if not, then some guidance on what other files to adjust would be welcome...).

For the ufuncs themselves, the diff of the whole PR is basically unreviewable, but it's OK if done commit by commit. Happy to split it up in different PRs is that feels better.

@ngoldbaum
Copy link
Member

OK great, I'm hoping to have the dtype API PR done today so hopefully this won't take more than a couple days to sort out. Hope that's OK with you too @lysnikolaou!

No such thing as a major release of a nearly 20 year old open source project without gnrly merge conflicts when we get down to the line!

@mhvk
Copy link
Contributor Author

mhvk commented Feb 13, 2024

Yes, n many ways it is a compliment to all those before us that there are so few major merge conflicts!

@seberg
Copy link
Member

seberg commented Feb 13, 2024

Right: For the record, I am happy with the addition, I am not 100% on how much of the PyArrayMethod struct we should make public, but up to there seems not unreasonable.

I was mostly hoping to delay these things until Nathans PR is in (and making the descriptor int64 and hiding the fields is gnarly business, even if hopefully not all that bad for downstream).

@lysnikolaou
Copy link
Contributor

OK great, I'm hoping to have the dtype API PR done today so hopefully this won't take more than a couple days to sort out. Hope that's OK with you too @lysnikolaou!

Sure! I'm ready to try this for fixed-length dtypes as well as soon as it's merged.

@ngoldbaum
Copy link
Member

I don't really want to deal with making the PyArrayMethodObject partially public, so for now I'm making NPY_METH_static_data private and prepending the name with an underscore so this is mergeable without resolving the public API.

@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch from 75b5242 to 83a50c2 Compare February 14, 2024 21:19
@ngoldbaum ngoldbaum force-pushed the string-ufuncs-combined-minimum-maximum branch from 83a50c2 to 5f24538 Compare February 14, 2024 21:26
@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch from 5f24538 to 86bdcc9 Compare February 14, 2024 21:28
Copy link
Contributor Author

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I rebased, and things seem to work.

Main query is whether I put the documentation in the right place, and whether we should say anything about where static_data is exposed.

doc/source/reference/c-api/array.rst Outdated Show resolved Hide resolved
@mhvk
Copy link
Contributor Author

mhvk commented Feb 14, 2024

Ah, oops, we clearly were pushing at the same time, sorry! I like your suggestion - since I messed this up, let me just push from my branch...

@ngoldbaum
Copy link
Member

Sorry for force-pushing to this branch earlier, I thought you wanted me to resolve the merge conflicts

@mhvk
Copy link
Contributor Author

mhvk commented Feb 14, 2024

You had offered that very kindly, but I thought since I had seen the docs going in, I could just do it myself - sorry to work at cross purposes!

@mhvk mhvk force-pushed the string-ufuncs-combined-minimum-maximum branch from 86bdcc9 to 9c722db Compare February 14, 2024 21:34
@mhvk
Copy link
Contributor Author

mhvk commented Feb 14, 2024

OK, should be all OK now. Reminder that for review of the ufunc changes, it makes most sense to go commit by commit - the difference of the whole set is basically unintelligible.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, always love to see -1000 LOC PRs. In my defense, sometimes when you're writing a big block of code like this it's easy to lose the forest for the trees. Thanks so much for adding the new slot in the API, it seems like an obvious nice way to avoid a ton of boilerplate. The commit-by-commit review was very straightforward. Also the tests passing strongly indicates this refactor didn't lead to any unintended behavior changes.

@@ -482,20 +427,24 @@ minimum_strided_loop(PyArrayMethod_Context *context, char *const data[],
const npy_packed_static_string *sin1 = (npy_packed_static_string *)in1;
const npy_packed_static_string *sin2 = (npy_packed_static_string *)in2;
npy_packed_static_string *sout = (npy_packed_static_string *)out;
if (_compare(in1, in2, in1_descr, in2_descr) < 0) {
int cmp = _compare(in1, in2, in1_descr, in2_descr);
if (cmp == 0 && (in1 == out || in2 == out)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching this!!



template <IMPLEMENTED_UNARY_FUNCTIONS f, const char* function_name>
typedef bool (Buffer<ENCODING::UTF8>::*utf8_buffer_method)();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this is much simpler with the typedef, i'm glad the functor wasn't necessary in the end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one took my by far the longest - mostly to realize that this method pointers are 16 bytes long. But, yes, once I had it, I was pleased with it!

@mhvk
Copy link
Contributor Author

mhvk commented Feb 14, 2024

Great! I hope that with this example, refactoring the other PRs will be straightforward.

Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was surprisingly easy to review commit-by-commit, which really speaks volumes for this approach! Let's merge this and I'll try to do it for fixed-length strings as well.

@seberg
Copy link
Member

seberg commented Feb 15, 2024

If you are all happy, we should just put this in, thanks.

@seberg seberg merged commit fcb6402 into numpy:main Feb 15, 2024
63 checks passed
@mhvk mhvk deleted the string-ufuncs-combined-minimum-maximum branch February 15, 2024 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants