ENH: allocate working buffers outside ufunc's inner loop #11510

Open
mattip opened this issue Jul 5, 2018 · 5 comments

@mattip
Member

mattip commented Jul 5, 2018

Working on matmul in #11133, and comparing to the linalg inner loops, I ran into a need for a working buffer much like linalg. In umath_linalg.c.src each iteration of the inner loop mallocs/frees the working memory. There seems to be no generic support for passing in a working buffer allocated once for the ufunc call.

The PyUFuncGenericFunction signature has an innerloopdata argument, but I could find no examples of its use in linalg. In the actual inner loops in umath_linalg.c.src and elsewhere it is marked as NPY_UNUSED(func).

The only place I could find a use for this argument is in unmasked_ufunc_loop_as_masked where it is used to hold a structure, not a function.
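To make the cost concrete, here is a minimal, self-contained sketch of the two patterns. It does not use the NumPy headers: `intp_t`, `loop_fn`, `run_in_chunks` and `sum_rows_shared_buffer` are illustrative stand-ins invented for this example, and only the shape of the loop signature follows PyUFuncGenericFunction. The chunked driver is an assumption that mimics the ufunc machinery calling the inner loop several times for one ufunc call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t intp_t; /* stand-in for npy_intp */
typedef void (*loop_fn)(char **args, const intp_t *dimensions,
                        const intp_t *steps, void *data);

static int n_allocs = 0; /* counts scratch allocations, for illustration only */

/* Copy n strided doubles into the contiguous buffer `work`, return their sum. */
static double gather_and_sum(const char *in, intp_t n, intp_t stride, double *work)
{
    double total = 0.0;
    for (intp_t j = 0; j < n; j++) {
        work[j] = *(const double *)(in + j * stride);
        total += work[j];
    }
    return total;
}

/* Shared body for signature (n)->(): one sum per outer iteration. */
static void fill_sums(char **args, const intp_t *dimensions,
                      const intp_t *steps, double *work)
{
    for (intp_t i = 0; i < dimensions[0]; i++) {
        *(double *)(args[1] + i * steps[1]) =
            gather_and_sum(args[0] + i * steps[0], dimensions[1], steps[2], work);
    }
}

/* Pattern described above: scratch is allocated and freed on every call. */
static void loop_alloc_each_call(char **args, const intp_t *dimensions,
                                 const intp_t *steps, void *data)
{
    (void)data; /* unused, as in the linalg loops */
    intp_t n = dimensions[1] > 0 ? dimensions[1] : 1;
    double *work = malloc((size_t)n * sizeof *work);
    if (work == NULL) {
        return; /* real code would have to report the error */
    }
    n_allocs++;
    fill_sums(args, dimensions, steps, work);
    free(work);
}

/* Alternative: the caller owns the scratch and passes it through `data`. */
static void loop_with_buffer(char **args, const intp_t *dimensions,
                             const intp_t *steps, void *data)
{
    fill_sums(args, dimensions, steps, (double *)data);
}

/* Hypothetical driver: calls the loop once per chunk of rows. */
static void run_in_chunks(loop_fn loop, double *in, double *out,
                          intp_t rows, intp_t n, intp_t chunk, void *data)
{
    assert(chunk > 0);
    const intp_t item = (intp_t)sizeof(double);
    const intp_t steps[3] = { n * item, item, item };
    for (intp_t start = 0; start < rows; start += chunk) {
        intp_t count = rows - start < chunk ? rows - start : chunk;
        intp_t dimensions[2] = { count, n };
        char *args[2] = { (char *)(in + start * n), (char *)(out + start) };
        loop(args, dimensions, steps, data);
    }
}

/* Allocate once for the whole call, then run every chunk with that buffer. */
static int sum_rows_shared_buffer(double *in, double *out,
                                  intp_t rows, intp_t n, intp_t chunk)
{
    double *work = malloc((size_t)(n > 0 ? n : 1) * sizeof *work);
    if (work == NULL) {
        return -1;
    }
    n_allocs++;
    run_in_chunks(loop_with_buffer, in, out, rows, n, chunk, work);
    free(work);
    return 0;
}
```

With three rows processed in chunks of one, the first variant allocates three times and the second once. What the sketch glosses over is exactly the missing piece: in NumPy there is no generic place for the allocate-once step to live.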

@mhvk
Contributor

mhvk commented Jul 5, 2018

@mattip - In my work wrapping erfa functions I ran into the same problem. I considered passing on temporary buffers via *data, but realized that doesn't work in a multi-threaded environment, since the *data is unique to each ufunc, not to each invocation of the ufunc. I think the only argument possibly useful for a future extension would be args, but it would need some further adjustment of the ufunc structure to indicate to the code that such a temporary buffer should be allocated.

Although I wonder: at least in my case the reason I needed a temporary array was that the functions I'm wrapping expect input arrays that are contiguous, and that is not necessarily what one gets passed. Is it the same for you? Is there perhaps a flag to ask the iterator to provide that?

p.s. I was puzzled too about what *data was used for, and found it is used to pass on functions like sin and exp, which all use the same inner loop (see "generic float loops" in loops.c.src; this explains the reason for it being separate for each type), as well as for frompyfunc.
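A small sketch of that use of *data, assuming a simplified loop signature: one inner loop serves many functions because the function to apply arrives through the data pointer. This is loosely what generic loops such as PyUFunc_d_d do; `intp_t`, `unary_fn`, `loop_d_d`, `square`, `negate` and `apply` are names made up for the example, not NumPy API.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t intp_t;            /* stand-in for npy_intp */
typedef double (*unary_fn)(double);  /* the kind of function stored in *data */

/* One loop for every double -> double function: the function is in `data`. */
static void loop_d_d(char **args, const intp_t *dimensions,
                     const intp_t *steps, void *data)
{
    /* Casting void * to a function pointer is not strict ISO C, but it is
     * supported on the platforms NumPy targets. */
    unary_fn f = (unary_fn)data;
    char *ip = args[0];
    char *op = args[1];
    assert(f != NULL);
    for (intp_t i = 0; i < dimensions[0]; i++, ip += steps[0], op += steps[1]) {
        *(double *)op = f(*(double *)ip);
    }
}

/* Two example functions that share the loop above. */
static double square(double x) { return x * x; }
static double negate(double x) { return -x; }

/* Hypothetical caller: `data` is fixed per function, not per invocation. */
static void apply(unary_fn f, double *in, double *out, intp_t n)
{
    const intp_t item = (intp_t)sizeof(double);
    const intp_t dimensions[1] = { n };
    const intp_t steps[2] = { item, item };
    char *args[2] = { (char *)in, (char *)out };
    loop_d_d(args, dimensions, steps, (void *)f);
}
```

Because the pointer is chosen when the loop is registered, it is shared by every call of that ufunc, which is why it is a poor home for a per-invocation scratch buffer when calls can run on several threads.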

@mattip
Member Author

mattip commented Jul 5, 2018

since the *data is unique to each ufunc, not to each invocation of the ufunc

I was thinking of something that would allocate before starting to iterate over the loops and free afterwards, but that second part is exactly what is missing.

Indeed I would like to get a contiguous piece of memory for the inner loop, but

  • there is no API to set the flags for a gufunc. That functionality is also missing but is not part of this issue; see also BUG: better detect memory overlap when calling gufuncs #11381
  • I don't want to copy the entire operand into a contiguous buffer since for higher dimensional data that would be quite wasteful
  • I do not see a mechanism to request contiguous data from the outer loop iterator in PyUFunc_GeneralizedFunction or for that matter in any of the simpler looping functions.

The solution used for masked data is an example of the kind of hack I would like to avoid - there is special code at the end of execute_fancy_ufunc_loop to call NPY_AUXDATA_FREE for the data structure allocated before loop iteration begins in PyUFunc_DefaultMaskedInnerLoopSelector.

Maybe we need to add an after_loop_cleanup function pointer to the ufunc structure. We are changing it for NEP 20 so adding another field might be OK?
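A rough sketch of what such a hook could look like, purely as an illustration of the proposal. Nothing here exists in NumPy: the `ufunc_sketch` struct, the `before_loop_setup` counterpart, and the `call_ufunc` driver are all hypothetical, and the driver calls the loop once per row only to mimic repeated inner-loop calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t intp_t; /* stand-in for npy_intp */
typedef void (*loop_fn)(char **args, const intp_t *dimensions,
                        const intp_t *steps, void *data);

/* Hypothetical slice of a ufunc struct; none of these fields exist in NumPy. */
typedef struct {
    loop_fn loop;
    void *(*before_loop_setup)(intp_t core_len); /* assumed counterpart hook */
    void (*after_loop_cleanup)(void *scratch);   /* the proposed hook */
} ufunc_sketch;

static int live_buffers = 0; /* tracks outstanding scratch, for illustration */

static void *alloc_scratch(intp_t core_len)
{
    void *p = malloc((size_t)(core_len > 0 ? core_len : 1) * sizeof(double));
    if (p != NULL) {
        live_buffers++;
    }
    return p;
}

static void free_scratch(void *scratch)
{
    if (scratch != NULL) {
        free(scratch);
        live_buffers--;
    }
}

/* Signature (n)->(n): reverse each row via the scratch passed in `data`. */
static void reverse_rows(char **args, const intp_t *dimensions,
                         const intp_t *steps, void *data)
{
    double *work = data;
    intp_t n = dimensions[1];
    for (intp_t i = 0; i < dimensions[0]; i++) {
        char *ip = args[0] + i * steps[0];
        char *op = args[1] + i * steps[1];
        for (intp_t j = 0; j < n; j++) {
            work[j] = *(double *)(ip + j * steps[2]);
        }
        for (intp_t j = 0; j < n; j++) {
            *(double *)(op + j * steps[3]) = work[n - 1 - j];
        }
    }
}

/* Hypothetical driver: set up once, run every loop call, clean up once. */
static int call_ufunc(const ufunc_sketch *uf, double *in, double *out,
                      intp_t rows, intp_t n)
{
    assert(uf->loop != NULL);
    void *scratch = NULL;
    if (uf->before_loop_setup != NULL) {
        scratch = uf->before_loop_setup(n);
        if (scratch == NULL) {
            return -1;
        }
    }
    const intp_t item = (intp_t)sizeof(double);
    const intp_t steps[4] = { n * item, n * item, item, item };
    for (intp_t r = 0; r < rows; r++) {
        intp_t dimensions[2] = { 1, n };
        char *args[2] = { (char *)(in + r * n), (char *)(out + r * n) };
        uf->loop(args, dimensions, steps, scratch);
    }
    if (uf->after_loop_cleanup != NULL) {
        uf->after_loop_cleanup(scratch);
    }
    return 0;
}
```

Since the scratch is created inside each call rather than stored on the ufunc, two threads calling the same ufunc would each get their own buffer, which sidesteps the problem with *data raised earlier.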

@mhvk
Contributor

mhvk commented Jul 5, 2018

Maybe we need to add an after_loop_cleanup pointer function to the ufunc structure. We are changing it for NEP 20 so adding another field might be OK?

I think we could extend as long as we do it before 1.16.0. And perhaps write it such that we can also solve the masked loops (have to admit I do not yet fully understand what happens there).

@mattip
Member Author

mattip commented May 7, 2022

@seberg did any of these ideas make it into the ufunc refactoring?

@seberg
Member

seberg commented May 7, 2022

It is possible now, yes, if you write it the way that the string ufuncs are written in my PR (which is also good to merge ;)).

It will be slightly awkward right now, since there is currently no way to get the shapes early on (that could be an API addition though). So for now you need to create an empty NpyAuxData *, and then do the actual filling in only in the first call, because only then will you have the shape available that you need, for matmul specifically.

EDIT: I am happy to walk through in detail how to do it with anyone who wants to look into this.
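The "create empty, fill on first call" idea can be sketched without the NumPy headers as follows. This is only an illustration under stated assumptions: `lazy_aux` stands in for a struct that would embed NpyAuxData as its first member, `loop_lazy` is loosely shaped like a strided loop that receives auxdata, and the per-row computation (sum of squared deviations from the row mean) is just a placeholder for a kernel that wants contiguous scratch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t intp_t; /* stand-in for npy_intp */

/* Stand-in for a struct that would embed NpyAuxData as its first member. */
typedef struct {
    double *work; /* NULL until the first loop call fills it in */
    intp_t n;     /* core length the buffer was sized for */
    int fills;    /* counts lazy allocations, for illustration only */
} lazy_aux;

/* Create the auxdata "empty": the core shape is not known yet. */
static lazy_aux *lazy_aux_new(void)
{
    lazy_aux *aux = malloc(sizeof *aux);
    if (aux != NULL) {
        aux->work = NULL;
        aux->n = 0;
        aux->fills = 0;
    }
    return aux;
}

/* Would play the role of the auxdata free callback. */
static void lazy_aux_free(lazy_aux *aux)
{
    if (aux != NULL) {
        free(aux->work);
        free(aux);
    }
}

/* Signature (n)->(): returns 0 on success, -1 on allocation failure. */
static int loop_lazy(char *const data[], const intp_t dimensions[],
                     const intp_t strides[], lazy_aux *aux)
{
    intp_t n = dimensions[1];
    if (aux->work == NULL) {
        /* First call: only now is the core dimension available. */
        aux->work = malloc((size_t)(n > 0 ? n : 1) * sizeof(double));
        if (aux->work == NULL) {
            return -1;
        }
        aux->n = n;
        aux->fills++;
    }
    assert(aux->n == n); /* the sketch assumes a fixed core size per call */
    for (intp_t i = 0; i < dimensions[0]; i++) {
        const char *ip = data[0] + i * strides[0];
        double total = 0.0;
        for (intp_t j = 0; j < n; j++) {
            aux->work[j] = *(const double *)(ip + j * strides[2]);
            total += aux->work[j];
        }
        double mean = n > 0 ? total / (double)n : 0.0;
        double ss = 0.0;
        for (intp_t j = 0; j < n; j++) {
            double d = aux->work[j] - mean;
            ss += d * d;
        }
        *(double *)(data[1] + i * strides[1]) = ss;
    }
    return 0;
}
```

Calling the loop twice with the same auxdata allocates only on the first call; the buffer is released when the auxdata itself is freed.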

4 participants