BUG: casting not using configurable memory functions from NEP 49? #22686
Comments
Thanks for the interest in wrapping such objects! Maybe @mattip thinks differently, but I think that approach is fundamentally broken, unfortunately. That is, the DType should not control the allocation scheme itself (we could allow it to, but I suspect it is not a great approach, also because generally we may want traversal and not just deallocation). Just yesterday I looked into reorganizing how NumPy clears memory; that would replace the current refcounting scheme (which is working) and would be available to new user DTypes (but only NEP 41-42 ones). The new API is still unstable and hidden behind an "experimental" flag, but I don't really expect massive changes, just that ABI breaks will happen and some things might just crash because of an old code path that doesn't work. What is your timeline? Do you want to meet up and talk about such a new API, or help develop it? There is a bit more interest in developing new style DTypes now, so I hope that will help settle the API and fill in the missing things relatively quickly.
Hi @seberg, thanks for the reply! To clear up misconceptions: the approach I outlined above is not about having dtype-specific memory management strategies. Rather, the problem that I am seeing is that the NEP 49 memory allocation functions are not used consistently throughout the NumPy codebase, so situations can arise in which NumPy arrays (or at least, data segments) are allocated without going through the customisable memory allocation routines. Perhaps this is all working as intended; in that case my approach is destined to fail. I would be very interested in learning more/chatting about the new dtype API - does it solve the general issue of wrapping C++ types with arbitrary constructors/destructors/assignment operators? Can I read about it somewhere?
But your dtypes need a specific memory allocation/deallocation function to work? So if I write a "pinned memory" allocator that replaces yours, your dtype will break. I suppose the DType could try to guess at that, but that seems rather involved? You can have a look at the NEPs, maybe starting from NEP 42 (some parts may be a bit outdated, but basically that is what we got now as an experimental header you can use). The new API can get a generic deallocation function. I had not prioritized that for a long time, because an early argument was that maybe we should just ask everyone to store Python objects and support that explicitly. That is also a solution (that doesn't quite work yet), but I now think that it is easier to just allow you to hook into deallocation. Simply said: I will create a new hook for that for now.

One thing I would like to keep, but am not quite sure about for C++ objects, is to generally only zero the array data as initialization for "objects with references". Maybe with some limits (i.e. only some functions like clear would need to support zeroed memory, not all). We are creating example repos in https://github.com/numpy/numpy-user-dtypes; they will not be production ready for a while, of course. I have some more examples myself and a new one coming, which I may add there (the new one isn't public yet).
You are right that if you override my (de)allocation functions, the whole machinery will break down.
Thanks for the pointer, I will do some reading. One data point: I think it's vital to ensure that people can create NumPy views on memory buffers created on the C++ side. I.e., I would really like to be able to create a NumPy array whose base object is a capsule internally storing a C++ container (e.g., a `std::vector`) that owns the memory.
From the little time I have spent in the NumPy codebase, it seems like it's peppered with liberal uses of raw allocation functions (`malloc()` and friends).
The dtype auxdata is part of the dtype, not part of the "data segment" in the sense described in NEP 49, which was meant to allow overriding the memory management scheme for the contiguous chunk `numpy.ndarray.data`. The auxdata lives on the dtype, and so is not covered by the NEP 49 memory management scheme. I am not sure how far we can stretch the basic NumPy premise that data is stored as a single contiguous segment toward a different premise where each data element is to be interpreted as a C++ object. Stretching it to cover the case where the data is a C struct is hard enough. If user-defined dtypes need more layers to support such an idea, personally I would vote for keeping things simple. The casting machinery is quite hard to understand as-is, without further complications.
I would be interested in seeing where those are used on the data segment. We have a passing test that sets a NEP 49 memory management strategy and re-runs the entire test suite with it. Additionally, when I implemented NEP 49 I tried to carefully check the places that use those functions to make sure they do not touch the data segment of a managed array (they may be used for some local, temporary arrays). Did I miss any?
In summary: I am not convinced you can safely use NumPy to store C++ objects in the data segment without some large changes. Perhaps you could get away with storing pointers to C++ objects, in something like an `object` array.
"Littered with" is a bit of an exaggeration, I think :). It is a maze, but there are a couple of bad functions, mostly written 20 years ago, that are just broken. Then there are a few places that do incorrect safety checks. But beyond that there should only be a handful of places that really need adjustment for custom initialization to work, though I am not sure about every corner. One caveat I see is double clearing. It would be easier if you assume that it is fine to call the clearing function on already-cleared (zeroed) memory.
Ah, maybe one more point about that.
Hi @seberg and @mattip and thanks a lot for your replies.
I think I more or less understand the rationale. My surprise comes from the fact that these pointers, originating from https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/dtype_transfer.c#L2876, are eventually passed into the user-defined casting functions (I mean, those that are registered via `PyArray_RegisterCastFunc()`).
My expectations for what a data segment is were probably wrong, so I cannot really claim you missed anything :)
My impression working on this is that custom memory allocation strategies can get you 90% there. As long as you are careful about keeping track of where in the data segment you have constructed C++ objects, there's no reason why it should not work (assuming of course that all the relevant buffer allocations are tracked under this machinery).
Apologies, I realised this could come across as a bit on the snarky side and wasn't meant to, so I edited it :)
Still, it seems like the dtype auxdata should then consider the possibility of having to initialise memory buffers in a custom fashion - or did I misunderstand?
Please ignore "dtype auxdata"; it is old API that needs to be replaced.
I don't know if there is a way to tell whether a C++ object is safe for direct storage in a C-style array?
Here is one example of a problematic code stanza, in numpy/core/src/multiarray/array_assign_scalar.c, lines 254 to 263 at commit 7f0f045.
Do we have a dtype with which we could test this?
I have to figure out what it is that C++ objects need here. I am not sure why you want more than the generic clearing/deallocation hook, though. We do break that in a few places (i.e. check the places that use realloc).
I guess @bluescarni could test if that is sufficient.
I don't see the issue right now? As of now, I assume it is OK if a DType:

- can treat zeroed memory as a valid (uninitialized/empty) element state, and
- provides a function to clear/release individual elements on deallocation.
I would assume those two things are enough for most use cases, even many C++ objects, although using typed Python objects is maybe just as good for anything larger. Now, I don't want to pretend that the above isn't problematic. We have the problem of "ownership", for example in the HPy sense, or DTypes that may wish to have a custom memory arena (strings). For those, we need to figure out how such buffering works (maybe it is fine, because they attach the arena to the dtype itself, maybe not). Of course there could be other interesting reasons for requiring a dtype-specific allocation. Such dtypes couldn't be stored in a structured dtype (unless they all share that allocation scheme), though!
@seberg
Thus, from a strictly language-lawyer point of view, moving such objects around with a plain `memcpy()`/`realloc()` is undefined behaviour.
In other words, in order to properly support C++ types in a NumPy array, one must be able to specify how to construct, assign and destroy individual array elements.
@mattip you mean the stanza from array_assign_scalar.c above?
@bluescarni you should give up on old-style dtypes for this. The only thing they can possibly support, through ugly hacks, is storing Python objects inside. @bluescarni the point being: so long as your object fits the zero-initialization + clear scheme above, it should be supportable.
The problem is only if you need to copy the elements, since we use realloc in places where we want the kernel to resize for us efficiently. But of course we can work around that everywhere (worst case, it is inefficient). We can easily do the "allocate something larger and copy the data" scheme in NumPy.
It still seems problematic to me to create a copy of the src data in array_assign_scalar.c.
@mattip I don't see what is problematic. We make copies all the time; there is nothing special here. The only special thing is that the temporary copy isn't wrapped into a full-blown array object. Now, the point of that is of course the assumption that copying the data from a scalar to the array may be much faster than doing the equivalent cast.
So, I dunno if it is overall worthwhile; maybe the special casing is not worth keeping. Also, I bet we can get more speed improvement out of SIMDifying (with compiler hints only if necessary) the existing casts than we would lose if we just removed this...
Describe the issue:
Hello!
I am trying to use the new configurable memory routines from NEP 49, described here:
https://numpy.org/devdocs/reference/c-api/data_memory.html#configurable-memory-routines-in-numpy-nep-49
My use case is that I want to store non-trivial C++ objects as elements of NumPy arrays. In order to correctly manage the lifetime of my C++ objects (including correct construction and destruction, without memory leaks), I override the default `malloc`/`calloc`/`realloc`/`free` implementations so that metadata is associated with every array data segment allocated by NumPy. This metadata keeps track of which C++ objects have been constructed in the data segment. Then, in the basic NumPy primitives for my C++ dtype (e.g., `getitem`/`setitem`, `nonzero`, `copyswap`, etc.), I query the metadata of the data segment and construct C++ objects in-place as needed. The `free` override takes care of calling the destructors of the constructed C++ objects before de-allocating the data segment.

For reference, here is the implementation of the overriding memory functions + metadata handling:
https://github.com/bluescarni/heyoka.py/blob/b661040030ff4d05c0dc2e03c8c58217d9ca0d71/heyoka/numpy_memory.cpp
And here is the code for exposing my C++ type (called `mppp::real`) to Python and NumPy:
https://github.com/bluescarni/heyoka.py/blob/b661040030ff4d05c0dc2e03c8c58217d9ca0d71/heyoka/expose_real.cpp
So far, most of NumPy's basic functionality seems to work fine: I can create new arrays with my custom C++ dtype, access elements, etc. See here for a WIP testing suite I am working on:
https://github.com/bluescarni/heyoka.py/blob/b661040030ff4d05c0dc2e03c8c58217d9ca0d71/heyoka/test.py#L5399
I am however running into an issue when I attempt to cast arrays of my custom dtype to/from other NumPy dtypes.
What happens is that the casting functions seem to receive `from`/`to` arrays in which the data segment has NOT been allocated via the configurable memory routines from NEP 49. This of course is a big issue, because there is no metadata associated with the data segments, and my whole scheme for managing the lifetimes of my C++ objects comes to a crashing halt.

I have spent a few hours with a debug build of NumPy and a debugger. Although I haven't fully understood the intricacies of the casting logic, I think I have managed to pinpoint where the data segments passed to the casting functions originate from, and it seems to be here:
https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/dtype_transfer.c#L2865
I.e., from a direct `PyMem_Malloc()` call (rather than leveraging the configurable memory routines from NEP 49). The `to`/`from` buffers that are passed to the casting functions are defined a bit below:
https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/dtype_transfer.c#L2876
(`from_buffer` and `to_buffer`).

Am I wrong in expecting that these two buffers should be allocated via the NEP 49 functions?
Reproduce the code example:
Error message:
No response
NumPy/Python version information:
1.25.0.dev0+20.gd4b2d4f80 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0]
Context for the issue:
No response