
TASK: Potential fixup/followups for the allocator changes #20193

Closed
2 of 5 tasks
seberg opened this issue Oct 25, 2021 · 25 comments

@seberg (Member) commented Oct 25, 2021

This is a list of potential followups for gh-17582:

  • Release note is missing? PR DOC: add release note and move NEP 49 to Final #20194
  • I would like a bit of "hand holding" for the arr->flags |= NPY_OWNDATA user, i.e. tell users in some place (maybe the flag documentation?) linked from the warning: please use a PyCapsule as the array base (this is simple, but you have to know about PyCapsule; see the sketch after this list). (I could look into that myself.)
  • PyCapsule_GetPointer technically can set an error and may need the GIL? I am not worried enough about this to delay, but the functions look like they are meant to be callable without the GIL (I doubt we use it, and I doubt it really can error/need the GIL, but maybe we can make it a bit nicer anyway).
  • The void functions use (normally) small allocations: I think it may make more sense to just use the Python allocators rather than a (possibly slow) aligned allocator. (This is VOID_compare.)
  • We should double check that we are OK without an explicit versioning mechanism (and, maybe related, with users assigning the handler directly).
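
For reference, that PyCapsule pattern might look roughly like this (a minimal sketch; wrap_buffer and the capsule name are made up for illustration):

#include <stdlib.h>
#include <Python.h>
#include <numpy/arrayobject.h>

/* Capsule destructor: frees the wrapped buffer when the array is collected. */
static void free_wrapped_buffer(PyObject *capsule)
{
    free(PyCapsule_GetPointer(capsule, "wrapped_buffer"));
}

/* Wrap an existing malloc'ed buffer in a 1-D array that owns it through its
 * base object, instead of setting NPY_ARRAY_OWNDATA by hand. */
static PyObject *wrap_buffer(double *buf, npy_intp n)
{
    PyObject *arr = PyArray_SimpleNewFromData(1, &n, NPY_DOUBLE, buf);
    if (arr == NULL) {
        return NULL;
    }
    PyObject *capsule = PyCapsule_New(buf, "wrapped_buffer", free_wrapped_buffer);
    if (capsule == NULL) {
        Py_DECREF(arr);
        return NULL;
    }
    /* PyArray_SetBaseObject steals the reference to the capsule. */
    if (PyArray_SetBaseObject((PyArrayObject *)arr, capsule) < 0) {
        Py_DECREF(arr);
        return NULL;
    }
    return arr;
}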

Anyone else have anything that might make sense to look into?

@seberg added this to the 1.22.0 release milestone Oct 25, 2021
@mattip (Member) commented Oct 25, 2021

Please use a PyCapsule as the array base

An example might be the best way. Do we know of code in the wild that assigns NPY_OWNDATA? Searching the top 4000 PyPI packages did not yield any results.

@seberg (Member, Author) commented Oct 25, 2021

Ah, I had done a github search: https://github.com/Seynen/egfrd/blob/e0ed0916797fe105f90e38e83b6375d340f94caf/peer/numpy/ndarray_converters.hpp#L60

or: https://github.com/ThierryDeruyttere/vilbert-Talk2car/blob/6476b16970cfd0d88e09beb9a57cc5c39b7acb3f/tools/refer/external/_mask.pyx#L96

(I guess it is good that the top 4000 PyPI packages are clean; it means this really is only used in a few small "home grown" libs.)

@leofang (Contributor) commented Oct 25, 2021

It'd be nice to have Python APIs for the allocator, but I doubt it can be done in time for 1.22?

@mattip (Member) commented Oct 26, 2021

Whoops. Searching for NPY_ARRAY_OWNDATA does show a few packages that are (ab)using the flag.

@seberg (Member, Author) commented Oct 26, 2021

Searching for NPY_ARRAY_OWNDATA does show a few packages that are (ab)using the flag.

Ah, that explains why my search felt like it only turned up fringe examples this time ;).

It'd be nice to have Python APIs for the allocator, but I doubt it can be done in time for 1.22?

@leofang I am not sure what you are looking for? Is there a use that is not just as well addressed by an example in the documentation, or, maybe even better, a dedicated small repository (which could live in the NumPy org)?

@leofang (Contributor) commented Oct 26, 2021

It'd be nice to have Python APIs for the allocator, but I doubt it can be done in time for 1.22?

@leofang I am not sure what you are looking for? Is there a use that is not just as well addressed by an example in the documentation, or, maybe even better, a dedicated small repository (which could live in the NumPy org)?

@seberg I need a Python API to switch the allocator in a Python session, as we likely cannot afford to make NumPy a build time dependency. Something similar to what @eliaskoromilas did with his numpy-allocator would do:

>>> import numpy as np
>>> 
>>> # let downstream libraries/users worry about how to prepare the necessary interface objects
>>> my_allocator = np.create_allocator(
...     malloc=...,
...     calloc=...,
...     realloc=...,
...     free=...)
>>> with my_allocator:  # change the allocator locally
...     a = np.array([1, 2, 3])
>>> 
>>> # or, change the allocator globally
>>> curr_allocator = np.get_allocator()
>>> np.set_allocator(my_allocator)
>>> b = np.array([4, 5, 6])

If it has to be in a separate repo, we might be able to live with it, but the preference is to have it in NumPy because it's really just a small interface.

@leofang (Contributor) commented Oct 26, 2021

@seberg Another orthogonal thing I've been pondering is how to make the allocator interact with the DLPack support (#19083). Ideally, for example, if I set the allocator to CUDA pinned memory or managed memory, when exporting through DLPack it should be able to set the corresponding DLDeviceType field correctly. Thanks to PyDataMem_Handler we could probably fetch the necessary info from its name field?

@seberg (Member, Author) commented Oct 26, 2021

Uffffff... I personally don't really like encoding arbitrary stuff in names :(.

Maybe we do need some "reserved" space to allow putting in feature flags/a version, or just some function pointer slots (easy enough), if you are already proposing to extend the API/ABI? Or even just bite the bullet and consider a FromSpec API?

I don't like getting my hands dirty in API discussion, but if this is impeding things anyway, better now than after release.

EDIT: Of course, you could write your own dlpack exporter already, but that cannot work implicitly...

@jakirkham (Contributor):

Just to add to Leo's point about hooking allocators into Python, there are a few use cases that stick out to me:

  • Allocators with specific alignment (some low-level functions expect very specific alignment)
  • Pinned memory (good for shepherding data to/from special devices like GPUs ;)
  • Shared memory (useful for working with large memory allocations in a process parallel friendly way)

There are probably others I'm overlooking, but these come up a fair bit. Admittedly there is some C code in all of these, but it can be handy to switch to a different allocator (especially in particular contexts).

@jakirkham (Contributor):

cc @pentschev @madsbk @quasiben (for awareness)

@eliaskoromilas (Contributor) commented Nov 8, 2021

Something similar to what @eliaskoromilas did with his numpy-allocator would do:

>>> import numpy as np
>>> 
>>> # let downstream libraries/users worry about how to prepare the necessary interface objects
>>> my_allocator = np.create_allocator(
...     malloc=...,
...     calloc=...,
...     realloc=...,
...     free=...)
>>> with my_allocator:  # change the allocator locally
...     a = np.array([1, 2, 3])
>>> 
>>> # or, change the allocator globally
>>> curr_allocator = np.get_allocator()
>>> np.set_allocator(my_allocator)
>>> b = np.array([4, 5, 6])

My NumPy Allocator API actually fulfills all these requirements through:

a) The allocator metaclass (numpy_allocator.type)

Let me first explain how metaclasses work (for those not familiar with the term). Classes in Python, by default, use object as their base and type as their metaclass, which means that the following three forms are equivalent:

class <name>:
    pass

# or

class <name>(metaclass=type):
    pass

# or

<name> = type('<name>', (), dict())

Custom metaclasses can be used to provide static attributes to a class. In NumPy Allocator's case, numpy_allocator.type is responsible for initializing the handler capsule (based on the provided allocator funcs), but also for the context management functionality (__enter__/__exit__). It makes describing custom allocators as easy as:

class <name>(metaclass=numpy_allocator.type):
    _calloc_ = <calloc>
    _free_ = <free>
    _malloc_ = <malloc>
    _realloc_ = <realloc>

# or

<name> = numpy_allocator.type('<name>', (), dict(_calloc_=<calloc>, _free_=<free>, _malloc_=<malloc>, _realloc_=<realloc>))

Of course you are free to wrap this with a function:

def create_allocator(name, malloc=None, calloc=None, realloc=None, free=None):
    return numpy_allocator.type(name, (), dict(_calloc_=calloc, _free_=free, _malloc_=malloc, _realloc_=realloc))

It's important to note here that ctypes function pointers are used to define the _{c|m|re}alloc_/_free_ functions (independently), which means that both Python callback functions and DLL symbols are allowed.

b) The low-level handler API (numpy_allocator.{g|s}et_handler)

If you don't care about context management and the high-level interface that the metaclass offers, you can still use the low-level handler API to switch between allocators. If you already have a capsule containing the NumPy memory handler, you can use only that part of the library to get, set or reset the context-local handler, but also to extract the handler from a specific array.

import numpy
import numpy_allocator

# let's assume that somehow we have got a valid "mem_handler" capsule (e.g. std_handler)

# std_handler could be also created like this
# class std_allocator(metaclass=numpy_allocator.type):
#     pass

# std_handler = std_allocator.handler()

#################################

numpy_allocator.set_handler(std_handler)

test = numpy.ndarray(())

print(numpy.core.multiarray.get_handler_name())  # prints: std_allocator

numpy_allocator.set_handler(None)

print(numpy.core.multiarray.get_handler_name())  # prints: default_allocator

numpy_allocator.set_handler(numpy_allocator.get_handler(test))

print(numpy.core.multiarray.get_handler_name())  # prints: std_allocator

@eliaskoromilas (Contributor) commented Nov 8, 2021

... about hooking allocators into Python, there are a few use cases that stick out to me:

  • Allocators with specific alignment (some low-level function expect very specific alignment)

This is an example of how an aligned allocator can be written in Python, utilizing the NumPy Allocator API.
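
For concreteness, the allocator core behind such an example (e.g. compiled into a small shared library and passed in via ctypes) might look like this. This is a minimal sketch against the PyDataMemAllocator signatures; the names are hypothetical, posix_memalign is POSIX-only, and realloc is omitted since aligned realloc needs extra bookkeeping:

#include <stdlib.h>
#include <string.h>

/* 64-byte-aligned malloc/calloc/free matching the PyDataMemAllocator
 * signatures; ctx is unused here. Non-static so ctypes can find the symbols. */
void *aligned_malloc(void *ctx, size_t size)
{
    void *p = NULL;
    (void)ctx;
    if (posix_memalign(&p, 64, size ? size : 64) != 0) {
        return NULL;
    }
    return p;
}

void *aligned_calloc(void *ctx, size_t nelem, size_t elsize)
{
    /* overflow check for nelem * elsize omitted for brevity */
    void *p = aligned_malloc(ctx, nelem * elsize);
    if (p != NULL) {
        memset(p, 0, nelem * elsize);
    }
    return p;
}

void aligned_free(void *ctx, void *ptr, size_t size)
{
    (void)ctx;
    (void)size;
    free(ptr);
}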

  • Shared memory (useful for working with large memory allocations in a process parallel friendly way)

This is an example of how we (InAccel) use this API to integrate our shared memory architecture with NumPy. As you will notice, in this case we just open the libcoral-api DLL and pass the desired symbols on to the respective inaccel_allocator class attributes.

@seberg (Member, Author) commented Nov 8, 2021

@eliaskoromilas just wondering if you have a thought on how/whether we should have some versioning (e.g. a version number stored in the struct)?

@jakirkham (Contributor):

Maybe there could be a method for getting that version? Agreed, having a versioned API is important (this API may well change in the future).

@eliaskoromilas (Contributor) commented Nov 9, 2021

@eliaskoromilas just wondering if you have a thought on how/whether we should have some versioning (e.g. a version number stored in the struct)?

#17582 introduced the following API:

PyObject * PyDataMem_GetHandler()
PyObject * PyDataMem_SetHandler(PyObject *handler)

with PyObject representing a capsule of a valid PyDataMem_Handler struct object.

A PyCapsule is actually just a named wrapper around a pointer, with reference count capabilities and an optional destructor. In the current implementation, the NumPy API accepts/returns capsules with the name "mem_handler", containing PyDataMem_Handler pointers.
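
For concreteness, wiring a handler through this capsule API might look like this (a sketch; the my_* allocator functions are hypothetical, declared here with bodies omitted):

#include <Python.h>
#include <numpy/arrayobject.h>

/* Allocator functions with the PyDataMemAllocator signatures; bodies omitted. */
static void *my_malloc(void *ctx, size_t size);
static void *my_calloc(void *ctx, size_t nelem, size_t elsize);
static void *my_realloc(void *ctx, void *ptr, size_t new_size);
static void my_free(void *ctx, void *ptr, size_t size);

static PyDataMem_Handler my_handler = {
    "my_allocator",
    { NULL, my_malloc, my_calloc, my_realloc, my_free },
};

static int install_my_handler(void)
{
    PyObject *capsule = PyCapsule_New(&my_handler, "mem_handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    /* PyDataMem_SetHandler returns the previously installed handler. */
    PyObject *old = PyDataMem_SetHandler(capsule);
    Py_DECREF(capsule);
    if (old == NULL) {
        return -1;
    }
    Py_DECREF(old);
    return 0;
}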

I think it's important to mark this as Stable ABI (promising backwards compatibility, deprecation period, etc.). Fortunately, there is a way to make this happen through the use of capsule names. I've already proposed this in a comment, but let me explain here in more detail.

Let's assume that in the future there is a need to introduce a memcpy func. To allow backwards compatibility, the new version of the handler should be defined in a new struct (let's just add a "2" suffix):

typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void* (*memcpy) (void *ctx, void *dest, void *src, size_t n); /* this is the new func */
    void (*free) (void *ctx, void *ptr, size_t size);
} PyDataMemAllocator2;

typedef struct {
    char name[128];  /* multiple of 64 to keep the struct aligned */
    PyDataMemAllocator2 allocator;
} PyDataMem_Handler2;

In order to allow both the old and the new struct to co-exist, we need to be able to distinguish them. The capsule names come in handy here and can play the role of the version identifier. Since capsules are the way to pass handlers around, we can just use a different capsule name (e.g. "mem_handler2") for the new struct objects.

Now, PyDataMem_{G|S}etHandler may accept/return capsules named either "mem_handler" or "mem_handler2", containing PyDataMem_Handler or PyDataMem_Handler2 struct objects, respectively.

Benefits:

  • A Python user can easily tell which handler version an allocator supports:

>>> my_allocator.handler()
<capsule object "mem_handler2" at 0x7f1326654c90>

  • An external NumPy allocator library can easily upgrade to the new version just by adopting the new struct and updating the capsule name:

# let's also assume that the library wants to support both versions,
# but also set the newest supported one as the default (my_allocator)

# my_allocator_v1 contains a "mem_handler" capsule
# my_allocator_v2 contains a "mem_handler2" capsule

if version.parse(numpy.__version__) < version.parse('X.Y.Z'):
    my_allocator = my_allocator_v1
else:
    my_allocator = my_allocator_v2

  • NumPy can exploit the PyCapsule API to handle all the accepted handler versions:

/* handler here is a capsule provided by the user */

if (PyCapsule_IsValid(handler, "mem_handler")) {
    PyDataMem_Handler *mem_handler = (PyDataMem_Handler *) PyCapsule_GetPointer(handler, "mem_handler");
    /* allocator actions, for memcpy use the default function */
} else if (PyCapsule_IsValid(handler, "mem_handler2")) {
    PyDataMem_Handler2 *mem_handler2 = (PyDataMem_Handler2 *) PyCapsule_GetPointer(handler, "mem_handler2");
    /* allocator actions */
} else {
    /* unknown version */
}

/*
Or, to avoid running these checks multiple times, only PyDataMem_SetHandler
could perform them, (if needed) transforming the handler to a v2 handler
(in this case, setting a default value for the missing `memcpy` field),
and then storing the handler in the context-local var.
*/

To summarize, the capsule API that the ENH introduced can support versioning through capsule names. In my opinion it would be redundant to have a version field in the handler structs, since there is already a way to "label" the handler capsules.

@seberg (Member, Author) commented Nov 9, 2021

Hmm, means we need to do if/else chains. Although, I guess you could also get the capsule name and do a strncmp only on the first part (up to the version), then check the version.
My main thought was that it might be nice if alloc and free don't bloat for this, but I suppose that is possible.
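
A rough sketch of that name-prefix check (assuming names of the form "mem_handler", "mem_handler2", ...):

#include <stdlib.h>
#include <string.h>
#include <Python.h>

/* Returns the handler version encoded in the capsule name, or -1 if the
 * capsule is not a handler capsule; a bare "mem_handler" counts as version 1. */
static int handler_version(PyObject *handler)
{
    const char *name = PyCapsule_GetName(handler);
    const size_t prefix_len = strlen("mem_handler");
    if (name == NULL || strncmp(name, "mem_handler", prefix_len) != 0) {
        return -1;
    }
    return name[prefix_len] == '\0' ? 1 : atoi(name + prefix_len);
}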

@leofang (Contributor) commented Nov 9, 2021

IIRC @seberg isn't a fan of using the capsule name to contain operational information 😄 But I can live with that.

One nitpick on @eliaskoromilas's example is that PyDataMemAllocator2 should be made ABI compatible with PyDataMemAllocator by adding memcpy to the end of the struct. Extra safety should always be preferred.

@seberg (Member, Author) commented Nov 9, 2021

Yeah, but I can be convinced. I don't like that if/else chain, but now I realize it can probably be somewhat avoided. Not a huge fan, but so long as the version is the only thing we save it is probably fine.
And I don't really expect more versions than maybe 2 and 3 ;).

And I suppose... we could even "deprecate" versions effectively, by making a very quick check in the __dealloc__, which is guaranteed to get called.

EDIT: And yes, I assume any newer version is ABI compatible with all older ones.

@eliaskoromilas (Contributor):

IIRC @seberg isn't a fan of using the capsule name to contain operational information 😄 But I can live with that.

I'm not a huge fan either. In the capsule name solution there is no compatibility between the different handler structs. In other words, we don't have "versions" but "identifiers".

One nitpick on @eliaskoromilas's example is that PyDataMemAllocator2 should be made ABI compatible with PyDataMemAllocator by adding memcpy to the end of the struct. Extra safety should be always preferred.

I intentionally messed with the existing fields to highlight this behavior. If we choose to go with this solution, "mem_handler2" capsules won't be supported by NumPy versions that only know how to handle "mem_handler" capsules, etc. User libraries need to make sure that their allocator is compatible with the installed NumPy version.

EDIT: And yes, I assume any newer version is ABI compatible with all older ones.

There is an alternate solution, of course, that focuses exactly on that. In this solution:

  • we (MUST) keep the capsule name fixed ("mem_handler")
  • we add a version field before/after the name field in the handler struct (needs to be done immediately, before 1.22.0)
  • newer NumPy versions are allowed only to append fields/funcs to the allocator struct

This means that user libraries may set 1.22.0 as their minimum required NumPy version, but still update their handler structs to match the latest NumPy release.
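
In code, that alternate layout might look something like this (a sketch only; exact field sizes and placement are up for discussion, and PyDataMemAllocator is the function-pointer struct shown earlier):

#include <stdint.h>

/* The capsule name stays "mem_handler"; the version byte is carved out of
 * the name field so the header keeps its 128-byte size. NumPy would use the
 * version to decide which allocator fields are valid. */
typedef struct {
    char name[127];
    uint8_t version;  /* bumped whenever funcs are appended to the allocator */
    PyDataMemAllocator allocator;
} PyDataMem_Handler;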

Hmm, means we need to do if/else chains.

if/else chains would also be required in this solution, if we want newer NumPy versions to accept older handler/allocator structs. Newer NumPy versions still need to know whether (and which) functions are missing from a handler, and apply defaults. Of course there won't be string comparisons :).

And I don't really expect more versions than maybe 2 and 3 ;).

I agree.

TL;DR: If we think that future versions are (most probably) going to be extensions of the current struct (e.g. extra functions), a version field will simplify its maintenance a lot (on both the user and NumPy sides). Capsule names would be useful in a more complex scenario.

@jakirkham (Contributor):

Matti added PR (#20343) to include versioning. Would be great if others here took a look 🙂

@charris (Member) commented Nov 16, 2021

Is it OK to push this off? We now have versioning, and I don't think all the tasks here will be finished in time.

@seberg (Member, Author) commented Nov 16, 2021

Yeah, should be good now.

@mattip (Member) commented May 18, 2022

Closing; I think we have resolved enough of these problems. Please reopen or open a new issue.

@mattip closed this as completed May 18, 2022
@hmaarrfk (Contributor):

By any chance, can you point us to the documentation for the new API?

@mattip (Member) commented May 18, 2022

The documentation added is https://numpy.org/devdocs/reference/c-api/data_memory.html, and there is also the NEP https://numpy.org/neps/nep-0049.html
