
TASK: Potential fixup/followups for the allocator changes #20193

Closed
2 of 5 tasks
seberg opened this issue Oct 25, 2021 · 25 comments

@seberg (Member) commented Oct 25, 2021

This is a list of potential followups for gh-17582:

  • Release note is missing? PR DOC: add release note and move NEP 49 to Final #20194
  • I would like a bit of "hand holding" for the arr->flags |= NPY_OWNDATA user, i.e. tell users in some place (maybe the flag documentation?) linked from the warning: please use a PyCapsule as the array base (this is simple, but you have to know about PyCapsule; see the sketch after this list). (I could look into that myself.)
  • PyCapsule_GetPointer technically can set an error and may need the GIL? I am not worried enough about this to delay, but the functions look like they are meant to be callable without the GIL (I doubt we use it, and I doubt it really can error/need the GIL, but maybe we can make it a bit nicer anyway).
  • The void functions use (normally) small allocations: I think it may make more sense to just use the Python allocators rather than a (possibly slow) aligned allocator. (This is VOID_compare.)
  • We should double check that we are OK without an explicit versioning mechanism (and, maybe related, with users assigning the handler directly).
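
For reference, that PyCapsule pattern might look roughly like this (a minimal sketch; wrap_buffer and the capsule name are made up for illustration):

#include <stdlib.h>
#include <Python.h>
#include <numpy/arrayobject.h>

/* Capsule destructor: frees the wrapped buffer when the array is collected. */
static void free_wrapped_buffer(PyObject *capsule)
{
    free(PyCapsule_GetPointer(capsule, "wrapped_buffer"));
}

/* Wrap an existing malloc'ed buffer in a 1-D array that owns it through its
 * base object, instead of setting NPY_ARRAY_OWNDATA by hand. */
static PyObject *wrap_buffer(double *buf, npy_intp n)
{
    PyObject *arr = PyArray_SimpleNewFromData(1, &n, NPY_DOUBLE, buf);
    if (arr == NULL) {
        return NULL;
    }
    PyObject *capsule = PyCapsule_New(buf, "wrapped_buffer", free_wrapped_buffer);
    if (capsule == NULL) {
        Py_DECREF(arr);
        return NULL;
    }
    /* PyArray_SetBaseObject steals the reference to the capsule. */
    if (PyArray_SetBaseObject((PyArrayObject *)arr, capsule) < 0) {
        Py_DECREF(arr);
        return NULL;
    }
    return arr;
}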

Anyone else have anything that might make sense to look into?

@seberg added this to the 1.22.0 release milestone Oct 25, 2021
@mattip (Member) commented Oct 25, 2021

Please use a PyCapsule as the array base

An example might be the best way. Do we know of code in the wild that assigns NPY_OWNDATA? Searching the top 4000 PyPI packages did not yield any results.

@seberg (Member, Author) commented Oct 25, 2021

Ah, I had done a github search: https://github.com/Seynen/egfrd/blob/e0ed0916797fe105f90e38e83b6375d340f94caf/peer/numpy/ndarray_converters.hpp#L60

or: https://github.com/ThierryDeruyttere/vilbert-Talk2car/blob/6476b16970cfd0d88e09beb9a57cc5c39b7acb3f/tools/refer/external/_mask.pyx#L96

(I guess it is good that the top 4000 PyPI packages are clean; it means this really is only used in a few small "home grown" libs.)

@leofang (Contributor) commented Oct 25, 2021

It'd be nice to have Python APIs for the allocator, but I doubt it can be done in time for 1.22?

@mattip (Member) commented Oct 26, 2021

Whoops. Searching for NPY_ARRAY_OWNDATA does show a few packages that are (ab)using the flag.

@seberg (Member, Author) commented Oct 26, 2021

Searching for NPY_ARRAY_OWNDATA does show a few packages that are (ab)using the flag.

Ah, that explains why my search felt like it only turned up fringe examples this time ;).

It'd be nice to have Python APIs for the allocator, but I doubt it can be done in time for 1.22?

@leofang I am not sure what you are looking for? Is there a use that is not just as well addressed by an example in the documentation, or, maybe even better, a dedicated small repository (which could live in the NumPy org)?

@leofang (Contributor) commented Oct 26, 2021

It'd be nice to have Python APIs for the allocator, but I doubt it can be done in time for 1.22?

@leofang I am not sure what you are looking for? Is there a use that is not just as well addressed by an example in the documentation, or, maybe even better, a dedicated small repository (which could live in the NumPy org)?

@seberg I need a Python API to switch the allocator in a Python session, as we likely cannot afford to make NumPy a build time dependency. Something similar to what @eliaskoromilas did with his numpy-allocator would do:

>>> import numpy as np
>>> 
>>> # let downstream libraries/users worry about how to prepare the necessary interface objects
>>> my_allocator = np.create_allocator(
...     malloc=...,
...     calloc=...,
...     realloc=...,
...     free=...)
>>> with my_allocator:  # change the allocator locally
...     a = np.array([1, 2, 3])
>>> 
>>> # or, change the allocator globally
>>> curr_allocator = np.get_allocator()
>>> np.set_allocator(my_allocator)
>>> b = np.array([4, 5, 6])

If it has to be in a separate repo, we might be able to live with it, but the preference is to have it in NumPy because it's really just a small interface.

@leofang (Contributor) commented Oct 26, 2021

@seberg Another orthogonal thing I've been pondering is how to make the allocator interact with the DLPack support (#19083). Ideally, for example, if I set the allocator to CUDA pinned memory or managed memory, when exporting through DLPack it should be able to set the corresponding DLDeviceType field correctly. Thanks to PyDataMem_Handler we could probably fetch the necessary info from its name field?

@seberg (Member, Author) commented Oct 26, 2021

Uffffff... I personally don't really like encoding arbitrary stuff in names :(.

Maybe we do need some "reserved" space to allow putting in feature flags/a version, or just some function pointer slots (easy enough), if you are already proposing to extend the API/ABI? Or even just bite the bullet and consider a FromSpec API?

I don't like getting my hands dirty in API discussion, but if this is impeding things anyway, better now than after release.

EDIT: Of course, you could write your own dlpack exporter already, but that cannot work implicitly...

@jakirkham (Contributor):

Just to add to Leo's point about hooking allocators into Python, there are a few use cases that stick out to me:

  • Allocators with specific alignment (some low-level functions expect very specific alignment)
  • Pinned memory (good for shepherding data to/from special devices like GPUs ;)
  • Shared memory (useful for working with large memory allocations in a process parallel friendly way)

There are probably others I'm overlooking, but these come up a fair bit. Admittedly there is some C code in all of these, but it can be handy to switch to a different allocator (especially in particular contexts).

@jakirkham (Contributor):

cc @pentschev @madsbk @quasiben (for awareness)

@eliaskoromilas (Contributor) commented Nov 8, 2021

Something similar to what @eliaskoromilas did with his numpy-allocator would do:

>>> import numpy as np
>>> 
>>> # let downstream libraries/users worry about how to prepare the necessary interface objects
>>> my_allocator = np.create_allocator(
...     malloc=...,
...     calloc=...,
...     realloc=...,
...     free=...)
>>> with my_allocator:  # change the allocator locally
...     a = np.array([1, 2, 3])
>>> 
>>> # or, change the allocator globally
>>> curr_allocator = np.get_allocator()
>>> np.set_allocator(my_allocator)
>>> b = np.array([4, 5, 6])

My NumPy Allocator API actually fulfills all these requirements through:

a) The allocator metaclass (numpy_allocator.type)

Let me first explain how metaclasses work (for those not familiar with the term). Classes in Python, by default, use object as their base and type as their metaclass, which means that the following three forms are equivalent:

class <name>:
    pass

# or

class <name>(metaclass=type):
    pass

# or

<name> = type('<name>', (), dict())

Custom metaclasses can be used to provide static attributes to a class. In NumPy Allocator's case, numpy_allocator.type is responsible for initializing the handler capsule (based on the provided allocator funcs), but also for the context management functionality (__enter__/__exit__). It makes describing custom allocators as easy as:

class <name>(metaclass=numpy_allocator.type):
    _calloc_ = <calloc>
    _free_ = <free>
    _malloc_ = <malloc>
    _realloc_ = <realloc>

# or

<name> = numpy_allocator.type('<name>', (), dict(_calloc_=<calloc>, _free_=<free>, _malloc_=<malloc>, _realloc_=<realloc>))

Of course you are free to wrap this with a function:

def create_allocator(name, malloc=None, calloc=None, realloc=None, free=None):
    return numpy_allocator.type(name, (), dict(_calloc_=calloc, _free_=free, _malloc_=malloc, _realloc_=realloc))

It's important to note here that ctypes function pointers are used to define the _{c|m|re}alloc_/_free_ functions (independently), which means that both Python callback functions and DLL symbols are allowed.

b) The low-level handler API (numpy_allocator.{g|s}et_handler)

If you don't care about context management and the high-level interface that the metaclass offers, you can still use the low-level handler API to switch between allocators. If you already have a capsule containing the NumPy memory handler, you can use only that part of the library to get, set or reset the context-local handler, but also to extract the handler from a specific array.

import numpy
import numpy_allocator

# let's assume that somehow we have got a valid "mem_handler" capsule (e.g. std_handler)

# std_handler could be also created like this
# class std_allocator(metaclass=numpy_allocator.type):
#     pass

# std_handler = std_allocator.handler()

#################################

numpy_allocator.set_handler(std_handler)

test = numpy.ndarray(())

print(numpy.core.multiarray.get_handler_name())  # prints: std_allocator

numpy_allocator.set_handler(None)

print(numpy.core.multiarray.get_handler_name())  # prints: default_allocator

numpy_allocator.set_handler(numpy_allocator.get_handler(test))

print(numpy.core.multiarray.get_handler_name())  # prints: std_allocator

@eliaskoromilas (Contributor) commented Nov 8, 2021

... about hooking allocators into Python, there are a few use cases that stick out to me:

  • Allocators with specific alignment (some low-level function expect very specific alignment)

This is an example of how an aligned allocator can be written in Python, utilizing the NumPy Allocator API.
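
For concreteness, the allocator core behind such an example (e.g. compiled into a small shared library and passed in via ctypes) might look like this. This is a minimal sketch against the PyDataMemAllocator signatures; the names are hypothetical, posix_memalign is POSIX-only, and realloc is omitted since aligned realloc needs extra bookkeeping:

#include <stdlib.h>
#include <string.h>

/* 64-byte-aligned malloc/calloc/free matching the PyDataMemAllocator
 * signatures; ctx is unused here. Non-static so ctypes can find the symbols. */
void *aligned_malloc(void *ctx, size_t size)
{
    void *p = NULL;
    (void)ctx;
    if (posix_memalign(&p, 64, size ? size : 64) != 0) {
        return NULL;
    }
    return p;
}

void *aligned_calloc(void *ctx, size_t nelem, size_t elsize)
{
    /* overflow check for nelem * elsize omitted for brevity */
    void *p = aligned_malloc(ctx, nelem * elsize);
    if (p != NULL) {
        memset(p, 0, nelem * elsize);
    }
    return p;
}

void aligned_free(void *ctx, void *ptr, size_t size)
{
    (void)ctx;
    (void)size;
    free(ptr);
}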

  • Shared memory (useful for working with large memory allocations in a process parallel friendly way)

This is an example of how we (InAccel) use this API to integrate our shared memory architecture with NumPy. As you will notice, in this case we just open the libcoral-api DLL and pass the desired symbols on to the respective inaccel_allocator class attributes.

@seberg (Member, Author) commented Nov 8, 2021

@eliaskoromilas just wondering if you have a thought on how/whether we should have some versioning (e.g. a version number stored in the struct)?

@jakirkham (Contributor):

Maybe there could be a method for getting that version? Agreed, having a versioned API is important (this API may well change in the future).

@eliaskoromilas (Contributor) commented Nov 9, 2021

@eliaskoromilas just wondering if you have a thought on how/whether we should have some versioning (e.g. a version number stored in the struct)?

#17582 introduced the following API:

PyObject * PyDataMem_GetHandler()
PyObject * PyDataMem_SetHandler(PyObject *handler)

with PyObject representing a capsule of a valid PyDataMem_Handler struct object.

A PyCapsule is actually just a named wrapper around a pointer, with reference count capabilities and an optional destructor. In the current implementation, the NumPy API accepts/returns capsules with the name "mem_handler", containing PyDataMem_Handler pointers.
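
For concreteness, wiring a handler through this capsule API might look like this (a sketch; the my_* allocator functions are hypothetical, declared here with bodies omitted):

#include <Python.h>
#include <numpy/arrayobject.h>

/* Allocator functions with the PyDataMemAllocator signatures; bodies omitted. */
static void *my_malloc(void *ctx, size_t size);
static void *my_calloc(void *ctx, size_t nelem, size_t elsize);
static void *my_realloc(void *ctx, void *ptr, size_t new_size);
static void my_free(void *ctx, void *ptr, size_t size);

static PyDataMem_Handler my_handler = {
    "my_allocator",
    { NULL, my_malloc, my_calloc, my_realloc, my_free },
};

static int install_my_handler(void)
{
    PyObject *capsule = PyCapsule_New(&my_handler, "mem_handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    /* PyDataMem_SetHandler returns the previously installed handler. */
    PyObject *old = PyDataMem_SetHandler(capsule);
    Py_DECREF(capsule);
    if (old == NULL) {
        return -1;
    }
    Py_DECREF(old);
    return 0;
}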

I think it's important to mark this as Stable ABI (promising backwards compatibility, deprecation period, etc.). Fortunately, there is a way to make this happen through the use of capsule names. I've already proposed this in a comment, but let me explain here in more detail.

Let's assume that in the future there is a need to introduce a memcpy func. To allow backwards compatibility, the new version of the handler should be defined in a new struct (let's just add a "2" suffix):

typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void* (*memcpy) (void *ctx, void *dest, void *src, size_t n); /* this is the new func */
    void (*free) (void *ctx, void *ptr, size_t size);
} PyDataMemAllocator2;

typedef struct {
    char name[128];  /* multiple of 64 to keep the struct aligned */
    PyDataMemAllocator2 allocator;
} PyDataMem_Handler2;

In order to allow both the old and the new struct to co-exist, we need to be able to distinguish them. The capsule names come in handy here and can play the role of the version identifier. Since capsules are the way to pass handlers around, we can just use a different capsule name (e.g. "mem_handler2") for the new struct objects.

Now, PyDataMem_{G|S}etHandler may accept/return capsules named either "mem_handler" or "mem_handler2", containing PyDataMem_Handler or PyDataMem_Handler2 struct objects, respectively.

Benefits:

  • A Python user can easily tell which handler version an allocator supports:

>>> my_allocator.handler()
<capsule object "mem_handler2" at 0x7f1326654c90>

  • An external NumPy allocator library can easily upgrade to the new version just by adopting the new struct and updating the capsule name:

# let's also assume that the library wants to support both versions,
# but also set the newest supported one as the default (my_allocator)

# my_allocator_v1 contains a "mem_handler" capsule
# my_allocator_v2 contains a "mem_handler2" capsule

if version.parse(numpy.__version__) < version.parse('X.Y.Z'):
    my_allocator = my_allocator_v1
else:
    my_allocator = my_allocator_v2

  • NumPy can exploit the PyCapsule API to handle all the accepted handler versions:

/* handler here is a capsule provided by the user */

if (PyCapsule_IsValid(handler, "mem_handler")) {
    PyDataMem_Handler *mem_handler = (PyDataMem_Handler *) PyCapsule_GetPointer(handler, "mem_handler");
    /* allocator actions, for memcpy use the default function */
} else if (PyCapsule_IsValid(handler, "mem_handler2")) {
    PyDataMem_Handler2 *mem_handler2 = (PyDataMem_Handler2 *) PyCapsule_GetPointer(handler, "mem_handler2");
    /* allocator actions */
} else {
    /* unknown version */
}

/*
Or, to avoid running these checks multiple times, only PyDataMem_SetHandler
could perform them, (if needed) transforming the handler to a v2 handler
(in this case, setting a default value for the missing `memcpy` field),
and then storing the handler in the context-local var.
*/

To summarize, the capsule API that the ENH introduced can support versioning through capsule names. In my opinion it would be redundant to have a version field in the handler structs, since there is already a way to "label" the handler capsules.

@seberg (Member, Author) commented Nov 9, 2021

Hmm, means we need to do if/else chains. Although, I guess you could also get the capsule name and do a strncmp only on the first part (up to the version), then check the version.
My main thought was that it might be nice if alloc and free don't bloat for this, but I suppose that is possible.
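
A rough sketch of that name-prefix check (assuming names of the form "mem_handler", "mem_handler2", ...):

#include <stdlib.h>
#include <string.h>
#include <Python.h>

/* Returns the handler version encoded in the capsule name, or -1 if the
 * capsule is not a handler capsule; a bare "mem_handler" counts as version 1. */
static int handler_version(PyObject *handler)
{
    const char *name = PyCapsule_GetName(handler);
    const size_t prefix_len = strlen("mem_handler");
    if (name == NULL || strncmp(name, "mem_handler", prefix_len) != 0) {
        return -1;
    }
    return name[prefix_len] == '\0' ? 1 : atoi(name + prefix_len);
}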

@leofang (Contributor) commented Nov 9, 2021

IIRC @seberg isn't a fan of using the capsule name to contain operational information 😄 But I can live with that.

One nitpick on @eliaskoromilas's example is that PyDataMemAllocator2 should be made ABI compatible with PyDataMemAllocator by adding memcpy to the end of the struct. Extra safety should always be preferred.

@seberg (Member, Author) commented Nov 9, 2021

Yeah, but I can be convinced. I don't like that if/else chain, but now I realize it can probably be somewhat avoided. Not a huge fan, but so long as the version is the only thing we save it is probably fine.
And I don't really expect more versions than maybe 2 and 3 ;).

And I suppose... we could even "deprecate" versions effectively, by making a very quick check in the __dealloc__, which is guaranteed to get called.

EDIT: And yes, I assume any newer version is ABI compatible with all older ones.

@eliaskoromilas (Contributor):

IIRC @seberg isn't a fan of using the capsule name to contain operational information 😄 But I can live with that.

I'm not a huge fan either. In the capsule name solution there is no compatibility between the different handler structs. In other words, we don't have "versions" but "identifiers".

One nitpick on @eliaskoromilas's example is that PyDataMemAllocator2 should be made ABI compatible with PyDataMemAllocator by adding memcpy to the end of the struct. Extra safety should be always preferred.

I intentionally messed with the existing fields to highlight this behavior. If we choose to go with this solution, "mem_handler2" capsules won't be supported by NumPy versions that only know how to handle "mem_handler" capsules, etc. User libraries need to make sure that their allocator is compatible with the installed NumPy version.

EDIT: And yes, I assume any newer version is ABI compatible with all older ones.

There is an alternate solution, of course, that focuses exactly on that. In this solution:

  • we (MUST) keep the capsule name fixed ("mem_handler")
  • we add a version field before/after the name field in the handler struct (needs to be done immediately, before 1.22.0)
  • newer NumPy versions are allowed only to append fields/funcs to the allocator struct

This means that user libraries may set 1.22.0 as their minimum required NumPy version, but still update their handler structs to match the latest NumPy release.
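
In code, that alternate layout might look something like this (a sketch only; exact field sizes and placement are up for discussion, and PyDataMemAllocator is the function-pointer struct shown earlier):

#include <stdint.h>

/* The capsule name stays "mem_handler"; the version byte is carved out of
 * the name field so the header keeps its 128-byte size. NumPy would use the
 * version to decide which allocator fields are valid. */
typedef struct {
    char name[127];
    uint8_t version;  /* bumped whenever funcs are appended to the allocator */
    PyDataMemAllocator allocator;
} PyDataMem_Handler;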

Hmm, means we need to do if/else chains.

if/else chains would also be required in this solution, if we want newer NumPy versions to accept older handler/allocator structs. Newer NumPy versions still need to know whether (and which) functions are missing from a handler, and apply defaults. Of course there won't be string comparisons :).

And I don't really expect more versions than maybe 2 and 3 ;).

I agree.

TL;DR: If we think that future versions are (most probably) going to be extensions of the current struct (e.g. extra functions), a version field will simplify its maintenance a lot (on both the user and NumPy sides). Capsule names would be useful in a more complex scenario.

@jakirkham (Contributor):

Matti added PR (#20343) to include versioning. Would be great if others here took a look 🙂

@charris (Member) commented Nov 16, 2021

Is it OK to push this off? We now have versioning, and I don't think all the tasks here will be finished in time.

@seberg (Member, Author) commented Nov 16, 2021

Yeah, should be good now.

@mattip (Member) commented May 18, 2022

Closing; I think we have resolved enough of these problems. Please reopen or open a new issue.

@mattip closed this as completed May 18, 2022
@hmaarrfk (Contributor):

By any chance, can you point us to the documentation for the new API?

@mattip (Member) commented May 18, 2022

The documentation added is https://numpy.org/devdocs/reference/c-api/data_memory.html, and there is also the NEP https://numpy.org/neps/nep-0049.html
