Conversation

@mattip (Member) commented Nov 9, 2021

Related to #20193, where discussion seemed to settle on adding an explicit version to the PyDataMem_Handler. One byte is stolen from the name field. A get_handler_version function was added in the same private namespace as get_handler_name.

I also updated the NEP and the documentation.

Decisions to discuss:

  • reject the whole idea
  • use a different-sized field
  • expose a single get_handler_info function returning a (name, version) tuple.

@eliaskoromilas @jakirkham @seberg, thoughts?

@jakirkham (Contributor):

cc @leofang

@seberg (Member) commented Nov 10, 2021

Copying out that the struct is now:

    typedef struct {
        char name[127];  /* multiple of 64 to keep the struct aligned */
        uint8_t version; /* currently 1 */
        PyDataMemAllocator allocator;
    } PyDataMem_Handler;

just for visibility.

I am happy with this. I don't really expect this will be super important, but better to do this slightly annoying thing now than jump through hoops later.

From my side this is all good, but if anyone has suggestions for a slightly different struct layout, etc., I'm happy to hear them!

Otherwise, this should go into NumPy 1.22, and with that the handler API is de facto finalized.

@seberg added this to the 1.22.0 release milestone Nov 10, 2021
@leofang (Contributor) commented Nov 10, 2021

@mattip @seberg Maybe this is irrelevant to this PR (and if so I am happy to discuss offline), but this seems like a good time to raise it since we are changing the struct: we were discussing NumPy's handler design internally, and a question came up regarding the alignment requirement of the struct. Why do we need 128 bytes for name instead of 64 or fewer? Moreover, are we sure it's aligned as intended without any compiler-specific instructions?
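For concreteness, here is a minimal sketch (not part of this PR) of how the intended layout can be checked at compile time with C11 static assertions. The placeholder PyDataMemAllocator stands in for the real struct just to keep the example self-contained:

    #include <assert.h>  /* static_assert (C11) */
    #include <stddef.h>  /* offsetof */
    #include <stdint.h>

    /* Placeholder with pointer alignment; the real PyDataMemAllocator
       holds ctx plus the malloc/calloc/realloc/free function pointers. */
    typedef struct { void *ctx; } PyDataMemAllocator;

    typedef struct {
        char name[127];  /* multiple of 64 to keep the struct aligned */
        uint8_t version; /* currently 1 */
        PyDataMemAllocator allocator;
    } PyDataMem_Handler;

    /* name + version fill exactly 128 bytes, so allocator starts at a
       64-byte multiple *within* the struct. Note this fixes offsets only;
       the struct object itself is merely pointer-aligned unless declared
       with something like alignas(64). */
    static_assert(offsetof(PyDataMem_Handler, version) == 127,
                  "version occupies the byte stolen from name");
    static_assert(offsetof(PyDataMem_Handler, allocator) == 128,
                  "allocator starts at a 64-byte offset");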

@seberg (Member) commented Nov 10, 2021

@leofang my guess was/is that it does not matter how long the name is, since it will be on a different cache line anyway (and the name is probably rarely checked, at least not in the contexts where free/dealloc is). So a bit of "empty space" between the two seems fine and gives you more room for weird names.

Putting the version where it is right now puts it outside the cache line of the other attributes. That could matter, but I assume any new functions are unlikely to be called as much as alloc and dealloc (and we can still steal space from the name if we want to).

Moreover, are we sure it's aligned as intended without any compiler-specific instructions?

No, I am not sure about it, but I did not investigate or care too much. I think you are right, and this would also make sense in some other places in NumPy.

I do not think it belongs in this PR, but I would be happy about a follow-up PR or pointers on adding something like that. (But I don't think we should enforce it; it's just an optimization for NumPy's allocator that we can also put into the docs so others can use it.)
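To illustrate the kind of compiler-specific instruction under discussion (a sketch only; nothing in this PR does this), C11 alignas from <stdalign.h> can pin a handler instance to a cache-line boundary:

    #include <stdalign.h>

    /* Hypothetical handler instance aligned to a 64-byte cache line;
       the allocator function pointers would be filled in for real use. */
    static alignas(64) PyDataMem_Handler my_handler = {
        .name = "my_aligned_handler",  /* illustrative name */
        .version = 1,
        /* .allocator = { ... } */
    };

As noted above, this would be an optimization hint for implementers to copy from the docs, not something NumPy enforces.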

@jakirkham (Contributor) commented Nov 10, 2021

Thanks Matti! 😄 Generally this seems reasonable. Also agree it would be good to get this into NumPy 1.22.

What happens when an allocator version goes beyond 255? Or do we think this is unlikely to happen?

Would be great to get @eliaskoromilas' perspective given he's looked at how to bind this stuff into Python 🙂

@mattip (Member, Author) commented Nov 10, 2021

What happens when an allocator version goes beyond 255? Or do we think this is unlikely to happen?

It will probably never happen. If it does, I would hope that by then we can re-use one of the older version numbers with a new name.

As long as we are changing things, I am also happy to drop the name to 63 bytes; that seems like enough. Thoughts?
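For reference, that alternative would pack name and version into a single cache line. A sketch (the _Alt suffix is purely illustrative, not a proposed name):

    /* 63 name bytes + 1 version byte = one 64-byte cache line;
       the allocator then starts on the next cache line. */
    typedef struct {
        char name[63];
        uint8_t version;
        PyDataMemAllocator allocator;
    } PyDataMem_Handler_Alt;  /* hypothetical name */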

@eliaskoromilas (Contributor) commented Nov 10, 2021

Thanks @mattip, that was quick. It looks good to me.

What happens when an allocator version goes beyond 255? Or do we think this is unlikely to happen?

256 versions would mean an allocator struct with at least 259 (4 + 255) fields/funcs: version 1 has 4 functions, and each bump presumably adds at least one. I can only think of a few candidates:

  • reallocarray (? maybe not)
  • {c|m|re}allocate_at_least similar to C++ allocate_at_least
  • handle_{c|m|re}alloc_error similar to Rust handle_alloc_error
  • memcpy
  • memset
  • maybe more <string.h> mem* functions?

I believe a byte should be enough.

As long as we are changing things, I am also happy to drop the name to 63 bytes; that seems like enough. Thoughts?

I think name is just informational. 63 characters sounds like a lot.


Not entirely related to this PR, but should we also add a capsule-name "fail early" check in PyDataMem_SetHandler? I tested passing a capsule with a wrong name, and it fails with a MemoryError at the first malloc/calloc but without an error message. If we don't want that check, should we at least set an error string before returning from the PyDataMem_User* functions:

PyDataMem_Handler *handler = (PyDataMem_Handler *) PyCapsule_GetPointer(
        mem_handler, "mem_handler");
if (handler == NULL) {
    PyErr_SetString(PyExc_ValueError, "...");
    return NULL;
}

EDIT: Setting an exception there wouldn't work, since the message would always be overridden by PyErr_NoMemory().
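For what it's worth, the fail-early check could be as small as the following sketch (not merged code; validate_mem_handler is a hypothetical helper). PyCapsule_IsValid verifies both the pointer and the capsule name without raising:

    #include <Python.h>

    /* Sketch: reject a misnamed capsule at install time, so the error
       surfaces in PyDataMem_SetHandler instead of as a bare MemoryError
       at the first allocation. */
    static int
    validate_mem_handler(PyObject *mem_handler)
    {
        if (!PyCapsule_IsValid(mem_handler, "mem_handler")) {
            PyErr_SetString(PyExc_ValueError,
                    "handler capsule must be named 'mem_handler'");
            return -1;
        }
        return 0;
    }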

@seberg (Member) commented Nov 10, 2021

Let's not worry about the ABI version number going beyond that; if we bump it once a year, someone in 250 years will probably have an idea about how to extend it ;).

Since @eliaskoromilas seems to be happy, going to put this in.

Happy to follow up, though (just remember that the 1.22 timeline means that in a month it might get weird to do fixups).

@seberg merged commit 8dbd507 into numpy:main Nov 10, 2021
@jakirkham (Contributor):

Sounds good.

Don't have strong thoughts on the length of the name field, but having some length there seems useful. People may want to create their own specialized allocators for different purposes. Also, as Matti notes, it can serve as overflow space for the version (if we get to that).

@mattip deleted the alloc-version branch December 27, 2022 16:49