Support more integer types in PyMemberDef #117031

serhiy-storchaka · 2024-03-19T19:10:04Z

I plan to add more wrappers for Posix structures. Many of them use types not supported in PyMemberDef, like uint32_t or pid_t. So the first step is to add support of many standard C and Posix integer types.

gh-114388 and gh-115011 were preparations to this step.

The open question: should we assign new codes for new types or define them as aliases to existing types if they are already exist (for example, Py_T_PID can be an alias of Py_T_INT, Py_T_LONG or Py_T_LONGLONG, depending on platform).

Linked PRs

gh-117031: Add support for new member types for PyMemberDef.type #117032

The text was updated successfully, but these errors were encountered:

serhiy-storchaka · 2024-03-19T19:10:32Z

cc @vstinner, @encukou

Add support for standard C and Posix integer types like Py_T_UINT32, Py_T_PTRDIFF, Py_T_OFF and Py_T_PID. Add Py_T_SSIZE as alias of Py_T_PYSSIZET.

encukou · 2024-03-20T10:36:38Z

Do we need this in 3.13?
I would much rather add support for fields of custom size (and perhaps alignment/signedness/endianness), than special-case only the C/Posix types.

vstinner · 2024-03-22T08:52:52Z

I like the idea of adding stdint.h types: int8/16/32/64_t, uint8/16/32/64_t, size_t and ssize_t. I'm also fine with intptr_t and uintptr_t since it's the recommended way to store an integer as a pointer in C.

I'm less sure about 128-bit flavor: is it supposed by all platforms supposed by Python? Currently, we don't use intmax_t. I would prefer to also leave this one aside for now.

Also, I'm not sure that we need aliases such as ptrdiff_t. I prefer to not add it yet.

For Unix types, "off_t" and pid_t", can we not add them as PyMemberDef types, but use the macro black magic comparing their SIZEOF to select the corresponding int or uint type with a size, such as int32_t for pid_t? If we add off_t, pid_t, what about uid_t and gid_t? and file descriptor? and socket handle? etc. System programming uses a tons of types.

In short, I prefer to start with a bare minimum set of types.

I would much rather add support for fields of custom size (and perhaps alignment/signedness/endianness),

I'm not aware of any type which requires a specific alignment. Usually, it "just" works :-)

Endianness: are you aware of an object which requires endianness conversion? Honestly, for "custom" use cases, just use getter and setter functions, no?

serhiy-storchaka · 2024-03-27T08:54:21Z

I need this to create wrappers of C structures. Several issues wait for this feature.

How do you specify the custom size/alignment/signedness/endianness? There is no place for this in PyMemberDef.

struct PyMemberDef {
    const char *name;
    int type;
    Py_ssize_t offset;
    int flags;
    const char *doc;
};

You can only encode this as a part of type or flags.

serhiy-storchaka · 2024-03-27T09:20:53Z

Support of off_t and pid_t is needed for F_SETLK, F_SETLKW, and F_GETLK.

           struct flock {
               ...
               short l_type;    /* Type of lock: F_RDLCK,
                                   F_WRLCK, F_UNLCK */
               short l_whence;  /* How to interpret l_start:
                                   SEEK_SET, SEEK_CUR, SEEK_END */
               off_t l_start;   /* Starting offset for lock */
               off_t l_len;     /* Number of bytes to lock */
               pid_t l_pid;     /* PID of process blocking our lock
                                   (set by F_GETLK and F_OFD_GETLK) */
               ...
           };

(on different platforms the types and the order of fields can be different).

Support of pid_t is needed for F_GETOWN_EX and F_SETOWN_EX.

                  struct f_owner_ex {
                      int   type;
                      pid_t pid;
                  };

vstinner · 2024-03-27T12:58:12Z

For me, it would be easier to review if your PR only add intX_t/uintX_t types. Later, we can discuss other types.

I understand the need for Unix types such as pid_t. My question is more if we can implement them as aliases to intX_t or uintX_t:

#if SIZEOF_PID_T == 8
#  define Py_T_PID Py_T_INT64
#elif SIZEOF_PID_T == 4
#  define Py_T_PID Py_T_INT32
// ... add more cases if needed ...
#else
#  error "unsupported pid_t type size"
#endif

The problem of this approach is that configure doesn't check if types are signed or not.

I see the that your implementation does it partially:

#ifdef MS_WINDOWS
# define Py_T_OFF      Py_T_LONGLONG
#else
# define Py_T_OFF      35
#endif
#define Py_T_PID 36

and:

#ifndef MS_WINDOWS
    case Py_T_OFF:
        v = _PyLong_FromByteArray((const unsigned char *)addr,
                                  sizeof(off_t), PY_LITTLE_ENDIAN, 1);
        break;
#endif
    case Py_T_PID:
        v = PyLong_FromPid(*(pid_t*)addr);
        break;

encukou · 2024-03-27T14:20:48Z

Let's go with size and signedness only, then. (Ignore alignment/endian, I was too fast to type that.)

Let's say that if ((type) & 0xFF00) >> 8 is non-zero, it gives the size of the type, and type & 1 is signedness. The other bits need to be 0. To define the members, we can add a macro like Py_T_INTEGER, so that Py_T_INTEGER(int32_t) gives you 0x401 and Py_T_INTEGER(uint32_t) gives 0x400.
Then anyone can add their favourite integer typedef, we're not limited to C/Posix ones.

serhiy-storchaka · 2024-03-27T17:45:38Z

I was thinking about a similar idea when assigned values to new code types. It would be nice to encode signedness in the lowest bit, but unfortunately the existing code types do not follow any system.

As for the macro, how do you determine the signedness of the C type?

The larger issue is that there are more than two options for signedness. For example, -1 is a special value for uid_t and gid_t, but on some platforms these types are unsigned, and support values larger that the maximal value of the signed type of the same size. So _PyLong_FromUid and _Py_Uid_Converter are special, they cannot be made an alias of the converter for the standard C integer type. And we may need special converters for other Posix types, like dev_t (#89928, #113078).

Note also that the existing code types were not consistent with this. Only recently it was partially fixed, but there is more work, and it will take several releases to finish:

This is why I started with adding type codes for concrete C types. It is easier to specialize code for every concrete type. I am open to the idea of making some of new type codes aliases to other type codes if they do not need special converters yet, but we should decide what they should use -- standard integer C types (short, int, long, etc) or new fixed-size integer C types (intXX_t).

encukou · 2024-03-28T08:24:13Z

the existing code types do not follow any system.

That's fine. There'll always be exceptions, like PyObject* needing refcounting.
My proposed scheme allows exceptions when ((type) & 0xFF00) is zero. So all the existing codes are fine, and there's space for a couple hundred more.

how do you determine the signedness of the C type?

(((type) -1) < 0)? Perhaps there's some magic needed to avoid compiler warnings.

The larger issue is that there are more than two options for signedness. For example, -1 is a special value for uid_t and gid_t

More generally, something the C type system doesn't capture is whether negative Python integers can be stored in unsigned fields. Sadly, the current member types don't care about this property. Sometimes that's an error, but sometimes it's useful to allow this -- @zooba likes to give Windows error codes as examples here.
I see you've added a warning about it for Py_T_UINT; I wish that got some more review.

My proposed scheme has 7 bits left for details like this.

Anyway: IMO, at the very least we should do a systematic approach (with the macro) for the intNN_t and uintNN_t types -- those should all behave the same, and should match Py T_INT/Py_T_UINT in handling signedness.

vstinner · 2024-03-28T08:31:32Z

Let's say that if ((type) & 0xFF00) >> 8 is non-zero, it gives the size of the type, and type & 1 is signedness.

What's the use case for that? Existing types don't implement it. I'm not convinced that it's useless.

Note: use _Py_IS_TYPE_SIGNED() to check if a type is signed or not, but it doesn't work in the preprocessor (#if), only in C.

encukou · 2024-03-28T08:48:52Z

What's the use case for that?

It allows the second part of the proposal: a Py_T_INTEGER(type) macro that anyone can use to support any integral type, not just the C/POSIX ones.

vstinner · 2024-03-28T11:20:22Z

It allows the second part of the proposal: a Py_T_INTEGER(type) macro that anyone can use to support any integral type, not just the C/POSIX ones.

You cannot write a macro which checks the signedness of a type. I tested, the preprocessor cannot test it.

Do you have a working implementation (or just a proof-of-concept)?

encukou · 2024-03-28T12:38:39Z

You don't need to test this in the preprocessor. The macro only needs to produce a value to put in the memberdef.

#include <stdio.h>
#include <stdint.h>

#define Py_T_INTEGER(type) ((sizeof(type) << 8) | ((type)-1 < 0 ? 0 : 1))

int main() {
    printf("%04x\n", Py_T_INTEGER(int8_t));
    printf("%04x\n", Py_T_INTEGER(uint8_t));
    printf("%04x\n", Py_T_INTEGER(int16_t));
    printf("%04x\n", Py_T_INTEGER(uint16_t));
    printf("%04x\n", Py_T_INTEGER(int32_t));
    printf("%04x\n", Py_T_INTEGER(uint32_t));
    printf("%04x\n", Py_T_INTEGER(int64_t));
    printf("%04x\n", Py_T_INTEGER(uint64_t));
}

zooba · 2024-03-28T17:15:15Z

A set of bits that can map into a PyLong_AsNativeBytes call would make sense.

The "sign" bit needs to be two bits to handle the cases:

allow/fail negative input values
allow/fail positive input values with MSB set

(I need to poke on #116053 to figure out how best to handle the extra argument for the second one of these in AsNativeBytes.)

An unsigned pid_t that allows assigning -1 would be "allow" for both, which is the case that isn't covered by the basic signed/unsigned definitions.

I like Petr's idea of taking the second byte for size and to indicate that the lower byte is flags.

serhiy-storchaka added topic-C-API 3.13 bugs and security fixes labels Mar 19, 2024

bedevere-app bot mentioned this issue Mar 19, 2024

gh-117031: Add support for new member types for PyMemberDef.type #117032

Open

serhiy-storchaka added the type-feature A feature request or enhancement label Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support more integer types in PyMemberDef #117031

Support more integer types in PyMemberDef #117031

serhiy-storchaka commented Mar 19, 2024 •

edited by bedevere-app bot

serhiy-storchaka commented Mar 19, 2024

encukou commented Mar 20, 2024

vstinner commented Mar 22, 2024

serhiy-storchaka commented Mar 27, 2024

serhiy-storchaka commented Mar 27, 2024

vstinner commented Mar 27, 2024

encukou commented Mar 27, 2024

serhiy-storchaka commented Mar 27, 2024

encukou commented Mar 28, 2024

vstinner commented Mar 28, 2024

encukou commented Mar 28, 2024

vstinner commented Mar 28, 2024

encukou commented Mar 28, 2024

zooba commented Mar 28, 2024

Support more integer types in PyMemberDef #117031

Support more integer types in PyMemberDef #117031

Comments

serhiy-storchaka commented Mar 19, 2024 • edited by bedevere-app bot

Linked PRs

serhiy-storchaka commented Mar 19, 2024

encukou commented Mar 20, 2024

vstinner commented Mar 22, 2024

serhiy-storchaka commented Mar 27, 2024

serhiy-storchaka commented Mar 27, 2024

vstinner commented Mar 27, 2024

encukou commented Mar 27, 2024

serhiy-storchaka commented Mar 27, 2024

encukou commented Mar 28, 2024

vstinner commented Mar 28, 2024

encukou commented Mar 28, 2024

vstinner commented Mar 28, 2024

encukou commented Mar 28, 2024

zooba commented Mar 28, 2024

serhiy-storchaka commented Mar 19, 2024 •

edited by bedevere-app bot