Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support more integer types in PyMemberDef #117031

Open
serhiy-storchaka opened this issue Mar 19, 2024 · 14 comments
Open

Support more integer types in PyMemberDef #117031

serhiy-storchaka opened this issue Mar 19, 2024 · 14 comments
Labels
3.13 bugs and security fixes topic-C-API type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Mar 19, 2024

I plan to add more wrappers for Posix structures. Many of them use types not supported in PyMemberDef, like uint32_t or pid_t. So the first step is to add support of many standard C and Posix integer types.

gh-114388 and gh-115011 were preparations to this step.

The open question: should we assign new codes for new types or define them as aliases to existing types if they are already exist (for example, Py_T_PID can be an alias of Py_T_INT, Py_T_LONG or Py_T_LONGLONG, depending on platform).

Linked PRs

@serhiy-storchaka serhiy-storchaka added topic-C-API 3.13 bugs and security fixes labels Mar 19, 2024
@serhiy-storchaka
Copy link
Member Author

cc @vstinner, @encukou

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Mar 19, 2024
Add support for standard C and Posix integer types like Py_T_UINT32,
Py_T_PTRDIFF, Py_T_OFF and Py_T_PID.
Add Py_T_SSIZE as alias of Py_T_PYSSIZET.
@encukou
Copy link
Member

encukou commented Mar 20, 2024

Do we need this in 3.13?
I would much rather add support for fields of custom size (and perhaps alignment/signedness/endianness), than special-case only the C/Posix types.

@vstinner
Copy link
Member

I like the idea of adding stdint.h types: int8/16/32/64_t, uint8/16/32/64_t, size_t and ssize_t. I'm also fine with intptr_t and uintptr_t since it's the recommended way to store an integer as a pointer in C.

I'm less sure about 128-bit flavor: is it supposed by all platforms supposed by Python? Currently, we don't use intmax_t. I would prefer to also leave this one aside for now.

Also, I'm not sure that we need aliases such as ptrdiff_t. I prefer to not add it yet.

For Unix types, "off_t" and pid_t", can we not add them as PyMemberDef types, but use the macro black magic comparing their SIZEOF to select the corresponding int or uint type with a size, such as int32_t for pid_t? If we add off_t, pid_t, what about uid_t and gid_t? and file descriptor? and socket handle? etc. System programming uses a tons of types.

In short, I prefer to start with a bare minimum set of types.

I would much rather add support for fields of custom size (and perhaps alignment/signedness/endianness),

I'm not aware of any type which requires a specific alignment. Usually, it "just" works :-)

Endianness: are you aware of an object which requires endianness conversion? Honestly, for "custom" use cases, just use getter and setter functions, no?

@serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label Mar 27, 2024
@serhiy-storchaka
Copy link
Member Author

I need this to create wrappers of C structures. Several issues wait for this feature.

How do you specify the custom size/alignment/signedness/endianness? There is no place for this in PyMemberDef.

struct PyMemberDef {
    const char *name;
    int type;
    Py_ssize_t offset;
    int flags;
    const char *doc;
};

You can only encode this as a part of type or flags.

@serhiy-storchaka
Copy link
Member Author

Support of off_t and pid_t is needed for F_SETLK, F_SETLKW, and F_GETLK.

           struct flock {
               ...
               short l_type;    /* Type of lock: F_RDLCK,
                                   F_WRLCK, F_UNLCK */
               short l_whence;  /* How to interpret l_start:
                                   SEEK_SET, SEEK_CUR, SEEK_END */
               off_t l_start;   /* Starting offset for lock */
               off_t l_len;     /* Number of bytes to lock */
               pid_t l_pid;     /* PID of process blocking our lock
                                   (set by F_GETLK and F_OFD_GETLK) */
               ...
           };

(on different platforms the types and the order of fields can be different).

Support of pid_t is needed for F_GETOWN_EX and F_SETOWN_EX.

                  struct f_owner_ex {
                      int   type;
                      pid_t pid;
                  };

@vstinner
Copy link
Member

For me, it would be easier to review if your PR only add intX_t/uintX_t types. Later, we can discuss other types.


I understand the need for Unix types such as pid_t. My question is more if we can implement them as aliases to intX_t or uintX_t:

#if SIZEOF_PID_T == 8
#  define Py_T_PID Py_T_INT64
#elif SIZEOF_PID_T == 4
#  define Py_T_PID Py_T_INT32
// ... add more cases if needed ...
#else
#  error "unsupported pid_t type size"
#endif

The problem of this approach is that configure doesn't check if types are signed or not.

I see the that your implementation does it partially:

#ifdef MS_WINDOWS
# define Py_T_OFF      Py_T_LONGLONG
#else
# define Py_T_OFF      35
#endif
#define Py_T_PID 36

and:

#ifndef MS_WINDOWS
    case Py_T_OFF:
        v = _PyLong_FromByteArray((const unsigned char *)addr,
                                  sizeof(off_t), PY_LITTLE_ENDIAN, 1);
        break;
#endif
    case Py_T_PID:
        v = PyLong_FromPid(*(pid_t*)addr);
        break;

@encukou
Copy link
Member

encukou commented Mar 27, 2024

Let's go with size and signedness only, then. (Ignore alignment/endian, I was too fast to type that.)

Let's say that if ((type) & 0xFF00) >> 8 is non-zero, it gives the size of the type, and type & 1 is signedness. The other bits need to be 0. To define the members, we can add a macro like Py_T_INTEGER, so that Py_T_INTEGER(int32_t) gives you 0x401 and Py_T_INTEGER(uint32_t) gives 0x400.
Then anyone can add their favourite integer typedef, we're not limited to C/Posix ones.

@serhiy-storchaka
Copy link
Member Author

I was thinking about a similar idea when assigned values to new code types. It would be nice to encode signedness in the lowest bit, but unfortunately the existing code types do not follow any system.

As for the macro, how do you determine the signedness of the C type?

The larger issue is that there are more than two options for signedness. For example, -1 is a special value for uid_t and gid_t, but on some platforms these types are unsigned, and support values larger that the maximal value of the signed type of the same size. So _PyLong_FromUid and _Py_Uid_Converter are special, they cannot be made an alias of the converter for the standard C integer type. And we may need special converters for other Posix types, like dev_t (#89928, #113078).

Note also that the existing code types were not consistent with this. Only recently it was partially fixed, but there is more work, and it will take several releases to finish:

This is why I started with adding type codes for concrete C types. It is easier to specialize code for every concrete type. I am open to the idea of making some of new type codes aliases to other type codes if they do not need special converters yet, but we should decide what they should use -- standard integer C types (short, int, long, etc) or new fixed-size integer C types (intXX_t).

@encukou
Copy link
Member

encukou commented Mar 28, 2024

the existing code types do not follow any system.

That's fine. There'll always be exceptions, like PyObject* needing refcounting.
My proposed scheme allows exceptions when ((type) & 0xFF00) is zero. So all the existing codes are fine, and there's space for a couple hundred more.

how do you determine the signedness of the C type?

(((type) -1) < 0)? Perhaps there's some magic needed to avoid compiler warnings.

The larger issue is that there are more than two options for signedness. For example, -1 is a special value for uid_t and gid_t

More generally, something the C type system doesn't capture is whether negative Python integers can be stored in unsigned fields. Sadly, the current member types don't care about this property. Sometimes that's an error, but sometimes it's useful to allow this -- @zooba likes to give Windows error codes as examples here.
I see you've added a warning about it for Py_T_UINT; I wish that got some more review.

My proposed scheme has 7 bits left for details like this.


Anyway: IMO, at the very least we should do a systematic approach (with the macro) for the intNN_t and uintNN_t types -- those should all behave the same, and should match Py T_INT/Py_T_UINT in handling signedness.

@vstinner
Copy link
Member

Let's say that if ((type) & 0xFF00) >> 8 is non-zero, it gives the size of the type, and type & 1 is signedness.

What's the use case for that? Existing types don't implement it. I'm not convinced that it's useless.

Note: use _Py_IS_TYPE_SIGNED() to check if a type is signed or not, but it doesn't work in the preprocessor (#if), only in C.

@encukou
Copy link
Member

encukou commented Mar 28, 2024

What's the use case for that?

It allows the second part of the proposal: a Py_T_INTEGER(type) macro that anyone can use to support any integral type, not just the C/POSIX ones.

@vstinner
Copy link
Member

It allows the second part of the proposal: a Py_T_INTEGER(type) macro that anyone can use to support any integral type, not just the C/POSIX ones.

You cannot write a macro which checks the signedness of a type. I tested, the preprocessor cannot test it.

Do you have a working implementation (or just a proof-of-concept)?

@encukou
Copy link
Member

encukou commented Mar 28, 2024

You don't need to test this in the preprocessor. The macro only needs to produce a value to put in the memberdef.

#include <stdio.h>
#include <stdint.h>

#define Py_T_INTEGER(type) ((sizeof(type) << 8) | ((type)-1 < 0 ? 0 : 1))

int main() {
    printf("%04x\n", Py_T_INTEGER(int8_t));
    printf("%04x\n", Py_T_INTEGER(uint8_t));
    printf("%04x\n", Py_T_INTEGER(int16_t));
    printf("%04x\n", Py_T_INTEGER(uint16_t));
    printf("%04x\n", Py_T_INTEGER(int32_t));
    printf("%04x\n", Py_T_INTEGER(uint32_t));
    printf("%04x\n", Py_T_INTEGER(int64_t));
    printf("%04x\n", Py_T_INTEGER(uint64_t));
}

@zooba
Copy link
Member

zooba commented Mar 28, 2024

A set of bits that can map into a PyLong_AsNativeBytes call would make sense.

The "sign" bit needs to be two bits to handle the cases:

  • allow/fail negative input values
  • allow/fail positive input values with MSB set

(I need to poke on #116053 to figure out how best to handle the extra argument for the second one of these in AsNativeBytes.)

An unsigned pid_t that allows assigning -1 would be "allow" for both, which is the case that isn't covered by the basic signed/unsigned definitions.

I like Petr's idea of taking the second byte for size and to indicate that the lower byte is flags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants