C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions #111140

vstinner · 2023-10-20T21:39:02Z

Feature or enhancement

The private _PyLong_AsByteArray() and _PyLong_FromByteArray() functions were removed in Python 3.13: see PR #108429.

@scoder asked what is the intended replacement for _PyLong_FromByteArray().

The replacement for _PyLong_FromByteArray() is PyObject_CallMethod((PyObject*)&PyLong_Type, "from_bytes", "y#s", str, len, "big"), but I'm not sure what the easy way is to set the signed parameter to True (default: signed=False).

The replacement for _PyLong_AsByteArray() is PyObject_CallMethod(my_int, "to_bytes", "ns", length, "big"). Again, I'm not sure how to easily set the signed parameter to True (default: signed=False).

I propose to add public PyLong_AsByteArray() and PyLong_FromByteArray() functions to the C API.

Python 3.12 modified PyLongObject: it's no longer a simple array of digits, but a less straightforward _PyLongValue structure which requires using unstable functions to access small "compact" values:

PyUnstable_Long_IsCompact()
PyUnstable_Long_CompactValue()

So having a reliable and simple way to import/export a Python int object as bytes became even more important.

A code search for _PyLong_AsByteArray in PyPI top 5,000 projects found 12 projects using it:

Cython (0.29.36)
blspy (2.0.2)
catboost (1.2)
fastobo (0.12.2)
gevent (22.10.2)
guppy3 (3.1.3)
line_profiler (4.0.3)
msgspec (0.16.0)
orjson (3.9.1)
pickle5 (0.0.12)
pyodbc (4.0.39)
rlp (3.0.0)


scoder · 2023-10-21T04:47:41Z

Thanks for creating the issue. I agree that the functions should be added. The current replacements seem awful for this kind of basic functionality. Going through an expensive Python call like this for converting between PyLong and large C integer types (int128_t) seems excessive.

Note that at least a couple of projects that you list use Cython implemented parts and thus probably just mention the function in there. I'm sure something like line_profiler would never end up calling it.

serhiy-storchaka · 2023-10-21T07:55:59Z

It was already discussed several times. This API lacks some features which would be needed for general use. You need to know the size of the resulting bytes array. Calculating it is not trivial, especially for negative integers. Also, it would be more convenient to support "native" endianness, not just "big"/"little".

I have been thinking about implementing an API similar to mpz_import/mpz_export in GMP (https://gmplib.org/manual/Integer-Import-and-Export) or mp_pack/mp_unpack in libtommath (https://github.com/libtom/libtommath). It is powerful and a kind of standard. It is general enough to support the internal PyLong representation and the marshal format used for long integers (15 bits packed in 16-bit words). But it is only for unsigned integers; you are supposed to store the sign separately. It should be extended to support negative integers in several formats, for convenience and for performance.

vstinner · 2023-10-21T08:35:48Z

I'm not sure that passing the endian as a string is efficient if this function is part of hot code.

serhiy-storchaka · 2023-10-21T09:03:26Z

Not as a string. Just 3-variant value native/little/big instead of boolean little/big.

vstinner · 2023-10-21T09:15:01Z

Not as a string. Just 3-variant value native/little/big instead of boolean little/big.

Sorry, I confused the C API (int for the endianness) with the Python API (string for the endianness).

scoder · 2023-10-21T09:35:28Z

It was already discussed several times. This API lacks some features which would be needed for general use.

Ok, then please put the existing function back in until there is a public replacement.

scoder · 2023-10-21T12:30:00Z

I created PR #111162

vstinner · 2023-10-22T21:17:47Z

This API lacks some features which would be needed for general use.

Which features are missing?

Do you have links to previous discussions if it was discussed previously?

vstinner · 2023-10-22T21:23:09Z

You need to know the size of the resulting bytes array.

wcstombs() can be called with a NULL buffer to compute the buffer size. This avoids having to provide a second API just to calculate the buffer size.
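For readers unfamiliar with the pattern, here is a minimal sketch of the two-call wcstombs() idiom. The helper name wide_to_multibyte is hypothetical, and passing a NULL destination to query the size is a POSIX-documented behaviour (also supported by common C libraries) rather than something every C standard guarantees.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to a freshly allocated
   multibyte string. Returns NULL on failure; the caller frees the result. */
char *wide_to_multibyte(const wchar_t *ws)
{
    assert(ws != NULL);

    /* First call: NULL destination, so nothing is written and the return
       value is the number of bytes needed, excluding the terminating NUL. */
    size_t needed = wcstombs(NULL, ws, 0);
    if (needed == (size_t)-1) {
        return NULL;  /* a character cannot be represented in this locale */
    }

    char *out = malloc(needed + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Second call: the buffer is now known to be large enough. */
    if (wcstombs(out, ws, needed + 1) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

The same function serves both as the size query and as the conversion, which is the property being suggested for the PyLong export function.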

I suppose that a common use case is also to convert a Python int object to C int type for which there is no C API. Like int128_t

scoder · 2023-10-23T07:19:24Z

I suppose that a common use case is also to convert a Python int object to C int type for which there is no C API. Like int128_t

Or something like 256 bytes for a crypto key, hash value or similar data type. Probably a known size, so that users can live with an exception if it doesn't fit (because it's an error or can't-handle-that situation). That said, a function to ask for the bit length of the integer value could be helpful in order to find a suitable integer/storage size. And also more generally to estimate the size of an integer value. Both together would probably cover a lot of use cases.

serhiy-storchaka · 2023-10-23T07:31:05Z

https://mail.python.org/archives/list/python-ideas@python.org/thread/V2EKXMKSQV25BMRPMDH47IM2OYCLY2TF/

encukou · 2023-10-23T11:02:21Z

I'm partial to an API like mpz_export that accepts a buffer and length, and can:

Successfully fill the buffer, and output the length
Not fill the buffer, but output the necessary length

This allows handling the common cases with a minimum of function calls:

If the size is already known, you only need one function call
If the caller can pre-allocate a buffer, and the value happens to fit, this is again one function call
If the size is unknown, and a pre-allocated buffer can't be used or turns out to be too small, it's two function calls
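The three cases above can be modelled in plain C. This is only a sketch under stated assumptions: demo_export_u64 is a hypothetical stand-in that exports a uint64_t rather than a Python int, and its "always return the required length" convention is one possible design, not the API that was eventually adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model function, not a CPython API.
   Writes the minimal big-endian encoding of `value` into `buf` if it fits
   in `buflen` bytes. Always returns the number of bytes required, so a
   return value greater than `buflen` means nothing was written. */
size_t demo_export_u64(uint64_t value, unsigned char *buf, size_t buflen)
{
    /* Count the bytes needed; zero still occupies one byte. */
    size_t needed = 1;
    for (uint64_t rest = value >> 8; rest != 0; rest >>= 8) {
        needed++;
    }
    if (needed > buflen) {
        return needed;  /* too small: report the size, leave buf untouched */
    }
    assert(buf != NULL);
    /* Most significant byte first. */
    for (size_t i = 0; i < needed; i++) {
        buf[needed - 1 - i] = (unsigned char)(value >> (8 * i));
    }
    return needed;
}
```

A caller with a known size or a pre-allocated buffer that happens to be large enough finishes in one call; only when the returned length exceeds the buffer does it allocate and call again.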

But, IMO we also need general API for exporting/importing Python objects to/from buffers (see e.g. #15 in capi-workgroup/problems), and it would be good to make this consistent.

I'd prefer adding the original functions back until we design a proper replacement.

scoder · 2023-10-24T09:45:17Z

mpz_export() looks like a good blueprint. It's based on "word data", though, not bytes. I'm not sure if that design is something important to copy. There are certainly use cases for this (it resembles SIMD-like operations, for example). However, I'm not convinced that reordering bytes in native/little/big-endian words is an important feature for a PyLong C-API. Whoever needs this kind of special functionality can probably implement it in a separate pass on their side. A simple byte array export in (overall) native/little/big ordering seems sufficient to get the data in and out.

Note that it goes together with an mpz_sgn() function to query the sign since that is not part of the export. That's reasonable since the sign is unlikely to become part of the internal PyLong digits representation even in the future. Given that Py_SIZE() cannot be used for the sign detection any more, a stable function and a fast inline function would both be helpful for this.

So, basically, the proposal is to add

a function PyLong_AsByteArray() for the unsigned export, modelled after mpz_export
a function PyLong_FromByteArray() for the unsigned import, modelled after mpz_import
a function PyLong_Sign() to detect the sign as -1, 0, 1
an inline function PyUnstable_Long_Sign() to read the sign without checks and ABI compatibility guarantees
a function PyLong_BitLength() to count the number of bits used by the PyLong value

IMO we also need general API for exporting/importing Python objects to/from buffers, and it would be good to make this consistent.

It's not strictly related, though. I think a PyLong number is sufficiently different from an arbitrary Python object array to not require a highly similar interface. If it can be made similar, fine. I wouldn't hold up one for the other, though.

Regarding Serhiy's concerns about missing ABI-stability of enum flags and arguments: we've used C macro names for this for ages, and they turn into simple integers that can be passed as C int arguments. Just #define some names and people will use them.
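As an illustration of macro names passed as a plain C int, here is a small sketch of a three-way native/little/big selector. The DEMO_ENDIAN_* names and demo_store_u32 are hypothetical and exist only for this example; they are not CPython names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option names for this sketch only; not CPython's. */
#define DEMO_ENDIAN_NATIVE 0
#define DEMO_ENDIAN_LITTLE 1
#define DEMO_ENDIAN_BIG    2

/* Store a 32-bit value into out[0..3] in the requested byte order.
   Returns 0 on success, -1 if `endian` is not one of the names above. */
int demo_store_u32(uint32_t value, unsigned char out[4], int endian)
{
    assert(out != NULL);
    switch (endian) {
    case DEMO_ENDIAN_NATIVE:
        /* Fast path: copy the in-memory representation unchanged. */
        memcpy(out, &value, sizeof value);
        return 0;
    case DEMO_ENDIAN_LITTLE:
        for (size_t i = 0; i < 4; i++) {
            out[i] = (unsigned char)(value >> (8 * i));
        }
        return 0;
    case DEMO_ENDIAN_BIG:
        for (size_t i = 0; i < 4; i++) {
            out[3 - i] = (unsigned char)(value >> (8 * i));
        }
        return 0;
    default:
        return -1;
    }
}
```

Because the options are ordinary integers, new values can be added later without changing the function signature.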

* gh-106320: Re-add _PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() to the public header files since they are used by third-party packages and there is no efficient replacement. See #111140 See #111139 * gh-111262: Re-add _PyDict_Pop() to have a C-API until a new public one is designed.

vstinner · 2023-10-27T14:57:01Z

See also comments about removed _PyLong_New(): #108604 (comment)

vstinner · 2023-11-15T23:11:26Z

_PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() functions were restored by commit a8a89fc.

zooba · 2024-02-01T01:08:56Z

Reopening, because I think at a minimum we should have the two functions mentioned in the title.

My proposed API (I have an implementation, but not PR ready yet, and one open question) is basically the one Petr liked but simpler:

/* PyLong_AsByteArray: Copy the integer value to a native address.
   n is the number of bytes available in the buffer.
   Uses the current build's default endianness, and sign extends the value
   to fit the buffer.

   Returns 0 on success or -1 with an exception set, but if the buffer is
   not big enough, returns the desired buffer size without setting an
   exception. Note that the desired size may be larger than strictly
   necessary to avoid precise calculations. */
PyAPI_FUNC(int) PyLong_AsByteArray(PyObject* v, void* buffer, size_t n);

/* PyLong_FromByteArray: Create an integer value containing the number from
   a native buffer.
   n is the number of bytes to read from the buffer.
   Uses the current build's default endianness, and assumes the value was
   sign extended to 'n' bytes.

   Returns the int object, or NULL with an exception set. */
PyAPI_FUNC(PyObject *) PyLong_FromByteArray(void* buffer, size_t n);

I'm comfortable making these only do default endianness, because they're really intended as an extension of all the other int conversions we have that also do default endianness. Alternate endianness is a specialised formatting or bit packing operation.

The "desired size may be larger than strictly necessary" is to allow returning sizeof(Py_ssize_t) for a "compact" int, rather than calculating the exact number of bits, and potentially doing the same for a larger int if we come up with a faster calculation.

I envision the use here to be like this (note the EAFP):

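int value;
int n = PyLong_AsByteArray(o, &value, sizeof(value));
if (n < 0) goto abort; // exception, nothing useful we can do
if (n == 0) {
    // use value
} else {
    // value did not fit: n is the buffer size to allocate
    void *new_memory = malloc(n);
    if (new_memory == NULL) { PyErr_NoMemory(); goto abort; }
    n = PyLong_AsByteArray(o, new_memory, n);
    // use new_memory (after checking that n == 0)
    free(new_memory);
}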

Similarly for FromByteArray - that should work efficiently against pointers to locals, so you don't have to choose the right API name for the type.

However, the bit I'm wavering on is what to do with unsigned values with the MSB set. Right now, you need to allow an extra byte to "prove" that 2**64-1 is not -1, which is just a pain when you'd rather write PyLong_AsByteArray(o, &uint64_value, sizeof(uint64_t)). With C++ templates this wouldn't even come up, but I can't decide on my own between:

a second function (easiest when native type is known at compile time, which seems likely)
a parameter (easiest when native type is only known at runtime...)
always treat the value as signed (most like Python, but harder for a caller to code)
always treat as unsigned but add a "negative" out parameter (not sure how callers would deal with that?)

I don't think we have perf concerns at the point where this matters, as we're already at the extremes of a 64-bit integer (for most platforms). That's too big for a "compact" int, so we're on the slow path already. But I do want to get the usability right. I'm leaning towards a PyLong_AsUnsignedByteArray function that basically differs only in the sign extension part.
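To make the "extra byte" problem concrete, here is a small standalone sketch that computes how many bytes a non-negative value needs in an unsigned buffer versus a two's-complement (signed) one. The demo_* helpers are hypothetical and only model the arithmetic; they are not part of any proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of significant bits in `value` (0 for value == 0). */
unsigned demo_bit_length(uint64_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Bytes needed to hold a non-negative value in an unsigned buffer. */
size_t demo_unsigned_bytes(uint64_t value)
{
    unsigned bits = demo_bit_length(value);
    return bits == 0 ? 1 : (bits + 7) / 8;
}

/* Bytes needed to hold the same non-negative value in a two's-complement
   (signed) buffer: one extra bit is reserved so the top bit stays 0. */
size_t demo_signed_bytes(uint64_t value)
{
    size_t n = (demo_bit_length(value) + 8) / 8;
    /* The signed encoding is never shorter than the unsigned one. */
    assert(n >= demo_unsigned_bytes(value));
    return n;
}
```

For 2**64-1 the unsigned size is 8 bytes but the signed size is 9, which is why a signed-only export cannot fill a uint64_t directly.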

encukou · 2024-02-01T12:05:21Z

I'd prefer exposing both endianness and signedness as arguments. As I see it, the functions should be intended for serialization too, not just for converting to native ints -- and in that case, it's best to be explicit.

Perhaps we should use named flags, like:

int n = PyLong_AsByteArray(o, &value, sizeof(value), Py_AS_NATIVE_ENDIAN | Py_AS_SIGNED);

zooba · 2024-02-01T16:29:37Z

As I see it, the functions should be intended for serialization too

All the scenarios I've seen have just been about converting to native ints (in contexts where serialization may happen next, but has to happen via a native int). Can you/anyone show me some where the caller doesn't want the int value, but just wants to store the bytes? (And doesn't want to/can't use the struct module, which is intended for this case.)

FWIW, non-default endianness is inevitably a slow path. We can make this very fast for normal sized, native endian values, which are the vast majority of cases, but forcing an endianness has to slow things down.

zooba · 2024-02-01T16:41:49Z

How about this as a proposed API:

PyAPI_FUNC(int) PyLong_AsByteArray(PyObject* v, void* buffer, size_t n);
PyAPI_FUNC(int) PyLong_AsUnsignedByteArray(PyObject* v, void* buffer, size_t n);
PyAPI_FUNC(int) PyLong_AsByteArrayWithOptions(PyObject* v, void* buffer, size_t n, int flags);

Where the first two are essentially exported aliases that make it easier to read/write code without having to remember/write a set of flags every time?

gpshead · 2024-03-01T17:48:21Z

I like the direction this is going, yes, that is the way I was hoping an Unsigned API variant would behave. I do think it is useful to have a way to return that the value was negative. Petr's char *sign_out idea makes sense to me there, always fill that in with 0 or -1 if it is non-NULL.

…ve the test (python#115380) This expands the examples to cover both realistic use cases for the API. I noticed thing in the test that could be done better so I added those as well: We need to guarantee that all bytes of the result are overwritten and that too many are not written. Tests now pre-fills the result with data in order to ensure that. Co-authored-by: Steve Dower <steve.dower@microsoft.com>


scoder · 2024-05-05T05:35:02Z

The interface seems complete and usable now. Is this done now or is there anything left for this ticket to stay open?

gpshead · 2024-05-05T21:34:54Z

Looking things over, I like the C API that was settled upon. It seems to address all of the needs from our earlier discussions.


vstinner added type-feature A feature request or enhancement topic-C-API labels Oct 20, 2023

vstinner changed the title ~~C API: Consider adding a public PyLong_AsByteArray() and PyLong_FromByteArray() functions~~ C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions Oct 20, 2023

vstinner mentioned this issue Oct 20, 2023

gh-106320: Remove private PyLong C API functions #108429

Merged

scoder mentioned this issue Oct 21, 2023

gh-106320: Re-add some PyLong/PyDict C-API functions #111162

Merged

vstinner mentioned this issue Oct 27, 2023

gh-106320: Remove private _PyLong_New() function #108604

Merged

vstinner mentioned this issue Oct 30, 2023

[C API] Meta issue: add new public functions with doc+tests to replace removed private functions #111481

Closed

vstinner closed this as completed Dec 20, 2023

vstinner mentioned this issue Jan 31, 2024

Deprecate / remove _Py internal APIs from pyo3-ffi PyO3/pyo3#3762

Open

zooba reopened this Feb 1, 2024

davisagli mentioned this issue Mar 14, 2024

Preliminary support for Python 3.13a5 zopefoundation/zodbpickle#83

Merged

zooba added a commit to zooba/cpython that referenced this issue Mar 19, 2024

Merge remote-tracking branch 'upstream/main' into pythongh-111140

fbe42e7

zooba added a commit to zooba/cpython that referenced this issue Mar 28, 2024

Merge branch 'main' into pythongh-111140

d3deebf

zooba added a commit to zooba/cpython that referenced this issue Apr 4, 2024

Merge branch 'pythongh-111140' of https://github.com/zooba/cpython into

38ecfe2

pythongh-111140

encukou pushed a commit that referenced this issue Apr 5, 2024

gh-111140: PyLong_From/AsNativeBytes: Take *flags* rather than just *…

6876168

…endianness* (GH-116053)

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024

pythongh-111140: PyLong_From/AsNativeBytes: Take *flags* rather than …

29db213

…just *endianness* (pythonGH-116053)

v-chojas mentioned this issue Apr 18, 2024

PyToCType uses internal function removed from Python 3.13 mkleehammer/pyodbc#1344

Closed

scoder added a commit to scoder/cython that referenced this issue May 3, 2024

Use the large PyLong conversion functions in Py3.13.

eb68bce

See python/cpython#111140

scoder added a commit to cython/cython that referenced this issue May 4, 2024

Use the new large PyLong conversion functions in Py3.13. (GH-5997)

dc63743

See python/cpython#111140 Also clean up and simplify the fallback implementation, fixing some reference leaks along the way.

scoder added a commit to cython/cython that referenced this issue May 4, 2024

Use the new large PyLong conversion functions in Py3.13. (GH-5997)

cff39ef

See python/cpython#111140 Also clean up and simplify the fallback implementation, fixing some reference leaks along the way.

gpshead added a commit to gpshead/cpython that referenced this issue May 5, 2024

pythongh-111140: minor docs typos cleanup in the C example API calls.

d8cfb91

bedevere-app bot mentioned this issue May 5, 2024

gh-111140: minor docs typos cleanup in the C example API calls. #118612

Merged

gpshead closed this as completed May 5, 2024

gpshead added the 3.13 bugs and security fixes label May 5, 2024

gpshead added a commit that referenced this issue May 5, 2024

gh-111140: minor docs typos cleanup in the C example API calls. (#118612

b744fa5

)

SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024

pythongh-111140: minor docs typos cleanup in the C example API calls. (…

1eb0a54

…python#118612)

Dunedan mentioned this issue Jun 5, 2024

Build not working with Python 3.13.0b1 pytries/marisa-trie#104

Closed

Xmader added a commit to Distributive-Network/PythonMonkey that referenced this issue Aug 22, 2024

fix: Python 3.13 added the PyLong_AsNativeBytes API, but changed th…

428353c

…e function signature of the `_PyLong_AsByteArray` API See python/cpython#111140 and capi-workgroup/decisions#31

Xmader mentioned this issue Sep 20, 2024

python 3.13 support Distributive-Network/PythonMonkey#442

Merged

Xmader mentioned this issue Sep 27, 2024

Fails to build on Python 3.13: _PyLong_AsByteArray is a private API googlefonts/compreffor#156

Open

vharitonsky mentioned this issue Oct 4, 2024

New release needed to support Python3.13 snowballstem/pystemmer#47

Closed

odrling mentioned this issue Oct 6, 2024

Fails to build for Python 3.13 edgedb/edgedb-python#525

Closed

skirpichev mentioned this issue Nov 19, 2024

Vote on PEP 757 – C API to import-export Python integers capi-workgroup/decisions#45

Open
