Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-97016: Convert PyBytes_AS_STRING() to function #97017

Closed
wants to merge 1 commit into from
Closed

gh-97016: Convert PyBytes_AS_STRING() to function #97017

wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Sep 22, 2022

Convert the following static inline functions to regular functions:

  • PyByteArray_AS_STRING()
  • PyByteArray_GET_SIZE()
  • PyBytes_AS_STRING()
  • PyBytes_GET_SIZE()

Remove the _PyByteArray_empty_string variable. It was excluded from the limited C API.

In bytesobject.c and bytearrayobject.c, add static inline functions and use them for best performance:

  • _PyByteArray_AS_STRING()
  • _PyByteArray_GET_SIZE()
  • _PyBytes_AS_STRING()
  • _PyBytes_GET_SIZE()

Convert the following static inline functions to regular functions:

* PyByteArray_AS_STRING()
* PyByteArray_GET_SIZE()
* PyBytes_AS_STRING()
* PyBytes_GET_SIZE()

Remove the _PyByteArray_empty_string variable. It was excluded from
the limited C API.

In bytesobject.c and bytearrayobject.c, add static inline functions
and use them for best performance:

* _PyByteArray_AS_STRING()
* _PyByteArray_GET_SIZE()
* _PyBytes_AS_STRING()
* _PyBytes_GET_SIZE()
@vstinner
Copy link
Member Author

@erlend-aasland: Would you mind to review this PR?

With this change, _Py_CAST() is still used by Include/cpython/bytesobject.h because of macros casting PyBytes_AS_STRING() and PyBytes_GET_SIZE() arguments to PyObject* using _PyObject_CAST(). But at least, at the ABI level, there are now regular function calls, the implementation details is no longer leaked.

@vstinner
Copy link
Member Author

Micro-benchmark on PyBytes_AS_STRING() and PyBytes_GET_SIZE() function calls to measure the worst case: when the hot code is just these two function calls.

Mean +- std dev: [ref] 21.6 ns +- 2.3 ns -> [regular_func] 27.0 ns +- 0.5 ns: 1.25x slower

In average, converting PyBytes_AS_STRING() and PyBytes_GET_SIZE() static inline functions to regular functions makes them 1.25x slower, it adds +2.7 ns per function call (+5.4 ns for the two function calls). Well, currently PyBytes_AS_STRING() just reads a structure member, whereas now it's a function call. Obviously, a function call is slower than reading directly memory.

This worst case is when PyBytes_AS_STRING() and PyBytes_GET_SIZE() cannot be inlined by LTO: I added the benchmark to _testcapi which is built as a shared library, so they are regular function calls (cannot be inlined by LTO).

Benchmark run on Python built with LTO (without PGO) with CPU pinning.

In short, my micro-benchmark loops on this code:

        PyObject *bytes = PyBytes_FromString("abc");
        char *s = PyBytes_AS_STRING(bytes);
        Py_ssize_t len = PyBytes_GET_SIZE(bytes);
        Py_DECREF(bytes);

@vstinner
Copy link
Member Author

I compared this PR to its parent commit.

Commands:

./configure --with-lto && make
# at PR commit
rm -rf env; make clean; make && ./python -m venv env && env/bin/python -m pip install pyperf && env/bin/python bench.py -v -o regular_func.json

# get the parent commit
git checkout HEAD^

# at parent commit
rm -rf env; make clean; make && ./python -m venv env && env/bin/python -m pip install pyperf && env/bin/python bench.py -v -o ref.json

# compare
python -m pyperf compare_to ref.json regular_func.json

Linux booted with isolcpus=5,11 rcu_nocbs=5,11 command line options and I ran sudo python3 -m pyperf system tune.

@vstinner
Copy link
Member Author

In average, converting PyBytes_AS_STRING() and PyBytes_GET_SIZE() static inline functions to regular functions makes them 1.25x slower, it adds +2.7 ns per function call (+5.4 ns for the two function calls).

I don't expect these functions to be called in hot code. Usually, you get the pointer, and then work on the pointer in a loop. I expect the overhead to not be significant in practice.

@vstinner
Copy link
Member Author

An alternative is to get rid of the static inline implementation and creates aliases to existing functions:

 #define PyByteArray_AS_STRING PyByteArray_AsString
 #define PyByteArray_GET_SIZE PyByteArray_GetSize
 #define PyBytes_AS_STRING PyBytes_AsString
 #define PyBytes_GET_SIZE PyBytes_GetSize

These functions check the type of their argument, whereas the current static inline functions don't. Example:

char *
PyBytes_AsString(PyObject *op)
{
    if (!PyBytes_Check(op)) {
        PyErr_Format(PyExc_TypeError,
             "expected bytes, %.200s found", Py_TYPE(op)->tp_name);
        return NULL;
    }
    return ((PyBytesObject *)op)->ob_sval;
}

@vstinner vstinner marked this pull request as draft September 26, 2022 16:08
@vstinner
Copy link
Member Author

This change is somehow controversial, so I prefer to mark is as a draft for now :-)

@vstinner
Copy link
Member Author

vstinner commented Nov 2, 2022

The benefits of this change are not obvious since Python ABI is still unstable, whereas this change introduces a performance overflow. Moreover, there are already these functions:

  • PyByteArray_AsString()
  • PyByteArray_Size()
  • PyBytes_AsString()
  • PyBytes_Size()

@vstinner vstinner closed this Nov 2, 2022
@vstinner vstinner deleted the bytes_func branch November 2, 2022 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants