-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Py_INCREF() and PyObject_INIT() to inlined functions #79240
Comments
CPython has been created in 1990. In 1990, it made sense to use C macros. But nowadays, inlined functions can be used instead: "Python versions greater than or equal to 3.6 use C89 with several select C99 features: (...) static inline functions" I propose to convert 4 macros to inlined functions:
Advantages:
Drawbacks:
The two attached PRs implements these changes. |
Context: I was unhappy with _Py_NewReference() macro implementation when I had to modify it to fix a bug, bpo-35053. That's why I would like to convert it to a static inline function. |
Does this slow down debug builds at all? It probably will not end will if Py_INCREF is ever not inlined. |
I tested PR 10079 using gdb on Fedora 28 with GCC 8.1.1 to check if Py_INCREF/Py_DECREF functions are inlined. I understand that "static inline void Py_INCREF()" is *not* inline by gcc -O0, but it *is* inlined using gcc -Og which is the *default* optimization level of ./configure --with-debug. To develop on Python, I force -O0, because the compilation time matters more than runtime performance for me :-) Compilation on my laptop using MAKEFLAGS=-j9:
... But Py_INCREF/DECREF are always inlined, even with -O0, when using __attribute__((always_inline))! I will work on a change to use that. == gcc -O0 == (gdb) disassemble Py_IncRef (gdb) disassemble Py_DecRef == gcc -Og == (gdb) disassemble Py_IncRef (gdb) disassemble Py_DecRef |
I modified PR 10079 to add a Py_STATIC_INLINE(TYPE) macro:
Tests on Linux, example: "./configure --with-pydebug CC=clang CFLAGS="-O0" && make clean && make platform"
Test done on Fedora 28 with GCC 8.1.1 and clang 6.0.1. |
On Windows, a Debug build doesn't inline Py_INCREF/DECREF even if it uses __forceinline. I looked at the Py_IncRef() and Py_DecRef() assembly in Visual Studio using a breakpoint. Using /Ob1, Py_INCREF/DECREF are inlined as expected. I set this option in the pythoncore project. Do you think that I should modify the 38 other projects of the pcbuild solution to enable /Ob1 in debug build? Documentations. Inline Functions (C++): -Od: disable optimization ("d" stands for Debug) /Ob (Inline Function Expansion): |
Ok, I confirm that Py_XINCREF() is properly inlined in Py_IncRef() with the latest version of my PR 10079. I tested:
|
Is it guarantied that static inline functions will be inlined and will be not called as external functions from the Python library? The latter would break binary compatibility of extensions compiled with newer Python headers with older binary Python libraries. |
First, I'm not sure that the "stable ABI" really works in practice, there are many issues: Converting the macro to a static inline function is a minor step in the direction of a more stable API and ABI. To come back to your question: depending on the compiler and compiler options, Py_INCREF/DECREF() can generate a *function call*. I tuned Py_STATIC_INLINE() to always inline Py_INCREF/DECREF() for GCC, clang and MSVC which are the major compilers used by Python on Windows, Linux, macOS and FreeBSD. "static inline" is still something new to me. Maybe someone will come with a compiler with which it doesn't work as expected. IMHO we have to go through these issues, and it's only be testing for real that we will see these issues. I mean, we *have to* get ride of these ugly macros used in Python header files. They are causing subtle bugs and are hard to maintain. See my first message for advantages of static inline functions. |
Ok, I made more checks. In short, PR 10079 has no impact on the ABI compatibility. I modified Py_STATIC_INLINE() to remove __attribute__((always_inline)). In this case, the ./python binary contains multiple "instances" (what's the correct name for that?) of the "_Py_INCREF" inline function: These functions are *LOCAL*, I understand that they are not exported. I also checked the _struct module: vstinner@apu$ readelf -sW build/lib.linux-x86_64-3.8-pydebug/_struct.cpython-38dm-x86_64-linux-gnu.so | c++filt -t |grep -E '(INC|DEC)REF' Again, these functions are "LOCAL", not imported nor exported. I undertand that _struct.so doesn't depend on libpython for INCREF/DECREF: it contains its own "implementation". |
Interesting article on inline/static inline and symbols: |
Why do we need this Py_STATIC_INLINE macro? If you want to have one for enabling always inline, that's fine with me, since it's compiler-specific. But the current Py_STATIC_INLINE macro seems to conflate linkage with always-inline behavior. |
Benjamin:
For about the name, there is already Py_LOCAL_INLINE which also uses "static inline", but has a very different usage: "Py_LOCAL can be used instead of static to get the fastest possible calling convention for functions that are local to a given module." So I chise "Py_STATIC_INLINE" name, it describes the basic implementation ("static inline TYPE"). Py_STATIC_INLINE() is designed to replace a preprocessor macro with a function, when you care that the code is "always inlined". (Maybe the name is not perfect ;-)) Honestly, I'm not sure that it really matters that the function is "always" inlined. The main issue is that "static" requires to duplicate the function in each file which uses the macro, when the function cannot/is not inlined. Previously, you asked me:
That's why I wrote Py_STATIC_INLINE() to "force" inlining and PR 10094 to enable inlining for Debug build on MSVC.
I'm not sure that I get it. Do you talk about "static" inside the macro? bpo-33407 modifies Py_DEPRECATED() to support Visual Stuido, but it requires to modify how the macro is used: it now must be used at the start, rather than at the end: I chose to put "static" and "inline" in the same macro. We already have many other similar macros like PyAPI_FUNC(TYPE) and Py_LOCAL(TYPE). I would like to have a common way to "declare a function behaving as a macro". Please see also the discussion on the PR itself, Neil discuss what's the best way to declare an inline function: By the way, was it you who required "static inline" support in PEP-7? :-) |
Too bad: this change broke the compilation on AMD64 Windows7 SP1 3.x, AMD64 Windows8 3.x and AMD64 Windows10 3.x: I don't understand why: it works well on my Windows 10 VM with my up to date Visual Studio Community 2017 (version 15.8.8). The compilation also worked on AppVeyor. For me, the only explanation is that buildbots use older compiler versions. Note: I checked that _decimal is compiled on my VM. |
I checked buildbots: my PR 10128 fixed all Windows buildbots, good. |
See also bpo-35081: "Rename Include/internals/ to Include/pycore/" which is linked to this issue. |
On Fri, Oct 26, 2018, at 03:18, STINNER Victor wrote:
I would like to see Py_LOCAL_INLINE removed, too, fwiw.
"always inline" is different from "static inline". So, it's not appropriate to make a macro named the latter that has the former former.
We don't want functions that behave like macros... otherwise, we would be writing macros. If we want a function to always be inlined, we should explicitly request that from the compiler.
Yes, which is why we shouldn't need a macro to write it. |
Oh. Why? Do you want to directly use "static" and "static inline"? I guess that Py_LOCAL and Py_LOCAL_INLINE have been added to use __fastcall with MSVC. ... __fastcall is mostly interesting in x86 (32-bit), but x86-64 calling convention is "fast" by default no? __fastcall pass the first two arguments in registers, but x86-64 already pass the first four arguments in registers... Do you mean that __fastcall is no longer revelant? |
Oh ok. So if we decide to keep it, it should be renamed to Py_STATIC_ALMOST_ALWAYS_INLINE() or something like that :-)
Oh. It seems like I misunderstood you. I understood that you required to have zero impact on performance on debug build. *I* want to use functions because it's the regular C language: regular scope rules, no preprocessor magic, the compiler detects errors if the function is misused, etc. But I'm not sure about the drawbacks of converting a macro to a function. I don't want to be the only one responsible to regressions :-) If you support the change, we can drop "__attribute__((always_inline))" and use "regular" "static inline" functions :-)
Ok ok. I updated my PR 10079 to simply use "static inline". |
If my PR 10079 is merged without Py_STATIC_INLINE(), I will remove the macro. I'm not sure if we need an "always inline" macro or not. Note: I'm in favor of moving closer to the C language and not abusing __attribute__(...) :-) |
Here you have Benjamin, it's gone :-) |
TODO: Convert _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() macros to static inline functions, but there are inter-dependencies issues: see bpo-35081 and |
I am collecting information on how different compilers behave with the ARM GCC will report back when I have enough information on these. |
See also bpo-35230: "Remove _Py_REF_DEBUG_COMMA". |
I wrote a pull request to replace static inline functions with C macros: PR 10669. I ran a benchmark on speed.python.org server using the "performance" benchmark suite: I understand that from the benchmark results that converting macros to static inline functions has no significant impact on performances. Two benchmarks are 1.06x and 1.07x faster but it can be explained by the PGO compilation which is not reliable. One benchmark is way slower, but it seems like the benchmark has an issue. If you look at the 3 latest run on speed.python.org, I see:
I don't think that any change in _pickle or pickle explains this significant slowdown. IMHO it's just that the benchmark is not reliable :-/ We have a performance timeline on the last 5 years, and this benchmark doesn't have a straight line, we can see that the result is a little bit random :-/ -- speed.python.org runs Ubuntu 16.04 with gcc 5.4.0. The result are the two attached (compressed) JSON files:
Comparison ignoring difference smaller than -5% and +5%, macros are the reference: vstinner@apu$ python3 -m perf compare_to --table -G --min-speed=5 macros.json.gz inline.json.gz Not significant (54): (...) Raw comparison, full data, macros are the reference: vstinner@apu$ python3 -m perf compare_to --table -G macros.json.gz inline.json.gz Not significant (16): regex_effbot; go; json_loads; xml_etree_parse; dulwich_log; nbody; regex_compile; scimark_fft; pickle_pure_python; unpickle; pickle_list; sqlite_synth; richards; logging_silent; deltablue; sqlalchemy_imperative |
I ran a coarse benchmark on the debug mode using PR 10669: I ran the full Python test suite. => I don't see any significant impact on performance. git clean -fdx # remove *all* untracked files Result:
I used my laptop to run the benchmark: Fedora 29 with GCC 8.2.1. I was still using the laptop to run other tasks. |
Unexpected cool effect of static inline functions (copy of my email): Le jeu. 15 nov. 2018 à 01:06, Gregory P. Smith <greg at krypto.org> a écrit :
Oh. That's very interesting. I just tried gdb and I confirm that gdb understands well inlined I also tried perf record/perf report: if I annotate a function That's nice! Victor |
Stefan Behnel wrote: """ I ran a benchmark on the compilation time using PR 10669: there is no significant slowdown (+4 seconds, 6% slower, in the worst case). I ran a benchmark on my laptop:
Result in release mode:
Result in debug mode:
I only used "real" time (I ignored user and sys times). |
"PyObject* PyObject_INIT(PyObject *op, PyTypeObject *typeobj)" Don't cast their argument to PyObject*/PyVarObject* which can introduce new warnings when upgrading from Python 3.7 to Python 3.8. |
I ran my benchmarks, I close the PR 10669 (replace static inline functions with macros). You should still be able to use it as patch: |
I worked around this issue by adding a macro to cast the argument and declare the static inline function with a different name. Example: static inline void _Py_INCREF(PyObject *op)
{
_Py_INC_REFTOTAL;
op->ob_refcnt++;
}
#define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op)) |
I close the issue, it seems like all subtasks have been completed. Summary of the issue:
-- Benjamin Peterson would "like to see Py_LOCAL_INLINE removed, too, fwiw", but it's part of the public C API. It would require a deprecation period. I'm not interested to touch the public C API. Benjamin: please open a new issue if you still want to remove it :-) |
Advantages of inline functions over C macros: https://mail.python.org/pipermail/python-dev/2018-November/155805.html There are multiple advantages:
Are you aware that Python had macros like: #define _Py_REF_DEBUG_COMMA ,
#define _Py_CHECK_REFCNT(OP) /* a semicolon */; I let you judge the quality of this macro: #define _Py_NewReference(op) ( \
_Py_INC_TPALLOCS(op) _Py_COUNT_ALLOCS_COMMA \
_Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \
Py_REFCNT(op) = 1) Is it an expression? Can it be used in "if (test) (*) Py_INCREF doc: |
Could we have used function overloading to handle the different types? Rather than reintroducing the macro for the sake of the cast? |
Sorry, I don't know what is function overloading. Is it a C++ thing? At least, this issue shouldn't make the situation worse :-) Please open a new issue if you have a solution for this problem. I am now |
Ah, you're right, it's only C++. My bad. |
See bpo-45094: "Consider using __forceinline and __attribute__((always_inline)) on static inline functions (Py_INCREF, Py_TYPE) for debug builds". |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: