-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup object.h header #83723
Comments
In bpo-39489, I removed the COUNT_ALLOCS special build. The object.h header can now be cleaned up to simplify the code. |
Ok, object.h looks much better now! I close the issue. |
PyTuple_Check() got slower across the board. This is problematic because the principal use case for PyTuple_Check() is as a guard for various fast paths. The direct cause of the degradation is that the inlining of PyType_Check() didn't go so well — commit 509dd90. There are two issues. First, we lost the fast path for an exact type match that was present in the 3.8 version of PyType_Check(). Second, functions like PyType_Check() cannot be fully inlined if they call other non-inlined functions like PyType_GetFlags(). The original unreviewed commit doesn't revert cleanly because subsequent changes were made. Instead, I suggest:
That would fix the performance regression while still treating type objects as opaque. ------- Incomplete inlines ---------------------------------------------- static inline int
PyType_HasFeature(PyTypeObject *type, unsigned long feature) {
return ((PyType_GetFlags(type) & feature) != 0);
}
^---- Non-static function cannot be inlined
#define PyType_FastSubclass(type, flag) PyType_HasFeature(type, flag)
static inline int _PyType_Check(PyObject *op) {
return PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_TYPE_SUBCLASS);
}
#define PyType_Check(op) _PyType_Check(_PyObject_CAST(op)) ------- Old Type Check Code --------------------------------------------- #define PyObject_TypeCheck(ob, tp) \
(Py_TYPE(ob) == (tp) || PyType_IsSubtype(Py_TYPE(ob), (tp)))
^--- Fast path for exact match is now gone ------- Non-static function cannot be inlined -------------------------- unsigned long
PyType_GetFlags(PyTypeObject *type)
{
return type->tp_flags;
} |
Christian Heimes has expressed an interest (quite rudely) in unreviewed commits elsewhere. I wonder if that applies to everyone regardless of employer. |
Ah, I see that the subject has already been answered in the negative: https://discuss.python.org/t/make-code-review-mandatory/4174 He must have forgotten. |
Let's just focus on getting it fixed before 3.9 goes out. |
Stefan, stop it! I feel offended and harassed by you. Do not ever talk to me in public, private, or even insinuate to reference to my person in any public conversation. |
I have no opinion about this commit (I did not benchmark anything, I do have the opinion that unreviewed commits occur quite frequently They are not inherently bad, so I hope we can focus on the outcome. |
Good catch, Raymond. I'll mark this as a release blocker to ensure I don't miss it. The last beta is scheduled for July 20th. Is this enough time for you? I would very much like to avoid refactors of any sort during the release candidate stage. |
Raymond:
Raymond gave more context in this email: As far as I know, only macOS is impacted by this performance regression (119 nsec => 152 nsec on a microbenchmark). On Windows and Linux, Python is built with LTO and PGO optimizations. For example, the Python package provided by Fedora 32 is built with GCC 10 using LTO and PGO. Using GCC 10 and LTO, the PyTuple_Check() code is inlined even if PyType_GetFlags() is function call in PyType_HasFeature(). I didn't know that on macOS, LTO was not used. I created bpo-41181 to request to enable LTO on the macOS installer builder, but Ned Deily rejected the issue (read the issue for the rationale). I dislike comparing performances when LTO and PGO optimizations are not used. LTO enables tons of optimizations and PGO makes benchmark results more reproducible. One idea is to stop running benchmarks on macOS, and at least only run benchmarks on binaries built with LTO and PGO. -- Raymond:
I'm not sure the performance changed with this commit. IMHO it's more the commit 45ec5b9: bpo-40170 and #63577. Background behind this change: see PEP-620 for the overall plan, see bpo-39573 and bpo-40170 to make PyObject and PyTypeObject structures opaque. The idea is to slowly move third party extension modules towards the stable ABI (PEP-384). Internally, CPython continues to access directly structure members.
#63577 was reviewed and approved by another core dev. The impact of performance of these changes was taken in account, and that's why it was decided to add a new internal _PyType_HasFeature() function. It wasn't noticed that PyTuple_Check() calls the public PyType_HasFeature() instead of the internal _PyType_HasFeature(). As I wrote, using LTO, the code is inlined anyway, so it's the same thing. |
Stefan Krah:
I don't understand the relationship between this issue, Christian Heimes and making reviews mandatory or not. As I wrote, the commit 45ec5b9 was reviewed and approved by another core dev. Stefan, please stop personal attack and stay at the technical level. |
Here are two timings for math.dist(). They were run with the production macOS builds from python.org: $ python3.8 -m timeit -r 11 -s 'from math import dist' -s 'p=(1.1, 2.2); q=(1.7, 2.3)' 'dist(p, q)'
5000000 loops, best of 11: 58.4 nsec per loop
$ python3.9 -m timeit -r 11 -s 'from math import dist' -s 'p=(1.1, 2.2); q=(1.7, 2.3)' 'dist(p, q)'
5000000 loops, best of 11: 66.9 nsec per loop The attached screen shot shows that the only change between the two versions is that the subclass check is inlined and fast in 3.8, but is an external function call in 3.9. ---- 3.8 subclass check ----------- ---- 3.9 subclass check -----------
The C code for math.dist() is unchanged between 3.8 and 3.9. Both use PyTuple_Check(). This isn't unique. Every single PyTuple_Check() is the similarly affected (approx. 225 occurrences). Presumably, this affects other type checks as well. |
Serhiy, do you have any insights to offer? |
A brief note for Victor that *nothing* was directed against him. On
This one I have some trouble with. It cites PyPy as a bottleneck, but It is also a bit unusual that the PEP is in draft status but a lot |
s/PyPy as a bottleneck/the C-API as a bottle neck for PyPy/ |
Victor, is there any reason PyType_GetFlags() can't be converted to a macro or an inlined function? That seems like a simple and robust fix that won't get in the way of anything else you're doing. We shouldn't disregard macOS timings. It is an important platform —dominant in data science, dominant among most of my clients, and dominant among the core developers at the last sprint. None of the inlining should depend on LTO and PGO. That is fragile and will create hard-to-explain variations between builds (even consecutive builds on the same platform). Third-party extensions, SO and DLL files won't benefit from that. Also, it makes it difficult to individually optimize and analyze a function without a costly rebuild at every step. |
I do not know the purpose of this issue. The new code does not look cleaner to me, but maybe it is only for me. But performance regressions should be fixed. We cannot rely on compiler-specific global optimization, which is hard to test and does not work for dynamic linkage (third-party extensions and applications with built-in Python). Even "inline" is just a hint to the compiler and does not guarantee inlining the code. Raymond's suggestions look reasonable to me. |
Since the PEP-620 is still a draft, I created PR 21390 to fix the performance issue. As I wrote, I didn't expect any impact on performances since I added _PyType_HasFeature() which access directly the structure member. But I was wrong. This change is not strictly required by my work for now, so I just revert it until the PEP is accepted. If I reapply the change later, I will take care of ensuring that it's properly optimized in CPython internals (access the member, don't call a function), especially on macOS. If you want to continue the discussion, please discuss on bpo-40170, since this issue is unrelated. Stefan Krah:
Raymond:
Serhiy Storchaka:
I explained the rationale in bpo-40170 and PEP-620. It's not a "cleanup change". Serhiy: we are discussing the commit 45ec5b9 of bpo-40170. This discussion is happening in the wrong bpo which seems to confuse people. So I close again this issue which is fixed. |
Raymond, did you replace the screenshot with a later one showing that things are fixed now? The timestamp suggests it went up at the same time as your comment, but what I see in the .png file is that the two are identical other than addresses. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: