Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup object.h header #83723

Closed
vstinner opened this issue Feb 3, 2020 · 28 comments
Closed

Cleanup object.h header #83723

vstinner opened this issue Feb 3, 2020 · 28 comments
Labels
3.9 only security fixes topic-C-API

Comments

@vstinner
Copy link
Member

vstinner commented Feb 3, 2020

BPO 39542
Nosy @rhettinger, @pitrou, @vstinner, @encukou, @skrah, @ambv, @JimJJewett, @serhiy-storchaka
PRs
  • bpo-39542: Move object.h debug functions to internal C API #18331
  • bpo-39542: Simplify _Py_NewReference() #18332
  • bpo-39542: Make _Py_NewReference() opaque in C API #18346
  • bpo-39542: Exclude trashcan from the limited C API #18362
  • bpo-39542: Make PyObject_INIT() opaque in limited C API #18363
  • bpo-39542: Convert PyType_Check() to static inline function #18364
  • bpo-39542: Define PyTypeObject earlier in object.h #18366
  • bpo-39542: Declare _Py_AddToAllObjects() in pycore_object.h #18368
  • bpo-39542: Document limited C API changes #18378
  • Revert "bpo-39542: Define PyTypeObject earlier in object.h (GH-18366)" #18384
  • Files
  • Screen Shot 2020-07-07 at 10.40.52 AM.png: Disassembly of math.dist() in 3.8 vs 3.9
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-07-08.08:30:06.202>
    created_at = <Date 2020-02-03.15:42:01.002>
    labels = ['expert-C-API', '3.9']
    title = 'Cleanup object.h header'
    updated_at = <Date 2020-07-11.17:52:28.975>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-07-11.17:52:28.975>
    actor = 'Jim.Jewett'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-07-08.08:30:06.202>
    closer = 'vstinner'
    components = ['C API']
    creation = <Date 2020-02-03.15:42:01.002>
    creator = 'vstinner'
    dependencies = []
    files = ['49303']
    hgrepos = []
    issue_num = 39542
    keywords = ['patch']
    message_count = 28.0
    messages = ['361305', '361307', '361311', '361384', '361422', '361425', '361429', '361431', '361432', '361442', '361505', '372962', '372983', '372984', '372993', '372994', '372995', '373028', '373230', '373231', '373234', '373235', '373236', '373237', '373239', '373282', '373288', '373530']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'pitrou', 'vstinner', 'petr.viktorin', 'skrah', 'lukasz.langa', 'Jim.Jewett', 'serhiy.storchaka']
    pr_nums = ['18331', '18332', '18346', '18362', '18363', '18364', '18366', '18368', '18378', '18384']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue39542'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 3, 2020

    In bpo-39489, I removed the COUNT_ALLOCS special build. The object.h header can now be cleaned up to simplify the code.

    @vstinner vstinner added 3.9 only security fixes topic-C-API labels Feb 3, 2020
    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 3, 2020

    New changeset 4b52416 by Victor Stinner in branch 'master':
    bpo-39542: Move object.h debug functions to internal C API (GH-18331)
    4b52416

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 3, 2020

    New changeset 49932fe by Victor Stinner in branch 'master':
    bpo-39542: Simplify _Py_NewReference() (GH-18332)
    49932fe

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    New changeset 40e547d by Victor Stinner in branch 'master':
    bpo-39542: Make _Py_NewReference() opaque in C API (GH-18346)
    40e547d

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    New changeset 0fa4f43 by Victor Stinner in branch 'master':
    bpo-39542: Exclude trashcan from the limited C API (GH-18362)
    0fa4f43

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    New changeset f58bd7c by Victor Stinner in branch 'master':
    bpo-39542: Make PyObject_INIT() opaque in limited C API (GH-18363)
    f58bd7c

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    New changeset 509dd90 by Victor Stinner in branch 'master':
    bpo-39542: Convert PyType_Check() to static inline function (GH-18364)
    509dd90

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    New changeset 0e4e735 by Victor Stinner in branch 'master':
    bpo-39542: Define PyTypeObject earlier in object.h (GH-18366)
    0e4e735

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    Ok, object.h looks much better now! I close the issue.

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2020

    New changeset 58f4e1a by Victor Stinner in branch 'master':
    bpo-39542: Declare _Py_AddToAllObjects() in pycore_object.h (GH-18368)
    58f4e1a

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 6, 2020

    New changeset 2844336 by Victor Stinner in branch 'master':
    bpo-39542: Document limited C API changes (GH-18378)
    2844336

    @rhettinger
    Copy link
    Contributor

    PyTuple_Check() got slower across the board. This is problematic because the principal use case for PyTuple_Check() is as a guard for various fast paths.

    The direct cause of the degradation is that the inlining of PyType_Check() didn't go so well — commit 509dd90.

    There are two issues. First, we lost the fast path for an exact type match that was present in the 3.8 version of PyType_Check(). Second, functions like PyType_Check() cannot be fully inlined if they call other non-inlined functions like PyType_GetFlags().

    The original unreviewed commit doesn't revert cleanly because subsequent changes were made. Instead, I suggest:

    • restore the short-cut for an exact type match, and
    • convert PyType_GetFlags() to a macro or an inlined function

    That would fix the performance regression while still treating type objects as opaque.

    ------- Incomplete inlines ----------------------------------------------
    ------- line 639 in object.h --------------------------------------------

    static inline int
    PyType_HasFeature(PyTypeObject *type, unsigned long feature) {
        return ((PyType_GetFlags(type) & feature) != 0);
    }
                   ^---- Non-static function cannot be inlined
    
    #define PyType_FastSubclass(type, flag) PyType_HasFeature(type, flag)
    
    static inline int _PyType_Check(PyObject *op) {
        return PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_TYPE_SUBCLASS);
    }
    #define PyType_Check(op) _PyType_Check(_PyObject_CAST(op))

    ------- Old Type Check Code ---------------------------------------------
    ------- line 646 in object.h --------------------------------------------

    #define PyObject_TypeCheck(ob, tp) \
        (Py_TYPE(ob) == (tp) || PyType_IsSubtype(Py_TYPE(ob), (tp)))
                     ^--- Fast path for exact match is now gone

    ------- Non-static function cannot be inlined --------------------------
    ------- line 2339 in typeobject.c -------------------------------------

    unsigned long
    PyType_GetFlags(PyTypeObject *type)
    {
        return type->tp_flags;
    }

    @rhettinger rhettinger reopened this Jul 3, 2020
    @rhettinger rhettinger reopened this Jul 3, 2020
    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 4, 2020

    Christian Heimes has expressed an interest (quite rudely) in unreviewed commits elsewhere. I wonder if that applies to everyone regardless of employer.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 4, 2020

    Ah, I see that the subject has already been answered in the negative:

    https://discuss.python.org/t/make-code-review-mandatory/4174

    He must have forgotten.

    @rhettinger
    Copy link
    Contributor

    Let's just focus on getting it fixed before 3.9 goes out.

    @tiran
    Copy link
    Member

    tiran commented Jul 4, 2020

    Stefan, stop it! I feel offended and harassed by you. Do not ever talk to me in public, private, or even insinuate to reference to my person in any public conversation.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 4, 2020

    I have no opinion about this commit (I did not benchmark anything,
    but I wouldn't be surprised if Victor did).

    I do have the opinion that unreviewed commits occur quite frequently
    and nearly every committer has made some (or a lot), including the
    committers who mention them.

    They are not inherently bad, so I hope we can focus on the outcome.

    @ambv
    Copy link
    Contributor

    ambv commented Jul 5, 2020

    Good catch, Raymond. I'll mark this as a release blocker to ensure I don't miss it. The last beta is scheduled for July 20th. Is this enough time for you? I would very much like to avoid refactors of any sort during the release candidate stage.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jul 7, 2020

    Raymond:

    PyTuple_Check() got slower across the board. This is problematic because the principal use case for PyTuple_Check() is as a guard for various fast paths.

    Raymond gave more context in this email:
    https://mail.python.org/archives/list/python-dev@python.org/message/Q3YHYIKNUQH34FDEJRSLUP2MTYELFWY3/

    As far as I know, only macOS is impacted by this performance regression (119 nsec => 152 nsec on a microbenchmark). On Windows and Linux, Python is built with LTO and PGO optimizations.

    For example, the Python package provided by Fedora 32 is built with GCC 10 using LTO and PGO. Using GCC 10 and LTO, the PyTuple_Check() code is inlined even if PyType_GetFlags() is function call in PyType_HasFeature().

    I didn't know that on macOS, LTO was not used. I created bpo-41181 to request to enable LTO on the macOS installer builder, but Ned Deily rejected the issue (read the issue for the rationale).

    I dislike comparing performances when LTO and PGO optimizations are not used. LTO enables tons of optimizations and PGO makes benchmark results more reproducible.

    One idea is to stop running benchmarks on macOS, and at least only run benchmarks on binaries built with LTO and PGO.

    --

    Raymond:

    The direct cause of the degradation is that the inlining of PyType_Check() didn't go so well — commit 509dd90.

    I'm not sure the performance changed with this commit. IMHO it's more the commit 45ec5b9: bpo-40170 and #63577.

    Background behind this change: see PEP-620 for the overall plan, see bpo-39573 and bpo-40170 to make PyObject and PyTypeObject structures opaque. The idea is to slowly move third party extension modules towards the stable ABI (PEP-384). Internally, CPython continues to access directly structure members.

    The original unreviewed commit (...)

    #63577 was reviewed and approved by another core dev.

    The impact of performance of these changes was taken in account, and that's why it was decided to add a new internal _PyType_HasFeature() function.

    It wasn't noticed that PyTuple_Check() calls the public PyType_HasFeature() instead of the internal _PyType_HasFeature(). As I wrote, using LTO, the code is inlined anyway, so it's the same thing.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jul 7, 2020

    Stefan Krah:

    Christian Heimes has expressed an interest (quite rudely) in unreviewed commits elsewhere. I wonder if that applies to everyone regardless of employer.

    I don't understand the relationship between this issue, Christian Heimes and making reviews mandatory or not.

    As I wrote, the commit 45ec5b9 was reviewed and approved by another core dev.

    Stefan, please stop personal attack and stay at the technical level.

    @rhettinger
    Copy link
    Contributor

    Here are two timings for math.dist(). They were run with the production macOS builds from python.org:

    $ python3.8 -m timeit -r 11 -s 'from math import dist' -s 'p=(1.1, 2.2); q=(1.7, 2.3)' 'dist(p, q)'
    5000000 loops, best of 11: 58.4 nsec per loop
    
    $ python3.9 -m timeit -r 11 -s 'from math import dist' -s 'p=(1.1, 2.2); q=(1.7, 2.3)' 'dist(p, q)'
    5000000 loops, best of 11: 66.9 nsec per loop

    The attached screen shot shows that the only change between the two versions is that the subclass check is inlined and fast in 3.8, but is an external function call in 3.9.

    ---- 3.8 subclass check -----------
    movq 8(%r12), %rax
    movl $0, 32(%rsp)
    testb $4, 171(%rax)
    je L779

    ---- 3.9 subclass check -----------

    movq    8(%r12), %rdi
    call    _PyType_GetFlags
    movl    $0, 32(%rsp)
    testl   $67108864, %eax
    je  L856
    

    The C code for math.dist() is unchanged between 3.8 and 3.9. Both use PyTuple_Check().

    This isn't unique. Every single PyTuple_Check() is the similarly affected (approx. 225 occurrences). Presumably, this affects other type checks as well.

    @rhettinger
    Copy link
    Contributor

    Serhiy, do you have any insights to offer?

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 7, 2020

    A brief note for Victor that *nothing* was directed against him. On
    the contrary, msg372995 was supporting him in case the commit had
    actually been unreviewed, which it apparently wasn't. Sorry for
    jumping on the bandwagon there.

    PEP-620 for the overall plan.

    This one I have some trouble with. It cites PyPy as a bottleneck, but
    IIRC Armin Rigo was on record saying that such changes would not help
    PyPy, and he would come up with a counter proposal. Has this changed?

    It is also a bit unusual that the PEP is in draft status but a lot
    of things are already committed.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Jul 7, 2020

    s/PyPy as a bottleneck/the C-API as a bottle neck for PyPy/

    @rhettinger
    Copy link
    Contributor

    Victor, is there any reason PyType_GetFlags() can't be converted to a macro or an inlined function? That seems like a simple and robust fix that won't get in the way of anything else you're doing.

    We shouldn't disregard macOS timings. It is an important platform —dominant in data science, dominant among most of my clients, and dominant among the core developers at the last sprint.

    None of the inlining should depend on LTO and PGO. That is fragile and will create hard-to-explain variations between builds (even consecutive builds on the same platform). Third-party extensions, SO and DLL files won't benefit from that. Also, it makes it difficult to individually optimize and analyze a function without a costly rebuild at every step.

    @serhiy-storchaka
    Copy link
    Member

    I do not know the purpose of this issue. The new code does not look cleaner to me, but maybe it is only for me.

    But performance regressions should be fixed. We cannot rely on compiler-specific global optimization, which is hard to test and does not work for dynamic linkage (third-party extensions and applications with built-in Python). Even "inline" is just a hint to the compiler and does not guarantee inlining the code.

    Raymond's suggestions look reasonable to me.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jul 8, 2020

    Since the PEP-620 is still a draft, I created PR 21390 to fix the performance issue.

    As I wrote, I didn't expect any impact on performances since I added _PyType_HasFeature() which access directly the structure member. But I was wrong.

    This change is not strictly required by my work for now, so I just revert it until the PEP is accepted.

    If I reapply the change later, I will take care of ensuring that it's properly optimized in CPython internals (access the member, don't call a function), especially on macOS.

    If you want to continue the discussion, please discuss on bpo-40170, since this issue is unrelated.

    Stefan Krah:

    This one I have some trouble with. It cites PyPy as a bottleneck,
    but IIRC Armin Rigo was on record saying that such changes would
    not help PyPy, and he would come up with a counter proposal.
    Has this changed?

    Raymond:

    Victor, is there any reason PyType_GetFlags() can't be converted to a macro or an inlined function?

    Serhiy Storchaka:

    I do not know the purpose of this issue. The new code does not look cleaner to me, but maybe it is only for me.

    I explained the rationale in bpo-40170 and PEP-620.

    It's not a "cleanup change". Serhiy: we are discussing the commit 45ec5b9 of bpo-40170.

    This discussion is happening in the wrong bpo which seems to confuse people. So I close again this issue which is fixed.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Jul 11, 2020

    Raymond, did you replace the screenshot with a later one showing that things are fixed now? The timestamp suggests it went up at the same time as your comment, but what I see in the .png file is that the two are identical other than addresses.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants