Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize wrapper descriptors using FASTCALL #75724

Closed
vstinner opened this issue Sep 21, 2017 · 8 comments
Closed

Optimize wrapper descriptors using FASTCALL #75724

vstinner opened this issue Sep 21, 2017 · 8 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@vstinner
Copy link
Member

BPO 31543
Nosy @rhettinger, @pitrou, @vstinner, @methane, @serhiy-storchaka
PRs
  • [WIP] bpo-31543: Optimize wrapper descriptors #3685
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-11-24.01:10:38.616>
    created_at = <Date 2017-09-21.13:25:31.091>
    labels = ['interpreter-core', '3.7', 'performance']
    title = 'Optimize wrapper descriptors using FASTCALL'
    updated_at = <Date 2017-11-24.01:10:38.614>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-11-24.01:10:38.614>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-11-24.01:10:38.616>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2017-09-21.13:25:31.091>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 31543
    keywords = ['patch']
    message_count = 8.0
    messages = ['302692', '302697', '302709', '302710', '302717', '302724', '302733', '306865']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'pitrou', 'vstinner', 'methane', 'serhiy.storchaka']
    pr_nums = ['3685']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue31543'
    versions = ['Python 3.7']

    @vstinner
    Copy link
    Member Author

    Attached pull request adds a fastpath for wrapper descriptors to use the FASTCALL calling convention. It's a follow up of bpo-31410 and all my work on FASTCALL during Python 3.6 and 3.7 development cycles.

    Microbenchmark:

    ./python -m perf timeit -s 'import array; obj=array.array("b"); wrap=array.array.__len__' 'wrap(obj)'

    Result:

    haypo@selma$ ./python -m perf compare_to ref.json patch.json
    Mean +- std dev: [ref] 59.2 ns +- 0.6 ns -> [patch] 28.2 ns +- 0.9 ns: 2.10x faster (-52%)

    It removes 31 nanoseconds on such very fast C function, array_length().

    Attached PR is still a work-in-progress. First I would like to know if it's worth it because working on polishing the actual code.

    @vstinner vstinner added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Sep 21, 2017
    @pitrou
    Copy link
    Member

    pitrou commented Sep 21, 2017

    First I would like to know if it's worth it

    I don't know. What's the point of optimizing array.array.__len__(obj)? People usually call len(obj) for that...

    @vstinner
    Copy link
    Member Author

    I don't know. What's the point of optimizing array.array.__len__(obj)? People usually call len(obj) for that...

    Right, type.method(self) is less than common than self.method().

    I looked at the stdlib. I found that the following method are called using wrapper descriptors:

    • object.__repr__(self)
    • object.__getattribute__(self, name)
    • _asyncio.Future.__del__(self)
    • int.__str__(obj)
    • etc.

    But it seems like such calls are rare compared to other kinds of function calls.

    --

    By the way, _PyMethodDescr_FastCallKeywords() is only called from call_function() in Python/ceval.c. It's not used in Objects/call.c. Maybe we should use it there as well? It seems like this is a question about tracing. But maybe we can copy/paste the code from call_function()?

    @serhiy-storchaka
    Copy link
    Member

    It optimizes the same cases as bpo-31410. bpo-31410 removed a half of the overhead for wrapper descriptors, Victor's patch removes the remaining half. Actually it makes calling the descriptor faster than calling the corresponding builtin! But bpo-31410 changes were simple, just few lines, and PR 3685 looks much more complex.

    Are non-fast wrappers still needed? If the patch replaces the code instead of adding it this would decrease its cost. AFAIK the only descriptors that should support non-fast calling convention are __new__, __init__ and __call__.

    As for practicality, this change should slightly speed up the code that directly calls __eq__, __lt__. For example classes decorated with total_ordering. Victor, can you provide microbenchmarks with more practical code?

    @rhettinger
    Copy link
    Contributor

    This seems like a straight-forward win. I don't think we really need to see microbenchmarks before going forward with this one.

    @vstinner
    Copy link
    Member Author

    The most complex slots are __new__, __init__ and __call__ because they
    accept keywords. It's hard to design a calling convention which optimize
    all cases because of the backward compatibility. The risk is to convert a
    dict to args + kwnames (tuple) and then back to a dict: two expensive and
    useless conversions.

    It's my 3rd or 4th attempt to optimize new/init/call :-) My previous
    attempt was to add new tp_fastXXX slots to types, but again thr backward
    compatibilty was a major pain.
    https://bugs.python.org/issue29358

    Here the scope is much small and backward compatibilty shouldn't affect us.

    I will try to find a way to optimize these slots as well and complete my
    patch.

    @serhiy-storchaka
    Copy link
    Member

    These slots are the only slots that accept keyword arguments and arbitrary number of positional parameters. Other slots can use fast calls.

    @vstinner
    Copy link
    Member Author

    It seems like these specific descriptors are rare, so the added complexity is not worth it. I close my issue as rejected.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants