Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster parsing keyword arguments #71761

Closed
serhiy-storchaka opened this issue Jul 19, 2016 · 13 comments
Closed

Faster parsing keyword arguments #71761

serhiy-storchaka opened this issue Jul 19, 2016 · 13 comments
Assignees
Labels
interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Jul 19, 2016

BPO 27574
Nosy @brettcannon, @rhettinger, @gpshead, @pitrou, @vstinner, @larryhastings, @serhiy-storchaka
Files
  • faster_keyword_args_parse.patch
  • faster_keyword_args_parse_2.patch
  • faster_keyword_args_parse_alt.patch
  • faster_keyword_args_parse_alt2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-12-17.20:41:04.491>
    created_at = <Date 2016-07-19.16:03:55.797>
    labels = ['interpreter-core', 'performance']
    title = 'Faster parsing keyword arguments'
    updated_at = <Date 2016-12-17.20:41:04.490>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2016-12-17.20:41:04.490>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-12-17.20:41:04.491>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-07-19.16:03:55.797>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['43794', '43996', '45946', '45947']
    hgrepos = []
    issue_num = 27574
    keywords = ['patch']
    message_count = 13.0
    messages = ['270832', '270889', '271834', '271928', '272201', '272231', '272411', '272656', '272930', '272979', '272981', '272983', '283516']
    nosy_count = 8.0
    nosy_names = ['brett.cannon', 'rhettinger', 'gregory.p.smith', 'pitrou', 'vstinner', 'larry', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue27574'
    versions = ['Python 3.6']

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Jul 19, 2016

    Parsing keyword arguments is much more slow than parsing positional arguments. Parsing time can be larger that useful execution time.

    $ ./python -m timeit "b'a:b:c'.split(b':', 1)"
    1000000 loops, best of 3: 0.638 usec per loop
    $ ./python -m timeit "b'a:b:c'.split(b':', maxsplit=1)"
    1000000 loops, best of 3: 1.64 usec per loop

    The main culprit is that Python strings are created for every keyword name on every call.

    Proposed patch adds alternative API that caches keyword names as Python strings in special object. Argument Clinic is changed to use this API in generated file. An effect of the optimization:

    $ ./python -m timeit "b'a:b:c'.split(b':', maxsplit=1)"
    1000000 loops, best of 3: 0.826 usec per loop

    Invocations of PyArg_ParseTupleAndKeywords() in non-generated code are kept, since API is not stable yet. Later I'm going to cache parsed format strings and speed up parsing positional arguments too.

    @serhiy-storchaka serhiy-storchaka added interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jul 19, 2016
    @brettcannon
    Copy link
    Member

    brettcannon commented Jul 20, 2016

    I haven't reviewed the patch, but the idea is great as I know one of Larry's hopes of using Argument Clinic was to allow for this kind of speed-up.

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Aug 2, 2016

    Ping.

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Aug 3, 2016

    Updated patch addresses Antoine's comments. All checks of format string are moved into parser_init.

    I experimented with Antoine's idea about making vgetargskeywords a simple wrapper around vgetargskeywordsfast with one-shot parser, but this slows down parsing positional arguments too much (due to creating Python strings for unused keyword names).

    @vstinner
    Copy link
    Member

    vstinner commented Aug 9, 2016

    See also the old issue bpo-17170 "string method lookup is too slow".

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Aug 9, 2016

    Indeed, in bpo-17170 this issue was discussed first.

    @rhettinger
    Copy link
    Contributor

    rhettinger commented Aug 11, 2016

    Normally, LGTM is an almost useless comment, but the patch does in fact look good to me. I like how compact and straight-forward the changes are to the individual parsing calls.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Aug 14, 2016

    New changeset e527715bd0b3 by Serhiy Storchaka in branch 'default':
    Issue bpo-27574: Decreased an overhead of parsing keyword arguments in functions
    https://hg.python.org/cpython/rev/e527715bd0b3

    @vstinner
    Copy link
    Member

    vstinner commented Aug 17, 2016

    The issue can now be closed no?

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Aug 17, 2016

    I left this issue open for three reasons.

    1. I had ideas and almost finished patch for different optimization. Unfortunately my hope was not justified, new implementation is slower. If I fail to fix it in few days, I'll close the issue.

    2. For bikeshedding in case somebody want to suggest different names or interface.

    3. I was going to convert most occurrences of PyArg_ParseTupleAndKeywords() to Argument Clinic for achieving larger effect of this optimization. But this patch was larger than I expected.

    @brettcannon
    Copy link
    Member

    brettcannon commented Aug 17, 2016

    I think for converting uses to Argument Clinic it can be done in a more iterative process on a per-module basis. How many modules do we have left to convert? If it isn't ridiculously huge we could open individual issues to convert them each.

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Aug 17, 2016

    Yes, I came to conclusion than needed to push existing issues for separate files. I'm sure there are ready patches waiting for review. Now there is additional reason for converting to Argument Clinic. But some files contain only one PyArg_ParseTupleAndKeywords(), I think we can convert them in one patch.

    @serhiy-storchaka
    Copy link
    Member Author

    serhiy-storchaka commented Dec 17, 2016

    Just for the history, there are two alternative patches. They unpack keyword arguments to linear array. I expected this approach can add more optimization, but actually the benefit is too small or negative.

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 17, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants