Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7 #70468

Closed
1st1 opened this issue Feb 3, 2016 · 25 comments
Closed

ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7 #70468

1st1 opened this issue Feb 3, 2016 · 25 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@1st1
Copy link
Member

1st1 commented Feb 3, 2016

BPO 26280
Nosy @gvanrossum, @pitrou, @vstinner, @markshannon, @serhiy-storchaka, @1st1, @DimitrisJim, @pablogsal, @iritkatriel
PRs
  • bpo-26280: Port BINARY_SUBSCR to PEP 659 adaptive interpreter #27043
  • Files
  • 26280_stats.diff: quick and dirty counters for a few types
  • subscr_stats.txt
  • subscr1.patch
  • subscr2.patch: Removed tuple block, inlined a few things
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-07-15.12:23:09.333>
    created_at = <Date 2016-02-03.17:02:18.525>
    labels = ['interpreter-core', '3.11', 'performance']
    title = 'ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7'
    updated_at = <Date 2021-07-15.15:11:26.236>
    user = 'https://github.com/1st1'

    bugs.python.org fields:

    activity = <Date 2021-07-15.15:11:26.236>
    actor = 'gvanrossum'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-07-15.12:23:09.333>
    closer = 'iritkatriel'
    components = ['Interpreter Core']
    creation = <Date 2016-02-03.17:02:18.525>
    creator = 'yselivanov'
    dependencies = []
    files = ['41814', '41826', '41939', '42049']
    hgrepos = []
    issue_num = 26280
    keywords = ['patch']
    message_count = 25.0
    messages = ['259492', '259512', '259514', '259516', '259517', '259602', '259611', '259625', '259627', '259628', '259632', '259651', '259652', '259677', '259711', '260376', '260377', '260497', '261034', '396986', '396998', '396999', '397031', '397543', '397557']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'pitrou', 'vstinner', 'Mark.Shannon', 'serhiy.storchaka', 'yselivanov', 'zbyrne', 'Jim Fasarakis-Hilliard', 'pablogsal', 'iritkatriel']
    pr_nums = ['27043']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue26280'
    versions = ['Python 3.11']

    @1st1
    Copy link
    Member Author

    1st1 commented Feb 3, 2016

    See also issue bpo-21955

    @1st1 1st1 added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Feb 3, 2016
    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 3, 2016

    Yury,
    Are you going to tackle this one, or would you like me to?

    @serhiy-storchaka
    Copy link
    Member

    Would be nice to collect statistics about arguments types during running the testsuite. May be list is not the most popular type of collection.

    @1st1
    Copy link
    Member Author

    1st1 commented Feb 3, 2016

    Zach, first I was going to collect some stats on this (as Serhiy also suggests). It would be interesting to collect some stats on how many times BINARY_SUBSCR receives lists, tuples, dicts, and other types. I'd instrument the code to collect those stats and run Python tests suite and benchmarks.

    I'm pretty sure that optimizing lists (and tuples?) is a great idea. It would also be nice to optimize [-1] lookup. I'd also measure if we should add a fast path for dicts (PyDict_CheckExact + PyDict_GetItem).

    If you have time to work on this issue, then by all means go ahead. We'll be glad to assist you and review the patch.

    @1st1
    Copy link
    Member Author

    1st1 commented Feb 3, 2016

    Zach, BTW, you can see how I instrumented ceval for stats collection in the patch for issue bpo-26219. That's only for the research purposes, we won't commit that... or maybe we will, but in a separate issue :).

    @pitrou
    Copy link
    Member

    pitrou commented Feb 4, 2016

    I'm pretty sure that optimizing lists (and tuples?) is a great idea.

    I think it's a good idea indeed.

    It would also be nice to optimize [-1] lookup

    How is that different from the above? :)

    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 5, 2016

    Ok, I've started on the instrumenting, thanks for that head start, that would have taken me a while to figure out where to call the stats dump function from. Fun fact: BINARY_SUBSCR is called 717 starting python.

    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 5, 2016

    I'll put together something comprehensive in a bit, but here's a quick preview:

    $ ./python
    Python 3.6.0a0 (default, Feb  4 2016, 20:08:03) 
    [GCC 4.6.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> exit()
    Total BINARY_SUBSCR calls: 726
    List BINARY_SUBSCR calls: 36
    Tuple BINARY_SUBSCR calls: 103
    Dict BINARY_SUBSCR calls: 227
    Unicode BINARY_SUBSCR calls: 288
    Bytes BINARY_SUBSCR calls: 68
    [-1] BINARY_SUBSCR calls: 0
    $ python bm_elementtree.py -n 100 --timer perf_counter
    ...[snip]...
    Total BINARY_SUBSCR calls: 1078533
    List BINARY_SUBSCR calls: 513
    Tuple BINARY_SUBSCR calls: 1322
    Dict BINARY_SUBSCR calls: 1063075
    Unicode BINARY_SUBSCR calls: 13150
    Bytes BINARY_SUBSCR calls: 248
    [-1] BINARY_SUBSCR calls: 0

    Lib/test$ ../../python -m unittest discover
    ...[snip]...^C <== I got bored waiting
    KeyboardInterrupt
    Total BINARY_SUBSCR calls: 4732885
    List BINARY_SUBSCR calls: 1418730
    Tuple BINARY_SUBSCR calls: 1300717
    Dict BINARY_SUBSCR calls: 1151766
    Unicode BINARY_SUBSCR calls: 409924
    Bytes BINARY_SUBSCR calls: 363029
    [-1] BINARY_SUBSCR calls: 26623

    So dict seems to be the winner here

    @1st1
    Copy link
    Member Author

    1st1 commented Feb 5, 2016

    Looks like we want to specialize it for lists, tuples, and dicts; as expected. Not so sure about [-1, but I suggest to benchmark it anyways ;)

    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 5, 2016

    One thing I forgot to do was count slices.

    @serhiy-storchaka
    Copy link
    Member

    Looks as statistics varies from test to test too much. Could you please collect the statistics for running all tests?

    @pitrou
    Copy link
    Member

    pitrou commented Feb 5, 2016

    The test suite is not an appropriate workload to run benchmarks or statistics. Can you run with the benchmarks suite instead?

    @pitrou
    Copy link
    Member

    pitrou commented Feb 5, 2016

    I also suggest counting the number of BINARY_SUBSCR calls that are *not* one of the builtin types under consideration. Also, it would be good to distinguish slicing from integer indexing, for lists and tuples.

    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 5, 2016

    I'm attaching output from a selection of the benchmarks, I'm counting non-builtins and slices, but for everything, not just lists and tuples.

    Quick observation: math workloads seem list heavy, text workloads seem dict heavy, and tuples are usually somewhere in the middle.

    @vstinner vstinner changed the title ceval: Optimize [] operation similarly to CPython 2.7 ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7 Feb 6, 2016
    @vstinner
    Copy link
    Member

    vstinner commented Feb 6, 2016

    Oh, I just created a duplicate with a patch for list[int]: issue bpo-26301.

    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 17, 2016

    Here's a patch that looks likes Victor's from the duplicate, but with tuples covered as well. I ran some straight forward micro benchmarks but haven't bothered running the benchmark suite yet. Unsurprisingly, optimized paths are faster, and the others take a penalty.

    [0]byrnez@byrnez-laptop:/git/python$ ./python.orig -m timeit -s "l = [1,2,3,4,5,6]" "l[3]"
    10000000 loops, best of 3: 0.0306 usec per loop
    [0]byrnez@byrnez-laptop:
    /git/python$ ./python -m timeit -s "l = [1,2,3,4,5,6]" "l[3]"
    10000000 loops, best of 3: 0.0243 usec per loop

    [0]byrnez@byrnez-laptop:/git/python$ ./python.orig -m timeit -s "l = (1,2,3,4,5,6)" "l[3]"
    10000000 loops, best of 3: 0.0291 usec per loop
    [0]byrnez@byrnez-laptop:
    /git/python$ ./python -m timeit -s "l = (1,2,3,4,5,6)" "l[3]"
    10000000 loops, best of 3: 0.0241 usec per loop

    [0]byrnez@byrnez-laptop:/git/python$ ./python.orig -m timeit -s "l = 'asdfasdf'" "l[3]"
    10000000 loops, best of 3: 0.034 usec per loop
    [0]byrnez@byrnez-laptop:
    /git/python$ ./python -m timeit -s "l = 'asdfasdf'" "l[3]"
    10000000 loops, best of 3: 0.0366 usec per loop

    [0]byrnez@byrnez-laptop:/git/python$ ./python.orig -m timeit -s "l = [1,2,3,4,5,6]" "l[:3]"
    10000000 loops, best of 3: 0.124 usec per loop
    [0]byrnez@byrnez-laptop:
    /git/python$ ./python -m timeit -s "l = [1,2,3,4,5,6]" "l[:3]"
    10000000 loops, best of 3: 0.125 usec per loop

    @vstinner
    Copy link
    Member

    I suggest to try to inline PyList_GetItem: use PyList_GET_ITEM and raise
    the exception manually if needed.

    I'm not sure that it's ok to add PyLong_AsSize_t() to the slow path. Copy
    the code in each if? A macro can help.

    @vstinner vstinner changed the title ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7 ceval: Optimize list Feb 17, 2016
    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 19, 2016

    Is it worth handling the exception, or just let it take the slow path and get caught by PyObject_GetItem()? We're still making sure the index is in bounds.

    Also, where would be an appropriate place to put a macro for adjusting negative indices?

    @zbyrne
    Copy link
    Mannequin

    zbyrne mannequin commented Feb 29, 2016

    The new patch "subscr2" removes the tuple block, and addresses Victor's comments. This one looks a little faster, down to 0.0215 usec for the same test.

    @vstinner vstinner changed the title ceval: Optimize list ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7 Mar 1, 2016
    @iritkatriel
    Copy link
    Member

    This looks like a case of specialization.

    @gvanrossum
    Copy link
    Member

    Very much so. Irit, do you want to give it a try? (Note there are helpful instructions now in Python/adaptive.md.)

    @iritkatriel
    Copy link
    Member

    Sure, I'll have a look.

    @iritkatriel iritkatriel added the 3.11 only security fixes label Jul 5, 2021
    @pablogsal
    Copy link
    Member

    Here is my previous attempt at this, for reference:

    #9853

    @markshannon
    Copy link
    Member

    New changeset 641345d by Irit Katriel in branch 'main':
    bpo-26280: Port BINARY_SUBSCR to PEP-659 adaptive interpreter (GH-27043)
    641345d

    @gvanrossum
    Copy link
    Member

    Way to go Irit!--
    --Guido (mobile)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants