
Merge BINARY_SUBSCR_LIST_INT with BINARY_SUBSCR_LIST_TUPLE #91407

Closed
eendebakpt mannequin opened this issue Apr 7, 2022 · 4 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@eendebakpt
Mannequin

eendebakpt mannequin commented Apr 7, 2022

BPO 47251
Nosy @eendebakpt
PRs
  • gh-91407: Merge BINARY_SUBSCR_LIST_INT with BINARY_SUBSCR_TUPLE_INT #32404
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-04-07.19:10:15.380>
    labels = ['interpreter-core', '3.11']
    title = 'Merge BINARY_SUBSCR_LIST_INT with BINARY_SUBSCR_LIST_TUPLE'
    updated_at = <Date 2022-04-07.19:12:32.413>
    user = 'https://github.com/eendebakpt'

    bugs.python.org fields:

    activity = <Date 2022-04-07.19:12:32.413>
    actor = 'pieter.eendebak'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2022-04-07.19:10:15.380>
    creator = 'pieter.eendebak'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 47251
    keywords = ['patch']
    message_count = 1.0
    messages = ['416937']
    nosy_count = 1.0
    nosy_names = ['pieter.eendebak']
    pr_nums = ['32404']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue47251'
    versions = ['Python 3.11']

    @eendebakpt
    Mannequin Author

    eendebakpt mannequin commented Apr 7, 2022

    The implementations of BINARY_SUBSCR_LIST_INT and BINARY_SUBSCR_TUPLE_INT are almost identical. They can be merged, so there is one opcode fewer and the code is shared.

    @eendebakpt eendebakpt mannequin added 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 7, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kevin-chau

    kevin-chau commented Apr 11, 2022

    The code is very similar, but the two opcodes deal with different underlying objects, so one calls PyList_GET_SIZE() while the other calls PyTuple_GET_SIZE(). They are certainly similar enough that some shared code could be factored out, but wouldn't there still need to be two different opcodes?

            TARGET(BINARY_SUBSCR_LIST_INT) {
                assert(cframe.use_tracing == 0);
                PyObject *sub = TOP();
                PyObject *list = SECOND();
                DEOPT_IF(!PyLong_CheckExact(sub), BINARY_SUBSCR);
                DEOPT_IF(!PyList_CheckExact(list), BINARY_SUBSCR);
    
                // Deopt unless 0 <= sub < PyList_Size(list)
                Py_ssize_t signed_magnitude = Py_SIZE(sub);
                DEOPT_IF(((size_t)signed_magnitude) > 1, BINARY_SUBSCR);
                assert(((PyLongObject *)_PyLong_GetZero())->ob_digit[0] == 0);
                Py_ssize_t index = ((PyLongObject*)sub)->ob_digit[0];
                DEOPT_IF(index >= PyList_GET_SIZE(list), BINARY_SUBSCR);
                STAT_INC(BINARY_SUBSCR, hit);
                PyObject *res = PyList_GET_ITEM(list, index);
                assert(res != NULL);
                Py_INCREF(res);
                STACK_SHRINK(1);
                Py_DECREF(sub);
                SET_TOP(res);
                Py_DECREF(list);
                JUMPBY(INLINE_CACHE_ENTRIES_BINARY_SUBSCR);
                NOTRACE_DISPATCH();
            }
    
            TARGET(BINARY_SUBSCR_TUPLE_INT) {
                assert(cframe.use_tracing == 0);
                PyObject *sub = TOP();
                PyObject *tuple = SECOND();
                DEOPT_IF(!PyLong_CheckExact(sub), BINARY_SUBSCR);
                DEOPT_IF(!PyTuple_CheckExact(tuple), BINARY_SUBSCR);
    
                // Deopt unless 0 <= sub < PyTuple_Size(list)
                Py_ssize_t signed_magnitude = Py_SIZE(sub);
                DEOPT_IF(((size_t)signed_magnitude) > 1, BINARY_SUBSCR);
                assert(((PyLongObject *)_PyLong_GetZero())->ob_digit[0] == 0);
                Py_ssize_t index = ((PyLongObject*)sub)->ob_digit[0];
                DEOPT_IF(index >= PyTuple_GET_SIZE(tuple), BINARY_SUBSCR);
                STAT_INC(BINARY_SUBSCR, hit);
                PyObject *res = PyTuple_GET_ITEM(tuple, index);
                assert(res != NULL);
                Py_INCREF(res);
                STACK_SHRINK(1);
                Py_DECREF(sub);
                SET_TOP(res);
                Py_DECREF(tuple);
                JUMPBY(INLINE_CACHE_ENTRIES_BINARY_SUBSCR);
                NOTRACE_DISPATCH();
            }
    

    (For reference, the opcode itself is defined as #define BINARY_SUBSCR_LIST_INT 20.)

    Also, there's a typo in the issue title: it should be BINARY_SUBSCR_TUPLE_INT, not BINARY_SUBSCR_LIST_TUPLE. I'm not sure how easy that is to change, though, since this issue was part of the migration.
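
    For illustration, a single merged handler would have to select the size and item accessors at run time. Here is a minimal sketch of that idea (the opcode name BINARY_SUBSCR_SEQUENCE_INT is hypothetical; this is not code from the PR):

            // Hypothetical merged handler: identical to the two handlers quoted
            // above, except that the container type is checked and branched on
            // at run time.
            TARGET(BINARY_SUBSCR_SEQUENCE_INT) {
                assert(cframe.use_tracing == 0);
                PyObject *sub = TOP();
                PyObject *seq = SECOND();
                DEOPT_IF(!PyLong_CheckExact(sub), BINARY_SUBSCR);
                int is_list = PyList_CheckExact(seq);
                DEOPT_IF(!is_list && !PyTuple_CheckExact(seq), BINARY_SUBSCR);

                // Deopt unless 0 <= sub < len(seq)
                Py_ssize_t signed_magnitude = Py_SIZE(sub);
                DEOPT_IF(((size_t)signed_magnitude) > 1, BINARY_SUBSCR);
                Py_ssize_t index = ((PyLongObject*)sub)->ob_digit[0];
                Py_ssize_t size = is_list ? PyList_GET_SIZE(seq)
                                          : PyTuple_GET_SIZE(seq);
                DEOPT_IF(index >= size, BINARY_SUBSCR);
                STAT_INC(BINARY_SUBSCR, hit);
                PyObject *res = is_list ? PyList_GET_ITEM(seq, index)
                                        : PyTuple_GET_ITEM(seq, index);
                assert(res != NULL);
                Py_INCREF(res);
                STACK_SHRINK(1);
                Py_DECREF(sub);
                SET_TOP(res);
                Py_DECREF(seq);
                JUMPBY(INLINE_CACHE_ENTRIES_BINARY_SUBSCR);
                NOTRACE_DISPATCH();
            }

    The only lines that differ from the two specialized handlers are the type check and the two ternary expressions, which is exactly the run-time branching discussed below.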

    @eendebakpt
    Contributor

    @kevin-chau @markshannon Thanks for the feedback on the PR. Merging the two opcodes does not seem like a good idea at the moment. Factoring out common code is possible (see item 4 below), but I would like to know whether it is worth the effort.

    1. Although the two opcodes can be combined into a single opcode, this introduces branching, which can hurt performance. The branch is in the lines

            if (is_list)
                res = PyList_GET_ITEM(sequence, index);
            else
                res = PyTuple_GET_ITEM(sequence, index);

    The list object stores its elements in PyObject **ob_item;, whereas the tuple has PyObject *ob_item[1];. I see no way to access these two different structures without some kind of branching.

    2. A performance gain from actually merging the two opcodes can only occur for unusual constructions like

            a = [[1, 2, 3, 4], (1, 2, None)] * 100
            for seq in a:
                value = seq[1]

    I doubt this is common in real code.

    3. To reduce branching I tested combining DEOPT_IF statements, e.g. replacing

            DEOPT_IF(!PyLong_CheckExact(sub), BINARY_SUBSCR);
            DEOPT_IF(!PyList_CheckExact(list), BINARY_SUBSCR);

    with

            DEOPT_IF(!PyLong_CheckExact(sub) || !PyList_CheckExact(list), BINARY_SUBSCR);

    (several other variations were tried as well). This showed no performance improvement.

    4. I tried Mark's suggestion to factor out the common code of BINARY_SUBSCR_LIST_INT and BINARY_SUBSCR_TUPLE_INT (a sketch of the idea is shown below). The result is in main...eendebakpt:combine_list_tuple_opcode_v3. Factoring out the code is possible, without any performance loss or gain.

    Is factoring out the code worthwhile enough to open a new PR for?
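
    For concreteness, here is a minimal sketch of what factoring out the common body could look like, using a macro parameterized on the exact-type check and the size/item accessors. This is only an illustration of the idea, not necessarily what the linked branch does:

        // Sketch: shared body for both specialized subscript opcodes. The
        // exact-type check and the accessors are passed in, so both opcodes
        // remain and no run-time branching on the container type is added.
        #define BINARY_SUBSCR_SEQ_INT_BODY(seq_check, seq_size, seq_item)       \
            do {                                                                \
                assert(cframe.use_tracing == 0);                                \
                PyObject *sub = TOP();                                          \
                PyObject *seq = SECOND();                                       \
                DEOPT_IF(!PyLong_CheckExact(sub), BINARY_SUBSCR);               \
                DEOPT_IF(!seq_check(seq), BINARY_SUBSCR);                       \
                /* Deopt unless 0 <= sub < length of the sequence */            \
                Py_ssize_t signed_magnitude = Py_SIZE(sub);                     \
                DEOPT_IF(((size_t)signed_magnitude) > 1, BINARY_SUBSCR);        \
                Py_ssize_t index = ((PyLongObject*)sub)->ob_digit[0];           \
                DEOPT_IF(index >= seq_size(seq), BINARY_SUBSCR);                \
                STAT_INC(BINARY_SUBSCR, hit);                                   \
                PyObject *res = seq_item(seq, index);                           \
                assert(res != NULL);                                            \
                Py_INCREF(res);                                                 \
                STACK_SHRINK(1);                                                \
                Py_DECREF(sub);                                                 \
                SET_TOP(res);                                                   \
                Py_DECREF(seq);                                                 \
                JUMPBY(INLINE_CACHE_ENTRIES_BINARY_SUBSCR);                     \
                NOTRACE_DISPATCH();                                             \
            } while (0)

        TARGET(BINARY_SUBSCR_LIST_INT) {
            BINARY_SUBSCR_SEQ_INT_BODY(PyList_CheckExact, PyList_GET_SIZE, PyList_GET_ITEM);
        }

        TARGET(BINARY_SUBSCR_TUPLE_INT) {
            BINARY_SUBSCR_SEQ_INT_BODY(PyTuple_CheckExact, PyTuple_GET_SIZE, PyTuple_GET_ITEM);
        }

    Both opcodes keep their own dispatch target, so the generated machine code can stay identical to the current handlers; only the source is shared.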

    Performance benchmarks for item 4.

    Microbenchmark

    import pyperf

    runner = pyperf.Runner()

    setup = 'l = [1, 2, None, 4, 5, 6, 7, 8, 9, 10]; t = (1, None, 3, 4)'

    runner.timeit(name="list", stmt="x = l[3]", setup=setup)
    runner.timeit(name="tuple", stmt="x = t[2]", setup=setup)

    code = """
    for ii in range(400):
        idx = ii % 4
        a = l[idx]
        b = t[idx]
    """
    runner.timeit(name="mixed", stmt=code, setup=setup)

    code = """
    a = [[1, 2, 3, 4], (1, 2, None)] * 100
    for seq in a:
        value = seq[1]
    """
    runner.timeit(name="mixed_same_opcode", stmt=code, setup=setup)


    results in Geometric mean: 1.00x slower

    Pyperformance

    chaos: Mean +- std dev: [base] 87.8 ms +- 0.8 ms -> [patch] 88.4 ms +- 1.3 ms: 1.01x slower
    fannkuch: Mean +- std dev: [base] 514 ms +- 15 ms -> [patch] 524 ms +- 2 ms: 1.02x slower
    float: Mean +- std dev: [base] 103 ms +- 2 ms -> [patch] 104 ms +- 1 ms: 1.01x slower
    go: Mean +- std dev: [base] 163 ms +- 1 ms -> [patch] 167 ms +- 2 ms: 1.02x slower
    hexiom: Mean +- std dev: [base] 7.66 ms +- 0.04 ms -> [patch] 8.05 ms +- 0.06 ms: 1.05x slower
    json_dumps: Mean +- std dev: [base] 15.1 ms +- 0.5 ms -> [patch] 15.0 ms +- 0.2 ms: 1.01x faster
    logging_silent: Mean +- std dev: [base] 133 ns +- 1 ns -> [patch] 133 ns +- 0 ns: 1.00x slower
    nbody: Mean +- std dev: [base] 111 ms +- 2 ms -> [patch] 113 ms +- 1 ms: 1.02x slower
    nqueens: Mean +- std dev: [base] 106 ms +- 1 ms -> [patch] 110 ms +- 1 ms: 1.04x slower
    pickle_dict: Mean +- std dev: [base] 30.9 us +- 0.2 us -> [patch] 31.2 us +- 1.0 us: 1.01x slower
    pickle_list: Mean +- std dev: [base] 4.41 us +- 0.04 us -> [patch] 4.35 us +- 0.04 us: 1.01x faster
    pyflate: Mean +- std dev: [base] 532 ms +- 8 ms -> [patch] 557 ms +- 5 ms: 1.05x slower
    python_startup: Mean +- std dev: [base] 10.9 ms +- 0.5 ms -> [patch] 10.7 ms +- 0.4 ms: 1.01x faster
    raytrace: Mean +- std dev: [base] 381 ms +- 3 ms -> [patch] 378 ms +- 4 ms: 1.01x faster
    regex_compile: Mean +- std dev: [base] 163 ms +- 1 ms -> [patch] 167 ms +- 1 ms: 1.03x slower
    regex_dna: Mean +- std dev: [base] 238 ms +- 2 ms -> [patch] 240 ms +- 3 ms: 1.01x slower
    richards: Mean +- std dev: [base] 57.4 ms +- 1.5 ms -> [patch] 58.0 ms +- 0.8 ms: 1.01x slower
    scimark_fft: Mean +- std dev: [base] 447 ms +- 10 ms -> [patch] 443 ms +- 4 ms: 1.01x faster
    scimark_lu: Mean +- std dev: [base] 142 ms +- 2 ms -> [patch] 145 ms +- 2 ms: 1.03x slower
    scimark_sparse_mat_mult: Mean +- std dev: [base] 6.18 ms +- 0.11 ms -> [patch] 5.87 ms +- 0.09 ms: 1.05x faster
    spectral_norm: Mean +- std dev: [base] 150 ms +- 7 ms -> [patch] 146 ms +- 3 ms: 1.02x faster
    unpack_sequence: Mean +- std dev: [base] 52.6 ns +- 1.0 ns -> [patch] 53.3 ns +- 1.1 ns: 1.01x slower
    unpickle: Mean +- std dev: [base] 15.0 us +- 0.3 us -> [patch] 14.7 us +- 0.2 us: 1.02x faster
    unpickle_pure_python: Mean +- std dev: [base] 301 us +- 2 us -> [patch] 303 us +- 2 us: 1.01x slower
    xml_etree_parse: Mean +- std dev: [base] 176 ms +- 3 ms -> [patch] 175 ms +- 3 ms: 1.01x faster
    xml_etree_process: Mean +- std dev: [base] 71.3 ms +- 0.7 ms -> [patch] 71.8 ms +- 0.6 ms: 1.01x slower
    
    Benchmark hidden because not significant (20): 2to3, deltablue, json_loads, logging_format, logging_simple, meteor_contest, pathlib, pickle, pickle_pure_python, pidigits, python_startup_no_site, regex_effbot, regex_v8, scimark_monte_carlo, scimark_sor, sqlite_synth, telco, unpickle_list, xml_etree_iterparse, xml_etree_generate
    
    Geometric mean: 1.00x slower
    

    @eendebakpt
    Contributor

    @markshannon The issue can be closed as far as I know. I am not allowed to close it myself (maybe because it was opened by the mannequin?).
