Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opcode cache for LOAD_ATTR #86259

Closed
pablogsal opened this issue Oct 20, 2020 · 9 comments
Closed

Add opcode cache for LOAD_ATTR #86259

pablogsal opened this issue Oct 20, 2020 · 9 comments
Assignees
Labels
3.10 interpreter-core

Comments

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Oct 20, 2020

BPO 42093
Nosy @gvanrossum, @vstinner, @1st1, @gvanrossum, @pablogsal
PRs
  • #22803
  • #24070
  • #24582
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pablogsal'
    closed_at = <Date 2020-10-20.05:27:36.232>
    created_at = <Date 2020-10-20.03:18:36.160>
    labels = ['interpreter-core', '3.10']
    title = 'Add opcode cache for LOAD_ATTR'
    updated_at = <Date 2021-02-21.11:02:18.003>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2021-02-21.11:02:18.003>
    actor = 'vstinner'
    assignee = 'pablogsal'
    closed = True
    closed_date = <Date 2020-10-20.05:27:36.232>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2020-10-20.03:18:36.160>
    creator = 'pablogsal'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42093
    keywords = ['patch']
    message_count = 9.0
    messages = ['379083', '379087', '383859', '384087', '384254', '384354', '384364', '384366', '387451']
    nosy_count = 5.0
    nosy_names = ['gvanrossum', 'vstinner', 'yselivanov', 'Guido.van.Rossum', 'pablogsal']
    pr_nums = ['22803', '24070', '24582']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue42093'
    versions = ['Python 3.10']

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Oct 20, 2020

    From the creators of "opcode cache for LOAD_GLOBAL" (https://bugs.python.org/issue26219) now it's time for "opcode cache for LOAD_ATTR: the revenge". This issue/PR builds on top of Yury's original patch in the same way https://bugs.python.org/issue26219 did for LOAD_GLOBAL.

    These are the benchmark results for the pyperformance test suite with PGO/LTO/CPU ISOLATION in a tuned system with pyperf:

    +-------------------------+--------------------------------------+-------------------------------------+
    | Benchmark | 2020-10-20_01-18-master-de73d432bb29 | 2020-10-20_02-28-cache-68f931f6938a |
    +=========================+======================================+=====================================+
    | go | 407 ms | 349 ms: 1.17x faster (-14%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | raytrace | 822 ms | 730 ms: 1.13x faster (-11%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | unpickle_pure_python | 497 us | 447 us: 1.11x faster (-10%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | scimark_sor | 311 ms | 280 ms: 1.11x faster (-10%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | hexiom | 15.4 ms | 14.0 ms: 1.10x faster (-9%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | logging_silent | 302 ns | 276 ns: 1.10x faster (-9%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | chaos | 176 ms | 163 ms: 1.08x faster (-7%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | pyflate | 1.01 sec | 948 ms: 1.06x faster (-6%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | scimark_lu | 246 ms | 232 ms: 1.06x faster (-6%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | pickle_pure_python | 712 us | 674 us: 1.06x faster (-5%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | regex_effbot | 4.49 ms | 4.26 ms: 1.05x faster (-5%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | scimark_monte_carlo | 160 ms | 153 ms: 1.05x faster (-5%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | richards | 120 ms | 115 ms: 1.05x faster (-4%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | 2to3 | 458 ms | 442 ms: 1.04x faster (-4%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | regex_v8 | 33.7 ms | 32.5 ms: 1.04x faster (-3%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | scimark_sparse_mat_mult | 7.16 ms | 6.93 ms: 1.03x faster (-3%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | deltablue | 12.1 ms | 11.7 ms: 1.03x faster (-3%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | regex_dna | 268 ms | 261 ms: 1.03x faster (-3%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | meteor_contest | 152 ms | 148 ms: 1.03x faster (-3%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | genshi_xml | 89.0 ms | 87.1 ms: 1.02x faster (-2%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | logging_simple | 12.8 us | 12.5 us: 1.02x faster (-2%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | genshi_text | 42.4 ms | 41.5 ms: 1.02x faster (-2%) |
    +-------------------------+--------------------------------------+-------------------------------------+
    | nbody | 215 ms | 211 ms: 1.02x faster (-2%) |
    +-------------------------+--------------------------------------+-------------------------------------+

    Not significant (35): chameleon; django_template; dulwich_log; fannkuch; float; json_dumps; json_loads; logging_format; mako; nqueens; pathlib; pickle; pickle_dict; pickle_list; pidigits; python_startup; python_startup_no_site; regex_compile; scimark_fft; spectral_norm; sqlalchemy_declarative; sqlalchemy_imperative; sqlite_synth; sympy_expand; sympy_sum; sympy_str; telco; tornado_http; unpack_sequence; unpickle; unpickle_list; xml_etree_parse; xml_etree_iterparse; xml_etree_generate; xml_etree_process; sympy_integrate

    @pablogsal pablogsal added the 3.10 label Oct 20, 2020
    @pablogsal pablogsal self-assigned this Oct 20, 2020
    @pablogsal pablogsal added the interpreter-core label Oct 20, 2020
    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Oct 20, 2020

    New changeset 109826c by Pablo Galindo in branch 'master':
    bpo-42093: Add opcode cache for LOAD_ATTR (GH-22803)
    109826c

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Dec 27, 2020

    Wow, this is amazing. I just found that this is now faster than slots. Should we mention that in What's New?

    (Of course there's an optimization possible for slots as well, but it would require complicating the cache struct. Maybe in 3.11. :-)

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Dec 30, 2020

    Wow, this is amazing. I just found that this is now faster than slots.

    Not only that: is even faster than the highly-tuned namedtuple access descriptors that used to be the faster access to an attribute:

    3.9 results
    -----------
    17.6 ns read_classvar_from_class
    16.3 ns read_classvar_from_instance
    23.2 ns read_instancevar
    19.7 ns read_instancevar_slots
    17.9 ns read_namedtuple
    39.2 ns read_boundmethod

    Now this is the faster way to get an attribute:

    3.10 results
    ------------
    17.9 ns read_classvar_from_class
    16.9 ns read_classvar_from_instance
    14.1 ns read_instancevar
    20.0 ns read_instancevar_slots
    18.0 ns read_namedtuple
    40.7 ns read_boundmethod

    Should we mention that in What's New?

    Good idea!. I will prepare a PR complementing the current paragraph.

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Jan 3, 2021

    New changeset 9e8fe19 by Pablo Galindo in branch 'master':
    bpo-42093: Tweak the what's new message about the new LOAD_ATTR opcode cache (GH-24070)
    9e8fe19

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Jan 4, 2021

    Thanks! Do you have any plans for further inline caches? I was wondering if we could reverse the situation for slots again by adding slots support to the LOAD_ATTR opcode inline cache...

    @pablogsal
    Copy link
    Member Author

    @pablogsal pablogsal commented Jan 5, 2021

    Thanks! Do you have any plans for further inline caches?

    Yeah, we are experimenting with some ideas here: https://bugs.python.org/issue42115.

    I was wondering if we could reverse the situation for slots again by adding slots support to the LOAD_ATTR opcode inline cache...

    I think we can do it as long as we can detect easily if a given descriptor is immutable. The problem of mutability is this code:

    class Descriptor:
        pass
    
    class C:
        def __init__(self):
            self.x = 1
        x = Descriptor()
    
    def f(o):
        return o.x
    
    o = C()
    for i in range(10000):
        assert f(o) == 1
    
    Descriptor.__get__ = lambda self, instance, value: 2
    Descriptor.__set__ = lambda *args: None
    
    print(f(o))

    In this case, if we do not skip the cache for mutable descriptors, the code will not reflect the new result (2 instead of 1). __slots__ are immutable descriptors so we should be good as long as we can detect them.

    @gvanrossum
    Copy link
    Member

    @gvanrossum gvanrossum commented Jan 5, 2021

    Hm, I was thinking to recognize the specific type of descriptor used by slots and cache only that. Though we would still have to consider updates to C.__dict__ (that's handled by looking at the dict version right?).

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 21, 2021

    New changeset d5fc998 by Victor Stinner in branch 'master':
    bpo-42093: Cleanup _PyDict_GetItemHint() (GH-24582)
    d5fc998

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 interpreter-core
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants