-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add opcode cache for LOAD_ATTR #86259
Comments
From the creators of "opcode cache for LOAD_GLOBAL" (https://bugs.python.org/issue26219) now it's time for "opcode cache for LOAD_ATTR: the revenge". This issue/PR builds on top of Yury's original patch in the same way https://bugs.python.org/issue26219 did for LOAD_GLOBAL. These are the benchmark results for the pyperformance test suite with PGO/LTO/CPU ISOLATION in a tuned system with pyperf: +-------------------------+--------------------------------------+-------------------------------------+ Not significant (35): chameleon; django_template; dulwich_log; fannkuch; float; json_dumps; json_loads; logging_format; mako; nqueens; pathlib; pickle; pickle_dict; pickle_list; pidigits; python_startup; python_startup_no_site; regex_compile; scimark_fft; spectral_norm; sqlalchemy_declarative; sqlalchemy_imperative; sqlite_synth; sympy_expand; sympy_sum; sympy_str; telco; tornado_http; unpack_sequence; unpickle; unpickle_list; xml_etree_parse; xml_etree_iterparse; xml_etree_generate; xml_etree_process; sympy_integrate |
Wow, this is amazing. I just found that this is now faster than slots. Should we mention that in What's New? (Of course there's an optimization possible for slots as well, but it would require complicating the cache struct. Maybe in 3.11. :-) |
Not only that: is even faster than the highly-tuned namedtuple access descriptors that used to be the faster access to an attribute: 3.9 results Now this is the faster way to get an attribute: 3.10 results
Good idea!. I will prepare a PR complementing the current paragraph. |
Thanks! Do you have any plans for further inline caches? I was wondering if we could reverse the situation for slots again by adding slots support to the LOAD_ATTR opcode inline cache... |
Yeah, we are experimenting with some ideas here: https://bugs.python.org/issue42115.
I think we can do it as long as we can detect easily if a given descriptor is immutable. The problem of mutability is this code: class Descriptor:
pass
class C:
def __init__(self):
self.x = 1
x = Descriptor()
def f(o):
return o.x
o = C()
for i in range(10000):
assert f(o) == 1
Descriptor.__get__ = lambda self, instance, value: 2
Descriptor.__set__ = lambda *args: None
print(f(o)) In this case, if we do not skip the cache for mutable descriptors, the code will not reflect the new result (2 instead of 1). __slots__ are immutable descriptors so we should be good as long as we can detect them. |
Hm, I was thinking to recognize the specific type of descriptor used by slots and cache only that. Though we would still have to consider updates to C.__dict__ (that's handled by looking at the dict version right?). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: