implement per-opcode cache in ceval #70407
Comments
I assume there's going to be a patch or more of a description of what your idea is? :) |
Yeah, I needed a URL of the issue for my email to python-dev ;) Here's a link to the email, which explains a lot about this patch: https://mail.python.org/pipermail/python-dev/2016-January/142945.html The patch is also attached (opcache1.patch). |
BTW, there are a couple of unit-tests that fail. Both can be easily fixed. To really move this thing forward, we need to profile the memory usage. First, it would be interesting to see how much additional memory is consumed if we optimize every code object. That will give us an idea of the overhead, and if it is significant, we can experiment with various heuristics to minimize it. |
If you run hg.python.org/benchmarks on Linux it has a flag to measure memory. |
Great. I'll re-run the benchmarks. BTW, the latest results are here: https://gist.github.com/1st1/aed69d63a2ff4de4c7be |
Yes, this patch depends on PEP-509. |
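For readers unfamiliar with PEP 509: it adds a private version tag to dicts that is bumped on every mutation, so a cache can validate a stored lookup by comparing tags instead of re-probing the dict. A rough pure-Python sketch of that idea (the `VersionedDict` and `cached_lookup` names are illustrative only; the real tag is a C-level field on the dict object):

```python
class VersionedDict(dict):
    """Illustrative stand-in for PEP 509's C-level version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1  # any mutation invalidates cached lookups

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


def cached_lookup(d, key, cache):
    # cache maps key -> (version, value); a hit is valid only if the
    # dict has not been mutated since the value was cached
    entry = cache.get(key)
    if entry is not None and entry[0] == d.version:
        return entry[1]                 # fast path: no dict probe
    value = d[key]                      # slow path: real lookup
    cache[key] = (d.version, value)
    return value
```

The cached LOAD_GLOBAL works along these lines, except the version check and the cache entry are a couple of machine words in C rather than Python objects.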
I'm concerned by the test_descr failure.

======================================================================
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_descr.py", line 4176, in test_vicious_descriptor_nonsense
    self.assertEqual(c.attr, 1)
AttributeError: 'C' object has no attribute 'attr'
----------------------------------------------------------------------

======================================================================
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_sys.py", line 895, in test_objecttypes
    check(get_cell().__code__, size('5i9Pi3P'))
  File "/home/haypo/prog/python/default/Lib/test/support/__init__.py", line 1475, in check_sizeof
    test.assertEqual(result, size, msg)
AssertionError: 192 != 160 : wrong size for <class 'code'>: got 192, expected 160
----------------------------------------------------------------------

======================================================================
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_gdb.py", line 857, in test_pycfunction
    self.assertIn('#0 <built-in method gmtime', gdb_output)
AssertionError: '#0 <built-in method gmtime' not found in 'Breakpoint 1 at 0x6518e3: file ./Modules/timemodule.c, line 338.\n[Thread debugging using libthread_db enabled]\nUsing host libthread_db library "/lib64/libthread_db.so.1".\n\nBreakpoint 1, time_gmtime (self=<module at remote 0x7ffff7e6ff58>, args=(1,)) at ./Modules/timemodule.c:338\n338\t if (!parse_time_t_args(args, "|O:gmtime", &when))\n#1 <built-in method gmtime of module object at remote 0x7ffff7e6ff58>\n#4 Frame 0x7ffff7fb27b8, for file <string>, line 3, in foo ()\n#7 Frame 0x7ffff7f423f8, for file <string>, line 5, in bar ()\n#10 Frame 0x7ffff7fb25e0, for file <string>, line 6, in <module> ()\n' |
Victor, Thanks for the initial review. I'll work on the patch sometime later next week. As for test_vicious_descriptor_nonsense -- yeah, I saw that too. I know what's going on there and I know how to fix that. FWIW that test tests a very counter-intuitive and quirky CPython implementation detail. |
From the two checks in Python/compile.c: + expr_ty meth = e->v.Call.func; I just wonder how many times those kinds of checks/guards are done (naive idea). Thanks in advance! |
Stuff done in compile.c is cached in *.pyc files. The for-loop you saw shouldn't take a lot of time: it iterates over function parameters, and an average function has no more than 3-6 of those. |
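The point about *.pyc caching can be observed directly: `py_compile` writes the compiled bytecode to `__pycache__`, so any compile-time analysis runs once per source change, not once per import (a small sketch; the module name is made up):

```python
import os
import py_compile
import tempfile

# Write a tiny module, compile it, and check the bytecode cache on disk.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mymod.py")
    with open(src, "w") as f:
        f.write("def add(a, b):\n    return a + b\n")
    # py_compile.compile() returns the path of the generated .pyc,
    # which lands under __pycache__ next to the source file
    pyc = py_compile.compile(src, doraise=True)
    cached = os.path.exists(pyc)
    in_pycache = "__pycache__" in pyc

print(cached)       # the compile-time work is paid once, then reused
```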
See the issue bpo-26647 which may make the implementation of this cache simpler. See also my message about inline caching: |
Victor brought this patch to my attention as the motivation for PEP-509. Unfortunately the patch doesn't apply cleanly and I don't have time to try and piece together all the different parts. From the description on python-dev you linked to there are actually a few different patches that all work together and result in a significant speedup. Is there any chance that you could get all this up to date so it applies cleanly to the CPython 3.6 repo (the hg version!), and report some realistic benchmarks? The speedups reported sound great, but I worry that there's a catch (e.g. memory cost that doesn't become apparent until you have a large program, or minimal speedup on realistic code). This is currently holding up approval of PEP-509. |
Hi Guido, I'll try to update the patch soon.
Here's an excerpt from my email [1] to Python-dev on memory footprint: To measure the max/average memory impact, I tuned my code to optimize [1] https://mail.python.org/pipermail/python-dev/2016-February/143025.html |
Thanks, that's a cool stat. Please do update the patch. |
Alright, attaching a rebased patch (opcache3.patch). Some caveats:
All in all, this patch can be used for benchmarking, but it's not yet ready for a thorough code review. Last thing: if you flip OPCODE_CACHE_STATS to 1 in ceval.c, python will print debug stats on the opcode cache. |
I'm confused by the relationship between this and bpo-26110. That seems to be a much simpler patch (which also doesn't apply cleanly). If 26110 really increases method calls by 20%, what does this add? (By which I mean (a) what additional optimizations does it have, and (b) what additional speedup does it have?) I'm asking because I'm still trying to look for reasons why I should accept PEP-509, and this is brought up as a reason. |
I'm also looking for some example code that would show clearly the kind of speedup we're talking about. |
This patch embeds the implementation of 26110. I'm no longer sure it was a good idea to have two issues instead of one, everybody seems to be confused about that ;)
I'm sorry for the long response, please bare with me. This issue is complex, and it's very hard to explain it all in a short message. The patch from 26110 implements the LOAD_METHOD/CALL_METHOD pair of opcodes. The idea is that we can avoid instantiation of BoundMethod objects for code that looks like "something.method(...)". I wanted to first get the patch from 26110 into shape, commit it, and then use the patch from this issue to add additional speedups. This patch implements a generic per-opcode cache. Using that cache, it speeds up the LOAD_GLOBAL, LOAD_ATTR, and LOAD_METHOD (from 26110) opcodes. The cache works on a per-code-object basis, only optimizing code objects that were run more than 1,000 times.
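The BoundMethod overhead mentioned above is visible from pure Python: every attribute access on an instance creates a fresh bound-method object wrapping the same underlying function, and that per-call allocation is what LOAD_METHOD/CALL_METHOD avoid (a small illustration; the class and method names are arbitrary):

```python
class C:
    def method(self):
        return 42

c = C()
m1 = c.method
m2 = c.method

# Each attribute access builds a brand-new bound-method object...
print(m1 is m2)                    # False: two distinct wrappers
# ...wrapping the very same function object:
print(m1.__func__ is m2.__func__)  # True
print(type(m1).__name__)           # 'method'
```

For `c.method()` the wrapper lives only for the duration of the call, so skipping its allocation entirely is pure win.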
This optimization makes micro optimizations like "def smth(len=len)" obsolete. LOAD_GLOBAL becomes much faster, almost as fast as LOAD_FAST.
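For reference, the micro-optimization being made obsolete looks like this: binding a builtin as a default parameter turns its lookups into local-variable loads, so no LOAD_GLOBAL ever appears in the function's bytecode (an illustrative function, not code from the patch):

```python
import dis

def count_truthy(items, len=len):  # classic local-binding trick
    # 'len' is a parameter, hence a local: each call resolves it with a
    # fast array-indexed load instead of probing globals and builtins
    n = 0
    for x in items:
        if len(x):
            n += 1
    return n

# The compiled bytecode contains no LOAD_GLOBAL at all:
ops = {ins.opname for ins in dis.get_instructions(count_truthy)}
print("LOAD_GLOBAL" in ops)  # False
```

With the opcode cache, the plain `len(x)` spelling gets nearly the same speed without the awkward extra parameter.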
Python programs have a lot of LOAD_ATTRs. It also seems that dicts of classes are usually stable, and that objects rarely shadow class attributes.
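This stability assumption is exactly what a cached LOAD_ATTR must guard: if an instance dict starts shadowing a class attribute, a cache keyed only on the (unchanged) class dict would hand back a stale value. A hypothetical illustration of the shadowing case:

```python
class Point:
    dims = 2  # class attribute, normally resolved via type(p).__dict__

p = Point()
print(p.dims)       # 2: found on the class

p.dims = 3          # instance dict now shadows the class attribute
print(p.dims)       # 3: a cache that only watched Point.__dict__
                    #    would wrongly keep returning 2 here
print(Point.dims)   # 2: the class dict itself never changed
```

Because such shadowing is rare in practice, the cache can bet on the common case and fall back to the normal lookup when its guards fail.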
Speeds up method calls another 15%, speeds up global name lookups and attribute lookups.
Here're some benchmark results: https://gist.github.com/1st1/b1978e17ee8b82cc6432
I'd say each of the above optimizations speeds up macro-benchmarks by 2-4%. Combined, they speed up CPython 7-15%.
Re PEP-509 and these patches:
I'd say that if you aren't sure about PEP-509 right now, then we can wait a couple of months and decide later. |
OK, I get it. I think it would be really helpful if bpo-26110 was updated, reviewed and committed -- it sounds like a good idea on its own, and it needs some burn-in time due to the introduction of two new opcodes. (That's especially important since there's also a patch introducing wordcode, i.e. bpo-26647, and undoubtedly the two patches will conflict.) It also needs to show benchmark results on its own (but I think you've got that).

I am also happy to see the LOAD_GLOBAL optimization, and it alone may be sufficient to save PEP-509 (though I would recommend editing the text of the PEP dramatically to focus on a brief but crisp description of the change itself, the use that LOAD_GLOBAL would make of it, and the microbench results; it is currently a pain to read the whole thing).

I have to read up on what you're doing to LOAD_ATTR/LOAD_METHOD. In the meantime I wonder how that would fare in a world where most attr lookups are in namedtuples.

As a general recommendation I would actually prefer more separate patches (even though it's harder on you), just with very clearly stated relationships between them.

A question about the strategy of only optimizing code objects that are called a lot. Doesn't this mean that a "main" function containing a major inner loop doesn't get the optimization it might deserve?

PS. I like you a lot but there's no way I'm going to "bare" with you. :-) |
Right. Victor asked me to review the wordcode patch (and maybe even commit it), and I'll try to do that this week. The LOAD_METHOD/CALL_METHOD patch needs some refactoring; right now it's PoC-quality. I agree it has to go first.
Alright. I'll work on this with Victor.
I think there will be no difference, but I can add a microbenchmark and see.
NP. This is a big change to review, and the last thing I want is to accidentally make CPython slower.
Right, the "main" function won't be optimized. There are two reasons of why I added this threshold of 1000 calls before optimizations:
Haha, this is my typo of the month, I guess ;) |
All sounds good. Glad the issue of long-running loops is at least on your |
With the 3.7 beta deadline just around the corner, 3.8 will be the next opportunity to reconsider this idea. |
Nick, Ned, Guido, Would it be OK if we add this to 3.7beta2? I feel kind of bad about this one... a few thoughts:
|
It's up to the release manager, but personally it feels like you're pushing
Now maybe we'll need to come up with a different way to do releases, but |
NP, I totally understand.
That would be great. |
I concur with Guido and Nick; let's target this for 3.8. |
IMHO we need more time than the beta1=>beta2 window to properly review the design, test the implementation, fix any potential regressions, etc. Such a change is perfect for the *start* of a development cycle, but IMHO it's bad when we are close to the *end*. beta1=>final is supposed to only be made of bugfixes, not to introduce new bugs... (I bet that adding any feature fatally introduces new bugs) |
I committed the cache only for LOAD_GLOBAL, which is much simpler. Caches for other instructions will be implemented in 3.9 or later.
The opcode cache introduced an issue in reference leak hunting: see bpo-37146. |
I understand that the initial issue (add an opcode cache) has been implemented. Well done Yury and INADA-san! Please open new issues for follow-up like the new optimization ideas ;-) |
"What's new in Python 3.8" says that this change speeds up the LOAD_GLOBAL opcode by 40%. What's the evidence for the claimed speedup? |
#12884 (comment)

== Microbenchmark ==

$ master/python -m perf timeit -s 'def foo(): int; str; bytes; float; int; str; bytes; float' -- 'foo()'
.....................
Mean +- std dev: 213 ns +- 5 ns

$ opcache/python -m perf timeit -s 'def foo(): int; str; bytes; float; int; str; bytes; float' -- 'foo()'
.....................
Mean +- std dev: 120 ns +- 2 ns

== pyperformance ==

$ ./cpython/python -m perf compare_to master.json opcache_load_global.json -G --min-speed=2

Slower (2):
- pickle: 19.1 us +- 0.2 us -> 19.7 us +- 0.8 us: 1.03x slower (+3%)
- unpickle_list: 8.66 us +- 0.04 us -> 8.85 us +- 0.06 us: 1.02x slower (+2%)

Faster (23):

Benchmark hidden because not significant (32): (...) |
Given that, I really don't think we should be making claims like "40% speedup" for such contrived examples. It looks like the speedup is real, but only about a 2% speedup overall. |
I personally think it would be fine to change the wording to say "measurable speed-up" and not attribute a specific number. |
On Mon, Oct 7, 2019 at 9:41 PM Mark Shannon <report@bugs.python.org> wrote:
Do you mean every microbenchmark measuring a single feature is meaningless? I think it is worthwhile for readers who understand what |