-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework CALL_FUNCTION* opcodes #71400
Comments
Currently there are 4 opcodes (CALL_FUNCTION, CALL_FUNCTION_VAR, CALL_FUNCTION_KW, CALL_FUNCTION_VAR_KW) for calling a function depending of presenting the var-positional and var-keyword arguments: func(arg1, ..., argN, name1=kwarg1, ..., nameM=kwargM)
func(arg1, ..., argN, *args, name1=kwarg1, ..., nameM=kwargM)
func(arg1, ..., argN, name1=kwarg1, ..., nameM=kwargM, **kwargs)
func(arg1, ..., argN, *args, name1=kwarg1, ..., nameM=kwargM, **kwargs) The number of positional and keyword arguments are packed in oparg, and both numbers are limited by 255. Thus the single keyword argument makes oparg not fitting in 8 bit and requires EXTENDED_ARG. The stack contains first positional arguments, then optional args tuple, then keyword arguments, then optional kwargs dict. For every keyword argument the two values are pushed on the stack: argument name (always constant string) and argument value. I collected a statistic about opcodes in compiled code during running Python tests [1] (maybe it is biased, but this is large and multifarious assembly, and I hope it takes some representation about average Python code). According to it about 90% of compiled function calls are calls with the fixed number of positional arguments (CALL_FUNCTION with oparg < 256), the rest 10% are calls with the fixed number of positional and keyword arguments (CALL_FUNCTION with oparg >= 256), and only about 0.5% are calls with the var-positional or var-keyword arguments. I propose to use the different sets of opcodes that corresponds to these cases: func(arg1, ..., argN)
func(arg1, ..., argN, name1=kwarg1, ..., nameM=kwargM)
func(*args, **kwargs)
Benefits:
This proposition was discussed on Python-Ideas [2]. [1] http://permalink.gmane.org/gmane.comp.python.ideas/39993 |
I'd like to take on creating a patch for this proposal once bpo-27140 lands |
I was going to write a patch myself, but since I'm sure in your skills, I yield it to you. |
Do you take this for your Demur? |
I've been working on this, may have the ceval portion mostly worked out but can't test until I finish the compile portion. Haven't had time this week, will have time to focus this weekend |
Attaching first iteration. Very drafty. Still need to fix test_dis; will run test suite this evening. Perf on pybench went from 16.5s to 17.5s. It was 18.3s prior to reintroducing use of fast_function. Code still needs clean up besides investigation into how to optimize Changes not described in the original patch concept: I made CALL_FUNCTION_EX oparg carry 2 flag bits on whether there's a kwdict or a vararg tuple. Code may be simpler to only have kwdict be optional. BUILD_MAP_UNPACK_WITH_CALL was modified to only use the highest bit of the 2nd byte to track where function_pos is. Most uses of BUILD_MAP_UNPACK_WITH_CALL is f(*va,**kw) so I chose to only set the highest bit when there isn't a positional unpacking. If we pushed an empty tuple then that flag bit could be removed |
I seemed to have been added to the nosy list. I guess that means that my opinions are wanted ;) I have wanted to clean up the code around making calls for quite a while. I just haven't had time. I agree that adding a simpler opcode for making calls with positional only args is worthwhile. What I would suggest is a new internal calling convention which embodies the six-part form of CALL_FUNCTION_VAR_KW, that is (number-of-postionals, pointer-to-positionals, number-of-keywords, pointer-to-keywords, starargs-tuple, doublestarargs-dict). The new function would look something like this:
PyEval_Call_Callable(
PyObject *callable,
int npos,
PyObject **positionals,
int nkws,
PyObject **keywords,
PyObject *starargs,
PyObject *dictargs) {
if (Is_PyFunction(callable))
return PyFunction_Call(callable, npos, ...);
} This would mean that the logic for frame creation would be where it belongs (in the code for the Function object) and that special casing the various forms of CFunctions could be moved to where that belongs. Given what Serhiy says about calls with just positional parameters being by far the most common, it sounds as if it would make sense to add With those changes, it would then make sense to replace the four opcodes with |
Thank you Demur. I left few comments on Rietveld (this is only the quick glance). Good news that this change allows to simplify BUILD_MAP_UNPACK_WITH_CALL. But using 15th bit means using EXTENDED_ARG. I think it would be simpler to just push an empty tuple with LOAD_CONST. The same number of opcodes, but simpler implementation and nicer disassemble output. Mark's idea about using oparg of the general variant for specifying a number of positional arguments without packing them in a tuple looks interesting. But I don't bother about this case. According to the statistics of running CPython tests *args or *kwargs are used only in 0.4% of calls. This is hardly worth optimizing. I'm going to collect more detailed statistics for number of args and function/method distinguishing. As about PyFunction_CallSimple() etc, see bpo-26814 and bpo-27128. _PyObject_FastCall() protocol and new calling protocol should cover the case of new CALL_FUNCTION and may be CALL_FUNCTION_KW (>99% of calls in sum). |
callfunc2 fixes test_dis, addresses code review, currently implements a copy of _PyEval_EvalCodeWithName as _PyEval_EvalCodeWithName2 which changes a few lines to work with new keyword stack layout so that we can use fast_function with kwargs CALL_FUNCTION_EX is benchmarking much slower (4x slower when using **kw), but unpatched hits similar perf when using multiple **kw. So most slowdown is due to BUILD_MAP_UNPACK_WITH_CALL being slow. So some patch which speeds up intersection check (eg optimize to not allocate intersection when disjoint) should greatly diminish the perf loss on this simpler implementation. I'll open a separate issue for this |
bpo-27358 is too complex, more complex than this issue. But I think the simple fix the regression in case of the single **kw is checking wherever the sum dict is empty.
+ if (with_call && PyDict_Size(sum)) { |
Added comments on Rietveld. |
callfunc3 addresses most feedback. Doesn't address _PyEval_EvalCodeWithName2 code bloat, & I disagree with mentioning BUILD_MAP_UNPACK_WITH_CALL change in magic number update as the ABI of BUILD_MAP_UNPACK_WITH_CALL is unchanged. ie if we were to implement bpo-27358 after this there would be no need to bump the magic number One thing which looks odd to me is the INCREF/DECREF calls on function objects surrounding calls. Especially odd is CALL_FUNCTION_EX where we finish with two calls to Py_DECREF(func). It shows up in IMPORT_NAME too |
The ABI of BUILD_MAP_UNPACK_WITH_CALL is changed. oparg=514 meant merging 2 dicts, but with the patch it will mean merging 514 dicts. The INCREF/DECREF calls on function objects surrounding calls are not needed, because the refcount was increased when the function pushed on the stack. They were needed if use PyMethod optimization, since PyMethod_GET_FUNCTION() returns borrowed link. Seems you have missed my comments on compile.c. Added more comments on Rietveld. What about CALL_FUNCTION_EX benchmarking? Did the optimization help? |
Pybench is now only ~200ms slower instead of 1200ms slower. But the whole point of this patch is that CALL_FUNCTION_EX shouldn't be optimized for, so I'd much prefer real benchmarking results |
Since the most common use of CALL_FUNCTION_EX is.. def f(*x,*kw):
other_func(*x, **kw) I've added some code to BUILD_MAP_UNPACK_WITH_CALL & BUILD_TUPLE_UNPACK to not allocate a new object if called with oparg of 1 & TOP() is correct type |
See also the issue bpo-27809 "_PyObject_FastCall(): add support for keyword arguments" where we discuss if we can completely avoid the creation of a temporary dictionary to pass keyword arguments. |
FYI I released a first version of the "new" Python benchmark suite: performance 0.1 (quickly fixed with performance 0.1.1): The README file contains a few information how to run a stable benchmark. Please help me to complete this documentation :-) I may help you to run these benchmarks. |
I'm working on a new calling convention: "fast call". I already pushed changes implementing first functions: _PyObject_FastCallDict(PyObject *func, PyObject **stack, Py_ssize_t nargs, PyObject *kwargs) kwargs is a Python dict, but it can be NULL. _PyObject_FastCallDict() avoids the creation a temporary tuple when you don't have to pass keyword arguments. But I'm interested by an evolution to avoid also the creation of a temporary dictionary: issue bpo-27830, "Add _PyObject_FastCallKeywords(): avoid the creation of a temporary dictionary for keyword arguments". This function would be the fundation for a new calling convention for C functions: bpo-27810, "Add METH_FASTCALL: new calling convention for C functions". I didn't read your patch carefully, but I would like to make sure that your patch would benefit of these APIs to avoid the creation of temporary tuple and/or dict. "func(*args, key=value)" uses CALL_FUNCTION_EX and so requires to create a temporary dictionary. Maybe it's ok, I don't think that this syntax is commonly used. "func(*args)" uses CALL_FUNCTION_EX: if args is a tuple, can you confirm that it would be possible to use _PyObject_FastCallDict() and/or _PyObject_FastCallKeywords() to avoid the creation a new temporary tuple? Hum, I don't think that "func(*args)" can be optimized if args is a list because a list is mutable, we cannot use a direct access to the internal C array of a list. |
I uploaded callfunc-6.patch which is supposed to be a rebased version of callfunc5.patch. I made tiny coding style changes. Python/ceval.c changed a lot since 1 month 1/2, so I'm not 100% sure that the rebase is correct. Known issues:
I didn't review the change yet. |
Rebased change. |
Serhiy, Victor and I want this to be merged tomorrow morning PST to land in 3.6 before the freeze. Would you be able to merge this yourself? If not, we'll do it ourselves in the morning. |
I fixed test_extcall and removed _PyEval_EvalCodeWithName2(): callfunc-8.patch. I don't know how to fix test_traceback. @serhiy: I really need the patch tomorrow. Would you mind to push it? If you are unable to fix test_traceback, just skip the test with a FIXME and open a new issue (it should be easy to fix it later). I really want to finish METH_FASTCALL, issue bpo-27810, before the first beta next monday and basically tomorrow is for me the last time when I can still push things. If we really find a major design issue, you still have 3 to 4 months to revert the change. If we find a minor design flaw or another optimization, again, we will still have time to enhance the code during the beta stage. The most important part is to push the major change to reduce changes after the first beta. |
I'm going to push the patch today. But changes to test_extcall look as a regression to me. |
I'm going to push the patch in ~10 min since I don't see it pushed yet.
I agree that it looks like a regression, but it might be tricky to Again, I have to rush before of the feature freeze, but I hope that we |
Please wait. I just have sat down at my computer. |
Oh, you are here :-) Go ahead! |
Can you please give me a status and/or an ETA? This change is blocking me for 2 more changes. |
I just fixed test_extcall and test_traceback, and going to get rid of the |
What do you mean? I already fixed test_extcall and code duplication? "I fixed test_extcall and removed _PyEval_EvalCodeWithName2(): callfunc-8.patch." I proposed to skip the failing test in test_traceback and fix it after the beta release, I don't think that it's a major issue, it can be fixed later. Can you please push callfunc-8.patch right now (skipping failing tests in test_traceback). |
New changeset a77756e480c2 by Victor Stinner in branch 'default': |
I'm really sorry Serhiy that I didn't give you more time to finish the review and push yourself the change :-( Feel free to enhance the code, I don't expect major changes. For test_traceback, I opened the issue bpo-28050: if you already know how to fix it, please go ahead ;-) |
You changed the test. I made the code passing unchanged test (in additional Here is my current patch. |
New changeset 854b08acca97 by Victor Stinner in branch 'default': |
Here is rebased patch. |
callfunc-10.patch looks like an even better enhancement compared to Python 3.5, nice work. Would you be ok to wait after the beta1 release? The change doesn't seem to change the bytecode too much, nor the C API, whereas I'm trying to push a few other changes for FASTCALL. Moreover, I didn't understand fully the change, I left some questions. |
I'm getting segfaults with a77756e480c2 (thanks to 'hg bisect') on Ubuntu 12.04. I've checked buildbots, but they seem happy. In test_sys: test_recursionlimit_recovery (test.test_sys.SysModuleTest) ... Fatal Python error: Segmentation fault In test_runpy: test_main_recursion_error (test.test_runpy.RunPathTestCase) ... Fatal Python error: Segmentation fault I'm also getting the following error with Serhiy's latest patch: python: Objects/abstract.c:2385: _PyStack_AsDict: Assertion `PyDict_GetItem(kwdict, key) != ((void *)0)' failed. |
Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-09-11 08:55 Berker: "I'm getting segfaults with a77756e480c2 (thanks to 'hg bisect') on Ubuntu 12.04 (...) We should check if the change counts correctly the number of frames. In callfunc-10.patch, Serhiy modified a counter related to recursion limit/number of frames to fix test_traceback (issue bpo-28050). He wrote on the review: "Seems the number 50 is arbitrary. Now we should increase it. Why? This issue needs further investigation. Maybe this is a sign of regression." To me, it's suspicious that we have to modify an unit test. Changing the opcode or internals must not have an impact on unit tests on recursion error. Note: It might also be related to my FASTCALL changes, but you wrote that "hg bisect" pointed you to this change. Berker: "I'm also getting the following error with Serhiy's latest patch: python: Objects/abstract.c:2385: _PyStack_AsDict: Assertion `PyDict_GetItem(kwdict, key) != ((void *)0)' failed." This one might be related to the compact dict change. Note: Just to make be sure that it is not a local issue, please try a "make distclean" and make sure that all .pyc files are removed, and then recompile Python. |
I forgot to mention that in my earlier comment. I usually run "make distclean" before "./configure --with-pydebug; make -s -j" so unfortunately running "make distclean" didn't help. |
Ah, it's my fault. I didn't test the latest patch with debug build. Here should be == instead of !=. Could you please test updated patch? Unfortunately I can's reproduce crashes in test_sys and test_runpy (on 32-bit Linux). |
Thanks, Serhiy. test_recursionlimit_recovery in test_sys passed, but now I'm getting the following failure: FAIL: test_recursionlimit_fatalerror (test.test_sys.SysModuleTest) I modified test_recursionlimit_fatalerror to use subTest() and it looks like the test passed when sys.setrecursionlimit(50). See i=1000 in the following output: ====================================================================== Traceback (most recent call last):
File "/home/berker/hacking/cpython/default/Lib/test/test_sys.py", line 283, in test_recursionlimit_fatalerror
err)
AssertionError: b'Fatal Python error: Cannot recover from stack overflow' not found in b'' test_runpy still segfaults: test_main_recursion_error (test.test_runpy.RunPathTestCase) ... Fatal Python error: Segmentation fault I wrote a quick and dirty reproducer for this. I can attach here if you think it would be useful for debugging. |
Please attach a reproducer Berker. I still can't reproduce crashes. Maybe it depends on platform or installed third-party module or Python command line options. |
New changeset 51b635e81958 by Serhiy Storchaka in branch 'default': |
See http://bugs.python.org/issue28086 - this introduced a regression in the test suite. |
I have committed the patch before beta 1 because it change bytecode generating and interpreting. The bytecode generated before the patch is compatible with the patched interpreter, but the bytecode generated by patched compiler can be not compatible with unpatched interpreter. After releasing beta 1 this would require changing bytecode magic number. |
The change 51b635e81958 introduced memory leaks. Example: $ ./python -m test -R 3:3 test_extcall
(...)
test_extcall leaked [102, 102, 102] references, sum=306
test_extcall leaked [41, 41, 41] memory blocks, sum=123 I'm analyzing the change to try to identify leaks. |
New changeset f217419d08f0 by Victor Stinner in branch 'default': |
New changeset f860b7a775c5 by Victor Stinner in branch 'default': |
New changeset e372c0ad32ce by Victor Stinner in branch 'default': New changeset 2558bc4a4ebf by Victor Stinner in branch 'default': |
Why don't we change the magic number before b1 (if your bytecode change is already committed)? |
The magic number changed many times between Python 3.5 and the future Python 3.6 beta 1. Maybe we skipped increasing this number for one bytecode change, but it should only impact the few developers who worked on the development version. In this case: just remove all ".pyc" files and it should be fine ;-) In short: I think it's fine, don't do anything else ;-) |
Objects/methodobject.c: In function ‘_PyCFunction_FastCallKeywords’: |
Thanks for the report, warning fixed in the issue bpo-28105. |
I think that's all with this issue. |
Thanks Demur and Serhiy for your great work! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: