Use PEP 590 vectorcall to speed up calls to range(), list() and dict() #81388
Comments
PEP-590 allows us to short-circuit the __new__/__init__ slow path for commonly created builtin types.
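To make the idea concrete, here is a minimal sketch of such a short-circuit for a dict-like builtin; the name mydict_vectorcall and the exact fast-path conditions are illustrative only, not the actual CPython code:

```c
#include <Python.h>

/* Sketch of a PEP 590 vectorcall constructor for a dict-like builtin.
 * The common call shapes -- dict() and dict(mapping) with no keyword
 * arguments -- build the result directly, skipping the generic
 * tp_call -> tp_new -> tp_init machinery. */
static PyObject *
mydict_vectorcall(PyObject *type, PyObject *const *args,
                  size_t nargsf, PyObject *kwnames)
{
    (void)type;  /* the real code also has to consider subclasses */
    Py_ssize_t nargs = PyVectorcall_NARGS(nargsf);

    if ((kwnames != NULL && PyTuple_GET_SIZE(kwnames) != 0) || nargs > 1) {
        /* Call shapes this sketch does not optimize; a real implementation
         * would fall back to the generic path instead of raising. */
        PyErr_SetString(PyExc_TypeError,
                        "sketch handles only dict() and dict(mapping)");
        return NULL;
    }

    PyObject *self = PyDict_New();      /* no __new__/__init__ involved */
    if (self == NULL) {
        return NULL;
    }
    if (nargs == 1 && PyDict_Update(self, args[0]) < 0) {
        Py_DECREF(self);
        return NULL;
    }
    return self;
}
```

A function like this gets wired into the type through the tp_vectorcall slot of PyTypeObject (documented since Python 3.9), so that dict(...) dispatches to it instead of going through the generic type.__call__ machinery.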
Can we call tp_call instead of vectorcall when kwargs is not empty? (Lines 209 to 219 in 7f41c8e.)
For example, dict_init may be faster than dict_vectorcall when keyword arguments are passed.
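One way to do that fallback, sketched here with a hypothetical helper name (fallback_to_tp_call is not a CPython function): when kwnames is non-empty, repack the vectorcall arguments into a tuple and a dict and hand them to the classic tp_call path, which runs tp_new/tp_init as before. In the vectorcall convention, keyword values sit in args[] right after the positional arguments.

```c
#include <Python.h>

/* Sketch: repack vectorcall arguments and delegate to the classic call
 * path for a type, i.e. the metatype's tp_call (normally type_call),
 * which invokes tp_new and tp_init.  Assumes kwnames is a non-empty
 * tuple (the caller checked). */
static PyObject *
fallback_to_tp_call(PyTypeObject *type, PyObject *const *args,
                    Py_ssize_t nargs, PyObject *kwnames)
{
    PyObject *result = NULL;
    PyObject *argtuple = PyTuple_New(nargs);
    PyObject *kwargs = PyDict_New();
    if (argtuple == NULL || kwargs == NULL) {
        goto done;
    }
    for (Py_ssize_t i = 0; i < nargs; i++) {
        Py_INCREF(args[i]);
        PyTuple_SET_ITEM(argtuple, i, args[i]);
    }
    /* Keyword values follow the positional arguments in args[]. */
    for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(kwnames); i++) {
        if (PyDict_SetItem(kwargs, PyTuple_GET_ITEM(kwnames, i),
                           args[nargs + i]) < 0) {
            goto done;
        }
    }
    result = Py_TYPE((PyObject *)type)->tp_call((PyObject *)type,
                                                argtuple, kwargs);
done:
    Py_XDECREF(argtuple);
    Py_XDECREF(kwargs);
    return result;
}
```

Whether this repacking ends up cheaper than handling the keyword arguments on the vectorcall path is exactly the question being asked above.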
One thing that keeps bothering me when using vectorcall for type.__call__ is that we would have two completely independent code paths for constructing an object: the new one using vectorcall and the old one using tp_call, which in turn calls tp_new and tp_init.

In typical vectorcall usages, there is no need to support the old way any longer: we can set tp_call = PyVectorcall_Call and that's it. But for "type", we still need to support tp_new and tp_init because there may be C code out there that calls tp_new/tp_init directly. To give one concrete example: collections.defaultdict calls PyDict_Type.tp_init.

One solution is to keep the old code for tp_new/tp_init. This is what Mark did in PR 13930. But this leads to duplication of functionality and is therefore error-prone (different code paths may have subtly different behaviour).

Since we don't want to break Python code calling dict.__new__ or dict.__init__, not implementing those is not an option. But to be compatible with the vectorcall signature, ideally we want to implement __init__ using METH_FASTCALL, so __init__ would need to be a normal method instead of a slot wrapper of tp_init (similar to Python classes). This would work, but it needs some support in typeobject.c.
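For reference, the defaultdict pattern mentioned above looks roughly like this (a simplified sketch, not the actual _collectionsmodule.c code): the subclass's init handles its own argument and then calls PyDict_Type.tp_init directly, which is why dict's tp_new/tp_init must keep working even once the vectorcall fast path exists.

```c
#include <Python.h>

/* Simplified sketch of a dict subclass whose tp_init delegates to
 * dict's tp_init directly, the way collections.defaultdict does. */
static int
mydefaultdict_init(PyObject *self, PyObject *args, PyObject *kwds)
{
    /* ... extract and store the default_factory argument here ... */
    return PyDict_Type.tp_init(self, args, kwds);
}
```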
$ ./python -m pyperf timeit --compare-to ./python-master 'dict()'
python-master: ..................... 89.9 ns +- 1.2 ns
python: ..................... 72.5 ns +- 1.6 ns
Mean +- std dev: [python-master] 89.9 ns +- 1.2 ns -> [python] 72.5 ns +- 1.6 ns: 1.24x faster (-19%)

$ ./python -m pyperf timeit --compare-to ./python-master -s 'import string; a=dict.fromkeys(string.ascii_lowercase); b=dict.fromkeys(string.ascii_uppercase)' -- 'dict(a, **b)'
python-master: ..................... 1.41 us +- 0.04 us
python: ..................... 1.53 us +- 0.04 us
Mean +- std dev: [python-master] 1.41 us +- 0.04 us -> [python] 1.53 us +- 0.04 us: 1.09x slower (+9%)

There is some overhead in the old dict-merging idiom, but it seems reasonable compared to the benefit. LGTM.
Victor, frozenset is the last basic builtin collection to which this improvement has not been applied yet.
pyperf compare_to master.json bpo-37207.json
I definitely agree with this opinion, so I'd like to ask your opinion before submitting the patch. What do you think?
I would prefer to see a PR to give my opinion :)
Remaining issue: optimize list(iterable), PR 18928. I reviewed the PR and I'm waiting for Petr.
All PRs are now merged. Thanks to everybody who was involved in this issue. It's a nice speedup, which is always good to take ;-)
The change to dict() was not covered by the smaller PRs.
Oh sorry, I missed the dict.
@vstinner @petr.viktorin I'd like to experiment with the dict vectorcall and finalize the work.
Definitely!
@vstinner @petr.viktorin Looks like the benchmark is showing a very impressive result.
Yes! If you think a patch is ready for review, just submit it. There's not much we can comment on before we see the code :) (I hope that doesn't contradict what your mentor says...)
When I designed the FASTCALL calling convention, I experimented with a new tp_fastcall slot in PyTypeObject to optimize the __call__() method: bpo-29259. Results on the pyperformance benchmark suite were not really convincing and I had technical issues (deciding whether tp_call or tp_fastcall should be called, handling ABI compatibility and backward compatibility, etc.), so I decided to give up on this idea. I'm happy to see that PEP-590 managed to find its way into Python internals and actually make Python faster ;-)
Can we now close this issue? Or does someone plan to push further optimizations? Maybe new issues can be opened for the next optimizations?
Ah, by the way, I also made an attempt to use the FASTCALL calling convention for tp_new and tp_init: bpo-29358. Again, the speedup wasn't obvious and the implementation was quite complicated with many corner cases. So I gave up on this one. It didn't seem to be really worth it.
As discussed briefly in Mark's PR, benchmarks like this are now slower:
ret = dict(**{'a': 2, 'b': 4, 'c': 6, 'd': 8})
Python 3.8: Mean +- std dev: 281 ns +- 9 ns