Use FASTCALL for collections.deque methods: index, insert, rotate #73638
Comments
The attached patch changes the index(), insert() and rotate() methods of the collections.deque type to use the FASTCALL calling convention. I chose to modify only these methods because they use METH_VARARGS, which requires creating a temporary tuple, whereas the other methods use METH_NOARGS or METH_O, which are already fast ;-)
I know that Raymond, maintainer of the collections module, is not a big fan of Argument Clinic ;-) So I wrote the minimal change and chose not to use Argument Clinic yet. By the way, the index() method has the signature "D.index(value, [start, [stop]])", which Argument Clinic does not support yet: see issue bpo-29299. For these reasons, I propose to postpone converting collections.deque to Argument Clinic; it can be done later.
Ok, now the cool part: it makes these methods faster ;-)
$ ./python -m perf timeit -s 'import collections; d=collections.deque()' 'd.rotate()' --compare-to=../default-ref/python
Median +- std dev: [ref] 70.5 ns +- 0.9 ns -> [patch] 64.2 ns +- 0.3 ns: 1.10x faster (-9%)
$ ./python -m perf timeit -s 'import collections; d=collections.deque()' 'd.rotate(1)' --compare-to=../default-ref/python
Median +- std dev: [ref] 107 ns +- 1 ns -> [patch] 86.2 ns +- 1.1 ns: 1.24x faster (-20%)
$ ./python -m perf timeit -s 'import collections' 'd=collections.deque(); d.insert(0, None); d.insert(1, None); d.insert(2, None); d.insert(3, None); d.insert(4, None)' --compare-to=../default-ref/python -p3
Median +- std dev: [ref] 699 ns +- 6 ns -> [patch] 591 ns +- 5 ns: 1.18x faster (-15%)
$ ./python -m perf timeit -s 'import collections; d=collections.deque((None,))' 'd.index(None)' --compare-to=../default-ref/python
Median +- std dev: [ref] 115 ns +- 1 ns -> [patch] 92.5 ns +- 0.8 ns: 1.24x faster (-19%) |
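For readers without the perf module, the same statements can be timed roughly with the standard library's timeit. This is only a sketch: the bench() helper, its default loop counts, and the shared one-element setup are assumptions made for illustration, and the numbers it prints are not comparable to the perf results above (no reference build, no std dev).

```python
import timeit

def bench(stmt, setup, number=100_000, repeat=5):
    """Return the best per-call time in nanoseconds over `repeat` runs."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e9

# Hypothetical shared setup: a one-element deque, so index(None) succeeds.
setup = "import collections; d = collections.deque((None,))"

results = {
    "d.rotate()": bench("d.rotate()", setup),
    "d.rotate(1)": bench("d.rotate(1)", setup),
    "d.index(None)": bench("d.index(None)", setup),
}

for stmt, ns in results.items():
    print(f"{stmt}: {ns:.1f} ns per call")
```

Running this once on a reference build and once on a patched build gives a rough before/after comparison, though with less statistical care than perf's --compare-to.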
I think _PyArg_NoStackKeywords() should be called before _PyArg_ParseStack(); otherwise this can produce incorrect error messages. |
deque-2.patch calls _PyArg_NoStackKeywords() before _PyArg_ParseStack(). |
The patch is simple and technically it looks correct, but I'm not a fan of using FASTCALL outside of Argument Clinic at this stage. The API can still change. Fortunately, only three deque methods can benefit from using FASTCALL. |
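For reference, the three methods in question are the ones that take a variable number of positional arguments. A short sketch of how they are called from Python, using only the public collections.deque API (the sample values are illustrative):

```python
from collections import deque

d = deque(["a", "b", "c", "b"])

# index(value[, start[, stop]]): optional positional start/stop bounds
first_b = d.index("b")          # search the whole deque
second_b = d.index("b", 2)      # search from position 2 onward
bounded_b = d.index("b", 0, 2)  # search only positions 0 and 1

# insert(index, value): two positional arguments
d.insert(1, "x")                # deque is now a, x, b, c, b

# rotate(n=1): zero or one positional argument
d.rotate()                      # rotate right by 1 -> b, a, x, b, c
d.rotate(-2)                    # rotate left by 2  -> x, b, c, b, a
```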
Overall this looks good. Just one other minor tweak (one that has served me well elsewhere) would be to bypass the cross-module function call with a cheap (near zero cost) register variable test:
    if (kwnames != NULL && !_PyArg_NoStackKeywords("rotate", kwnames)) {
        return NULL;
    } |
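The keyword check discussed above is visible from Python: these deque methods accept positional arguments only, so any keyword argument is rejected with a TypeError. A minimal sketch, assuming nothing beyond the public API; the rejects_keywords() helper and the keyword names passed (n, index, value) are hypothetical and chosen only to trigger the check. The exact error message is not asserted because it can differ between versions.

```python
from collections import deque

d = deque([1, 2, 3])

def rejects_keywords(call):
    """Return True if calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False

# Each call passes a keyword argument, which the methods do not accept.
rotate_rejected = rejects_keywords(lambda: d.rotate(n=1))
insert_rejected = rejects_keywords(lambda: d.insert(index=0, value=None))
index_rejected = rejects_keywords(lambda: d.index(value=1))

print(rotate_rejected, insert_rejected, index_rejected)
```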
This has just been optimized by Serhiy, change 82d1c8d15e18. So, is deque-2.patch good now? |
Yes, go ahead and apply. |
New changeset 1c048539200c by Victor Stinner in branch 'default': |
Raymond: "Yes, go ahead and apply." Great, done. Thanks for the reviews, Serhiy and Raymond. As I wrote, you can consider using Argument Clinic later, but there is no urgency for that ;-) I'm closing the issue. |
New changeset 55949f988dc1b943796d9852cc4d588c58cc4255 by Victor Stinner in branch 'master': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.