New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize wrapper descriptors using FASTCALL #75724
Comments
Attached pull request adds a fastpath for wrapper descriptors to use the FASTCALL calling convention. It's a follow up of bpo-31410 and all my work on FASTCALL during Python 3.6 and 3.7 development cycles. Microbenchmark: ./python -m perf timeit -s 'import array; obj=array.array("b"); wrap=array.array.__len__' 'wrap(obj)' Result: haypo@selma$ ./python -m perf compare_to ref.json patch.json It removes 31 nanoseconds on such very fast C function, array_length(). Attached PR is still a work-in-progress. First I would like to know if it's worth it because working on polishing the actual code. |
I don't know. What's the point of optimizing |
Right, type.method(self) is less than common than self.method(). I looked at the stdlib. I found that the following method are called using wrapper descriptors:
But it seems like such calls are rare compared to other kinds of function calls. -- By the way, _PyMethodDescr_FastCallKeywords() is only called from call_function() in Python/ceval.c. It's not used in Objects/call.c. Maybe we should use it there as well? It seems like this is a question about tracing. But maybe we can copy/paste the code from call_function()? |
It optimizes the same cases as bpo-31410. bpo-31410 removed a half of the overhead for wrapper descriptors, Victor's patch removes the remaining half. Actually it makes calling the descriptor faster than calling the corresponding builtin! But bpo-31410 changes were simple, just few lines, and PR 3685 looks much more complex. Are non-fast wrappers still needed? If the patch replaces the code instead of adding it this would decrease its cost. AFAIK the only descriptors that should support non-fast calling convention are __new__, __init__ and __call__. As for practicality, this change should slightly speed up the code that directly calls __eq__, __lt__. For example classes decorated with total_ordering. Victor, can you provide microbenchmarks with more practical code? |
This seems like a straight-forward win. I don't think we really need to see microbenchmarks before going forward with this one. |
The most complex slots are __new__, __init__ and __call__ because they It's my 3rd or 4th attempt to optimize new/init/call :-) My previous Here the scope is much small and backward compatibilty shouldn't affect us. I will try to find a way to optimize these slots as well and complete my |
These slots are the only slots that accept keyword arguments and arbitrary number of positional parameters. Other slots can use fast calls. |
It seems like these specific descriptors are rare, so the added complexity is not worth it. I close my issue as rejected. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: