Optimize hook calling a bit #280
I forgot to mention, it is intended to be reviewed commit-by-commit.
@bluetech oh nice! I'll try to look at this over the weekend 👍
Force-pushed: 2d5426b to c7b3ba9
A dict keys view supports set-like operations.
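As a small illustration of this point, the sketch below subtracts a keys view from a set without converting it first. The names `argnames` and `kwargs` are placeholders chosen to mirror the hook-call code, not values taken from this PR.

```python
# Hypothetical spec argument names and call kwargs, for illustration only.
argnames = ["a", "b", "c"]
kwargs = {"a": 0, "b": 1}

# A set can be combined with a keys view directly; wrapping the view
# in set(...) first is unnecessary.
missing = set(argnames) - kwargs.keys()

# Keys views also accept set operators on their own left-hand side.
common = kwargs.keys() & set(argnames)

print(missing, common)
```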
This check is quite expensive, try to reduce its overhead.
PluginManager adds an adapter lambda to the hook call path. This adds overhead and makes the stack trace more messy. Change the call convention such that the adaptation is not needed, and remove the lambda.
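To make the cost of the adapter concrete, here is a minimal sketch, not the actual pluggy code: `execute`, `AdaptedCaller` and `DirectCaller` are hypothetical stand-ins. It assumes the adapter only reshapes arguments, and shows that passing the arguments in the executor's own convention removes one frame from every call.

```python
import traceback


def execute(name, impls, kwargs, firstresult):
    """Stand-in executor: return the function names on the current stack."""
    return [frame.name for frame in traceback.extract_stack()]


class AdaptedCaller:
    """Hypothetical caller that goes through an adapter lambda."""

    def __init__(self, name):
        self.name = name
        self.firstresult = False
        # The lambda exists only to translate one call convention into another.
        self._hookexec = lambda hook, impls, kwargs: execute(
            hook.name, impls, kwargs, hook.firstresult
        )

    def __call__(self, **kwargs):
        return self._hookexec(self, [], kwargs)


class DirectCaller:
    """Hypothetical caller that already uses the executor's convention."""

    def __init__(self, name):
        self.name = name
        self.firstresult = False
        self._hookexec = execute

    def __call__(self, **kwargs):
        return self._hookexec(self.name, [], kwargs, self.firstresult)


adapted_stack = AdaptedCaller("myhook")(x=1)
direct_stack = DirectCaller("myhook")(x=1)
print(len(adapted_stack) - len(direct_stack))  # extra frames added by the adapter
```

In this sketch the only difference between the two stacks is the `<lambda>` frame, which is also the frame that clutters tracebacks.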
Rebased, doesn't depend on other PRs now. I also removed one of the micro-optimization commits; on second thought it's probably not worth it and is distracting.
return self._hookexec(self, self.get_hookimpls(), kwargs)
# This is written to avoid expensive operations when not needed.
if self.spec:
    for argname in self.spec.argnames:
Is looping through the args every time, hoping none are missing from the call, faster than just always checking what's missing using the `set` difference?
Seems like in the average case this will be slower (assuming calls are written correctly most of the time)?
I'm not actually sure I can think of a case where this is faster?
I don't think a `for` loop will ever be faster than a `set` difference but I could be wrong?
Here is an unscientific benchmark (Python 3.8, Archlinux). I also added a variant which uses `issubset`. It only checks the happy path.
def old(argnames, kwargs):
    if argnames:
        notincall = set(argnames) - kwargs.keys()
        if notincall:
            pass

def old_subset(argnames, kwargs):
    if argnames:
        if not set(argnames).issubset(kwargs.keys()):
            pass

def new(argnames, kwargs):
    for argname in argnames:
        if argname not in kwargs:
            break

import timeit

kwargs = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
argnames = list(kwargs)
print("old: ", timeit.timeit("old(argnames, kwargs)", "from __main__ import old, argnames, kwargs"))
print("old_subset:", timeit.timeit("old_subset(argnames, kwargs)", "from __main__ import old_subset, argnames, kwargs"))
print("new: ", timeit.timeit("new(argnames, kwargs)", "from __main__ import new, argnames, kwargs"))
Output:
old: 0.7920419139554724
old_subset: 0.6716860989108682
new: 0.2587350399699062
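The benchmark only exercises the happy path. For completeness, here is a sketch of how the loop can still report every missing name on the unhappy path: the set difference is computed only once a miss is found. `check_call_args` is a hypothetical helper and the warning text is illustrative, not the wording used in the PR.

```python
import warnings


def check_call_args(argnames, kwargs):
    """Warn once if any declared argument name is absent from kwargs."""
    for argname in argnames:
        if argname not in kwargs:
            # Slow path: build the full set of missing names only when needed.
            missing = tuple(sorted(set(argnames) - kwargs.keys()))
            warnings.warn(
                f"Argument(s) {missing} are declared in the spec "
                "but were not passed in this call",
                stacklevel=2,
            )
            break


# Happy path: plain membership tests, no set is built.
check_call_args(["a", "b"], {"a": 0, "b": 1})
```

The trade-off is that the unhappy path does slightly more work than before, which seems acceptable if calls are written correctly most of the time.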
(Python 3.8, Archlinux)
You're of my kind 😸
new: 0.2587350399699062
So slick; I guess `for` the win 🏄‍♂️
        )
        break

firstresult = self.spec.opts.get("firstresult")
This is exactly what I had in mind :)
Note: includes PR #279, please ignore the duplicate commits (will rebase once that one is settled).
This PR starts to optimize the hook calling path, mostly for the benefit of pytest.
For this pytest file which we use as a useful benchmark of pytest overhead,
Before:
10752102 function calls (10196645 primitive calls) in 9.270 seconds
After:
10351436 function calls (9896147 primitive calls) in 8.918 seconds
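The figures above have the shape of the summary line printed by Python's `cProfile`; how they were produced is not stated here. As a sketch under that assumption, a comparable summary can be captured as follows, with `workload` standing in as a hypothetical placeholder for the code being measured.

```python
import cProfile
import io
import pstats


def workload():
    # Placeholder workload; substitute the code path being measured.
    return sum(len(str(i)) for i in range(10_000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# print_stats(0) emits only the header, which includes the total call count
# and elapsed time, and skips the per-function table.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).print_stats(0)
summary = buffer.getvalue()
print(summary)
```

Comparing the call counts before and after a change is usually more stable than comparing the elapsed seconds, which vary from run to run.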
The main change stems from looking at the stack trace of a hook call. Before this PR it was this (pytest often has several of these nested):
This PR removes the `<lambda>` frame, and follow-up PRs will (try to) remove the `_hookexec` frame and the duplicate `_multicall` frame (which is cosmetic).