New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-104078: Improve performance of PyObject_HasAttrString #104079
Conversation
How does the performance comparison look for an attribute that does exist? |
it's about 1.2x faster
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This provides a significant performance improvement in both attribute-exists and (especially) attribute-doesn't-exist cases, without making the code any harder to understand.
There's a bit more duplication with PyObject_GetAttrString
, but these two methods are right next to each other and it's logical for them to have parallel structure.
It's also logical -- and perhaps more likely to avoid behavioral inconsistency -- for PyObject_HasAttrString
to go through PyObject_HasAttr
.
I wonder why you are seeing those speedups. The PR essentially looks like it's manually inlining the code - something the compiler would normally do by itself where needed. Also: The test will only check the second part of the PR, since Python instances have the tp_getattro set, but not tp_getattr. Related to this, I believe the original and new code for checking the tp_getattr slot are not quite correct, since in all other lookup code, tp_getattro is checked and used before tp_getattr. I don't remember the details around what is supposed to happen when both are set, but if they are, the PyObject_HasAttrString() code will use a different path than e.g. PyObject_HasAttr(). |
The important change in the PR is that it makes
Yes, I noticed this, but a) I'm not sure how common it is for types to set
I wondered about this too. I assume the reason this was done this way originally is because |
Thanks for added insights, @carljm. I agree that it's better for |
fixes gh-104078
microbenchmark on main
microbenchmark with PR
about 3.3X faster