Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-31336: Speed up type creation, which is highly dominated by slow dict lookups. #3279

Merged
merged 20 commits into from Oct 1, 2017

Conversation

scoder
Copy link
Contributor

@scoder scoder commented Sep 4, 2017

This gives me around 20% better performance for class Test: pass. Admittedly, that's a micro-benchmark, but the change is obvious enough to not merit staying slow.

https://bugs.python.org/issue31336

return NULL;
}
}

res = NULL;
/* keep a strong reference to mro because type->tp_mro can be replaced
during PyDict_GetItem(dict, name) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the comment.

@@ -2994,7 +3010,7 @@ _PyType_Lookup(PyTypeObject *type, PyObject *name)
assert(PyType_Check(base));
dict = ((PyTypeObject *)base)->tp_dict;
assert(dict && PyDict_Check(dict));
res = PyDict_GetItem(dict, name);
res = _PyDict_GetItem_KnownHash(dict, name, hash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyDict_GetItem() always clears errors. Call PyErr_Clear() if _PyDict_GetItem_KnownHash() returns NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see well the difference between PyDict_GetItem() and _PyDict_GetItem_KnownHash() in term of performance. Maybe the difference is that PyDict_GetItem() calls PyErr_Fetch/PyErr_Restore. In that case, PyDict_GetItemWithError() would have the same speed, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using PyDict_GetItemWithError() adds only a half of the speed up.

The difference is:

  1. PyDict_GetItem() calls PyThreadState_GET/PyErr_Fetch/PyErr_Restore.
  2. PyDict_GetItem() checks for str and reads a hash every time. I don't understand well why this affects performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KnownHash is extremely short in comparison and probably gets inlined and streamlined with LTO. Substantially less branching.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I didn't expect the Spanish Inquisition!. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@scoder
Copy link
Contributor Author

scoder commented Sep 4, 2017

Nice catches, but certainly, I didn't expect the Spanish Inquisition!

@bedevere-bot
Copy link

Nobody expects the Spanish Inquisition!

@serhiy-storchaka: please review the changes made to this pull request.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use the _Py_IDENTIFIER API rather than using _PyDict_GetItem_KnownHash().

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I didn't expect the Spanish Inquisition!. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Stefan Behnel added 2 commits September 5, 2017 19:07
…r ceases to be a valid base type, there'll probably be larger code sections to change than this one.
@scoder
Copy link
Contributor Author

scoder commented Sep 5, 2017

I agree that using the _Py_IDENTIFIER API would be nice, but changing the whole setup is more work than I would currently like to invest into this. This can still be done later. My changes do not make it any harder.

Instead, I followed Naokis idea of spliting _PyType_Lookup() into two functions, one to do the lookup and one that handles the method caching and calls the other on misses.

@@ -6958,8 +6990,14 @@ update_one_slot(PyTypeObject *type, slotdef *p)
return p;
}
do {
descr = _PyType_Lookup(type, p->name_strobj);
descr = _PyType_LookupUncached(type, p->name_strobj, &error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a comment explaining why the uncached lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure enough. Added.

Stefan Behnel added 2 commits September 6, 2017 07:19
…common in Python 3 now.

Also make the step from "return NULL" error handling to "goto error" reference cleanup explicit.
nbases = 1;
}
else
Py_INCREF(bases);
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any effect of this change? I tested class C: pass and class C(object): pass and didn't see any difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this part a cleanup that makes it clearer what operations are needed when, and how error cases are dealt with.
I didn't measure any speed difference either.

if (res != NULL)
break;
if (PyErr_Occurred())
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaks mro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (!PyUnicode_CheckExact(name) ||
(hash = ((PyASCIIObject *) name)->hash) == -1)
{
hash = PyObject_Hash(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyObject_Hash() can call user code. mro is a borrowed reference here, it should be increfed before calling PyObject_Hash().

Or calculate the hash before getting mro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll move the hash() call up.

/* Internal API to look for a name through the MRO, bypassing the method cache.
This returns a borrowed reference, and might set an exception! */
static PyObject *
_PyType_LookupUncached(PyTypeObject *type, PyObject *name, int *error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local functions usually use different name convention. All lower characters and no Py prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to hint at the name of the existing function. A different name would be less clear, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this rather adds a confusion.

Copy link
Contributor Author

@scoder scoder Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the function to find_name_in_mro().

}

Py_hash_t hash;
*error = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively you could return a special value for signalling error. E.g. (PyObject *)(-1). Test what is faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can set error to -1 on exceptions. That will distinguish all three cases: ok, error with exception, error without exception. Only -1 will need a call to PyErr_Clear() then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike things like (PyObject *)(-1) because I wouldn't want to assume that they are not valid pointer values on any system whatsoever.

Distinguish between "error" and "error with exception" cases in _PyType_LookupUncached().
Fix a reference leak of "mro" on lookup errors.
Resolves issues found by Serhiy Storchaka.
@scoder
Copy link
Contributor Author

scoder commented Sep 10, 2017

@serhiy-storchaka: thank you for your excellent feedback.
I applied the changes and couldn't see a performance degradation from them.

/* Internal API to look for a name through the MRO, bypassing the method cache.
This returns a borrowed reference, and might set an exception! */
static PyObject *
_PyType_LookupUncached(PyTypeObject *type, PyObject *name, int *error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this rather adds a confusion.

in a context that propagates the exception out.
*/
PyErr_Clear();
PyType_Ready(type) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either move PyType_Ready() to the previous line (if the resulting line will be not too long), or indent it as in the current code and move { to a new line for readability (new PEP 7 rule).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned it up a little.

in a context that propagates the exception out.
*/
if (error == -1)
PyErr_Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyErr_Clear() always is called after _PyType_LookupUncached() if error == -1. Why not call it inside _PyType_LookupUncached()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is which interface is better: should the function itself always swallow any exceptions, or should the callers decide how to deal with them? The comments in the original _PyType_Lookup() implementation suggest that others have already been as unhappy as me about the interface of that function, so why keep making people unhappy?

the same type will call it again -- hopefully
in a context that propagates the exception out.
*/
if (error == -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add braces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@scoder
Copy link
Contributor Author

scoder commented Sep 13, 2017

The XMLParser.__init__() code in _elementtree.c does this:

    self->handle_start = PyObject_GetAttrString(target, "start");
    self->handle_data = PyObject_GetAttrString(target, "data");
    self->handle_end = PyObject_GetAttrString(target, "end");
    self->handle_comment = PyObject_GetAttrString(target, "comment");
    self->handle_pi = PyObject_GetAttrString(target, "pi");
    self->handle_close = PyObject_GetAttrString(target, "close");
    self->handle_doctype = PyObject_GetAttrString(target, "doctype");
    PyErr_Clear();

In the failing test, it should find close but not pi. Thus, it looks up close with a live AttributeError set from the pilookup. Since _PyDict_GetItem_KnownHash() may or may not set an exception, we have to check for a live exception after calling it, and that finds the old exception of the last attribute lookup and decides that its own lookup failed.

The correct place to fix this is obviously _elementtree.c (please do), but if one module has this bug, it's unlikely to be the only one. What do you think? To me, it would feel wrong to allow this misbehaviour inside of CPython by explicitly handling it somehow.

@scoder
Copy link
Contributor Author

scoder commented Sep 13, 2017

That being said, this can obviously be made to work again by ignoring all exceptions also in the new helper function, just as _PyType_Lookup() previously did. Sounds wrong to do that, but it keeps things working that worked before.

@scoder
Copy link
Contributor Author

scoder commented Sep 13, 2017

I pushed a fix that eats lookup exceptions, exactly like _PyType_Lookup() previously did.

…g lookup exceptions, just like "_PyType_Lookup()" did previously.
@@ -0,0 +1,2 @@
Speed up class creation by reducing overhead in the necessary special method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add speed up estimation: 10-20%

@serhiy-storchaka
Copy link
Member

What was the cause of this failure? Could this failure be fixed by fixing ElementTree?

@scoder
Copy link
Contributor Author

scoder commented Sep 13, 2017

See my comment above. Yes, ET should definitely be fixed. I pushed #3545.

@scoder
Copy link
Contributor Author

scoder commented Sep 13, 2017

Feel free to revert 4efde8e if you think that fixing cElementTree is enough here.

Stefan Behnel added 2 commits September 14, 2017 10:12
…ce eating lookup exceptions, just like "_PyType_Lookup()" did previously."

This reverts commit 4efde8e.
…) from functions that swallow live exceptions.
@scoder
Copy link
Contributor Author

scoder commented Sep 14, 2017

I've reverted that change and added assert()s instead. Fixing the behaviour of _PyType_Lookup() is a separate issue (I've created a ticket in the tracker).

@scoder
Copy link
Contributor Author

scoder commented Sep 14, 2017

Note that both AppVeyor and Travis will be happy again once #3545 is merged.

@vstinner vstinner dismissed their stale review September 15, 2017 12:42

I don't recall why I was opposed to the change, but it changed a lot in the meanwhile :-)

@@ -2994,7 +3010,7 @@ _PyType_Lookup(PyTypeObject *type, PyObject *name)
assert(PyType_Check(base));
dict = ((PyTypeObject *)base)->tp_dict;
assert(dict && PyDict_Check(dict));
res = PyDict_GetItem(dict, name);
res = _PyDict_GetItem_KnownHash(dict, name, hash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see well the difference between PyDict_GetItem() and _PyDict_GetItem_KnownHash() in term of performance. Maybe the difference is that PyDict_GetItem() calls PyErr_Fetch/PyErr_Restore. In that case, PyDict_GetItemWithError() would have the same speed, no?

if (error == -1) {
/* It is unlikely by not impossible that there has been an exception
during lookup. Since this function originally expected no errors,
we ignore them here in order to keep up the interface. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that "Since this function originally expected no errors" matters here. Why not reporting the exception to the caller and handle it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a patch that cleans up much of the mess around _PyType_Lookup(). See https://bugs.python.org/issue31465

@vstinner
Copy link
Member

"KnownHash is extremely short in comparison and probably gets inlined and streamlined with LTO. Substantially less branching." oh nice, good to know!

if (mro == NULL) {
*error = 1;
return NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the only non-exception raising error case left, I actually wonder if this is really an error case. If there is no MRO in a ready-ied type, isn't that just fine from the point of view of a lookup?

@scoder
Copy link
Contributor Author

scoder commented Sep 18, 2017

From my POV, this is ready for merging.
All further changes regarding the general exception handling in _PyType_Lookup() are tracked by bpo-31465 and #3616.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just proposed a minor coding style change.

The change enhances error handling, nice.

I don't understand everything, but I trust Python test suite to make sure that the change doesn't break anything :-)

if (bases == NULL)
goto error;
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 7 nitpick: when you modify code, it's better to add { ... } to if blocks.

@serhiy-storchaka serhiy-storchaka merged commit 2102c78 into python:master Oct 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants