New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption #11376
Conversation
Feel free to land the first commit and rebase |
Looping to the end of the list is wasteful. We can just put the new nodes at the front of the list. I don't believe we can fully prepend, because libxml2 may assume that the xml namespace is the first one, so we'll put the new ones as the second one.
Incidentally, This also fixes the largest performance bottleneck of https://bugs.php.net/bug.php?id=77894 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, I just have one small question. :)
if (xmlStrEqual(nsptr->prefix, (xmlChar *)prefix) && xmlStrEqual(nsptr->href, (xmlChar *)uri)) { | ||
goto out; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because there is nothing to do? As the node is "equal"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Note that the loop variable is the same as the return value. So when we found an exact match in the loop the loop variable is returned.
Fixes https://bugs.php.net/bug.php?id=79700 (that issue description also contains the benchmark!).
The issue talks about a wrong assumption, but I don't think it's a wrong assumption per-se. See also GH-5719 for more context and the emails from https://mail.gnome.org/archives/xml/2020-June/msg00000.html.
The purpose of oldNs AFAIK is actually to store the namespaces which can't be stored on the document nodes. They can't be freed because these nodes may be reused (and they don't refcount or anything). Also: not storing them on the oldNs list will leak memory when all relevant XML nodes are destroyed.
The problem in the original issue is the ever-growing list of duplicate ns nodes.
This PR solves this.
Commit list:
Use a prepending strategy instead of appending in dom_set_old_ns()
improves performance of the list management (see commit description).Reuse namespaces from doc->oldNs if possible in dom_get_ns()
solves the actual problem by reusing ns nodes.Targeting master because of potential impact of risky code changes.
Marking as draft for now because I need to double check this stuff.