Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption #11376

Closed
wants to merge 4 commits into from

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Jun 5, 2023

Fixes https://bugs.php.net/bug.php?id=79700 (that issue description also contains the benchmark!).
The issue talks about a wrong assumption, but I don't think it's a wrong assumption per-se. See also GH-5719 for more context and the emails from https://mail.gnome.org/archives/xml/2020-June/msg00000.html.

The purpose of oldNs AFAIK is actually to store the namespaces which can't be stored on the document nodes. They can't be freed because these nodes may be reused (and they don't refcount or anything). Also: not storing them on the oldNs list will leak memory when all relevant XML nodes are destroyed.

The problem in the original issue is the ever-growing list of duplicate ns nodes.
This PR solves this.

Commit list:

  • Commit 1 Use a prepending strategy instead of appending in dom_set_old_ns() improves performance of the list management (see commit description).
  • Commit 2 Reuse namespaces from doc->oldNs if possible in dom_get_ns() solves the actual problem by reusing ns nodes.
  • Commit 3 adds a test to assert we cannot get cycles in the ns list.
  • Commit 4 adds some comment explaining why we can't get cycles.

Targeting master because of potential impact of risky code changes.
Marking as draft for now because I need to double check this stuff.

@Girgias
Copy link
Member

Girgias commented Jun 6, 2023

Feel free to land the first commit and rebase

Looping to the end of the list is wasteful. We can just put the new
nodes at the front of the list. I don't believe we can fully prepend,
because libxml2 may assume that the xml namespace is the first one, so
we'll put the new ones as the second one.
@nielsdos nielsdos marked this pull request as ready for review June 7, 2023 18:33
@nielsdos
Copy link
Member Author

nielsdos commented Jun 7, 2023

Incidentally, This also fixes the largest performance bottleneck of https://bugs.php.net/bug.php?id=77894

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, I just have one small question. :)

Comment on lines +1525 to +1527
if (xmlStrEqual(nsptr->prefix, (xmlChar *)prefix) && xmlStrEqual(nsptr->href, (xmlChar *)uri)) {
goto out;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because there is nothing to do? As the node is "equal"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Note that the loop variable is the same as the return value. So when we found an exact match in the loop the loop variable is returned.

@nielsdos nielsdos closed this in a38e3c9 Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants