Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Reparenting of nodes can get wrong namespaces when prefixes are reused #2494

Open
TreyE opened this issue Apr 3, 2022 · 3 comments · May be fixed by #2495
Open

[bug] Reparenting of nodes can get wrong namespaces when prefixes are reused #2494

TreyE opened this issue Apr 3, 2022 · 3 comments · May be fixed by #2495

Comments

@TreyE
Copy link

TreyE commented Apr 3, 2022

If I reuse namespace prefixes for different nodes (such as when I use a blank XML namespace, i.e. xmlns=), I can get the wrong namespace when the node is reparented.

Code that Causes the Error

#! /usr/bin/env ruby

require 'nokogiri'

doc = Nokogiri::XML("<root xmlns=\"urn:something\"><child1 xmlns=\"urn:something_else\"/><child2/></root>")

node = doc.at_xpath("//ns:child2",{"ns" => "urn:something"})
new_parent = doc.at_xpath("//ns:child1",{"ns" => "urn:something_else"})
new_parent.add_child(node)

puts doc.to_xml

Note that this will now output incorrect namespaces on the "child2" node, like the following:

<?xml version="1.0"?>
<root xmlns="urn:something">
  <child1 xmlns="urn:something_else">
    <child2/>
  </child1>
</root>

Note that the "child2" node has now been incorrectly 'moved' into the "urn:something_else" namespace.

A more correct output would be something similar to:

<?xml version="1.0"?>
<root xmlns="urn:something">
  <child1 xmlns="urn:something_else">
    <child2 xmlns="urn:something"/>
  </child1>
</root>

Source of the Behaviour

I've tracked this down to static void relink_namespace(xmlNodePtr reparented) in ext/nokogiri/xml_node.c.
It seems this code doesn't check for nodes above the current node possibly 'squatting' on a namespace, for example:

  1. The node being relinked has an ancestor node that has a matching, 'correct', namespace definition.
  2. The node being relinked also has an ancestor node that has a namespace definition that matches by prefix, but NOT by href.
  3. The node with the with the incorrect definition is not only an ancestor of the node being relinked, it is also a descendent of the node with the 'correct' definition - meaning that traversing up the tree you will hit the node with the incorrect definition first.

Tests and PRs

A test which reproduces this issue an an associated correction are provided in PR #2495.

@TreyE TreyE added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Apr 3, 2022
TreyE added a commit to TreyE/nokogiri that referenced this issue Apr 3, 2022
@TreyE TreyE linked a pull request Apr 3, 2022 that will close this issue
TreyE added a commit to TreyE/nokogiri that referenced this issue Apr 3, 2022
TreyE added a commit to TreyE/nokogiri that referenced this issue Apr 3, 2022
TreyE added a commit to TreyE/nokogiri that referenced this issue Apr 3, 2022
TreyE added a commit to TreyE/nokogiri that referenced this issue Apr 3, 2022
@flavorjones
Copy link
Member

Thank you for opening this as well as submitting a PR for it!

I want to spend some time understanding the bug report and re-building context around the reparenting code, which has always been complex and tricky to get right. Please give me a day or two! Thanks.

@TreyE
Copy link
Author

TreyE commented Apr 4, 2022

@flavorjones NP! Thanks for getting back to me. This is my first major C submission here and understand I might not have gotten everything exactly right - so take as much time as you need.

flavorjones pushed a commit to TreyE/nokogiri that referenced this issue Aug 28, 2022
@flavorjones flavorjones added topic/namespaces state/pr-under-review and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Nov 22, 2022
@flavorjones
Copy link
Member

For whatever it's worth, I think the behavior we should be striving for is described in #1200 (comment)

And I think we probably need to make whatever fix we adopt be in v2.0 and call it a breaking change (which it will be for some use cases).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants