Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug #75779: DOMElement#setAttribute() does not process prefixed namespace declarations #3012

Closed
wants to merge 1 commit into from

Conversation

DaveRandom
Copy link
Contributor

No description provided.

@weltling
Copy link
Contributor

Looks like what you're trying to fix is already done by DOMElement::setAttributeNS().

Thanks.

@DaveRandom
Copy link
Contributor Author

DaveRandom commented Jan 30, 2018

@weltling it can indeed be done that way, but the current behaviour of setAttribute() is inconsistent in that it does process prefixes for things which aren't ns decls (by proxy; this is the behaviour of xmlSetProp()) and it also does process the bare unprefixed xmlns attribute.

Moreover, currently what happens is that it creates a standard attribute named xmlns:whatever (which is definitely incorrect behaviour) and hasAttribute()/getAttribute() do not report this, because they have special handling for namespace decls. There's an argument that this behaviour of xmlSetProp() is incorrect and should be fixed upstream, but there's a very long lag before this would address the problem that currently exists in PHP, and there would be variable behaviour depending on which libxml version PHP was built against unless we conditionally include the block in this patch at compile time, and that's assuming they even accept the change upstream - although I will look into raising it with them.

If the consensus is that a better approach would be to fail when this is attempted, then I guess that's OK too, but the current behaviour is definitely not correct and the most internally consistent to deal with it is with the approach taken by this patch, IMHO.

@weltling
Copy link
Contributor

@DaveRandom the specification about setAttribute tells

To set an attribute with a qualified name and namespace URI, use the setAttributeNS method.

The namespaces RFC

A namespace is declared using a family of reserved attributes. Such an attribute's name must either be xmlns or have xmlns: as a prefix.

The PHP implementation seems to match both things. A qualified name needs to be declared and then can be used by default. Until i misread it.

Regarding xmlSetProp - do you refer to some upstream issue?

Thanks.

@DaveRandom
Copy link
Contributor Author

@weltling PHP does deal with those things correctly, but it does extra things as well. See this demonstration,

What this patch does is make the behaviour of setAttribute('xmlns:foo') and the behaviour of setAttribute('foo:bar') consistent.

The inconsistency exists because of the behaviour of xmlSetProp(), in that it accepts prefixed attribute names and attaches a namespace to the node based on the prefix, but it doesn't handle namespace declarations, rather it creates an attribute node.

@weltling
Copy link
Contributor

Based on the previous conversation and further digging

but the current behaviour of setAttribute() is inconsistent in that it does process prefixes for things which aren't ns decls (by proxy; this is the behaviour of

Confirmed, maybe that has to raise an error if the namespace is not declared. Likely only should be done in master.

currently what happens is that it creates a standard attribute named xmlns:whatever (which is definitely incorrect behaviour)

I'm not sure that's an issue. Talking about xmlns:, that's a standard builtin namespace. I was reading an older RFC earlier, the latest version tells same, also noted before

A namespace (or more precisely, a namespace binding) is declared using a family of reserved attributes. Such an attribute's name must either be xmlns or begin xmlns:. These attributes, like any other XML attributes, may be provided directly or by default.

and also

The prefix xmlns is used only to declare namespace bindings and is by definition bound to the namespace name http://www.w3.org/2000/xmlns/. It MUST NOT be declared .

From that point, a code looks legit and should continue to function

$element->setAttribute("xmlns:test", "http://some/where");
$element->setAttribute("test:foo", "something");

which should be equivalent to

$element->setAttributeNS("http://some/where", "test:foo", "something");

From that point, the usage of setAttributeNS doesn't look correct in your example code.

About the attribute counting - from that point it also seems logic. As in the first snippet, two attributes are added, so ->attributes == 2. In the second snippet, only one attribute is added. It tells the difference in the API, but doesn't in the resulting XML. To be sure, i'm going to verify this in yet another implementation, Python or Java perhaps.

Thanks.

@DaveRandom
Copy link
Contributor Author

DaveRandom commented Jan 30, 2018

I'm not sure that's an issue. Talking about xmlns:, that's a standard builtin namespace

The key point here (I think) is that libxml does not maintain namespace declarations as attributes, but PHP has added a layer to allow them to be inspected via getAttribute()/setAttribute() as if they were:

$xml = <<<XML
<?xml version="1.0" ?>
<root xmlns:foo="uri:foo" />
XML;

$doc = new DOMDocument();

// This is a proxy for libxml's xmlParseDocument(), so does not rely on PHP having
// done "the correct thing"
$doc->loadXML($xml);

// These have special-case handling at the PHP level to redirect the lookup to the libxml
// namespace store
// https://lxr.room11.org/xref/php-src%407.2/ext/dom/element.c#285
var_dump($doc->documentElement->hasAttribute('xmlns:foo')); // bool(true)
var_dump($doc->documentElement->getAttribute('xmlns:foo')); // string(7) "uri:foo"

// This directly retrieves the libxml attribute store, which does not include the
// namespace declarations
// https://lxr.room11.org/xref/php-src%407.2/ext/dom/node.c#569
var_dump(count($doc->documentElement->attributes)); // int(0)

From that point, a code looks legit and should continue to function

Because of the above, creating a libxml attribute node with the name xmlns:whatever, instead of registering the namespace prefix with the node, doesn't make sense. This is more about how PHP interacts with libxml's API than how it conforms to the DOM standard, in the sense that what happens at the moment still produces the correct output from e.g. saveXML() (the attributes are there so they are serialized correctly in the output), but doesn't play nice with things like *AttributeNS() and XPath, because libxml does not have the association with a namespace URI.

Confirmed, maybe that has to raise an error if the namespace is not declared

I also considered this, but I feel like this is going to break a lot of code which is using DOM to build XML documents, unless setAttribute("xmlns:test", "http://some/where") registers a namespace.

@DaveRandom
Copy link
Contributor Author

When I ran the code in the above comment I have also notices that the behaviour of ->attributes around this issue has changed between 7.1 and 7.2: https://3v4l.org/q3VqZ

@weltling
Copy link
Contributor

Yeah, hasAttribute and getAttribute might have some inconsistency. Initially we were talking about the document creation through the API. If there's a discrepancy when a document is not created, but loaded from an existing one, that should be handled, too. As for me, the API part when a document is created looks ok except setAttribute with a non existent namespace.

I'm checking the creation part with Java now, to get some comparison point. If that plays with xmlns:... which can be provided directly as an attribute as per RFC, then probably we're fine in PHP, too.

The discrepancy between 7.1 and 7.2 as illustrated is a BC breach obviously, so were worth to investigate. This might be related to another change in the core, though. Given the libxml2 version used is same.

Thanks.

@weltling
Copy link
Contributor

I've created a gist with the current experiments in Java here https://gist.github.com/weltling/da3f15d6acd23b2376097cf6c472e912 . Please note, that the DOM specification uses Java as a reference implementation.

So far, i don't see the difference in the behavior, also in regard to hasAttribute, etc.. The only issue in PHP is that attributes with undefined namespace might not throw an error. The documentation explicitly mentions, that namespace attributes are usual attributes and that setAttributeNS should be used for qualified names. At this point, this patch seems to change the expected behavior, thus is not correct.

Thanks.

@DaveRandom
Copy link
Contributor Author

OK, I'm going to close this PR while I re-read the spec (again) and make sure I fully understand it. I think you are probably right, in that what I am trying to do is not allowed, and the reason I thought it should be allowed is because PHP is not throwing an error when it should be.

If I have any further changes I will open them as PRs here so they can be reviewed. Thanks very much for your input and the work you have put in to reviewing this :-)

@DaveRandom DaveRandom closed this Jan 31, 2018
@weltling
Copy link
Contributor

Thanks for the discussion! Perhaps, the behavior change between 7.1 and 7.2 could be investigated, if one has time. Wondering, why it didn't popup till today :/

Thanks.

@DaveRandom
Copy link
Contributor Author

DaveRandom commented Jan 31, 2018

The test suite for ext/dom is quite lacking at present, and I have never had cause to actually use ->attributes in the real world I don't think, so I'm not surprised things like that would be missed. Plus the result of iterating it is still the same, the only difference is the count.

I'm also reasonably certain that the fact that namespace declarations don't appear in ->attributes is wrong in terms of the DOM spec, the fact that it's like that is an artefact of libxml rather than by-design.

I'm going to look at porting the DOM compliance test suite to phpt, but that is a huge job. I will see if I can come up with any tooling to automate it.

@cmb69
Copy link
Member

cmb69 commented Jan 11, 2019

When I ran the code in the above comment I have also notices that the behaviour of ->attributes around this issue has changed between 7.1 and 7.2: https://3v4l.org/q3VqZ

This is because DOMNamedNodeMap implements Countable only as of PHP 7.2 (see also https://3v4l.org/UTcmn).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants