There can be little doubt that the failure of an unprefixed name in XPath to select an element written without a prefix in the source document (because that element is in a default namespace) is one of the major gotchas, causing massive bewilderment to newbie users.
The XPath 2.0 remedy, a default element namespace in the static context, is only a partial solution. Its main drawback is that it doesn't help the newbies who don't know about the problem or its solution.
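To make the gotcha concrete, here is a small sketch using Python's standard-library `xml.etree.ElementTree`. Its `findall()` implements only a limited XPath subset (ElementPath), not XPath proper, but unprefixed names behave the same way: they match only elements in no namespace. The `namespaces={"": ...}` form (Python 3.8+) plays roughly the role of the XPath 2.0 default element namespace, and, like it, only helps users who already know they need it.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# An XHTML fragment: <p> is written without a prefix, but it is in the
# XHTML namespace because of the default namespace declaration.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>one</p><p>two</p></body></html>"
)

# The gotcha: unprefixed "p" means "p in no namespace", so nothing matches.
print(len(doc.findall(".//p")))

# Workaround 1: spell out the namespace (Clark notation in ElementTree).
print(len(doc.findall(f".//{{{XHTML}}}p")))

# Workaround 2: supply a default namespace with the query, loosely analogous
# to the XPath 2.0 default element namespace in the static context.
print(len(doc.findall(".//p", namespaces={"": XHTML})))
```

The first query finds no elements; the other two find both paragraphs.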
The HTML "living standard" introduces a "wilful violation" of the XPath 1.0 spec to address the issue. Given that most elements in an HTML DOM will be in the XHTML namespace, it states:
If the QName has no prefix and the principal node type of the axis is element, then the default element namespace is used. Otherwise if the QName has no prefix, the namespace URI is null. The default element namespace is a member of the context for the XPath expression. The value of the default element namespace when executing an XPath expression through the DOM3 XPath API is determined in the following way:
If the context node is from an HTML DOM, the default element namespace is "http://www.w3.org/1999/xhtml".
Otherwise, the default element namespace URI is null.
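A minimal sketch of the quoted resolution rule, as read here. The function names and the boolean `context_node_from_html_dom` flag are hypothetical stand-ins; the flag deliberately sidesteps the question of which node counts as "the context node". Principal node types follow XPath 1.0 section 2.3: the attribute axis has principal node type attribute, the namespace axis has namespace, and every other axis has element. One consequence worth noting: unprefixed attribute names such as `@class` are never affected by the rule.

```python
from typing import Optional, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"


def principal_node_type(axis: str) -> str:
    # XPath 1.0 section 2.3: attribute -> attribute, namespace -> namespace,
    # any other axis -> element.
    if axis == "attribute":
        return "attribute"
    if axis == "namespace":
        return "namespace"
    return "element"


def default_element_namespace(context_node_from_html_dom: bool) -> Optional[str]:
    # Hypothetical flag: XHTML for an HTML DOM, otherwise null (None).
    return XHTML_NS if context_node_from_html_dom else None


def resolve_unprefixed_name(
    axis: str, local_name: str, context_node_from_html_dom: bool
) -> Tuple[Optional[str], str]:
    """Return (namespace URI, local name) for an unprefixed QName in a name test."""
    if principal_node_type(axis) == "element":
        return (default_element_namespace(context_node_from_html_dom), local_name)
    # Not an element test: the namespace URI is null.
    return (None, local_name)


print(resolve_unprefixed_name("child", "p", True))
print(resolve_unprefixed_name("attribute", "class", True))
print(resolve_unprefixed_name("child", "p", False))
```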
It then adds a note which is blatantly untrue:
This is equivalent to adding the default element namespace feature of XPath 2.0 to XPath 1.0, and using the HTML namespace as the default element namespace for HTML documents. It is motivated by the desire to have implementations be compatible with legacy HTML content while still supporting the changes that this specification introduces to HTML regarding the namespace used for HTML elements, and by the desire to use XPath 1.0 rather than XPath 2.0.
The XPath 2.0 facility takes the default element namespace from the static context, whereas the HTML "wilful violation" takes it dynamically from a property of the context node (namely, "being from an HTML DOM"), so there is no way the two can be considered equivalent.
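The difference can be sketched with a toy model of element name matching. Everything here is hypothetical: `Element`, its `from_html_dom` flag, and the two `matches_*` functions are illustrative only, and the element itself is used as the context node to keep things short. Under the XPath 2.0 rule, the outcome depends only on the static default chosen for the expression; under the HTML rule, the same expression applied to the same XHTML `<p>` gives different answers depending on where the node came from.

```python
from dataclasses import dataclass
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass(frozen=True)
class Element:
    namespace: Optional[str]  # None means "no namespace"
    local_name: str
    from_html_dom: bool       # hypothetical stand-in for "is from an HTML DOM"


def matches_xpath2(name_test: str, node: Element,
                   static_default_ns: Optional[str]) -> bool:
    # XPath 2.0: the default element namespace is fixed in the static context,
    # before any input is seen; the node's origin plays no part.
    return node.local_name == name_test and node.namespace == static_default_ns


def matches_html_violation(name_test: str, node: Element,
                           context_node: Element) -> bool:
    # HTML rule: the default is chosen at run time from a property of the
    # context node, so one expression can mean different things.
    default_ns = XHTML_NS if context_node.from_html_dom else None
    return node.local_name == name_test and node.namespace == default_ns


# The same XHTML <p> element, once from an HTML DOM and once from an XML DOM.
p_html = Element(XHTML_NS, "p", from_html_dom=True)
p_xml = Element(XHTML_NS, "p", from_html_dom=False)

for p in (p_html, p_xml):
    print(matches_xpath2("p", p, static_default_ns=XHTML_NS),
          matches_html_violation("p", p, context_node=p))
```

This prints `True True` for the HTML DOM node and `True False` for the XML DOM node.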
(Note also that there's a significant ambiguity in the "wilful violation" rules: what exactly is the "context node" that determines this behaviour? I think they're suggesting it is the context node at the point of XPath API invocation, not the context node for the specific axis step. This makes it rather unclear how the rule is supposed to apply to XSLT. And if an XSLT stylesheet creates a temporary tree with nodes in the XHTML namespace, do we consider those nodes to be "from an HTML DOM"?)
Nevertheless, the intent of the "violation" is worthy, and it would be nice if we could find a solution to this problem that works both for HTML and for other vocabularies.
Our current proposal for fn:parse-html is that HTML elements should go in the XHTML namespace, which means that users familiar with XPath 1.0 implementations in the browser will trip over this problem. A lot.