Default namespace for elements; especially in the context of HTML #296
Comments
A couple of possible ways forward:

(a) We define a mode of operation in which the interpretation of unprefixed element names in paths is decided dynamically. Specifically, when this mode of operation is in force, an unprefixed element name matches an element if the element is in the same namespace as the outermost element of the containing document. This is influenced by the HTML "violation", where the default namespace depends on what document you are processing. (But there's potentially a difficulty here with element names used other than in axis steps, for example element names used in types.)

(b) We define a mode of operation in which unprefixed element names in axis steps match on local name only. This is something of a radical departure, but I introduced it with the Saxon Gizmo tool, which is designed for interactive (and therefore informal) use, and it works very well in that environment. It basically says: if you care about the namespace, use a prefix, and if you don't, just use the local name. I think that probably meets many users' expectations. Even in the rare cases where the same local name is used with multiple namespaces, they often have a semantic relationship, and there's no harm in "//title" selecting any of them. In particular it's better to over-select than to under-select, because the former problem is much easier to diagnose and correct.
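To make the difference between (a) and (b) concrete, here is a rough sketch in Python using the standard-library `xml.etree.ElementTree`. It is not XPath and not how Saxon Gizmo is implemented; the helper names `select_in_root_namespace` and `select_by_local_name` are hypothetical, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample: an XHTML document embedding an SVG element with its own <title>.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Page</title></head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg"><title>Icon</title></svg>
  </body>
</html>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def elements(root):
    """Yield element nodes only (skips comments/PIs if the parser kept them)."""
    return (e for e in root.iter() if isinstance(e.tag, str))


def select_in_root_namespace(root, local_name):
    """Option (a), hypothetical: an unprefixed name matches elements that are
    in the same namespace as the outermost element of the document."""
    root_ns, _ = split_tag(root.tag)
    return [e for e in elements(root) if split_tag(e.tag) == (root_ns, local_name)]


def select_by_local_name(root, local_name):
    """Option (b), hypothetical: an unprefixed name matches on local name only."""
    return [e for e in elements(root) if split_tag(e.tag)[1] == local_name]


root = ET.fromstring(DOC)
a = select_in_root_namespace(root, "title")  # XHTML <title> only
b = select_by_local_name(root, "title")      # XHTML and SVG <title>

print([e.text for e in a])
print([e.text for e in b])
```

Under (b) the SVG title is over-selected, which is the trade-off described above: the extra hit is visible in the result and easy to narrow down by adding a prefix.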
Note that in the current drafts I have already made the change that allows the default element namespace to differ from the default type namespace. (There was really no need for them ever to be coupled, but decoupling them creates some backwards compatibility issues that the draft spec addresses.) Allowing a setting of "any" for the default element namespace is not difficult. I have only found one place that needs special attention: a schema element test
That would help a lot when working with HTML.

(a) might be confusing if the outermost element has a prefix, or if the namespace is redefined. (b) is then easier to use.

A third option could be prefix-only matching, where it checks the prefix but ignores the namespace URL.
The idea of treating prefixes as significant goes against the grain, simply because we've spent so many years educating people to treat the choice of prefix as insignificant; it would cause great confusion to reverse that. I'm not going to defend the orthodox wisdom, because I've always been highly critical of the way namespaces are done, but we need to be very careful to avoid making matters worse.
Note that the spec changes to separate the element and type namespaces have been reverted following committee review of the behaviour. They would need reinstating or revising to work with the HTML/browser-like XPath matching rules that ignore the element namespace.
(This is https://html.spec.whatwg.org/#interactions-with-xpath-and-xslt, to give the link here)
@gsnedders Yes, we're well aware of the "wilful violation": that's the crux of this issue, mentioned in the original issue description. My preferred approach is to have a mode of operation where an unprefixed name in a name test is interpreted as
I was just… making sure we actually had the link to the relevant part of the spec in the issue, nothing more, rather than everyone reading this having to dig up the relevant section.
My preferred solution to this is as follows. Currently the default namespace for elements and types can be either a namespace URI or absent. I propose that it can take an additional setting, "auto". If the value is "auto", then:
I'm tempted also to suggest that for types, when "auto" is set, an unprefixed name T should mean

The surface syntax can be XQuery:

For XPath the setting would typically be controlled by the host language API. A browser-based API optimized for HTML could well choose to make this the default, or it could do something akin to the "wilful violation" by making the default depend on whether the context node is XML or HTML.
There can be little doubt that the fact that an unprefixed name in XPath fails to select an unprefixed element in the source document is one of the major gotchas, causing massive bewilderment to all newbie users.
The XPath 2.0 solution of using a default element namespace in the static context is a partial solution; its main drawback is that it doesn't help the newbies who didn't know about the problem or its solution.
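The gotcha, and the static-default remedy, can be reproduced outside XPath proper. The sketch below uses Python's standard-library `xml.etree.ElementTree`, whose path language is only a small XPath-like subset; passing `""` as a prefix in the `namespaces` mapping (Python 3.8+) is used here as a loose analogue of the XPath 2.0 default element namespace, not as the same mechanism. The sample document is made up.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample: every element is in the XHTML namespace via the default
# namespace declaration, even though none of them carries a prefix.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Page</title></head>"
    "</html>"
)

# The gotcha: an unprefixed name in the path means "no namespace",
# so it does not match the unprefixed-but-namespaced <title>.
naive = doc.findall(".//title")

# The remedy: declare a default namespace for unprefixed names in the path.
# The caller has to know about the problem in order to supply this.
with_default = doc.findall(".//title", {"": XHTML_NS})

print(len(naive), [e.text for e in with_default])
```

The second call only works because the caller already knew to supply the namespace, which is the drawback noted above for newcomers.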
The HTML "living standard" introduces a "wilful violation" of the XPath 1.0 spec to address the issue. Given that most elements in an HTML DOM will be in the XHTML namespace, it states:
It then adds a note which is blatantly untrue:
Since the XPath 2.0 facility picks up the default namespace from the static context, while the HTML "wilful violation" picks it up dynamically from a property of the context node (namely "being from an HTML DOM"), there is no way these can be considered equivalent.
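A small sketch of why the two are not equivalent, again in Python with `xml.etree.ElementTree` as a stand-in for a real XPath engine. Both helper functions are hypothetical, and `select_dynamic` approximates "being from an HTML DOM" by looking at the namespace of the element the query starts from, which is an assumption made purely for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two made-up inputs queried with the same path expression.
html_doc = ET.fromstring(f'<html xmlns="{XHTML_NS}"><body><p>from HTML</p></body></html>')
plain_doc = ET.fromstring("<doc><p>from plain XML</p></doc>")


def select_static(root, path, default_ns):
    """Static rule: the default namespace is fixed before any input is seen."""
    return root.findall(path, {"": default_ns} if default_ns else None)


def select_dynamic(root, path):
    """Dynamic rule (hypothetical): the default namespace is read from the
    node the query starts at, so it can differ from one input to the next."""
    tag = root.tag
    ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return root.findall(path, {"": ns} if ns else None)


# Number of <p> elements found in each document under each rule.
static_hits = [len(select_static(d, ".//p", XHTML_NS)) for d in (html_doc, plain_doc)]
dynamic_hits = [len(select_dynamic(d, ".//p")) for d in (html_doc, plain_doc)]

print(static_hits, dynamic_hits)
```

With a single static setting, one of the two documents yields no matches; the dynamic rule adapts per input, which a static-context default cannot do.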
(Note also, there's a significant ambiguity in the "wilful violation" rules: what exactly is the "context node" that determines this behaviour? I think they're suggesting it is the context node at the point of XPath API invocation, not the context node for the specific axis step. This makes it rather unclear how the rule is supposed to apply to XSLT. And: if an XSLT stylesheet creates a temporary tree with nodes in the XHTML namespace, do we consider those nodes as being "from an HTML DOM"?)
Nevertheless, the intent of the "violation" is worthy, and it would be nice if we can find a solution to this problem that works both for HTML and for other vocabularies.
Our current proposal for fn:parse-html is that HTML elements should go in the XHTML namespace and this means that users familiar with XPath 1.0 implementations in the browser will trip over this problem. A lot.