EBV 4.0 #817
Many of our users will have spent many frustrated hours learning the JavaScript rules, and I think it's important we remain consistent with them. At present we are well aligned, except that in JS, if a value is not one of a small number of falsy things, then it's truthy, whereas with our rules things like empty arrays and maps are errors rather than truthy. I'm really not keen on making the rules more complicated, especially if it leads to outcomes that are different from JS.
I don’t see those similarities between JavaScript and XPath. The only thing that is close is the treatment of strings, numbers and booleans, and we would keep this anyway. The main difference, and the one that regularly causes confusion, is the varying treatment of node sequences and other sequences, and it’s hard for me to grasp why this seems necessary today. The confusing examples that I’ve stated in the initial comment have no counterpart in JavaScript, and I’m convinced we could simplify the rules here by treating all items of a sequence identically, and achieving a more intuitive result. In addition, we can also sort out different behavior across implementations for heterogeneous sequences:
For function items and arrays (positional, associative), my proposal would bring XPath and JS even closer together, by getting rid of the error message, which I believe serves no one in practice. It would be much easier to use |
I'm not sure why you draw out this case as being implementation-dependent. Currently the first case is unambiguously true, the second case is unambiguously an error.
In JavaScript any array or object is truthy, regardless of its contents. I'm not sure what you're proposing for arrays and maps, but for sequences you're proposing something very different, and I'm still not sure exactly what, or what the use cases are. Read JavaScript tutorials, and you find people advising everyone to steer clear of this minefield. With XPath too, a lot of people suggest using functions like exists() to avoid relying on the complex EBV rules. If we make them even more complex, there will be even more advice telling users not to go there.
Sorry for that. Indeed the specification states clearly that it's the first item that’s responsible for the result. I got misled by one implementation (well, not ours) that behaves differently. I think/hope we can agree that it's at least strange that the order of the input defines here what is going to happen. I cannot think of any good reason for the current behavior for sequences of mixed type (apart from maybe historical reasons and algebra with XPath 1.0).
In my initial proposal, I suggested checking the map/array size and returning true or false. I’d be open to the decision to always return true, in alignment with JS. The EBV of function items would always be true (similar to JS).
I hoped that the equivalent XQuery code was self-explanatory. I think it's questionable to base the result on the first item (which can easily change if data is reordered), and to raise errors for sequences… unless the first item is a node. I don't know any other language that behaves similarly. I really don't believe that the proposed rules would make EBV more complex. Quite the contrary, I think that the new rules would be more consistent and easier to explain and teach: For each item in the input sequence, there's a well-defined rule to get true or false. If at least one item matches, the EBV is true.
In XPath 1.0 there were essentially four types: string, number, boolean, and node-set, and EBV was defined for each of them. When the data model was extended in 2.0, the rules had to be compatible with the 1.0 rules, but also to handle mixed sequences, and there was a significant amount of debate on the best way of doing this. One of the concerns, if I remember rightly, was that the revised rules should not make it necessary to read an entire sequence before making a decision (so |
Thanks for the discussion. I think it helps to look at two aspects of the EBV computation separately:
For 1., the current rules are:

declare function ebv($input as item()*) as xs:boolean {
  if (empty($input)) then false()
  else if(head($input) instance of node()) then true()
  else single-ebv($input)
};

I think it would be more intuitive to get rid of any special-casing and use existential semantics instead, so I would propose:

declare function ebv($input as item()*) as xs:boolean {
  some $item in $input satisfies single-ebv($item)
};

This way, the result won’t change if the input sequence is reordered. More importantly, all item types would have “equal rights”. This feels important to me, as the language has evolved a lot since XPath 1.0, which was very node-centric. I really can’t find a good reason today for treating node sequences differently to sequences of other types.

For 2., we currently have…

declare function single-ebv($item as item()) as xs:boolean {
  typeswitch($item) {
    case xs:untypedAtomic | xs:string | xs:anyURI return $item != ''
    case xs:numeric return $item != 0
    case xs:boolean return $item
    default return error(xs:QName('err:FORG0006'))
  }
};

It could possibly be:

declare function single-ebv($item as item()) as xs:boolean {
  typeswitch($item) {
    case xs:untypedAtomic | xs:string | xs:anyURI return $item != ''
    case xs:numeric return $item != 0
    case xs:boolean return $item
    (: to be discussed... :)
    case xs:base64Binary return $item != xs:base64Binary('')
    case xs:hexBinary return $item != xs:hexBinary('')
    case array(*) return array:size($item) != 0
    case map(*) return map:size($item) != 0
    default return true()
  }
};

We should get rid of the raised error. I can see it was reasonable in the past, but I don’t believe it’s suitable today. If we want to align sequences and arrays, it just makes no sense to me that

PS: I wondered why “…and algebra” slipped into my sentence. Should probably have been “…and alignment”.
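For anyone who wants to try the proposed existential semantics on an existing processor, here is a minimal sketch, assuming XQuery 3.1. It is not taken from the proposal verbatim: the functions are moved into the `local:` namespace, the `typeswitch` uses the brace-less 3.1 syntax, the binary cases marked "to be discussed" are left out, and the numeric case additionally treats `NaN` as false (as `fn:boolean` does today), which the snippets above do not.

```xquery
xquery version "3.1";

(: Sketch of the proposed per-item rule; the names local:single-ebv and
   local:ebv are illustrative only. :)
declare function local:single-ebv($item as item()) as xs:boolean {
  typeswitch ($item)
    case xs:untypedAtomic | xs:string | xs:anyURI return $item != ''
    (: NaN = NaN is false, so NaN yields false here (assumption: keep 3.1 behaviour) :)
    case xs:numeric return $item != 0 and $item = $item
    case xs:boolean return $item
    case array(*) return array:size($item) != 0
    case map(*) return map:size($item) != 0
    (: nodes, function items and all other atomic types :)
    default return true()
};

(: Existential semantics: true if at least one item is "truthy" :)
declare function local:ebv($input as item()*) as xs:boolean {
  some $item in $input satisfies local:single-ebv($item)
};

(
  local:ebv(('x', 'y')),     (: no error; 'x' is non-empty :)
  local:ebv(('', 0, <a/>)),  (: the node makes the result true :)
  local:ebv((0, '')),        (: no truthy item :)
  local:ebv(())              (: empty sequence :)
)
```

Under these assumptions the four calls should yield true, true, false, false, independent of the order of the items in each sequence.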
Starting from first principles, I can certainly see why you want |
If we believe that JavaScript users are (one of) our main target groups today, we should at least return |
Another point to bear in mind: in XSLT predicates, failure means no match. So in 3.0 |
Oh dear; yes, that sounds like a hard nut to crack. If we think this through, it basically prevents us from turning any error in the language into a success (try/catch is regularly used when people find it too hard to assess what exactly might go wrong in more complex code).
I’m grateful for the discussion! I’ll open another issue with a narrower focus → #829.
Yes, I dare to question the semantics of effective boolean values. The reason is that I never learned to fully like them. It seems obvious where the rules come from, and why they have been reasonable in previous versions of the language. From today’s perspective, I think there’s really some need to simplify and unify the rules, and I believe it’s possible with little effort and without endangering backward compatibility (provided that we are willing to drop errors and return results).
Some examples for the somewhat strange nature of the current rules:
boolean((<_>x</_>, <_>y</_>)) returns true, whereas boolean(('x', 'y')) raises an error.
boolean(xs:NCName('x')) returns true, whereas boolean(xs:QName('x')) raises an error.
boolean((<a/>, 1)) and boolean((1, <a/>)) may either return true or raise an error, depending on the implementation.

I believe it will make much more sense to
The semantics would be tidied up a lot, it could look like this…
…or, if we include more types, like this:
(If we believe that it’s too progressive to accept all types, we could still raise an error for some specific types… although I don’t think that anyone would benefit from this choice).
As a result, EBV checks could also be used to check more than one item:
Nothing would change for the classical EBV checks: if($node/*), if($x = $y), if($ok), …

Regarding “1. check all values of the input equally”, one could argue that this might affect performance. I don’t actually think so: For node sequences, it will still be sufficient to retrieve only the first item. For mixed-type sequences, errors were raised in the past.
The resulting EBV could be easily combined with revised predicate semantics (#816).
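The example pairs listed in this issue can be checked against a given processor with a small script. This is a minimal sketch, assuming XQuery 3.1; the variable name `$check` is illustrative. Each call is wrapped in a function item so that a dynamic error in one example does not abort the others. Some processors may report such errors at compile time instead, in which case the try/catch will not intercept them.

```xquery
xquery version "3.1";

(: 'err' is declared explicitly, since not every processor predeclares it :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

for $check in (
  function() { boolean((<_>x</_>, <_>y</_>)) },
  function() { boolean(('x', 'y')) },
  function() { boolean(xs:NCName('x')) },
  function() { boolean(xs:QName('x')) },
  function() { boolean((<a/>, 1)) },
  function() { boolean((1, <a/>)) }
)
return
  try {
    string($check())
  } catch * {
    (: report the local name of the error code, e.g. FORG0006 :)
    local-name-from-QName($err:code)
  }
```

According to the 3.1 rules, as clarified in the comments, the first call of each pair should return true and the second should fail with FORG0006; any other output points to an implementation difference.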