fn:sort, and XSLT and XQuery sorting, should use transitive comparisons #866

michaelhkay · 2023-12-03T13:19:13Z

We have addressed the question of non-transitivity of equality matching in distinct-values(), and in XSLT and XQuery grouping, but the same issue exists for sorting. Currently fn:sort, as well as XSLT and XQuery sorting, rely on the "lt" operator for comparing values including mixed numerics such as doubles and decimals. Because this promotes to double, it is capable of losing precision, and is therefore non-transitive. Most sort algorithms rely on the supplied comparison function being transitive, and if it isn't, then undefined failures may occur including non-termination.

One particular quirk (which led me here) is that fn:highest and fn:lowest start by using fn:sort semantics to put the values in order, and then rely on fn:deep-equal semantics to find the values that are "equal highest" or "equal lowest". But fn:sort and fn:deep-equal have different ways of deciding whether two values are equal: decimal 1.2 and double 1.2 are equal for fn:sort, but not for fn:deep-equal.

The text was updated successfully, but these errors were encountered:

michaelhkay · 2023-12-03T13:30:58Z

It's so tempting to say we should be using a transitive order relation for everything including the "eq" and "lt" operators.

But this causes problems: currently the decimal 1.2 and the double 1.2e0 compare equal, and they would then compare non-equal. Because the simple notation "1.2" represents a decimal, while atomization and arithmetic operators default to producing a double, unintentional comparisons between decimals and doubles are very common.

michaelhkay · 2023-12-05T18:24:21Z

Specific proposal: I propose that in XSLT xsl:sort (and xsl:perform-sort, xsl:merge, etc), in XQuery order by, and in fn:sort, fn:min, fn:max, fn:highest, and fn:lowest, numeric values should be compared using the new function fn:numeric-compare($v1, $v2) which returns -1, 0, or +1 depending on the exact mathematical relationship between the numeric values (or, if you like, conversion to xs:decimal with no limit on precision). In this ordering NaN equals NaN and precedes -INF, followed by all ordinary values, followed by +INF. Note that for numeric operands fn:numeric-compare($v1, $v2) returns 0 if and only if fn:atomic-equal($v1, $v2) returns true.

The behaviour of the lt operator (and its backing function, op:numeric-less-than) is not changed.

michaelhkay · 2023-12-06T11:01:58Z

Note in passing: the 3.1 specifications (and the current 4.0 drafts) for fn:min and fn:max have a "properties" section that refers to the arity-0 and arity-1 forms of these functions, which is nonsense; the available arities are 1 and 2.

ChristianGruen · 2023-12-06T13:06:28Z

I definitely see the need to make the functions transitive.

We’d then have fn:deep-equal, fn:compare, fn:atomic-equal, fn:numeric-compare, … Which makes sense from a technical point of view, but it could confuse all those people who don’t belong to the exclusive group of implementors.

I may be overlooking something, but couldn’t we simply drop fn:atomic-equal and fn:numeric-compare and generalize fn:compare to accept arbitrary atomic types? fn:deep-equal could then be simplified as well. The following rules could be dropped…

d. One of the following conditions is true:
i. If both $i1 and $i2 are instances of xs:string or xs:untypedAtomic, equal-strings($i1, $i2, $collation, $options) returns true.
ii. If both $i1 and $i2 are instances of xs:date, xs:time or xs:dateTime, $i1 eq $i2 returns true.
iii. Otherwise, fn:atomic-equal($i1, $i2) returns true.

…and replaced with:

fn:compare(normalize($i1), normalize($i2)) returns true.

…with equal-strings being replaced with:

declare function normalize(
  $item     as xs:anyAtomicType,
  $options  as map(*)
) as xs:string {
  if($item instance of xs:string) then (
    $string
    ! (if($options?whitespace = "normalize")) then normalize-unicode(., $options?normalization-form)  else .)
    ! (if($options?normalize-space) then normalize-space(.) else .)
  ) else $item
}

The signature of fn:compare could be:

fn:compare(
  $value1     as xs:anyAtomicType?,
  $value2     as xs:anyAtomicType?,
  $collation  as xs:string?	    := fn:default-collation()
) as xs:integer?

For types other than strings, we could adopt the rules from fn:atomic-equal and the ones proposed in #881.

Another positive effect would be that fn:compare (which has no reference to strings in its name) could then be used for arbitrary atomic items. It would also be helpful for the upcoming fn:sort-with function: fn:sort-with($sequence, compare#2)) could be used for strings, numbers, dates, etc.

michaelhkay · 2023-12-06T18:45:05Z

couldn’t we simply drop fn:atomic-equal and fn:numeric-compare and generalize fn:compare to accept arbitrary atomic types

Firstly, atomic-equal was specifically designed to be context-free, which makes it messy to combine it into a function that uses collations.

We could generalise atomic-equal to do an ordered comparison over all data types, in which case it could embrace numeric-compare, but that's providing functionality that I'm not sure we need: we're happy, I think, for sorting to be context-dependent.

Folding fn:numeric-compare into fn:compare is more feasible, but you've then got one function that does two different jobs; there's no type safety to ensure that the arguments have compatible types, and you need ad-hoc rules to say which combinations of arguments are valid and which aren't. The merit of two separate functions is that each is a total function over the domain implied by its signature.

Another option is to make numeric-compare private (i.e. put it in "op" namespace). I put it in the fn namespace because I think there are valid use-cases for applications to use it directly; that will be especially true if we introduce fn:sort-with.

I did wonder while writing this whether we should combine the op:xxx-less-than and op:xxx-equal functions for other data types into op:xxx-compare functions, but I think that's an unrelated task, and is purely editorial.

As for deep-equal, I wish we could separate it into two functions, one which compares items and one which compares sequences taking an item-comparison function as an argument; but again that's a separate project, and one that is hampered by compatibility concerns.

ChristianGruen · 2023-12-06T21:15:44Z

Folding fn:numeric-compare into fn:compare is more feasible, but you've then got one function that does two different jobs; there's no type safety to ensure that the arguments have compatible types, and you need ad-hoc rules to say which combinations of arguments are valid and which aren't.

I can’t agree here; those seem technical concerns to me that can be solved (similar to arithmetic expressions, for example, ad-hoc rules are only necessary for those cases in which static types are not available). From a user perspective, it’s not intuitive at all to have multiple compare functions, i.e., to have fn:compare for strings and fn:numeric-compare for numbers. We neither have fn:date-compare nor do we have fn:numeric-max, fn:duration-sum, fn:numeric-sort, etc. and I’m convinced no one misses these functions. The same applies to comparison operators: < or lt are defined for different types; there is no specific operator for strings or numbers.

Apart from the already mentioned advantages for a generalized fn:compare function, it could get easier to define other comparisons in the future.

Firstly, atomic-equal was specifically designed to be context-free, which makes it messy to combine it into a function that uses collations.

I see. Perhaps fn:atomic-equal could be made private: Maps seem to be the only language feature that requires a context-free comparison; and I don’t think we should confront people with yet another comparison function when they already have =, eq, fn:deep-equal and fn:compare.

This reverts commit 6a97d00.

ChristianGruen added XQFO An issue related to Functions and Operators Enhancement A change or improvement to an existing feature labels Dec 6, 2023

michaelhkay mentioned this issue Dec 6, 2023

866 Introduce and exploit new numeric-compare() function #881

Merged

michaelhkay added PR Pending A PR has been raised to resolve this issue Tests Needed Tests need to be written or merged labels Dec 6, 2023

This was referenced Dec 11, 2023

fn:compare: Support for arbitrary atomic types #893

Closed

Relax type incompatibility in order by clause (impl. dep. instead of XPST0004) #42

Closed

ndw closed this as completed in #881 Dec 19, 2023

ChristianGruen added a commit to qt4cg/qt4tests that referenced this issue Dec 20, 2023

Transitive comparisons. qt4cg/qtspecs#866

6a97d00

ChristianGruen added a commit to qt4cg/qt4tests that referenced this issue Dec 20, 2023

Revert "Transitive comparisons. qt4cg/qtspecs#866"

4007db5

This reverts commit 6a97d00.

ChristianGruen added a commit to qt4cg/qt4tests that referenced this issue Dec 20, 2023

Transitive comparisons (2nd attempt). qt4cg/qtspecs#866

7ef7741

ChristianGruen mentioned this issue Jan 10, 2024

Remove fn:numeric-compare #939

Closed

ChristianGruen removed the PR Pending A PR has been raised to resolve this issue label Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fn:sort, and XSLT and XQuery sorting, should use transitive comparisons #866

fn:sort, and XSLT and XQuery sorting, should use transitive comparisons #866

michaelhkay commented Dec 3, 2023 •

edited

Loading

michaelhkay commented Dec 3, 2023 •

edited

Loading

michaelhkay commented Dec 5, 2023 •

edited

Loading

michaelhkay commented Dec 6, 2023

ChristianGruen commented Dec 6, 2023

michaelhkay commented Dec 6, 2023 •

edited

Loading

ChristianGruen commented Dec 6, 2023

fn:sort, and XSLT and XQuery sorting, should use transitive comparisons #866

fn:sort, and XSLT and XQuery sorting, should use transitive comparisons #866

Comments

michaelhkay commented Dec 3, 2023 • edited Loading

michaelhkay commented Dec 3, 2023 • edited Loading

michaelhkay commented Dec 5, 2023 • edited Loading

michaelhkay commented Dec 6, 2023

ChristianGruen commented Dec 6, 2023

michaelhkay commented Dec 6, 2023 • edited Loading

ChristianGruen commented Dec 6, 2023

michaelhkay commented Dec 3, 2023 •

edited

Loading

michaelhkay commented Dec 3, 2023 •

edited

Loading

michaelhkay commented Dec 5, 2023 •

edited

Loading

michaelhkay commented Dec 6, 2023 •

edited

Loading