Functions that determine equality of two sequences or equality of two arrays #99

Closed
dnovatchev opened this issue Nov 28, 2021 · 12 comments · Fixed by #1120
Labels
Feature A change that introduces a new feature PR Pending A PR has been raised to resolve this issue XQFO An issue related to Functions and Operators


@dnovatchev
Contributor

dnovatchev commented Nov 28, 2021

The only standard XPath 3.1 function that compares two arrays or two sequences for equality is the deep-equal() function.
It implements "value-based equality", which is not always the kind of equality one needs to check. For example, the standard XPath 3.1 "is" operator implements a check for "identity-based equality" on nodes.

Thus for two nodes $n1 and $n2 it is possible that:

deep-equal($n1, $n2) ne ($n1 is $n2)
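To make the difference concrete, here is a minimal sketch. It is written in XQuery 3.1 rather than pure XPath because pure XPath cannot construct nodes directly; the variable names are illustrative only:

```xquery
(: Two separately constructed elements with identical name and content :)
let $n1 := <v>1</v>,
    $n2 := <v>1</v>
return (
  deep-equal($n1, $n2),  (: true(): value-based equality :)
  $n1 is $n2,            (: false(): two distinct nodes :)
  $n1 is $n1             (: true(): a node is identical to itself :)
)
```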

The functions defined below can be used to verify a more generic kind of equality between two sequences or between two arrays. These functions accept as a parameter a user-provided function $compare(), which is used to decide whether or not two corresponding items of the two sequences, or two constituents of the two arrays are "equal".

fn:sequence-equal($seq1 as item()*, $seq2 as item()*, 
                  $compare as function(item(), item()) as xs:boolean := deep-equal#2) as xs:boolean

fn:array-equal($ar1 as array(*), $ar2 as array(*), 
               $compare as function(item()*, item()*) as xs:boolean := deep-equal#2) as xs:boolean

Examples:

fn:sequence-equal((1, 2, 3), (1, 2, 3))  (: returns true() :)
fn:sequence-equal((1, 2, 3), (1, 2, 5))  (: returns false() :)
fn:sequence-equal((1), (1, 2))  (: returns false() :)
fn:sequence-equal((), ())  (: returns true() :)
let $compare := function($arg1 as xs:integer, $arg2 as xs:integer) {$arg1 mod 2 eq $arg2 mod 2}
   return fn:sequence-equal((1, 2, 3), (5, 6, 7), $compare)  (: returns true() :)

let $compare := function($arg1 as xs:integer, $arg2 as xs:integer) {$arg1 mod 2 eq $arg2 mod 2}
   return fn:sequence-equal((1, 2, 3), (5, 6, 8), $compare)  (: returns false() :)
fn:array-equal([1, 2, 3], [1, 2, 3]) (: returns true() :)
fn:array-equal([1, 2, 3], [1, 2, 5])  (: returns false() :)
fn:array-equal([1], [1, 2])  (: returns false() :) 
fn:array-equal([], [])  (: returns true() :)
fn:array-equal([], [()])  (: returns false() :)

Possible implementations:

  1. Here is a pure XPath implementation of fn:sequence-equal:
let $compare := function($it1 as item(), $it2 as item()) as xs:boolean 
                {deep-equal($it1, $it2)},
    $sequence-equal := function($seq1 as item()*, $seq2 as item()*, 
                                $compare as function(item(), item()) as xs:boolean, 
                                $self as function(*)) as xs:boolean
{
   let $size1 := count($seq1), $size2 := count($seq2)
    return
      if($size1 ne $size2) then false()
      else
         $size1 eq 0
        or
         $compare(head($seq1), head($seq2)) and $self(tail($seq1), tail($seq2), $compare, $self)
}
 return
   $sequence-equal((1, 2, 3), (1, 2, 3), $compare, $sequence-equal)
  2. Below is a pure XPath implementation of fn:array-equal:
let  $compare := function($val1 as item()*, $val2 as item()*) as xs:boolean 
                {deep-equal($val1, $val2)},
     $array-equal := function($ar1 as array(*), $ar2 as array(*), 
                              $compare as function(item()*, item()*) as xs:boolean, 
                              $self as function(*)) as xs:boolean
{
   let $size1 := array:size($ar1), $size2 := array:size($ar2)
    return
      if($size1 ne $size2) then false()
      else
         $size1 eq 0
        or
         $compare(array:head($ar1), array:head($ar2)) and $self(array:tail($ar1), array:tail($ar2), $compare, $self)
}
 return
   $array-equal([], [()], $compare, $array-equal)
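In XQuery, named functions declared in the prolog can call themselves, so the $self parameter used in the pure XPath versions above is not needed. A minimal sketch, assuming an XQuery 3.1 processor; local:sequence-equal and local:array-equal are hypothetical stand-ins for the proposed functions, and $compare is passed explicitly because XQuery 3.1 has no default parameter values:

```xquery
(: Hypothetical stand-in for the proposed fn:sequence-equal :)
declare function local:sequence-equal(
  $seq1 as item()*,
  $seq2 as item()*,
  $compare as function(item(), item()) as xs:boolean
) as xs:boolean {
  if (count($seq1) ne count($seq2)) then false()
  else if (empty($seq1)) then true()
  else $compare(head($seq1), head($seq2))
       and local:sequence-equal(tail($seq1), tail($seq2), $compare)
};

(: Hypothetical stand-in for the proposed fn:array-equal :)
declare function local:array-equal(
  $ar1 as array(*),
  $ar2 as array(*),
  $compare as function(item()*, item()*) as xs:boolean
) as xs:boolean {
  if (array:size($ar1) ne array:size($ar2)) then false()
  else if (array:size($ar1) eq 0) then true()
  else $compare(array:head($ar1), array:head($ar2))
       and local:array-equal(array:tail($ar1), array:tail($ar2), $compare)
};

(
  local:sequence-equal((1, 2, 3), (1, 2, 3), deep-equal#2),  (: true() :)
  local:sequence-equal((1, 2, 3), (1, 2, 5), deep-equal#2),  (: false() :)
  local:array-equal([], [()], deep-equal#2)                  (: false() :)
)
```

The base case uses if/then/else rather than "or" because XPath and XQuery leave the evaluation order of the operands of "and"/"or" implementation-dependent; the explicit conditional guarantees that $compare is never called with the head of an exhausted sequence.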
@ChristianGruen
Contributor

ChristianGruen commented Nov 29, 2021

As the function can be implemented with just a few statements…

let $seq1 := (1, 2, 3)
let $seq2 := (1, 2, 3)
return count($seq1) = count($seq2) and (
  (: or array:for-each-pair :)
  every $b in for-each-pair($seq1, $seq2, deep-equal#2) satisfies $b (: = true() :)
)

…I wonder if the use case is prevalent enough to justify an extra function?

The underlying question may be more general: Do we expect XQuery 4 to provide hundreds of helper functions, or do we expect it to mainly provide functions for things that cannot be implemented otherwise, or would be less efficient with standard XQuery?
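The comment above mentions array:for-each-pair as the array counterpart. A minimal sketch of that variant (valid XPath 3.1 as well as XQuery 3.1), with illustrative variable names; the size check is still needed because array:for-each-pair, like for-each-pair, ignores the excess members of the longer input:

```xquery
let $ar1 := [1, 2, 3],
    $ar2 := [1, 2, 3]
return array:size($ar1) = array:size($ar2) and (
  (: array:for-each-pair returns an array of xs:boolean; ?* flattens it to a sequence :)
  every $b in array:for-each-pair($ar1, $ar2, deep-equal#2)?* satisfies $b
)
```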

@martin-honnen

It would be nice to have some node/id-based examples as well, although I guess that would be easier in XQuery or XSLT, where you can construct nodes, than in pure XPath, where you cannot.

@dnovatchev
Contributor Author

@ChristianGruen Yes, and by the same line of thinking, FXSL for XSLT 1.0 lets us do everything and shows that everything can be done with XSLT 1.0 alone, so we do not need any higher version of XSLT.

I know that such statements are an overstretched, almost ridiculous extreme. While they are (mainly) true, they totally ignore what may be the most important factor in programming languages: convenience and understandability.

Just because a complex expression like the one above can be written by 5% of all developers doesn't mean that the remaining 95% will ever even try to write it: not because they can't, but because it requires too much unnecessary effort, both to write and, even more, to understand later.

@dnovatchev
Contributor Author

It would be nice to have some node/id based examples as well

@martin-honnen Yes, this can be done even in pure XPath.
Maybe you can help us with such an example. The first that comes to mind is something like identity-equal(), where atomic items are compared as usual and nodes are compared by identity (say, using the generate-id() function).
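The identity-equal() idea can be expressed as a comparison callback. A minimal sketch, assuming XQuery 3.1 so that the example can construct its own nodes; $identity-equal is a hypothetical name, and generate-id() is used as suggested, although "$it1 is $it2" gives the same result for nodes:

```xquery
(: Hypothetical $identity-equal: nodes by identity, everything else by deep-equal() :)
let $identity-equal := function($it1 as item(), $it2 as item()) as xs:boolean {
      if ($it1 instance of node() and $it2 instance of node())
      then generate-id($it1) eq generate-id($it2)  (: same result as $it1 is $it2 :)
      else deep-equal($it1, $it2)
    },
    $a := <v>1</v>,
    $b := <v>1</v>
return (
  $identity-equal($a, $a),  (: true() :)
  $identity-equal($a, $b),  (: false(): equal content, different nodes :)
  $identity-equal(1, 1)     (: true(): atomic items compared by value :)
)
```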

@dnovatchev
Contributor Author

@ChristianGruen Good question, I actually have proposed only the functions that I believe are the most important, and not the myriad of other functions that I have considered as possible to have.

Equality, starts-with, ends-with and containment for strings have been provided by standard functions since version 1.0 of the language, and a string is just a special kind of sequence (of characters). Developers who are accustomed to the convenience of manipulating sequences of characters in this way would greatly appreciate having the same convenience for any type of sequence, and would otherwise wonder why it was omitted from the language.

When deciding whether or not to include a new function, we should ask ourselves to what extent that function increases both the expressive power and the understandability of the language. I strongly believe the proposed functions on sequences are among the strongest candidates that satisfy these criteria.

@martin-honnen

I experimented with XQuery and the following example:

let $seq-equal := function($seq1 as item()*, $seq2 as item()*, 
                           $compare as function(item(), item()) as xs:boolean, 
                           $self as function(*)) as xs:boolean
{
   let $size1 := count($seq1), $size2 := count($seq2)
    return
      if($size1 ne $size2) then false()
      else
         $size1 eq 0
        or
         $compare(head($seq1), head($seq2)) and $self(tail($seq1), tail($seq2), $compare, $self)
}
 return
   let $seq := (1, 2, 3)!<value>{.}</value>
   return (
     $seq-equal($seq, $seq, function($n1, $n2) { $n1 is $n2 }, $seq-equal), 
     $seq-equal($seq, $seq => reverse(), function($n1, $n2) { $n1 is $n2 }, $seq-equal), 
     $seq-equal($seq, $seq => reverse() => reverse(), function($n1, $n2) { $n1 is $n2 }, $seq-equal) 
   )

@dnovatchev
Contributor Author

@martin-honnen Great, I would only prefer this:

let $seq := (1, 2, 1)!<value>{.}</value>

so that the inequality with the reversed sequence shows that value-equality isn't being used (the reversed sequence has the same values, but not the same nodes).

@dnovatchev
Contributor Author

dnovatchev commented Nov 29, 2021

@martin-honnen I came up with this pure XPath example, which evaluates the same calls with two different compare() functions: one using value-equality and one using identity-equality:

let $compare := function($it1 as item(), $it2 as item()) as xs:boolean 
                {deep-equal($it1, $it2)},
    $compare2 := function($it1 as item(), $it2 as item()) as xs:boolean 
                {if($it1 instance of node() and $it2 instance of node())
                  then $it1 is $it2
                  else $compare($it1, $it2)
                },
    $sequence-equal := function($seq1 as item()*, $seq2 as item()*, 
                                $compare as function(item(), item()) as xs:boolean, 
                                $self as function(*)) as xs:boolean
{
   let $size1 := count($seq1), $size2 := count($seq2)
    return
      if($size1 ne $size2) then false()
      else
         $size1 eq 0
        or
         $compare(head($seq1), head($seq2)) and $self(tail($seq1), tail($seq2), $compare, $self)
}
 return
   let $inSeq := ( (1, 2, 1) !parse-xml("<v>" || . || "</v>"))
    return
    (
       $sequence-equal($inSeq, $inSeq, $compare, $sequence-equal),              (: returns true()  :)
       $sequence-equal($inSeq, reverse($inSeq), $compare, $sequence-equal),     (: returns true()  :) 
       $sequence-equal($inSeq, $inSeq, $compare2, $sequence-equal),             (: returns true()  :)
       $sequence-equal($inSeq, reverse($inSeq), $compare2, $sequence-equal)     (: returns false() :)
    )

@dnovatchev dnovatchev added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Sep 14, 2022
@michaelhkay
Contributor

In writing up the proposal for starts-with-sequence, ends-with-sequence, and contains-sequence, I realised that there are many cases where it's useful to supply a $compare call-back function that does something other than equality matching. For example you can do contains-sequence to test if a sequence of nodes contains three adjacent paragraphs longer than 100 characters. The same will be true of fn:sequence-equal. I therefore propose that the new function be named fn:matches-sequence rather than fn:sequence-equal, to align with this new family of functions.

An example might be

fn:matches-sequence($chap/*, ('h1', 'p', 'p', 'p'), ->($x, $y){local-name($x) = $y})

which tests whether the children of $chap comprise four elements with the specified local names.

This use case works with the function as proposed, but the name fn:sequence-equal becomes misleading.
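Because the callback need not test equality, the recursion from the original proposal already handles this case. A minimal sketch, assuming XQuery 3.1: local:sequence-equal is a hypothetical declared version of the proposed function, the $chap element is invented sample data, and the inline function is written with the XQuery 3.1 "function($x, $y) {...}" syntax:

```xquery
(: Hypothetical declared version of the proposed function :)
declare function local:sequence-equal(
  $seq1 as item()*,
  $seq2 as item()*,
  $compare as function(item(), item()) as xs:boolean
) as xs:boolean {
  if (count($seq1) ne count($seq2)) then false()
  else if (empty($seq1)) then true()
  else $compare(head($seq1), head($seq2))
       and local:sequence-equal(tail($seq1), tail($seq2), $compare)
};

(: Invented sample chapter :)
let $chap := <chap><h1>Title</h1><p>One</p><p>Two</p><p>Three</p></chap>
(: The callback matches an element's local name against a string; it is not an equality test :)
let $by-name := function($x, $y) { local-name($x) = $y }
return (
  local:sequence-equal($chap/*, ('h1', 'p', 'p', 'p'), $by-name),  (: true() :)
  local:sequence-equal($chap/*, ('h1', 'p', 'p'), $by-name)        (: false(): lengths differ :)
)
```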

@dnovatchev
Contributor Author

dnovatchev commented Oct 31, 2022

The proposed function fn:matches-sequence definitely seems useful.

However, it seems much more complicated in meaning (and even in its name) than the simpler and more easily understood fn:sequence-equal.

fn:sequence-equal has obvious and abundant use cases and can readily be called on such occasions, while fn:matches-sequence serves other, more specialized occasions. It is not a straightforward substitute: associating it with the simpler concept of sequence equality requires considerable mental strain (at least for me :) ).

Compare with strings, where we have both the eq operator and the fn:matches function. Why didn't we just say that we should have only fn:matches because it could also be used to specify string equality?

We can also draw a parallel with another pair of proposals: one for concrete, specific functions (fn:highest and fn:lowest), and another for a more general function, fn:ranks, of which these two are special cases. Again, fn:highest, fn:lowest, and fn:ranks clearly each have their specific use cases and should coexist.

@michaelhkay
Contributor

I think you may have missed my point: I'm proposing fn:matches-sequence as a more appropriate name for the function fn:sequence-equal, since it has many use cases beyond equality testing. It's exactly the same function as you proposed, just with a different name to reflect the fact that it's more powerful than the name fn:sequence-equal suggests.

@dnovatchev
Contributor Author

@michaelhkay It will require a strenuous, and probably not always successful, mental effort to discover that one could use "matches" for "equals".

Why, in the case of strings, do we have both functions and not just fn:matches()?

Why do we use:

   "abcd" eq "abcd"  

and not

   matches("abcd", "abcd"  )

This may be just a linguistic problem, but I personally find it significantly disturbing.

We could try to have both fn:matches-sequence and fn:sequence-equal and define the latter as a shorthand for

matches-sequence($seq1, $seq2, ->($it1, $it2) { op:same-key($it1, $it2) })

But even in this case we have a problem: I personally believe that two sequences can be "matching" without having the same length at all.

But for two sequences to be "equal", they must be of the same length.

@ChristianGruen ChristianGruen changed the title [XPath]Functions that determine equality of two sequences or equality of two arrays Functions that determine equality of two sequences or equality of two arrays Apr 27, 2023
@michaelhkay michaelhkay added the PR Pending A PR has been raised to resolve this issue label Mar 19, 2024
@ndw ndw closed this as completed in #1120 Apr 9, 2024
4 participants