Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions that determine if a given sequence is a subsequence of another sequence #94

Closed
dnovatchev opened this issue Nov 20, 2021 · 19 comments
Assignees
Labels
Feature A change that introduces a new feature Propose Closing with No Action The WG should consider closing this issue with no action XQFO An issue related to Functions and Operators

Comments

@dnovatchev
Copy link
Contributor

dnovatchev commented Nov 20, 2021

It is surprising that we are at version 4 and still are missing:

(1) fn:has-subsequence($container as item()*, $maybe-subsequence as item()*, 
                       $compare as function(item(), item()) as xs:boolean := deep-equal)
                       ) as xs:boolean

and

(2) fn:has-subsequence($container as item()*, $maybe-subsequence as item()*, $contiguous-subsequence := true(),
                       $compare as function(item(), item()) as xs:boolean := deep-equal)
                       ) as xs:boolean

and

(3) fn:has-non-contigous-subsequence($container as item()*, $maybe-subsequence as item()*,
                                     $compare as function(item(), item()) as xs:boolean := deep-equal)
                                     ) as xs:boolean

(3) above is a shorthand for:

fn:has-subsequence(?, ?, false()) 

Examples:

fn:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'c')) returns true()

fn:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns false()

fn:has-non-contigous-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns true()

fn:has-non-contigous-subsequence(('a', 'b', 'c', 'd'), ('d', 'b')) returns false()

('a', 'b', 'c', 'd') => has-subsequence(('b', 'c')) returns true()

('a', 'b', 'c', 'd') => has-subsequence(('b', 'd')) returns false()

('a', 'b', 'c', 'd') => has-non-contigous-subsequence(('b', 'd')) returns true()

('a', 'b', 'c', 'd') => has-non-contigous-subsequence(('d, 'b')) returns false()

@dnovatchev dnovatchev changed the title [XPath] Functions to determine if a given sequence is a subsequence of another sequence [XPath] Functions that determine if a given sequence is a subsequence of another sequence Nov 20, 2021
@ChristianGruen
Copy link
Contributor

Maybe the second argument of fn:index-of could be generalized to accept sequences?

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 20, 2021

@ChristianGruen This would become too complex and in some cases the code may be difficult to understand.

Why we have the contains() function for strings (sequence of characters) but not the same function for sequence of any other type?

@martin-honnen
Copy link

I wonder whether fn:has-subsequence($container as item()*, $maybe-subsequence as item()*) as xs:boolean would be implemented by (used XQuery syntax):

declare function fn:has-subsequence($container as item()*, $maybe-subsequence as item()*) as xs:boolean
{
  some $item in $container
  satisfies 
    some $index in index-of($container, $item)
    satisfies
      deep-equal(
        subsequence($container, $index, count($maybe-subsequence)),
        $maybe-subsequence
      )
};

Examples seem to give the right/expected result at https://xqueryfiddle.liberty-development.net/bdxZ93 (had to change to local namespace prefix as XQuery doesn't allow overriding fn prefix):

local:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'c')),

local:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')),

local:has-subsequence(('a', 'b', 'a', 'c', 'd'), ('a', 'c', 'd'))

@martin-honnen
Copy link

So I wondered how the non-contiguous version could be found, what are your thoughts on

declare function local:has-subsequence($container as item()*, $maybe-subsequence as item()*) as xs:boolean
{
  some $item in $container
  satisfies 
    some $index in index-of($container, $item)
    satisfies
      deep-equal(
        subsequence($container, $index, count($maybe-subsequence)),
        $maybe-subsequence
      )
};

declare function local:has-subsequence($container as item()*, $maybe-subsequence as item()*, $contiguous-subsequence as xs:boolean) as xs:boolean
{
  if ($contiguous-subsequence)
  then 
    local:has-subsequence($container, $maybe-subsequence)
  else 
    local:has-non-contigous-subsequence($container, $maybe-subsequence)
};

declare function local:has-non-contigous-subsequence($container as item()*, $possible-subsequence as item()*) as xs:boolean
{
  fold-left($possible-subsequence, (false(), 0), function($b, $item) { let $ind := index-of($container, $item) return ($b[2] < $ind, $ind) }) => head() => boolean()
};

?
I think it does the job for sequences of atomic values but the index-of doesn't suffice or work for sequences of nodes.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 21, 2021

I would use:

let $has-non-cont-subsequence := function($seq1 as item()*, $seq2 as item()*, $self as function(*))
{
   empty($seq2)
  or
   (for $fst in head($seq2),
        $ind1 in index-of($seq1, $fst)
      return
        $ind1 gt 0 and $self(subsequence($seq1, $ind1+1), tail($seq2), $self)
    )   
}
  return
    $has-non-cont-subsequence(('a', 'b', 'c', 'd'), ('b', 'd'), $has-non-cont-subsequence)

As for comparing items that are nodes, we probably need another parameter to indicate whether we want identity semantics or value semantics to be used in comparing the corresponding items. Also need to agree on a good default value for this parameter.

@michaelhkay
Copy link
Contributor

We also don't have a function for testing whether two sequences are "the same". And for the same reason: we don't have a function or operator for testing if two items are "the same". For some items -- notably functions -- it's intrinsically hard to come up with a definition. For other items it's hard to come up with a definition that will suit all users all of the time.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 22, 2021

Let us define these useful functions for all sequences that don't contain functions!

It would be a pity not to have this useful and user-needed functionality just because of one type (the function type) out of zillions of other types of items.

@michaelhkay Can we define a type the is item() exceptType function(*) ?

Obviously, we can define a function that takes an item() and returns true if it isn't a function: $x => not($x instance of function(*))

@ChristianGruen
Copy link
Contributor

If we decide to add new functions of this type, I think it would be consistent to apply the same rules as for index-of (and include an additional and optional collation argument).

@dnovatchev
Copy link
Contributor Author

I have updated the function definitions in the proposal to include an optional $compare argument that defaults to deep-equal.

Obviously we are good, if our comparer returns false() for any call having an argument that is a function.

@ChristianGruen
Copy link
Contributor

So I assume that the behavior of the function cannot be simulated with index-of anymore (as shown further above)?

What about collations (which are also applicable todeep-equal)?

@dnovatchev
Copy link
Contributor Author

@ChristianGruen I believe we can add collation arguments -- I didn't do this for now so that we could agree on the main semantics first.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 22, 2021

@martin-honnen My preferred pure-XPath implementation is below. This also defines the implementation (and uses it) of another proposal: fn:starts-with-sequence

let $starts-with-sequence := function($seq1 as item()*, $seq2 as item()*, $self as function(*))
{
   empty($seq2)
  or
   head($seq1) eq head($seq2) and $self(subsequence($seq1, 2), subsequence($seq2, 2), $self)
},
    $has-subsequence := function($seq1 as item()*, $seq2 as item()*)
    {
       empty($seq2)
      or
       (some $ind in index-of($seq1, head($seq2))
             satisfies $starts-with-sequence(subsequence($seq1, $ind+1), tail($seq2), $starts-with-sequence)
        )
    }
  return
    $has-subsequence(('a', 'b', 'c', 'd'), ('b', 'd'))

@michaelhkay
Copy link
Contributor

I think adding the $compare argument is the way to go. I'm not sure what the default should be.

@ChristianGruen
Copy link
Contributor

Maybe fn:contains-sequence would be a better choice if we decide to introduce fn:(starts|ends)-with-sequence.

@dnovatchev dnovatchev added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Sep 14, 2022
michaelhkay added a commit to michaelhkay/qtspecs that referenced this issue Oct 25, 2022
@michaelhkay
Copy link
Contributor

I included a definition of contains-sequence() in PR 222. The name seems better aligned with the names of string functions (though with hindsight, contains() should have been named has-substring()).

@michaelhkay
Copy link
Contributor

Could the non-contiguous version be done as:

contains-interleaved($a, $b, $compare) {
   empty($b) or (
      exists($a) and
         if ($compare(head($a), head($b)) 
         then contains-interleaved(tail($a), tail($b), $compare)
         else contains-interleaved(tail($a), $b, $compare)
    )}

To be honest, I'm struggling to think what one might want to use it for.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Oct 26, 2022

Could the non-contiguous version be done as:

contains-interleaved($a, $b, $compare) {
   empty($b) or (
      exists($a) and
         if ($compare(head($a), head($b)) 
         then contains-interleaved(tail($a), tail($b), $compare)
         else contains-interleaved(tail($a), $b, $compare)
    )}

To be honest, I'm struggling to think what one might want to use it for.

For this non-contiguous containment-check I would prefer fn:has-subsequence() as the meaning is like in the is-subset-of relation, not between sets but between (ordered) sequences.

Maybe even a name like has-skeleton

One use-case would be to prove easily that seq2 is the result of several remove operations on seq1 (or that seq1 is the result of a series of insert-operations on seq2)

Or to put it in other words, each sequence can be obtained from the other using a series of "topology-preserving" operations.

If we had the set datatype, then there is no order to be maintained.

On the other side the existing, standard fn:subsequence() gives us only an uninterrupted chunk of the original sequence. At present there is nothing that resembles subset (regardless of proximity, but still preserving order).

ndw pushed a commit to ndw/qtspecs-xslt4 that referenced this issue Nov 16, 2022
starts-with-sequence, ends-with-sequence, contains-sequence as per issues qt4cg#96 and part of qt4cg#94
@ChristianGruen ChristianGruen changed the title [XPath] Functions that determine if a given sequence is a subsequence of another sequence Functions that determine if a given sequence is a subsequence of another sequence Apr 27, 2023
@ChristianGruen
Copy link
Contributor

@dnovatchev Now that we have fn:contains-subsequence, do we want to keep this issue open? If yes, we could update its title.

@ChristianGruen ChristianGruen added the Propose Closing with No Action The WG should consider closing this issue with no action label Feb 13, 2024
@ndw
Copy link
Contributor

ndw commented Feb 21, 2024

The CG agreed to close this issue without action at meeting 066

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature Propose Closing with No Action The WG should consider closing this issue with no action XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

5 participants