Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions for splitting a sequence (or array) based on predicate matching #149

Closed
michaelhkay opened this issue Sep 20, 2022 · 10 comments
Closed
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Copy link
Contributor

This is concerned with use cases like "How do I select all the paragraphs before the first H2?" or "How do I select items between and ?".

Currently in the draft spec we have proposals for:

range-from($input, $predicate): Returns a sequence containing items from an input sequence, starting with the first item that matches a supplied predicate.

range-to($input, $predicate): Returns a sequence containing items from an input sequence, ending with the first item that matches a supplied predicate.

These both include the matching item, on the theory that it's easier to drop it if it's not wanted, than to add it if its needed.

I've also proposed (as an alternative) a family of four functions items-before, items-to, items-from, items-after giving four combinations of taking the subsequence before/after the first match of the predicate, and including or not including the matched item.

It's worth pointing out that these can all be defined in terms of index-where. For example range-to (assuming at least one item matches the predicate) is subsequence($input, 1, index-where($input, $predicate).

These functions all treat the first match of the predicate as special: they partition the sequence before or after the first item that matches the predicate. An alternative, inspilred by XSLT's for-each-group group-ending|starting-with, would be to partition the sequence breaking immediately before or after every item that matches the predicate:

group-breaking-after($input, $predicate)
group-breaking-before($input, $predicate)

But these logically return a sequence of sequences, which would typically be presented either as an array of sequences or a sequence of arrays, neither of which is ideal. (An alternative would be to return a sequence of arity-0 functions)

Having reviewed the options, I think my preferance remains having a family of four functions which I have called items-before, items-to, items-from, items-after. But I'm certainly open to other options. The logical names would probably be subsequence-before etc, but that's a bit of a mouthful.

Whatever family of functions we decide upon, there's logically a requirement to offer the same for arrays.

Michael Kay

@dnovatchev
Copy link
Contributor

What about just adding to the two functions a boolean argument: $inclusive as xs:boolean ?

Or having "inclusive" or "exclusive" in the name of the function:

items-to-inclusive()
items-to-exclusive()
items-from-inclusive()
items-from-exclusive()

@ChristianGruen ChristianGruen added XPath An issue related to XPath Feature A change that introduces a new feature labels Sep 21, 2022
@michaelhkay
Copy link
Contributor Author

michaelhkay commented Sep 21, 2022 via email

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Sep 21, 2022

I’d be happy to see only 2 functions (items-from, items-to), as it feels like overkill to have 4 functions that are pretty similar. Or we manage to define a single function with additional options for it. I believe this has been discussed before, I just can't find the sources.

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators and removed XPath An issue related to XPath labels Sep 21, 2022
@ChristianGruen
Copy link
Contributor

Maybe items-toitems-until ?

@michaelhkay
Copy link
Contributor Author

I've pushed a proposal that uses the function names

  • items-before
  • items-after
  • items-starting-where
  • items-ending-where

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Oct 5, 2022

I’ll add the examples from #80 (comment), and some more, to document possible alternatives:

The common prolog for all queries:

declare variable $INPUT := 1 to 10000;

The function fn:items-starting-where

items-starting-where($INPUT, function($item) { $item <= 5000 })

…could also be realized with fn:while (provided that the condition is only met once):

while(
  function($seq) { head($seq) <= 5000 },
  function($seq) { tail($seq) },
  $INPUT
)

Or, taking advantage of various new proposals:

while(=> { head(.) <= 5000 }, => { tail(.) }, $INPUT)

The alternative writing of fn:items-ending-where looks similar:

items-starting-where(1 to 10000, -> { . <= 5000 }
while(=> { foot(.) >= 5000 }, => { init(.) }, $INPUT)

If we decide to add a predicate for aborting a loop to fn:fold-left and fn:fold-right, as proposed in #80 (comment), the following function calls would be equivalent (to be strict, the second example is only equivalent if the condition is only met once, as above):

items-ending-where($INPUT, -> { . = $5 })
fold-left($INPUT, (), ->($seq, $curr) { $seq, $curr }, => { foot(.) = 5 })

items-starting-where($INPUT, -> { . = $5 })
fold-right($INPUT, (), ->($curr, $seq) { $curr, $seq }, => { foot(.) = 5 })

With fn:index-where, we can do this:

let $pos := head(index-where($INPUT, -> { . = 5 }))
return subsequence($INPUT, 1, $pos)

If we add a positional argument to fn:for-each (see #181), we could do this:

let $pos := head(for-each($INPUT, ->($item, $pos) { $pos[$item = 5] }))
return subsequence($INPUT, 1, $pos)

@ChristianGruen
Copy link
Contributor

I love those people who contribute ideas if things have more or less been finalized. Still…

I did some research on how common challenges on lists and arrays are tackled in other programming languages, and noticed there's quite a bunch of languages today that come with takeWhile and dropWhile functions. As we already use well-known function names such as for-each, fold-left or filter, maybe we should rather adopt these two functions instead of reinventing the iterative wheel?

@ndw
Copy link
Contributor

ndw commented Oct 6, 2022

I'm all for reuse rather than reinvention, but unless I'm being more than usually thick, take-while and drop-while aren't semantically the same as any of our functions. "Take while" is "items until not" and "drop while" is "items after not" (or something like that). Can we get a nice set of four functions that include take-while and drop-while?

@ChristianGruen
Copy link
Contributor

It's true, the functions are not equivalent. I'm just wondering if our requirements are really that different from those of Scala, Python, C#/F#, Kotlin, Java, and other languages? The main difference I can see is that we have sequences, but apart from some specific and cool features such as implicit flattening, the data structure pretty much resembles lists and arrays.

@ChristianGruen
Copy link
Contributor

Accepted and resolved (#199 (comment))

michaelhkay added a commit to qt4cg/qt4tests that referenced this issue Nov 17, 2022
fn:items-(until|from) → fn:items-(ending|starting)-where. qt4cg/qtspecs#149
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

4 participants