Functions for splitting a sequence (or array) based on predicate matching #149

michaelhkay · 2022-09-20T20:12:28Z

This is concerned with use cases like "How do I select all the paragraphs before the first H2?" or "How do I select items between and ?".

Currently in the draft spec we have proposals for:

range-from($input, $predicate): Returns a sequence containing items from an input sequence, starting with the first item that matches a supplied predicate.

range-to($input, $predicate): Returns a sequence containing items from an input sequence, ending with the first item that matches a supplied predicate.

These both include the matching item, on the theory that it's easier to drop it if it's not wanted than to add it if it's needed.

I've also proposed (as an alternative) a family of four functions items-before, items-to, items-from, items-after giving four combinations of taking the subsequence before/after the first match of the predicate, and including or not including the matched item.
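As a rough illustration of the four combinations, here is a minimal Python model. The names mirror the proposal, but the helper `_first_match` and the behaviour when nothing matches (the "before"/"to" variants return everything, the "from"/"after" variants return nothing) are assumptions made for this sketch, not something the proposal states.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def _first_match(items: List[T], pred: Callable[[T], bool]) -> Optional[int]:
    """Hypothetical helper: 0-based index of the first match, or None."""
    for i, item in enumerate(items):
        if pred(item):
            return i
    return None


def items_before(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """Items strictly before the first match (match excluded)."""
    items = list(seq)
    i = _first_match(items, pred)
    return items if i is None else items[:i]


def items_to(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """Items up to and including the first match."""
    items = list(seq)
    i = _first_match(items, pred)
    return items if i is None else items[: i + 1]


def items_from(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """Items starting with the first match (match included)."""
    items = list(seq)
    i = _first_match(items, pred)
    return [] if i is None else items[i:]


def items_after(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """Items strictly after the first match (match excluded)."""
    items = list(seq)
    i = _first_match(items, pred)
    return [] if i is None else items[i + 1:]
```

With `["p1", "p2", "h2", "p3", "h2", "p4"]` and a predicate that tests for `"h2"`, `items_before` gives the paragraphs before the first H2, and only the first match acts as the boundary.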

It's worth pointing out that these can all be defined in terms of index-where. For example range-to (assuming at least one item matches the predicate) is subsequence($input, 1, index-where($input, $predicate)).
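The reduction to index-where can be sketched in Python as follows. This is a simplified model: `index_where` returns the 1-based positions of all matches, `subsequence` handles only integer arguments with a start of at least 1, and `range_to` uses the first position only, which is an assumption about the intended reading.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def index_where(seq: Iterable[T], pred: Callable[[T], bool]) -> List[int]:
    """1-based positions of every item that satisfies pred."""
    return [pos for pos, item in enumerate(seq, start=1) if pred(item)]


def subsequence(seq: Iterable[T], start: int, length: int) -> List[T]:
    """Simplified 1-based subsequence for integer start >= 1."""
    items = list(seq)
    return items[start - 1 : start - 1 + length]


def range_to(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """Items up to and including the first match; assumes a match exists."""
    items = list(seq)
    positions = index_where(items, pred)
    return subsequence(items, 1, positions[0])
```

Because the sequence starts at position 1, the position of the first match doubles as the length to take.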

These functions all treat the first match of the predicate as special: they partition the sequence before or after the first item that matches the predicate. An alternative, inspired by XSLT's for-each-group group-ending|starting-with, would be to partition the sequence breaking immediately before or after every item that matches the predicate:

group-breaking-after($input, $predicate)
group-breaking-before($input, $predicate)

But these logically return a sequence of sequences, which would typically be presented either as an array of sequences or a sequence of arrays, neither of which is ideal. (An alternative would be to return a sequence of arity-0 functions.)
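To make the grouping variant concrete, here is a small Python model in which the "sequence of sequences" is simply a list of lists. Never emitting empty groups is an assumption of this sketch.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def group_breaking_after(seq: Iterable[T], pred: Callable[[T], bool]) -> List[List[T]]:
    """Close the current group immediately after every matching item."""
    groups: List[List[T]] = []
    current: List[T] = []
    for item in seq:
        current.append(item)
        if pred(item):
            groups.append(current)
            current = []
    if current:  # trailing items after the last match
        groups.append(current)
    return groups


def group_breaking_before(seq: Iterable[T], pred: Callable[[T], bool]) -> List[List[T]]:
    """Start a new group immediately before every matching item."""
    groups: List[List[T]] = []
    current: List[T] = []
    for item in seq:
        if pred(item) and current:
            groups.append(current)
            current = []
        current.append(item)
    if current:
        groups.append(current)
    return groups
```

Python's nested lists sidestep the representation problem; in XPath the same result would have to be wrapped in arrays or functions because sequences flatten.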

Having reviewed the options, I think my preference remains having a family of four functions which I have called items-before, items-to, items-from, items-after. But I'm certainly open to other options. The logical names would probably be subsequence-before etc, but that's a bit of a mouthful.

Whatever family of functions we decide upon, there's logically a requirement to offer the same for arrays.

Michael Kay


dnovatchev · 2022-09-21T00:50:50Z

What about just adding to the two functions a boolean argument: $inclusive as xs:boolean ?

Or having "inclusive" or "exclusive" in the name of the function:

items-to-inclusive()
items-to-exclusive()
items-from-inclusive()
items-from-exclusive()

michaelhkay · 2022-09-21T07:35:26Z

The problem with boolean arguments is that someone reading the code `items-to($input, ->{boolean(self::h2)}, true())` has to either remember or read up on what the semantics of the third argument are. There's almost a case for using a string argument that must be set to "inclusive" or "exclusive".

These suggestions are all possible, but I think my preference is for having four functions rather than two functions with options. There's always a fine balance between making the semantics of a function evident from its name, and keeping it short. We do know from experience that naming is really important; there's an awful lot of incorrect XPath code written because people jump to wrong conclusions about what contains() does.

Michael Kay
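The readability point about a string option can be illustrated with a short Python sketch. The function name `items_up_to` and the keyword `boundary` are hypothetical, chosen only for this example; returning the whole input when nothing matches is likewise an assumption.

```python
from typing import Callable, Iterable, List, Literal, TypeVar

T = TypeVar("T")


def items_up_to(
    seq: Iterable[T],
    pred: Callable[[T], bool],
    *,
    boundary: Literal["inclusive", "exclusive"],
) -> List[T]:
    """Items up to the first match; `boundary` says whether to keep the match."""
    if boundary not in ("inclusive", "exclusive"):
        raise ValueError("boundary must be 'inclusive' or 'exclusive'")
    result: List[T] = []
    for item in seq:
        if pred(item):
            if boundary == "inclusive":
                result.append(item)
            break
        result.append(item)
    return result
```

A call such as `items_up_to(paras, is_h2, boundary="inclusive")` documents itself at the call site, whereas a bare `True` in third position would not.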


ChristianGruen · 2022-09-21T07:48:56Z

I’d be happy to see only 2 functions (items-from, items-to), as it feels like overkill to have 4 functions that are pretty similar. Or we could define a single function with additional options. I believe this has been discussed before, I just can't find the sources.

ChristianGruen · 2022-10-04T15:55:03Z

Maybe items-to → items-until ?

michaelhkay · 2022-10-04T17:31:44Z

I've pushed a proposal that uses the function names

items-before
items-after
items-starting-where
items-ending-where

ChristianGruen · 2022-10-05T11:10:47Z

I’ll add the examples from #80 (comment), and some more, to document possible alternatives:

The common prolog for all queries:

declare variable $INPUT := 1 to 10000;

The function fn:items-starting-where…

items-starting-where($INPUT, function($item) { $item <= 5000 })

…could also be realized with fn:while (provided that the condition is only met once):

while(
  function($seq) { head($seq) <= 5000 },
  function($seq) { tail($seq) },
  $INPUT
)

Or, taking advantage of various new proposals:

while(=> { head(.) <= 5000 }, => { tail(.) }, $INPUT)

The alternative writing of fn:items-ending-where looks similar:

items-ending-where(1 to 10000, -> { . <= 5000 })
while(=> { foot(.) >= 5000 }, => { init(.) }, $INPUT)
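For readers unfamiliar with the proposed fn:while, its behaviour in these calls can be modelled roughly as below. The argument order (condition, action, initial value) follows the examples in this comment; the Python name `while_` and the explicit empty-list guards are assumptions of the sketch (in XPath, comparing an empty sequence simply yields false).

```python
from typing import Callable, List, TypeVar

V = TypeVar("V")


def while_(condition: Callable[[V], bool], action: Callable[[V], V], value: V) -> V:
    """Apply `action` repeatedly for as long as `condition` holds."""
    while condition(value):
        value = action(value)
    return value


def drop_leading(seq: List[int], limit: int) -> List[int]:
    """Model of while(head(.) <= limit, tail(.)): strip items from the front."""
    return while_(lambda s: bool(s) and s[0] <= limit, lambda s: s[1:], seq)


def drop_trailing(seq: List[int], limit: int) -> List[int]:
    """Model of while(foot(.) >= limit, init(.)): strip items from the end."""
    return while_(lambda s: bool(s) and s[-1] >= limit, lambda s: s[:-1], seq)
```

Under this model the loop returns what is left once the leading (or trailing) run of items satisfying the condition has been removed, so the item that stops the loop is the first (or last) one that fails the condition.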

If we decide to add a predicate for aborting a loop to fn:fold-left and fn:fold-right, as proposed in #80 (comment), the following function calls would be equivalent (to be strict, the second example is only equivalent if the condition is only met once, as above):

items-ending-where($INPUT, -> { . = 5 })
fold-left($INPUT, (), ->($seq, $curr) { $seq, $curr }, => { foot(.) = 5 })

items-starting-where($INPUT, -> { . = 5 })
fold-right($INPUT, (), ->($curr, $seq) { $curr, $seq }, => { head(.) = 5 })
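A left fold with an abort predicate might behave like the following Python sketch. The name `fold_left_until` is hypothetical, and testing the predicate against the accumulated result after each step is an assumption read off the example above, not a settled design.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
T = TypeVar("T")


def fold_left_until(
    seq: Iterable[T],
    zero: A,
    step: Callable[[A, T], A],
    stop: Callable[[A], bool],
) -> A:
    """Left fold that stops as soon as `stop` accepts the accumulator."""
    acc = zero
    for item in seq:
        acc = step(acc, item)
        if stop(acc):  # checked after the current item has been folded in
            break
    return acc
```

Folding `1 to 10` with a step that appends the current item and a stop test of "last item equals 5" yields the items up to and including 5, which is the items-ending-where result.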

With fn:index-where, we can do this:

let $pos := head(index-where($INPUT, -> { . = 5 }))
return subsequence($INPUT, 1, $pos)

If we add a positional argument to fn:for-each (see #181), we could do this:

let $pos := head(for-each($INPUT, ->($item, $pos) { $pos[$item = 5] }))
return subsequence($INPUT, 1, $pos)
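The positional variant can be modelled in Python with `enumerate`. The name `for_each_with_position` is hypothetical; the callback returns a list so that the flattening of per-item results, as in `$pos[$item = 5]`, can be imitated.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def for_each_with_position(
    seq: Iterable[T], f: Callable[[T, int], List[R]]
) -> List[R]:
    """Call f(item, 1-based position) for each item and flatten the results."""
    out: List[R] = []
    for pos, item in enumerate(seq, start=1):
        out.extend(f(item, pos))
    return out


def items_ending_at_first(seq: Iterable[T], target: T) -> List[T]:
    """Items up to and including the first occurrence of target (must exist)."""
    items = list(seq)
    positions = for_each_with_position(
        items, lambda item, pos: [pos] if item == target else []
    )
    first = positions[0]   # corresponds to head(...)
    return items[:first]   # corresponds to subsequence($INPUT, 1, $pos)
```

Note that this approach visits every item before taking the head, unlike an early-exit loop, unless the implementation evaluates lazily.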

ChristianGruen · 2022-10-06T16:47:12Z

I love those people who contribute ideas if things have more or less been finalized. Still…

I did some research on how common challenges on lists and arrays are tackled in other programming languages, and noticed there's quite a bunch of languages today that come with takeWhile and dropWhile functions. As we already use well-known function names such as for-each, fold-left or filter, maybe we should rather adopt these two functions instead of reinventing the iterative wheel?

ndw · 2022-10-06T16:55:36Z

I'm all for reuse rather than reinvention, but unless I'm being more than usually thick, take-while and drop-while aren't semantically the same as any of our functions. "Take while" is "items until not" and "drop while" is "items after not" (or something like that). Can we get a nice set of four functions that include take-while and drop-while?
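The relationship can be checked with Python's `itertools.takewhile` and `itertools.dropwhile`, which exist with these names in the standard library. The two wrapper functions are a sketch of how the exclusive "before" and inclusive "starting-where" variants fall out by negating the predicate; the wrapper names are chosen to echo the proposal.

```python
from itertools import dropwhile, takewhile
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def items_before(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """take-while with the predicate negated: stop before the first match."""
    return list(takewhile(lambda x: not pred(x), seq))


def items_starting_where(seq: Iterable[T], pred: Callable[[T], bool]) -> List[T]:
    """drop-while with the predicate negated: start at the first match."""
    return list(dropwhile(lambda x: not pred(x), seq))
```

The other two variants do not map onto a single call: including the matching item at the end means taking one item beyond what take-while returns, and excluding it at the start means dropping one item from what drop-while returns.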

ChristianGruen · 2022-10-06T17:19:27Z

It's true, the functions are not equivalent. I'm just wondering if our requirements are really that different from those of Scala, Python, C#/F#, Kotlin, Java, and other languages? The main difference I can see is that we have sequences, but apart from some specific and cool features such as implicit flattening, the data structure pretty much resembles lists and arrays.

ChristianGruen · 2022-11-17T11:12:48Z

Accepted and resolved (#199 (comment))

fn:items-(until|from) → fn:items-(ending|starting)-where. qt4cg/qtspecs#149

ChristianGruen added the XPath and Feature labels Sep 21, 2022

ChristianGruen added the XQFO label and removed the XPath label Sep 21, 2022

ChristianGruen mentioned this issue Oct 5, 2022

Context item → Context value? #129

Closed

This was referenced Oct 18, 2022

Items before etc #177

Closed

HOF Sequence Functions with Positional Arguments #181

Closed

ChristianGruen mentioned this issue Nov 17, 2022

fn:items-(until|from) → fn:items-(ending|starting)-where. qt4cg/qtspecs#149 qt4cg/qt4tests#29

Merged

ChristianGruen added this to the QT 4.0 milestone Nov 17, 2022

ChristianGruen closed this as completed Nov 17, 2022

michaelhkay added a commit to qt4cg/qt4tests that referenced this issue Nov 17, 2022

Merge pull request #29 from ChristianGruen/fn-items---where

12ef283

fn:items-(until|from) → fn:items-(ending|starting)-where. qt4cg/qtspecs#149

ChristianGruen mentioned this issue Nov 18, 2023

Standard, array & map functions: Equivalencies #843

Closed

michaelhkay mentioned this issue Feb 2, 2024

character sequence constructor 'a' to 'z' #989

Closed
