Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a 'while' clause to FLWOR expressions #187

Closed
michaelhkay opened this issue Oct 7, 2022 · 14 comments · Fixed by #943
Closed

Add a 'while' clause to FLWOR expressions #187

michaelhkay opened this issue Oct 7, 2022 · 14 comments · Fixed by #943
Labels
Feature A change that introduces a new feature XQuery An issue related to XQuery

Comments

@michaelhkay
Copy link
Contributor

michaelhkay commented Oct 7, 2022

This proposal adds a new clause, the while clause, to FLWOR expressions. I'll start with the proposal, and then give some rationale.

Proposal

We add a new WhileClause which can appear anywhere a WhereClause can appear. The semantics are deliberately almost identical to the WhereClause.

3.12.x While Clause

[60] WhileClause ::= "while" ExprSingle
A while clause serves as a filter for the tuples in its input tuple stream. The expression in the while clause, called the while-expression, is evaluated once for each of these tuples. If the effective boolean value of the while-expression is true, the tuple is retained in the output tuple stream; otherwise the tuple and all subsequent tuples in the stream are discarded.

Examples:

This example illustrates the effect of a while clause on a tuple stream:
Input tuple stream:

($a = 13, $b = 11)
($a = 91, $b = 42)
($a = 17, $b = 30)
($a = 85, $b = 63)

while clause:
while $a > $b
Output tuple stream:

($a = 13, $b = 11)
($a = 91, $b = 42)

The following query illustrates how a while clause might be used to extract all items in an input sequence before the first one that fails to satisfy some condition. In this case it selects the leading para elements in the input sequence, stopping before the first element that is not a para element.

for $x in $section/*
while $x[self::para]
return $x

Note:
Although the semantics are described in terms of discarding all the tuples following the first one that fails to match the condition, a practical implementation is likely to avoid evaluating those tuples, thus giving an "early exit" from the iteration performed by the FLWOR expression.

Justification

FLWOR expressions remain the primary control construct in XQuery, despite the introduction of higher-order functions. The inability to process input selectively until some condition is true remains one of their biggest limitations. The window clause provides a workaround for some use cases, but it is very complex, and although it can partition an input sequence based on conditions found in the data, it has no direct way of stopping processing when a condition is found. Also, it only operates on input sequences, not on the tuple stream, so it isn't able to handle a composite condition involving multiple variables, for example for $i at $p in //emp while ($i/location = 'x' and $p < 50) return $i.

That particular example could be written with a where clause. But if a where clause is used, both the human reader and the query optimiser need to do some thinking to work out that early exit is possible (once $p reaches 50, no further tuples are going to have values less than 50).

@michaelhkay michaelhkay added XQuery An issue related to XQuery Feature A change that introduces a new feature labels Oct 7, 2022
@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 7, 2022

FLWOR expressions remain the primary control construct in XQuery, despite the introduction of higher-order functions. The inability to process input selectively until some condition is true remains one of their biggest limitations.

This is not true.

This can be done even in pure XPath 3.1 and a complete working example is provided in my comment here: #80 (comment)

   let $itemsBeforeHelper := function(
                             $seq as item()*, 
                             $pred as function(item()) as xs:boolean, 
                             $self as function(*) )   as item()*
  {
    if(empty($seq)) then ()
     else
       let $head := head($seq), $tail := tail($seq)
       return
       if(not($pred($head)))
         then ($head, $self($tail, $pred, $self))
         else()       
   },
   $itemsBefore := function($seq as item()*, 
                            $pred as function(item()) as xs:boolean)
                   {
                      $itemsBeforeHelper($seq, $pred, $itemsBeforeHelper)
                   }
   return
      $itemsBefore((1 to 6), function($n) {$n gt 4} )      

Evaluating the above expression correctly produces:

1
2
3
4

@dnovatchev
Copy link
Contributor

@michaelhkay
I believe this proposal is a significant and unjustified step back from your original proposal in the comment threads to issue #80 : #80 (comment)

What substantial reasons do you see for dropping this initial proposal?

Why address this only in XQuery and not in XPath as your initial proposal did?

Here is one example from your initial proposal:

An all-true() function could be implemented as

function all-true($input) {
     for $item in $input
     with $all initially true() then ($all and $item)
     while $all
     finally return $all
}

And I believe that with a small enhancement from my side, the proposal looked even better:

    (:   Expr 2  :)
      for $item in $input
            with-accumulator $all initially true() then ($all and $item)
        while $all
      yield-accumulator

Therefore I propose to support, unless there are pressing reasons not to, your initial proposal in issue #80, with consequent amendments, as shown above.

@ChristianGruen
Copy link
Contributor

Thanks for the new proposal, and the decision to look at while and with separately.

I had expressed some concerns on extending the FLWOR expression in #80 (comment), but I see that the while clause raises fewer questions than I thought. In particular, I better understand that we don’t have to care about variables in the return clauses that have not been initialized yet, as the tuple stream will simply be interrupted.

And I better understand the fundamental differences between the proposed while clause and fn:while (even though they are named identically): The while clause helps to interrupt operations on sequences, whereas fn:while provides a general concept to process and modify data until a condition is met. With FLWOR expressions, the maximum number of iterations is defined by the input, and with the functional approach, it depends on the data that is being generated. I believe even more that we should include both concepts in the new specification.

This can be done even in pure XPath 3.1

@dnovatchev True indeed; but I assume that just a handful people will be able to write and understand queries of this type?

@dnovatchev
Copy link
Contributor

This can be done even in pure XPath 3.1

@dnovatchev True indeed;

Yes, the absolute statement that this cannot be done in XQuery is not true, and it is not true even in the case of XPath. I hope that @michaelhkay could edit and correct this misleading, absolute statement?

but I assume that just a handful people will be able to write and understand queries of this type?

One can make assumptions that seem likely, but are not based on data, and use such assumptions to support decisions. It is definitely better if we could rely on existing data. While the pure XPath technique can seem fairly advanced, using recursion with named functions in XQuery is a common practice/feature/capability and many thousands of developers, practically everybody who can write code for n! (factorial) in any programming language, can write recursive functions in XQuery.

Even the seemingly advanced technique for recursion in pure XPath isn't as misunderstood as some people may suppose. This technique was first proposed more than 10 years ago. It has been publicized in my blog, and at the Balisage 2013 conference in front of many people. It was extensively explained in the Pluralsight course (2014) "The Evolution of XPath: What's New in Xpath 3.0". We have objective data for this course: Since its publishing it has been watched by more than 2000 people. Even if just half of these people understood the technique, this makes 1000 people, not just a "handful"

And what do you think, @ChristianGruen: even if this subjective statement were true, that "just a handful people will be able to write and understand queries of this type": should we just support the status quo (supposed unawareness ignorance) or do our best to try promote new knowledge and capabilities?

@ChristianGruen
Copy link
Contributor

Thanks for your thoughts. I’ll use the chat to dive deeper into this to keep the discussion in the issue focused.

@ndw
Copy link
Contributor

ndw commented Oct 7, 2022

One of the user communities that is remarkably well served by QT are folks who do not consider themselves programmers. These are academics and other professionals in non-programming disciplines, folks who picked up a tool like XSLT because it allowed them to get some stuff done without having to learn Perl or Python or Java or C# or Haskell or Scala or whatever your favorite language is.

These users are important to me!

I have met many who self-identify as "non-programmers". That they clearly are programmers is irrelevant.

Many are unprepared to engage, and some believe themselves incapable of engaging, in the mental gymnastics that allow one to write a recursive function, let alone a recursive function with a HOF parameter.

Features like xsl:iterate make it possible for them to accomplish tasks that would otherwise have been beyond their grasp. I wholeheartedly support additions to the languages that make them easier for folks who are not in a position to work out the functional programming idioms required to do a task, even if those idioms already exist in XPath and could be used to accomplish the task.

The QT languages are not an academic exercise in purity of expression. They are real world tools that ordinary humans use to get important jobs done.

@michaelhkay
Copy link
Contributor Author

@dnovatchev

Firstly, I'm not withdrawing the original proposal, I'm just breaking it up into bite-size chunks that I hope are more manageable, in order to help focus the discussion.

My assertion about the inability to do this currently was in the context of FLWOR expressions; I thought the context made that clear. Of course I know that you can achieve anything if you write recursive functions, but I also know there are many users who find recursive functions very difficult to write and debug. Users coming from SQL, in particular, like to do everything they can using FLWOR expressions alone.

@dnovatchev
Copy link
Contributor

The QT languages are not an academic exercise in purity of expression. They are real world tools that ordinary humans use to get important jobs done.

@ndw, Yes, and I really support this. My question to @michaelhkay was why in his current state his proposal was stepped back from the initial one and now is targeting only XQuery.

These users are important to me!

Absolutely, and this group is certainly not a static one, thus we are in a unique position to do our best to help and support their further learning and development.

@dnovatchev
Copy link
Contributor

Firstly, I'm not withdrawing the original proposal, I'm just breaking it up into bite-size chunks that I hope are more manageable, in order to help focus the discussion.

@michaelhkay Sorry for misunderstanding this

My assertion about the inability to do this currently was in the context of FLWOR expressions;

Understood.

I would fully support this feature in the broader context of XPath.

@michaelhkay
Copy link
Contributor Author

@ChristianGruen "The while clause helps to interrupt operations on sequences".

The most common use case is where the while clause applies a condition to items in the primary input sequence. But of course it's more general than that; in the general case it interrupts operations on the tuple stream, and may be based on conditions other than the raw input. We should add some examples of this. For example, to limit the number of items in the output:

for $x in //x, $y in //y
count $n
while $n < 100000
return $x * $y

or to process the top 3 items in sorted order

for $x in //x
order by $x/key
for $y in $x//something
count $n
while $n <= 3
return $x

Of course it becomes even more powerful if we add the ability to "accumulate" values such as the total size of the paragraphs output so far, which is what the "with" clause is about.

@ChristianGruen
Copy link
Contributor

But of course it's more general than that; […]

Absolutely, I agree. And a great feature of FLWOR expressions is that you can place the clauses wherever you like. That is something basic HOF functions cannot provide (they have other strengths).

@PieterLamers
Copy link

I am totally in favor of this proposal. I am against having an identically named fn:while that does something different (re Christian's remark and his proposal; I second Dimitre's recent suggestion to prefix that with iterate-). I also like using FLWOR expressions because they seem (mostly) intuitive, and I feel comfortable being categorized as a non-programmer, although this is less and less true for me.

@michaelhkay
Copy link
Contributor Author

Revisiting this after six months, I retain the view that this would be a useful addition to XQuery, and would prove popular in the same way that xsl:iterate has proved highly popular with XSLT users. Although it has always been true that we have the basic capabilities to do things like this using recursion or fold-left/fold-right; and moreover we have added functions in 4.0 like items-before() or items-ending-with() that handle some of the use cases, I do think there's a community of users, particularly those schooled in SQL, who find the FLWOR approach natural and convenient and would value simple extensions such as this.

Looking at the feedback on the proposal, @dnovatchev seems to be criticizing it both because the feature is technically redundant (the requirement can be met using recursion or fold functions) and because it doesn't go far enough (in issue #80 I originally proposed two new clauses, while and with). My answer to the first criticism is that although this feature, like xsl:iterate, is technically redundant, it still enables much more intuitive and readable code than is currently possible for a variety of use cases (though such judgements are of course subjective). My answer to the second is criticism is that my original proposal included two features that are completely separable, and when that happens it's best to consider them separately.

@michaelhkay
Copy link
Contributor Author

We now have a proposal for subsequence-where() that provides two predicates, from and to, defining inclusive endpoints of a subsequence selection. One use case that isn't well handled by that proposal is "select all the items before the first X, or up to the end if there is no X". I'm therefore going to propose for $item in $input while not(X) to handle that case. I shall propose it first for XQuery and if people like it we can then consider adding it to XPath (unfortunately the actual text will have to be done separately for both).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature XQuery An issue related to XQuery
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants