Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FO] fn:while (before: fn:until) #80

Closed
ChristianGruen opened this issue Jun 14, 2021 · 42 comments
Closed

[FO] fn:while (before: fn:until) #80

ChristianGruen opened this issue Jun 14, 2021 · 42 comments
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators

Comments

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Jun 14, 2021

Motivation

Similar to fold-left, the function allows for an alternative writing of code that would otherwise be solved recursively, and that would possibly cause stack overflows without tail call optimizations.

In contrast to sequence-processing functions (fold functions, for-each, filter, others), the initial input of fn:while can be arbitrary and will not determine the number of maximum iterations.

Summary

Applies the predicate function $test to $input. If the result is false, $action is invoked with the start value – or, subsequently, with the result of this function – until the predicate function returns false.

Signature

Edit: The $input argument (before: $zero) is now defined as first parameter.

fn:while(
  $input  as item()*,
  $test   as function(item()*) as xs:boolean,
  $action as function(item()*) as item()*
) as item()*

Examples / Use Cases

Calculate the square root of a number by iteratively improving an initial guess:

let $input := 3936256
return fn:while(
  $input,
  function($result) { abs($result * $result - $input) >= 0.0000000001 },
  function($guess) { ($guess + $input div $guess) div 2 }
)

Find the first number that does not occur in a sequence:

let $values := (1 to 999, 1001 to 2000)
return while(1, -> { . = $values }, -> { . + 1 })

Equivalent Expression

declare function local:while(
  $input  as item()*,
  $test   as function(item()*) as xs:boolean,
  $action as function(item()*) as item()*
) {
  if($test($input)) then (
    local:while($action($input), $test, $action)
  ) else (
    $input
  )
};
@rhdunn rhdunn added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Sep 14, 2022
@michaelhkay
Copy link
Contributor

This seems similar in concept to xsl:iterate, which has certainly proved very useful in cases where recursion would otherwise be needed. But in most cases xsl:iterate is used to process items in an input sequence until some condition occurs, and that use case doesn't seem easy to achieve here.

My feeling is that as proposed, fn:until is too specialised to be included. More use cases might convince me otherwise.

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Sep 21, 2022

I would claim that fn:while is even more generic instead of specialized, as it’s not limited to sequences. fn:items-from can e.g. be realized as follows (provided that the condition is only met once):

while(
  1 to 10000,
  function($seq) { head($seq) <= 5000 },
  function($seq) { tail($seq) }
)

Or, taking advantage of various new proposals:

while(1 to 10000, => { head(.) <= 5000 }, => { tail(.) })

fn:items-to looks similar…

while(1 to 10000, => { foot(.) >= 5000 }, => { init(.) })

…or, with the classical notation and slice:

while(
  1 to 10000,
  function($seq) { slice($seq, -1) >= 5 },
  function($seq) { slice($seq, (), -2) }
)

It could also be written as follows:

let $items := 1 to 1000
let $last := while(1, -> { subsequence($items, ., 1) != 5 }, -> { . + 1 })
return subsequence($items, 1, $last)

items-to and items-from are fine when we have plain sequences. In complex applications, though, new data is generated that needs to be processed. fn:fold-left works this way, but it does not allow us to interrupt the iteration. Recursive code can be written for such use cases, but it’s usually more verbose and sometimes tricky to write, in particular when the trivial version of the code is not tail recursive.

In short, fn:fold-left vs. the classical for can be compared to fn:while and do ... while.

@michaelhkay
Copy link
Contributor

Thanks for the additional use case. Yes, it's more flexible than I realised. But only having a single "variable" seems a bit limiting. For example, could you implement index-of this way?

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Sep 21, 2022

For example, could you implement index-of this way?

I would probably do it with fold-left, as we need to check all items of the sequence:

let $input := (11 to 21, 21 to 31)
let $search := 21
return fold-left(1 to count($input), (), function($seq, $curr) {
  $seq, if($input[$curr] = $search) then $curr else ()
})

For both fold and while/until, we could add a parameter that gives us the current position in the sequence:

let $input := (11 to 21, 21 to 31)
let $search := 21
return fold-left($input, (), ->($seq, $curr, $pos) { $seq, $pos[$curr = $search] })

The proposal was “inspired by” (or copied from) a function in Haskell, which has an enviably compact notation: http://zvon.org/other/haskell/Outputprelude/until_f.html

→ See #181 for a new proposal.

@benibela
Copy link

Or a scripting extension with local variables could do that

It could look like:


mutable let $guess := 3936256;
while (abs($guess * $guess - $input) >= 0.0000000001) {
  $guess := ($guess + $input div $guess) div 2 
};
$guess

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Sep 21, 2022

Reminds me of the XQuery Scripting Extension. I think Zorba was the only processor with a full implementation.

@ChristianGruen ChristianGruen changed the title [FO] fo:until [FO] fo:until → fo:while Oct 5, 2022
@ChristianGruen
Copy link
Contributor Author

I think while is more common to coders than until, so I renamed the function and I inverted the semantics of the predicate function.

@ChristianGruen ChristianGruen changed the title [FO] fo:until → fo:while [FO] fo:while (before: fo:until) Oct 5, 2022
@ChristianGruen ChristianGruen changed the title [FO] fo:while (before: fo:until) [FO] fn:while (before: fn:until) Oct 5, 2022
@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Oct 5, 2022

Most higher-order functions revolve around sequences. If we want to preserve this focus, we could extend the existing fold functions with an additional argument for interrupting the loop. The example of the initial comment could then be written as follows:

let $input := 3936256
return fold-left(
  1 to 1000,  (: maximum number of loops :)
  $input,
  (: compute new value :)
  function($guess, $current-item) { ($guess + $input div $guess) div 2 },
  (: test if loop should be interrupted :)
  function($result) { abs($result * $result - $input) >= 0.0000000001 }
)

Even if we add fn:while (which I would recommend), the possibility to interrupt loops while folding would still be helpful.

@dnovatchev
Copy link
Contributor

Even if we add fn:while (which I would recommend), the possibility to interrupt loops while folding would still be helpful.

Just using fold-right() should (as in Haskell) give us the ability for shortcutting.

Thus no need to complicate even further fold-left()

@ChristianGruen
Copy link
Contributor Author

Just using fold-right() should (as in Haskell) give us the ability for shortcutting.

I'm not sure about that. Can you give an example?

@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 5, 2022

@ChristianGruen

I'm not sure about that. Can you give an example?

Here is the definition of foldr in Haskell:

foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

So, the first application of foldr is on the last item in the list. Then if f supports shortcutting and its 2nd argument (the result of the application of foldr so far) is one that allows shortcutting, this shortcutting will be done by f immediately.

For example, if f is the function and and z is true and and whatEver false is false (the shortcutting definition for and), and if the last item in the provided list is false, then:

foldr and true [true, true, true, ..., false]

evaluates just (immediately on the first application of foldr to z (true) and the last item in the list: false):
and true false

and immediately returns false, because and does the shortcutting.

@ChristianGruen
Copy link
Contributor Author

Thanks for the explanation, Dimitre. Pardon my ignorance, but I still wonder how that would look like in XPath or XQuery. Could you please add an example that's not based on Haskell? How would you e.g. realize fn:items-before that way?

@benibela
Copy link

benibela commented Oct 5, 2022

I think while is more common to coders than until, so I renamed the function and I inverted the semantics of the predicate function.

but that will conflict with the scripting extension.

unless the parser can look ahead to find the closing ) and check if there is a { following after it

You could call it fold-left-while

he possibility to interrupt loops while folding would still be helpful.

The scripting extension had a break expression for that in early drafts. Although it got removed

Thanks for the explanation, Dimitre. Pardon my ignorance, but I still wonder how that would look like in XPath or XQuery. Could you please add an example that's not based on Haskell? How would you e.g. realize fn:items-before that way?


 
 let $items-before := function($seq, $predicate){
    try { 
      fold-left ( $seq, 0, ->($tempseq,$a) { 
        if ($predicate($a)) then error(QName("local:items"), "", $tempseq)
        else ($tempseq, $a) 
      } )  
    } catch local:items { $err:value }     
  }
  return $items-before(1 to 100, -> { . = 50})

@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 5, 2022

Thanks for the explanation, Dimitre. Pardon my ignorance, but I still wonder how that would look like in XPath or XQuery. Could you please add an example that's not based on Haskell? How would you e.g. realize fn:items-before that way?

Here is one XPath 3.1 executable:

 let $myAnd := function($b1, $b2) {  if(not($b2)) then false() else  trace($b1) and  $b2 },
       $allTrue := function($bools as xs:boolean*)
 { 
   fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
 }
   return $allTrue((true(), true(), true(), true(), false()))

Running this produces false() and only one trace message, which shows that there was shortcutting.

However running this:

let $myAnd := function($b1, $b2) {  if(not($b2)) then false() else  trace($b1) and  $b2 },
       $allTrue := function($bools as xs:boolean*)
 { 
   fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
 }
   return $allTrue((true(), true(), true(), true(), true()))

produces true() and 5 messages, which shows that in this case there was no shortcutting (as expected, because all passed values were true())

Even if we add fn:while (which I would recommend), the possibility to interrupt loops while folding would still be helpful.

Summary of this comment:

We don't need to do anything in order to "interrupt the loop", if we are using fn:fold-right() passing to it a function that does shortcutting.

@ChristianGruen Is this answer clear and sufficient?

@michaelhkay
Copy link
Contributor

An anecdotal aside. 'In some dialects of Northern England, "while" is translated into standard English as "until"; for example, "At least wait while we're done." ' (See wikipedia, "while"). This once led in ICL to a rather expensive hardware design fault, due to Southerners and Northeners using the same word to mean different things.

@dnovatchev
Copy link
Contributor

An anecdotal aside. 'In some dialects of Northern England, "while" is translated into standard English as "until"; for example, "At least wait while we're done."

Yes, this is why I asked for explicit names with "...starting-at" and "...ending-at" :)

@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 5, 2022

BTW, I see some issue in Saxon, the implementation of and is shortcutting if the value of $arg1 is false(), but it isn't shortcutting if the value of $arg2 is false() (and $arg1 is true()):

let $myAnd := function($b1, $b2) {  $b2 and trace($b1)    },
       $allTrue := function($bools as xs:boolean*)
 { 
   fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
 }
   return $allTrue((true(), true(), true(), true(), false()))

This does shortcutting -- produces just one trace message.

However the below doesn't do any shortcutting and produces 5 messages:

   
 let $myAnd := function($b1, $b2) {  $b1 and trace($b2)    },
       $allTrue := function($bools as xs:boolean*)
 { 
   fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
 }
   return $allTrue((true(), true(), true(), true(), false()))

@michaelhkay, Maybe a good opportunity for further optimizations in Saxon?

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Oct 5, 2022

@michaelhkay Wow. I always translated while as während in German, until (while?) I eventually noticed it has many more meanings.

but that will conflict with the scripting extension.

@benibela Thanks, I was not aware of that. That may be yet another reason to choose fn:until… Although I believe that while() would be syntactically valid. See the following code, which defines a function called for:

declare default function namespace 'whatever';
declare function for() { };
for()

@dnovatchev Thanks. We seem to elaborate on different things: By shortcutting, I believe I understand you mean that the “generating” and potentially expensive expression (i.e., the one that creates new results) will only be evaluated once as long as a certain condition was not successful.

My objective is to really stop the iteration once a condition is successful for the first time. See e..g the following expression:

fold-right(
  (((1 to 100000000000000000) ! true()), false()),
  true(),
  function($b1, $b2) { (if(not($b2)) then false() else trace($b1)) and $b2 }
)

trace will only be evaluated once (as $b1 will only be true once). However, the code will take a long time to terminate (no matter which processor you use, I assume), as the function body will be evaluated for all items of the sequence (no matter if you use fn:fold-left or fn:fold-right, by the way).

If we added a third argument, the evaluation could really be interrupted:

fold-right(
  (((1 to 100000000000000000) ! true()), false()),
  true(),
  function($b1, $b2) { $b2 },
  function($item) { $item }
)

We can certainly simulate fn:items-before with the existing fold functions, maybe like this …

let $stop := <STOP/>
let $result := fold-left((1 to 100), (), function($seq, $curr) {
  if($seq[last()][. instance of element()][. is $stop]) then (
    $seq
  ) else if($curr = 5) then (
    $seq, $stop
  ) else (
    $seq, $curr
  )
})
return $result[position() < last()]

…but if the input sequence is large, it may be very slow, and a recursive solution would be faster (provided that we don’t run into TCO issues). Apart from that, I think that my code is not very readable.

Maybe I missed something, though?

@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 5, 2022

…but if the input sequence is large, it may be very slow, and a recursive solution would be faster (provided that we don’t run into TCO issues). Apart from that, I think that my code is not very readable.

@ChristianGruen It depends how we define "very slow".

I am running the code below on BaseX (Excellent XPath/XQuery processor, deep thanks!) and it checks for all-true() of 10 000 000 (ten million) booleans of which the last one is false() and it takes only about half a second.:

image

Thus, for my practical purposes using fn:fold-right() with a parameter a short-cutting function seems really fast enough.

This is on a sequence not even of 1M items, but 10 million items...

For practical purposes it would be extremely rare to use sequences with length even several hundred thousand items.

Knowing this, I am inclined to use my knowledge about fn:fold-right() and shortcutting and quite probably not to use something more complicated as additional constructs for breaking out of a fold on a condition.

I believe that even if there are such constructs for a fold, implementing such a break still would require exiting all currently nested loops (although doing nothing else in these loops), thus this would be comparable to what similarly happens with the e valuation of fn:fold-right() when the argument-function is shortcutting.

And it seems more "imperative" as someone kinda noticed, as this is like the break; out of a loop, and in order to get the full speed from such a break; one has to write truly tail-recursive code, ..., and do we forget that no existing XPath/XQuery spec has a definition of "tail-recursive"?

Without such a definition every implementor is free to use their imagination of what "tail-recursive"? might mean... The result of this might well be that code that runs "fast" on one implementation runs "slow" or even crashes on another.

So, please, don't get me wrong, I would simply want to avoid all such complexities and ambiguities.

@ChristianGruen
Copy link
Contributor Author

The execution time will mostly depend on the action. Checking booleans is certainly fast.

But performance is just one aspect. I know I asked before, but I’m still wondering how would you implement fn:items-before with XQuery 3.1 and higher-order functions?

@michaelhkay
Copy link
Contributor

michaelhkay commented Oct 5, 2022

I think in general it's dangerous to worry too much about performance when designing a specification. The spec should be designed for usability not for performance.

Dimitre has often asked for the spec to contain more rules about the implementation strategy (for example mandating tail recursion) and we have consistently resisted this. I think we are right to do so. Implementations should be free to innovate as much or as little as they wish, and to choose their own trade-offs in delivering performance. We've been learning recently how some of the micro-optimisations we make are counter-productive because they worsen branch-prediction rates at the hardware level, and reduce cache hit rates in the CPU. It's the job of implementors to understand those effects, not the job of specification writers.

The XSLT streaming work, of course, was all about designing a specification that enabled a certain class of optimisation. While it's delivering value to users who have that particular requirement, I don't think it has improved the language for the majority, which I feel should be our objective.

@ChristianGruen
Copy link
Contributor Author

@michaelhkay Very interesting!

I can imagine that both a new function and new FLWOR clauses can be useful. Maybe it’s a good idea to create a new issue for the proposed FLWOR enhancements?

@dnovatchev
Copy link
Contributor

Example 1

for $t at $p in //text() while $p < 1000 return upper-case($t)

processes the first 1000 text nodes. The semantics of while here are identical to where, but with less burden on the optimiser to infer that early exit is possible.

It seems you meant: "processes the first 999 text nodes" ?

In working through use cases for this extension, I find there is a need for the equivalent of the on-completion clause of xsl:iterate: something like a "finally return" clause.

An all-true() function could be implemented as

function all-true($input) {
     for $item in $input
     with $all initially true() then ($all and $item)
     while $all
     finally return $all
}

Seems there is no need to invent a new clause (finally). Why not write this just as:

function all-true($input) {
     for $item in $input
            with $all initially true() then ($all and $item)
        while $all
      return $all
}

And equally useful we can have an until clause:

function items-before($input, $pred) {
     for $item in $input until $pred($item)
            with $result initially () then ($result , $item)
      return $result
}

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Oct 6, 2022

I’ll relocate parts of this comment if we decide to have an extra issue for while and with.

I had some thoughts on the proposed FLWOR enhancements:

  • While (again that word) I think that enhancing FLWOR expressions is compelling, my impression is that it would, at this late stage, have pervasive effects on existing implementations.
  • The with clause would introduce mutable variables to XQuery, which would, from a functional perspective, be a rather fundamental change to the semantics of the language.
  • If an iteration is interrupted with while, we’ll need to consider that variables in the return clause have not been assigned yet.
  • We would need to understand the implications on all existing clauses(mostlygroup by` and window clauses) or disallow certain combinations.

Below, I try to contrast the three variants – fn:while, (optionally interruptible) fn:fold-left and enhanced FLWORs – for some use cases presented in this issue:

A. Compute square root

let $INPUT := 1 to 20
let $INITIAL := 3936256
let $THEN := ->($item) { ($item + $INITIAL div $item) div 2 }
let $TEST := ->($item) { abs($item * $item - $INITIAL) >= 0.0000000001 }

return (
  (: fn:while :)
  while($INITIAL, $TEST, $THEN),
  
  (: interruptible fn:fold-left :)
  fold-left($RUNS, $INITIAL, ->($item, $_) { $THEN($item) }, $TEST),

  (: enhanced FLWOR :)
  for $_ in $RUNS
  with $item initially $INITIAL then $THEN($item)
  while $TEST($item)
  return $item
)

B. items-before, items-ending-with

(: items-before / items-ending-with :)
let $INPUT := 1 to 20
let $TEST := ->($item) { $item <= 5 }

return (
  (: fn:while :)
  while($INPUT, => { $TEST(foot(.)) }, => { truncate(.) }),
  
  (: interruptible fn:fold-left :)
  fold-left($INPUT, (), op(','), $TEST),

  (: enhanced FLWOR :)
  for $item in $INPUT
  with $sub initially $item then ($sub, $item)
  while $TEST($item)
  return $sub
)

C. all-true

(: return items until test fails (items-before) :)
let $INPUT := 1 to 20
let $INITIAL := true()
let $THEN := op('and')
let $TEST := boolean#1

return (
  (: interruptible fn:fold-left :)
  fold-left($INPUT, $INITIAL, $THEN, $TEST),

  (: enhanced FLWOR :)
  for $item in $INPUT
  with $all initially $INITIAL then $THEN($all, $item)
  while $TEST($all)
  return $all
)

D. Compute the product of a sequence of numbers

(: return items until test fails (items-before) :)
let $INPUT := 1 to 20
let $INITIAL := 1
let $THEN := op('*')

return (
  (: fn:fold-left :)
  fold-left($INPUT, $INITIAL, $THEN),

  (: enhanced FLWOR :)
  for $item in $INPUT
  with $result initially $INITIAL then $THEN($result, $item)
  return $result
)

fn:while actually differs a lot from the other two proposals: The (current version of my) proposal does not provide for the iteration of a sequence. Instead, it’s a purely functional way to change values in a loop until a certain condition is given.

while/until constructs can be found in functional languages (Haskell, F#, probably others), and I believe their design is perfectly functional: The input and output is well-defined; no variables are reassigned; parallelism is not suppressed as each iteration builds upon the previous one; etc.

However, I can (in some way) understand purists who are brave enough to condemn the usage of folds, for/while/until loops and other iterative concepts in functional code. And @liamquin I think I can see what you mean (sorry if I interpret you wrongly): Iterative code enforces a certain way of thinking that derives from the procedural world, and adding further support for iterative built-in functions may push that thinking.

@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 6, 2022

I'm wondering if the benefits of xsl:iterate, and the functionality proposed here, wouldn't be better achieved by means of enhancements to the FLWOR expression. Specifically, I'm thinking of two new clauses:

I am sure @liamquin would not like this... the same way as I personally strongly dislike xsl:iterate.

But rather than regarding such inventions as designed as a crutch/kludge for people that are not confident they could grasp FP, as shown by @ChristianGruen in his comment putting them side by side: #80 (comment), one could regard them (the inventions) as just shorthand(s) for the proper FP recursive construct. Thus it would be a matter of taste which of the two variants (verbose vs. functional) to use, and everyone could be happy.

The only thing I am afraid might happen is that numerous edge cases could emerge and explaining them in the Spec will take so much space as to make the whole description unfathomable.

@michaelhkay
Copy link
Contributor

michaelhkay commented Oct 6, 2022

It's frustrating that this forum is so one-dimensional - you can't comment on a comment. Christian was right, I should have started a new thread.

The problem with the formulation

      for $item in $input
            with $all initially true() then ($all and $item)
        while $all
      return $all

is that every time an item is processed, a boolean (the current value of $all) is emitted.

It's a mistake to think of $all as a mutable variable. It's exactly like $item, it's a variable that is part of the tuple stream, and that has different values in different tuples. The only difference is that unlike most FLWOR clauses, there is visibility of the variables in the previous tuple.

Of course Dimitre is right, this is a syntactic sweetener for those who find recursion and higher-order functions intellectually challenging. There are plenty of such people, and xsl:iterate has been very popular with them; I'm happy to consider such people as part of my target user base.

@dnovatchev
Copy link
Contributor

dnovatchev commented Oct 6, 2022

The problem with the formulation

      for $item in $input
            with $all initially true() then ($all and $item)
        while $all
      return $all

is that every time an item is processed, a boolean (the current value of $all) is emitted.

I thought that would be the case if return $all was inside the for ... while, like this:

      for $item in $input
            with $all initially true() then ($all and $item)
           return $all
        while $all

If using return after for ... while is problematic, then let us use another word, such as:

submit $all

or

yield $all

It would be not a good use for finally, because usually it is related to graceful exit from exceptions (crashes, the end of the world ...)

In other words:

    (:   Expr 1  :)
      for $item in $input
            with $all initially true() then ($all and $item)
        while $all
      yield $all

would be a shorthand for:

      (for $item in $input
            with $all initially true() then ($all and $item)
          while $all
        return $all
      )[last()]

And because folds can generally produce on each step not just a single item but a sequence, it is good to have the final result be an array, each member of which contains the result of the corresponding step in the evaluation. then the expression for which we try to provide a more convenient shorthand the "Expr 1" (above) will be:

      array:foot(
                    (for $item in $input
                        with $all initially true() then ($all and $item)
                      while $all
                    return $all
                      )
                )

And there is even a better expression than Expr 1, if we replace the keyword "with" with "with-accumulator":

    (:   Expr 2  :)
      for $item in $input
            with-accumulator $all initially true() then ($all and $item)
        while $all
      yield-accumulator

@liamquin
Copy link

liamquin commented Oct 6, 2022

I use xsl:iterate and teach it; i'd have preferred a name that suggested it was syntactic sugar for recursion.

We should remember XQuery implementations in which the FLWOR tuple stream is not generated in the "intuitive" order. In the absence of order by, an early termination doesn't necessarily mean you got all the values you wanted - the impolementation would thus have to generate them all. But then, i don't know that the RDBMS people are interested in XQuery or XPath these days.

i like the idea of with-accumulator, as that's getting closer to looking like part of XSLT 3 but not in a way that might necessitate pervasive changes in XQuery implementations.

@ChristianGruen
Copy link
Contributor Author

Here’s a use case for fn:while that I came across just today when checking for non-existing files in a directory. Slightly simplified:

(: Find the first number in a sequence that is missing :)
let $values := (1 to 999, 1001 to 2000)
return while(1, -> { . = $values }, -> { . + 1 })

@dnovatchev
Copy link
Contributor

The execution time will mostly depend on the action. Checking booleans is certainly fast.

But performance is just one aspect. I know I asked before, but I’m still wondering how would you implement fn:items-before with XQuery 3.1 and higher-order functions?

@ChristianGruen and @michaelhkay

Actually, if the operands on the function call are evaluated lazily, (that is, only on demand) then fn:foldr() must return immediately when the passed as parameter $f makes a shortcut, as per the definition from the official XPath 3,1 Spec:

<xsl:function name="fn:fold-right" as="item()*">
  <xsl:param name="seq" as="item()*"/>
  <xsl:param name="zero" as="item()*"/>
  <xsl:param name="f" as="function(item(), item()*) as item()*"/>
  <xsl:choose>
    <xsl:when test="fn:empty($seq)">
      <xsl:sequence select="$zero"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$f(fn:head($seq), fn:fold-right(fn:tail($seq), $zero, $f))"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

Note this excerpt from the above:

    <xsl:otherwise>
      <xsl:sequence select="$f(fn:head($seq), fn:fold-right(fn:tail($seq), $zero, $f))"/>
    </xsl:otherwise>

This means that $f is op('and'), and its 2nd argument is executed lazily, then

fold-right((false(), (1 to 10000000000000000)!true())),  true, $f)

will immediately produce false() on the first shortcut and no inner calls to fold-right() will be evaluated.

We get immediate shortcut and evaluation time ~ 0.

Conclusion: Let us specify that the arguments of a function call are evaluated lazily!

@ChristianGruen
Copy link
Contributor Author

@dnovatchev I don't believe implementation details should be defined in the specification. Next, there are good reasons to choose an iterative implementation for folds, as not all tasks can be implemented end-recursively.

How would you implement the use case from my last example with fn:fold-right?

@ChristianGruen
Copy link
Contributor Author

One more request, regarding the latest comments on fold-right and various past discussions in other issues:

I think we should try to focus on the proposal that’s given in the initial comment. If a suggestion has been inspired by an existing issue, it can certainly be valuable as well, but we should propose we discuss it either in the chat or create a new issue and reference the old one. Otherwise, no one, even those directly involved, will eventually read through all comments and understand the rationale of the outcome.

@michaelhkay
Copy link
Contributor

I think the best course of action when this kind of thing happens is to close the issue and open a new issue (or several, if an independent idea has developed during the discussion) that make a new proposal, taking any useful feedback into account, and perhaps summarising your responses to comments made on the original proposal.

@dnovatchev
Copy link
Contributor

@dnovatchev I don't believe implementation details should be defined in the specification. Next, there are good reasons to choose an iterative implementation for folds, as not all tasks can be implemented end-recursively.

@ChristianGruen

Right, there will be a new, separate proposal for Lazy Evaluation of Arguments (LEA) of function calls. For now, again the essence of this is that no detection is necessary that a shortcutting has happened inside the evaluation of a fold-right(). Just following the definition of fold-right() and having LEA, results in the immediate completion of the evaluation when there is a shortcut.

So, given:

fold-right( (false(), (1 to 1000000000000000000000000000000000000) ! true() ),   true(),   op('and') )

this is evaluated as:

false() and fold-right($notEvaluatedNotAvailableYetTailArgument, true(), op('and') )

And due to the shortcutting of and (and LEA), this immediately evaluates to false(), thus the 2nd argument to this and: (fold-right($notEvaluatedNotAvailableYetTailArgument, true(), op('and')` is completely ignored (skipped, not evaluated at all)

I have seen some people wonder: "why on Earth fn:fold-right() exists at all", and they would be right were it not for shortcutting and lazy argument evaluation, as it is in Haskell, from which foldr was borrowed, renamed and its signature (order of arguments) adjusted.

ChristianGruen added a commit to ChristianGruen/qtspecs that referenced this issue Oct 13, 2022
michaelhkay added a commit to qt4cg/qt4tests that referenced this issue Oct 13, 2022
ChristianGruen added a commit to ChristianGruen/qtspecs that referenced this issue Oct 13, 2022
@ChristianGruen ChristianGruen added this to the QT 4.0 milestone Oct 14, 2022
ChristianGruen added a commit to ChristianGruen/qtspecs that referenced this issue Oct 25, 2022
@dnovatchev
Copy link
Contributor

I have re-read the proposal as specified here: https://qt4cg.org/pr/210/xpath-functions-40/Overview.html#func-while

I think that it would be good to have two functions named: iterate-while and iterate-until with the main difference between them being where the guard/condition is evaluated: for iterate-while the guard is evaluated before the current iteration, and for iterate-until the guard is evaluated after the current iteration.

I would recommend these names for the 2nd and 3rd argument:

  • guard instead of predicate
  • transformation (or transform-step) instead of action . People may be accustomed from other PLs that action is a function that doesn't produce any result and has only side-effects. But here we have a function that transforms the input on each step.

@ChristianGruen
Copy link
Contributor Author

@dnovatchev Note that until in Haskell evaluates the test before the first iteration.

@dnovatchev
Copy link
Contributor

@ChristianGruen OK, but we must define this clearly in order to avoid any misunderstanding.

Then will this be true:

iterate-until($input, $guard, $tranform-step) can be rewritten as:

iterate-while($input, not($guard), $transform-step)

@ChristianGruen
Copy link
Contributor Author

We could certainly choose a behavior that differs from Haskell if there's a common agreement that it'll be more intuitive.

we must define this clearly in order to avoid any misunderstanding.

The current draft in #210 is based on the suggestions given by Michael Kay. My initial version can be found here:

04ea1b5

I'll be happy to revise the the description and rule set in the current draft.

@ChristianGruen
Copy link
Contributor Author

Closed; fn:iterate-while was added to the specification.
A new issue may be opened for fn:iterate-until.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

6 participants