[FO] fn:while (before: fn:until) #80
Comments
This seems similar in concept to xsl:iterate, which has certainly proved very useful in cases where recursion would otherwise be needed. But in most cases xsl:iterate is used to process items in an input sequence until some condition occurs, and that use case doesn't seem easy to achieve here. My feeling is that as proposed, fn:until is too specialised to be included. More use cases might convince me otherwise. |
I would claim that
while(
  1 to 10000,
  function($seq) { head($seq) <= 5000 },
  function($seq) { tail($seq) }
)
Or, taking advantage of various new proposals:
while(1 to 10000, => { head(.) <= 5000 }, => { tail(.) })
while(1 to 10000, => { foot(.) >= 5000 }, => { init(.) })
…or, with the classical notation and
while(
  1 to 10000,
  function($seq) { slice($seq, -1) >= 5 },
  function($seq) { slice($seq, (), -2) }
)
It could also be written as follows:
let $items := 1 to 1000
let $last := while(1, -> { subsequence($items, ., 1) != 5 }, -> { . + 1 })
return subsequence($items, 1, $last)
In short, |
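To make the claim easy to check, here is a small Python model (not XPath) of the proposed semantics. `while_` is a hypothetical helper that keeps applying `action` as long as `test` holds, and tuples stand in for sequences. The input is scaled down from 1 to 10000. The first call should match `itertools.dropwhile`, i.e. an items-after style filter, and the second trims trailing items while the last one is >= 50.

```python
from itertools import dropwhile


def while_(value, test, action):
    """Model of the proposed fn:while: apply `action` as long as `test` holds."""
    while test(value):
        value = action(value)
    return value


items = tuple(range(1, 101))  # stands in for 1 to 100

# while(1 to 100, => { head(.) <= 50 }, => { tail(.) })
# In XPath, head(()) <= 50 is false, hence the len() guard for the empty case.
after = while_(items, lambda s: len(s) > 0 and s[0] <= 50, lambda s: s[1:])

# while(1 to 100, => { foot(.) >= 50 }, => { init(.) })
before = while_(items, lambda s: len(s) > 0 and s[-1] >= 50, lambda s: s[:-1])

print(after == tuple(dropwhile(lambda x: x <= 50, items)))  # True
print(before == tuple(range(1, 50)))                        # True
```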
Thanks for the additional use case. Yes, it's more flexible than I realised. But only having a single "variable" seems a bit limiting. For example, could you implement |
I would probably do it with
let $input := (11 to 21, 21 to 31)
let $search := 21
return fold-left(1 to count($input), (), function($seq, $curr) {
  $seq, if($input[$curr] = $search) then $curr else ()
})
For both fold and while/until, we could add a parameter that gives us the current position in the sequence:
let $input := (11 to 21, 21 to 31)
let $search := 21
return fold-left($input, (), ->($seq, $curr, $pos) { $seq, $pos[$curr = $search] })
The proposal was “inspired by” (or copied from) a function in Haskell, which has an enviably compact notation: http://zvon.org/other/haskell/Outputprelude/until_f.html → See #181 for a new proposal. |
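A Python sketch of the suggested positional callback, assuming the position is passed as a 1-based third argument. `fold_left_pos` is a hypothetical name, not an existing function.

```python
def fold_left_pos(items, zero, f):
    """fold-left variant whose callback also receives the 1-based position (hypothetical)."""
    acc = zero
    for pos, item in enumerate(items, start=1):
        acc = f(acc, item, pos)
    return acc


input_ = list(range(11, 22)) + list(range(21, 32))  # (11 to 21, 21 to 31)
search = 21

# ->($seq, $curr, $pos) { $seq, $pos[$curr = $search] }
positions = fold_left_pos(
    input_, [], lambda seq, curr, pos: seq + ([pos] if curr == search else [])
)
print(positions)  # [11, 12]
```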
Or a scripting extension with local variables could do that. It could look like:
|
Reminds me of the XQuery Scripting Extension. I think Zorba was the only processor with a full implementation. |
I think |
Most higher-order functions revolve around sequences. If we want to preserve this focus, we could extend the existing fold functions with an additional argument for interrupting the loop. The example of the initial comment could then be written as follows:
let $input := 3936256
return fold-left(
  1 to 1000, (: maximum number of loops :)
  $input,
  (: compute new value :)
  function($guess, $current-item) { ($guess + $input div $guess) div 2 },
  (: continue as long as this returns true, i.e. until the guess is close enough :)
  function($result) { abs($result * $result - $input) >= 0.0000000001 }
)
Even if we add |
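A Python model of the proposed four-argument fold-left, under one assumed reading of the extra argument: it is checked before each step, and the fold stops as soon as it returns false. `fold_left_while` is a hypothetical name. Since 3936256 = 1984², the Newton iteration should settle near 1984 long before the 1000-item bound is reached.

```python
def fold_left_while(items, zero, f, keep_going):
    """Hypothetical interruptible fold-left: stop as soon as keep_going(acc) is false."""
    acc = zero
    for item in items:
        if not keep_going(acc):
            break
        acc = f(acc, item)
    return acc


n = 3936256
root = fold_left_while(
    range(1, 1001),                                # at most 1000 iterations
    n,                                             # initial guess
    lambda guess, _item: (guess + n / guess) / 2,  # Newton step; the item is ignored
    lambda r: abs(r * r - n) >= 0.0000000001,      # continue while not converged
)
print(root)
```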
Just using Thus no need to complicate even further |
I'm not sure about that. Can you give an example? |
Here is the definition of
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
So, the first application of For example, if
evaluates just (immediately on the first application of and immediately returns |
Thanks for the explanation, Dimitre. Pardon my ignorance, but I still wonder what that would look like in XPath or XQuery. Could you please add an example that's not based on Haskell? How would you e.g. realize |
But that would conflict with the scripting extension, unless the parser can look ahead to find the closing ) and check whether a { follows it. You could call it fold-left-while.
The scripting extension had a break expression for that in early drafts, although it was later removed.
|
Here is an executable XPath 3.1 example:
let $myAnd := function($b1, $b2) { if(not($b2)) then false() else trace($b1) and $b2 },
    $allTrue := function($bools as xs:boolean*)
    {
      fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
    }
return $allTrue((true(), true(), true(), true(), false()))
Running this produces However, running this:
let $myAnd := function($b1, $b2) { if(not($b2)) then false() else trace($b1) and $b2 },
    $allTrue := function($bools as xs:boolean*)
    {
      fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
    }
return $allTrue((true(), true(), true(), true(), true()))
produces
Summary of this comment: We don't need to do anything in order to "interrupt the loop", if we are using @ChristianGruen Is this answer clear and sufficient? |
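The effect can be reproduced outside XPath with a Python model. Here `fold_right` is evaluated eagerly (every nested call runs), `trace` only records its argument, and `my_and` mirrors `$myAnd`. The model counts how often `trace` fires for the two inputs.

```python
from functools import reduce

trace_log = []


def trace(value):
    """Stand-in for fn:trace: record the value and return it unchanged."""
    trace_log.append(value)
    return value


def my_and(b1, b2):
    # if(not($b2)) then false() else trace($b1) and $b2
    return False if not b2 else (trace(b1) and b2)


def fold_right(items, zero, f):
    """Eager fold-right: f(x1, f(x2, ... f(xn, zero)))."""
    return reduce(lambda acc, item: f(item, acc), reversed(items), zero)


def all_true(bools):
    return fold_right(list(bools), True, my_and)


trace_log.clear()
r1 = all_true([True, True, True, True, False])
n1 = len(trace_log)

trace_log.clear()
r2 = all_true([True] * 5)
n2 = len(trace_log)

print(r1, n1, r2, n2)  # False 1 True 5
```

The count drops to one because `my_and` skips `trace` once an inner result is false, although the eager fold itself still visits every item.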
An anecdotal aside. 'In some dialects of Northern England, "while" is translated into standard English as "until"; for example, "At least wait while we're done." ' (See wikipedia, "while"). This once led in ICL to a rather expensive hardware design fault, due to Southerners and Northerners using the same word to mean different things. |
Yes, this is why I asked for explicit names with "...starting-at" and "...ending-at" :) |
BTW, I see an issue in Saxon, in the implementation of
let $myAnd := function($b1, $b2) { $b2 and trace($b1) },
    $allTrue := function($bools as xs:boolean*)
    {
      fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
    }
return $allTrue((true(), true(), true(), true(), false()))
This does shortcutting: it produces just one trace message. However, the version below doesn't do any shortcutting and produces 5 messages:
let $myAnd := function($b1, $b2) { $b1 and trace($b2) },
    $allTrue := function($bools as xs:boolean*)
    {
      fold-right($bools, true(), function($arg1, $arg2) {$myAnd($arg1, $arg2)})
    }
return $allTrue((true(), true(), true(), true(), false()))
@michaelhkay, maybe a good opportunity for further optimizations in Saxon? |
@michaelhkay Wow. I always translated
@benibela Thanks, I was not aware of that. That may be yet another reason to choose
declare default function namespace 'whatever';
declare function for() { };
for()
@dnovatchev Thanks. We seem to be elaborating on different things: By shortcutting, I believe you mean that the “generating” and potentially expensive expression (i.e., the one that creates new results) will no longer be evaluated once a certain condition has been met. My objective is to really stop the iteration as soon as a condition succeeds for the first time. See e.g. the following expression:
fold-right(
  (((1 to 100000000000000000) ! true()), false()),
  true(),
  function($b1, $b2) { (if(not($b2)) then false() else trace($b1)) and $b2 }
)
If we added a fourth argument, the evaluation could really be interrupted:
fold-right(
  (((1 to 100000000000000000) ! true()), false()),
  true(),
  function($b1, $b2) { $b2 },
  function($item) { $item }
)
We can certainly simulate
let $stop := <STOP/>
let $result := fold-left((1 to 100), (), function($seq, $curr) {
  if($seq[last()][. instance of element()][. is $stop]) then (
    $seq
  ) else if($curr = 5) then (
    $seq, $stop
  ) else (
    $seq, $curr
  )
})
return $result[position() < last()]
…but if the input sequence is large, it may be very slow, and a recursive solution would be faster (provided that we don’t run into TCO issues). Apart from that, I think that my code is not very readable. Maybe I missed something, though? |
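The cost difference can be sketched in Python by counting callback invocations. The sentinel version uses a plain fold-left and must still visit all 100 items; `fold_left_while` is a hypothetical interruptible variant. Which value the extra predicate receives is still open in this thread, so this sketch assumes it sees the accumulator and the next item.

```python
from functools import reduce

STOP = object()  # plays the role of the <STOP/> element
calls = {"sentinel": 0, "interruptible": 0}


def step_sentinel(seq, curr):
    """Callback for a plain fold-left that simulates interruption with a sentinel."""
    calls["sentinel"] += 1
    if seq and seq[-1] is STOP:
        return seq            # already stopped: pass the accumulator through unchanged
    if curr == 5:
        return seq + [STOP]   # remember that the condition has been met
    return seq + [curr]


acc = reduce(step_sentinel, range(1, 101), [])
sentinel_result = acc[:-1]    # $result[position() < last()]


def fold_left_while(items, zero, f, keep_going):
    """Hypothetical interruptible fold-left; the predicate sees the accumulator and the next item."""
    result = zero
    for item in items:
        if not keep_going(result, item):
            break
        result = f(result, item)
    return result


def step_plain(seq, curr):
    calls["interruptible"] += 1
    return seq + [curr]


interruptible_result = fold_left_while(
    range(1, 101), [], step_plain, lambda seq, curr: curr != 5
)

print(sentinel_result, interruptible_result, calls)
# [1, 2, 3, 4] [1, 2, 3, 4] {'sentinel': 100, 'interruptible': 4}
```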
@ChristianGruen It depends on how we define "very slow". I am running the code below on BaseX (an excellent XPath/XQuery processor, deep thanks!) and it checks for Thus, for my practical purposes using This is on a sequence not even of 1M items, but of 10 million items... For practical purposes, it would be extremely rare to use sequences with even several hundred thousand items. Knowing this, I am inclined to use my knowledge about I believe that even if there were such constructs for a fold, implementing such a break would still require exiting all currently nested loops (although doing nothing else in these loops), so this would be comparable to what similarly happens with the evaluation of And it seems more "imperative", as someone noticed, as this is like the Without such a definition, every implementor is free to use their imagination of what "tail-recursive" might mean... The result of this might well be that code that runs "fast" on one implementation runs "slow" or even crashes on another. So, please, don't get me wrong, I would simply like to avoid all such complexities and ambiguities. |
The execution time will mostly depend on the action. Checking booleans is certainly fast. But performance is just one aspect. I know I asked before, but I’m still wondering how you would implement |
I think in general it's dangerous to worry too much about performance when designing a specification. The spec should be designed for usability not for performance. Dimitre has often asked for the spec to contain more rules about the implementation strategy (for example mandating tail recursion) and we have consistently resisted this. I think we are right to do so. Implementations should be free to innovate as much or as little as they wish, and to choose their own trade-offs in delivering performance. We've been learning recently how some of the micro-optimisations we make are counter-productive because they worsen branch-prediction rates at the hardware level, and reduce cache hit rates in the CPU. It's the job of implementors to understand those effects, not the job of specification writers. The XSLT streaming work, of course, was all about designing a specification that enabled a certain class of optimisation. While it's delivering value to users who have that particular requirement, I don't think it has improved the language for the majority, which I feel should be our objective. |
@michaelhkay Very interesting! I can imagine that both a new function and new FLWOR clauses can be useful. Maybe it’s a good idea to create a new issue for the proposed FLWOR enhancements? |
It seems you meant: "processes the first 999 text nodes"?
It seems there is no need to invent a new clause (
And, equally useful, we could have an
|
I’ll relocate parts of this comment if we decide to have an extra issue for I had some thoughts on the proposed FLWOR enhancements:
Below, I try to contrast the three variants: fn:while, an interruptible fn:fold-left, and an enhanced FLWOR expression.
A. Compute square root
let $RUNS := 1 to 20
let $INITIAL := 3936256
let $THEN := ->($item) { ($item + $INITIAL div $item) div 2 }
let $TEST := ->($item) { abs($item * $item - $INITIAL) >= 0.0000000001 }
return (
  (: fn:while :)
  while($INITIAL, $TEST, $THEN),
  (: interruptible fn:fold-left :)
  fold-left($RUNS, $INITIAL, ->($item, $_) { $THEN($item) }, $TEST),
  (: enhanced FLWOR :)
  for $_ in $RUNS
  with $item initially $INITIAL then $THEN($item)
  while $TEST($item)
  return $item
)
B. items-before, items-ending-with
(: items-before / items-ending-with :)
let $INPUT := 1 to 20
let $TEST := ->($item) { $item <= 5 }
return (
  (: fn:while :)
  while($INPUT, => { $TEST(foot(.)) }, => { truncate(.) }),
  (: interruptible fn:fold-left :)
  fold-left($INPUT, (), op(','), $TEST),
  (: enhanced FLWOR :)
  for $item in $INPUT
  with $sub initially $item then ($sub, $item)
  while $TEST($item)
  return $sub
)
C. all-true
(: check whether all items are true :)
let $INPUT := 1 to 20
let $INITIAL := true()
let $THEN := op('and')
let $TEST := boolean#1
return (
  (: interruptible fn:fold-left :)
  fold-left($INPUT, $INITIAL, $THEN, $TEST),
  (: enhanced FLWOR :)
  for $item in $INPUT
  with $all initially $INITIAL then $THEN($all, $item)
  while $TEST($all)
  return $all
)
D. Compute the product of a sequence of numbers
(: multiply all items :)
let $INPUT := 1 to 20
let $INITIAL := 1
let $THEN := op('*')
return (
  (: fn:fold-left :)
  fold-left($INPUT, $INITIAL, $THEN),
  (: enhanced FLWOR :)
  for $item in $INPUT
  with $result initially $INITIAL then $THEN($result, $item)
  return $result
)
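One way to picture the difference between the two variants of example D is a running accumulation: the enhanced FLWOR yields one value per tuple, whereas the fold returns only the last. The Python sketch below models the `with ... initially ... then` clause with `itertools.accumulate`; this mapping is an assumption about the proposed syntax, not a defined semantics.

```python
from functools import reduce
from itertools import accumulate
from operator import mul

items = range(1, 21)  # $INPUT := 1 to 20

# fold-left($INPUT, 1, op('*')): only the final value
product = reduce(mul, items, 1)

# Assumed reading of
#   for $item in $INPUT
#   with $result initially 1 then $result * $item
#   return $result
# as one value per tuple: drop the initial value, keep each running product.
# (accumulate's initial= keyword requires Python 3.8+.)
stream = list(accumulate(items, mul, initial=1))[1:]

print(len(stream), stream[:5], stream[-1] == product)  # 20 [1, 2, 6, 24, 120] True
```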
However, I can (in some way) understand purists who are brave enough to condemn the usage of folds, for/while/until loops and other iterative concepts in functional code. And @liamquin I think I can see what you mean (sorry if I interpret you wrongly): Iterative code enforces a certain way of thinking that derives from the procedural world, and adding further support for iterative built-in functions may push that thinking. |
I am sure @liamquin would not like this... the same way as I personally strongly dislike But rather than regarding such inventions as a crutch/kludge designed for people who are not confident they could grasp FP, as shown by @ChristianGruen in his comment putting them side by side: #80 (comment), one could regard them (the inventions) as just shorthand(s) for the proper FP recursive construct. Thus it would be a matter of taste which of the two variants (verbose vs. functional) to use, and everyone could be happy. The only thing I am afraid might happen is that numerous edge cases could emerge, and explaining them in the Spec would take so much space as to make the whole description unfathomable. |
It's frustrating that this forum is so one-dimensional - you can't comment on a comment. Christian was right, I should have started a new thread. The problem with the formulation
is that every time an item is processed, a boolean (the current value of $all) is emitted. It's a mistake to think of $all as a mutable variable. It's exactly like $item, it's a variable that is part of the tuple stream, and that has different values in different tuples. The only difference is that unlike most FLWOR clauses, there is visibility of the variables in the previous tuple. Of course Dimitre is right, this is a syntactic sweetener for those who find recursion and higher-order functions intellectually challenging. There are plenty of such people, and |
I thought that would be the case if
If using
or
It would not be a good use for In other words:
(: Expr 1 :)
for $item in $input
with $all initially true() then ($all and $item)
while $all
yield $all
would be a shorthand for:
(for $item in $input
 with $all initially true() then ($all and $item)
 while $all
 return $all
)[last()]
And because folds can generally produce not just a single item but a sequence on each step, it would be good for the final result to be an array, each member of which contains the result of the corresponding step in the evaluation. Then the expression for which we try to provide a more convenient shorthand, "Expr 1" (above), would be:
array:foot(
  (for $item in $input
   with $all initially true() then ($all and $item)
   while $all
   return $all
  )
)
And there is an even better expression than
(: Expr 2 :)
for $item in $input
with-accumulator $all initially true() then ($all and $item)
while $all
yield-accumulator |
I use xsl:iterate and teach it; I'd have preferred a name that suggested it was syntactic sugar for recursion. We should remember XQuery implementations in which the FLWOR tuple stream is not generated in the "intuitive" order. In the absence of order by, an early termination doesn't necessarily mean you got all the values you wanted - the implementation would thus have to generate them all. But then, I don't know that the RDBMS people are interested in XQuery or XPath these days. I like the idea of with-accumulator, as that's getting closer to looking like part of XSLT 3 but not in a way that might necessitate pervasive changes in XQuery implementations. |
Here’s a use case for
(: Find the first number in a sequence that is missing :)
let $values := (1 to 999, 1001 to 2000)
return while(1, -> { . = $values }, -> { . + 1 }) |
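A Python model of this use case, with a hypothetical `while_` helper standing in for the proposed function and a set standing in for `$values` (set membership mirrors the existential `=` comparison of XPath):

```python
def while_(value, test, action):
    """Model of the proposed fn:while (hypothetical helper name)."""
    while test(value):
        value = action(value)
    return value


values = set(range(1, 1000)) | set(range(1001, 2001))  # (1 to 999, 1001 to 2000)

# while(1, -> { . = $values }, -> { . + 1 })
missing = while_(1, lambda n: n in values, lambda n: n + 1)
print(missing)  # 1000
```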
@ChristianGruen and @michaelhkay Actually, if the operands of the function call are evaluated lazily (that is, only on demand), then
<xsl:function name="fn:fold-right" as="item()*">
  <xsl:param name="seq" as="item()*"/>
  <xsl:param name="zero" as="item()*"/>
  <xsl:param name="f" as="function(item(), item()*) as item()*"/>
  <xsl:choose>
    <xsl:when test="fn:empty($seq)">
      <xsl:sequence select="$zero"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$f(fn:head($seq), fn:fold-right(fn:tail($seq), $zero, $f))"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
Note this excerpt from the above:
<xsl:otherwise>
  <xsl:sequence select="$f(fn:head($seq), fn:fold-right(fn:tail($seq), $zero, $f))"/>
</xsl:otherwise>
This means that if $f is op('and') and its 2nd argument is evaluated lazily, then
fold-right((false(), (1 to 10000000000000000) ! true()), true(), $f)
will immediately produce We get an immediate shortcut and an evaluation time of ~0.
Conclusion: Let us specify that the arguments of a function call are evaluated lazily! |
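The lazy-argument idea can be modeled in Python by passing the second argument of `$f` as a zero-argument function (a thunk). This encoding, and the names `lazy_fold_right` and `lazy_and`, are assumptions for illustration; XPath has no such syntax. With `false()` as the first item, the fold returns without touching the rest of an unbounded input.

```python
from itertools import chain, repeat


def lazy_fold_right(items, zero, f):
    """fold-right where f receives its second argument as a thunk (hypothetical model).
    Each thunk consumes the next item of a shared iterator, so f must force it at most once."""
    it = iter(items)

    def rest():
        try:
            head = next(it)
        except StopIteration:
            return zero
        return f(head, rest)

    return rest()


def lazy_and(b1, thunk):
    # op('and') with a lazily evaluated second operand
    return b1 and thunk()


# (false(), (1 to <huge>) ! true()): an unbounded stream of true() after false()
print(lazy_fold_right(chain([False], repeat(True)), True, lazy_and))  # False
```

The recursion depth still grows with the number of items actually consumed, so a long run of true values would hit Python's recursion limit; laziness removes the wasted work, not the stack usage.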
@dnovatchev I don't believe implementation details should be defined in the specification. Moreover, there are good reasons to choose an iterative implementation for folds, as not all tasks can be implemented tail-recursively. How would you implement the use case from my last example with |
One more request, regarding the latest comments on fold-right and various past discussions in other issues: I think we should try to focus on the proposal that’s given in the initial comment. If a suggestion has been inspired by an existing issue, it can certainly be valuable as well, but I propose we either discuss it in the chat or create a new issue that references the old one. Otherwise, no one, not even those directly involved, will eventually read through all the comments and understand the rationale behind the outcome. |
I think the best course of action when this kind of thing happens is to close the issue and open a new issue (or several, if an independent idea has developed during the discussion) that make a new proposal, taking any useful feedback into account, and perhaps summarising your responses to comments made on the original proposal. |
Right, there will be a new, separate proposal for Lazy Evaluation of Arguments (LEA) of function calls. For now, again, the essence of this is that there is no need to detect that shortcutting has happened inside the evaluation of a So, given:
fold-right( (false(), (1 to 1000000000000000000000000000000000000) ! true() ), true(), op('and') )
this is evaluated as:
And due to the shortcutting of I have seen some people wonder: "why on Earth |
XQFO4: fn:while (qt4cg/qtspecs#80)
I have re-read the proposal as specified here: https://qt4cg.org/pr/210/xpath-functions-40/Overview.html#func-while I think that it would be good to have two functions named: I would recommend these names for the 2nd and 3rd arguments:
|
@dnovatchev Note that |
@ChristianGruen OK, but we must define this clearly in order to avoid any misunderstanding. Then will this be true:
|
We could certainly choose a behavior that differs from Haskell if there's a common agreement that it'll be more intuitive.
The current draft in #210 is based on the suggestions given by Michael Kay. My initial version can be found here: I'll be happy to revise the description and rule set in the current draft. |
Closed; |
fn:while → fn:iterate-while. qt4cg/qtspecs#80
Motivation
Similar to fold-left, the function allows for an alternative way of writing code that would otherwise be solved recursively, and that might cause stack overflows without tail call optimization. In contrast to sequence-processing functions (fold functions, for-each, filter, others), the initial input of fn:while can be arbitrary and does not determine the maximum number of iterations.
Summary
Applies the predicate function $test to $input. If the result is true, $action is invoked with the start value – or, subsequently, with the result of the previous invocation – until the predicate function returns false.
Signature
Edit: The $input argument (before: $zero) is now defined as the first parameter.
Examples / Use Cases
Calculate the square root of a number by iteratively improving an initial guess:
Find the first number that does not occur in a sequence:
Equivalent Expression