Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generators in XPath #716

Open
dnovatchev opened this issue Sep 20, 2023 · 79 comments
Open

Generators in XPath #716

dnovatchev opened this issue Sep 20, 2023 · 79 comments
Labels
Discussion A discussion on a general topic. Feature A change that introduces a new feature PRG-optional Categorized as "optional for 4.0" at the Prague f2f, 2024 Propose for V4.0 The WG should consider this item critical to 4.0 XPath An issue related to XPath XQFO An issue related to Functions and Operators

Comments

@dnovatchev
Copy link
Contributor

dnovatchev commented Sep 20, 2023

What is a generator?

Generators are well known and provided out of the box in many programming languages. Per Wikipedia:

“In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators.[1] A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values.
However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.”


The goal of this proposal (major use-cases)

A generator in XPath should be a tool to easily implement the solutions to the following use-cases:

  1. Processing a huge collection whose members may not all be needed.
    A generator will produce only the next member of the collection and only on demand basis.

  2. Handling a collection containing unknown or infinite number of members.
    When requested the next member of the collection the generator will always produce it, if the collection still contains any members. It is the responsibility of the caller to issue only the necessary number of requests for the really needed next members.

What is achieved in both cases:

  • A (next) member is produced only on request. No time is spent on producing all members of the collection.
  • A (next) member is produced only on request. No memory is consumed to store all members of the collection.

A good problem that is based on these use-cases is to generate a collection of the first N members that have some wanted properties, and are generated from other collection(s), when it is not known what the size of the original input collections would be in order for the desired number of N members to be discovered.

For example: Produce the first 1 000 000 (1M) prime numbers.

Sometimes we may not even know if N such wanted members actually exist, for example: Produce the first 2 sequences of 28 prime numbers where the primes in each of the sequences form an arithmetic progression.


The Proposal

A generator is defined as (and synonym for):

let $generator as record
                   (initialized as xs:boolean,
                    endReached as xs:boolean,
                    getCurrent as function(..) as item()*,
                    moveNext as function(..) as .. ,
                    *  ) 

A generator is an extensible record .

It has four fixed-named keys, and any other map-keys, as required to hold the internal state of that specific generator.

Here is the meaning of the four fixed/named keys:

  • initialized is a boolean. When a generator $gen is initially instantiated, $gen?initialized is false(). Any call to $gen?getCurrent() raises an error. In order to get the first value of the represented collection, the caller must call $gen?moveNext()

  • endReached is a boolean. If after a call to moveNext() the value of the returned generator's endReached key is true() then calling moveNext() and/or getCurrent() on this generator raises an error.

  • getCurrent is a function of zero arguments. It must only be called if the values of initialized is true() and the value of endReached is false(), otherwise an error must be raised. This function produces the current member of the collection after the last call to moveNext, if this call didn't return a generator whose endReached value was true()

  • moveNext is a function of zero arguments. When called on a generator whose endReached value is false() then it produces the next (state of the) generator. including a possibly true() value of endReached and if this value is still false(), then calling getCurrent() produces the value of the next member of the collection.

Examples of operations on generators

The following examples are written in pseudo-code as at the time of writing there was no available implementation of records. And also, the code for recursion in pure XPath makes any such example longer than necessary for grasping its meaning.

The Empty Generator

emptyGenerator() {
             map{
                 initialized : true(),
                 endReached: true(),
                 getCurrent: function($this as map(*)) {error()},
                 moveNext: function($this as map(*)) {error()}
                }
              }

Take the first N members of the collection

take($gen as generator, $n as xs:integer) as generator
{
  let $gen := if(not($gen?initialized)) then $gen?moveNext()
                      else $gen,
      return
         if( $gen?endReached or $n eq 0) then emptyGenerator()
            else map:put($gen, "moveNext", take($gen?moveNext($gen) ), $n -1  )
}

Skip the first N members from the collection

skip($gen as generator, $n as xs:integer) as generator
{
  if($n eq 0) then $gen
     else
     {
         let $gen := if(not($gen?initialized)) then $gen?moveNext()
                       else $gen
           return
             if(not($gen?endReached) then skip($gen?moveNext(), $n -1)
               else $gen
     }             
}

Subrange of size N starting at the M-th member

subrange($gen as generator, $m as xs:integer, $n as xs:integer) as generator
{
  take(skip($gen, $m -1), $n)
}

Head of a generator

head($gen as generator)
{
  take($gen, 1)?getCurrent()
}

Tail of a generator

tail($gen as generator)
{
  skip($gen, 1)
}

At index N

at($ind as xs:integer)
{
  subrange($ind, 1)?getCurrent()
}

For Each

for-each($gen as generator, $fun as function(*))
{
   map:put($gen, "getCurrent", function() { $fun($gen?getCurrent()) }  )                              
}

For Each Pair

for-each-pair($gen1 as generator, $gen2 as generator, $fun as function(*))
{
   let $gen1 := if(not($gen1?initialized)) then $gen1?moveNext()
                  else $gen1,
       $gen2 := if(not($gen2?initialized)) then $gen2?moveNext()
                  else $gen2,
    return
      if($gen1?endReached or $gen2?endReached) then map:put($gen1, "endReached", true())
        else map:put(map:put($gen1, "getCurrent", function() { $fun($gen1?getCurrent(), $gen2?getCurrent()) } ) ,
                     "moveNext", function() { for-each-pair(skip($gen1, 1), skip($gen2, 1), $fun)}
                    )                             
}

Filter

filter($gen as generator, $pred as function(item()*) as xs:boolean)
{
     let $getNextGoodValue := function($gen as map(*), $pred as function(item()*) as xs:boolean)
         {
            let $mapResult := iterate-while(
                                            $gen,
                                            function($gen) { not($pred($gen?getCurrent($gen))) },
                                            function($gen) { $gen?moveNext($gen) }
                                            )   
            return $mapResult?getCurrent($mapResult)                     
         },
       $gen := if($gen?initialized) then $gen 
                      else $gen?moveNext($gen)
        return
          map {
               "initialized": true(),
               "endReached":  $gen?endReached,
               "getCurrent": function($this as map(*)) { $getNextGoodValue($this?inputGen, $pred) },
               "moveNext":   function($this as map(*))
                             {    let $nextGoodValue := $getNextGoodValue($this?inputGen?moveNext($this?inputGen), $pred),
                                      $nextGen := iterate-while(
                                                                $this?inputGen?moveNext($this?inputGen),
                                                                function($gen) { not($pred($gen?getCurrent($gen))) },
                                                                function($gen) { $gen?moveNext($gen) }
                                                                )
                                    return
                                      map {
                                            "initialized": $nextGen?initialized,
                                            "endReached":  $nextGen?endReached,
                                            "getCurrent" : function($x) {$nextGoodValue},
                                            "moveNext" :   $this?moveNext,
                                            "inputGen" :   $nextGen
                                           }
                             },
               "inputGen" : $gen
              }
        }

Here are some other useful functions on generators -- with just their signature and summary:

  • concat($gen1 as generator , $gen2 as generator ) - produces a generator that behaves as $gen1 until $gen1.endReached becomes true(), and then behaves as $gen2

  • append($gen as generator, $value as item()*) - produces a generator that behaves as $gen until $gen.endReached becomes true(), and then as a generator that has only the single value value.

  • prepend($gen as generator, $value as item()*) - produces a generator whose first value is value and then behaves as $gen.

  • some($gen as generator) as xs:boolean - Produces true() if $gen has at least one value, and false() otherwise.

  • some($gen as generator, $pred as function(item()*) as xs:boolean) as xs:boolean - Produces true() if $gen has at least one value for which $pred($thisValue) is true(), and false() otherwise.

  • ofType($gen as generator, $type as type) - Produces a new generator from $gen that contains all values from $gen of type type -- for this we need to have added to the language the type object.

  • skipWhile($gen as generator, $pred as function(item()*) as xs:boolean) - Produces a new generator from $gen by skipping all starting values for which $pred($theValue) is true().

  • takeWhile($gen as generator, $pred as function(item()*) as xs:boolean) - Produces a new generator from $gen which contains all starting values of $gen for which $pred($theValue) is true().

  • toArray($gen as generator) - Produces an array that contains all values that are contained in $gen.

  • toSequence($gen as generator) - Produces a sequence that contains all values that are contained in $gen. Values of $gen that are sequences themselves are flattened.

  • toMap($gen as generator) - If the values in $gen are all key-value pairs, produces a map that contains exactly all the key-value pairs from $gen.


These and many other useful functions on generators can and should be added to every generator upon construction.

Thus, it would be good to have an explicit constructor function for a generator:

     construct-generator($record as 
                               record( initialized as xs:boolean,
                                       endReached as xs:boolean,
                                       getCurrent as function(..) as item()*,
                                       moveNext as function(..) as .. ,
                                      )
                         ) as generator
@dnovatchev dnovatchev added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature Propose for V4.0 The WG should consider this item critical to 4.0 labels Sep 20, 2023
@michaelhkay
Copy link
Contributor

michaelhkay commented Sep 20, 2023

Thanks, that looks like a very coherent proposal, and it's easy to understand as a generalisation of the design pattern used by fn:random-number-generator.

Looking at it again, though, I wonder what we're actually proposing adding to the spec? Is it a set of functions like skip() and take() plus a definition of the record type generator that they operate on, plus some kind of user guide on how to implement your own generators?

In my own experience I've found that the easiest way to use fn:random-number-generator() is with xsl:iterate, though presumable fn:iterate-while() will also work well. Examples that show the interaction of these capabilities would be useful.

It would be good to see some examples that show how the concept can be applied to processing of XML and JSON. Perhaps there's some kind of possibility of a function like stream(uri-of-xml-document, simple-pattern) that delivers a generator whose delivered items are subtrees of the XML document that satisfy the simple-pattern, where the simple-pattern is some kind of restricted predicate that can test nodes against local conditions like their name, attributes, and ancestry.

@michaelhkay
Copy link
Contributor

Note also issue #295, which points out the need for extensions to the record-type syntax to allow record types of the kind you are using, with self-reference in function arguments and result.

@michaelhkay
Copy link
Contributor

michaelhkay commented Sep 20, 2023

I guess two more useful functions on generators might be for-each() and filter() that return new generators.

It would seem appealing to make the functions skip(), take(), for-each(), filter() etc be accessible as fields of the generator itself, so we can write $my-generator?take(10). But since the generator is user-written, that requires some kind of inheritance mechanism so someone implementing a generator gets these methods "for free". Perhaps this is a bridge too far... If we provide free-standing functions like generator:take($generator, $count) then it's easy enough for the user implementing a generator to wrap these as methods on the generator if they choose.

Another thought is whether and how one might allow the internal state of a generator to be private.

@dnovatchev
Copy link
Contributor Author

Thanks, that looks like a very coherent proposal, and it's easy to understand as a generalisation of the design pattern used by fn:random-number-generator.

Looking at it again, though, I wonder what we're actually proposing adding to the spec? Is it a set of functions like skip() and take() plus a definition of the record type generator that they operate on, plus some kind of user guide on how to implement your own generators?

This is the minimum we can do.

We could also think of extending the definition of some of the standard functions that operate on sequences and on arrays, to accept as first argument anything that implements head and tail. If this can be done, then we will not need to add new overloads for these functions at all.

In my own experience I've found that the easiest way to use fn:random-number-generator() is with xsl:iterate, though presumable fn:iterate-while() will also work well. Examples that show the interaction of these capabilities would be useful.

Exactly. A generator goes well as argument to any function that has an accumulator-like argument that will contain the latest generator, produced in the latest iteration (or latest recursive call), such as folds, _iterate-XXX_, _while-XXX_

And generally being passed as one of the arguments of a call to a recursive function, as showed in the examples in the proposal.

It would be good to see some examples that show how the concept can be applied to processing of XML and JSON. Perhaps there's some kind of possibility of a function like stream(uri-of-xml-document, simple-pattern) that delivers a generator whose delivered items are subtrees of the XML document that satisfy the simple-pattern, where the simple-pattern is some kind of restricted predicate that can test nodes against local conditions like their name, attributes, and ancestry.

Yes, I have a few examples in mind, but based on the feedback (including yours) from my previous/other issues, I wanted to keep this one as simple and short as possible.

Do note that a generator can encapsulate any kind of collection - not only a sequence or an array, but also maps, trees, XML documents, infinite or of unknown size collections, ... , you name it. The same goes to the traversal strategy that a generator can use Breadth-First, Dept-First, left-to-right, right-to-left, oscillating, ... , again you name it.

@dnovatchev
Copy link
Contributor Author

I guess two more useful functions on generators might be for-each() and filter() that return new generators.

It would seem appealing to make the functions skip(), take(), for-each(), filter() etc be accessible as fields of the generator itself, so we can write $my-generator?take(10). But since the generator is user-written, that requires some kind of inheritance mechanism so someone implementing a generator gets these methods "for free". Perhaps this is a bridge too far... If we provide free-standing functions like generator:take($generator, $count) then it's easy enough for the user implementing a generator to wrap these as methods on the generator if they choose.

Yes. These can easily be implemented as standard (universally available) decorators (or just a single decorator adding all of these functions to the result-generator) on a generator, thus the author of a generator doesn't have to implement them at all.

Such a decorator takes a generator and returns another generator that is the result of adding to the original one the missing functions.

Another thought is whether and how one might allow the internal state of a generator to be private.

Right now there is no concept of private in XPath.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 20, 2023

I guess two more useful functions on generators might be for-each() and filter() that return new generators.

Here are the two additional requested functions on generators: for-each and filter, plus two additional: for-each-pair
and at.

I am also placing them into the main proposal:

At index N

at($ind as xs:integer)
{
  subrange($ind, 1)
}

For Each

for-each($gen as generator, $fun as function(*))
{
   map:put($gen, "getCurrent", function() { $fun($gen?getCurrent()) }  )                              
}

For Each Pair

for-each-pair($gen1 as generator, $gen2 as generator, $fun as function(*))
{
   let $gen1 := if(not($gen1?initialized)) then $gen1?moveNext()
                  else $gen1,
       $gen2 := if(not($gen2?initialized)) then $gen2?moveNext()
                  else $gen2,
    return
      if($gen1?endReached or $gen2?endReached) then map:put($gen1, "endReached", true())
        else map:put(map:put($gen1, "getCurrent", function() { $fun($gen1?getCurrent(), $gen2?getCurrent()) } ) ,
                     "moveNext", function() { for-each-pair(skip($gen1, 1), skip($gen2, 1), $fun)}
                    )                             
}

Filter

filter($gen as generator, $pred as function(item()*) as xs:boolean)
{
  let $gen := if(not($gen?initialized)) then $gen?moveNext()
                else $gen,
      $moveNext := function() { filter($gen?moveNext(), $pred) }
      return if($gen?endReached or $pred($gen?getCurrent())) 
               then map:put($gen, "moveNext", $moveNext)
               else filter($gen?moveNext(), $pred)
}

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 23, 2023

Update:

Here is a more precise, non-(explicitly)recursive code for filter acting on a generator.

Do note:

  • The code below is valid and executable XPath 4.0 code.

  • It demonstrates that fn:iterate-while can work flawlessly with generator-input, without needing any modifications or new overloads

  • The code creating the filtered-generator doesn't know (and doesn't use) anything about the internal state of the input generator.

  • The produced generator that performs the filtering and presents its results one at a time, has as its own, internal state the key named "inputGen" and its value, which can be any suffix of the input-generator, such that the head of this suffix is satisfying the predicate $pred.

Filter

filter($gen as generator, $pred as function(item()*) as xs:boolean)
{
     let $getNextGoodValue := function($gen as map(*), $pred as function(item()*) as xs:boolean)
         {
            let $mapResult := iterate-while(
                                            $gen,
                                            function($gen) { not($pred($gen?getCurrent($gen))) },
                                            function($gen) { $gen?moveNext($gen) }
                                            )   
            return $mapResult?getCurrent($mapResult)                     
         },
       $gen := if($gen?initialized) then $gen 
                      else $gen?moveNext($gen)
        return
          map {
               "initialized": true(),
               "endReached":  $gen?endReached,
               "getCurrent": function($this as map(*)) { $getNextGoodValue($this?inputGen, $pred) },
               "moveNext":   function($this as map(*))
                             {    let $nextGoodValue := $getNextGoodValue($this?inputGen?moveNext($this?inputGen), $pred),
                                      $nextGen := iterate-while(
                                                                $this?inputGen?moveNext($this?inputGen),
                                                                function($gen) { not($pred($gen?getCurrent($gen))) },
                                                                function($gen) { $gen?moveNext($gen) }
                                                                )
                                    return
                                      map {
                                            "initialized": $nextGen?initialized,
                                            "endReached":  $nextGen?endReached,
                                            "getCurrent" : function($x) {$nextGoodValue},
                                            "moveNext" :   $this?moveNext,
                                            "inputGen" :   $nextGen
                                           }
                             },
               "inputGen" : $gen
              }
        }

Below is a complete XPath 4.0 expression which, besides the above definitions, produces a specific filtered generator, whose input is the generator of integers from 2 to Infinity, and the predicate "approves" any odd number. The first 999 members of this generator are materialized as the result of this expression:

let $gen2ToInf :=
       map{
            "initialized": true(),
            "endReached": false(),
            "getCurrent": function($this as map(*)) { $this?last + 1},
            "moveNext":  function($this as map(*))
                         {
                           if(not($this?initialized)) then map:put($this, "initialized", true())
                             else map:put($this, "last", $this?last +1)
                         },
            "last" : 1 
           },           
   $getNextGoodValue := function($gen as map(*), $pred as function(item()*) as xs:boolean)
                        {
                            let $mapResult := iterate-while(
                                                            $gen,
                                                            function($gen) { not($pred($gen?getCurrent($gen))) },
                                                            function($gen) { $gen?moveNext($gen) }
                                                            )   
                            return $mapResult?getCurrent($mapResult)                     
                        },           
   $filter := function($gen as map(*), $pred as function(item()*) as xs:boolean)
              {
                     let $gen := if($gen?initialized) then $gen 
                                    else $gen?moveNext($gen)
                      return
                        map {
                             "initialized": true(),
                             "endReached":  $gen?endReached,
                             "getCurrent": function($this as map(*)) { $getNextGoodValue($this?inputGen, $pred) },
                             "moveNext":   function($this as map(*))
                           {    let $nextGoodValue := $getNextGoodValue($this?inputGen?moveNext($this?inputGen), $pred),
                                    $nextGen := iterate-while(
                                                              $this?inputGen?moveNext($this?inputGen),
                                                              function($gen) { not($pred($gen?getCurrent($gen))) },
                                                              function($gen) { $gen?moveNext($gen) }
                                                              )
                                  return
                                    map {
                                          "initialized": $nextGen?initialized,
                                          "endReached":  $nextGen?endReached,
                                          "getCurrent" : function($x) {$nextGoodValue},
                                          "moveNext" :   $this?moveNext,
                                          "inputGen" :   $nextGen
                                         }
                           },
                             "inputGen" : $gen
                            }
                      }
   return
      fold-left(1 to 999, [$filter($gen2ToInf, fn($x) {$x mod 2 eq 1} ), []], 
                fn($accum, $mem) 
                {
                  let $nextGen := $accum(1)?moveNext($accum(1))
                   return  [$nextGen, array:append($accum(2), $nextGen?getCurrent($nextGen))]
                }
               )
               (2)

Anyone can execute this code with BaseX ver. 10.8 and produce the result:

image

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 23, 2023

This is an open question to the knowledgeable reader:

Can we extend the type of arguments of any currently existing standard XPath functions (and operators) that are (the argument types) sequence, array or map, to a more generalized type that is the union type of these types and the generator type?

This will result in immediately having these standard functions become much more powerful and encompassing, without impacting backwards compatibility in any existing pre- XPath 4.0 code, and no need for additional overloads.

@michaelhkay
Copy link
Contributor

Can we extend...

Not if the argument type is sequence, because there is no type that i more general than sequence, so the union of sequence with any other type is sequence.

When we introduced arrays, we spent at least a year on trying to devise a solution in which sequences and arrays were subtypes of something more general. Sadly, it proved too difficult.

@michaelhkay
Copy link
Contributor

One way forward might be to use annotations as a sort of interface/prototype mechanism. We could say, for example, that if $X is annotated %prototype=map{"count" : array:size#1, "head" : array:head#1, ...) then '%count($X)returnsarray:size($X)and%head($X)returnsarray:head($X)`. But working out all the details would be challenging, and the final result might be quite confusing to users.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 24, 2023

One way forward might be to use annotations as a sort of interface/prototype mechanism. We could say, for example, that if $X is annotated %prototype=map{"count" : array:size#1, "head" : array:head#1, ...) then '%count($X)returnsarray:size($X)and%head($X)returnsarray:head($X)`. But working out all the details would be challenging, and the final result might be quite confusing to users.

I find annotations rather messy.

A better and well-tested in practice approach is to say that everything is an object. An object has a value and some other properties, like collection-type whose values specify what type of collection the object?value is, and this can be one of : map, array, generator, sequence .

This is in full accordance with #105 following which every value can be represented as a map that has no regular keys and only a default key whose value is the value. We can add a special key named object-properties to this map, which could be something that is not allowed as a regular key until XPath 4, and have all object properties (among them collection-type ) as properties of the map that is the value of the key object-properties.

Any object has a set of standard properties, and can have its own-properties.

Any value (item(*)) can give us its containing object as the result of evaluating a new function:

object($value as item()*)

I know this sounds rather complicated.

For the purpose of the current discussion, we could have a simpler solution, where we have type-hints such as:

no-sequence(ExprProducingSingleItem)

telling the processor that the value of ExprProducingSingleItem must not be regarded as a sequence.

In this way, legacy code that isn't aware of the type-hints will continue to treat ExprProducingSingleItem generally as a sequence of one item.

But if an argument is passed with the no-sequence hint, its collection-type would be one of the remaining known collection-types (array, map, generator)

@dnovatchev
Copy link
Contributor Author

For the purpose of the current discussion, we could have a simpler solution, where we have type-hints such as:

no-sequence(ExprProducingSingleItem)

telling the processor that the value of ExprProducingSingleItem must not be regarded as a sequence.

In this way, legacy code that isn't aware of the type-hints will continue to treat ExprProducingSingleItem generally as a sequence of one item.

But if an argument is passed with the no-sequence hint, its collection-type would be one of the remaining known collection-types (array, map, generator)

We could simply use as a hint surrounding an expression (that evaluates to a single, non-sequence-collectionItem (array, map or generator) by back-slashes:

\$myGenerator\

This means:

Dear processor, use $myGenerator not as a sequence of one item, but as the generator of the input collection on which we want to apply the function.

Thus,

tail(\[1, 2, 3]\)

will produce:

[2, 3]

and will not produce
()

In this way we can have and use only one single set of collection-processing functions, regardless of the type of the collection - be it a sequence, an array, a map or a generator.

@michaelhkay
Copy link
Contributor

michaelhkay commented Sep 24, 2023

It seems that the proposed backslash-notation is not defining \[1, 2, 3]\ as an expression, but rather as a modifier of the syntax of a static function call which somehow changes the semantics of the function call, in effect causing the function name to bind to a different function with a different signature? This all seems very lacking in orthogonality.

If annotations are messy, this is downright disheveled.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Sep 24, 2023

It seems that the proposed backslash-notation is not defining \[1, 2, 3]\ as an expression, but rather as a modifier of the syntax of a static function call which somehow changes the semantics of the function call, in effect causing the function name to bind to a different function with a different signature? This all seems very lacking in orthogonality.

If annotations are messy, this is downright disheveled.

No, not different functions that have different signature but just one function.

Exactly the opposite: this is a single function that accepts a generator. If using \ is visually displeasing, let us have:

  • sequence-generator([1, 2, 3]) This presents the item [1, 2, 3] as a generator, calling whose getCurrent() method produces just one item (after moveNext() is called) - the whole array.
  • array-generator([1, 2, 3]) This presents the item [1, 2, 3] as a generator, calling whose getCurrent() method produces 1, 2, and 3 on each call (after moveNext() is called).
  • map-generator(map{"a": 1, "b": 2}) This presents the item map{"a": 1, "b": 2} as a generator, calling whose getCurrent() method produces each of the map's two key-value-pairs (after moveNext() is called).
  • $someGenerator - this must be a generator.

We can think of the three types: sequence-generator, array-generator and map-generator as distinct (non-overlapping) subtypes of the generator type.

Updated with Conclusion:

Thus, we can have just a single set of collection-processing functions, that can take as input any type of collection: be it a sequence, a singleton-array, a singleton-map, or any other collection represented by a generator.

@ChristianGruen ChristianGruen added Discussion A discussion on a general topic. and removed Feature A change that introduces a new feature labels Nov 8, 2023
@ChristianGruen
Copy link
Contributor

Closely related (both issues should be discussed together): #708.

@dnovatchev
Copy link
Contributor Author

Closely related (both issues should be discussed together): #708.

These are not "both issues". This is the same issue, which in #708 is just mentioned and touched on the surface, while here we have done a deep analysis and a full design and proposal.

So, to be clear: there is one issue and there is one complete proposal.

@ChristianGruen
Copy link
Contributor

So, to be clear: there is one issue and there is one complete proposal.

This wasn’t meant to be offensive, Dimitre.

@dnovatchev dnovatchev added the Feature A change that introduces a new feature label Nov 8, 2023
@dnovatchev
Copy link
Contributor Author

dnovatchev commented Nov 14, 2023

Thanks to all for today's discussion.

To address @rhdunn 's statement that the boolean initialized member is not necessary, and that no such property is there in the IEnumerator interface in .NET, here is a quote of the Microsoft's documentaion:

"Current is undefined under any of the following conditions:

  • The enumerator is positioned before the first element in the collection, immediately after the enumerator is created. MoveNext must be called to advance the enumerator to the first element of the collection before reading the value of Current."

Thus, it is clearly stated that immediately after the enumerator is created an (in fact initialization) action is needed (calling the MoveNext method) so that the enumerator will be able to expose a correct value of its Current property.

In the current proposal the initialized member serves exactly the purpose of indicating whether or not a call to moveNext is necessary in order to ensure that a call to getCurrent can produce a correct result.

Thus, rather than mentioning this important fact as something minor in the Remarks, here we choose to have a property that explicitly shows the state of the generator -- being either initialized or non-initialized. Operating on a uninitialized generator can no longer "go unnoticed" or be blamed on the caller. Instead, initialization (calling moveNext) can always be done whenever initialized is false.

@rhdunn
Copy link
Contributor

rhdunn commented Nov 14, 2023

That just means that in order to traverse an IEnumerator, you need to write:

    IEnumerator<T> enumerator = createGenerator();
    while (enumerator.MoveNext()) {
        T current = enumerator.Current;
        // ...
    }

This is similar to how Java Iterator works, where you write:

    Iterator<T> iterator = createGenerator();
    while (iterator.hasNext()) {
        T next = iterator.next();
        // ...
    }

Using Kotlin, you can implement a NodeList Java Iterator as:

class NodeListIterator(val nodes: NodeList) : Iterator<Node> {
    private var current: Int = 0

    // Returns true if there are more elements. Does not advance the iterator.
    override fun hasNext(): Boolean = current != nodes.length

    // Returns the next element. Advances the iterator.
    override fun next(): Node = nodes.item(current++)
}

In C# -- because of IEnumerable semantics -- this would be something like:

class NodeListIterator : IEnumerable {
    NodeList nodes;
    int current = -1; // Positioned before the first element.

    public NodeListIterator(NodeList nodes) {
        this.nodes = nodes;
    }

    // Move to the next item. Adances the iterator.
    public bool MoveNext() {
        return position++ < nodes.Length;
    }

    // Get the current item. Does not advance the iterator.
    public Node Current {
        get { return nodes.Item(current); }
    }
}

C# has different semantics to Java because the Current item is a property and is named Current. This allows a user to access Current multiple times in the function without moving the iterator. In Java you would need to place the result of next() in a temporary variable to avoid issues with multiple access.

@michaelhkay
Copy link
Contributor

C# has different semantics to Java because the Current item is a property and is named Current. This allows a user to access Current multiple times in the function without moving the iterator. In Java you would need to place the result of next() in a temporary variable to avoid issues with multiple access.

The differences run deeper than this (which makes transpiling Java to C# painful!). In C#, MoveNext changes the state, while in Java, hasNext() must be stateless - you can call it multiple times before a call on next().

@dnovatchev
Copy link
Contributor Author

Thanks Mike and Reece!

Anyway, having an explicit initialization-state property is something helpful for all developers here and this is its intended goal.

@rhdunn
Copy link
Contributor

rhdunn commented Nov 14, 2023

I think it would be useful to try and write different generator functions that cover the different expected uses/patterns so we can work out what properties and associated semantics are needed. That way, we would know if an initialized property is useful or if it is not needed.

@michaelhkay
Copy link
Contributor

In short, a generator looks like a function but behaves like an iterator.”

Looking at this thread again, what's being described here looks much more like an iterator than a generator. To me the essential characteristic of a generator is that it's written like a coroutine; a function that yields result items one at a time.

@MarkNicholls
Copy link

As head is defined to return the 1st element of a list, then an empty list is not in the domain.
listToMaybe is defined to return the 1st element of a list if there is one, and none if there isn't.

both are valid, but I tend to avoid non total functions if I can...though i suspect there is some deep categorical reason haskell does what it does.

I'm not used to the edge cases of the weirdness of "()"....I'll have to think about it...

I think in my example these arent quite lists it may be

([],true) vs ([()],false)

it reminds me of oracle and all the null weirdness

@ChristianGruen
Copy link
Contributor

I'm not used to the edge cases of the weirdness of "()"....I'll have to think about it...

One advantage of XDM arrays is exactly that a member can be an empty sequence. If you’re working with flat data, sequences are usually a better choice. Even with nested data, you can turn it around and organize your data as a sequence of arrays (instead of an array that contains sequences as members). If you do so, you can use fn:head and fn:empty as planned.

@MarkNicholls
Copy link

I came across this issue the other day (and posted it on stack overflow)

I wanted to evaluate an unbounded sequence lazily, and failed.

I did though create unbounded lazy lists (actually technically it isn't 'lazy' its 'deferred'), i.e. like IEnumberable in dotnet, rather than a haskell list, though it IS a functional list, which IEnumerable isnt. This construction is much nicer than the one I wrote previously in this thread which needed an array to construct the "iterator", which defeats the object if the array is eager.

here's an example

it creates a lazylist of all the natural numbers, unbounded, takes the first 100 of them, and then dumps them into a sequence.

things to note
i) its not tail recursive
ii) toSequence may well do something o(N^2) - toSequence isnt inherent though, its just to allow me to output the result, and I cant see how to make sequences behave nicely.
iii) its the "initial encoding", i.e. based on modelling a data type in maps.
iv) its a translation of effectively the inherent mathematical model...it may still be a bad translation though.
v) i.e. its based on fold and unfold, which are pretty mechanical constructs once you have defined the underlying data type, in some sense the code writes itself.
vi) There quite a nice paper about folds & unfolds origami
and looking at it now, the code is pretty much a translation of much of what is in the paper.
vii) list unfold is how haskell defines 'iterate', so its a pretty standard model (unfold is pretty ubiquitious nowadays in C#, F# etc).

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
declare namespace lazyList = "http://kookerella.com/lazyList";

declare function lazyList:empty() as map(*) {
    map {
        'name' : 'empty'
    }
};

declare function lazyList:cons($head as item(),$lazyTail as function() as map(*)) as map(*) {
    map {
        'name' : 'cons',
        'head' : $head,
        'lazyTail' : $lazyTail
    }
};

(: curried fold : (state -> a -> state) -> state -> LazyList a -> state  :)
(: uncurried fold : ((state,a) -> state),state,LazyList a) -> state  :)
declare function lazyList:fold($folder as function(item()*,item()) as item()*,$state as item()*,$lasyList as map(*)) {
    if (map:get($lasyList,'name') = 'empty') then
        $state
    else
        let $head := map:get($lasyList,'head')
        let $newState := $folder($state,$head)
        let $newLazyList := map:get($lasyList,'lazyTail')()
        return lazyList:fold($folder,$newState,$newLazyList)
};

(: curried unfold : (state -> Maybe (a,state)) -> state -> lazyList a :)
(: we use an array to model maybe :)
declare function lazyList:unfold($unfolder as function(item()) as array(*),$state as item()) as map(*) {
    let $nextMaybe := $unfolder($state)
    return
        if (array:size($nextMaybe) = 0) then
            lazyList:empty()
        else
            let $lazyTail := function () {
                lazyList:unfold($unfolder,array:get($nextMaybe,2))
            }        
            return lazyList:cons(array:get($nextMaybe,1),$lazyTail)
};

declare function lazyList:toSequence($lazyList as map(*)) as item()* {
    let $folder := function($state,$a) { ($state,$a) }
    return lazyList:fold($folder,(),$lazyList)
};

declare function lazyList:take($n as xs:integer,$lazyList as map(*)) as map(*) {
    if ($n = 0) then
        lazyList:empty()
    else
        let $head := map:get($lazyList,'head')
        let $lazyTail := function() {
            lazyList:take($n - 1,map:get($lazyList,'lazyTail')())
        }
        return lazyList:cons($head,$lazyTail)
};

declare function local:naturalUnfolder($state as xs:integer) {
    array { $state,$state + 1 }
};

let $naturalNumbers := lazyList:unfold(local:naturalUnfolder#1,0),
    $top100 := lazyList:take(100,$naturalNumbers)

return lazyList:toSequence($top100)

@MarkNicholls
Copy link

MarkNicholls commented May 15, 2024

@ChristianGruen
Thanks for your answer (indirectly via MH) to my starter question on SO....the above really was the general problem, except I wanted an 'unfold' for 'sequence', so I had to write this to define the problem 'lazily' and then hop back into 'sequence' to output the result. I can make this 'toSequence' tail recursive, but I don't think I can do 'toSequence' of the unbounded $naturalNumbers without it blowing up.

@dnovatchev
Copy link
Contributor Author

@MarkNicholls
Sorry, I looked at the SO question and I cannot understand what actually is wanted?

There doesn't seem to be a problem defined (that needs to be solved) - just an attempted solution at some problem, whose definition is not shown.

In case you define this problem to us, I will be happy to see if this is related to the topic of Generators, and if so, to provide a solution.

Aren't the functions currently defined on Generator sufficient for solving your problem:

take, skip, subrange, head, tail, at, for-each, for-each-pair, filter, concat, append, prepend, some, ofType, skipWhile. takeWhile, toArray, toSequence, toMap

Or do you have difficulties deciding how to define your Generator?

@dnovatchev
Copy link
Contributor Author

@MarkNicholls,

If I understand the problem, you want to get the Nth item from a possibly infinite sequence of items.

Below is a pure XPath 3.1 solution using an implemented generator that represents the infinite sequence of natural numbers: [1, infinity)

While this may seem a little bit long, please, note that all involved functions are standard generator functions, so the user will simply write this:

$at($gen1ToInf, 1000000)

Here is the XPath 3.1 expression of this - where all these standard generator functions have been implemented:

let $put := function($map as map(*), $key as xs:anyAtomicType, $value as item()*) as map(*)
             { Q{http://www.w3.org/2005/xpath-functions/map}put($map, $key, $value) },
     $gen1ToInf :=
       map{
            "initialized": true(),
            "endReached": false(),
            "getCurrent": function($this as map(*)) { $this?last + 1},
            "moveNext":  function($this as map(*))
                         {
                           if(not($this?initialized)) then $put($this, "initialized", true())
                             else $put($this, "last", $this?last +1)
                         },
            "last" : 0 
           },
   $emptyGenerator := function()
   {
       map{
           "initialized" : true(),
           "endReached": true(),
           "getCurrent": function($this as map(*)) {error()},
           "moveNext": function($this as map(*)) {error()}
          }     
   },
   
   $skip-helper := function($gen as map(*), $n as xs:integer, $self as function(*))
   {
       if($n eq 0) then $gen
         else
         (
             let $gen := if(not($gen?initialized)) then $gen?moveNext()
                           else $gen
               return
                 if(not($gen?endReached)) then $self($gen?moveNext($gen), $n -1, $self)
                   else $gen
       )     
   },
   $skip := function($gen as map(*), $n as xs:integer)
   {
      $skip-helper($gen, $n, $skip-helper)
   },
   
   $take-helper := function($gen as map(*), $n as xs:integer, $self as function(*))
   {
    let $gen := if(not($gen?initialized)) then $gen?moveNext()
                        else $gen
        return
           if( $gen?endReached or $n eq 0) then $emptyGenerator()
              else $put($gen, "moveNext", $self($gen?moveNext($gen), $n -1, $self))
   },
   
   $take := function($gen as map(*), $n as xs:integer)
   {
     $take-helper($gen, $n, $take-helper)
   },
   
   $subrange := function($gen as map(*), $m as xs:integer, $n as xs:integer) as map(*)
   {
     $take($skip($gen, $m -1), $n)
   },
   
   $at := function($gen as map(*), $ind as xs:integer)
   {
     let $genResult :=$subrange($gen, $ind, 1)
      return
       $genResult?getCurrent($genResult)
   }
   return
     $at($gen1ToInf, 10000000)

And sure enough, this works without a problem:

image

@MarkNicholls
Copy link

MarkNicholls commented May 15, 2024

It was someone in SO that mentioned this thread and made me re-realise that unfold was a generator/iterator.

The observation is, once the library functions are written, the proposed formulation for the natural numbers is this:
(i.e. as an user of the language, i.e as a programmer, this is what I would have to write)

 $gen1ToInf :=
       map{
            "initialized": true(),
            "endReached": false(),
            "getCurrent": function($this as map(*)) { $this?last + 1},
            "moveNext":  function($this as map(*))
                         {
                           if(not($this?initialized)) then $put($this, "initialized", true())
                             else $put($this, "last", $this?last +1)
                         },
            "last" : 0 
           },

whilst unfold (stolen from Haskell who stole it from category theorists who discovered it) does this like this:

declare function local:naturalUnfolder($state as xs:integer) {
    array { $state,$state + 1 }
};

I think the two formulations are probably logically identical, the question being, why would I use the former formulation when the latter seems simpler?

@dnovatchev
Copy link
Contributor Author

@MarkNicholls

(i.e. as an user of the language, i.e as a programmer, this is what I would have to write)

 $gen1ToInf :=
       map{
            "initialized": true(),
            "endReached": false(),
            "getCurrent": function($this as map(*)) { $this?last + 1},
            "moveNext":  function($this as map(*))
                         {
                           if(not($this?initialized)) then $put($this, "initialized", true())
                             else $put($this, "last", $this?last +1)
                         },
            "last" : 0 
           },

whilst unfold (stolen from Haskell who stole it from category theorists who discovered it) does this like this:

declare function local:naturalUnfolder($state as xs:integer) {
    array { $state,$state + 1 }
};

I think the two formulations are probably logically identical, the question being, why would I use the former formulation when the latter seems simpler?

Actually these two are equivalent - we could ask the user only to specify the state and the state-change function (moveNext), and use defaults for the remaining properties.

@MarkNicholls
Copy link

MarkNicholls commented May 15, 2024

Is it simpler to implement unfold instead though? I think the proposal for "generators" in some sense its a rediscovery of unfold, but we can simply steal the "design" from the category theorists.

fold/unfold are pretty deep inherent things that can be pretty much mechanically derived from the data structures themselves (they apply to all data structures), but unfold is pretty much the simplest possible translation of the underling maths into mortal's software, I personally think the signature explains what it does

unfold : (state -> Maybe (state,a)) -> state -> LazyList a
where the 'unfolder' is of type

state -> Maybe (state,a)
i.e. what the user does is supply a function that takes a state and if they want the list to continue they create the next item in the list, along with a new state....or...they return 'none' and the list ends.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented May 15, 2024

@MarkNicholls ,

Is it simpler to implement unfold instead though?
fold/unfold are pretty deep inherent things that can be pretty much mechanically derived from the data structures themselves (they apply to all data structures), but unfold is pretty much the simplest possible translation of the underling maths into mortal's software, I personally think the signature explains what it does

unfold : (state -> Maybe (state,a)) -> state -> LazyList a where the 'unfolder' is of type

state -> Maybe (state,a) i.e. what the user does is supply a function that takes a state and if they want the list to continue they create the next item in the list, along with a new state....or...they return 'none' and the list ends.

We are making no progress here - you are describing in more unfamiliar terms what has already been precisely specified in the description of a Generator - that the moveNext function can only be evaluated if not(endReached)

Because we are describing essentially the same thing, but one description is done in XPath, which the implementors know well, and the other description is in Haskell, which the implementors are not supposed to (and maybe do not) know, then the question which of the two descriptions is "simpler to implement" has a natural and obvious answer.

Also, any such question is very much subjective: it is "simpler" for you to implement datatypes described in Haskell, and it is "simpler" for other people to implement the same datatype but described in XPath.

So questions of this type don't have a single answer and instead of discussing "which representation is better", let us just be content with the fact that the same datatype has been defined in two different ways independently, which means that it has true value, that different people saw and captured in their own way.

@MarkNicholls
Copy link

MarkNicholls commented May 16, 2024

@dnovatchev

The code/description is in XQuery, Haskell is simply a citation for the model (as is scala/erlang/F#).

no progress indeed.

@MarkNicholls
Copy link

@rhdunn
Ah...I notice that your comments on #708
is the same formulation as unfold above...its just called generateSequence in Kotlin.

@ChristianGruen ChristianGruen removed the Propose for V4.0 The WG should consider this item critical to 4.0 label May 22, 2024
@ndw ndw added PRG-hard Categorized as "hard" at the Prague f2f, 2024 PRG-optional Categorized as "optional for 4.0" at the Prague f2f, 2024 labels Jun 4, 2024
@dnovatchev
Copy link
Contributor Author

Looking at it again, though, I wonder what we're actually proposing adding to the spec? Is it a set of functions like skip() and take() plus a definition of the record type generator that they operate on, plus some kind of user guide on how to implement your own generators?

@michaelhkay Exactly.

I am wondering, and would appreciate your experienced advice and guidance on what would be the best way to express this in the documents:

  • We need a definition of a standard record type - the generator-record. Where is the best place to put this definition in the "XPath 4.0" document?
  • After defining the generator-record type in the "XPath 4.0" document we need to define about 22 different functions on generators - should this be done in a single PR or in 22 PRs?
  • Should the constructor - function for the generator-record be in the "XPath 4.0" document or in the F&O document?
  • Should the "empty generator" be defined, and in which of the two documents?
  • Should we pre-define some special generators, such as generator-m-n and generator-m-infinity ?

@dnovatchev dnovatchev added XPath An issue related to XPath Propose for V4.0 The WG should consider this item critical to 4.0 and removed PRG-hard Categorized as "hard" at the Prague f2f, 2024 labels Sep 22, 2024
@michaelhkay
Copy link
Contributor

Given the scale of what you are proposing, I would suggest doing it as a free-standing document describing the feature in enough detail that, if accepted, writing the PR becomes an editorial task. This should also give you scope for a discussion of requirements and use cases for the feature which might be useful background for the CG even if not directly suitable for inclusion in the spec.

@dnovatchev
Copy link
Contributor Author

Given the scale of what you are proposing, I would suggest doing it as a free-standing document describing the feature in enough detail that, if accepted, writing the PR becomes an editorial task. This should also give you scope for a discussion of requirements and use cases for the feature which might be useful background for the CG even if not directly suitable for inclusion in the spec.

Thanks @michaelhkay .

Isn't such a free-standing document - the opening comment for the current document?

If not, what exact (missing) content would you like to see in this document, and what (if anything) would you advise to remove?

@ndw
Copy link
Contributor

ndw commented Oct 4, 2024

I'm confused.

About 1/4 the way through this long thread, we see some examples from @dnovatchev that implement filtering over generators in XPath 4.0 that runs in BaseX.

If this can already be implemented, what does the proposal seek to add to the specification?

@dnovatchev
Copy link
Contributor Author

I'm confused.

About 1/4 the way through this long thread, we see some examples from @dnovatchev that implement filtering over generators in XPath 4.0 that runs in BaseX.

If this can already be implemented, what does the proposal seek to add to the specification?

I thought this was obvious:

  1. Establish a standard generator record type that can be used without the need for definition in any code module.
  2. Provide 22 standard functions on generators, whose signatures and semantics have been specified in this proposal.

Of course anything "can be implemented", and following such an argument we could conclude that our function library probably does not need 95% of the functions we have already added in version 4.0. My understanding why we still are providing all these functions is because they give the user a level of abstraction, eliminate the need to repeatedly write complex code that may be error-prone, and help the user to approach any problem in a more abstracted, general and systematic way.

@ndw
Copy link
Contributor

ndw commented Oct 4, 2024

I take your point that we have lots of functions that a user could write themselves. Which ones we add and which ones we don't is a judgement call.

I think it's possible to look at this proposal and think:

(a) I've never even thought of using generators in XPath and (b) we need to add twenty-two functions to support them? Gosh that seems like a lot. Do we really need to do this if those users who want them can already implement them?

Computing the first million prime numbers is an intellectually interesting task, but it doesn't seem closely aligned with the sorts of tasks that users turn to XPath to accomplish.

Can you think of a few use cases that are more closely aligned with the problems that I think most users are imagining when they turn to XPath, namely querying or transforming XML documents in some way? Note, I am not saying that generators can't be useful to do that, I'm saying, it would be an easier sell if some of the examples made me think "yeah, I do want to do that sometimes and, yeah, generators really make that easy (or possible) in ways that other approaches don't."

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Oct 4, 2024

Can you think of a few use cases that are more closely aligned with the problems that I think most users are imagining when they turn to XPath, namely querying or transforming XML documents in some way? Note, I am not saying that generators can't be useful to do that, I'm saying, it would be an easier sell if some of the examples made me think "yeah, I do want to do that sometimes and, yeah, generators really make that easy (or possible) in ways that other approaches don't."

I have to think about more xml-specific problems, and this requires at least some time, but generators are useful in all kinds of problems which would otherwise be restricted by unknown (or known upper-limits that are too-high and not generally supportable) necessary quantity of memory, and/or unknown (or known upper-limits that are too-high and not generally supportable) of computation /time.

Without using a generator, the user has to guess about these limits and probably needs to repeatedly improve these guesses in a long error and trial process.

If a problem can benefit from lazy execution, then it can probably benefit from using generators.

@rhdunn
Copy link
Contributor

rhdunn commented Oct 4, 2024

Certain algorithms (tree traversal, tokenization that feeds into other operations, producer/consumer, etc.) can often be easier to implement in a generator style -- either as an iterator object, a for/while loop with a yield, or via a generator "get next item from the current item and context" function. We already have one such function in XPath/XQuery -- the random number generator!

We have two alternative proposals:

  1. Dimitri's proposal here is to add the core record type (similar to mine) and to define XPath functions that operate on those analogous to exiting functions.
  2. My proposal is to add a record type (similar to Dimitri's) that the XPath/XQuery processor can internally wrap into a sequence via a fn:sequence function -- this allows the user to use/call existing XPath functions and expressions with these lazily evaluated sequences.

The exact details of what core properties and methods the generator record type has can be refined as the proposal matures. I think the key question here -- if we adopt generators -- is whether we want the operations on the generator objects defined as separate funtions (as proposed by Dimitri), or having them map onto the standard sequence-based functions (e.g. via my proposal) and letting the implementors provide the logic, optimisations, etc.

@dnovatchev
Copy link
Contributor Author

We have two alternative proposals:

1. Dimitri's proposal here is to add the core record type (similar to mine) and to define XPath functions that operate on those analogous to exiting functions.

2. My proposal is to add a record type (similar to Dimitri's) that the XPath/XQuery processor can internally wrap into a sequence via a fn:sequence function -- this allows the user to use/call existing XPath functions and expressions with these lazily evaluated sequences.

These two proposals are not mutually exclusive, but 2. above typically requires a lot to be done "behind the scenes" by the language processor, as it needs to implement a finite automaton (based again on the generator concept). Because a lot of the processing is hidden from the user, they sometimes don't realize what is actually involved in the processing, when execution is being deferred, and when not, and the run-time behavior of the code may not meet their expectations - as witnessed here and here with code using the yield return statement in C# programs.

This proposal is what Reece has specified as 1. above. Its main advantages are:

  1. Easy to implement and nothing required for the language processors to implement - no change to the languages, we only use a generator type and we have added a set of functions on generators.
  2. Complete transparency of what is being evaluated - nothing hidden.
  3. The user is in full control and bears the sole responsibility for the way they specified and manipulated their generators.

We may adopt a 2-stage process: initially specifying just the standard generator record type and the functions that operate upon generators, and then at a later stage and based on the way generators are being used in actual user code - and if the users really would want even higher level of abstraction - add to the languages a yield return type of construct.

@MarkNicholls
Copy link

MarkNicholls commented Oct 5, 2024

the proposals are quite different, #708 effectively introduces an unfold function for sequences (see unfold )

This one introduces a completely new construct based on imperative iterators.

I think #708 has the benefit of being small and in some sense, a missing piece of the existing infrastructure but may be problematic/clunky in certain edge cases when manipulated unbounded sequences.

This proposal though, I think is stylistically dissonant to these languages, even in C#, iterators (IEnumerator<T>) are rarely used directly by users any more, its virtually a historical detail when MS introduced generics and wanted a typesafe version of the plain IEnumerator type, when extending their libraries to generics.

As soon as C# embraced functional constructs via LINQ the user facing infrastructure shifted to revolve around a pure(ish) IEnumerable<T> type. This proposal is more like the pythonic Generator[], which actually I find problematic.

For me a lazy list collection type (i.e. a functional take on this proposal) with an unfold (i.e. like #708) would be a better fit.

(I have written a deferred list type in XPath 3, but I cannot do a truly lazy version without it being put into the language, or lazy constructs being introduced).

P.S.

Despite having written a deferred list type with fold/unfold/map/bind etc, I've never needed to use it in production code, and my interest in it, was driven more by the need to guarantee tail recursive behaviour, than wanting to traverse unbounded search trees or generating unbounded sequences of prime numbers etc normally associated with lazy lists, I'm not sure XPath is the tool I would choose to do this sort of thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion A discussion on a general topic. Feature A change that introduces a new feature PRG-optional Categorized as "optional for 4.0" at the Prague f2f, 2024 Propose for V4.0 The WG should consider this item critical to 4.0 XPath An issue related to XPath XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

6 participants