Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array:leaves #356

Closed
dnovatchev opened this issue Feb 18, 2023 · 13 comments
Closed

array:leaves #356

dnovatchev opened this issue Feb 18, 2023 · 13 comments
Assignees
Labels
Feature A change that introduces a new feature Propose Closing with No Action The WG should consider closing this issue with no action XQFO An issue related to Functions and Operators

Comments

@dnovatchev
Copy link
Contributor

dnovatchev commented Feb 18, 2023

1. Issues

There are at least two issues with the definition of the function array:flatten:

  1. Unlike most other functions on arrays (such as array:put, array:replace, array:append, array:slice, array:subarray, array:remove, array:insert-before, array:tail, array:trunk, array:reverse, array:join, array:for-each, array:filter, array:for-each-pair, array:sort, array:partition) , which produce an array as their result, this function produces only a sequence

  2. This function is not lossless -- any members that are the empty sequence or the empty array are not represented in the returned result.

2. Suggested solution(s)

We want to have a function that is similar to the wrongly defined one, but produces its contents as an array, and is lossless. There are two obvious ways to do this:

  1. Correct the specification of array:flatten so that its result is an array and it represents the empty sequences and empty arrays as the same members of its result.

  2. Add to the Specification a new function: array:leaves that produces an array as its result and that is lossless.
    array:leaves returns an array whose members are exactly all the leaves of the input array, by the order of their appearance. By definition leaves are all, and at any depth, members that are not an array except when they are the empty array. Thus () (the empty sequence) and [] (the empty array) are leaves by definition.

Solution 2. will not cause any compatibility issues.

3. Examples

The expression array:leaves([1, (), [4, 6], 5, 3]) returns [1, (), 4, 6, 5, 3].

The expression array:leaves([1, 2, 5], [[10, 11], 12], [], 13) returns [1, 2, 5, 10, 11, 12, [], 13].

@dnovatchev dnovatchev added Bug Something that doesn't work in the current specification XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Feb 18, 2023
@joewiz
Copy link

joewiz commented Feb 18, 2023

I'd find the flatten-array function quite handy. I've misunderstood the current flatten function too, until realizing that it returns a sequence.

@ChristianGruen
Copy link
Contributor

The function name array:flatten-array could be misleading. Maybe we could add options to array:flatten. A depth option might be helpful as well.

@ndw
Copy link
Contributor

ndw commented Feb 18, 2023

I often use array:flatten precisely because I want to turn the array into a sequence, so I'd be opposed to making a backwards incompatible change.

@michaelhkay
Copy link
Contributor

The array:flatten function was defined in 3.1 so whether we like it or not, we can't change it. I haven't found many occasions to use it (I typically use $array?* instead), but that's not the point. We could extend it with extra parameters, but we can't change the return type.

i can't say that I've seen many occasions when I would want to use the other function you're proposing either, though I've no objection to it in principle. But it's a different function and needs a different name. array:squash perhaps?

It's incorrect to say that the new function is lossless. You can't reconstitute the input from the output.

@johnlumley
Copy link

johnlumley commented Feb 18, 2023

As your second example leaves an empty array as itself, does that imply that array:flatten-array([[[]]]) and array:flatten-array([[]]) would both yield [[]]?

@ChristianGruen
Copy link
Contributor

We would stay backward compatible by adding an option to the function, maybe array:flatten($a, map { 'wrap': true() }), but similar to @johnlumley, I fail to understand the exact semantics of the proposed enhancement. @dnovatchev What would be the expected results for these queries?

  1. array:flatten(1)
  2. array:flatten([ [], [ [] ] ])

A depth option would give us better control how much flattening will take place.
Some examples for $a := [ 1, [ 2, [ 3, 4 ] ], [] ]:

Example Result
array:flatten($a, map { 'depth': 1 }) 1, [ 2, [ 3, 4 ] ], [ ]
array:flatten($a, map { 'depth': 2 }) 1, 2, [ 3, 4 ]
array:flatten($a) 1, 2, 3, 4, 5

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Feb 18, 2023

Excellent questions, everyone! Thank you.

@michaelhkay:

We could extend it with extra parameters, but we can't change the return type.

Exactly. This is why we need the new function.

i can't say that I've seen many occasions when I would want to use the other function you're proposing either, though I've no objection to it in principle. But it's a different function and needs a different name. array:squash perhaps?

I propose array:leaves . The definition is:

array:leaves returns an array whose members are exactly all the leaves of the input array, by the order of their appearance. By definition leaves are all, and at any depth, members that are not an array except when they are the empty array. Thus () (the empty sequence) and [] (the empty array) are leaves by definition.

It's incorrect to say that the new function is lossless. You can't reconstitute the input from the output.

It is lossless in the sense that it preserves all leaves of the input array, and their order. We are talking here if something is lost in the desired result, not about the ability to reconstitute the original. Nobody ever intended that the original could be reconstituted by the result. "Flatten" is a rather destructive operation - it decreases the dimensions of the object by at least 1.

@johnlumley

As your second example leaves an empty array as itself, does that imply that array:flatten-array([[[]]]) and array:flatten-array([[]]) would both yield [[]]?

Exactly by definition, the wanted output is:

[[]]

because both inputs have a single leaf: []

@ChristianGruen

What would be the expected results for these queries?

  1. array:flatten(1)
  2. array:flatten([ [], [ [] ] ])
  1. An error, because the argument must be an array.
  2. [ [], [] ] -because the input has two leaves, and both are the empty array.

@dnovatchev
Copy link
Contributor Author

I updated the proposal (the initial issue) by renaming array:flatten-array to array:leaves, provided a complete definition of the semantics of the function and corrected a few typos.

@michaelhkay
Copy link
Contributor

In other contexts, we're talking about map:find and a proposed ?? operator that treat a map or array as the root of a tree in which we descend through map values as well as array members. This proposed function "shallow-skips" arrays but not maps (any map is treated as a leaf in the tree). There's obviously a case for both, but I wonder whether we need to make the distinction more systematically? Because "leaf" suggests a particular way of constructing a tree view of the data.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Feb 19, 2023

In other contexts, we're talking about map:find and a proposed ?? operator that treat a map or array as the root of a tree in which we descend through map values as well as array members. This proposed function "shallow-skips" arrays but not maps (any map is treated as a leaf in the tree). There's obviously a case for both, but I wonder whether we need to make the distinction more systematically? Because "leaf" suggests a particular way of constructing a tree view of the data.

This only describes issues in the current array:flatten function.

As for selecting all the leaves of a Comp - hierrachy (as the one described in the proposal: "CompPath (Composite-objects path) Expressions", I believe this can be expressed as:

\\*[not(. instance of array(*) or (. instance of array(*) and array:empty(.) )
or (not(. instance of map(*)) or (. instance of map(*) and empty(map:keys(.)) )]

Maybe it would be good to also have a function map:leaves()

@ChristianGruen ChristianGruen removed the Bug Something that doesn't work in the current specification label Apr 27, 2023
@ChristianGruen ChristianGruen changed the title [FO] Issues with array:flatten array:flatten: Issues Apr 27, 2023
@dnovatchev
Copy link
Contributor Author

dnovatchev commented May 18, 2023

Update:

As discussed in a related issue, we now have a complete proposal for the fix to this function, without affecting backwards compatibility.

The fix was also refined to return not an array, but a sequence of arrays.

Here is the proposed fix, and it obsoletes the initial proposal:

We add a new parameter, named return-array-sequence with default value of false() ( return-array-sequence := false())

In the case when in a particular function call the value of this argument is true() then the result must be a sequence of arrays, each one of these having as its single member a corresponding leaf-member of the original array.

In the case with all pre-4.0 code, that doesn't know about this new function parameter, its default value, false() is used and thus the same result as always for previous versions is produced .

Below is code implementing the new behavior, when the value of return-array-sequence is specified as true():

array:flatten($ar, return-array := true()) can be implemented with the following XPath 3.1 code

let $result := [],
   $flattenBase := function($ar as array(*), $res as array(*), $self as function(*))
  {
    let $extractMember := function($mem as item()*)
       {
         if(not($mem instance of array(*)))
           then array:append($res, $mem)
           else $self($mem, $res, $self)
       }
       return
         for $i in 1 to array:size($ar)
           return
             $extractMember($ar($i))
          
   },
   $fixedFlatten := function($ar as array(*), $return-array-sequence as xs:boolean ) as item()*
    { 
       if($return-array-sequence) then $flattenBase($ar, $result, $flattenBase) 
         else (: Old, 3.1 :) array:flatten($ar)
     }
  return 
  (
    "New: 
",
    $fixedFlatten([1, (2, 3), 5, (), 7, [8, [10, 11, 12], (), 9]], true()),
    "
Old: 
",
    $fixedFlatten([1, (2, 3), 5, (), 7, [8, [10, 11, 12], (), 9]], false())
     )
    

The new correct result (a sequence of arrays, each containing the next leaf-level member) and the old one (losing the values that are sequences) is produced with BaseX :

New: 

[1]
[(2,3)]
[5]
[()]
[7]
[8]
[10]
[11]
[12]
[()]
[9]

Old: 

1
2
3
5
7
8
10
11
12
9

@ChristianGruen ChristianGruen changed the title array:flatten: Issues array:leaves Dec 19, 2023
@ChristianGruen
Copy link
Contributor

@dnovatchev Please object if you think we should we keep this one open.

@ChristianGruen ChristianGruen added the Propose Closing with No Action The WG should consider closing this issue with no action label Feb 13, 2024
@ndw
Copy link
Contributor

ndw commented Feb 27, 2024

The CG decided to close this issue without further action at meeting 067.

@ndw ndw closed this as completed Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature Propose Closing with No Action The WG should consider closing this issue with no action XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

6 participants