[xslt] Constructing arrays #113

Closed
michaelhkay opened this issue May 4, 2022 · 8 comments
Labels
Enhancement (a change or improvement to an existing feature), XSLT (an issue related to XSLT)

Comments

@michaelhkay

michaelhkay commented May 4, 2022

I've felt for a while that the current proposal for xsl:array is messy, both semantically and syntactically, with its composite=yes|no attribute and the xsl:array-member child. I've been using it for XML-to-JSON conversion, and you get a lot of code like this:

<xsl:template match="closed_auctions">
   <xsl:array>
      <xsl:for-each select="closed_auction">
         <xsl:map>
            <xsl:apply-templates select="*"/>
         </xsl:map>
      </xsl:for-each>
   </xsl:array>
</xsl:template>

Almost invariably, xsl:array has xsl:for-each or xsl:apply-templates as a child. So how about allowing:

<xsl:template match="closed_auctions">
   <xsl:for-each select="closed_auction" form="array">
      <xsl:map>
         <xsl:apply-templates select="*"/>
      </xsl:map>
   </xsl:for-each>
</xsl:template>

The semantics are that xsl:for-each delivers an array with one member for each item in the input sequence. This cleanly eliminates the need for composite=yes|no and xsl:array-member: you can create a "composite" array using

<xsl:for-each select="1 to 5" form="array">
   <xsl:sequence select="., .+1"/>
</xsl:for-each>

which delivers [(1,2), (2,3), (3,4), (4,5), (5,6)].
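
For comparison, the same composite array can already be built with standard XPath/XQuery 3.1 constructs, without the proposed form attribute. The following is only a minimal sketch; the variable name $i is illustrative and not part of the proposal:

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Each square-bracket constructor wraps the whole sequence ($i, $i + 1)
   as a single member; array:join then concatenates the five
   one-member arrays into a single array. :)
array:join(
  for $i in 1 to 5
  return [ ($i, $i + 1) ]
)
```

The body of this query is also a valid XPath 3.1 expression, so it could appear in a select attribute in XSLT 3.0, provided the array prefix is bound in the stylesheet. The proposed form="array" would mainly remove the need for the explicit array:join and the bracket wrapper.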

The attribute form="array" can also appear on xsl:apply-templates and xsl:for-each-group. In the latter case each group produces one member of the resulting array:

<xsl:for-each-group select="0 to 9" group-adjacent=". idiv 5" form="array">
   <xsl:sequence select="current-group()"/>
</xsl:for-each-group>

delivers [(0,1,2,3,4), (5,6,7,8,9)]
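
As a point of reference, a similar result can be obtained today in XQuery 3.1 with a group by clause. This sketch assumes the grouping key is each item's value idiv 5, which is what the stated result implies, and it groups by value rather than by adjacency (for this input the two give the same groups):

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

array:join(
  for $n in 0 to 9
  group by $key := $n idiv 5
  order by $key
  (: After grouping, $n is bound to all items of the current group,
     so [ $n ] is a one-member array holding the whole group. :)
  return [ $n ]
)
```

Each group contributes exactly one member, which mirrors the behaviour described for form="array" on xsl:for-each-group.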

The attribute form="sequence" is the default and specifies the current behaviour.

I've been wondering also about extending this to form="map". In most cases when you construct a map from an input sequence, both the key and the value are functions of the input item. So instead of:

<xsl:template match="regions">
   <xsl:map>
      <xsl:for-each select="*">
         <xsl:map-entry key="name()">
            <xsl:array>
               <xsl:for-each select="item">
                  <xsl:array-member>
                     <xsl:apply-templates select="."/>
                  </xsl:array-member>
               </xsl:for-each>
            </xsl:array>
         </xsl:map-entry>
      </xsl:for-each>
   </xsl:map>
</xsl:template>

we could write:

<xsl:template match="regions">
   <xsl:for-each select="*" form="map" key="name()">
      <xsl:apply-templates select="item" form="array"/>
   </xsl:for-each>
</xsl:template>

which strikes me as an improvement...
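
To make the intended result concrete, here is a rough XQuery 3.1 sketch of the same shape of output using map:merge and map:entry. The sample $regions document is invented for illustration, and the array members are simply the string values of the item elements rather than the result of applying templates:

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical input, not taken from the issue. :)
let $regions :=
  <regions>
    <africa><item>a</item><item>b</item></africa>
    <asia><item>c</item></asia>
  </regions>
return map:merge(
  for $region in $regions/*
  (: One entry per child element: the key is the element name, the value
     is an array with one member per item child. :)
  return map:entry(name($region), array { $region/item ! string(.) })
)
```

Under these assumptions the query builds a map with the keys "africa" and "asia", whose values are the arrays ["a", "b"] and ["c"].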

@martin-honnen

Is xsl:iterate not also a candidate for those extensions, given that the experimental existing proposal (https://www.saxonica.com/html/documentation11/v4extensions/xslt-syntax-extensions/iteration-maps-arrays.html) of map and array attributes is supported in Saxon 11: https://www.saxonica.com/html/documentation11/xsl-elements/iterate.html?

@liamquin

liamquin commented May 4, 2022

First, how would this interact with an as= attribute on xsl:for-each and xsl:iterate? Second, what values other than "array" make sense? What happens if I say "map"? What is the value for the default behaviour?

@michaelhkay

@martin, yes it probably makes sense to put it on xsl:iterate too, though I'm not sure exactly what xsl:break and xsl:on-completion should do.

@liam, there is no @as attribute on xsl:for-each or xsl:iterate. The default is form="sequence" which gives the current behaviour. I've sketched out a meaning for form="map" in the proposal.

@michaelhkay

I have revised my ideas on how to construct and deconstruct arrays. For background please see my Balisage 2022 paper at https://www.balisage.net/Proceedings/vol27/html/Kay01/BalisageVol27-Kay01.html

My current proposal is as follows.

First, we introduce a value called a parcel. A parcel is an item that encapsulates an arbitrary sequence. We provide a pair of functions

fn:parcel($seq as item()*) as item() - creates a parcel that wraps a supplied sequence
fn:unparcel($parcel as item()) as item()* - unwraps the contents of a parcel to return the contained sequence

We'll talk later about exactly how a parcel is represented.

We provide a function to decompose an array into a sequence of parcels, and another to compose an array from a sequence of parcels:

array:members($a as array(*)) as item()*
array:from-members($parcels as item()*) as array(*)

With the help of array:members and fn:unparcel, all functions, operators, and XSLT instructions that are designed to process sequences can now be used to process arrays. For example, we can do <xsl:iterate select="array:members($array)!fn:unparcel(.)"....

All that remains is to decide exactly where parcels fit into the type system. Possible candidates are:

  • a new kind of item, perhaps under the category of "external objects" as defined in §24.1.3 of the XSLT 3.0 specification.
  • an arity-0 anonymous function, where calling the function has the same effect as fn:unparcel().
  • a singleton map whose single entry maps a magic key to the wrapped value. The magic key could either be cryptic to discourage people accessing the value this way, or it could be exposed for example as the string "value" so people could achieve unparcelling by writing $parcel?value.
  • an array consisting either of a single member containing the wrapped value, or of multiple members each containing one item of the wrapped value.

The choice involves a trade-off between extra complexity in the spec, extra complexity in implementations, and usability. We should consider how much type safety we want to provide, and whether we want to provide convenient ItemType and pattern syntax for matching parcels (and to avoid parcels matching other types inadvertently).

I think the approach that might give the best type safety with least disruption to the data model and type system is to use an arity-0 anonymous function with the annotation %parcel. Perhaps for usability we could also define a built-in item type alias fn:parcel-type defined to be equivalent to %parcel function() as item()*.
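
The arity-0 function option can be prototyped in plain XQuery 3.1. The sketch below is only an approximation of the proposal: the local:* functions are hypothetical stand-ins for fn:parcel, fn:unparcel, array:members and array:from-members, and the proposed %parcel annotation is omitted because it is not defined in 3.1, so any zero-arity function would be accepted as a parcel here.

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: A parcel is modelled as a zero-arity function that closes over the sequence. :)
declare function local:parcel($seq as item()*) as function() as item()* {
  function() as item()* { $seq }
};

(: Unparcelling is just a call to the function. :)
declare function local:unparcel($parcel as function() as item()*) as item()* {
  $parcel()
};

(: Decompose an array into one parcel per member. :)
declare function local:members($a as array(*)) as (function() as item()*)* {
  for $i in 1 to array:size($a)
  return local:parcel($a($i))
};

(: Compose an array from parcels: each parcel becomes exactly one member. :)
declare function local:from-members($parcels as (function() as item()*)*) as array(*) {
  array:join($parcels ! [ local:unparcel(.) ])
};

let $a := [ (1, 2), (), "x" ]
let $parcels := local:members($a)
return (
  $parcels ! count(local:unparcel(.)),          (: member sizes: 2, 0, 1 :)
  deep-equal($a, local:from-members($parcels))  (: round trip: true() :)
)
```

The weakness of this prototype is the one the comment above points at: without an annotation or a dedicated item type, nothing distinguishes a parcel from any other zero-arity function, so there is no type safety.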

@martin-honnen

The Balisage paper does not mention array:members and array:from-members. I take it that the following are used there instead:

  • array:parcels($array) returns the members of an array as a sequence of parcels
  • array:of($parcels) creates an array from a sequence of parcels

@michaelhkay

michaelhkay commented Aug 11, 2022 via email

@ChristianGruen

One thing I like about the design of arrays is that they have been considered in the atomization process. When calling a function that expects an atomic value, the members of the array are implicitly atomized:

substring([ 'string' ], 3)  →  ring

If parcels are represented as arrays, we would (a) introduce no new concept, (b) have some symmetry with single-entry maps, and (c) be able to supply the array members as arguments to atomizing functions without invoking a further function:

(: or array:parcels :)
array:members($array) ! substring(., 3)

In my opinion, array:join works fine in practice, but it’s the function name that is a bit unfortunate: Similar to map:merge, it’s mostly used to generate new arrays instead of joining existing ones (array:create/map:create, array:new/map:new, array:of/map:of, … may have been more intuitive). However, if array:members returns arrays for single members, the syntax is pretty concise:

(: build new array from all members of the original array that have a matching number :)
let $array := [ [ 123, 12 ], [ 45 ], [ 123 ] ]
return array:join( array:members($array)[. = 123] )

As can probably be seen, my perspective is pretty XQuery-centric.
I cannot judge if my examples are also intuitive for XSLT users.
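
The array:join example above can be tried in XQuery 3.1 by emulating the suggested behaviour. In this sketch local:members is a hypothetical stand-in for an array:members that returns each member wrapped in a singleton array; it is not a standard 3.1 function:

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical helper: one singleton array per member of the input array. :)
declare function local:members($a as array(*)) as array(*)* {
  for $i in 1 to array:size($a)
  return [ $a($i) ]
};

let $array := [ [ 123, 12 ], [ 45 ], [ 123 ] ]
(: The general comparison atomizes each singleton array, including any
   nested arrays, so a wrapper is kept if any contained number equals 123. :)
return array:join( local:members($array)[. = 123] )
```

With this emulation the result is [ [ 123, 12 ], [ 123 ] ]: the two members that contain 123 are kept intact and the member [ 45 ] is dropped.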

cedporter added the XSLT and Enhancement labels on Sep 14, 2022

@michaelhkay

The spec for xsl:array has now been revised and this issue is no longer relevant.
