[XPath] Introduce the lookup operator for sequences #50

Closed
dnovatchev opened this issue Jan 16, 2021 · 34 comments
Labels
Enhancement A change or improvement to an existing feature XPath An issue related to XPath

Comments

@dnovatchev
Contributor

dnovatchev commented Jan 16, 2021

In XPath 3.1 it is convenient to use the ? lookup operator on arrays and maps.

It is easy and readable to construct expressions, such as:

  [10, 20, 30]?(2, 3, 1, 1, 2)

And this understandably produces the sequence:

20, 30, 10, 10, 20

However, it is not possible to write:

(10, 20, 30)[2, 3, 1, 1, 2]

or

(10, 20, 30)(2, 3, 1, 1, 2)

or

(10, 20, 30)?(2, 3, 1, 1, 2)

This proposal is to allow the use on sequences of the postfix lookup operator ? with the same syntax as it is now used for arrays.

The ? lookup operator will be applied to sequences whose first item isn't an array or a map. The only change would be to allow the type of the left-hand side to be a sequence, in addition to the currently allowed map and array types. At present, applying ? to any such sequence results in an error. If the first item of the LHS sequence is an array or a map, the current XPath 3.1 semantics remain in force: the RHS is applied to each item in the sequence.

The restriction in the above paragraph can be eliminated if we decide to use a symbol other than ? for this operator, for example ^

The goal of this feature is achieving conciseness, readability, understandability and convenience.

For example, now one could easily produce from a sequence a projection / rearrangement with any desired multiplicity and ordering.

Thus, it would be easy to express the function reverse() as simply:

$seq?($len to 1 by -1)
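
For comparison, the proposed behaviour can be approximated with XQuery 3.1 constructs only. This is a minimal sketch: the function name local:lookup is hypothetical and not part of any specification, and reverse(1 to count($seq)) stands in for the range-with-step syntax used above, which XPath 3.1 does not have.

```xquery
xquery version "3.1";

(: Hypothetical helper: emulates the proposed $seq?($positions)
   for a left-hand side that is a plain sequence. :)
declare function local:lookup(
  $seq       as item()*,
  $positions as xs:integer*
) as item()* {
  for $p in $positions
  return $seq[$p]   (: numeric predicate: the item at position $p :)
};

let $seq := (10, 20, 30)
return (
  local:lookup($seq, (2, 3, 1, 1, 2)),            (: 20, 30, 10, 10, 20 :)
  local:lookup($seq, reverse(1 to count($seq)))   (: 30, 20, 10 :)
)
```
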
@martin-honnen

Actually both Saxon and BaseX do allow (['a', 'b', 'c'], array { 1 to 3 })?(2, 3, 1, 1, 2) and give the sequence b c a a b 2 3 1 1 2. With your suggestion, would the result stay the same or change to return a sequence of arrays?

@dnovatchev
Contributor Author

@martin-honnen
Thanks for the feedback. Please, see the updated proposal.

Based on the update, do you have any remaining questions?

@ChristianGruen
Contributor

The proposal is appealing. $sequence[position() = $start to $end] is a common pattern, and $sequence?($start to $end) would be shorter.

I guess the extension only makes sense for Postfix Lookups, although it may be more consistent to define the same rules for Unary Lookups.

As typing may not be strict, we should only check the first item of the left-hand operand, as is done, e.g., when computing the Effective Boolean Value: “…a sequence whose first item is a node, fn:boolean returns true.”, https://www.w3.org/TR/xquery-31/#id-ebv:

(: the test is successful, although the second item of the test is an integer :)
let $input := (1 to 5)
let $test := (<a/>, 1)
return $input[$test]

So I would suggest rephrasing your rule to:

The lookup operator will be applied for sequences if the first item is neither an array nor a map.

This would generally be cheaper (in particular if the input is streamed or processed in an iterative manner). It would then still be possible to evaluate…

let $left := ($array1, $array2, ..., 1)
let $right := 1
return head( $left?($right) )

…and skip evaluation after the first result.
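
The first-item rule suggested above could be sketched in XQuery 3.1 roughly as follows. The function local:lookup is hypothetical and only models the dispatch; casting the keys with xs:integer in the sequence branch is an assumption of this sketch, not something the proposal specifies.

```xquery
xquery version "3.1";

(: Hypothetical model of the dispatch rule: only the first item of the
   left-hand operand decides which semantics are used. :)
declare function local:lookup(
  $left as item()*,
  $keys as xs:anyAtomicType*
) as item()* {
  if (head($left) instance of array(*) or head($left) instance of map(*))
  then $left?($keys)                                  (: existing XPath 3.1 lookup :)
  else for $k in $keys return $left[xs:integer($k)]   (: proposed positional lookup :)
};

(
  local:lookup((10, 20, 30), (2, 3, 1, 1, 2)),            (: 20, 30, 10, 10, 20 :)
  local:lookup(([10, 20, 30], map { 1 : 'a' }), (1, 2))   (: 10, 20, "a" :)
)
```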

A similar question arises with the type of the right operand. The following filter expression is legal (but it may raise an error if the position is compared with the string):

(1 to 5)[position() = (1 to 10, 'a')]

So it would be consistent to make this legal as well:

(1 to 5)?(1 to 10, 'a')

And we may need to consider usability aspects: People might forget the question mark, and might wonder why $sequence(1 to 5) doesn’t work.

@martin-honnen

So $sequence?($start to $end) is defined as $sequence[position() = $start to $end]? Is that (all) what Dimitre had in mind? I thought he wants the result of e.g. [10, 20, 30]?(2, 3, 1, 1, 2) as 20, 30, 10, 10, 20 extended to sequences and e.g. (10, 20, 30)?(2, 3, 1, 1, 2) would give the same result. I don't see how you could define that with position as (10, 20, 30)[position() = (2, 3, 1, 1, 2)] simply gives the sequence 10, 20, 30 (and not 20, 30, 10, 10, 20).

@ChristianGruen
Contributor

ChristianGruen commented Jan 17, 2021

Oh, you are right. So I think the following expressions would be equivalent then (provided that all position values are numbers)?

(10, 20, 30) ? (2, 3, 1, 1, 2),

for $i in (2, 3, 1, 1, 2)
return (10, 20, 30)[$i]

Array lookups can be rewritten as follows:

[10, 20, 30] ? (2, 3, 1, 1, 2),

for $i in (2, 3, 1, 1, 2)
return [10, 20, 30]($i)

@martin-honnen

I think the first example captures Dimitre's intent, let's hear what he himself thinks.

I am not sure whether the second is a complete rewrite strategy for the already existing abilities: we can already do e.g. ([10, 20, 30], map { 1 : 'a'})?(1,2), i.e. use the lookup with a sequence on the right-hand side on a sequence of arrays and/or maps on the left-hand side.

@ChristianGruen
Contributor

Thanks for your comment. Of course you’re right, if we have multiple input items, we need to have two for clauses:

([10, 20, 30], map { 1 : 'a'}) ? (1,2)

for $i in ([10, 20, 30], map { 1 : 'a'})
for $k in (1, 2)
return $i($k)

@dnovatchev
Contributor Author

@ChristianGruen , @martin-honnen ,

I think that the original proposal states quite clearly:
"For example, now one could easily produce from a sequence a projection / rearrangement with any desired multiplicity and ordering."

Thus:

(10, 20, 30) ? (2, 3, 1, 1, 2)

produces

20, 30, 10, 10, 20

and

([10, 20, 30], map { 1 : 'a'}) ? (1,2)

produces (using the current XPath 3.1 rule, as the first item in the LHS is an array)

10, 20, 'a'

Thank you very much @ChristianGruen for the clever disambiguation solution, based on the type of the first item in the LHS sequence. I also thought about this but wanted to hear the first responses before digging in deeper.

@benibela

I do not think this is a good idea

Currently, ? treats all values in the sequence the same independent of their position().

It would be very confusing to change it such that the position sometimes matters and sometimes does not. Depending on the first element is also confusing. It was already a bad idea to do that for the EBV. (It is extremely unconvincing that boolean((<a/>, false())) and boolean((false(), <a/>)) are not the same.)

One could allow sequences in [], i.e. shorten $sequence[position() = $start to $end] to $sequence[$start to $end]. It is much more intuitive, I keep proposing that
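
The EBV asymmetry mentioned above can be checked with a short XQuery 3.1 snippet. This is a sketch; the err namespace is declared explicitly because not every processor predeclares the prefix.

```xquery
xquery version "3.1";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(
  (: first item is a node, so the EBV is true :)
  boolean((<a/>, false())),

  (: first item is not a node and there are two items: error FORG0006 :)
  try { boolean((false(), <a/>)) }
  catch * { local-name-from-QName($err:code) }
)
```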

@dnovatchev
Contributor Author

dnovatchev commented Jan 18, 2021

I do not think this is a good idea

Currently, ? treats all values in the sequence the same independent of their position().

@benibela It seems that you are not against the idea per se but against using the ? operator to express it.

Using another symbol, such as ^ does not have what you consider the negative side of using ?.

Is my understanding correct?

Also, this proposal is not in conflict with the proposal for ranges. In fact these two complement each other well.
For example:

$seq ? ($len to 1 by -1)

implements the reverse() function.

In fact, this proposal has already inspired another great proposal by @ChristianGruen: #51

@benibela

Using another symbol, such as ^ does not have what you consider the negative side of using ?.

Is my understanding correct?

Yes,

Without special handling of arrays/map

If (10, 20, 30)^(2, 3, 1, 1, 2) means 20, 30, 10, 10, 20, then ([10], [20], [[30]])^(2, 3, 1, 1, 2) should mean [20], [[30]], [10], [10], [20]

Although ^ in particular could be used for an XOR operation on integers

@michaelhkay
Contributor

The way that the meaning of a[b] varies dramatically depending on the type of b is one of the ugliest features of the current language, and having a similar dependency for ? is equally ugly. Using a new operator ^ is better, but I don't think this operation is required so often that it justifies allocating one of the few remaining spare ASCII characters; it would be better done through a function.

@dnovatchev
Contributor Author

The way that the meaning of a[b] varies dramatically depending on the type of b is one of the ugliest features of the current language, and having a similar dependency for ? is equally ugly. Using a new operator ^ is better, but I don't think this operation is required so often that it justifies allocating one of the few remaining spare ASCII characters; it would be better done through a function.

I am not going to defend something that is considered "ugly" even by its creators...

And indeed, ? is almost fully overloaded with all possible meanings.

Still, using an operator would be better than a function. Why not:

$seq  <- (1, 3, 2, 2, 3, 1)

Any idea for a better (or more suitable) operator string is welcome.

@ChristianGruen
Contributor

I don’t think it’s a good idea to include yet another operator. If we created a completely new language, things would look completely different, but we already have /, ! and ? that all do similar things (from a user perspective), and it’s already tedious enough today to explain the subtle differences.

If we have fn:item-at in the future, or an equivalent function…

(10, 20, 30) ? (2, 3, 1, 1, 2)
→ (2, 3, 1, 1, 2) ! item-at((10, 20, 30), .)

…we don’t save too many characters by extending the lookup operator, so it’s probably cleaner indeed to restrict lookups to function items.
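
Until a function such as item-at exists, one way to spell the same thing in plain XPath 3.1 is to pass the context item to fn:subsequence. This is a sketch of an equivalent, not a claim about how item-at would be defined.

```xquery
(: For each requested position, take the one-item subsequence that
   starts there; "." is the current position from the left-hand side. :)
(2, 3, 1, 1, 2) ! subsequence((10, 20, 30), ., 1)
(: 20, 30, 10, 10, 20 :)
```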

Talking about numeric predicates: I hated the language design while I was implementing it many years ago, I still struggle with the implications when adding optimizations, and I wouldn’t recommend the design for any new language; but I’m frequently surprised how intuitive the syntax is when I give lectures on XPath for (real) beginners.

@michaelhkay
Contributor

michaelhkay commented Jan 23, 2021 via email

@ChristianGruen
Contributor

I had a discussion with James Clark about this while XPath 1.0 was still in draft; I felt the overloading of [] was very confusing, but he felt that both meanings of [] were intuitive to users and it was up to the language designers and implementors to make it work.

Interesting. I would assume that your point of view might have prevailed if everyone had anticipated the further development and complexity of the languages we have today.

@dnovatchev
Contributor Author

I am not going to defend something that is considered "ugly" even by its creators...

And indeed, ? is almost fully overloaded with all possible meanings.

Still, using an operator would be better than a function. Why not:

$seq  <- (1, 3, 2, 2, 3, 1)

Any idea for a better (or more suitable) operator string is welcome.

Here is something intuitive:

$seq  <-[]  (1, 3, 2, 2, 3, 1)

or even

$seq  <-[ 1, 3, 2, 2, 3, 1 ]

or

$seq  => [ 1, 3, 2, 2, 3, 1 ]

@ChristianGruen
Contributor

With XQuery 3.1, sequence lookups can be achieved by wrapping a sequence into an array:

let $seq := (10, 20, 30)
let $lookup := (1, 3, 2, 2, 3, 1)
return array { $seq }?($lookup)

I’m still wondering if we really want to introduce yet another operator. Maybe we could collect some more use cases to find out how often people would require such an operator?

I’m also interested in the feedback and opinion of everyone else.

@martin-honnen

@ChristianGruen, to me your latest comment looks like an option good enough to avoid introducing a new operator, in particular given all the different opinions we have seen as to whether an ASCII operator, a Unicode symbol, a function or overloading ? is the best approach.

@dnovatchev
Contributor Author

With XQuery 3.1, sequence lookups can be achieved by wrapping a sequence into an array:

let $seq := (10, 20, 30)
let $lookup := (1, 3, 2, 2, 3, 1)
return array { $seq }?($lookup)

@ChristianGruen Who will remember array { $seq }?($lookup),
given the nothing-to-remember (actually no new syntax) of:

$seq  => [ 1, 3, 2, 2, 3, 1 ]

We already know that in XPath, for any given integer k, [k] is a function, which when applied on a sequence $vSeq produces the k-th item of the sequence.

The above just applies this function on the sequence, using the arrow operator. Nothing new in fact! The arrow operator is already given to us in XPath 3.1.

@michaelhkay
Contributor

michaelhkay commented Mar 2, 2021 via email

@dnovatchev
Contributor Author

dnovatchev commented Mar 2, 2021

I find this incredibly confusing. .... Michael Kay Saxonica

@michaelhkay, in #50 (comment) I proposed three alternative syntaxes all of which seem intuitive.

Am I right to expect that you would reject all of them for some reason?

Thanks,
Dimitre

@michaelhkay
Contributor

I am not at all convinced that any operator-based syntax is going to be significantly more usable than a function such as $seq => items-at($positions).

I'd really like to avoid inventing new operators where a function will do the job - the grammar is far too fragile to make this an easy undertaking, and it's not clear to me that it improves usability.

@drdm

drdm commented Mar 3, 2021 via email

@dnovatchev
Contributor Author

dnovatchev commented Mar 6, 2021

I find this incredibly confusing. The thing on the RHS of the arrow operator is, in effect, a partially applied function (that has been partially applied by supplying its first argument). An array is a function from integers to array members. So I would expect this construct to take integers from $seq, and use them to extract items from the array on the RHS. But you seem to be proposing that it should do the reverse! (@michaelhkay) Michael Kay Saxonica

Actually,

[{integer k}]

has always been an existing operator, used consistently in XPath starting with XPath 1.0, and when it is applied to a sequence (on its left-hand side) it selects the k-th item of the sequence.

The operator above is actually the operator below, in the case when the length of the sequence of integers is 1:

[{integer k1}, {integer k2}, ..., {integer kN}]

So there is nothing new and confusing in this operator. It has been used for more than 20 years in its abbreviated form by everyone, thus there is almost 0 barrier for understanding/using this operator.

In order to eliminate any confusion, the proposal is to use this operator applied explicitly with the => (arrow) operator on the left-hand side sequence:

$vSeq => [{integer k1}, {integer k2}, ..., {integer kN}]

If there is still someone confused, please say so and ask your questions. I will be happy to answer them :)

Dimitre
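
For context on the objection quoted at the top of this comment: in XPath 3.1 an array is a function from integers to members, so the existing arrow operator already has a meaning when the thing being called is an array. A small sketch, with $array as an illustrative variable name:

```xquery
let $array := [10, 20, 30]
return 2 => $array()   (: the same as $array(2), which is 20 :)
```

Here the integer on the left is the argument and the array is the function, which is the reading described in the quoted comment.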

@michaelhkay
Contributor

michaelhkay commented Mar 6, 2021 via email

@michaelhkay
Contributor

Here's a proposal.

(a) We introduce a new construct A [# B ] where A is an arbitrary sequence, and B evaluates to a sequence of integers. If the integers are positive, the result is equivalent to

for $b in B return A[position() = $b]

If an integer is negative then it counts from the end, so the full expansion becomes

let $C := count(A) + 1 return
for $b in B return A[position() = $b ge 0 then $b else $C + $b]

Examples:

X[#1] returns the first item
X[#1 to 3] returns the first three items, in order
X[#-1] returns the last item
X[#-3 to -1] returns the last three items, in order
("A", "B", "C")[#3, 1, 2] returns "C", "A", "B"

The [#..] operator is available in both places predicates are allowed in the grammar.

The expression B is evaluated with the same focus as A. So

<xsl:for-each select="C">
  <xsl:value-of select="B[#position()+1]"/>
</xsl:for-each>

selects items in B corresponding to the positions of the selected items from C.

(b) We introduce an operator "downto", analogous to "to". The result of A downto B is the same as reverse(B to A). For example, X[#-1 downto -3] selects the last three items in a sequence, starting with the last.

(c) We introduce an inverse subscripting operator X[^Y]. Y evaluates to a sequence of integers, the expression selects all items whose position is NOT in Y. Negative numbers again count from the end. For example

A[^1] is equivalent to tail(A)
A[^-1] is equivalent to A[position() != last()]
A[^3] is equivalent to remove(A, 3)

(Slightly confusing for C# users, admittedly, since C# uses ^ in a subscript to mean counting from the end. But the use here is analogous to its use in regular expressions. We could use "!" instead.)
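
The three parts of this proposal can be modelled with ordinary XQuery 3.1 functions. This is a sketch under stated assumptions: the names local:items-at, local:downto and local:items-except are hypothetical, and the focus rule for B is not modelled because B is passed in as an already evaluated argument.

```xquery
xquery version "3.1";

(: (a) A[#B]: non-negative integers count from the start,
   negative integers count from the end. :)
declare function local:items-at($a as item()*, $b as xs:integer*) as item()* {
  let $c := count($a) + 1
  for $i in $b
  return $a[if ($i ge 0) then $i else $c + $i]
};

(: (b) A downto B, defined as reverse(B to A). :)
declare function local:downto($from as xs:integer, $to as xs:integer) as xs:integer* {
  reverse($to to $from)
};

(: (c) A[^B]: every item whose position is not selected by B. :)
declare function local:items-except($a as item()*, $b as xs:integer*) as item()* {
  let $c := count($a) + 1
  let $excluded := for $i in $b return if ($i ge 0) then $i else $c + $i
  return $a[not(position() = $excluded)]
};

let $x := ("A", "B", "C", "D", "E")
return (
  local:items-at($x, (3, 1, 2)),              (: "C", "A", "B" :)
  local:items-at($x, -3 to -1),               (: "C", "D", "E" :)
  local:items-at($x, local:downto(-1, -3)),   (: "E", "D", "C" :)
  local:items-except($x, -1)                  (: "A", "B", "C", "D" :)
)
```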

@martin-honnen

I suppose

let $C := count(A) + 1 return
for $b in B return A[position() = $b ge 0 then $b else $C + $b]

is meant to say

let $C := count(A) + 1 return
for $b in B return A[if (position() = $b ge 0) then $b else $C + $b]
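
As far as I can tell, neither spelling above parses as XPath 3.1, because comparison operators cannot be chained (position() = $b ge 0). One reading of the apparent intent that does parse in XQuery 3.1 is sketched here, with $A and $B as illustrative inputs:

```xquery
let $A := ("a", "b", "c", "d")
let $B := (1, -1, 2)
let $C := count($A) + 1
for $b in $B
(: pick the target position first, then compare it with position() :)
return $A[position() = (if ($b ge 0) then $b else $C + $b)]
(: "a", "d", "b" :)
```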

@martin-honnen

The section

The expression B is evaluated with the same focus as A. So

<xsl:for-each select="C">
<xsl:value-of select="B[#position()+1]"/>
</xsl:for-each>

selects items in B corresponding to the positions of the selected items from C.

kind of confuses or at least shatters my existing understanding of the use of position() in predicates. Is anyone able to demonstrate what it means in pure XPath expressions with several steps like foo[#position() + 1]/bar[#position() - 1]?

@michaelhkay
Contributor

Sorry about the messed-up attempt to use a ternary conditional.

Basically, the expression A[#B] evaluates A and B with the same focus. So using "." and "position()" and "last()" within B gives the same result as using them outside the "predicate". So (3 to 5)!X[#.] selects the items in positions 3, 4, and 5, while X!Y[#position()] selects the items in Y in positions 1 to count(X).
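
The focus rule can be imitated in XPath 3.1 by passing "." or position() as a function argument, since arguments are evaluated with the outer focus. A sketch, assuming illustrative sequences $X and $Y and using fn:subsequence as a stand-in for the proposed syntax:

```xquery
let $X := ("a", "b", "c", "d", "e", "f")
let $Y := ("p", "q", "r", "s")
return (
  (: like (3 to 5)!X[#.] : items of $X at positions 3, 4 and 5 :)
  (3 to 5) ! subsequence($X, ., 1),              (: "c", "d", "e" :)

  (: like X!Y[#position()] : position() refers to the position in the
     left-hand sequence, so this walks $Y in step with it :)
  (1 to 3) ! subsequence($Y, position(), 1)      (: "p", "q", "r" :)
)
```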

I wondered about having a function end() that returns an implementation-defined integer greater than or equal to the maximum length of a sequence, so you can do things like X[#5 to end()].

I also wondered about alternative syntax "#[" in place of "[#" so it becomes X#[1 to 5] in place of X[#1 to 5]. Another possibility would be "?[" which suggests a (perhaps misleading or perhaps helpful?) analogy with the lookup operator.

@ChristianGruen
Contributor

Another possibility would be "?[" which suggests a (perhaps misleading or perhaps helpful?) analogy with the lookup operator.

I like this alternative, it looks more intuitive to me than [#.

All the same, I still wonder if the number of users who would benefit from the proposed extension is large enough. Personally, I would probably use the new syntax as soon as it becomes available, given my tendency to write compact code, but I would claim that the number of users who can tell the difference between A(B) and A?(B) is pretty small. It can be argued that the number may increase if a similar operator exists for sequences.

@dnovatchev
Contributor Author

dnovatchev commented Mar 15, 2021 via email

@rhdunn rhdunn added XPath An issue related to XPath XQuery An issue related to XQuery Feature A change that introduces a new feature Enhancement A change or improvement to an existing feature and removed Feature A change that introduces a new feature labels Sep 14, 2022
@dnovatchev dnovatchev removed the XQuery An issue related to XQuery label Sep 22, 2022
@michaelhkay
Contributor

Issue #213 takes this forward as a concrete proposal, and proposes that this issue now be closed.

@ndw
Contributor

ndw commented Oct 17, 2022

I suggest that if a proposal is raised to resolve an issue and we have at least one other person agreeing to close an issue, we can close it. I don't really want to spend telcon time discussing whether or not an issue can be closed. There's a bare minimum amount of time required to just ask the question on a telcon.

If one of us sees that an issue is closed and we disagree that it should have been closed, we can reopen it. (Preferably with a comment that explains what issue we feel is unresolved.)

8 participants