Extend FLWOR expressions to maps #31

Open
rhdunn opened this issue Dec 18, 2020 · 33 comments
Labels
Feature: A change that introduces a new feature
PR Pending: A PR has been raised to resolve this issue
Propose for V4.0: The WG should consider this item critical to 4.0
Tests Needed: Tests need to be written or merged
XPath: An issue related to XPath
XQuery: An issue related to XQuery

Comments

@rhdunn
Contributor

rhdunn commented Dec 18, 2020

Edit: Current proposal (#31 (comment)):

for key $key in ...
return

for value $value in ...
return

for key $key value $value in ...
return

With the addition of the for member syntax for arrays, it is possible to use a ForExpr/FLWORExpr to enumerate the contents of sequences and arrays, but not maps. In order to be consistent and symmetric across these types, the for member syntax should be extended to support maps by enumerating the key/value entries of the map.

Given a map of type map(K, V) the member RecordTest would be record(key as K, value as V). Given a map of type map(*), the member RecordTest would be record(key, value).

This would allow a user to write expressions like:

for member $entry in $map
return element { $entry?key } { $entry?value }

NOTE: With the addition of the array:values and map:entries functions in issue #29, the for member syntax is not strictly needed for arrays and maps, but it may be worth keeping for people who prefer the wordy style of the XPath/XQuery syntax.
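
For comparison, the enumeration above can already be approximated in XQuery 3.1. The following is only a sketch under stated assumptions: the sample map is hypothetical, and each entry is modelled as an ordinary two-key map standing in for the proposed record(key, value).

```xquery
xquery version "3.1";

(: Hypothetical sample data; the keys double as element names. :)
let $map := map { "title": "FLWOR", "status": "open" }
(: Build one key/value map per key, standing in for the proposed record(key, value). :)
for $entry in map:keys($map) ! map { "key": ., "value": $map(.) }
return element { $entry?key } { $entry?value }
```

The order of the resulting elements is implementation-dependent, because map:keys does not guarantee any particular order.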

@joewiz

joewiz commented Dec 18, 2020

I like this, but shouldn't it be for entry? Arrays have members, and maps have entries. From XDM:

An array is an ordered list of values; these values are called the members of the array.

XDM doesn't explicitly define entry, but it's used exclusively in FO, e.g.:

The function map:entry returns a map which contains a single entry. The key of the entry in the new map is $key, and its associated value is $value.

@benibela

It could also be done without the temporary map. For example:

for key $key in $map return ...

for value $value in $map return ...

for key $key value $value in $map return ...

for value $value at $key in $map return ...

@rhdunn
Contributor Author

rhdunn commented Dec 18, 2020

I would be happy with for entry ... for maps, so ForExpr/FLWORExpr would support:

  1. for ... for sequences (existing);
  2. for member ... for arrays (new in the draft spec);
  3. for entry ... for maps (from this proposal).

@michaelhkay
Contributor

I'm not entirely convinced. Mainly because "entries" in a map aren't values in the data model; also because maps are unordered, and also because unlike processing arrays, it's not very difficult or inconvenient to do it using "for $k in map:keys($m), $v in $m($k) return ...".

@dnovatchev
Contributor

dnovatchev commented Dec 19, 2020

@michaelhkay commented:
"... I'm not entirely convinced. Mainly because "entries" in a map aren't values in the data model; also because maps are unordered,"

It is high time that we come up with a set type in XPath. We actually have to deal with sets all the time (not just node-sets, but sets of values of any type), and it is painful to read in the spec how two maps are compared for equality when explaining fn:deep-equal():

"If $i1 and $i2 are both ·maps·, the result is true if and only if all the following conditions apply:

Both maps have the same number of entries.

For every entry in the first map, there is an entry in the second map that:

has the ·same key· (note that the collation is not used when comparing keys), and

has the same associated value (compared using the fn:deep-equal function, under the collation supplied in the original call to fn:deep-equal)."

If we had a set type, the above would simply say:

"If $i1 and $i2 are both ·maps·, the result is true if and only if the sets of their keys are equal, and the corresponding values for each key in the two maps are deep-equal."

I propose that starting with XPath 4.0 we introduce the set type and define set equality, the union ( | ), intersection (intersect) and set difference (except) not only for node-sets but for sets of any-typed values.

Then we can have a function: to-set($collection as item()*) as set, which would produce a set (of the distinct values) of any collection-typed argument supplied to it: sequence (a set of its distinct values), map (a set of its entries), array (a set of its members).

This makes fn:distinct-values() almost unnecessary.

We will no longer have to explain in a "Remarks" section that the result of a function is "unordered" or that its order is "implementation-defined" -- just by making this function return a set.

How can almost all major programming languages (not to mention SQL), such as C#, Python and Java, have a set data type / interface, while even in XPath version 4 we still have to describe sets in free-form narrative?

Thanks,
Dimitre

@michaelhkay
Contributor

From a technical point of view, I concur. However, I want to restrict the scope of what we're attempting in this version: the reason the last version took 10 years to complete is that we were too ambitious. The most expensive changes to make are those that change the data model, and I think we should strongly resist doing that.

@dnovatchev
Contributor

> From a technical point of view, I concur. However, I want to restrict the scope of what we're attempting in this version: the reason the last version took 10 years to complete is that we were too ambitious. The most expensive changes to make are those that change the data model, and I think we should strongly resist doing that.

Let's see what other people have to say.

Should the language still be incomplete and encumbered, or can we do something good here?

@michaelhkay
Contributor

What we could contemplate, however, is a library of functions for manipulating sets of atomic values, using a map as the underlying representation. For example

set:of("a", "b", "c")
set:union($set1, $set2)
set:intersection($set1, $set2)

Going beyond sets of atomic values to sets of arbitrary values immediately gets you into the problem of defining equality between arbitrary values, which is a quagmire.
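
As a rough illustration of "using a map as the underlying representation", the three functions could be sketched in XQuery 3.1 as below. The local: helper names are hypothetical stand-ins for the proposed set: functions, and storing true() as every value is just one possible encoding; nothing here is taken from an actual specification.

```xquery
xquery version "3.1";

(: Hypothetical helpers: a set is a map whose keys are the members
   and whose values are all true(). :)
declare function local:set-of($values as xs:anyAtomicType*) as map(*) {
  map:merge($values ! map:entry(., true()), map { "duplicates": "use-first" })
};

declare function local:set-union($s1 as map(*), $s2 as map(*)) as map(*) {
  map:merge(($s1, $s2), map { "duplicates": "use-first" })
};

declare function local:set-intersection($s1 as map(*), $s2 as map(*)) as map(*) {
  (: Keep only those keys of $s1 that are also present in $s2. :)
  map:merge(map:keys($s1)[map:contains($s2, .)] ! map:entry(., true()))
};

let $a := local:set-of(("a", "b", "c"))
let $b := local:set-of(("b", "c", "d"))
return (
  map:size(local:set-union($a, $b)),        (: 4 members: a, b, c, d :)
  map:size(local:set-intersection($a, $b))  (: 2 members: b, c :)
)
```

Because membership relies on map key comparison, this approach only works for atomic values, which matches the restriction described above.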

@ChristianGruen
Contributor

> Let's see what other people have to say.

I completely agree with Michael Kay here. Smaller steps to start with won’t prevent us from doing bigger steps later on.

And I personally feel it's helpful not to go beyond the scope of the original issue in the discussion (here, it was “Extend the for member syntax to maps”). Otherwise, issues like this tend to be closed later on and replaced by new issues.

@dnovatchev
Contributor

> > Let's see what other people have to say.
>
> I completely agree with Michael Kay here. Smaller steps to start with won’t prevent us from doing bigger steps later on.
>
> And I personally feel it's helpful not to go beyond the scope of the original issue in the discussion (here, it was “Extend the for member syntax to maps”). Otherwise, issues like this tend to be closed later on and replaced by new issues.

Agreed, I submitted this as a new issue:

Proposal to introduce the set datatype in XPath 4

@dnovatchev
Contributor

> Going beyond sets of atomic values to sets of arbitrary values immediately gets you into the problem of defining equality between arbitrary values, which is a quagmire.

People do it in other PLs. In the .NET world one simply uses object.GetHashCode()

@rhdunn added the XPath, XQuery, and Feature labels Sep 14, 2022
@michaelhkay
Contributor

It's unfortunate that in 3.1, the map:entry() and map:merge() functions model a key-value pair as a singleton map. I'm coming to the conclusion that decomposing a map into (key, value) records (record(key as xs:anyAtomicType, value as item()*)*) is much more useful. But mixing the two representations is awkward.

@ChristianGruen changed the title from "[XPath] [XQuery] Extend the for member syntax to maps." to "Extend the for member syntax to maps" Apr 27, 2023
@ChristianGruen
Contributor

Instead of for entry $entry in ..., it might be more user-friendly to offer:

for key $key in ...
for value $value in ...
for key $key value $value in ...

Related: https://lists.w3.org/Archives/Public/public-xslt-40/2023Jun/0026.html

@ChristianGruen changed the title from "Extend the for member syntax to maps" to "Extend FLWOR expressions to maps" Jun 29, 2023
@ChristianGruen
Contributor

ChristianGruen commented Sep 15, 2023

The grammar for the for key $k value $v in ... syntax could be as follows (provided we simplify the existing grammar for member bindings as proposed in #706):

InitialClause     ::=  ForClause | LetClause | WindowClause
ForClause         ::=  "for" ForBinding ("," ForBinding)*
ForBinding        ::=  (SimpleForBinding | ForMemberBinding | ForMapBinding) PositionalVar? "in" ExprSingle
SimpleForBinding  ::=  VarBinding AllowingEmpty?
ForMemberBinding  ::=  "member" VarBinding
ForMapBinding     ::=  (ForKeyBinding ForValueBinding?) | ForValueBinding
ForKeyBinding     ::=  "key" VarBinding
ForValueBinding   ::=  "value" VarBinding
VarBinding        ::=  "$" VarName TypeDeclaration?
AllowingEmpty     ::=  "allowing" "empty"
PositionalVar     ::=  "at" "$" VarName
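
To make the three alternatives of ForMapBinding more concrete, here is a sketch of roughly what each one would correspond to in XQuery 3.1. The sample map is hypothetical, and the correspondence is an assumption based on the discussion in this thread rather than on any normative text.

```xquery
xquery version "3.1";

let $map := map { "a": 1, "b": 2 }   (: hypothetical sample data :)
return (
  (: roughly "for key $k in $map return $k" :)
  for $k in map:keys($map) return $k,

  (: roughly "for value $v in $map return $v" :)
  map:for-each($map, function($k, $v) { $v }),

  (: roughly "for key $k value $v in $map return $k || '=' || $v" :)
  map:for-each($map, function($k, $v) { $k || "=" || $v })
)
```

One difference to keep in mind: the proposed value binding would bind $v to the whole value of an entry, whereas the results of map:for-each are concatenated into a single flat sequence.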

@ChristianGruen added the Tests Needed label Sep 16, 2023
ChristianGruen added a commit to qt4cg/qt4tests that referenced this issue Sep 16, 2023
@ChristianGruen added the Propose for V4.0 label Oct 16, 2023
michaelhkay added a commit to qt4cg/qt4tests that referenced this issue Oct 24, 2023
@michaelhkay
Contributor

I would love to find a more specific term than "value" for use when we talk of key-value pairs. Both for use in narrative prose and for use in this new syntax. It needs something that alerts the reader that we're talking about values in a key-value pair, not just any old value.

@dnovatchev
Contributor

> I would love to find a more specific term than "value" for use when we talk of key-value pairs. Both for use in narrative prose and for use in this new syntax. It needs something that alerts the reader that we're talking about values in a key-value pair, not just any old value.

Maybe key-projection or just projection ?

Or mapping-result or map-result or just result ?

@rhdunn
Contributor Author

rhdunn commented Dec 17, 2023

The terms key and value are widely used in computer science literature in reference to maps and other similar constructs (such as key-value data stores). Those names are also used consistently across programming languages (JavaScript, Java, Kotlin, C#, C++, Python, etc.) when referring to the entries in a map.

Therefore, I would personally object to something else being used in the proposed syntax.

For the prose, we could use something like "map value" or "value of the map" if we want to be specific -- I wouldn't object to something like that.

Terms like "projection" and "result" have different meanings that are likely to confuse users.

@ChristianGruen
Contributor

ChristianGruen commented Dec 17, 2023

> I would love to find a more specific term than "value" for use when we talk of key-value pairs. Both for use in narrative prose and for use in this new syntax. It needs something that alerts the reader that we're talking about values in a key-value pair, not just any old value.

Yes, that’s a tricky one. In an ideal world, I would prefer another term as well, but I don’t expect it to be misleading in the given context: When talking about maps, directories, associative arrays etc., it’s just too common to talk about keys & values or names & values. If you don’t have maps as input, there’ll be no reason to use the keyword in your FLWOR expression.

ChristianGruen added a commit to qt4cg/qt4tests that referenced this issue Mar 13, 2024
@ChristianGruen
Contributor

Shall we go for for key $k value $v in ... , or are there any serious concerns against it?

@dnovatchev
Contributor

Now when we have established the record type (or haven't we?) why not use just a single $kvp variable of type "key-value" record?

@ChristianGruen
Contributor

> Now when we have established the record type (or haven't we?) why not use just a single $kvp variable of type "key-value" record?

A while ago when we gathered feedback, separate key/value variables were considered to be more intuitive (see also #31 (comment)).

@dnovatchev
Contributor

> for key $k value $v in

Speaking about convenience, I would prefer something like:

for ($k, $v) in $someMap

Another possible issue is that the order of the results is unpredictable, thus it may be useful to have:

for ($k, $v) in sorted $someMap

which would return the ($k, $v) tuples in key-sorted order

@michaelhkay
Contributor

I'd be happy to have either

for key $k value $v in ...

or

for {$k, $v} in ...

With a preference for the former as it's better aligned with the syntax for member $m.

But what about "as" and "at" clauses?

As for sorting, I think a standard order by $k clause is perfectly adequate.
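
The sorting point can already be illustrated with an ordinary order by clause in XQuery 3.1. This is only a sketch with hypothetical sample data, iterating over map:keys because the proposed key/value bindings are not part of 3.1.

```xquery
xquery version "3.1";

let $map := map { "b": 2, "c": 3, "a": 1 }   (: hypothetical sample data :)
for $k in map:keys($map)
let $v := $map($k)
order by $k   (: sort the entries by key :)
return $k || "=" || $v
```

With these string keys the result should be "a=1", "b=2", "c=3". Keys of mutually incomparable types would make order by raise a type error, so this assumes the keys can be compared with one another.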

@dnovatchev
Contributor

> I'd be happy to have either
>
> for key $k value $v in ...
>
> or
>
> for {$k, $v} in ...

My preference is for the latter - more concise and succinct.

@ChristianGruen
Contributor

If we added for {$k, $v}, we could (in alignment with #37) change for member $m to for [$m]; but I think we should stop changing features that have already been added to the spec (users are getting impatient for understandable reasons and increasingly use 4.0 features in productive code).

Next, for key $k in $map and for value $v in $map are helpful if you only need one segment of a map entry.

> But what about "as" and "at" clauses?

at is beneficial if you want to enumerate map entries and don’t care about their order. People already do things like:

for $key at $pos in map:keys($map)
for $value in $map($key)
return $pos || '. ' || $key || ': ' || $value

Similarly, I would keep as even if it may not be used excessively. The more similar for works with and without member, key and value, the better.

@rhdunn
Contributor Author

rhdunn commented Apr 12, 2024

Note that the for {$x, $y} ... syntax is similar to the map part of the decomposition syntax I proposed in expath/xpath-ng#8.

@rhdunn
Contributor Author

rhdunn commented Apr 12, 2024

The XPath 4.0 discussion is in issue #37 for the for {$x, $y} ... map syntax. I propose keeping this about the for key $k value $v in ... and for member $e in ... syntax, and having the destructuring {$k, $v} in map:entries($map) syntax in #37 as that extends the support to let expressions as well as sequences and arrays.

@ChristianGruen
Contributor

> The XPath 4.0 discussion is in issue #37 for the for {$x, $y} ... map syntax. I propose keeping this about the for key $k value $v in ... and for member $e in ... syntax, and having the destructuring {$k, $v} in map:entries($map) syntax in #37 as that extends the support to let expressions as well as sequences and arrays.

I would favor that.

@dnovatchev
Contributor

> If we added for {$k, $v}, we could (in alignment with #37) change for member $m to for [$m]; but I think we should stop changing features that have already been added to the spec (users are getting impatient for understandable reasons and increasingly use 4.0 features in productive code).

What about the standard F&O 3.1 functions fn:load-xquery-module and fn:transform that even now (after waiting 10+ years) are still not implemented in BaseX?

Yes, someone could tell us that "there is something like this... in BaseX", but the fact remains that BaseX is not even 3.1-compliant 10 years after publishing the official F&O Specs. Isn't this an intentional effort to prevent portability and interoperability in order to lock users into BaseX?

Why worry only about some users and not about others, who need to rely on compliance and interoperability?

Please, be aware of this fact, which suggests that at least some users may have become desperate by now.

@ChristianGruen
Contributor

> Please, be aware of this fact, which suggests that at least some users may have become desperate by now.

I’m not sure how that relates to my observation. The point I was trying to make is that we are seeing more and more 4.0 features being used in production environments – it’s not just a hypothesis. I certainly won’t object, though, if the majority of us believes that it’s better to revert or revise features of the drafts.

If you rather want to point out we should take our work on BaseX more seriously, or set other priorities, and if you want to learn what others think about it, our mailing list may be a better platform than this thread. Another option is to participate. As you may know, it’s all Open Source and freely available to everyone…

@dnovatchev
Contributor

> I’m not sure how that relates to my observation.

I was just pointing out the fact that BaseX is in a hurry to implement new 4.0 features, while at the same time this application still remains non-compliant with 3.1.

This points to particular, special and selective preferences in the implementors' decision-making.

Personally, as a user I would prefer a compliant and interoperable XPath implementation to one that might be even more optimized and might have implemented a lot of new features, but at the same time has remained non-compliant with established, official versions and specifications, which hinders interoperability and results in user lock-in.

@ChristianGruen
Contributor

> Personally, as a user […]

Just use the alternatives ;)

@michaelhkay
Contributor

This is not the right place to discuss the conformance level of implementations.

As regards the point:

> I think we should stop changing features that have already been added to the spec

I think we should avoid encouraging users to believe that they can rely on stability in the current draft specifications. A level of stability is desirable because it is the only way we will ever finish, but we should not allow that kind of argument to prevent us improving the work we have done where improvements are identified.

@michaelhkay added the PR Pending label Jun 1, 2024

6 participants