Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XPath]Proposal: Notation for using an operator as a function #83

Closed
dnovatchev opened this issue Jul 29, 2021 · 37 comments
Closed

[XPath]Proposal: Notation for using an operator as a function #83

dnovatchev opened this issue Jul 29, 2021 · 37 comments
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators

Comments

@dnovatchev
Copy link
Contributor

dnovatchev commented Jul 29, 2021

In this thread of (XML slack)#general:
https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1627497085.455200?cdn_fallback=2
there is this expression:

for-each-pair($aa, $bb, function($x, $y) {$x ne $y}) 
           => index-of(true())

Notice the long and unreadable: function($x, $y) {$x ne $y}

Writing, understanding and maintaining XPath code, would be enhanced if we had a better way of expressing the use of an operator as a function.
In Haskell one simply writes:

(/)  4, 2
half   = (/2)
(-) 4, 2
negate = (0-)
ne = (/=)

We could accept a similar convention, so the original expression above is written simply as:

for-each-pair($aa, $bb, (ne)) 
           => index-of(true())

Or we could use something less overloaded than parenthesis, for example:

`ne`

Then the original expression looks like this:

for-each-pair($aa, $bb, `ne`) 
           => index-of(true())

Regardless which lexical representation is chosen, being able to represent an operator as a function leads to significant code simplification, and improves its readability.

Please, share your thoughts/questions on this proposal.

@ChristianGruen
Copy link
Contributor

How would you like to integrate the operator definitions in the language? Would they be inluded in the query prolog, similar to variable declarations, etc?

Currently, I would do the following:

declare variable $ne := function($x, $y) { $x ne $y };

let $aa := 1 to 10
let $bb := 1 to 10
return for-each-pair($aa, $bb, $ne) => index-of(true())

And as a general note… Maybe we should really finalize all the existing proposals first. There will be people who’ll need to implement all this (…volunteers are welcome).

@michaelhkay
Copy link
Contributor

To avoid the need for new syntax, I'm inclined to provide this functionality using a function op(), so op("+") returns function($x, $y){ $x + $y}. Allows you to write, for example, for-each-pair($x, $y, op("ne")). Applies only to binary operators, and not to higher order operators such as / and !. (So, + - * div mod idiv eq ne lt le gt ge < > = <= >= != is and or to union intersect except << >>, if I haven't missed any)

@michaelhkay
Copy link
Contributor

It would be easier to specify if we got rid of the operator mapping table and had a one-to-one mapping between operators and functions, with the functions accepting union types in their signature where appropriate. Would using the function conversion rules in place of the custom rules for operators have any adverse effect? Hopefully not, but needs checking. The static type semantics would be a bit weaker (no way to specify declaratively that (duration div double = duration) for example. But I think that's liveable with.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Jul 30, 2021

@michaelhkay,
One way to avoid the need to discuss the arguments - conversion / restriction complexities when an operator is treated as a function, is by defining

op(⊕)

as a lexical shorthand for:

function($x, $y){ $x ⊕ $y}

Thus, op(⊕) would be expanded as part of the lexical analysis stage and is just like invoking a macro-definition with a particular argument.

In this way, an existing XPath processor (and the XPath language itself) doesn't need to be modified except for adding a little bit of lexical sugar on the surface.

I am slightly puzzled why you propose not to have op('!') . What potential caveat do you see in this?

@Conal-Tuohy
Copy link

Conal-Tuohy commented Jul 30, 2021

In the XPath 3.1 functions and operators spec, many XPath operators are already defined as a set of functions with a namespace prefix op:, even though implementations don't 'necessarily' make such functions available, and there's no actual namespace URI associated with the prefix. Not every operator is specified this way: in particular there's no op:numeric-not-equal pseudo-function which would be needed for Dimitre's example, though there are pseudo-functions defined for op:numeric-equal), op:numeric-less-than, and op:numeric-greater-than.

I do like the idea of providing a functional notation for operators, but my preference would be to do this not by adding to XPath syntax, but rather by promoting these pseudo-functions to actual functions, by defining an XML namespace for operators and adding any operators which aren't already defined (including op:numeric-not-equal).

@dnovatchev
Copy link
Contributor Author

@Conal-Tuohy

Having to write op:numeric-greater-than(), op:date-greater-than(), op:string-greater-than() ..., etc.... more than a dozen functions ... for essentially the same idea of just the gt operator -- this feels really terrible.

And the names themselves are terribly long, difficult to use and remember.

Instead, the idea here is to have for all of these just a single and short-named op("gt") where gt is the operator all we know well and don't need to learn from scratch.

@Conal-Tuohy
Copy link

Having to write op:numeric-greater-than(), op:date-greater-than(), op:string-greater-than() ..., etc.... more than a dozen functions ... for essentially the same idea of just the gt operator -- this feels really terrible.

And the names themselves are terribly long, difficult to use and remember.

We could have an op:gt() function, too, I guess, if you're not concerned about parameter types.

@dnovatchev
Copy link
Contributor Author

We could have an op:gt() function, too, I guess, if you're not concerned about parameter types.

Yes, this is exactly the idea. The types would be the types that are allowed for the gt operator, which is already implemented, and also its arguments' type-checking is specified / implemented by the current XPath processors. Thus no new/more effort would be required.

An XPath processor can just internally expand op:gt() to function($x, $y) {$x gt $y} and this is something that is part of the current XPath language, so no additional effort is required.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Jul 30, 2021

@ChristianGruen:

How would you like to integrate the operator definitions in the language? Would they be inluded in the query prolog, similar to variable declarations, etc?

Currently, I would do the following:

declare variable $ne := function($x, $y) { $x ne $y };

I think this is the simplest way to do it, and the function definitions, of course, should be built-in (default/standard) so one will never be required to write manually/themselves:

declare variable $ne := function($x, $y) { $x ne $y };

@ChristianGruen
Copy link
Contributor

@dnovatchev I see I got this wrong, and it’s mostly about standard operators. Or would you also like to support custom operators, such as half? If yes, how would an exemplary expression with operator declaration look like?

@michaelhkay
Copy link
Contributor

Yes, I agree that defining it as a lexical equivalence bypasses the problems of the operator mapping table.

Higher order operators such as ! are definitely more tricky because the second argument is an expression dependent on the context item, which would have to be converted to a function in some way.

@dnovatchev
Copy link
Contributor Author

@ChristianGruen:

@dnovatchev I see I got this wrong, and it’s mostly about standard operators. Or would you also like to support custom operators, such as half? If yes, how would an exemplary expression with operator declaration look like?

Yes I meant just the standard operators.

However the idea to do the reverse: designate a function as an operator can also be done.
One example is again Haskell, where people write:

collection `contains` item

instead of

contains collection item

The convention is that just surrounding a prefix-function with backticks makes it an infix function (operator). They have even a way of specifying the associativity (left or right or none) and priority of an operator or a function when used as operator.

I have thought of that too, but in XPath we already have the => operator which kinda allows us to do something similar. Thus having yet another way to represent a (prefix) function as an (infix) operator seems not so so pressing.

@martin-honnen
Copy link

So an XSLT 3 "implementation" of the op function would be like

  <xsl:function name="op:op" as="function(item()*, item()*) as item()*">
    <xsl:param name="operator" as="xs:string"/>
    <xsl:evaluate xpath="'function($operand1, $operand2) { $operand1 ' || $operator || ' $operand2 }'"/>
  </xsl:function>

wouldn't it (aside from the need to use a prefix for a user defined function)?

@michaelhkay
Copy link
Contributor

Yes. Plus some error checking (you wouldn't want to allow a call on op("|| doc('/etc/password') ||")). And a real implementation could do some smarter type checking.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Aug 1, 2021

So an XSLT 3 "implementation" of the op function would be like

  <xsl:function name="op:op" as="function(item()*, item()*) as item()*">
    <xsl:param name="operator" as="xs:string"/>
    <xsl:evaluate xpath="'function($operand1, $operand2) { $operand1 ' || $operator || ' $operand2 }'"/>
  </xsl:function>

wouldn't it (aside from the need to use a prefix for a use

@martin-honnen This could be one way of specifying the semantics, yes.

However, this would be too-generic and seems unnecessarily complicated.

Remember that in XPath there are only a finite set of operators, thus we don't need to receive/inspect/handle the value for the operator dynamically.

Instead, a simpler way is to have all possible op('operator') functions predefined as (in pseudocode):

lexical-replacement('op:("opCode")', 'function($x, $y) {$x "opCode" $y}')

where the above can probably be defined as a regular expression,
and where any actual value for 'opCode' must belong to the set $opCodes:

let $opCodes := 
'+', '-',  '*', 'div', 'mod', 'idiv', 'eq', 'ne', 'lt', 'le', 'gt', 'ge', '<', '>', '=', '<=', '>=', '!=', 
'is', 'and', 'or', 'to', 'union', 'intersect', 'except', '<<', '>>'

If someone needs to be precise, this could be specified using an XML-Schema enumeration facet.

All such lexical definitions (for all strings in the above set) will be predefined, thus the code-writer only uses the shorthand
'op:("opCode")' and this is expanded lexically by the XPath processor to a call of one of the 27 possible corresponding predefined functions.

@Conal-Tuohy
Copy link

@dnovatchev
I'm not sure if I've followed your point about lexical replacement, but doesn't your suggestion imply the argument to op() would necessarily be a string literal? Or would this be possible: op($opCode)?

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Aug 2, 2021

I'm not sure if I've followed your point about lexical replacement, but doesn't your suggestion imply the argument to op() would necessarily be a string literal? Or would this be possible: op($opCode)?

@Conal-Tuohy, good point.

If a variable is passed as argument, then the function call should expand to one of the 27 predefined functions. So in this case we can imagine that if all 27 function definitions are defined as entries of a map $allOps, then the lexical replacement could be represented like this:

                 replacement
'op($opCode)'    ==========>         Q{http://www.w3.org/2005/xpath-functions/map}get($allOps, $opCode)

Here is a small example (using just the functions corresponding to 2 of the 27 operators:

let $allOps := map { 'ne': function($x, $y) {$x ne $y}, 
                     'eq': function($x, $y) {$x eq $y}
}, 

    $op := function($opCode as xs:string) { Q{http://www.w3.org/2005/xpath-functions/map}get( $allOps, $opCode)},
$opCode := 'eq'

return
  $op($opCode)(3, 3)

@Conal-Tuohy
Copy link

@dnovatchev
I'm still struggling to see what the point of a "lexical replacement" stage is. If you have a function which returns the various operators as dyadic functions, isn't that all you need? Why would you not just take the $allOps map in your example, and baptise it as a standard XPath function fn:op? Or is that what you mean by "lexical replacement"?

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Aug 3, 2021

@Conal-Tuohy

Why would you not just take the $allOps map in your example, and baptise it as a standard XPath function fn:op? Or is that what you mean by "lexical replacement"?

we want the function $op, and it isn't exactly the map $allOps. The relationship between these two is:

$op := function($opCode as xs:string) { Q{http://www.w3.org/2005/xpath-functions/map}get( $allOps, $opCode)}

And this is exactly the lexical rewriting. The XPath processor replaces every occurrence of a call to $op($opCode) with

Q{http://www.w3.org/2005/xpath-functions/map}get( $allOps, $opCode)

$allOps is built-in and doesn't need to be defined explicitly by the programmer.

@Conal-Tuohy
Copy link

Conal-Tuohy commented Aug 3, 2021

Isn't $allOps($opCode) the same as Q{http://www.w3.org/2005/xpath-functions/map}get( $allOps, $opCode)}?

If the function here called $allOps were provided as a standard XPath function fn:op that would do the job, wouldn't it? I see you talking about a "lexical replacement" step, but I'm struggling to see what the point of it is. It seems to me that it's just this function $allOps. What am I missing?

Is there a reason to prefer adding op as a feature of the XPath language, as distinct from just another higher-order function in the http://www.w3.org/2005/xpath-functions namespace?

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Aug 3, 2021

Is there a reason to prefer adding op as a feature of the XPath language, as distinct from just another higher-order function in the http://www.w3.org/2005/xpath-functions namespace?

Initially I proposed to have a special notation for an operator to be used as a function.

Then @michaelhkay proposed to have this as a standard XPath function:

op("+") returns function($x, $y){ $x + $y}

And this is a good way of specifying the proposed feature.

But try to write the rules for the types of the arguments of the produced function, their passing, conversion, compatibility... It quickly becomes very complicated...

This is why we agreed that just a lexical rewriting of the above would be an easy way to avoid the need to specify all parameters types, conversion rules and restrictions.

@dnovatchev dnovatchev changed the title Proposal: Notation for using an operator as a function [XPath]Proposal: Notation for using an operator as a function Nov 23, 2021
@michaelhkay
Copy link
Contributor

Revisiting this, I think the proposal that

op("⊙") returns function($x, $y){$x ⊙ $y}

Actually works quite well. As far as the spec is concerned, the signature can be

function($x as item()*, $y as item()*) as item()* {$x ⊙ $y}

We can leave it to implementations to do smarter type checking and inferencing if they choose. The operator itself performs any necessary conversions on the arguments (such as atomisation) so there's no need to do any further conversion at the level of the function.

This form of the proposal allows the operator to be defined dynamically. There are certainly use cases for this. This puts a bit more of a strain on the implementation, but it's not that difficult. Obviously static type-checking isn't possible in this case.

The operators allowed should be all binary operators that evaluate their arguments in the containing context prior to applying the operator, plus "and" and "or": that is, if I'm not mistaken,

, (comma) and or + - * div idiv mod = < <= > >= != eq lt le gt ge ne << >> is || | union except intersect to

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Jan 21, 2022

The operators allowed should be all binary operators that evaluate their arguments in the containing context prior to applying the operator, plus "and" and "or": that is, if I'm not mistaken,

, (comma) and or + - * div idiv mod = < <= > >= != eq lt le gt ge ne << >> is || | union except intersect to

I am just trying to think about useful other operators.

What about !, ? and =>, unary -, instance of, cast as, castable as, treat as ?

Maybe even -> ? 😉

@michaelhkay
Copy link
Contributor

The result of "!" is not a function of the values of its operands. The RH operand is evaluated repeatedly, once for each item in the LHS: it's thus a higher-order operator: if you mapped it to a function, that function would do something like fn:for-each().

Instance-of, etc, take a type as an operand, and types are not first-class values in the data model, so these expressions cannot work as functions.

I believe I listed all the "lower order" binary operators: those whose LHS and RHS are both arbitrary expressions, that evaluate both their argument expressions independently, and then apply some computation to the two resulting values.

@dnovatchev
Copy link
Contributor Author

The result of "!" is not a function of the values of its operands. The RH operand is evaluated repeatedly, once for each item in the LHS: it's thus a higher-order operator: if you mapped it to a function, that function would do something like fn:for-each().

Instance-of, etc, take a type as an operand, and types are not first-class values in the data model, so these expressions cannot work as functions.

I believe I listed all the "lower order" binary operators: those whose LHS and RHS are both arbitrary expressions, that evaluate both their argument expressions independently, and then apply some computation to the two resulting values.

@michaelhkay I understand this, and my question is:

Can't we have a similar function, let's name it op-apply, so that:

op-apply("⊙") returns function($x, $y){$x ⊙ apply( $y, [.])}

For example:

op-apply("!") := function($x, $y)
                   { $x ! apply($y, [.]) }

The following is a correct example even in XPath 3.1:

let $f := function($x, $y)
                   { $x ! apply($y, [.]) }
  return
     $f(("a", "bc"), string-length#1)

Evaluating this correctly produces:

1
2

@michaelhkay
Copy link
Contributor

Your proposed op:apply("!") seems to return fn:for-each#2. I can't see why it's needed.

@dnovatchev
Copy link
Contributor Author

Your proposed op:apply("!") seems to return fn:for-each#2.

@michaelhkay , No. the results of the evaluation are 1, 2 -- exactly as expected.

I I evaluated this both with BaseX and Saxon 9.9.1.7 (via Oxygen)

let $f := function($x, $y)
                   { $x ! apply($y, [.]) }
  return
     $f(("a", "bc"), string-length#1)

and have the same identical, correct and expected results:

1
2

Now, just substitute $f from above with op-eval .

The result is:

op-apply("!") := function($x, $y)
                   { $x ! apply($y, [.]) }

So, generally op-apply can be defined as (substitute ! from above with ):

op-apply("⊙") returns function($x, $y){$x ⊙ apply( $y, [.])}

I can't see why it's needed.

Obviously, to be able to convert such "higher-order" operators as ! to functions.

@michaelhkay
Copy link
Contributor

I can't see why it's needed.

Obviously, to be able to convert such "higher-order" operators as ! to functions.

That's not a requirement, that's a feature. What requirement does it solve? What can we do more effectively that we can do now?

@dnovatchev
Copy link
Contributor Author

I can't see why it's needed.

Obviously, to be able to convert such "higher-order" operators as ! to functions.

That's not a requirement, that's a feature. What requirement does it solve? What can we do more effectively that we can do now?

I am not saying this is a requirement, and not pushing at all to introduce such a function.

It is just good to know that this can be done, and exactly how.

@dnovatchev
Copy link
Contributor Author

dnovatchev commented Jun 25, 2022

We missed these two:

instance of

castable as

Wondering what the type of the 2nd (RHS) argument should be?

Type of a type? Must be type.

Unfortunately we do not have type as a data item (object) in XPath. Maybe write a proposal for a "type object" or a "type type"?

There are non-terminal symbols corresponding to types in the XPath 3.1 grammar starting from Rule [79].

However if the language has operators on types, then a type certainly needs to be defined as a data item of the language. Passing or storing a type as a string makes it indistinguishable from a string. We want the type of a type to be type, not string.

I will be writing a separate proposal for this.

@AlainCouthures
Copy link

AlainCouthures commented Jun 25, 2022 via email

@michaelhkay
Copy link
Contributor

If we want to implement types as first class objects (and in my view it's a nice feature rather than a "must have") then we need to take into account that we have two overlapping type systems: sequence types and schema types. The second operand of "instance of" is a sequence type, the second operand of "castable as" is a SimpleType, which is a subtype of schema type. The fact that these are overlapping categories (with atomic types and "pure union types" as the intersection) creates a lot of potential complexity.

@rhdunn rhdunn added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Sep 14, 2022
@michaelhkay
Copy link
Contributor

PROPOSAL

fn:op($operator as enum(",", "and", "or", "+", "-", "", "div", "idiv", "mod", "=", "<", "<=", ">", ">=", "!=", "eq", "lt", "le", "gt", "ge", "ne", "«", "»", "is", "||", "|", "union", "except", "intersect", "to")) as (function(item(), item()) as item())

Summary: returns a function corresponding to a supplied operator symbol

Rules: The effect of the function is equivalent to the following XQuery implementation

switch ($operator) {
  case "," return function($x, $y) {$x , $y}
  case "and" return function($x, $y) {$x and $y}
  case "or" return function($x, $y) {$x or $y}
  case "+" return function($x, $y) {$x + $y}
  ...
  case "to" return function($x, $y) {$x to $y}
  default return error()
}

EXAMPLES
op("+")(3, 4) returns 7
etc

@dnovatchev
Copy link
Contributor Author

@michaelhkay

Rules: The effect of the function is equivalent to the following XQuery implementation

switch ($operator) {
  case "," return function($x, $y) {$x , $y}
  case "and" return function($x, $y) {$x and $y}
  case "or" return function($x, $y) {$x or $y}
  case "+" return function($x, $y) {$x + $y}
  ...
  case "to" return function($x, $y) {$x to $y}
  default return error()
}

EXAMPLES op("+")(3, 4)
returns 7

What about the pure XPath and more concise:

let $ops := map {
  ","   : function($x, $y) {$x , $y},
  "and" : function($x, $y) {$x and $y},
  "or"  : function($x, $y) {$x or $y},
  "+"   : function($x, $y) {$x + $y},
  (: . . . . . . . . . . . . . . . . . . :)
  "to"  : function($x, $y) {$x to $y}
}
 return $ops("+")(3, 4)

@michaelhkay
Copy link
Contributor

Thanks, yes, that's an improvement.

michaelhkay added a commit to michaelhkay/qtspecs that referenced this issue Sep 30, 2022
@michaelhkay
Copy link
Contributor

I've submitted a pull request.

I decided not to give a concrete implementation in such turgid detail, but to keep it high level.

ndw pushed a commit to ndw/qtspecs-xslt4 that referenced this issue Oct 10, 2022
michaelhkay added a commit to michaelhkay/qtspecs that referenced this issue Oct 18, 2022
@michaelhkay
Copy link
Contributor

Now implemented.

ndw pushed a commit to ndw/qtspecs-xslt4 that referenced this issue Oct 19, 2022
ndw pushed a commit to ndw/qtspecs-xslt4 that referenced this issue Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

7 participants