Dynamic Function Calls: Processing Empty Sequences #707

Closed
ChristianGruen opened this issue Sep 15, 2023 · 15 comments
Labels
Enhancement A change or improvement to an existing feature Propose Closing with No Action The WG should consider closing this issue with no action XPath An issue related to XPath

Comments

@ChristianGruen
Contributor

A fundamental – and brilliant – property of XPath is that many operations tolerate empty sequences: Instead of throwing an error, the empty result is passed on unchanged to the next operation. While this is unrewardingly confusing for binary operations (() + 1, () eq 5), it’s wonderful for pipelines:

(: paths :)
$nodes / a / b / c
(: lookups :)
$data ? 1 ? 2 ? 3
(: simple map operators :)
$data ! do(.) ! something(.)
(: the arrow operator works differently, but the syntax is similar :)
$data => do() => something()

As far as I can judge, it would be a very simple and user-friendly addition if we extended dynamic function calls to return an empty sequence (instead of raising an error) if the base expression is an empty sequence. This way, the following expressions would all run through:

let $map := map { 'giovanni': map { 'city': 'roma' } }
return $map('andrea')('city'),
let $data := ()
return $data(1)(2)(3),
()(123),
()()

Many people use parentheses instead of the lookup operator for accessing maps & arrays, and the proposed change would make the syntax more interchangeable. I believe it would also be useful for function items in general.
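
As a rough illustration of the proposed rule, here is a minimal Python model (not XPath). The helper name `dynamic_call` and the use of an empty tuple to stand in for the empty sequence are assumptions made for this sketch only.

```python
EMPTY = ()  # stands in for the XPath empty sequence (an assumption of this model)


def dynamic_call(base, *args):
    """Model of the proposal: calling an empty base yields the empty sequence."""
    if isinstance(base, tuple) and len(base) == 0:
        return EMPTY  # proposed behaviour: no error, the emptiness is passed on
    if isinstance(base, dict):
        # A map called as a function: a missing key yields the empty sequence.
        (key,) = args
        return base.get(key, EMPTY)
    return base(*args)  # any other callable is simply invoked


data = {"giovanni": {"city": "roma"}}
found = dynamic_call(dynamic_call(data, "giovanni"), "city")
missing = dynamic_call(dynamic_call(data, "andrea"), "city")
```

Under this model, `missing` is the empty tuple rather than an error, which mirrors what `$map('andrea')('city')` would return if the proposal were adopted.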

@ChristianGruen ChristianGruen added XPath An issue related to XPath Enhancement A change or improvement to an existing feature labels Sep 15, 2023
@michaelhkay
Contributor

There's a counter-argument, which is that propagating empty sequences is one of the things that makes XPath/XSLT very hard to debug: you get a select expression wrong, and the result is empty output, not an error.

I've been thinking recently about adding checked{} and unchecked{} modes. For example

<xsl:apply-templates select="checked{.//item}"/>

would throw an error if there are no items. The mode of execution would propagate downwards, so checked{a/b/c/d} would be able to tell you that a b was found, but it had no c children.

checked{item[22]} would have the effect of making sequences behave more like arrays, with bounds checking; conversely, unchecked{item?22} would return an empty sequence instead of throwing an error.

Hope this isn't hijacking your thread: my point is, if the behaviour were switchable like this, I would find your proposal much more appealing.

@ChristianGruen
Contributor Author

ChristianGruen commented Sep 15, 2023

Yes, checks can be helpful. We’ve done things like this in the past:

let $traverse := function($root, $steps) {
  fold-left($steps, $root, function($nodes, $step) {
    let $result := $nodes/*[name() = $step]
    return if ($result) then $result else error((), $step || ' not found')
  })
}
return $traverse(<xml><a><b/></a></xml>, ('a', 'b'))

…or, more generally:

let $traverse := fn($root, $steps) {
  fold-left($steps, $root, fn($nodes, $step) {
    $step($nodes) otherwise error()
  })
}
return $traverse(<xml><a><b/></a></xml>, (fn { a }, fn { b }))

The checked { $node/a/b } or $node/checked { a/b } syntax would certainly be easier (although I’d spontaneously be afraid of considering all the implications of implementing it).

Besides paths, would you also like to cover all other XPath functions and operations that accept empty sequences (arithmetic expressions, value comparisons, XPath 1.0 functions)?

@ChristianGruen
Contributor Author

PS: What would be the difference between the two approaches?

<xsl:apply-templates select="checked{ a/b }"/>
<xsl:apply-templates select="a/b otherwise error()"/>

I assume that checked {} would give you a better error string, e.g., stating which of the steps yielded no results?

@michaelhkay
Contributor

Would you also like to cover all other XPath functions and operations that accept empty sequences (arithmetic expressions, value comparisons, XPath 1.0 functions)?

I think the key thing is to catch "selection expressions" that select nothing. That includes axis steps, filter expressions, and map lookups. The detailed list would need to be carefully worked out. Other things like value comparisons generally only return () if one of their operands returns (), and the aim is to catch the emptiness as early as possible so you can work out which selection has "failed".

@michaelhkay
Contributor

As regards alternatives such as "otherwise" or "treat as item()+", I think the key difference is that checked{} propagates to subexpressions, and therefore gives you the diagnostics closer to the point of failure. Also, the checking could be disabled at API level in the same way that xsl:assert checking is disabled.
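
The difference between a check that propagates into subexpressions and a check applied only to the final result can be sketched in Python. This is a model over nested dictionaries, not an implementation of checked{}; the function names `checked_path` and `unchecked_path` are hypothetical.

```python
def checked_path(root, steps):
    """Walk nested dicts step by step; fail at the first step that selects nothing."""
    current = root
    for depth, step in enumerate(steps):
        if not isinstance(current, dict) or step not in current:
            found = "/".join(steps[:depth]) or "(root)"
            raise LookupError(f"step '{step}' selected nothing below {found}")
        current = current[step]
    return current


def unchecked_path(root, steps):
    """Same walk, but an empty result is passed on silently (modelled as None)."""
    current = root
    for step in steps:
        if not isinstance(current, dict) or step not in current:
            return None
        current = current[step]
    return current


doc = {"a": {"b": {}}}
try:
    checked_path(doc, ["a", "b", "c", "d"])
except LookupError as err:
    message = str(err)  # names the failing step, not just "result was empty"
```

A check on the final result alone (the "otherwise error()" style) could only report that a/b/c/d was empty, whereas the step-wise check can say that a b was found but had no c.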

@dnovatchev
Contributor

dnovatchev commented Sep 16, 2023

As far as I can judge, it would be a very simple and user-friendly addition if we extended dynamic function calls to return an empty sequence (instead of raising an error) if the base expression is an empty sequence.

Let's not forget the proposal for the capability to add a default key-value to maps: #105

We can conveniently return the default value of $75 for a night's hotel price for any night after the third one:

(: '\' marks the default entry, using the syntax proposed in #105 :)
let $price-for-night := map {
  1 : 100,
  2 : 90,
  3 : 80,
  '\' : 75
}
return $price-for-night(10)

This produces 75, which is the desired result and much more usable than the empty sequence.

#105 actually subsumes this proposal, because if we want the empty sequence to be returned, we can simply express this as:

let $price-for-night := map {
  1 : 100,
  2 : 90,
  3 : 80,
  '\' : ()
}
return $price-for-night(10)

It is easy to see that this current proposal has a few shortcomings (the last one critical):

  • The caller has to do additional analysis of the result to check for the empty sequence.

  • In many cases we may want an error to be raised instead. Returning the empty sequence then requires extra code and checks (as above), and if that code is accidentally omitted, the result may be subtle run-time errors that are difficult to catch and locate.

  • In some cases, as with arrays, this is obviously the wrong thing to do:

        let $myAr := [1, ()]
          return $myAr(2)

    Evaluating this expression produces the empty sequence. And we don't know if the array had only one member, or its 2nd member was the empty sequence.

    Exactly the same confusion arises with a map that has a key "K" whose value is the empty sequence. From the fact that empty($myMap?K) is true, we cannot tell whether $myMap has a key "K" bound to the empty sequence (in which case we could probably do some meaningful processing using ()) or has no key "K" at all (in which case we probably need some error handling).

Lesson learned: It is good to know and remember that the empty sequence () is not some kind of "exceptional" or "error", or even "ignorable" value, and it is wrong to use it to convey such meaning.
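
The ambiguity described above can be reproduced with a small Python model. The sentinel `MISSING` and both helper functions are hypothetical names introduced for this sketch; an empty tuple again stands in for the empty sequence.

```python
MISSING = object()  # hypothetical sentinel, distinct from any stored value


def lookup(mapping, key):
    """Lax lookup: a missing key and a key bound to () look the same."""
    return mapping.get(key, ())


def strict_lookup(mapping, key):
    """Lookup that keeps 'absent' distinguishable from 'present but empty'."""
    return mapping.get(key, MISSING)


with_empty_value = {"K": ()}
without_key = {}

# Both lax lookups return (), so the caller cannot tell the two maps apart.
lax_results = (lookup(with_empty_value, "K"), lookup(without_key, "K"))

# The strict variant returns () for the first map and MISSING for the second.
strict_results = (
    strict_lookup(with_empty_value, "K"),
    strict_lookup(without_key, "K"),
)
```

In XPath itself, the presence question can already be answered separately with map:contains($myMap, "K").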

@ChristianGruen
Contributor Author

Let's not forget the proposal for the capability to add a default key-value to maps: #105

True; if you have full control over your input data, things look different. My proposal has mostly been driven by users who work with external data (JSON, CSV, YAML, MD) converted to maps/arrays, and who tend to use parentheses instead of the lookup operator.

Another showstopper for this in practice is the incompatibility between maps and arrays. With unchecked expressions, array boundary checks could possibly be disabled (and map checks could be enabled with checked). JavaScript provides the optional chaining operator for tolerant traversals. But that’s indeed a topic for another issue.

@dnovatchev
Contributor

Let's not forget the proposal for the capability to add a default key-value to maps: #105

True; if you have full control over your input data, things look different. My proposal has mostly been driven by users who work with external data (JSON, CSV, YAML, MD) converted to maps/arrays, and who tend to use parentheses instead of the lookup operator.

Another showstopper for this in practice is the incompatibility between maps and arrays. With unchecked expressions, array boundary checks could possibly be disabled (and map checks could be enabled with checked). JavaScript provides the optional chaining operator for tolerant traversals. But that’s indeed a topic for another issue.

Most importantly, the 3rd issue from my comment is definitely a show-stopper:

  • In some cases, as with arrays, this is obviously the wrong thing to do:

        let $myAr := [1, ()]
          return $myAr(2)

    Evaluating this expression produces the empty sequence. And we don't know if the array had only one member, or its 2nd member was the empty sequence.
    Exactly the same confusion arises with a map, that has a key "K", whose value is the empty sequence. From the fact that empty($myMap?K) is true, it cannot be concluded whether or not $myMap has a key "K", whose value is the empty sequence (and we could probably do some meaningful processing using ()), or whether it hasn't a key named "K" at all (in which case we probably need to do some error-handling)

Lesson learned: It is good to know and remember that the empty sequence () is not some kind of "exceptional" or "error", or even "ignorable" value, and it is wrong to use it to convey such meaning.

@dnovatchev
Contributor

dnovatchev commented Sep 17, 2023

Would you also like to cover all other XPath functions and operations that accept empty sequences (arithmetic expressions, value comparisons, XPath 1.0 functions)?

I think the key thing is to catch "selection expressions" that select nothing. That includes axis steps, filter expressions, and map lookups. The detailed list would need to be carefully worked out. Other things like value comparisons generally only return () if one of their operands returns (), and the aim is to catch the emptiness as early as possible so you can work out which selection has "failed".

This is just a kind of data validation.

This can be implemented as a simple decorator (#106) and can provide the necessary level of flexibility - no need for "special modes".

Actually, this is a decorator factory, that is a decorator that accepts its own parameters besides the function (or map or array) that it decorates, and produces a parameter-less decorator.

^empty-validation(strict := true())

produces a decorator that evaluates the function passed to it, and if the result of this evaluation is empty (whatever we define this value to be, or maybe pass it as another parameter to the decorator-factory) it raises an error.

And:

^empty-validation(strict := false())

produces a no-op (or identity) decorator, that evaluates the function and returns its result.
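
For readers unfamiliar with the pattern, this is how such a decorator factory looks in Python. It is an analogy only: the decorator syntax proposed in #106 is not modelled here, and the names `empty_validation`, `is_empty`, `strict_price`, and `lax_price` are assumptions of this sketch.

```python
import functools


def empty_validation(strict=True, is_empty=lambda r: r is None or r == ()):
    """Decorator factory: returns a decorator configured by `strict`."""

    def decorator(func):
        if not strict:
            return func  # identity decorator: the function is left untouched

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if is_empty(result):
                raise ValueError(f"{func.__name__} returned an empty result")
            return result

        return wrapper

    return decorator


prices = {1: 100, 2: 90, 3: 80}


@empty_validation(strict=True)
def strict_price(night):
    return prices.get(night, ())  # () models the empty sequence


@empty_validation(strict=False)
def lax_price(night):
    return prices.get(night, ())
```

Here `strict_price(10)` raises a `ValueError`, while `lax_price(10)` returns the empty tuple unchanged. What counts as "empty" is passed in through `is_empty`, matching the suggestion that this could be another parameter of the factory.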

@ChristianGruen
Contributor Author

Apart from new constructs/expressions, we could also use annotations or pragmas. But that's out of scope; it might be better to create a new issue for this discussion.

@michaelhkay
Contributor

As I understand it, the decorator concept applies only to functions, whereas we are concerned here with expressions such as $a/b/c or $a[1] or $a?code.

It's true that we could use pragma syntax for this; although it's currently XQuery-only.

@dnovatchev
Contributor

dnovatchev commented Sep 17, 2023

As I understand it, the decorator concept applies only to functions, whereas we are concerned here with expressions such as $a/b/c or $a[1] or $a?code.

It's true that we could use pragma syntax for this; although it's currently XQuery-only.

Isn't it true that:

$a/b/c

is equivalent to:

fn($a as node()) {$a/b/c} ($a)

We can decorate any such function, which means any such expression.

Also, let us not forget that both maps and arrays are functions, therefore we can decorate both maps and arrays.

@ChristianGruen
Contributor Author

@dnovatchev @michaelhkay I have created a new issue which we could use for further discussion on (un)checked evaluation: #709

@ChristianGruen
Contributor Author

I propose to close this; there hasn’t been much applause (and one can still use the lookup operator as a lax fallback).

@ChristianGruen ChristianGruen added the Propose Closing with No Action The WG should consider closing this issue with no action label Dec 14, 2023
@ndw
Contributor

ndw commented Dec 19, 2023

The CG agreed to close this issue without action at meeting 59.


4 participants