fn:void: Naming, Arguments #639

Closed
ChristianGruen opened this issue Jul 25, 2023 · 24 comments
Labels
Discussion A discussion on a general topic. Editorial Minor typos, wording clarifications, example fixes, etc. Propose Closing with No Action The WG should consider closing this issue with no action XQFO An issue related to Functions and Operators

Comments

@ChristianGruen
Contributor

A new function fn:void was added to the spec (see #359 for details).

This issue can be used to discuss alternative names for the function, as was suggested by @dnovatchev.

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Editorial Minor typos, wording clarifications, example fixes, etc. Discussion A discussion on a general topic. labels Jul 25, 2023
@dnovatchev
Contributor

I have reopened #359 and there provided preferred names for (what my understanding is of) the function.

We need to avoid placing in the Spec any function whose meaning is not commonly understood -- see the l o o o o n g thread of #359.

@dnovatchev
Contributor

I read the description of the function in the FO Specification.

This paragraph makes the function completely confusing to me and I wonder what on Earth would ever make me use it:

"It is ·implementation-dependent· whether the supplied argument is evaluated or ignored. An implementation may decide to evaluate ·nondeterministic· expressions and ignore deterministic ones."

I propose to remove this function from the FO Spec until a better description is provided, one that shows the reader a convincing use case for its unambiguous semantics, usability and value.

@ChristianGruen
Contributor Author

I wonder what on Earth would ever make me use it:

There will definitely be no need for everyone to use the function; it always depends on the use cases you are confronted with, and I assume yours simply don’t apply.

It’s easy to confirm that an equivalent function (prof:dump) has become a popular solution in our own processor to avoid expressions like the ones I’ve listed in the initial comment of #359. It is also used now in one of the official XQFO function definitions, see map:get:

map:get(
  $map       as map(*),
  $key       as xs:anyAtomicType,
  $fallback  as function(xs:anyAtomicType) as item()*  :=  fn:void#1
) as item()*

…and it can e.g. be used as a compact fallback function for array:get if you want to get an empty sequence instead of an error for invalid array positions:

array:get(array { 1, 2, 3 }, 4, void#1)

@michaelhkay
Contributor

I think it's a good name. The English verb "to void" means to cancel something, to nullify it, and that's a good description of what this function is doing; in addition there is a tradition in computing that associates the word "void" with functions that return nothing.

I think the main occasions I've needed something like this are in connection with fn:trace(), and we now have fn:log(), where fn:log(X) effectively does fn:trace(fn:void(X)) -- so that use case has disappeared. But I can think of others, for example

let $sort-key := if (CONDITION) then fn{age} else void#1
sort(//x, (), $sort-key)

It's a function I could live without, but it's a simple primitive that makes logical sense.

@ChristianGruen ChristianGruen changed the title fn:void: Naming fn:void: Naming, Arguments Jul 26, 2023
@ChristianGruen
Contributor Author

ChristianGruen commented Aug 2, 2023

I’ve come across an interesting function in Mary Holstege’s nice MathLing library. It’s located in the testlib.xqy module:

declare function this:assertFails(
  $message as xs:string,
  $f as function() as item()*
) as empty-sequence()
{
  let $ok :=
    try {
      ($f(), false())=>tail()
    } catch * {
      true()
    }
  return (
    if ($ok) then ()
    else fn:error(QName("http://mathling.com/errors","ml:FAILURE"), "Did not get expected error: "||$message)
  )
};

I noticed that the code is evaluated differently, depending on the processor you use. The reason is that there’s no need to evaluate $f() if it can statically be detected that it will always return a single item.

You might think that fn:void could come to the rescue…

  try {
    void($f()),
    fn:error(QName("http://mathling.com/errors","ml:FAILURE"), "Did not get expected error: "||$message)
  } catch err:* {
    (: expected :)
  }

…but it won’t in practice, as we don’t enforce the evaluation of the argument. I still think it would be a bad idea, but we could think about adding an option that makes the evaluation strategy explicit.

I wonder how others would write code like this? How would you proceed to enforce the evaluation of the function that’s supplied to this:assertFails without returning its result?

@dnovatchev
Contributor

dnovatchev commented Aug 2, 2023

The reason is that there’s no need to evaluate $f() if it can statically be detected that it will always return a single item.

Not if the function is nondeterministic.

We could introduce a new type of function: "external", for which we must not make any assumptions, thus it needs to be evaluated if the expression containing a call to it is evaluated.

@dnovatchev
Contributor

dnovatchev commented Aug 4, 2023

The reason is that there’s no need to evaluate $f() if it can statically be detected that it will always return a single item.

If $f uses the result of an expression that calls fn:transform, or maybe even better: fn:parse-xml, it is not correct to conclude that $f will always return the same result.

Thus, $f() will need to be evaluated by any compliant XPath 3.1 processor.

@ChristianGruen
Contributor Author

Some functions in the spec, including fn:parse-xml and fn:transform, are marked as nondeterministic. The current specification doesn't mandate those functions to be evaluated, though. For example, a processor is allowed to simplify count(fn:parse-xml($s)) < 2 to true().
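
As a minimal sketch of that simplification (the string bound to $s is a hypothetical input, assumed only for illustration): fn:parse-xml is declared to return document-node(element(*))?, so its result can hold at most one item, and the comparison holds whether or not the call is actually made.

```xquery
(: $s is a hypothetical input, chosen only for illustration :)
declare variable $s as xs:string := "<a><b/></a>";

(: fn:parse-xml returns document-node(element(*))?, i.e. zero or one item,
   so count(...) can only be 0 or 1 and the comparison is always true :)
count(fn:parse-xml($s)) < 2
```

With a malformed $s, a processor that does evaluate the call would raise an error instead of returning true, which is the kind of processor-dependent difference this thread is about.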

@dnovatchev
Contributor

dnovatchev commented Aug 4, 2023

Some functions in the spec, including fn:parse-xml and fn:transform, are marked as nondeterministic. The current specification doesn't mandate those functions to be evaluated, though. For example, a processor is allowed to simplify count(fn:parse-xml($s)) < 2 to true().

Then why did these functions get added to the Spec, in the first place? With this clarification they are meaningless!

I couldn't find any such rule explicitly stated in the FO 3.1 Spec. Here is the complete definition of nondeterministic:

[image: the definition of "nondeterministic" from the FO 3.1 Spec]

The description of fn:parse-xml says nothing about the fact that the specification doesn't mandate this function to be evaluated. This feels like intentionally misleading...

@ChristianGruen
Contributor Author

Then why did these functions get added to the Spec, in the first place?

Do you refer to fn:parse-xml and fn:transform? I wasn’t involved, I assume it was to parse XML and transform XSL ;)

The description of fn:parse-xml says nothing about the fact that the specification doesn't mandate this function to be evaluated.

That's no specific property of this function. It’s a general thing that expressions needn't be evaluated if their results are not further processed.

@dnovatchev
Contributor

Then why did these functions get added to the Spec, in the first place?

Do you refer to fn:parse-xml and fn:transform? I wasn’t involved, I assume it was to parse XML and transform XSL ;)

The description of fn:parse-xml says nothing about the fact that the specification doesn't mandate this function to be evaluated.

That's no specific property of this function. It’s a general thing that expressions needn't be evaluated if their results are not further processed.

See my updated comment above. There is no text in the spec saying that an implementation may choose not to evaluate a nondeterministic function.

@ChristianGruen
Contributor Author

See my updated comment above. There is no text in the spec saying that an implementation may choose not to evaluate a nondeterministic function.

That's no specific property of nondeterministic functions. It’s a general thing that expressions needn't be evaluated if their results are not further processed.

@dnovatchev
Contributor

dnovatchev commented Aug 4, 2023

See my updated comment above. There is no text in the spec saying that an implementation may choose not to evaluate a nondeterministic function.

That's no specific property of nondeterministic functions. It’s a general thing that expressions needn't be evaluated if their results are not further processed.

You provided this example:

The current specification doesn't mandate those functions to be evaluated, though. For example, a processor is allowed to simplify count(fn:parse-xml($s)) < 2 to true()

Why do you think the result of fn:parse-xml($s) is "not further processed" in the expression count(fn:parse-xml($s))? It clearly is.

What would be the result of evaluating:

if(count(fn:parse-xml($s)) < 2)
  then 5
  else 7

@michaelhkay
Contributor

Of course you don't have to evaluate a function if you already know the answer. For example if someone writes abs(EXP) >= 0 then you know the answer is true without having to evaluate EXP.

@dnovatchev
Contributor

dnovatchev commented Aug 4, 2023

@michaelhkay,

Of course you don't have to evaluate a function if you already know the answer. For example if someone writes abs(EXP) >= 0 then you know the answer is true without having to evaluate EXP.

How can you statically know the value of count(fn:parse-xml($s)//*)?

@michaelhkay
Contributor

How can you statically know the value of count(fn:parse-xml($s)//*)

You can't know its actual value, but you can know that it's >= 1. Not that I can see any optimizer taking the trouble to do that.

@dnovatchev
Contributor

@michaelhkay The true expression is this:

if(count(fn:parse-xml($s)//*) < 2)
  then 5
  else 7

And the processor cannot statically know the result of evaluating this.

@michaelhkay
Contributor

michaelhkay commented Aug 4, 2023

I can think of cases where it can. For example if $s is

let $s := "<A>" || serialize($v) || "</A>"

and $v is

let $v := element{"X"}{42}

then a processor that cares to do the analysis can indeed infer statically that parse-xml($s) will contain two element nodes. It seems implausible that any processor would attempt to do this analysis, but I'm amazed at the power of the data flow analysis done by IntelliJ even on a procedural language like Java.

@dnovatchev
Contributor

I can think of cases where it can. For example if $s is

let $s := "<A>" || serialize($v) || "</A>"

and $v is

let $v := element{"X"}{42}

then a processor that cares to do the analysis can indeed infer statically that parse-xml($s) will contain two element nodes. It seems implausible that any processor would attempt to do this analysis, but I'm amazed at the power of the data flow analysis done by IntelliJ even on a procedural language like Java.

We can use something like:

if(count(fn:parse-xml(unparsed-text($myUrl))//*) < 2)
  then 5
  else 7

@michaelhkay
Contributor

It's not hard to come up with expressions where I can't think of any way of making any static inferences about the result, but I'm not sure what such examples prove! My experience over many years with the test suite is that implementors are very imaginative with the optimizations they discover.

Going back a few steps in the thread, you said:

There is no text in the spec saying that an implementation may choose not to evaluate a nondeterministic function.

and that is simply incorrect. See XPath 3.1 §2.3.4, for example:

In some cases, a processor can determine the result of an expression without accessing all the data that would be implied by the formal expression semantics. For example, the formal description of filter expressions suggests that $s[1] should be evaluated by examining all the items in sequence $s, and selecting all those that satisfy the predicate position()=1. In practice, many implementations will recognize that they can evaluate this expression by taking the first item in the sequence and then exiting.

This is equally true of the expression (23, fn:transform(XXX))[1].
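
A small sketch of that last expression, with a hypothetical local:costly function standing in for the fn:transform call (the function name and its body are assumptions made for illustration only):

```xquery
(: local:costly is a hypothetical stand-in for an expensive or
   nondeterministic call such as fn:transform(...) :)
declare function local:costly() as item()* {
  fn:parse-xml("<doc/>")
};

(: only the first item of the sequence is selected, so a processor
   may return 23 without ever calling local:costly() :)
(23, local:costly())[1]
```

The result is 23 whether or not local:costly() is evaluated; the filter only needs the first item of the sequence.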

@ChristianGruen
Contributor Author

Why do you think the result of fn:parse-xml($s) is "not further processed" (it clearly is) in the expression count(fn:parse-xml($s))

What matters here is not the result but the number of resulting items, and this is statically known for fn:parse-xml: it is 0 or 1.

The interesting thing, I believe, is that we are trying to be smarter than the processor, as there is no vendor-independent way to enforce the evaluation of an expression. The current definition of nondeterministic in the specification doesn't help here. A processor is not obliged to evaluate an expression just because it is statically known to be nondeterministic; it must only ensure that the expression is evaluated anew each time it is evaluated repeatedly. This could possibly be changed (and fn:void could then be labeled as nondeterministic), but it would be a far-reaching change and too restrictive for the existing nondeterministic functions.

I have no easy answer, in particular none that would easily be applicable to different processors.

@dnovatchev
Contributor

dnovatchev commented Aug 4, 2023

Going back a few steps in the thread, you said:

There is no text in the spec saying that an implementation may choose not to evaluate a nondeterministic function.

and that is simply incorrect. See XPath 3.1 §2.3.4, for example:

In some cases, a processor can determine the result of an expression without accessing all the data that would be implied by the formal expression semantics. For example, the formal description of filter expressions suggests that $s[1] should be evaluated by examining all the items in sequence $s, and selecting all those that satisfy the predicate position()=1. In practice, many implementations will recognize that they can evaluate this expression by taking the first item in the sequence and then exiting.

This is equally true of the expression (23, fn:transform(XXX))[1].

This text does not mention "nondeterministic". And if the processor cannot skip the evaluation of an expression that depends on the value of a call to a nondeterministic function, then the processor would have to evaluate that call.

As for the imagination of implementors, one can use any known law/formula that the Processor cannot prove.

For example, that the root of:

sin(x) - 0.5 = 0

in a closed interval containing π / 4 is π / 6 (30°).

And the code of the function can use the value of the root (straightforward to calculate using the FXSL function findRootNR, as can be seen here: ) of this equation as a base upon which to make a decision what value to return

@ChristianGruen
Contributor Author

I propose that we close this issue unless we can find substantial agreement on what could be changed or better documented.

@ChristianGruen ChristianGruen added the Propose Closing with No Action The WG should consider closing this issue with no action label Jan 19, 2024
@ndw
Contributor

ndw commented Jan 23, 2024

The CG agreed to close this issue without action at meeting 062.

@ndw ndw closed this as completed Jan 23, 2024
4 participants