Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function identity: documentation, nondeterminism #908

Closed
ChristianGruen opened this issue Dec 17, 2023 · 13 comments
Closed

Function identity: documentation, nondeterminism #908

ChristianGruen opened this issue Dec 17, 2023 · 13 comments
Assignees
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. Propose Closing with No Action The WG should consider closing this issue with no action XDM An issue related to the XPath Data Model XQFO An issue related to Functions and Operators

Comments

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Dec 17, 2023

In #520, the concept of function identities was introduced. This is what the current draft says:

XDM, 2.9.4 Function Items

identity: an abstract property that can be used to test whether two variables refer to the same function or to different functions. This property is exposed only for this purpose.

Note: Currently, the concept of function identity is used for two purposes: firstly, when functions appear in the arguments supplied to the fn:deep-equal function; and secondly, in establishing whether the arguments and results of a function are "the same" when deciding whether the function is deterministic.

Note: Function identity is not currently defined for maps and arrays, because in the circumstances where function identity would otherwise be used, maps and arrays are compared by examining their content.

XQFO, 1.8.4 Properties of functions

  1. […] the two function items have the same function identity. The concept of function identity is explained in Section 2.9.4 Function Items.

XQFO, 14.2.8 fn:deep-equal

c. $i1 and $i2 have the same function identity. The concept of function identity is explained in Section 2.9.4 Function Items.

XQFO, 17.1.1 fn:function-lookup

The function identity is determined in the same way as for a named function reference. Specifically, if there is no context dependency, two calls on fn:function-lookup with the same name and arity must return the same function.

While I definitely believe in the concept, I believe the documentation is still cryptic, or even impossible, to understand, at least without reading #520 or consuming the existing QT4 test cases. Here are some questions that I’m trying to answer:

  • Does “abstract property” mean that the property will not be materialized in an implementation, or does it mean that the property too vague to be precisely defined?
  • We should try to specify what “refer to the same function” means. Are function properties that allow us to at least safely identify a subset of functions the same? For example, will true#0 and true#0 always be identical? The test cases imply this, whereas Function identity #520 doesn’t.
  • In XQFO 17.1.1, there’s a hint that context-dependency influences the decision if functions are identified as equal. Does this mean that name#0 and name#0 cannot be equal? Or can, or will, they be equal if the context is identical?
  • The term “deterministic” does not appear anywhere else in the XDM spec, so one is inclined to think of the XQFO nondeterminism. It is then unclear whether two instances of, for example, map:entries and fn:parse-xml can be “the same” if the parameters are the same.
  • In XQFO 17.1.1, “must return the same function.” is also unclear: What exactly is meant by “same function”? Is it a function that creates an identical result (thus, excluding nondeterministic functions like fn:parse-xml)?
  • We should explain better what was the motivation to include the context-dependency of functions in the definition. It would certainly be more intuitive if both deep-equal(name#0, name#0) and deep-equal(name#1, name#1) returned true.

I’m sorry for not offering good answers in return. I could try to describe what we’ve implemented so far – mostly inspired by the test cases – but I’m not sure if it meets the requirements.

Related: #333

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators XDM An issue related to the XPath Data Model Editorial Minor typos, wording clarifications, example fixes, etc. Propose for V4.0 The WG should consider this item critical to 4.0 labels Dec 17, 2023
@michaelhkay
Copy link
Contributor

Does “abstract property” mean that the property will not be materialized in an implementation, or does it mean that the property too vague to be precisely defined?

It means that the property is not exposed to users it's merely there to underpin the semantics of comparing two functions with each other.

We should try to specify what “refer to the same function” means.

Indeed, that's exactly what we are trying to do.

Are function properties that allow us to at least safely identify a subset of functions the same?

Not sure I understand the question.

For example, will true#0 and true#0 always be identical? The test cases imply this, whereas #520 doesn’t.

§4.6.2.4 (Named function references) states "Otherwise [if the function is not context-dependent], a function identity that is the same as that produced by the evaluation of any other named function reference with the same function name and arity. "So yes, true#0 and true#0 will always be identical.

In XQFO 17.1.1, there’s a hint that context-dependency influences the decision if functions are identified as equal. Does this mean that name#0 and name#0 cannot be equal? Or can, or will, they be equal if the context is identical?

It says they are not guaranteed to be identical. But there is a let-out clause (in §4.6.2.4): "Optimizers, however, are allowed to detect cases where the captured context happens to be the same, or where any variations are immaterial, and where it is therefore safe to return the same function item each time."

XQFO 17.1.1 refers to this: "The function identity is determined in the same way as for a named function reference."

The term “deterministic” does not appear anywhere else in the XDM spec, so one is inclined to think of the XQFO nondeterminism. It is then unclear whether two instances of, for example, map:entries and fn:parse-xml can be “the same” if the parameters are the same.

You're referring I think to the note in XDM "Currently, the concept of function identity is used for two purposes: firstly, when functions appear in the arguments supplied to the fn:deep-equal function; and secondly, in establishing whether the arguments and results of a function are "the same" when deciding whether the function is deterministic." We could add cross-references to the two places mentioned if you think it would help.

In XQFO 17.1.1, “must return the same function.” is also unclear: What exactly is meant by “same function”?

It means "must return functions having the same identity". (Talking about multiple things having the same identity is always tricky, because if two things have the same identity then they are really one thing not two, and use of a plural noun is inappropriate. We have this problem everywhere with node identity, e.g. when we talk about removing duplicate nodes.

We should explain better what was the motivation to include the context-dependency of functions in the definition. It would certainly be more intuitive if both deep-equal(name#0, name#0) and deep-equal(name#1, name#1) returned true.

A more realistic example is

let $x := <e/>/name#0, $y := <f/>/name#0 return deep-equal($x, $y) where the result has to be false.

@ChristianGruen
Copy link
Contributor Author

Thanks. I managed to ignore what the XPath spec say about function identities. The reason was that XQFO, 14.2.8 fn:deep-equal points to the XDM spec…

c. $i1 and $i2 have the same function identity. The concept of function identity is explained in Section 2.9.4 Function Items.

…which doesn’t provide the promised explanation. Maybe the XPath spec could be referenced here instead?

[…] when deciding whether the function is deterministic." We could add cross-references to the two places mentioned if you think it would help.

I still haven’t understood what “deterministic” is supposed to say here. How can it be “decided” whether a function is deterministic, if determinism is a fixed function property? Maybe we should choose a different term?

It means "must return functions having the same identity". (Talking about multiple things having the same identity is always tricky, because if two things have the same identity then they are really one thing not two, and use of a plural noun is inappropriate.

I see. Maybe with a link to the definition: “must return functions with the same function identity” ?


Two more observations: XPath, 4.6.2.4 Named Function References states that:

Otherwise, a function identity that is the same […] This rule also applies when the target function definition is nondeterministic

…whereas XPath, 4.6.2.7 Function Identity says:

Functions with the same identity are indistinguishable in every way; in particular, any function call with identical arguments will produce an identical result.

XQFO, 1.8.4 Properties of Functions defines identical values (nodes) as follows:

[…] 2. Both items are nodes, and represent the same node. […]

…which doesn’t match for nondeterministic function calls like parse-xml#1('<x/>') is parse-xml#1('<x/>'). I think we should either treat nondeterministic function items to be non-equal to each other, or replace “identical result” with “results that are deep-equal to each other”.

My next question is about 4.6.2.4 Named Function References. It says:

In the general case, a function reference to a context-dependent function will produce different results every time it is evaluated, because the resulting function item has a captured context […]

Doesn’t <x/> ! name#0() produce the same result every time – or is this a case that is regarded not to be “general”? An example would be helpful.

@ChristianGruen
Copy link
Contributor Author

…which doesn’t match for nondeterministic function calls like parse-xml#1('<x/>') is parse-xml#1('<x/>'). I think we should either treat nondeterministic function items to be non-equal to each other, or replace “identical result” with “results that are deep-equal to each other”.

My first suggestion is probably nonsense; otherwise, a nondeterministic function would not be identical to itself anymore.

In either case, I think we cannot maintain the definition for identical functions that “any function call with identical arguments will produce an identical result.”. This can only be true for deterministic functions. The definition “results that are deep-equal to each other” would still not be applicable to external nondeterministic functions like http:send-request, which might return different results with each call.

@ChristianGruen
Copy link
Contributor Author

@michaelhkay I’m sorry, I’m still struggling to properly and comprehensively implement the comparison of function items. I wanted to contribute more test cases, but chances are high that they might be wrong. Some questions:

A) Is the following code required to return true (or is false a legal alternative result)?

<x/>
! name#0
! deep-equal(., .)

B) Is the comparison of two identical, but context-dependent, functions allowed to return true?

deep-equal(
  <xml/> ! id#1,
  <xml/> ! id#1
)

C) For partial function applications, the rule is “A new function identity distinct from the identity of any other function item.”, but deep-equal requires the functions to have the same identity. Is the following code still allowed to return true?

let $xml := <xml/>
return deep-equal(
  id(?, $xml),
  id(?, $xml)
)

D) If a nondeterministic function is repeatedly invoked, it may return different results w.r.t. the order, or node identity (if we consider external libraries, the result itself may be completely different). Maybe we should treat context-dependent and nondeterministic functions equally, and require the following code to return false?

deep-equal(
  analyze-string('a', ?),
  analyze-string('a', ?)
)

@michaelhkay
Copy link
Contributor

A) Is the following code required to return true (or is false a legal alternative result)?

! name#0 ! deep-equal(., .)

There is only one function item involved here. Therefore both operands are functions having the same identity, therefore the result must be true.

B) Is the comparison of two identical, but context-dependent, functions allowed to return true?

deep-equal(
  <xml/> ! id#1,
  <xml/> ! id#1
)

The spec for named function references says that: "Optimizers, however, are allowed to detect cases where the captured context happens to be the same, or where any variations are immaterial, and where it is therefore safe to return the same function item each time. "

This applies here: the two named function references are evaluated with different dynamic contexts, but a smart optimizer is allowed to determine that the in both cases the result of the relevant function will always be an empty sequence (because neither of the two documents contains any ID values); and therefore both id#1 references can evaluate to the same function, a function that always returns an empty sequence.

If the expression were

deep-equal(
  <a xml:id='x'/> ! id#1,
  <a xml:id='x'/> ! id#1
)

then no such optimization would be possible, because the the two function items are different: consider

<a xml:id='x'/> ! id#1 ('x') is <a xml:id='x'/> ! id#1 ('x')

which returns false.

C) For partial function applications, the rule is “A new function identity distinct from the identity of any other function item.”, but deep-equal requires the functions to have the same identity. Is the following code still allowed to return true?

let $xml :=
return deep-equal(
id(?, $xml),
id(?, $xml)
)

The spec for partial function application refers to §4.6.2.7 which states:

An optimizer is permitted to rewrite expressions in such a way that repeated evaluation is avoided if it can be established that the result will be the same each time, and this may be done without consideration of function identity. For example, if the expression contains(?, "e") appears within the body of a for expression, or if the same expression is written repeatedly in a query, then an optimizer may decide to evaluate it once only, and thus return the same function item each time.
Similarly, optimizers are allowed to replace any expression with an equivalent expression; for example, count(?) may be rewritten as count#1.

This applies here. An optimizer that identifies common subexpressions is allowed to rewrite

return deep-equal(
  id(?, $xml),
  id(?, $xml)
)

as

let $temp := id(?, $xml) return deep-equal($temp, $temp)

which evaluates to true.

D) If a nondeterministic function is repeatedly invoked, it may return different results w.r.t. the order, or node identity (if we consider external libraries, the result itself may be completely different). Maybe we should treat context-dependent and nondeterministic functions equally, and require the following code to return false?

deep-equal(
  analyze-string('a', ?),
  analyze-string('a', ?)
)

Again, the rules in §4.6.2.7 say that the partial function application can be treated as a common subexpression and evaluated once to produce the same function item each time.

@ChristianGruen
Copy link
Contributor Author

"Optimizers, however, are allowed to detect cases where the captured context happens to be the same, or where any variations are immaterial, and where it is therefore safe to return the same function item each time. "

I see. So I assume that “any variations are immaterial” implies that the result of an invoked function item will be “the same” as the evaluation of another function item with the same name and arity.

...
then no such optimization would be possible, because the the two function items are different: consider
...
<a xml:id='x'/> ! id#1 ('x') is <a xml:id='x'/> ! id#1 ('x')

So “the same” means that both deep-equal(A, B) returns true, and, for nodes, A is B returns true. Did I get this right?

I would then assume that deep-equal(fn { 1 }, fn { 1 }) may return true, whereas deep-equal(fn { <a/> }, fn { <a/> }) must return false; correct?

deep-equal(
  analyze-string('a', ?),
  analyze-string('a', ?)
)

Again, the rules in §4.6.2.7 say that the partial function application can be treated as a common subexpression and evaluated once to produce the same function item each time.

I believe this contrasts with the example further above:

  • For fn { <a xml:id='x'/> ! id#1 ('x') }, it was said that not no optimization is possible, because the results of two functions with that function body would have different node identities.
  • If fn { analyze-string('a', ?) } is invoked multiple times, the results will always have different node identities.

@michaelhkay
Copy link
Contributor

I see. So I assume that “any variations are immaterial” implies that the result of an invoked function item will be “the same” as the evaluation of another function item with the same name and arity.

Yes. I agree it's not a mathematically rigorous formulation, but that's the intended reading.

So “the same” means that both deep-equal(A, B) returns true, and, for nodes, A is B returns true. Did I get this right?

Yes, for the moment we're stuck with the fact that node constructors are guaranteed to return a different node on each evaluation. We could consider changing that, but it's a separate topic.

I would then assume that deep-equal(fn { 1 }, fn { 1 }) may return true, whereas deep-equal(fn { }, fn { }) must return false; correct?

Yes.

If fn { analyze-string('a', ?) } is invoked multiple times, the results will always have different node identities.

Not so. The spec for analyze-string() says: If the function is called twice with the same arguments, it is [·implementation-dependent·] whether the two calls return the same element node or distinct (but deep equal) element nodes. In this respect it is [·nondeterministic with respect to node identity·]

@ChristianGruen
Copy link
Contributor Author

Thanks, things are getting clearer. This means that…

  1. fn(){<a/>} ! deep-equal(., .) must return true even though fn(){<a/>} ! (.() is .()) returns false; and
  2. deep-equal(fn(){<a/>}, fn(){<a/>}) must return false even though the function bodies are identical.

In a nutshell:

  1. When function items are compared with fn:deep-equal, it does not matter if they return the same or different results.
  2. However, when function items are optimized, it does matter: Two function items can only get the same identity if the optimizing processor would return the same result for the same arguments.

@michaelhkay
Copy link
Contributor

If it helps, the Saxon implementation is to treat two function items as deep-equal if and only if they are represented internally by the same Java object. For that to work we have to make sure that a function reference like count#1 always delivers the same Java object. But beyond that, we simply rely on the fact that (a) two expressions (or expression evaluations) will only return the same Java object if the compiler has already decided the two expressions are equivalent, in which case we are allowed to treat the functions as identical, and (b) the only case where the spec requires two expressions to return identical function items is the count#1 case, for a non-context-dependent function, in which case we treat the expression as a compile-time constant.

@ChristianGruen
Copy link
Contributor Author

If it helps, the Saxon implementation […]

Yes, that’s helpful. We proceed differently: <a/> ! name#0 is rewritten to fn() { name(<a/>) }, so…

deep-equal(<e/> ! name#0, <e/> ! name#0)

…will be rewritten to…

deep-equal(fn() { name(<e/>) }, fn() { name(<e/>) })
→ deep-equal(fn() { 'e' }, fn() { 'e' })

…which means we have no information at runtime whether the original expression was a named function reference.

It seems we’ll either need to register all inline functions as constants or compare the parameters and function bodies at runtime (which is a bit tricky as the parameter names may change during compilation, so count#1 may result in fn($input1) { count($input1) }) or fn($input2) { count($input2) })).

I’ll have more thoughts on the implications.

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Jan 15, 2024

We’ve found a way to treat different kinds of functions equal (in short, by applying more rewrites at compile time, and comparing the Object identity or arity/function body at runtime).

My assessment is that we can’t expect all implementations to behave identically to Saxon (or it can be very difficult to simulate its behavior). In particular, …

I would then assume that deep-equal(fn { 1 }, fn { 1 }) may return true, whereas deep-equal(fn { }, fn { }) must return false; correct?

Yes.

…we should allow deep-equal(fn { <a/> }, fn { <a/> }) to return true. The following examples reflect the status quo…

Function Result
deep-equal(<a/>, <a/>) true
deep-equal(fn { <a/> }, fn { <a/> }) false
let $f := fn { <a/> } return deep-equal($f, $f) true
deep-equal(fn { 1 }, fn { 1 }) true or false

I think that the result should be more intuitive.

@ChristianGruen ChristianGruen changed the title Function identity: documentation still too vague Function identity: documentation, nondeterminism Mar 26, 2024
@ChristianGruen
Copy link
Contributor Author

While I’ve now understood the implications of the current definition of function identity, it could still be challenging for future implementors. As I cannot offer a particular suggestion on how to improve the status quo, I propose to close the issue.

@ChristianGruen ChristianGruen added Propose Closing with No Action The WG should consider closing this issue with no action and removed Propose for V4.0 The WG should consider this item critical to 4.0 labels May 22, 2024
@ndw
Copy link
Contributor

ndw commented May 28, 2024

The CG agreed to close this issue without further action at meeting 079.

@ndw ndw closed this as completed May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Editorial Minor typos, wording clarifications, example fixes, etc. Propose Closing with No Action The WG should consider closing this issue with no action XDM An issue related to the XPath Data Model XQFO An issue related to Functions and Operators
Projects
None yet
Development

No branches or pull requests

3 participants