Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality of function items #333

Closed
michaelhkay opened this issue Feb 1, 2023 · 33 comments
Closed

Equality of function items #333

michaelhkay opened this issue Feb 1, 2023 · 33 comments
Labels
Enhancement A change or improvement to an existing feature XPath An issue related to XPath

Comments

@michaelhkay
Copy link
Contributor

The question of equality of function items arises in the discussion of determinism of functions and memo functions in XSLT - see F&O 3.1 section 1.7.4, and came up again today in the context of fn:deep-equal.

1.7.4 makes a brave attempt to describe situations under which two functions are "identical", though leaving implementations room for flexibility. I think we can build on this and improve it, by describing more situations in which the result is predictable.

The data model describes the properties of a function item, and we can say that two function items are equivalent if all their properties are the same.

The properties that cause problems are the "implementation" and the "closure", and in both cases I think we can find ways of doing a comparison.

For the implementation, we can define this by reference to the way in which the implementation property is set. For function items constructed by reference to static functions (e.g. my:func#3 or function-lookup(my:func, 3)) then they have the same implementation if and only if they are constructed by reference to the same static function. Similarly for function items constructed by evaluating an inline function expression. Other ways of constructing a function item, such as partial application, essentially create a new function with the same implementation as an existing function and a different closure.

For the closure (ignoring for the moment functions that include parts of the dynamic context in their closure), this is essentially just a set of variable bindings and it's not too difficult to say that functions are identical if these sets of variable bindings are identical.

@dnovatchev
Copy link
Contributor

dnovatchev commented Feb 2, 2023

Maybe a stupid question, but why is this capability necessary? Just to achieve the property of reflexivity for the comparison operation? But is it justified trying to achieve this at any cost?

Isn't it OK to postulate that the comparison of two functions of which at least one is neither an array or a map, is by default, or always, false() (in the case when we don't want the comparison to raise an error)?

I think this is better than trying to do the impossible (how would one compare two functions that happen to be external objects, that is just some binary code compiled from Java, C#, ..., etc. ?) Even if the source code is available, the same source code compiled with different compiler options (such as full, partial or no optimization, index boundary checking, etc.) results in different executable (binary) code instances for that could produce different results on the same tuple of arguments.

It was "heroic" in op:same-key() to postulate that NaN is equal to NaN, and the same for two INF values, but we all know that this isn't true ...

I believe that it would be better than that to limit the value-space for the parameters of fn:deep-equal() to sequences that don't (deeply)contain function items, which are neither maps or arrays, and to say that this value space could be extended (when an option is specified as an entry in the $options argument) with such functions, but then the comparison automatically returns false(). This will also save a lot of trouble work for the implementors.

Of course, we could also have another such entry in the $options argument, which when specified obliges the implementation to try to determine the "equality" of two such functions. But this must not be the default behavior, or otherwise this would result in compatibility issues with the previous versions of XPath.

@michaelhkay
Copy link
Contributor Author

Yes, reflexivity is important. There's the challenge of saying what it means for functions to be deterministic ("Same arguments, same result" requires a definition of "same"). And a deep-equal that delivers an error-free boolean result over the entire value space can be useful.

In the definition of determinism we do make an attempt to define what it means for two functions to be the same, but it's a rather feeble attempt, and I would like to see if we can improve it. There will always be some kind of recourse to things being implementation-defined (everything relating to external functions is already implementation-defined, so that's not a problem).

@dnovatchev
Copy link
Contributor

dnovatchev commented Feb 2, 2023

It is a flaw to believe that NaN is equal to NaN, and Infinity is equal to Infinity.

This is like saying that the Earth is flat or that the Sun rotates around the Earth.

I will never support tweaking the facts / laws of mathematics and physics simply to make some "beautiful" theory seem true.

I believe that there is a better way - just exclude the anomalous values from the Domain of the comparison-function.

We need an item type, let me call it (initially, due to lack of imagination) xs:trueNumeric which is the current value-space for xs:numeric (the union of all numeric types), from which 'NaN', '-INF' and 'INF' are removed.

Then provide fn:atomic-equal with an $options argument with a possible entry that makes it accept only xs:trueNumeric values.

Apply the same approach to fn:deep-equal - have an option that excludes function items from the allowed domains for the (deep)members of the two sequences; also have a similar option that excludes 'NaN', '-INF' and 'INF'

Only then, the comparison function becomes truly reflexive.

@ChristianGruen
Copy link
Contributor

I like the attempt to try to compare everything that’s possible to compare, provided that the behavior is well-defined (which is what we are trying to achieve). For more complex items, I think it’s both intuitive and justifiable to argue that such comparisons do not succeed, or may succeed in a future version of the language.

It is a flaw to believe that NaN is equal to NaN, and Infinity is equal to Infinity.

It’s also a question of definition. The floating-point placeholder value NaN is defined to be unequal to itself, and INF is defined to be equal, so we can safely work with that, even despite counterintuitive implications, which also exist in mathematics as well, though (such as INF - INF + INF not being equivalent to INF).

An endless number of solutions in computer science can be regarded as pragmatic tradeoff, particularly if we consider the technical limitations when things were introduced. It may be seen as a flaw that NaN was introduced as a pseudo-value in floating-point arithmetic computation. Raising errors may seem more appropriate today, but the standardized error handling is a rather new feature in programming languages, compared to floating points. I believe to remember that Konrad Zuse had already suggested using comparable to NaN.

If we don’t want to work with INF and NaN, we can use xs:decimal today.

@michaelhkay
Copy link
Contributor Author

Arguing about whether NaN is equal to NaN is like arguing about whether 0^0 should be 0 or 1. It's entirely a matter of convenience. Saying that you're not allowed to ask the question is one solution, certainly, but not always the best one.

@dnovatchev
Copy link
Contributor

For more complex items, I think it’s both intuitive and justifiable to argue that such comparisons do not succeed, or may succeed in a future version of the language.

But someone may not sleep well, because then we lose the property of reflexivity 😄

So we have two possible solutions here:

  1. Pretend that these uncomparable items are equal, so that reflexivity is thus achieved (and we can sleep well)

  2. Exclude such anomalous items from the domain of the comparison-function. Again, we do have reflexivity here but not at the cost of lying unusual definitions.

If we don’t want to work with INF and NaN, we can use xs:decimal today.

@ChristianGruen I see that you and I agree that solution 2 above is a good way to deal with the anomalies.

I think that we should not allow '- INF' , 'INF', '+INF' and NaN as values of keys of a map. This is the good way to get the dream of Reflexivity fulfilled, without needing ghost values, or the equivalents of "dark matter" in physics.

@dnovatchev
Copy link
Contributor

Arguing about whether NaN is equal to NaN is like arguing about whether 0^0 should be 0 or 1. It's entirely a matter of convenience. Saying that you're not allowed to ask the question is one solution, certainly, but not always the best one.

If there is no one good answer, then why should we accept one?

This is not forbidding the question, but realizing that the answer is a paradox -- like we have such in set theory

@ChristianGruen
Copy link
Contributor

If there is no one good answer, then why should we accept one?

Should we stop communicating because Derrida’s concept of différance compellingly suggests that communication between mankind is doomed from the ground up?

@dnovatchev
Copy link
Contributor

dnovatchev commented Feb 2, 2023

If there is no one good answer, then why should we accept one?

Should we stop communicating because Derrida’s concept of différance compellingly suggests that communication between mankind is doomed from the ground up?

C'mon,

Not using NaN and '+-Inf' values in value comparisons is not the same as "we stop communicating" . It is more like we do not spend time arguing how many angels can fit on the tip of a needle 😄

Also, such values can really be used for internal purposes, if they are not in the domain of comparison functions.

In the past I did point out, and more recently @michaelhkay agreed, that we do need to have a few reserved values, and these make the perfect candidates.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Feb 2, 2023

I would be pretty surprised if xs:double('INF') ! (. = .) (or 1 div 0e0 = 1 div 0e0, to choose a different syntax) didn’t return true.

Are there programming languages that behave like this?

@dnovatchev
Copy link
Contributor

I would be pretty surprised if xs:double('INF') ! (. = .) (or 1 div 0e0 = 1 div 0e0, to choose a different syntax) didn’t return true.

Are there programming languages that behave like this?

How can one say that two infinities with different cardinality are equal?

Like compare the number of all points in a segment with the number of those points that represent integers (or rational numbers).

We know well that the first number is greater than the latter, that is, there is no 1:1 mapping between the two sets.

@ChristianGruen
Copy link
Contributor

How can one say that two infinities with different cardinality are equal?

I’m the wrong to ask; other people much smarter than me have spent more time and mental resources on that many decades ago. You might find some helpful reasoning in the Wikipedia article on floating-point arithmetic.

@dnovatchev
Copy link
Contributor

I would be pretty surprised if xs:double('INF') ! (. = .) (or 1 div 0e0 = 1 div 0e0, to choose a different syntax) didn’t return true.

Are there programming languages that behave like this?

Even if we have in Python:

float('inf') == float('inf') returns True

we still have:

float('inf') - float('inf') returns NaN

Do note, the result is not 0 !

Thus, we end up with two numbers $x that we say are equal: $x eq $x , but at the same time $x - $x is Nan

@dnovatchev
Copy link
Contributor

How can one say that two infinities with different cardinality are equal?

I’m the wrong to ask; other people much smarter than me have spent more time and mental resources on that many decades ago. You might find some helpful reasoning in the Wikipedia article on floating-point arithmetic.

This resource doesn't even mention a comparison operation, not to speak of comparing Inf to Inf.

When we have a paradox, we could choose a convenient interpretation, but when we have facts, it is simply wrong/unacceptable to choose an interpretation that is not supported by the facts, and whose incorrectness is confirmed by the facts.

Thus it is possible to have infinity1 with bigger cardinality that infinity2. These two cannot be equal, the one with greater cardinality is bigger. There are more points on any segment of a line than the points that represent rational numbers on the whole line. Both sets have infinite number of elements, but the first number is bigger (not equal to) than the second number.

@ChristianGruen
Copy link
Contributor

Even if we have in Python:

float('inf') == float('inf') returns True

we still have:

float('inf') - float('inf') returns NaN

Do note, the result is not 0 !

Sure. That's how floating-point arithmetic has been defined. All that is not specific to XPath.

@dnovatchev
Copy link
Contributor

I like the attempt to try to compare everything that’s possible to compare, provided that the behavior is well-defined (which is what we are trying to achieve). For more complex items, I think it’s both intuitive and justifiable to argue that such comparisons do not succeed, or may succeed in a future version of the language.

If we define two functions to be "equal" when they have the same Domain and Codomain and for every specific argument tuple that is in the Domain they return the same value in the Codomain, then functions are generally not comparable, unless there is a proof of such equality.

This follows from the Halting problem. Even if we have a tool with which we can run the two functions on all possible argument tuples from the Domain, this tool cannot decide that the function execution will ever halt, thus it would be impossible in practice to get and compare the results. Notable exception is when the Domain is finite, and this corresponds to maps/arrays in XPath.

Languages in which functions can be proven to be equal (Such as Coq) typically yield some of their expressive power in order to achieve this capability -- for example such languages are not Turing-complete.

But in our case all three X-languages are Turing complete.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Feb 3, 2023

If we define two functions to be "equal" when they have the same Domain and Codomain and for every specific argument tuple that is in the Domain they return the same value in the Codomain, then functions are generally not comparable, unless there is a proof of such equality.

I would use your definition to describe the equivalence, not equality, of two functions. Otherwise, we would need to exclude non-deterministic and side-effecting function items such as fn:random-number-generator#0 or file:delete#1. Next, function names should be meaningful in the comparison.

Completeness cannot be achieved either way, though.

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Feb 3, 2023

Rest assured, I have no intention of trying to solve the halting problem. My aims are much more modest. Here is what I have in mind. This is a refinement of the text that we already have for describing "determinism" of functions.

Two function items F and G are defined to be demonstrably equal if it is possible to establish that for any supplied arguments, F and G will return the same result (or raise the same error). This specification describes some situations in which two functions are demonstrably equal; implementations may define other situations in which they are demonstrably equal, provided that two function items must only be deemed demonstrably equal if they can be guaranteed to be interchangeable in this sense.

Here are some examples of pairs of function items $F and $G that are demonstrably equal:

let $F := (any function item)
let $G := $F
let $F := (any function item)
let $G := [$F](1)
let $F := (any function item)
let $G := fn:replicate($F, 3) => fn:foot()

The above examples illustrate a general principle that if the output of an expression includes a function item that was present in the input, and the semantics of the expression do not modify the function item in any way, then the function item in the output is demonstrably equal to the function item in the input.

let $F := fn:exists#1
let $G := fn:exists#1
let $F := fn:contains(?, "z")
let $G := fn:contains(?, "z")
let $F := function($x, $y) {$x + $y}
let $G := function($x, $y) {$x + $y}

The above examples illustrate a general principle that if two function items are constructed in the same way, then they are demonstrably equal to each other.

An implementation may also (but is not required to) consider the following pairs of functions to be demonstrably equal:

let $F := function($x, $y) {$x + $y}
let $G := function($p, $q) {$p + $q}

(Note: this assumes we don't do anything to make parameter names in function items significant to the caller)

let $F := function($x, $y) {$x || $y}
let $G := function($x, $y) {concat($x, $y)}
let $F := function($x, $y) {3}
let $G := function($x, $y) {let $p := 3 return $p}

The above examples illustrate that we are not concerned with exact lexical equivalence of the function bodies. Generally, a parser will make some simplifications or normalisations of an expression as it is processed, which might include skipping whitespace and comments, expanding abbreviations, perhaps renaming variables or expanding namespace prefixes. The primary intent here is to allow implementors to normalize expressions before comparing them. It is not expected that any advanced techniques for detecting the equivalence of different algorithms will be invoked (though it is not precluded).

The following pair of functions must not be considered demonstrably equal, because they have different effects when the supplied arguments are an xs:dateTime value and an xs:duration value:

let $F := function($x, $y) {$x + $y}
let $G := function($x, $y) {$y + $x}

In practical terms, there are two ways in which equivalence of two function items can be demonstrated. In very many cases it is possible to do it simply by tracking the provenance of the function item. If the item isn't changed, then it remains the same: the output is demonstrably equal to the input. In the general case, however, function items can be compared by examining their properties, and this approach is described below.

The situations in which the specification mandates that two functions F and G are demonstrably equal are described below. To avoid confusion we will use the term "implementation" to refer to the XPath/XQuery language processor, and "function body" to refer to the "implementation" property of a function item.

  1. F and G must have the same name, or both must be anonymous
  2. F and G must have the same signature. Equivalence of function signatures is defined in more detail in section N below.
  3. F and G must have equivalent function bodies. This is defined in more detail in section P below.
  4. F and G must have equivalent captured contexts. [We'll extend the concept of "non-local variable bindings" in XDM to "captured context" (or "closure") to include other aspects of the dynamic context that are captured in the function item, for example the focus where relevant). This is defined in more detail in section Q below.

N - Equivalence of function signatures.

(This is relatively easy. It's tackled, for example, in defining the rules for overriding functions in xsl:override in XSLT)

P - Equivalence of function bodies.

Function bodies can be classified into 3 categories:

(a) User-written functions whose body is written in XPath, XQuery, XSLT, or some other language known to the processor

(b) Built-in functions whose implementation is internal to the processor

(c) External functions whose implementation is external to the processor, where the processor has the ability to invoke the function but not to inspect its internal logic

In the first category, two user written function bodies are demonstrably equivalent if they have identical source expressions and identical static contexts.

A processor may also determine that two such function bodies are equivalent if:

(a) any differences in the static context can be shown to be immaterial, and/or
(b) any differences in the source expressions (for example, lexical variations such as whitespace and comments; choice of variable names) can be shown to be immaterial

In the second category, the bodies of built-in functions having the same name and arity are deemed equivalent. (Note: internally, a processor might have different algorithms for implementing a function such as fn:contains depending say on whether the collation is known statically. For the purpose of this discussion however, these different algorithms all constitute parts of a single function body.)

In the third category, external functions are largely implementation-defined, and the same applies to considerations of their equivalence.

If any expression takes one function item as input and produces another in its result, and the rules for the expression say that the function body of the result is the same as the function body of the input, then the input and output function items are deemed to have the same function body. This applies trivially to operations such as filter expressions that leave items unchanged. It also applies to more complex operations, for example, a partial application of a function has the same function body as its input (it differs in the nonlocal variable bindings).

Q Equivalence of captured contexts

TBA

@dnovatchev
Copy link
Contributor

If we define two functions to be "equal" when they have the same Domain and Codomain and for every specific argument tuple that is in the Domain they return the same value in the Codomain, then functions are generally not comparable, unless there is a proof of such equality.

I would use your definition to describe the equivalence, not equality, of two functions. Otherwise, we would need to exclude non-deterministic and side-effecting function items such as fn:random-number-generator#0 or file:delete#1. Next, function names should be meaningful in the comparison.

Completeness cannot be achieved either way, though.

@ChristianGruen Seems you are wrong. Here is the definition of function equality as per Wikipedia:

"Two functions f and g are equal if their domain and codomain sets are the same and their output values agree on the whole domain. More formally, given f: X → Y and g: X → Y, we have f = g if and only if f(x) = g(x) for all x ∈ X"

@michaelhkay
Copy link
Contributor Author

@dnovatchev It's very easy to describe function equality as an abstract mathematical property of two mathematical functions. It's much harder to describe it as a quality that is computable given a pair of software engineering artefacts, especially in a world where the very concept of "=" is fuzzy, which sadly is the world we live in. Our numbers are not mathematical numbers, they are IEEE numbers; our world is created by engineers, not by mathematicians.

@dnovatchev
Copy link
Contributor

Rest assured, I have no intention of trying to solve the halting problem. My aims are much more modest. Here is what I have in mind. This is a refinement of the text that we already have for describing "determinism" of functions.

@michaelhkay I admire your intent and effort.

Maybe my understanding is wrong, but it seems that even though you narrow the focus to demonstrable equality of functions, your proposal is for recognizing only some (and not all) cases when two functions are demonstrably equal (in other words, you provide a set of sufficient conditions for demonstrable equality, but they are not also the set of necessary conditions) .

Even if this is so, getting a positive result from the evaluation of the demonstrable equality of two functions may be something nice to have.

Yet we need to be aware of the possibility for false negatives, and it is not clear what getting a false actually means for the logic of an algorithm.

Also, it seems that this could be very challenging for the implementors (will they hate this feature? 😄 ) and users will only like it if it turns out to have reasonably small perceivable computational complexity.

I would be particularly interested to see the proposal for "Equivalence of captured context", in particular I have proposed in the past a function that returns the set of all "live" at the moment in-context variable names and their respective values.

@dnovatchev
Copy link
Contributor

@dnovatchev It's very easy to describe function equality as an abstract mathematical property of two mathematical functions. It's much harder to describe it as a quality that is computable given a pair of software engineering artefacts, especially in a world where the very concept of "=" is fuzzy, which sadly is the world we live in. Our numbers are not mathematical numbers, they are IEEE numbers; our world is created by engineers, not by mathematicians.

@michaelhkay Even if so, it is certainly good to try to be as close to the mathematical concepts, as possible.

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Feb 3, 2023

From an implementor perspective, many cases are actually quite easy, because they reduce to object identity in the underlying implementation. Obviously if two functions are represented by the same Java object then they are equal (or equivalent, which ever word we want to use). It's also easy enough, for example, to ensure that simple function references to context-free functions like count#1 return the same Java object every time; and similarly for closure-free inline functions like function($x, $y){$x+$y}. And we already have logic for common subexpression elimination, which means that comparing expressions for equivalence is something we have in the toolkit. Comparing closures and contexts, however, is something we don't currently attempt.

@dnovatchev
Copy link
Contributor

Comparing closures and contexts, however, is something we don't currently attempt.

Yes, and in the real world (for example take most existing Javascript code) functions using closures are an everyday fact of life.

@dnovatchev
Copy link
Contributor

From an implementor perspective, many cases are actually quite easy, because they reduce to object identity in the underlying implementation.

This is possible only within the same running process and only while it is running...

Maybe some more permanent form of "global function registry" would be useful?

This certainly requires a lot of newly-designed and created infrastructure that needs centralized resources and fail-tolerant maintenance (GUIDs as keys and DB as storage is the easier part of it).

But without such global support, this idea of "demonstrable equality" seems quite restricted.

@ChristianGruen
Copy link
Contributor

@ChristianGruen Seems you are wrong. Here is the definition of **[function equality]

We need to take care not to mix up mathematics and computer science. In our world, the name of a function can be relevant (which is why we have functions like fn:function-name), so I would expect to have both the function arity and function name included in equality checks, thus making the following three functions unequal even though they return the same result for abitrary input:

let $f1 := true#0
let $f2 := function() { true() }
let $f3 := function($x, $y, $z) { true() }
for $f at $p in ($f1, $f2, $f3)
return '$f' || $p || ': ' || function-name($f) || '#' || function-arity($f)

(: result :)
$f1: true#0
$f2: #0
$f3: #3

If we include results in the check, it is always the compiled and optimized result of an expression that will be considered in the equality check. For example, the following functions are mathematically equal, but implementations will inevitably come to different results (depending on, and among other choices, whether $x - $x was simplified to 0 at compile time):

function($x as xs:int) { $x - $x }
function($x as xs:int) { 0 }

I think a fundamental question is:

  1. Do we want as consistent results as possible across all implementations, or
  2. do we want to give implementations as much flexibility as possible to play off their strengths?

@dnovatchev
Copy link
Contributor

@ChristianGruen Seems you are wrong. Here is the definition of **[function equality]

We need to take care not to mix up mathematics and computer science. In our world, the name of a function can be relevant (which is why we have functions like fn:function-name), so I would expect to have both the function arity and function name included in equality checks, thus making the following three functions unequal even though they return the same result for abitrary input:

According to the true definition of function equality: "Two functions f and g are equal if their domain and codomain sets are the same ..."

Thus just using this (sub) criterion all four functions are not equal to each other, because they have different domains.

So, function arity plays the same important role in the mathematical definition of equality of functions.

let $f1 := true#0
let $f2 := function() { true() }
let $f3 := function($x, $y, $z) { true() }
for $f at $p in ($f1, $f2, $f3)
return '$f' || $p || ': ' || function-name($f) || '#' || function-arity($f)

(: result :)
$f1: true#0
$f2: #0
$f3: #3

If we include results in the check, it is always the compiled and optimized result of an expression that will be considered in the equality check. For example, the following functions are mathematically equal, but implementations will inevitably come to different results (depending on, and among other choices, whether $x - $x was simplified to 0 at compile time):

function($x as xs:int) { $x - $x }
function($x as xs:int) { 0 }

This is exactly why such "equality comparison" doesn't seem too-useful.

I think a fundamental question is:

  1. Do we want as consistent results as possible across all implementations, or

Yes, but this seems impossible unrealistic.

  1. do we want to give implementations as much flexibility as possible to play off their strengths?

No, because then any code that relies on comparisons of functions would generally yield different results across implementations.

The true solution is not to require the property of reflexivity on an artificially enlarged domain of a comparison function. The true solution is to exclude from the domain of the comparison function any anomalous values (function items, '±INF', and NaN).

@ChristianGruen
Copy link
Contributor

So as our scope remains XPath, what is your definition of domains and codomains in our language, and how do you differentiate between functions and function items?

@dnovatchev
Copy link
Contributor

So as our scope remains XPath, what is your definition of domains and codomains in our language, and how do you differentiate between functions and function items?

This is quite straight-forward: the Domain is the Cartesian product of the value spaces for the types of the arguments of the function. To put it simply, this is the set of all possible different tuples where the type of the Nth component of the tuple is the same as the type of the Nth parameter of the function definition.

The Codomain is the value space of the possible results -- that is all possible values whose type is the same as the return type of the function.

@line-o
Copy link
Contributor

line-o commented Feb 5, 2023

I tried to follow and understand this interesting discussion as best as I can.
The questions I have are:

  • Does a function that is said to be "deterministic" always considered pure, without side effects, or are these two separate concepts with a lot of overlap?
  • Can functions with side-effects ever be considered equal? Or are they specifically excluded from the original proposal?
  • Would it help to look at other implementations of function equality? In JavaScript two functions are considered equal if they point to the same function object. That is fundamentally different from the approach that they share the same name or domain and codomain. It is a limited but practical way of knowing if two functions are equal. This approach does include dynamic context as a function item created in a closure will have a separate pointer in memory.

There is one thing I am missing in this thread. That is the comparison of a function with a set. If both share the same domain and codomain they should also be considered equal. Maybe all posters agree to this implicitly, but I would like to explicitly mention it here, so we can be sure. As XQuery already considers maps as function items, is this proposal also including comparisons between function(xs:integer) as xs:string = map(s:integer, xs:string)?

@ChristianGruen
Copy link
Contributor

@line-o Thanks for your good questions.

  • Does a function that is said to be "deterministic" always considered pure, without side effects, or are these two separate concepts with a lot of overlap?

The XQFO 3.1 Specification defines determinism as follows:

[Definition] A function that is guaranteed to produce ·identical· results from repeated calls within a single ·execution scope· if the explicit and implicit arguments are identical is referred to as deterministic.

[Definition] A function that is not ·deterministic· is referred to as nondeterministic.

The standard function set includes various nondeterministic functions, but none of the functions are expected to have side effects. However, it may happen that requests to external instances, e.g., via fn:doc, trigger side-effecting behavior.

Things are getting more complicated if we include language extensions: The EXPath File Module is an example of a module which includes definitions of side-effecting functions that are simplistically tagged as nondeterministic. It would have been cleaner to differentiate between nondeterminism and side-effecting behavior, and an optimizer could certainly have profited if that distinction had been made, but back then we decided to restrict ourselves to the conventions of the W3 specifications.

Coming back to your original question, most side-effecting functions I have encountered are also nondeterministic, but I agree that a side-effecting functions could be regarded as deterministic if we can be sure that the caused side effects will never affect the output of the function. This is particularly true for functions that always return the same result (such as an empty sequence) and that are not expected to raise any errors.

  • Can functions with side-effects ever be considered equal? Or are they specifically excluded from the original proposal?

My perspective is that non-deterministic (and, thus, potentially side-effecting) function items are equal if their properties are equal. I would expect fn:random-number-generator#0, file:list#1 or local:function-with-nondeterministic-function-calls#0 to be regarded as equal to itself, even though their invocation might return different results.

  • Would it help to look at other implementations of function equality?

I think it does. Processors of XPath implementations may differ, and it may not always be sufficient to compare the identity of an object, but I think we shouldn’t try to compare different functions just because they may return the same output for the same input. If we try that, we won’t get identical results across different implementations.

is this proposal also including comparisons between function(xs:integer) as xs:string = map(s:integer, xs:string)?

I think we would need to do so if we tried to include the input/output (domain/codomain) of a function in the comparison.


I think it could help to differentiate between function items and function (calls) in this discussion: A function item as such is always deterministic; it’s simply an item. I believe that’s what’s already stated in 1.7.4 Properties of functions anyway, I’m just quoting this again because the discussion in the issue has focused a lot on the function body and the input/output (domain/codomain) of a function.

@dnovatchev
Copy link
Contributor

dnovatchev commented Feb 5, 2023

As XQuery already considers maps as function items, is this proposal also including comparisons between function(xs:integer) as xs:string = map(s:integer, xs:string)?

@line-o Thank you for the interesting questions!

I will address the above.

In case we adopt the limited definition of "demonstrable equality" of functions, then most probably the function-item on the left-side and the map on the right-side will have different pointers in memory (to them as items) and then, according to this limited (and I wonder if useful) definition, these will be "different".

On the other side, if we follow the established definition of function equality, we have simply to determine that:

  1. Both have the same domain, that is, the function on the left is defined exactly on values that belong to the key-set of the map on the right (on all of them and on nothing else).
  2. Given 1) is satisfied, then it must be true that $f(Kn) eq $m(Kn) for every key Kn ∈ map:keys($m)

It is easy to check condition 2), as this involves a finite number ( count(map:keys($m)) ) comparisons.

It is just a little-bit more tricky to verify 1). We can again easily check that the function $f is defined for all keys of $m, and one way to do this is to apply it on each of the keys of $m (if it isn't defined on any of these, we might get an error raised as result, and catch it). What seems a little bit more tricky is to show that $f is only defined on each of the keys of $m and on nothing else. For this purpose, we need to find at least one value $x that is not a key of the map $m, on which $f($x) is not an error.

If my proposal for total maps is approved, and $m is a total map, that is, it has some declared default value $d

for any $xmap:keys($m), then we will simply select another value $t, such that $t ∉ map:keys($m), and $t ≠ $d If $f($t) doesn't produce an error, then $t belongs to the domain of $f but not to the domain (key-set) of $m and thus **$f ≠ $m** .

If we don't have total maps in the language, or we do have this feature, but $m is not a total map (has no declared default value), then here is one heuristic that can produce reliable negative results (but still false-positives):

  1. If count(map:keys($m)) is n, let the keys be: K1, K2, ..., Kn.
  2. We define these n+1 values: V0, V1, V2, ..., Vn, such that Ki < Vi< Ki+1 for i ∈ [1, n-1] and V0 < K1, and Vn > Kn
  3. If applying $f on any of the values V0, V1, V2, ..., Vn doesn't raise an error, then this value does not belong to the keys of the map, but it is in the domain of the function. According to the definition of function equality, in this case we have proven that the function $f is not equal to the map $m.

Note: Here we silently accepted that the keys of $m can be ordered. If this is not the case, then the strategy is to simply pick values that aren't equal to any of the keys.

@michaelhkay
Copy link
Contributor Author

Resolved by acceptance of PR #525

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement A change or improvement to an existing feature XPath An issue related to XPath
Projects
None yet
Development

No branches or pull requests

4 participants