-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equality of function items #333
Comments
Maybe a stupid question, but why is this capability necessary? Just to achieve the property of reflexivity for the comparison operation? But is it justified trying to achieve this at any cost? Isn't it OK to postulate that the comparison of two functions of which at least one is neither an array or a map, is by default, or always, I think this is better than trying to do the impossible (how would one compare two functions that happen to be external objects, that is just some binary code compiled from Java, C#, ..., etc. ?) Even if the source code is available, the same source code compiled with different compiler options (such as full, partial or no optimization, index boundary checking, etc.) results in different executable (binary) code instances for that could produce different results on the same tuple of arguments. It was "heroic" in op:same-key() to postulate that NaN is equal to NaN, and the same for two INF values, but we all know that this isn't true ... I believe that it would be better than that to limit the value-space for the parameters of fn:deep-equal() to sequences that don't (deeply)contain function items, which are neither maps or arrays, and to say that this value space could be extended (when an option is specified as an entry in the $options argument) with such functions, but then the comparison automatically returns false(). This will also save a lot of Of course, we could also have another such entry in the $options argument, which when specified obliges the implementation to try to determine the "equality" of two such functions. But this must not be the default behavior, or otherwise this would result in compatibility issues with the previous versions of XPath. |
Yes, reflexivity is important. There's the challenge of saying what it means for functions to be deterministic ("Same arguments, same result" requires a definition of "same"). And a deep-equal that delivers an error-free boolean result over the entire value space can be useful. In the definition of determinism we do make an attempt to define what it means for two functions to be the same, but it's a rather feeble attempt, and I would like to see if we can improve it. There will always be some kind of recourse to things being implementation-defined (everything relating to external functions is already implementation-defined, so that's not a problem). |
It is a flaw to believe that NaN is equal to NaN, and Infinity is equal to Infinity. This is like saying that the Earth is flat or that the Sun rotates around the Earth. I will never support tweaking the facts / laws of mathematics and physics simply to make some "beautiful" theory seem true. I believe that there is a better way - just exclude the anomalous values from the Domain of the comparison-function. We need an item type, let me call it (initially, due to lack of imagination) xs:trueNumeric which is the current value-space for xs:numeric (the union of all numeric types), from which 'NaN', '-INF' and 'INF' are removed. Then provide fn:atomic-equal with an $options argument with a possible entry that makes it accept only xs:trueNumeric values. Apply the same approach to fn:deep-equal - have an option that excludes function items from the allowed domains for the (deep)members of the two sequences; also have a similar option that excludes 'NaN', '-INF' and 'INF' Only then, the comparison function becomes truly reflexive. |
I like the attempt to try to compare everything that’s possible to compare, provided that the behavior is well-defined (which is what we are trying to achieve). For more complex items, I think it’s both intuitive and justifiable to argue that such comparisons do not succeed, or may succeed in a future version of the language.
It’s also a question of definition. The floating-point placeholder value An endless number of solutions in computer science can be regarded as pragmatic tradeoff, particularly if we consider the technical limitations when things were introduced. It may be seen as a flaw that If we don’t want to work with |
Arguing about whether NaN is equal to NaN is like arguing about whether 0^0 should be 0 or 1. It's entirely a matter of convenience. Saying that you're not allowed to ask the question is one solution, certainly, but not always the best one. |
But someone may not sleep well, because then we lose the property of reflexivity 😄 So we have two possible solutions here:
@ChristianGruen I see that you and I agree that solution 2 above is a good way to deal with the anomalies. I think that we should not allow '- INF' , 'INF', '+INF' and NaN as values of keys of a map. This is the good way to get the dream of Reflexivity fulfilled, without needing ghost values, or the equivalents of "dark matter" in physics. |
If there is no one good answer, then why should we accept one? This is not forbidding the question, but realizing that the answer is a paradox -- like we have such in set theory |
Should we stop communicating because Derrida’s concept of différance compellingly suggests that communication between mankind is doomed from the ground up? |
C'mon, Not using NaN and '+-Inf' values in value comparisons is not the same as "we stop communicating" . It is more like we do not spend time arguing how many angels can fit on the tip of a needle 😄 Also, such values can really be used for internal purposes, if they are not in the domain of comparison functions. In the past I did point out, and more recently @michaelhkay agreed, that we do need to have a few reserved values, and these make the perfect candidates. |
I would be pretty surprised if Are there programming languages that behave like this? |
How can one say that two infinities with different cardinality are equal? Like compare the number of all points in a segment with the number of those points that represent integers (or rational numbers). We know well that the first number is greater than the latter, that is, there is no 1:1 mapping between the two sets. |
I’m the wrong to ask; other people much smarter than me have spent more time and mental resources on that many decades ago. You might find some helpful reasoning in the Wikipedia article on floating-point arithmetic. |
Even if we have in Python:
we still have:
Do note, the result is not 0 ! Thus, we end up with two numbers $x that we say are equal: |
This resource doesn't even mention a comparison operation, not to speak of comparing Inf to Inf. When we have a paradox, we could choose a convenient interpretation, but when we have facts, it is simply wrong/unacceptable to choose an interpretation that is not supported by the facts, and whose incorrectness is confirmed by the facts. Thus it is possible to have infinity1 with bigger cardinality that infinity2. These two cannot be equal, the one with greater cardinality is bigger. There are more points on any segment of a line than the points that represent rational numbers on the whole line. Both sets have infinite number of elements, but the first number is bigger (not equal to) than the second number. |
Sure. That's how floating-point arithmetic has been defined. All that is not specific to XPath. |
If we define two functions to be "equal" when they have the same Domain and Codomain and for every specific argument tuple that is in the Domain they return the same value in the Codomain, then functions are generally not comparable, unless there is a proof of such equality. This follows from the Halting problem. Even if we have a tool with which we can run the two functions on all possible argument tuples from the Domain, this tool cannot decide that the function execution will ever halt, thus it would be impossible in practice to get and compare the results. Notable exception is when the Domain is finite, and this corresponds to maps/arrays in XPath. Languages in which functions can be proven to be equal (Such as Coq) typically yield some of their expressive power in order to achieve this capability -- for example such languages are not Turing-complete. But in our case all three X-languages are Turing complete. |
I would use your definition to describe the equivalence, not equality, of two functions. Otherwise, we would need to exclude non-deterministic and side-effecting function items such as Completeness cannot be achieved either way, though. |
Rest assured, I have no intention of trying to solve the halting problem. My aims are much more modest. Here is what I have in mind. This is a refinement of the text that we already have for describing "determinism" of functions. Two function items F and G are defined to be demonstrably equal if it is possible to establish that for any supplied arguments, F and G will return the same result (or raise the same error). This specification describes some situations in which two functions are demonstrably equal; implementations may define other situations in which they are demonstrably equal, provided that two function items must only be deemed demonstrably equal if they can be guaranteed to be interchangeable in this sense. Here are some examples of pairs of function items $F and $G that are demonstrably equal:
The above examples illustrate a general principle that if the output of an expression includes a function item that was present in the input, and the semantics of the expression do not modify the function item in any way, then the function item in the output is demonstrably equal to the function item in the input.
The above examples illustrate a general principle that if two function items are constructed in the same way, then they are demonstrably equal to each other. An implementation may also (but is not required to) consider the following pairs of functions to be demonstrably equal:
(Note: this assumes we don't do anything to make parameter names in function items significant to the caller)
The above examples illustrate that we are not concerned with exact lexical equivalence of the function bodies. Generally, a parser will make some simplifications or normalisations of an expression as it is processed, which might include skipping whitespace and comments, expanding abbreviations, perhaps renaming variables or expanding namespace prefixes. The primary intent here is to allow implementors to normalize expressions before comparing them. It is not expected that any advanced techniques for detecting the equivalence of different algorithms will be invoked (though it is not precluded). The following pair of functions must not be considered demonstrably equal, because they have different effects when the supplied arguments are an xs:dateTime value and an xs:duration value:
In practical terms, there are two ways in which equivalence of two function items can be demonstrated. In very many cases it is possible to do it simply by tracking the provenance of the function item. If the item isn't changed, then it remains the same: the output is demonstrably equal to the input. In the general case, however, function items can be compared by examining their properties, and this approach is described below. The situations in which the specification mandates that two functions F and G are demonstrably equal are described below. To avoid confusion we will use the term "implementation" to refer to the XPath/XQuery language processor, and "function body" to refer to the "implementation" property of a function item.
N - Equivalence of function signatures. (This is relatively easy. It's tackled, for example, in defining the rules for overriding functions in xsl:override in XSLT) P - Equivalence of function bodies. Function bodies can be classified into 3 categories: (a) User-written functions whose body is written in XPath, XQuery, XSLT, or some other language known to the processor (b) Built-in functions whose implementation is internal to the processor (c) External functions whose implementation is external to the processor, where the processor has the ability to invoke the function but not to inspect its internal logic In the first category, two user written function bodies are demonstrably equivalent if they have identical source expressions and identical static contexts. A processor may also determine that two such function bodies are equivalent if: (a) any differences in the static context can be shown to be immaterial, and/or In the second category, the bodies of built-in functions having the same name and arity are deemed equivalent. (Note: internally, a processor might have different algorithms for implementing a function such as fn:contains depending say on whether the collation is known statically. For the purpose of this discussion however, these different algorithms all constitute parts of a single function body.) In the third category, external functions are largely implementation-defined, and the same applies to considerations of their equivalence. If any expression takes one function item as input and produces another in its result, and the rules for the expression say that the function body of the result is the same as the function body of the input, then the input and output function items are deemed to have the same function body. This applies trivially to operations such as filter expressions that leave items unchanged. It also applies to more complex operations, for example, a partial application of a function has the same function body as its input (it differs in the nonlocal variable bindings). Q Equivalence of captured contexts TBA |
@ChristianGruen Seems you are wrong. Here is the definition of function equality as per Wikipedia: "Two functions f and g are equal if their domain and codomain sets are the same and their output values agree on the whole domain. More formally, given f: X → Y and g: X → Y, we have f = g if and only if f(x) = g(x) for all x ∈ X" |
@dnovatchev It's very easy to describe function equality as an abstract mathematical property of two mathematical functions. It's much harder to describe it as a quality that is computable given a pair of software engineering artefacts, especially in a world where the very concept of "=" is fuzzy, which sadly is the world we live in. Our numbers are not mathematical numbers, they are IEEE numbers; our world is created by engineers, not by mathematicians. |
@michaelhkay I admire your intent and effort. Maybe my understanding is wrong, but it seems that even though you narrow the focus to demonstrable equality of functions, your proposal is for recognizing only some (and not all) cases when two functions are demonstrably equal (in other words, you provide a set of sufficient conditions for demonstrable equality, but they are not also the set of necessary conditions) . Even if this is so, getting a positive result from the evaluation of the demonstrable equality of two functions may be something nice to have. Yet we need to be aware of the possibility for false negatives, and it is not clear what getting a false actually means for the logic of an algorithm. Also, it seems that this could be very challenging for the implementors (will they hate this feature? 😄 ) and users will only like it if it turns out to have reasonably small perceivable computational complexity. I would be particularly interested to see the proposal for "Equivalence of captured context", in particular I have proposed in the past a function that returns the set of all "live" at the moment in-context variable names and their respective values. |
@michaelhkay Even if so, it is certainly good to try to be as close to the mathematical concepts, as possible. |
From an implementor perspective, many cases are actually quite easy, because they reduce to object identity in the underlying implementation. Obviously if two functions are represented by the same Java object then they are equal (or equivalent, which ever word we want to use). It's also easy enough, for example, to ensure that simple function references to context-free functions like |
Yes, and in the real world (for example take most existing Javascript code) functions using closures are an everyday fact of life. |
This is possible only within the same running process and only while it is running... Maybe some more permanent form of "global function registry" would be useful? This certainly requires a lot of newly-designed and created infrastructure that needs centralized resources and fail-tolerant maintenance (GUIDs as keys and DB as storage is the easier part of it). But without such global support, this idea of "demonstrable equality" seems quite restricted. |
We need to take care not to mix up mathematics and computer science. In our world, the name of a function can be relevant (which is why we have functions like let $f1 := true#0
let $f2 := function() { true() }
let $f3 := function($x, $y, $z) { true() }
for $f at $p in ($f1, $f2, $f3)
return '$f' || $p || ': ' || function-name($f) || '#' || function-arity($f)
(: result :)
$f1: true#0
$f2: #0
$f3: #3 If we include results in the check, it is always the compiled and optimized result of an expression that will be considered in the equality check. For example, the following functions are mathematically equal, but implementations will inevitably come to different results (depending on, and among other choices, whether function($x as xs:int) { $x - $x }
function($x as xs:int) { 0 } I think a fundamental question is:
|
According to the true definition of function equality: "Two functions f and g are equal if their domain and codomain sets are the same ..." Thus just using this (sub) criterion all four functions are not equal to each other, because they have different domains. So, function arity plays the same important role in the mathematical definition of equality of functions.
This is exactly why such "equality comparison" doesn't seem too-useful.
Yes, but this seems
No, because then any code that relies on comparisons of functions would generally yield different results across implementations. The true solution is not to require the property of reflexivity on an artificially enlarged domain of a comparison function. The true solution is to exclude from the domain of the comparison function any anomalous values (function items, '±INF', and NaN). |
So as our scope remains XPath, what is your definition of domains and codomains in our language, and how do you differentiate between functions and function items? |
This is quite straight-forward: the Domain is the Cartesian product of the value spaces for the types of the arguments of the function. To put it simply, this is the set of all possible different tuples where the type of the Nth component of the tuple is the same as the type of the Nth parameter of the function definition. The Codomain is the value space of the possible results -- that is all possible values whose type is the same as the return type of the function. |
I tried to follow and understand this interesting discussion as best as I can.
There is one thing I am missing in this thread. That is the comparison of a function with a set. If both share the same domain and codomain they should also be considered equal. Maybe all posters agree to this implicitly, but I would like to explicitly mention it here, so we can be sure. As XQuery already considers maps as function items, is this proposal also including comparisons between |
@line-o Thanks for your good questions.
The XQFO 3.1 Specification defines determinism as follows:
The standard function set includes various nondeterministic functions, but none of the functions are expected to have side effects. However, it may happen that requests to external instances, e.g., via Things are getting more complicated if we include language extensions: The EXPath File Module is an example of a module which includes definitions of side-effecting functions that are simplistically tagged as nondeterministic. It would have been cleaner to differentiate between nondeterminism and side-effecting behavior, and an optimizer could certainly have profited if that distinction had been made, but back then we decided to restrict ourselves to the conventions of the W3 specifications. Coming back to your original question, most side-effecting functions I have encountered are also nondeterministic, but I agree that a side-effecting functions could be regarded as deterministic if we can be sure that the caused side effects will never affect the output of the function. This is particularly true for functions that always return the same result (such as an empty sequence) and that are not expected to raise any errors.
My perspective is that non-deterministic (and, thus, potentially side-effecting) function items are equal if their properties are equal. I would expect
I think it does. Processors of XPath implementations may differ, and it may not always be sufficient to compare the identity of an object, but I think we shouldn’t try to compare different functions just because they may return the same output for the same input. If we try that, we won’t get identical results across different implementations.
I think we would need to do so if we tried to include the input/output (domain/codomain) of a function in the comparison. I think it could help to differentiate between function items and function (calls) in this discussion: A function item as such is always deterministic; it’s simply an item. I believe that’s what’s already stated in 1.7.4 Properties of functions anyway, I’m just quoting this again because the discussion in the issue has focused a lot on the function body and the input/output (domain/codomain) of a function. |
@line-o Thank you for the interesting questions! I will address the above. In case we adopt the limited definition of "demonstrable equality" of functions, then most probably the function-item on the left-side and the map on the right-side will have different pointers in memory (to them as items) and then, according to this limited (and I wonder if useful) definition, these will be "different". On the other side, if we follow the established definition of function equality, we have simply to determine that:
It is easy to check condition 2), as this involves a finite number ( It is just a little-bit more tricky to verify 1). We can again easily check that the function If my proposal for total maps is approved, and for any If we don't have total maps in the language, or we do have this feature, but
Note: Here we silently accepted that the keys of |
Resolved by acceptance of PR #525 |
The question of equality of function items arises in the discussion of determinism of functions and memo functions in XSLT - see F&O 3.1 section 1.7.4, and came up again today in the context of fn:deep-equal.
1.7.4 makes a brave attempt to describe situations under which two functions are "identical", though leaving implementations room for flexibility. I think we can build on this and improve it, by describing more situations in which the result is predictable.
The data model describes the properties of a function item, and we can say that two function items are equivalent if all their properties are the same.
The properties that cause problems are the "implementation" and the "closure", and in both cases I think we can find ways of doing a comparison.
For the implementation, we can define this by reference to the way in which the implementation property is set. For function items constructed by reference to static functions (e.g.
my:func#3
orfunction-lookup(my:func, 3)
) then they have the same implementation if and only if they are constructed by reference to the same static function. Similarly for function items constructed by evaluating an inline function expression. Other ways of constructing a function item, such as partial application, essentially create a new function with the same implementation as an existing function and a different closure.For the closure (ignoring for the moment functions that include parts of the dynamic context in their closure), this is essentially just a set of variable bindings and it's not too difficult to say that functions are identical if these sets of variable bindings are identical.
The text was updated successfully, but these errors were encountered: