Signature binary function (e.g. comparison) design #50

thorwhalen · 2023-01-05T12:04:24Z

There's a recurring need for more control over how we compare and/or merge signatures.

These often involve some logic on parameter objects (with their name, kind, and optionally default and annotation), order, etc.

We'd like to have a clean framework that enables the user to easily and correctly define these functions, reusing much of the common logic.

Examples:

comparison Boolean operators/functions such as:
- sig1 == sig2: equality (more precisely "equivalence relation") of two signatures
- sig1 <= sig2: comparison (more precisely "total order relation" as issubset, issubclass etc.)
operators/functions such as signature merging (sig1 + sig2)
function that gives a score or more complex information about how appropriate it is to feed the function with signature sig1 to an input value of function with signature sig2

Don't miss this comment that proposes a design.

Notes

Return type of a comparison

Must cases might just need a Boolean, like with x == y or x <= y (or issubset, issubclass, etc.).

But in some cases, such as with self-organizing code (or less crazily: validation scores), we'd like to have a numerical (or more generally, vectorial) score, or more generally, some "comparable" type that can be used to do things like "best match".

Use a key function or a binary comparison function as the specifier?

Might want to express comparisons based on key functions like builtin sorted, max etc. do. What are the pros/cons? It seems like a cleaner design enabling easier UX (and obviously, a big pro is the interface-consistency with familiar builtins). But, for example, is it complete? Are there things we cannot do with key functions that we can with the usual binary operator way (specifying comparison function via a two-element boolean function such as compare(x, y)).

For example, how would comparison scores be derived from key functions? Even if it's possible to "fit" any comparison function with vectorial key functions, would it be the most intuitive way of expressing it?

References

Related issues

Dag.ch_func and signature comparison
Enable code_to_dag to use wrapped function signature for the dag signature it produces, which could be a convenient interface for the user to get over the restrictive signature comparison when using code_to_dag. But that's only for code_to_dag. Still, the long issue has some reflections about signature comparison.
The right way to parametrize comparison operators, though the subject is about using operators to compare signatures, also has some relevant ideas.
Need a more powerful and clean signature merging functionality: Indeed, we encounter the signature comparison concern again when we do signature merging (union). There are useful pieces of code around the problem that one could collect to get a better idea of the problem.
Write a general parameters merging function: an older signature merging issue, possibly duplicate.
Revise, refactor and robust(ize) function normalization tools: Because signature comparison comes up in the situations described there.
Rethink what i2.Sig instances should do, or not do also contains some analysis around the subject of signature comparison
The "Binding changes when using ch_funcs" was perceived as a bug, but wasn't. The clarification points yet once more to the need of proper signature comparison (and error messages for them).

Further notes

The definition of signature equality

Sig doesn't redefine it -- the builtin is used, which boils down to using the following (acts basically as a "key function"):

    def _hash_basis(self):
        params = tuple(param for param in self.parameters.values()
                             if param.kind != _KEYWORD_ONLY)

        kwo_params = {param.name: param for param in self.parameters.values()
                                        if param.kind == _KEYWORD_ONLY}

        return params, kwo_params, self.return_annotation

The text was updated successfully, but these errors were encountered:

sylvainbonnot · 2023-01-05T15:27:22Z

About the question: "Are there things we cannot do with key functions that we can with the usual binary operator way (specifying comparison function via a two-element boolean function such as compare(x, y))?", the answer should be "No" because of the existence of functools.cmp_to_key(func).
Transform an old-style comparison function to a key function. See the doc: doc for cmp_to_key
A comparison function is any callable that accepts two arguments, compares them, and returns a negative number for less-than, zero for equality, or a positive number for greater-than. A key function is a callable that accepts one argument and returns another value to be used as the sort key.

sylvainbonnot · 2023-01-05T15:31:43Z

The code of cmp_to_key is nothing more than:

def cmp_to_key(mycmp):
    """Convert a cmp= function into a key= function"""
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        __hash__ = None
    return K

thorwhalen · 2023-01-05T16:47:38Z

Excellent! That's going to be a valuable thing for simplifying the design!

thorwhalen · 2023-01-13T11:03:48Z

I'm not managing to come up with a design that I'm satisfied with.

I'll do some more work on this, but am thinking that, for now, I won't solve the issue as a mini-framework like I was hoping, but rather collect a few useful tools around the issue. (Perhaps, this is the right design I've been looking for.) Here are my thoughts about these.

At the very top level, some contexts will call for a signature binary function (SBF). That is, a Callable[[Signature, Signature], Any] (where; typing.Callable and inspect.Signature). The output is often a bool (example signatures_are_compatible) or a Signature (example, merge or bust). The first thing we could do for this issue is

Task: extract what ever logic is hardcoded in our code and percolate it to the surface (as a function's argument).

By doing this, we're keeping things as they are, while making them open-closed for this issue. A user can always change the behavior by making a different SBF.

From there, there's a temptation to go further, because we'll quickly realize that there's a lot of reuse potential when making these SBFs. I propose we don't go further (until the need calls for it), except perhaps documenting the problem patterns we see, and possibly making functions for these (but without trying to fit these in a framework).

Here are some problem patterns I see:

matching params

Essentially, adding a layer of information (or structuring, annotating) about the params of each of the signatures (and possibly the return_annotations). For example,

In strict equality, we assume the both have the same number of params and each pair is compared (in a strict sense) on the basis of their position in the list
A looser equality might perform this matching not based on position, but on name (this would be more convenient for meshed.DAGs for example, whose bindings are based on param names, but not (once the meshed.FuncNodes are initialized) on position
When merging two signatures (the way it is done now), matching is made on a name-basis and same-name params must be equal in the strictest sense (i.e. also equal in kind, default and annotation). But here, we can have non-matched params on both sides, which will then (as a separate concern) be incorporated in the final aggregate signature.

param binary functions

Several times in the above, we forwarded the problem down to the "pairs of params" level.
This means that there an implicit Callable[[Parameter, Parameter], Any] (param comparison, or more generally, param binary function) going on. The default one is usually strict: That is "values must be equal for each of the four inspect.Parameter attributes: .name, .kind, .annotation, .default. But here again, this strictness might be in the way of expressivity.
Here are some example situations:

We might not care about the kind because we're calling our functions with i2.call_forgivingly anyway, as in a DAG
When merging, we might decide that only cases where the two annotations or two defaults are non-empty (and conflicting) are problematic, because if only one of them is empty, our protocol is to take the value of the non-empty one.

param attribute binary functions

Most of the time, our param binary function will really be just an aggregate (e.g. all for Booleans, or sum for scores) of functions applied to each of the four attributes individually. Even here, when we said "equality", we differed to what ever that meant for each of the values of the attributes. There's definitely cases where we want to control that.
In fact, probably most of the time it will be through this definition (plus an aggregate function) that we'll define our param binary functions. Here are some examples:

When using types as annotations, do we really want to use equality? Might the issubclass be a better idea?
The "only two conflicting non-empties" case mentioned in the last section is really a concern at the param attribute level. Here we're saying, return True if any of the attributes are empty.
Only some pairs of kinds would be problematic if not equal.
In some cases, we might not need strict string equality for matching names, but "both members of a same synset (set of semantically equivalent words)" is enough

The following code (meant to be used as a factory that makes param binary functions through functool.partial) might do the trick for param binary functions through the definition of functions over their attributes:

from operator import eq

def param_binary_func(
    param1, param2, *, name=eq, kind=eq, default=eq, annotation=eq, aggreg=all
):
    return aggreg((
        name(param1.name, param2.name),
        kind(param1.kind, param2.kind),
        default(param1.default, param2.default),
        annotation(param1.annotation, param2.annotation),
    ))

Note that the same param_binary_func can be used to produce precise error messages.
The way to do this is to have the attribute binary functions produce some info dicts
that can then be aggregated in aggreg to produce a final error message (or even
a final error object, which can even be raised) if there is indeed a mismatch at all.
Further more, we might want to make a function that will take a parametrized
param_binary_func and produce such a error raising function from it, using the
specific functions (extracted by Sig) to produce the error message.

thorwhalen · 2023-01-13T12:37:00Z

Though "binary function" is more precisely indicates the generality we need here, I'm wondering if we should sinfully just call it "comparator" since:

it's not an aggregate word like "binary function", so no need for underscore in variable names
it's a fairly known concept
often (but not always) our binary functions will indeed be comparators (or we can say that the object they return can themselves be endowed with order through a key function).
Cons:
in some cases the output REALLY doesn't feel like a comparator.

Perhaps we can call it "collator" and "comparison->collation"?

I really insist on finding a non-composite word because things get out of hand when adding other words to it.

Any opinions?

thorwhalen · 2023-01-20T14:01:25Z

Regarding using the parameter comparators to make signature comparators, one way is to do:

graph
    a(sig1) --> matcher
    b(sig2) --> matcher
    matcher --> matches(matches)
    matcher --> remainder(remainder)
    matches -- for each match --> param_comparator
    param_comparator -- collect --> comparison(comparison)
    comparison --> aggreg
    remainder --> aggreg
    aggreg --> output(final output or error)

Or something like that.

But, it could also be:

graph
    sig1 --> cartesian_product
    sig2 --> cartesian_product
    cartesian_product -- for each pair --> param_comparator
    param_comparator --> comparisons_matrix(comparisons_matrix)
    comparisons_matrix --> reduction
    reduction --> result(result)

The latter is cleaner, but quite wasteful, and maybe harder to express some simple cases (our most common cases).

thorwhalen added the design discussion To discuss design options, pros and cons, etc. label Jan 5, 2023

thorwhalen mentioned this issue Jan 5, 2023

Dag.ch_func and signature comparison i2mint/meshed#45

Closed

thorwhalen assigned thorwhalen, sylvainbonnot and valentin-feron Jan 5, 2023

thorwhalen added a commit that referenced this issue Jan 6, 2023

feat: Experimental work on issue #50

f0f2dd1

sync-by-unito bot unassigned thorwhalen Jan 20, 2023

i2mint locked and limited conversation to collaborators Apr 17, 2023

thorwhalen converted this issue into discussion #63 Apr 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Signature binary function (e.g. comparison) design #50

Signature binary function (e.g. comparison) design #50

thorwhalen commented Jan 5, 2023 •

edited

sylvainbonnot commented Jan 5, 2023

sylvainbonnot commented Jan 5, 2023

thorwhalen commented Jan 5, 2023

thorwhalen commented Jan 13, 2023 •

edited

thorwhalen commented Jan 13, 2023 •

edited

thorwhalen commented Jan 20, 2023 •

edited

This issue was moved to a discussion.

This issue was moved to a discussion.

Signature binary function (e.g. comparison) design #50

Signature binary function (e.g. comparison) design #50

Comments

thorwhalen commented Jan 5, 2023 • edited

Notes

Return type of a comparison

Use a key function or a binary comparison function as the specifier?

References

Related issues

Further notes

The definition of signature equality

sylvainbonnot commented Jan 5, 2023

sylvainbonnot commented Jan 5, 2023

thorwhalen commented Jan 5, 2023

thorwhalen commented Jan 13, 2023 • edited

matching params

param binary functions

param attribute binary functions

thorwhalen commented Jan 13, 2023 • edited

thorwhalen commented Jan 20, 2023 • edited

This issue was moved to a discussion.

thorwhalen commented Jan 5, 2023 •

edited

thorwhalen commented Jan 13, 2023 •

edited

thorwhalen commented Jan 13, 2023 •

edited

thorwhalen commented Jan 20, 2023 •

edited