Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signature binary function (e.g. comparison) design #50

Closed
thorwhalen opened this issue Jan 5, 2023 · 6 comments
Closed

Signature binary function (e.g. comparison) design #50

thorwhalen opened this issue Jan 5, 2023 · 6 comments
Assignees
Labels
design discussion To discuss design options, pros and cons, etc.

Comments

@thorwhalen
Copy link
Member

thorwhalen commented Jan 5, 2023

There's a recurring need for more control over how we compare and/or merge signatures.

These often involve some logic on parameter objects (with their name, kind, and optionally default and annotation), order, etc.

We'd like to have a clean framework that enables the user to easily and correctly define these functions, reusing much of the common logic.

Examples:

  • comparison Boolean operators/functions such as:
  • operators/functions such as signature merging (sig1 + sig2)
  • function that gives a score or more complex information about how appropriate it is to feed the function with signature sig1 to an input value of function with signature sig2

Don't miss this comment that proposes a design.

Notes

Return type of a comparison

Must cases might just need a Boolean, like with x == y or x <= y (or issubset, issubclass, etc.).

But in some cases, such as with self-organizing code (or less crazily: validation scores), we'd like to have a numerical (or more generally, vectorial) score, or more generally, some "comparable" type that can be used to do things like "best match".

Use a key function or a binary comparison function as the specifier?

Might want to express comparisons based on key functions like builtin sorted, max etc. do. What are the pros/cons? It seems like a cleaner design enabling easier UX (and obviously, a big pro is the interface-consistency with familiar builtins). But, for example, is it complete? Are there things we cannot do with key functions that we can with the usual binary operator way (specifying comparison function via a two-element boolean function such as compare(x, y)).

For example, how would comparison scores be derived from key functions? Even if it's possible to "fit" any comparison function with vectorial key functions, would it be the most intuitive way of expressing it?

References

Related issues

Further notes

The definition of signature equality

Sig doesn't redefine it -- the builtin is used, which boils down to using the following (acts basically as a "key function"):

    def _hash_basis(self):
        params = tuple(param for param in self.parameters.values()
                             if param.kind != _KEYWORD_ONLY)

        kwo_params = {param.name: param for param in self.parameters.values()
                                        if param.kind == _KEYWORD_ONLY}

        return params, kwo_params, self.return_annotation
@sylvainbonnot
Copy link
Contributor

About the question: "Are there things we cannot do with key functions that we can with the usual binary operator way (specifying comparison function via a two-element boolean function such as compare(x, y))?", the answer should be "No" because of the existence of functools.cmp_to_key(func).
Transform an old-style comparison function to a key function. See the doc: doc for cmp_to_key
A comparison function is any callable that accepts two arguments, compares them, and returns a negative number for less-than, zero for equality, or a positive number for greater-than. A key function is a callable that accepts one argument and returns another value to be used as the sort key.

@sylvainbonnot
Copy link
Contributor

The code of cmp_to_key is nothing more than:

def cmp_to_key(mycmp):
    """Convert a cmp= function into a key= function"""
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        __hash__ = None
    return K

@thorwhalen
Copy link
Member Author

Excellent! That's going to be a valuable thing for simplifying the design!

thorwhalen added a commit that referenced this issue Jan 6, 2023
@thorwhalen
Copy link
Member Author

thorwhalen commented Jan 13, 2023

I'm not managing to come up with a design that I'm satisfied with.

I'll do some more work on this, but am thinking that, for now, I won't solve the issue as a mini-framework like I was hoping, but rather collect a few useful tools around the issue. (Perhaps, this is the right design I've been looking for.) Here are my thoughts about these.

At the very top level, some contexts will call for a signature binary function (SBF). That is, a Callable[[Signature, Signature], Any] (where; typing.Callable and inspect.Signature). The output is often a bool (example signatures_are_compatible) or a Signature (example, merge or bust). The first thing we could do for this issue is

Task: extract what ever logic is hardcoded in our code and percolate it to the surface (as a function's argument).

By doing this, we're keeping things as they are, while making them open-closed for this issue. A user can always change the behavior by making a different SBF.

From there, there's a temptation to go further, because we'll quickly realize that there's a lot of reuse potential when making these SBFs. I propose we don't go further (until the need calls for it), except perhaps documenting the problem patterns we see, and possibly making functions for these (but without trying to fit these in a framework).

Here are some problem patterns I see:

matching params

Essentially, adding a layer of information (or structuring, annotating) about the params of each of the signatures (and possibly the return_annotations). For example,

  • In strict equality, we assume the both have the same number of params and each pair is compared (in a strict sense) on the basis of their position in the list
  • A looser equality might perform this matching not based on position, but on name (this would be more convenient for meshed.DAGs for example, whose bindings are based on param names, but not (once the meshed.FuncNodes are initialized) on position
  • When merging two signatures (the way it is done now), matching is made on a name-basis and same-name params must be equal in the strictest sense (i.e. also equal in kind, default and annotation). But here, we can have non-matched params on both sides, which will then (as a separate concern) be incorporated in the final aggregate signature.

param binary functions

Several times in the above, we forwarded the problem down to the "pairs of params" level.
This means that there an implicit Callable[[Parameter, Parameter], Any] (param comparison, or more generally, param binary function) going on. The default one is usually strict: That is "values must be equal for each of the four inspect.Parameter attributes: .name, .kind, .annotation, .default. But here again, this strictness might be in the way of expressivity.
Here are some example situations:

  • We might not care about the kind because we're calling our functions with i2.call_forgivingly anyway, as in a DAG
  • When merging, we might decide that only cases where the two annotations or two defaults are non-empty (and conflicting) are problematic, because if only one of them is empty, our protocol is to take the value of the non-empty one.

param attribute binary functions

Most of the time, our param binary function will really be just an aggregate (e.g. all for Booleans, or sum for scores) of functions applied to each of the four attributes individually. Even here, when we said "equality", we differed to what ever that meant for each of the values of the attributes. There's definitely cases where we want to control that.
In fact, probably most of the time it will be through this definition (plus an aggregate function) that we'll define our param binary functions. Here are some examples:

  • When using types as annotations, do we really want to use equality? Might the issubclass be a better idea?
  • The "only two conflicting non-empties" case mentioned in the last section is really a concern at the param attribute level. Here we're saying, return True if any of the attributes are empty.
  • Only some pairs of kinds would be problematic if not equal.
  • In some cases, we might not need strict string equality for matching names, but "both members of a same synset (set of semantically equivalent words)" is enough

The following code (meant to be used as a factory that makes param binary functions through functool.partial) might do the trick for param binary functions through the definition of functions over their attributes:

from operator import eq

def param_binary_func(
    param1, param2, *, name=eq, kind=eq, default=eq, annotation=eq, aggreg=all
):
    return aggreg((
        name(param1.name, param2.name),
        kind(param1.kind, param2.kind),
        default(param1.default, param2.default),
        annotation(param1.annotation, param2.annotation),
    ))

Note that the same param_binary_func can be used to produce precise error messages.
The way to do this is to have the attribute binary functions produce some info dicts
that can then be aggregated in aggreg to produce a final error message (or even
a final error object, which can even be raised) if there is indeed a mismatch at all.
Further more, we might want to make a function that will take a parametrized
param_binary_func and produce such a error raising function from it, using the
specific functions (extracted by Sig) to produce the error message.

@thorwhalen
Copy link
Member Author

thorwhalen commented Jan 13, 2023

Though "binary function" is more precisely indicates the generality we need here, I'm wondering if we should sinfully just call it "comparator" since:

  • it's not an aggregate word like "binary function", so no need for underscore in variable names
  • it's a fairly known concept
  • often (but not always) our binary functions will indeed be comparators (or we can say that the object they return can themselves be endowed with order through a key function).
    Cons:
  • in some cases the output REALLY doesn't feel like a comparator.

Perhaps we can call it "collator" and "comparison->collation"?

I really insist on finding a non-composite word because things get out of hand when adding other words to it.

Any opinions?

@thorwhalen
Copy link
Member Author

thorwhalen commented Jan 20, 2023

Regarding using the parameter comparators to make signature comparators, one way is to do:

graph
    a(sig1) --> matcher
    b(sig2) --> matcher
    matcher --> matches(matches)
    matcher --> remainder(remainder)
    matches -- for each match --> param_comparator
    param_comparator -- collect --> comparison(comparison)
    comparison --> aggreg
    remainder --> aggreg
    aggreg --> output(final output or error)

Or something like that.

But, it could also be:

graph
    sig1 --> cartesian_product
    sig2 --> cartesian_product
    cartesian_product -- for each pair --> param_comparator
    param_comparator --> comparisons_matrix(comparisons_matrix)
    comparisons_matrix --> reduction
    reduction --> result(result)

The latter is cleaner, but quite wasteful, and maybe harder to express some simple cases (our most common cases).

@i2mint i2mint locked and limited conversation to collaborators Apr 17, 2023
@thorwhalen thorwhalen converted this issue into discussion #63 Apr 17, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
design discussion To discuss design options, pros and cons, etc.
Projects
None yet
Development

No branches or pull requests

3 participants