Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Element richcmp: never use id() #22029

Closed
mezzarobba opened this issue Dec 6, 2016 · 195 comments
Closed

Element richcmp: never use id() #22029

mezzarobba opened this issue Dec 6, 2016 · 195 comments

Comments

@mezzarobba
Copy link
Member

As discussed in the sage-devel thread starting with <o21nte$6jp$1@blaine.gmane.org> (https://groups.google.com/d/msg/sage-devel/YVFdxPH6avk/4OZUmzLHBgAJ),
coercion_model.richcmp() should not fall back on comparing by type/id.

This branch implements comparisons for Element the same way as in Python 3: a TypeError is raised for uncomparable objects (instead of comparing by id).

Dependencies: #22297, #22344, #22346, #22369, #22370, #22371, #22372, #22373, #22376, #22382, #24968, #26917, #26931, #26933, #26934, #26935, #26936, #26937, #26938, #26947, #27003, #27009, #27010, #27026, #27027, #27029, #27123, #27241

Depends on #27241

CC: @fchapoton

Component: coercion

Keywords: richcmp

Author: Marc Mezzarobba, Jeroen Demeyer

Branch/Commit: e311ab1

Reviewer: Julian Rüth, Marc Mezzarobba

Issue created by migration from https://trac.sagemath.org/ticket/22029

@mezzarobba mezzarobba added this to the sage-7.5 milestone Dec 6, 2016
@jdemeyer
Copy link

jdemeyer commented Dec 6, 2016

comment:1

Do you plan to work on this? If yes, fill in your name as reviewer. If no, let me know.

@jdemeyer

This comment has been minimized.

@jdemeyer

This comment has been minimized.

@mezzarobba
Copy link
Member Author

comment:4

Replying to @jdemeyer:

Do you plan to work on this? If yes, fill in your name as reviewer.

as author? owner?

If no, let me know.

I've started doing experiments, but I'll most likely need help from people who understand the parts of Sage that rely on the current behavior. At the very least I'll try to push a first draft in the next few days.

@mezzarobba
Copy link
Member Author

comment:5

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices, while various other parts of Sage create graphs with vertices that mix Elements with other Python objects.

An option might be to try again with key=id when sort() fails, but (with no global understanding of the code, at least) there's a major risk of getting inconsistent orders with different subsets of vertices.

Another idea would be to do introduce utilities like

# universal_cmp.pyx
cpdef lt(x, y):
    try:
        return x < y
    except TypeError:
        if type(x) is type(y):
            return id(x) < id(y)
        else:
            return type(x) < type(y)
...

cdef class key(object):
    def __init__(self):
        self.value = value
    def __lt__(self, other):
        return lt(self.value, other.value)
    ...

and then use universal_cmp.lt() and sort(key=universal_cmp.key), but this won't work either when one of the types involved only has a partial order, since this order will not be consistent with id.

Yet another variant would be to pass a comparison function (or, equivalently but probably better in view of the transition to python3, a key as above) that wraps cmp() and catches TypeErrors. Not really ideal either, but at least it wouldn't introduce any regression(?), so perhaps that would be enough for this ticket...

Any advice?

@jdemeyer
Copy link

jdemeyer commented Dec 6, 2016

comment:6

Replying to @mezzarobba:

Replying to @jdemeyer:

Do you plan to work on this? If yes, fill in your name as reviewer.

as author? owner?

Sorry, I meant as author.

@mezzarobba
Copy link
Member Author

comment:7

By the way, why did you remove the reference to the sage-devel thread?

@mezzarobba
Copy link
Member Author

Author: Marc Mezzarobba

@jdemeyer
Copy link

jdemeyer commented Dec 6, 2016

comment:8

Replying to @mezzarobba:

By the way, why did you remove the reference to the sage-devel thread?

Because I didn't understand how to interpret it. A hyperlink would be better.

@jdemeyer
Copy link

jdemeyer commented Dec 6, 2016

comment:9

Replying to @mezzarobba:

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices

Why does it rely on sorting vertices?

@mezzarobba
Copy link
Member Author

comment:10

Replying to @jdemeyer:

Replying to @mezzarobba:

By the way, why did you remove the reference to the sage-devel thread?

Because I didn't understand how to interpret it. A hyperlink would be better.

I personally prefer Message-Ids, as they don't depend on a particular archive...

@mezzarobba
Copy link
Member Author

comment:11

Replying to @jdemeyer:

Replying to @mezzarobba:

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices

Why does it rely on sorting vertices?

In sparse_graph and generic_graph, mostly. Some of the occurrences look like they are there for algorithmic purposes, others to make the output more readable, some both at the same time... I didn't look closely enough to say more.

@mezzarobba

This comment has been minimized.

@jdemeyer
Copy link

jdemeyer commented Dec 6, 2016

comment:13

Replying to @mezzarobba:

In sparse_graph and generic_graph, mostly. Some of the occurrences look like they are there for algorithmic purposes, others to make the output more readable, some both at the same time... I didn't look closely enough to say more.

It's also important to note the difference between old-style cmp() and new-style rich comparisons. Ideally, this branch would only affect the latter.

@mezzarobba
Copy link
Member Author

comment:14

Replying to @jdemeyer:

It's also important to note the difference between old-style cmp() and new-style rich comparisons. Ideally, this branch would only affect the latter.

I'm not sure I follow you: cmp() can end up calling Element.__richcmp__(), and that's perhaps the main reason why getting rid of the comparison by id there is not trivial.

@nbruin
Copy link
Contributor

nbruin commented Dec 6, 2016

comment:15

Replying to @mezzarobba:

My main concern right now is that sage.graphs heavily relies on sorting lists of vertices, while various other parts of Sage create graphs with vertices that mix Elements with other Python objects.

An option might be to try again with key=id when sort() fails, but (with no global understanding of the code, at least) there's a major risk of getting inconsistent orders with different subsets of vertices.

If you have no information on the objects you're sorting, I don't think there is anything that can save you. If your objects are guaranteed to be hashable you could try sorting on the hash, but that will leave your results undefined when hashes clash. In any case, if you rely on "first try what the object implements and then use a fallback if that fails", you're doomed.

In fact, if you allow your vertices to be labelled with arbitrary objects, you're already losing transitivity of equality in sage, via the standard

sage: a=GF(3)(1); b =31; c=GF(5)(1)
sage: a==b,b==c,a==c
(True, True, False)

It seems that labels get mangled anyway, though, so I think them not being sorted properly is only a minor concern by comparison (sorry):

sage: a=GF(3)(1); b =31; c=GF(5)(1)
sage: L=[[a,b,c],[a,c,b],[b,a,c],[b,c,a],[c,a,b],[c,b,a]]
sage: for l in L:
....:     G=Graph()
....:     for i in l:
....:         G.add_vertex(i)
....:     print [parent(j) for j in G.vertices()]
....:     
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]
[<type 'int'>, Integer Ring]

It seems to me the only sane approaches are to not rely on sorting (in python it seems being hashable is more ubiquitous than being sortable, so fast lookup is likely better accomplished via hash functions), or to declare that the labels are sortable (making failure a user error).

This stuff causes real problems: since the old python interface for __cmp__ methods required one to provide non-errors for "<" and ">" testing if one wants to implement "==", sage is riddled with apparent implementations of orderings that are in actuality inconsistent. This has cause problems in at least one case, where labels in published tables of elliptic curves were based on non-sensical but assumed-consistent ordering of number field elements:

https://groups.google.com/forum/#!topic/sage-nt/aP5LP09sADY

This has convinced me that the only sane approach is to deprecate "cmp" style interfaces as quickly as possible and instead rely on "richcmp", with errors for ordering if not obviously correct.

@mezzarobba
Copy link
Member Author

Branch: public/ticket/22029-richcmp_fallback

@mezzarobba
Copy link
Member Author

comment:16

Here is a first attempt. All (standard, long) tests should pass unless I made a last-minute mistake. See the commit messages for some comments. I'm not 100% happy with the changes, but I don't think I can do much better in a reasonable amount of time, feel free to commit improvements to the existing branch!


Last 10 new commits:

d45cd53CGraphBackend: fix a segfault when given a non-existing vertex
6e0e1b2Fix use of comparison in simplicial_complex
2925d0fBug in Poset.is_slender(?)
2b0bcc9Some benign doctest failures related to #22029
5f36421Ad-hoc comparison functions for graph vertices
6d728fa{c,sparse}_graph: systematically turn integer-like vertices into ints
fcda2c2graph_plot: sort by id
127a11dremove or fix some meaningless doctests for cmp()
29fe1f8misc fixes after changes to coercion_model.richcmp
749dfbemisc fixes after changes to coercion_model.richcmp

@mezzarobba
Copy link
Member Author

Commit: 749dfbe

@tscrim
Copy link
Collaborator

tscrim commented Dec 14, 2016

comment:17

I am a strong -1 on the removal of the != tests and the changes of foo != bar to (the fugly) not (foo == bar).

The failures you have in rigged_configurations.py is because you moved the definition of case_S. You have also made the cases in the code more difficult to follow. Please revert those changes.

@jdemeyer
Copy link

comment:18

This is bad:

        # (Note that, when y is not a Sage Element, we may have ended up here
        # from a call to cmp(). In this case, and in this case only, it may be
        # better to emulate what Python does and compare by type/id. But we
        # have no way(?) to distinguish this situation from the "normal" case
        # of a comparison operator.)

@mezzarobba
Copy link
Member Author

comment:19

Replying to @tscrim:

I am a strong -1 on the removal of the != tests and the changes of foo != bar to (the fugly) not (foo == bar).

Ugly, I agree, but safer in generic code (because of cases where equality is not decidable, like expressions, or where not enough information is available, like with intervals)... And code that might end up comparing objects of different type is generic code.

The failures you have in rigged_configurations.py is because you moved the definition of case_S. You have also made the cases in the code more difficult to follow. Please revert those changes.

I might do it, but I'd like first to understand in detail the alternative you are suggesting, and to see some evidence of a consensus that it is a better option.

In any case, please feel free to commit improvements yourself, and I'll do my best to review them.

@mezzarobba
Copy link
Member Author

comment:20

Replying to @jdemeyer:

This is bad:

        # (Note that, when y is not a Sage Element, we may have ended up here
        # from a call to cmp(). In this case, and in this case only, it may be
        # better to emulate what Python does and compare by type/id. But we
        # have no way(?) to distinguish this situation from the "normal" case
        # of a comparison operator.)

What do you mean? If I understand right, this is an issue that will solve itself if sage ever switches to Python3...

@mezzarobba
Copy link
Member Author

Changed dependencies from #27221 to #27241

@mezzarobba
Copy link
Member Author

Changed commit from ef01c33 to e311ab1

@mezzarobba
Copy link
Member Author

Changed branch from u/jdemeyer/22029-richcmp to u/mmezzarobba/22029-richcmp

@mezzarobba
Copy link
Member Author

New commits:

a6ceab9pynac-0.7.24 pkg/chksum
e311ab1Element richcmp: never use id()

@mezzarobba
Copy link
Member Author

comment:147

As for the actual content: I'm happy with this version, and I'm ready to give it positive review once the patchbot comes back green with the new pynac release. I'm still not 100% convinced that the way you are handling non-Elements and coercion failures is optimal, but even if not, it shouldn't be too hard to change later if this turns out to be an issue.

@mezzarobba
Copy link
Member Author

comment:148

The tests pass on my box.

@mezzarobba
Copy link
Member Author

Changed reviewer from Julian Rüth to Julian Rüth, Marc Mezzarobba

@jdemeyer
Copy link

comment:149

Replying to @mezzarobba:

I'm still not 100% convinced that the way you are handling non-Elements and coercion failures is optimal

What do you mean exactly? I'm open to suggestions for improvement.

What I want is that comparisons involving the coercion model behave just like comparisons in Python 3 should behave. This means: if the objects cannot be compared, raise an error instead of returning some arbitrary value.

@mezzarobba
Copy link
Member Author

comment:150

Replying to @jdemeyer:

Replying to @mezzarobba:

I'm still not 100% convinced that the way you are handling non-Elements and coercion failures is optimal

What do you mean exactly? I'm open to suggestions for improvement.

We had this discussion two years ago when I started this ticket, and frankly I don't remember all the details. But I think it revolved around the following points:

  • Regarding coercion failures, it might be better to raise an error (however inconvenient that would be!)

    • instead of returning False when two Elements with no common parent are compared using ==,
    • and, perhaps more importantly, instead of returning True when they are compared with !=—the additional point to consider in this case being that returning True would be “wrong” if we haven't proved that the elements are different, while many boolean predicates in Sage return False when they can not decide.
  • Regarding non-Elements:

    • The coercion framework treats some specific types essentially like Element. We need to be consistent with that, but still allow users to define new types that implement their own comparison logic with Sage elements.
    • richcmp() currently does that by calling tp_richcompare() on the non-Element, instead of returning NotImplemented. I'm not 100% comfortable with this choice. I assume that the goal to get the same behavior under Python 2 and 3, but I'm a bit confused here, because (unless I misunderstood you) you argued against that choice two years ago.

@jdemeyer
Copy link

comment:151

Replying to @mezzarobba:

  • Regarding coercion failures, it might be better to raise an error (however inconvenient that would be!)
    • instead of returning False when two Elements with no common parent are compared using ==

I would argue that, when there is no common parent, elements clearly cannot be equal. This is also consistent with Python 3 where uncomparable objects can still be compared by == and !=:

Python 3.6.3 (default, Mar 13 2018, 19:00:30) 
[GCC 6.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 == "x"
False
  • and, perhaps more importantly, instead of returning True when they are compared with !=—the additional point to consider in this case being that returning True would be “wrong” if we haven't proved that the elements are different

Again, if those elements don't coerce to a common parent, they cannot be equal.

  • Regarding non-Elements:
    • The coercion framework treats some specific types essentially like Element. We need to be consistent with that

As far as I know, we are consistent.

but still allow users to define new types that implement their own comparison logic with Sage elements.

  • richcmp() currently does that by calling tp_richcompare() on the non-Element, instead of returning NotImplemented.

Exactly.

I'm not 100% comfortable with this choice. I assume that the goal to get the same behavior under Python 2 and 3

Exactly. Returning NotImplemented causes different behavior on Python 2 and Python 3.

I'm a bit confused here, because (unless I misunderstood you) you argued against that choice two years ago.

I don't remember precisely. Anyway, I changed my mind and I'm convinced now that raising an error is the right thing to do.

@vbraun
Copy link
Member

vbraun commented Feb 11, 2019

Changed branch from u/mmezzarobba/22029-richcmp to e311ab1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants