v4.0 #57

wimglenn · 2019-08-06T21:45:53Z

This changes equality comparisons completely to make them more consistent with how all the other data structures in Python stdlib behave.

Comparisons with regular set and frozenset are order agnostic.
Comparisons with other OrderedSet (and other ordered-set-like-things) are order sensitive.
Previous equality comparison with list/deque/sequence was removed (this was unpythonic and had no precedent)
Comparisons with unrelated types return NotImplemented - this is allowing the other type a chance to potentially handle the operation (as opposed to returning False)
Major version number is bumped up because this is a backwards-incompat change.

The behaviour of equality comparison was modeled after the way collections.OrderedDict behaves (order agnostic vs regular dict, order sensitive vs other ordered dict).

…s being bitten by terrible bug in bpo issue8743

…arisons generally behave in Python

rspeer · 2019-08-07T18:36:04Z

I don't agree with breaking the ability to compare to lists. Clearly that's something that people use.

This implementation of OrderedSet (as opposed to the alternate implementation, an OrderedDict where the keys don't matter) turns out to be useful in list-like contexts, much like pandas.Index is. Maybe it should have been called UniqueList or something from the start.

For example, you can use the indexing operator on a list or an OrderedSet, and not on a set, a dict, or an OrderedDict.

I recognize that it's better to return NotImplemented when comparing to an unrelated type.

wimglenn · 2019-08-07T22:49:53Z

Is it really clear that people use that? How?

Different types in Python almost never compare equal, except in a handful of special cases (numeric types, subclasses). The developer must be explicit about it with constructs such as:

list(xs) == list(ys)
all(x==y for x,y in zip(xs, ys))

I was surprised to see the comparison special-cased with Sequences and deque in the source code, I think most Python developers would never have expected such a thing.

Be aware of the mathematical consequences as well, for example breaking transitivity of equality:

>>> OrderedSet([0,1]) == [0,1]
True
>>> OrderedSet([0,1]) == (0,1)
True
>>> [0,1] == (0,1)
False

rspeer · 2019-08-08T18:01:49Z

So I don't think that Python guarantees transitivity of equality, but the order-independent comparison with sets (which I don't have any indication that anyone has wanted) is definitely non-transitive. If it stays implemented:

OrderedSet([1, 2]) == {1, 2} == OrderedSet([2, 1])

The special case with deque, on the other hand, is because someone asked for it.

OrderedSets are sometimes used as a faster equivalent to pd.Index from Pandas, and Pandas also silently converts list into pd.Index in some contexts.

pd.Index implements equality across types, though it does it with NumPy shenanigans where == returns an array of comparisons instead of a bool, and you're expected to run .all() on the result. I don't want to emulate that NumPy gotcha, but regardless, look at how Indexes compare:

>>> pd.Index([1, 2, 3]) == (1, 2, 3)
array([ True,  True,  True])

>>> pd.Index([1, 2, 3]) == [1, 2, 3]
array([ True,  True,  True])

>>> pd.Index([1, 2, 3]) == (1, 2, 3)
array([ True,  True,  True])

>>> pd.Index([1, 2, 3]) == {1, 2, 3}
array([False, False, False])

>>> pd.Index([1, 2, 3]) == [2, 1, 3]
array([False, False,  True])

>>> pd.Index([1, 2, 3]) == OrderedSet([1, 2, 3])
array([ True,  True,  True])

wimglenn · 2019-08-08T19:10:58Z

The behaviour of array-likes (numpy and pandas broadcasting) is somewhat irrelevant in this context, and should not be taken into account for equality comparisons with stdlib containers which are returning a scalar.

If you want to keep that broadcasting behaviour for array-likes, there is nothing stopping you, but perhaps you should make sure to handle it from either side to avoid this bug-prone inconsistency:

>>> pd.Index([1, 2, 3]) == OrderedSet([1, 2, 3])
array([ True,  True,  True])
>>> OrderedSet([1, 2, 3]) == pd.Index([1, 2, 3])
True

As for OrderedSet([1, 2]) == {1, 2} == OrderedSet([2, 1]), the behaviour has precedent with OrderedDict (subclass), which is also transitivity-breaking. But I'm not aware of any equivalent example using three unrelated types in Python 3. I was using OrderedSet as a dependency, and hit some deal-breaking bugs of which the library's __eq__ implementation was the direct cause.
As a general rule, unrelated types just don't compare equal in Python. If it was, as you mentioned, called a UniqueList then I might expect it to compare equal with a list - however that would also come with other expectations (can store non-hashable items, membership checks are O(n), etc) and it may no longer be sensible to base off of Raymond Hettinger's set-like recipe in the first place.

Anyway, maybe neither of us can convince the other - I'm just putting forth my opinion and the arguments supporting it, it's of course up to you how you want to proceed with the library! I'm happy to fork and release the proposed changes under a different name if necessary.

rspeer · 2019-08-08T19:25:58Z

I don't want to implement NumPy's gotcha about equality, or depend on NumPy in any way, so the comparison between pd.Index and OrderedSet is necessarily non-commutative. No matter what decisions we chose about equality in OrderedSet, it could not return an np.array, so it can't be commutative.

Given that the bug you reported that introduced this is also about a non-commutative comparison, I'm thinking that probably the best thing to do is leave it alone. When I think about it, changing the value of a tested comparison from True to False requires not just a major version bump, but also a period of warnings and deprecations, and I think it's just not worth it.

Feel free to fork.

wimglenn added 3 commits August 6, 2019 16:37

Bump up the minimum required Python version metadata, to prevent user…

13e2b15

…s being bitten by terrible bug in bpo issue8743

Rewrote OrderedSet.__eq__ to be more consistent with how other comp…

56f9ee0

…arisons generally behave in Python

Updated the test suite for the changed behaviour of OrderedSet.__eq__

cfbac60

wimglenn mentioned this pull request Aug 6, 2019

Equality comparisons with regular set don't commute #55

Closed

rspeer closed this Aug 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v4.0 #57

v4.0 #57

wimglenn commented Aug 6, 2019 •

edited

rspeer commented Aug 7, 2019

wimglenn commented Aug 7, 2019 •

edited

rspeer commented Aug 8, 2019

wimglenn commented Aug 8, 2019 •

edited

rspeer commented Aug 8, 2019

v4.0 #57

v4.0 #57

Conversation

wimglenn commented Aug 6, 2019 • edited

rspeer commented Aug 7, 2019

wimglenn commented Aug 7, 2019 • edited

rspeer commented Aug 8, 2019

wimglenn commented Aug 8, 2019 • edited

rspeer commented Aug 8, 2019

wimglenn commented Aug 6, 2019 •

edited

wimglenn commented Aug 7, 2019 •

edited

wimglenn commented Aug 8, 2019 •

edited