v4.0 #57
…s being bitten by terrible bug in bpo issue8743
…arisons generally behave in Python
I don't agree with breaking the ability to compare to lists. Clearly that's something that people use. This implementation of OrderedSet (as opposed to the alternate implementation, an OrderedDict where the keys don't matter) turns out to be useful in list-like contexts, much like pandas.Index is. Maybe it should have been called UniqueList or something from the start. For example, you can use the indexing operator on a list or an OrderedSet, and not on a set, a dict, or an OrderedDict. I recognize that it's better to return NotImplemented when comparing to an unrelated type.
Is it really clear that people use that? How? Different types in Python almost never compare equal, except in a handful of special cases (numeric types, subclasses). The developer must be explicit about it with constructs such as:
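As an illustration using only built-in types (a sketch, not the snippet from the original comment), containers of different types stay unequal even with identical contents, numeric types are one of the special cases, and an explicit conversion states the intent:

```python
# Containers of different types are unequal even with identical contents.
list_vs_tuple = [1, 2, 3] == (1, 2, 3)

# Numeric types are one of the few cross-type special cases.
int_vs_float = 1 == 1.0

# Being explicit: convert first, then compare like with like.
explicit = list((1, 2, 3)) == [1, 2, 3]

print(list_vs_tuple, int_vs_float, explicit)
```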
I was surprised to see the comparison special-cased with Sequences and deque in the source code, I think most Python developers would never have expected such a thing. Be aware of the mathematical consequences as well, for example breaking transitivity of equality:
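A minimal sketch of how special-casing sequences can break transitivity. `ListLikeSet` is a hypothetical stand-in written for this illustration, not the library's actual OrderedSet; it only assumes that equality with any Sequence is order-sensitive on the contents:

```python
from collections.abc import Sequence


class ListLikeSet:
    """Hypothetical stand-in that compares equal to any matching Sequence."""

    def __init__(self, items):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        self._items = list(dict.fromkeys(items))

    def __eq__(self, other):
        if isinstance(other, ListLikeSet):
            return self._items == other._items
        if isinstance(other, Sequence):
            # Assumed special case: any sequence with the same items in order.
            return self._items == list(other)
        return NotImplemented


x = [1, 2]
y = ListLikeSet([1, 2])
z = (1, 2)

x_eq_y = x == y  # list defers to ListLikeSet.__eq__ via the reflected call
y_eq_z = y == z
x_eq_z = x == z  # a list never equals a tuple

print(x_eq_y, y_eq_z, x_eq_z)
```

Here `x == y` and `y == z` both hold while `x == z` does not, which is the transitivity break being described.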
So I don't think that Python guarantees transitivity of equality, but the order-independent comparison with sets (which I don't have any indication that anyone has wanted) is definitely non-transitive. If it stays implemented:
The special case with deque, on the other hand, is because someone asked for it. OrderedSets are sometimes used as a faster equivalent to
The behaviour of array-likes (numpy and pandas broadcasting) is somewhat irrelevant in this context, and should not be taken into account for equality comparisons with stdlib containers which are returning a scalar. If you want to keep that broadcasting behaviour for array-likes, there is nothing stopping you, but perhaps you should make sure to handle it from either side to avoid this bug-prone inconsistency:
As for Anyway, maybe neither of us can convince the other - I'm just putting forth my opinion and the arguments supporting it, it's of course up to you how you want to proceed with the library! I'm happy to fork and release the proposed changes under a different name if necessary.
I don't want to implement NumPy's gotcha about equality, or depend on NumPy in any way, so the comparison between pd.Index and OrderedSet is necessarily non-commutative. No matter what decisions we made about equality in OrderedSet, it could not return an Given that the bug you reported that introduced this is also about a non-commutative comparison, I'm thinking that probably the best thing to do is leave it alone. When I think about it, changing the value of a tested comparison from Feel free to fork.
This changes equality comparisons completely to make them more consistent with how all the other data structures in Python stdlib behave.

- Comparisons with set and frozenset are order agnostic.
- Comparisons with OrderedSet (and other ordered-set-like-things) are order sensitive.
- Comparisons with unrelated types return NotImplemented - this is allowing the other type a chance to potentially handle the operation (as opposed to returning False).

The behaviour of equality comparison was modeled after the way collections.OrderedDict behaves (order agnostic vs regular dict, order sensitive vs other ordered dict).
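The OrderedDict behaviour referred to above can be checked directly with the standard library. The class that follows, `SketchOrderedSet`, is a hypothetical illustration of the described rules and not the code in this pull request:

```python
from collections import OrderedDict

od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])
plain = {"b": 2, "a": 1}

ordered_vs_ordered = od1 == od2  # order sensitive between two OrderedDicts
ordered_vs_dict = od1 == plain   # order agnostic against a regular dict


class SketchOrderedSet:
    """Hypothetical sketch of the equality rules listed in the description."""

    def __init__(self, items=()):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        self._items = list(dict.fromkeys(items))

    def __eq__(self, other):
        if isinstance(other, SketchOrderedSet):
            return self._items == other._items    # order sensitive
        if isinstance(other, (set, frozenset)):
            return set(self._items) == other      # order agnostic
        return NotImplemented                     # let the other type decide


s = SketchOrderedSet([1, 2, 3])
same_order = s == SketchOrderedSet([1, 2, 3])
other_order = s == SketchOrderedSet([3, 2, 1])
vs_set = s == {3, 2, 1}
vs_list = s == [1, 2, 3]  # both sides return NotImplemented, so Python falls back to identity

print(ordered_vs_ordered, ordered_vs_dict, same_order, other_order, vs_set, vs_list)
```

Because both operands decline the list comparison, Python falls back to an identity check and the result is False without the class ever returning False itself.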