Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can SortedDict add support to quack like a Sequence when desired? #59

Closed
jab opened this issue Dec 7, 2016 · 10 comments
Closed

can SortedDict add support to quack like a Sequence when desired? #59

jab opened this issue Dec 7, 2016 · 10 comments

Comments

@jab
Copy link

jab commented Dec 7, 2016

(I'm guessing you've thought about this already and decided against, but just wanted to make sure.)

The Python glossary defines sequence as:

An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.

The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().

SortedDicts have a well-defined ordering, and offer efficient element access using integer indices. So ideally a SortedDict would be able to be used anywhere a Sequence was required.

But because of the overloaded semantics of __getitem__, SortedDict of course has to choose to use it for the Mapping-style implementation rather than the Sequence-style, and thus probably gives up any right to claim that it implements Sequence. Seems like a shame!

(Can't help but wonder if some fancier version of register could allow SortedDict to specify iloc as its __getitem__ implementation when being treated as a Sequence, and wonder whether e.g. Haskell or Clojure would never have a problem like this.)

In any case, we're out of luck right? The Keys-, Values-, and ItemsViews that SortedDict offers are as much Sequence support as SortedDict can provide?

Curious to get your thoughts. And wonder if it's worth adding any takeaways from this to the SortedDict docs. e.g. "Because of limitations in Python, SortedDict cannot implement the Sequence interface even though it took all the right vitamins. Instead you must use a SortedDict's KeysView, ValuesView, or ItemsView anywhere you'd want to use a SortedDict as a Sequence."

@jab jab changed the title should SortedDict be registered as a Sequence? can SortedDict add support to quack like a Sequence when desired? Dec 7, 2016
@grantjenks
Copy link
Owner

I don't think it's a Python limitation per se. Certainly it is a result of the overloaded meaning of __getitem__ but I don't think that's much bother. As you've observed, KeysView implements Sequence and I think is documented as so. Some folks have bristled at that which lead me to consider a SequenceView-type. Perhaps in the same way you call ".keys()" you could call ".sequence()" and get the SequenceView. Maybe a better solution is to qualify the KeysView as a SortedKeysView and then the Set and Sequence distinctions make more sense.

Now that I think about it, I guess .iloc is really only there for __delitem__. Enough people are confused by .iloc that I have actually wanted to get rid of it. That one part acts like a MutableSequence though without __setitem__ and insert counterparts which wouldn't make much sense. I drew inspiration from Pandas DataFrames initially. DataFrames with an index allow slicing and deletion by numeric index. Maybe it would've been better to put the __delitem__ method on the KeysView but then the view is not read-only.

Can you describe your SortedDict-as-Sequence use case in more detail?

@jab
Copy link
Author

jab commented Dec 8, 2016

Interesting to hear more of the backstory, thanks for explaining.

Maybe a better solution is to qualify the KeysView as a SortedKeysView and then the Set and Sequence distinctions make more sense.

Sounds good to me, based on what I just read. (Though feel free to take that with a grain of salt until I've had more time to explore the options in the context of real-world use cases.)

Can you describe your SortedDict-as-Sequence use case in more detail?

Not a typical SortedDict use case (so feel free to take this with a grain of salt too, but fwiw): While looking into #57 I initially wondered whether I could infer whether a user-created bidict is ordered or not simply by checking isinstance(_fwd_class, Sequence). (To create a sortedbidict, a user can set _fwd_class = _inv_class = SortedDict. But since SortedDict isn't a Sequence, I'd need some other way to check if a bidict is ordered.)

However, after noticing the "efficient element access using integer indices" part of the Sequence definition, I realized "Sequence" and "Iterable with a well-defined ordering" are not the same thing (otherwise e.g. a linked list would be considered a sequence). I wonder if it's strange that collections.abc doesn't provide a type that signifies "well-defined ordering" without necessarily providing efficient random access.

In any case, I ended up just creating an OrderedBidirectionalMapping base class (which a user implementing a sortedbidict can inherit from), and checking whether a bidict is an instance of that where I needed to.

Curious to hear your thoughts as always.

@jab
Copy link
Author

jab commented Dec 8, 2016

After sleeping on it, I'm really wondering why collections.abc doesn't provide some OrderedMapping interface. That would allow disparate implementations like collections.OrderedDict, bidict.OrderedBidict, sortedcontainers.SortedDict, etc. to all interoperate around the common interface. E.g. A polymorphic __eq__ implementation could check other for implementing OrderedMapping to determine whether to compare items order-sensitively or not. Currently OrderedDict.__eq__ compares order-sensitively only if other is another OrderedDict, but using OrderedMapping would be more polymorphic and allow greater interoperability.

Given that this is missing from the standard library, would it be worth it if I published a library to PyPI which provided OrderedMapping, registered OrderedDict as implementing it (and optionally overrode its __eq__ implementation with a polymorphic one)? Would you be interested in having your libraries optionally depend on this, and if installed, your OrderedMappings could register themselves as such (and use the provided __eq__ implementation, falling back to their own if not installed)? I could do the same with bidict, and then users who needed interop between our different OrderedMapping implementations could just install the optional OrderedMapping dependency to get it. What do you think?

/cc @raymondh for your peripheral vision (though if you think OrderedMapping might have a place in collections.abc, happy to try to submit a patch or move discussion to python-ideas first).

@jab
Copy link
Author

jab commented Dec 8, 2016

I sketched out what this could look like in jab/bidict@cc041d6 (search for "OrderedMapping"). If there is interest, I can factor that out of bidict into a standalone module that I can publish to PyPI, and ideally even a patch to collections.abc.

@jab
Copy link
Author

jab commented Dec 8, 2016

I wonder if it's strange that collections.abc doesn't provide a type that signifies "well-defined ordering" without necessarily providing efficient random access.

Looks like I wasn't the only one: Just noticed collections.abc.Reversible landed in 3.6 and, in promising reversibility, implies well-defined order. Would be nice to have a class that's just Ordered without being Reversible though, for things like singly-linked lists and infinite lists.

Regarding collections.abc.OrderedMapping, I just sketched out a patch and filed https://bugs.python.org/issue28912 to see if it could make sense to add to the standard library.

@grantjenks
Copy link
Owner

Part of the issue is that Ordered is often confused with Sorted. Many people think collections.OrderedDict is a SortedDict until the keys get out of order. The other impediment to a PyPI package with extra ABC types is that I like to avoid dependencies. But if it were in the standard library then I think I would use it.

There are a couple relevant discussions on python-ideas:

  1. https://mail.python.org/pipermail/python-ideas/2015-November/037132.html
  2. https://mail.python.org/pipermail/python-ideas/2015-December/037540.html

Looks like your OrderedBidict should simply inherit MutableMapping and Reversible. Reversible was meant for that.

Personally I would consider Sorted separately and expect methods like bisect, etc. But for me the collections ABCs are more useful as mixins to quickly prototype new types. Using the type system to indicate traits like ordered, sorted, etc. is less appealing.

@jab
Copy link
Author

jab commented Dec 9, 2016

Thanks for linking me to that thread! Just caught up.

I'm optimistic that people who don't know the difference between "sorted" and "ordered" could learn the moment they take an actual look at the descriptions on the tin.

Also, since Reversible implements __subclasshook__, issubclass(orderedbidict, Reversibile) is already True simply because orderedbidict implements __reverse__, without having to do any extra work. __subclasshook__ ftw!

I also realized I initially was conflating ordered-ness with reversibility (which it looks like Serhiy (who wrote the second message you linked to) and others have done too). But reversibility is a stronger claim than ordered-ness, and so I think ordered-ness deserves representation in its own right. Do you think something like the following could prove a useful mixin?

class Ordered(Iterable):
    def __eq__(self, other):
        if not isinstance(other, Ordered):
            return NotImplemented
        return all(i == j for (i, j) in zip(self, other))

I get not being as excited by using the type system to indicate traits. I'm just afraid there's nothing else we can use to interrogate this particular trait of some object, so we can respond appropriately, since the fact that a Collection is ordered does not imply implementing any new methods that are not already covered by some weaker interface like Iterable.

Cool you'd be down to use OrderedMapping if it made it into the standard library. Keeping my eye on https://bugs.python.org/issue28912 :)

@grantjenks
Copy link
Owner

As I ponder it, I don't think it's that useful. But I have wondered before why Sequence does not define __eq__ and this could possibly fill that gap. But it is generally easy to write myself.

It seems odd that "Ordered" (from a naming perspective) provides the ability to equate objects. I would expect it to be "Equatable" in a traits-fashion. And notions of equality are quite overloaded between sequences, mappings, and sets. I have seen a few of these conversations stall-out because common ground is hard to find.

In other languages there's both a traits and type system. I think I'd rather see that separation of concerns rather than further overloading the meaning of types. Traits could have a similar interface to ABCs in the use of "register" to indicate participation. I think if you surveyed the standard/popular libraries then you could find a number of places where types are being used as traits. That would be an interesting survey/proposal to me. With the current push toward typing, I don't know how traits would be received. You might ask some of the mypy team. Jukka, et al. have probably already thought deeply about this.

@jab
Copy link
Author

jab commented Dec 19, 2016

Very interesting! Makes sense. I spent a minute looking for any existing recent discussion on traits vs. types in Python, and didn't find anything. /cc @JukkaL et al., in case this is of interest.

@JukkaL
Copy link

JukkaL commented Dec 20, 2016

I assume here that traits would be inherited only for implementations of methods and/or attributes, and types would be used for static type checking (or generally in type annotations). These terms are a little vague and can mean slightly different things in different contexts. If you meant something else, the rest of this comment will likely not make much sense.

PEP 484 doesn't make a distinction between types and traits since we follow a simple rule: "every class is a type". There was no clear reason for making traits a separate concept essentially for classes that aren't types. For example, int is both a type and a class than can be subclassed, and so is Sequence, etc.

We've discussed something that we've been calling "pure mixins" which are related to traits (python/typing#246). Protocols (python/typing#11) are also a related concept.

Early in the development of mypy I considered using Scala-like traits but I later gave up on this idea. Scala traits are also types, so they are arguably more like Python ABCs than implementation-only traits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants