TypeError when adding two incomparable items with same key to SortedListWithKey #7

Muon · 2014-06-26T07:33:26Z

On Python 3.4.1:

>>> l = SortedListWithKey(key=lambda x: 1)
>>> class A: pass
... 
>>> l.add(A())
>>> l.add(A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/muon/kirmvenv2/lib/python3.4/site-packages/sortedcontainers/sortedlistwithkey.py", line 21, in add
    self._list.add(pair)
  File "/home/muon/kirmvenv2/lib/python3.4/site-packages/sortedcontainers/sortedlist.py", line 51, in add
    pos = bisect_right(self._maxes, val)
TypeError: unorderable types: A() < A()

This is because in Python 3 comparing incomparable types dies horribly instead of returning True.

The text was updated successfully, but these errors were encountered:

Muon · 2014-06-26T08:03:00Z

I have a disgustingly wonderful "fix"! Tack on a fresh float("nan") after the key. All comparisons with NaNs, including between NaNs, return False.

>>> (1, float("nan"), A()) < (1, float("nan"), A())
False
>>> (0, float("nan"), A()) < (1, float("nan"), A())
True

Freshness is for some reason important, since:

>>> nan = float("nan")
>>> (1, nan, A()) < (1, nan, A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: A() < A()

Wrapping the key in a tuple like this also works:

>>> ((1, float("nan")), A()) < ((1, float("nan")), A())
False
>>> ((0, float("nan")), A()) < ((1, float("nan")), A())
True

Reported this to CPython at http://bugs.python.org/issue21873.

That does lead to an actual hackish solution, which is to make a

class Dummy:
    def __lt__(self, other):
        return True

And use a single instance of that (can be shared) as comparison fodder. EDIT: Nope, can't be shared. Identity must be screwing it up in CPython at least.

Muon · 2014-06-29T09:35:00Z

Ignore the previous "solution" comments. Who decided we absolutely definitely had to use tuples, no exceptions? 😅

Made a pull request with something a lot saner.

grantjenks · 2014-06-30T04:18:36Z

I had originally accepted the pull request but later undid it when it failed this test case:

>>> lk = sortedcontainers.SortedListWithKey(xrange(10000), lambda val: val % 10)
>>> 1234 in lk
False

grantjenks · 2014-06-30T23:27:03Z

I believe this works but I don't like it much. This inserts items as a threeple:
(key, order, value)
The order is a monotonically increasing integer. It breaks ties on keys and guarantees that values are never compared for "less than". I also added a function _iter which quickly iterates the underlying list for all keys matching a given value.

Benefits:

Never compares values.

Drawbacks:

Inserting into the list is difficult because you need a way to expose the order parameter and you may need some way to re-order items. This is an uncommon scenario but an important limitation to understand.
The implementation is about 10 times slower for functions like contains_. This feels like a major drawback.

I noticed that blist does something similar to the current implementation (using tuples) and relies on valid comparison operators for the value type. I believe it would likewise fail for your test case.

Ultimately I think it depends on what happens in practice. My guess is that having value types which you can't compare is rare but I'd love to know more about your scenario.

There may be a way to speed this up if a hash-table of value: order were maintained but that would require the values are hashable and I haven't tried to write the code.

Here's the code:

# -*- coding: utf-8 -*-
#
# Sorted list with key implementation.

from .sortedlist import SortedList
from collections import MutableSequence
from itertools import chain, count

class SortedListWithKey(MutableSequence):
    def __init__(self, iterable=None, key=lambda val: val, load=100):
        self._key = key
        self._list = SortedList(load=load)
        self._order = count(1)

        if iterable is not None:
            self.update(iterable)

    def clear(self):
        self._order = count(1)
        self._list.clear()

    def add(self, value):
        pair = (self._key(value), next(self._order), value)
        self._list.add(pair)

    def update(self, iterable):
        _key, _order = self._key, self._order
        self._list.update((_key(val), next(_order), val) for val in iterable)

    def _iter(self, value):
        _list = self._list
        key = self._key(value)

        low = (key, 0, value)
        high = (key, next(self._order), value)

        start = _list.bisect_left(low)
        end = _list.bisect_right(high)

        yield start

        if start == end:
            return

        start_pos, start_idx = _list._pos(start)
        end_pos, end_idx = _list._pos(end - 1)

        _lists = _list._lists
        segments = (_lists[pos] for pos in range(start_pos, end_pos + 1))
        iterator = chain.from_iterable(segments)

        # Advance the iterator to the start of the items.

        for rpt in range(start_idx):
            next(iterator)

        for rpt in range(end - start + 1):
            yield next(iterator)

    def __contains__(self, value):
        iterator = self._iter(value)

        next(iterator)

        for key, order, val in iterator:
            if value == val:
                return True

        return False

    def discard(self, value):
        iterator = self._iter(value)
        start = next(iterator)
        for offset, val in enumerate(iterator):
            if value == val[2]:
                del self._list[start + offset]
                return

    def remove(self, value):
        iterator = self._iter(value)
        start = next(iterator)
        for offset, val in enumerate(iterator):
            if value == val[2]:
                del self._list[start + offset]
                return
        raise ValueError

    def __delitem__(self, index):
        del self._list[index]

    def __getitem__(self, index):
        if isinstance(index, slice):
            return list(tup[2] for tup in self._list[index])
        else:
            return self._list[index][2]

    def __setitem__(self, index, value):
        _key, _order = self._key, self._order
        if isinstance(index, slice):
            self._list[index] = list((_key(val), next(_order), val) for val in value)
        else:
            self._list[index] = (_key(value), next(_order), value)

    def __iter__(self):
        return iter(tup[2] for tup in iter(self._list))

    def __reversed__(self):
        return iter(tup[2] for tup in reversed(self._list))

    def __len__(self):
        return len(self._list)

    def bisect_left(self, value):
        pair = (self._key(value), 0, value)
        return self._list.bisect_left(pair)

    def bisect(self, value):
        pair = (self._key(value), 0, value)
        return self._list.bisect(pair)

    def bisect_right(self, value):
        pair = (self._key(value), next(self._order), value)
        return self._list.bisect_right(pair)

    def count(self, value):
        iterator = self._iter(value)
        next(iterator)
        return sum(1 for val in iterator if val[2] == value)

    def append(self, value):
        pair = (self._key(value), next(self._order), value)
        self._list.append(pair)

    def extend(self, iterable):
        _key, _order = self._key, self._order
        self._list.extend((_key(val), next(_order), val) for val in iterable)

    def insert(self, index, value):
        pair = (self._key(value), next(self._order), value)
        self._list.insert(index, pair)

    def pop(self, index=-1):
        return self._list.pop(index)[2]

    def index(self, value, start=None, stop=None):
        _len = self._list._len

        if start == None:
            start = 0
        if start < 0:
            start += _len
        if start < 0:
            start = 0

        if stop == None:
            stop = _len
        if stop < 0:
            stop += _len
        if stop > _len:
            stop = _len

        if stop <= start:
            raise ValueError

        iterator = self._iter(value)
        begin = next(iterator)

        for offset, val in enumerate(iterator):
            if value == val[2] and start <= (begin + offset) < stop:
                return begin + offset

        raise ValueError

    def as_list(self):
        return list(tup[2] for tup in self._list.as_list())

    def __add__(self, that):
        result = SortedListWithKey(key=self._key)
        values = list(self)
        values.extend(that)
        result.update(values)
        return result

    def __iadd__(self, that):
        self.update(that)
        return self

    def __mul__(self, that):
        values = self.as_list() * that
        return SortedListWithKey(values, key=self._key)

    def __imul__(self, that):
        values = self.as_list() * that
        self.clear()
        self.update(values)
        return self

    def __eq__(self, that):
        return ((len(self) == len(that))
                and all(lhs == rhs for lhs, rhs in zip(self, that)))

    def __ne__(self, that):
        return ((len(self) != len(that))
                or any(lhs != rhs for lhs, rhs in zip(self, that)))

    def __lt__(self, that):
        return ((len(self) <= len(that))
                and all(lhs < rhs for lhs, rhs in zip(self, that)))

    def __le__(self, that):
        return ((len(self) <= len(that))
                and all(lhs <= rhs for lhs, rhs in zip(self, that)))

    def __gt__(self, that):
        return ((len(self) >= len(that))
                and all(lhs > rhs for lhs, rhs in zip(self, that)))

    def __ge__(self, that):
        return ((len(self) >= len(that))
                and all(lhs >= rhs for lhs, rhs in zip(self, that)))

    def __repr__(self):
        return 'SortedListWithKey({0}, key={1})'.format(
            repr(self.as_list()), repr(self._key))

    def _check(self):
        self._list._check()

        # Validate the keys.

        for key, order, value in self._list:
            assert self._key(value) == key

        # Validate the orders.

        if len(self) > 0:
            iterator = iter(self._list)
            prev_key, prev_order, value = next(iterator)
            for key, order, value in iterator:
                if prev_key == key:
                    assert prev_order < order
                prev_key = key
                prev_order = order

Muon · 2014-07-01T01:35:01Z

No, I was using blist before (and it worked perfectly), but I wanted to move to something in pure Python, since compiling things is kinda horrible for Windows users. You are correct that blist stores tuples, but note that it uses _sortedbase._i2key() for its comparisons. It has a custom bisect_left() and bisect_right(). Also note that _sortedbase.__contains__() uses _sortedbase._advance().

My use case is: I have a sorted, prioritized list of "actions" which need to be done periodically. They're ordered by rank, but there may be ties.

grantjenks · 2014-07-01T19:24:59Z

Ok, I think I've got something working. Can you try the master branch at 29958d5. I've added a boolean parameter, "value_orderable" which you should set to False for your purposes. When False, the key variation promises not to compare values.

It works like blist in the sense that iteration is required over duplicate keys. And it works a little like your version in that I created a new type, Pair, for insertion. Unlike your pull request though, a Pair only compares keys and it plays nicely with tuple.getitem semantics.

Performance seems reasonable although value_orderable=False takes a pretty big hit. I'd estimate somewhere around 2-10x slower than the w/out key version. The _iter function could be made faster depending on your data. I'm open to other ideas on how to implement it. Fortunately the SortedList type is really fast.

Let me know if that works for you and if so, I'll release it in v0.8.3.

Muon · 2014-07-03T06:07:00Z

Seems to work! Equal key iteration performance is not important for me yet, but how do you think _iter could be made faster? I'm not really familiar with the workings of SortedList itself.

Muon · 2014-07-07T11:47:02Z

@grantjenks When do you plan to unleash the glory of v0.8.3? My setup.py itches.

grantjenks · 2014-07-07T18:13:32Z

Did it this morning. Thanks for the reminder.

v0.8.4 will have a faster implementation so you may want to upgrade again in the near future. But at least you can switch to using PyPI for now.

Thanks for using the module. Let me know if you catch any more bugs.

grantjenks · 2014-07-29T19:18:46Z

v0.8.4 was deployed to PyPI today. It contains a faster implementation for SortedListWithKey in some scenarios and more robust testing. Please upgrade and let me know if it's working for you. Thanks!

Muon mentioned this issue Jun 29, 2014

Fix SortedListWithKey w/ incomparable values on Python 3 #8

Merged

grantjenks closed this as completed in #8 Jun 29, 2014

grantjenks reopened this Jun 30, 2014

grantjenks changed the title ~~TypeError when adding two incomparable items of same rank to SortedListWithKey~~ TypeError when adding two incomparable items with same key to SortedListWithKey Jun 30, 2014

grantjenks closed this as completed Jul 7, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TypeError when adding two incomparable items with same key to SortedListWithKey #7

TypeError when adding two incomparable items with same key to SortedListWithKey #7

Muon commented Jun 26, 2014

Muon commented Jun 26, 2014

Muon commented Jun 29, 2014

grantjenks commented Jun 30, 2014

grantjenks commented Jun 30, 2014

Muon commented Jul 1, 2014

grantjenks commented Jul 1, 2014

Muon commented Jul 3, 2014

Muon commented Jul 7, 2014

grantjenks commented Jul 7, 2014

grantjenks commented Jul 29, 2014

TypeError when adding two incomparable items with same key to SortedListWithKey #7

TypeError when adding two incomparable items with same key to SortedListWithKey #7

Comments

Muon commented Jun 26, 2014

Muon commented Jun 26, 2014

Muon commented Jun 29, 2014

grantjenks commented Jun 30, 2014

grantjenks commented Jun 30, 2014

Muon commented Jul 1, 2014

grantjenks commented Jul 1, 2014

Muon commented Jul 3, 2014

Muon commented Jul 7, 2014

grantjenks commented Jul 7, 2014

grantjenks commented Jul 29, 2014