Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeError when adding two incomparable items with same key to SortedListWithKey #7

Closed
Muon opened this issue Jun 26, 2014 · 10 comments · Fixed by #8
Closed

TypeError when adding two incomparable items with same key to SortedListWithKey #7

Muon opened this issue Jun 26, 2014 · 10 comments · Fixed by #8

Comments

@Muon
Copy link
Contributor

Muon commented Jun 26, 2014

On Python 3.4.1:

>>> l = SortedListWithKey(key=lambda x: 1)
>>> class A: pass
... 
>>> l.add(A())
>>> l.add(A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/muon/kirmvenv2/lib/python3.4/site-packages/sortedcontainers/sortedlistwithkey.py", line 21, in add
    self._list.add(pair)
  File "/home/muon/kirmvenv2/lib/python3.4/site-packages/sortedcontainers/sortedlist.py", line 51, in add
    pos = bisect_right(self._maxes, val)
TypeError: unorderable types: A() < A()

This is because in Python 3 comparing incomparable types dies horribly instead of returning True.

@Muon
Copy link
Contributor Author

Muon commented Jun 26, 2014

I have a disgustingly wonderful "fix"! Tack on a fresh float("nan") after the key. All comparisons with NaNs, including between NaNs, return False.

>>> (1, float("nan"), A()) < (1, float("nan"), A())
False
>>> (0, float("nan"), A()) < (1, float("nan"), A())
True

Freshness is for some reason important, since:

>>> nan = float("nan")
>>> (1, nan, A()) < (1, nan, A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: A() < A()

Wrapping the key in a tuple like this also works:

>>> ((1, float("nan")), A()) < ((1, float("nan")), A())
False
>>> ((0, float("nan")), A()) < ((1, float("nan")), A())
True

Reported this to CPython at http://bugs.python.org/issue21873.

That does lead to an actual hackish solution, which is to make a

class Dummy:
    def __lt__(self, other):
        return True

And use a single instance of that (can be shared) as comparison fodder. EDIT: Nope, can't be shared. Identity must be screwing it up in CPython at least.

@Muon
Copy link
Contributor Author

Muon commented Jun 29, 2014

Ignore the previous "solution" comments. Who decided we absolutely definitely had to use tuples, no exceptions? 😅

Made a pull request with something a lot saner.

@grantjenks grantjenks reopened this Jun 30, 2014
@grantjenks grantjenks changed the title TypeError when adding two incomparable items of same rank to SortedListWithKey TypeError when adding two incomparable items with same key to SortedListWithKey Jun 30, 2014
@grantjenks
Copy link
Owner

I had originally accepted the pull request but later undid it when it failed this test case:

>>> lk = sortedcontainers.SortedListWithKey(xrange(10000), lambda val: val % 10)
>>> 1234 in lk
False

@grantjenks
Copy link
Owner

I believe this works but I don't like it much. This inserts items as a threeple:
(key, order, value)
The order is a monotonically increasing integer. It breaks ties on keys and guarantees that values are never compared for "less than". I also added a function _iter which quickly iterates the underlying list for all keys matching a given value.

Benefits:

  1. Never compares values.

Drawbacks:

  1. Inserting into the list is difficult because you need a way to expose the order parameter and you may need some way to re-order items. This is an uncommon scenario but an important limitation to understand.
  2. The implementation is about 10 times slower for functions like contains_. This feels like a major drawback.

I noticed that blist does something similar to the current implementation (using tuples) and relies on valid comparison operators for the value type. I believe it would likewise fail for your test case.

Ultimately I think it depends on what happens in practice. My guess is that having value types which you can't compare is rare but I'd love to know more about your scenario.

There may be a way to speed this up if a hash-table of value: order were maintained but that would require the values are hashable and I haven't tried to write the code.

Here's the code:

# -*- coding: utf-8 -*-
#
# Sorted list with key implementation.

from .sortedlist import SortedList
from collections import MutableSequence
from itertools import chain, count

class SortedListWithKey(MutableSequence):
    def __init__(self, iterable=None, key=lambda val: val, load=100):
        self._key = key
        self._list = SortedList(load=load)
        self._order = count(1)

        if iterable is not None:
            self.update(iterable)

    def clear(self):
        self._order = count(1)
        self._list.clear()

    def add(self, value):
        pair = (self._key(value), next(self._order), value)
        self._list.add(pair)

    def update(self, iterable):
        _key, _order = self._key, self._order
        self._list.update((_key(val), next(_order), val) for val in iterable)

    def _iter(self, value):
        _list = self._list
        key = self._key(value)

        low = (key, 0, value)
        high = (key, next(self._order), value)

        start = _list.bisect_left(low)
        end = _list.bisect_right(high)

        yield start

        if start == end:
            return

        start_pos, start_idx = _list._pos(start)
        end_pos, end_idx = _list._pos(end - 1)

        _lists = _list._lists
        segments = (_lists[pos] for pos in range(start_pos, end_pos + 1))
        iterator = chain.from_iterable(segments)

        # Advance the iterator to the start of the items.

        for rpt in range(start_idx):
            next(iterator)

        for rpt in range(end - start + 1):
            yield next(iterator)

    def __contains__(self, value):
        iterator = self._iter(value)

        next(iterator)

        for key, order, val in iterator:
            if value == val:
                return True

        return False

    def discard(self, value):
        iterator = self._iter(value)
        start = next(iterator)
        for offset, val in enumerate(iterator):
            if value == val[2]:
                del self._list[start + offset]
                return

    def remove(self, value):
        iterator = self._iter(value)
        start = next(iterator)
        for offset, val in enumerate(iterator):
            if value == val[2]:
                del self._list[start + offset]
                return
        raise ValueError

    def __delitem__(self, index):
        del self._list[index]

    def __getitem__(self, index):
        if isinstance(index, slice):
            return list(tup[2] for tup in self._list[index])
        else:
            return self._list[index][2]

    def __setitem__(self, index, value):
        _key, _order = self._key, self._order
        if isinstance(index, slice):
            self._list[index] = list((_key(val), next(_order), val) for val in value)
        else:
            self._list[index] = (_key(value), next(_order), value)

    def __iter__(self):
        return iter(tup[2] for tup in iter(self._list))

    def __reversed__(self):
        return iter(tup[2] for tup in reversed(self._list))

    def __len__(self):
        return len(self._list)

    def bisect_left(self, value):
        pair = (self._key(value), 0, value)
        return self._list.bisect_left(pair)

    def bisect(self, value):
        pair = (self._key(value), 0, value)
        return self._list.bisect(pair)

    def bisect_right(self, value):
        pair = (self._key(value), next(self._order), value)
        return self._list.bisect_right(pair)

    def count(self, value):
        iterator = self._iter(value)
        next(iterator)
        return sum(1 for val in iterator if val[2] == value)

    def append(self, value):
        pair = (self._key(value), next(self._order), value)
        self._list.append(pair)

    def extend(self, iterable):
        _key, _order = self._key, self._order
        self._list.extend((_key(val), next(_order), val) for val in iterable)

    def insert(self, index, value):
        pair = (self._key(value), next(self._order), value)
        self._list.insert(index, pair)

    def pop(self, index=-1):
        return self._list.pop(index)[2]

    def index(self, value, start=None, stop=None):
        _len = self._list._len

        if start == None:
            start = 0
        if start < 0:
            start += _len
        if start < 0:
            start = 0

        if stop == None:
            stop = _len
        if stop < 0:
            stop += _len
        if stop > _len:
            stop = _len

        if stop <= start:
            raise ValueError

        iterator = self._iter(value)
        begin = next(iterator)

        for offset, val in enumerate(iterator):
            if value == val[2] and start <= (begin + offset) < stop:
                return begin + offset

        raise ValueError

    def as_list(self):
        return list(tup[2] for tup in self._list.as_list())

    def __add__(self, that):
        result = SortedListWithKey(key=self._key)
        values = list(self)
        values.extend(that)
        result.update(values)
        return result

    def __iadd__(self, that):
        self.update(that)
        return self

    def __mul__(self, that):
        values = self.as_list() * that
        return SortedListWithKey(values, key=self._key)

    def __imul__(self, that):
        values = self.as_list() * that
        self.clear()
        self.update(values)
        return self

    def __eq__(self, that):
        return ((len(self) == len(that))
                and all(lhs == rhs for lhs, rhs in zip(self, that)))

    def __ne__(self, that):
        return ((len(self) != len(that))
                or any(lhs != rhs for lhs, rhs in zip(self, that)))

    def __lt__(self, that):
        return ((len(self) <= len(that))
                and all(lhs < rhs for lhs, rhs in zip(self, that)))

    def __le__(self, that):
        return ((len(self) <= len(that))
                and all(lhs <= rhs for lhs, rhs in zip(self, that)))

    def __gt__(self, that):
        return ((len(self) >= len(that))
                and all(lhs > rhs for lhs, rhs in zip(self, that)))

    def __ge__(self, that):
        return ((len(self) >= len(that))
                and all(lhs >= rhs for lhs, rhs in zip(self, that)))

    def __repr__(self):
        return 'SortedListWithKey({0}, key={1})'.format(
            repr(self.as_list()), repr(self._key))

    def _check(self):
        self._list._check()

        # Validate the keys.

        for key, order, value in self._list:
            assert self._key(value) == key

        # Validate the orders.

        if len(self) > 0:
            iterator = iter(self._list)
            prev_key, prev_order, value = next(iterator)
            for key, order, value in iterator:
                if prev_key == key:
                    assert prev_order < order
                prev_key = key
                prev_order = order

@Muon
Copy link
Contributor Author

Muon commented Jul 1, 2014

No, I was using blist before (and it worked perfectly), but I wanted to move to something in pure Python, since compiling things is kinda horrible for Windows users. You are correct that blist stores tuples, but note that it uses _sortedbase._i2key() for its comparisons. It has a custom bisect_left() and bisect_right(). Also note that _sortedbase.__contains__() uses _sortedbase._advance().

My use case is: I have a sorted, prioritized list of "actions" which need to be done periodically. They're ordered by rank, but there may be ties.

@grantjenks
Copy link
Owner

Ok, I think I've got something working. Can you try the master branch at 29958d5. I've added a boolean parameter, "value_orderable" which you should set to False for your purposes. When False, the key variation promises not to compare values.

It works like blist in the sense that iteration is required over duplicate keys. And it works a little like your version in that I created a new type, Pair, for insertion. Unlike your pull request though, a Pair only compares keys and it plays nicely with tuple.getitem semantics.

Performance seems reasonable although value_orderable=False takes a pretty big hit. I'd estimate somewhere around 2-10x slower than the w/out key version. The _iter function could be made faster depending on your data. I'm open to other ideas on how to implement it. Fortunately the SortedList type is really fast.

Let me know if that works for you and if so, I'll release it in v0.8.3.

@Muon
Copy link
Contributor Author

Muon commented Jul 3, 2014

Seems to work! Equal key iteration performance is not important for me yet, but how do you think _iter could be made faster? I'm not really familiar with the workings of SortedList itself.

@Muon
Copy link
Contributor Author

Muon commented Jul 7, 2014

@grantjenks When do you plan to unleash the glory of v0.8.3? My setup.py itches.

@grantjenks
Copy link
Owner

Did it this morning. Thanks for the reminder.

v0.8.4 will have a faster implementation so you may want to upgrade again in the near future. But at least you can switch to using PyPI for now.

Thanks for using the module. Let me know if you catch any more bugs.

@grantjenks
Copy link
Owner

v0.8.4 was deployed to PyPI today. It contains a faster implementation for SortedListWithKey in some scenarios and more robust testing. Please upgrade and let me know if it's working for you. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants