-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TypeError when adding two incomparable items with same key to SortedListWithKey #7
Comments
I have a disgustingly wonderful "fix"! Tack on a fresh >>> (1, float("nan"), A()) < (1, float("nan"), A())
False
>>> (0, float("nan"), A()) < (1, float("nan"), A())
True Freshness is for some reason important, since: >>> nan = float("nan")
>>> (1, nan, A()) < (1, nan, A())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: A() < A() Wrapping the key in a tuple like this also works: >>> ((1, float("nan")), A()) < ((1, float("nan")), A())
False
>>> ((0, float("nan")), A()) < ((1, float("nan")), A())
True Reported this to CPython at http://bugs.python.org/issue21873. That does lead to an actual hackish solution, which is to make a class Dummy:
def __lt__(self, other):
return True And use a single instance of that (can be shared) as comparison fodder. EDIT: Nope, can't be shared. Identity must be screwing it up in CPython at least. |
Ignore the previous "solution" comments. Who decided we absolutely definitely had to use tuples, no exceptions? 😅 Made a pull request with something a lot saner. |
I had originally accepted the pull request but later undid it when it failed this test case: >>> lk = sortedcontainers.SortedListWithKey(xrange(10000), lambda val: val % 10)
>>> 1234 in lk
False |
I believe this works but I don't like it much. This inserts items as a threeple: Benefits:
Drawbacks:
I noticed that blist does something similar to the current implementation (using tuples) and relies on valid comparison operators for the value type. I believe it would likewise fail for your test case. Ultimately I think it depends on what happens in practice. My guess is that having value types which you can't compare is rare but I'd love to know more about your scenario. There may be a way to speed this up if a hash-table of value: order were maintained but that would require the values are hashable and I haven't tried to write the code. Here's the code: # -*- coding: utf-8 -*-
#
# Sorted list with key implementation.
from .sortedlist import SortedList
from collections import MutableSequence
from itertools import chain, count
class SortedListWithKey(MutableSequence):
def __init__(self, iterable=None, key=lambda val: val, load=100):
self._key = key
self._list = SortedList(load=load)
self._order = count(1)
if iterable is not None:
self.update(iterable)
def clear(self):
self._order = count(1)
self._list.clear()
def add(self, value):
pair = (self._key(value), next(self._order), value)
self._list.add(pair)
def update(self, iterable):
_key, _order = self._key, self._order
self._list.update((_key(val), next(_order), val) for val in iterable)
def _iter(self, value):
_list = self._list
key = self._key(value)
low = (key, 0, value)
high = (key, next(self._order), value)
start = _list.bisect_left(low)
end = _list.bisect_right(high)
yield start
if start == end:
return
start_pos, start_idx = _list._pos(start)
end_pos, end_idx = _list._pos(end - 1)
_lists = _list._lists
segments = (_lists[pos] for pos in range(start_pos, end_pos + 1))
iterator = chain.from_iterable(segments)
# Advance the iterator to the start of the items.
for rpt in range(start_idx):
next(iterator)
for rpt in range(end - start + 1):
yield next(iterator)
def __contains__(self, value):
iterator = self._iter(value)
next(iterator)
for key, order, val in iterator:
if value == val:
return True
return False
def discard(self, value):
iterator = self._iter(value)
start = next(iterator)
for offset, val in enumerate(iterator):
if value == val[2]:
del self._list[start + offset]
return
def remove(self, value):
iterator = self._iter(value)
start = next(iterator)
for offset, val in enumerate(iterator):
if value == val[2]:
del self._list[start + offset]
return
raise ValueError
def __delitem__(self, index):
del self._list[index]
def __getitem__(self, index):
if isinstance(index, slice):
return list(tup[2] for tup in self._list[index])
else:
return self._list[index][2]
def __setitem__(self, index, value):
_key, _order = self._key, self._order
if isinstance(index, slice):
self._list[index] = list((_key(val), next(_order), val) for val in value)
else:
self._list[index] = (_key(value), next(_order), value)
def __iter__(self):
return iter(tup[2] for tup in iter(self._list))
def __reversed__(self):
return iter(tup[2] for tup in reversed(self._list))
def __len__(self):
return len(self._list)
def bisect_left(self, value):
pair = (self._key(value), 0, value)
return self._list.bisect_left(pair)
def bisect(self, value):
pair = (self._key(value), 0, value)
return self._list.bisect(pair)
def bisect_right(self, value):
pair = (self._key(value), next(self._order), value)
return self._list.bisect_right(pair)
def count(self, value):
iterator = self._iter(value)
next(iterator)
return sum(1 for val in iterator if val[2] == value)
def append(self, value):
pair = (self._key(value), next(self._order), value)
self._list.append(pair)
def extend(self, iterable):
_key, _order = self._key, self._order
self._list.extend((_key(val), next(_order), val) for val in iterable)
def insert(self, index, value):
pair = (self._key(value), next(self._order), value)
self._list.insert(index, pair)
def pop(self, index=-1):
return self._list.pop(index)[2]
def index(self, value, start=None, stop=None):
_len = self._list._len
if start == None:
start = 0
if start < 0:
start += _len
if start < 0:
start = 0
if stop == None:
stop = _len
if stop < 0:
stop += _len
if stop > _len:
stop = _len
if stop <= start:
raise ValueError
iterator = self._iter(value)
begin = next(iterator)
for offset, val in enumerate(iterator):
if value == val[2] and start <= (begin + offset) < stop:
return begin + offset
raise ValueError
def as_list(self):
return list(tup[2] for tup in self._list.as_list())
def __add__(self, that):
result = SortedListWithKey(key=self._key)
values = list(self)
values.extend(that)
result.update(values)
return result
def __iadd__(self, that):
self.update(that)
return self
def __mul__(self, that):
values = self.as_list() * that
return SortedListWithKey(values, key=self._key)
def __imul__(self, that):
values = self.as_list() * that
self.clear()
self.update(values)
return self
def __eq__(self, that):
return ((len(self) == len(that))
and all(lhs == rhs for lhs, rhs in zip(self, that)))
def __ne__(self, that):
return ((len(self) != len(that))
or any(lhs != rhs for lhs, rhs in zip(self, that)))
def __lt__(self, that):
return ((len(self) <= len(that))
and all(lhs < rhs for lhs, rhs in zip(self, that)))
def __le__(self, that):
return ((len(self) <= len(that))
and all(lhs <= rhs for lhs, rhs in zip(self, that)))
def __gt__(self, that):
return ((len(self) >= len(that))
and all(lhs > rhs for lhs, rhs in zip(self, that)))
def __ge__(self, that):
return ((len(self) >= len(that))
and all(lhs >= rhs for lhs, rhs in zip(self, that)))
def __repr__(self):
return 'SortedListWithKey({0}, key={1})'.format(
repr(self.as_list()), repr(self._key))
def _check(self):
self._list._check()
# Validate the keys.
for key, order, value in self._list:
assert self._key(value) == key
# Validate the orders.
if len(self) > 0:
iterator = iter(self._list)
prev_key, prev_order, value = next(iterator)
for key, order, value in iterator:
if prev_key == key:
assert prev_order < order
prev_key = key
prev_order = order |
No, I was using blist before (and it worked perfectly), but I wanted to move to something in pure Python, since compiling things is kinda horrible for Windows users. You are correct that blist stores tuples, but note that it uses My use case is: I have a sorted, prioritized list of "actions" which need to be done periodically. They're ordered by rank, but there may be ties. |
Ok, I think I've got something working. Can you try the master branch at 29958d5. I've added a boolean parameter, "value_orderable" which you should set to False for your purposes. When False, the key variation promises not to compare values. It works like blist in the sense that iteration is required over duplicate keys. And it works a little like your version in that I created a new type, Pair, for insertion. Unlike your pull request though, a Pair only compares keys and it plays nicely with tuple.getitem semantics. Performance seems reasonable although value_orderable=False takes a pretty big hit. I'd estimate somewhere around 2-10x slower than the w/out key version. The _iter function could be made faster depending on your data. I'm open to other ideas on how to implement it. Fortunately the SortedList type is really fast. Let me know if that works for you and if so, I'll release it in v0.8.3. |
Seems to work! Equal key iteration performance is not important for me yet, but how do you think |
@grantjenks When do you plan to unleash the glory of v0.8.3? My setup.py itches. |
Did it this morning. Thanks for the reminder. v0.8.4 will have a faster implementation so you may want to upgrade again in the near future. But at least you can switch to using PyPI for now. Thanks for using the module. Let me know if you catch any more bugs. |
v0.8.4 was deployed to PyPI today. It contains a faster implementation for SortedListWithKey in some scenarios and more robust testing. Please upgrade and let me know if it's working for you. Thanks! |
On Python 3.4.1:
This is because in Python 3 comparing incomparable types dies horribly instead of returning
True
.The text was updated successfully, but these errors were encountered: