-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "key" argument to "bisect" module functions #48606
Comments
It'd be helpful of the functions in the "bisect" modules will have a |
Miki, the issue is that bisect calls tend to be made repeatedly, so the |
What about cases where performance is unimportant, or where the key bisect(a, x, key=attrgetter('size')) is easy to write and read. Mightn't this be considered good design, Another thought: if your list is a list of user-defined objects then a Disclaimer: I haven't personally had any need for a key argument on |
I had said "almost always". Sure, if you don't care about performance We're responsible for creating an API that steers most programmers in For user-defined objects, there is no need for a key-attribute since can class UserDefined:
. . .
def cmp(self, other):
return cmp(self.key, other.key) |
One case I've been thinking about is that of maintaining a list of Decimal Perhaps this is a contrived use case, but it doesn't seem totally |
I agree you can get around this with defining __cmp__, however same goes My take on it is that sometimes I need to find something in a big list
I'd prefer if we do implement "key" and add a warning in the docs it I don't see why the "key" function should be called all the time on def bisect(a, x, lo=0, hi=None, key=lambda x: x):
assert low >= 0, "low must be non-negative"
hi = hi or len(a)
x_key = key(x)
while lo < hi:
mid = (lo+hi)//2
if x_key < key(a[mid]): hi = mid
else: lo = mid+1
return lo (I'd also wish for "identity" built in, however this is another subject :) |
Just a reminder that __cmp__ is gone in 3.0. |
I've been bugged by the lack of key= argument for bisect for some time
I was writing a module that keeps a sorted list of items in which items I see the following as reasons why key= would provide benefit:
I've implemented key= and quickly hacked a benchmark to show what
Also, I tried to bench linear searches, but as they had to run in Python In short, key= for bisect would be convenient and neat, really useful in I attach a patch implementing it for trunk, py3k, and the benchmark |
This was closed over a year ago, but since mark.dickinson was asking for convincing use-cases: I'm breaking up a file into line-delimited chunks. These chunks are non-overlapping, contiguous, and tend to be fairly large, so I'm just recording the start line of each chunk in a 2-ple: mapping = [
(10, 'first chunk'),
(50, 'second chunk'),
(60, 'third chunk')
] Lines 10-49 are in the first chunk, lines 50-59 are in the second, lines 60+ are in the third. So: def CategorizeLine(line, mapping):
loc = bisect.bisect([m[0] for m in mapping], line)
if loc == 0:
return None # before first chunk
return mapping[loc-1][1] It Would Be Nice if I could write the second line as: loc = bisect.bisect(mapping, line, key=lambda m:m[0]) The bisect documentation suggests pre-computing the key list, but it seems messy and error-prone to keep a redundant data structure in sync with its source. I could also rewrite my "mapping" data structure to be two parallel lists instead of one list of 2-ples, but this data structure is more readable and extensible and handles empty lists more naturally. |
I'll take another look at this one. |
For what it's worth, after I posted my comment, I realized I could use tuple comparison semantics: loc = bisect.bisect(mapping, (line,)) since my key happens to be at index 0. "key=" would still be nice. |
What would you guys think about adding a class that uses bisect internally so that the keyfunc() never gets called more than once per key? This could be added to the docs as a cut-and-pastable recipe or it could be added to the module directly: sd = SortedCollection(iterable, key=keyfunc)
s[i] --> getitem at position i
len(s) --> length
sd.insert_left(item) --> None
sd.insert_right(item) --> None
sd.find_left(key) --> item
sd.find_right(key) --> item
sd.keyfunc --> the key function
list(sd) --> sorted list of items
list(reversed(sd)) --> reverse sorted list of items
s.clear() The implementation would follow this pattern: def insert_left(self, item):
key = self._keyfunc(item)
i = bisect_left(self._keys, key)
self._keys.insert(i, key)
self._items.insert(i, item)
def find_left(self, key):
i = bisect_left(self._keys, key)
return self._items[i]
def __iter__(self):
return iter(self._items)
def __getitem__(self, i):
return self._items[i] |
Attaching a draft of a sorted collection class. Any comments? |
Looks nice to me, however I wonder if there isn't some overlap with the requests to add a key= kwarg to heapq methods (and the discussion about adding a Heap class there). |
Added a link to the a SortedCollections recipe, added example of how to do searches, and noted the O(n) performance of the insort() functions. See r83786 |
This issue came up on #python IRC, and that combined with the number of times this has been duplicated makes me think that maybe the mention of the SortedCollection recipe should be a little more prominent. Perhaps either moved up by the method list, or something like: "Note: No key argument is available because of performance issues. Please consider either the SortedCollection recipe or altering the objects comparison methods. ? It just seems like it keeps coming up, and in this case came up after the recipe reference was added. |
Thanks for the suggestion. Will look at moving the note higher on the page. |
Fixed in r84383. |
On python-ideas, thread "Optional key to |
I've worked around this in 2.6/2.7 like this: class Arr:
def __getitem__(self, i):
return foo(i) # your key function
def __len__(self):
return 10000000 # your max index value bisect.bisect(Arr(), value, ...) |
Use case: a custom immutable array with a large number of items and indirect key field access. For example ctypes.array, memoryview or ctypes.pointer or any other custom container.
|
In the meantime here is a workaround https://gist.github.com/ericremoreynolds/2d80300dabc70eebc790 |
I've just encountered another case where the lack of a key on bisect has led to more complicated and error-prone code than necessary. Quick summary: we've got a list containing an ordered collection of non-overlapping open intervals on the real line. (In case anyone wants to know, the intervals represent the depths of chunks of interest in a rock core sample.) There's then code for splitting an interval into two pieces at a given point, and for merging two adjacent intervals. Splitting involves (i) finding the relevant interval using a bisect search based on the left endpoint of each interval, then (ii) replacing that interval with two new intervals in the list. The fact that the list is being modified after every bisect search makes it messy to cache the left endpoints, since that cache has to be updated along with the list at every stage. The operations are also relatively rare, which makes it feel inefficient to be computing *all* the left endpoints of the intervals in the first place. Adding comparisons to our interval class is doable, but invasive and unnatural, since the actual class carries other pieces of data and there are many possible meanings for So I think there's a strong case for a key argument in some future version of Python. [Of course, for our app we're on Python 2.7, so this issue won't help us directly. We're probably going to go with either reimplementing bisect or using a proxy array like the one suggested by Eric Reynolds.] |
I'll add a key= variant for Python 3.6. |
I'm also looking for bisect with a key argument (numpy array of records, want to search on one element). However, I don't see bisect in the what's new: https://docs.python.org/3.6/whatsnew/3.6.html ? Any luck with the implementation? |
obviously didn't make it in 3.6 but this still seems desirable. I just saw someone at work propose a trivial port of golang's sort.Search - https://golang.org/pkg/sort/#Search - in Python which caused me to hunt for an issue on bisect key= support. |
Hi everybody, I opened PR 11781 to add a key argument to functions in the bisect As far as I can tell, the key function is at called at most once per item as this import bisect
from collections import defaultdict
class Test:
def __init__(self, value):
self.value = value
cache = defaultdict(int)
def key(e):
cache[e] += 1
assert cache[e] <= 1
return e.value
l = [Test(i) for i in range(10000)]
➜ cpython git:(add-key-argument-to-bisect) ./python.exe
Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb 7 2019, 17:33:24)
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bisect
>>> from collections import defaultdict
>>>
>>>
>>> class Test:
... def __init__(self, value):
... self.value = value
...
>>>
>>> cache = defaultdict(int)
>>>
>>> def key(e):
... cache[e] += 1
... assert cache[e] <= 1
... return e.value
...
>>>
>>> l = [Test(i) for i in range(10000)]
>>>
>>> bisect.bisect(l, Test(25), key=key)
26 This argument can be used where the objects are immutable and I have not been able (cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json (cd90f6a was somtime faster than b7aaa1a, sometime slower but they always As I wrote in the discussion of the PR, I suspect the branch predictor to predict For the record, here is the performance when a key function is given: (cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25, key=lambda e: e)" ......................................... It seems to me that adding the key parameter is the best solution possible. |
Hi, there has been renewed interest from @alexchamberlain and @f6v on GitHub for this feature. Would it be possible to get the patch reviewed so we can add it in 3.8? |
I'll look at it this weekend. |
Can anyone add "reverse" support? Key and reverse support are both functional requirement. |
I think it could be done with |
The problem with |
Can you not use functools.cmp_to_key() for this? Here's an example: >>> import bisect, functools
>>> l = [('f', 5), ('d', 3), ('c', 2), ('b', 1), ('a', 0)]
>>> def cmp(a, b):
... if a > b: return -1
... if a < b: return 1
... return 0
...
>>> bisect.bisect(l, ('e', 4), key=functools.cmp_to_key(cmp))
1
>>> l
[('f', 5), ('d', 3), ('c', 2), ('b', 1), ('a', 0)]
>>> bisect.insort(l, ('e', 4), key=functools.cmp_to_key(cmp))
>>> l
[('f', 5), ('e', 4), ('d', 3), ('c', 2), ('b', 1), ('a', 0)] |
I opened a relevant PR, #11781. I believe a key parameter is inferior to a comparison callback. The former is a specific case of the latter, and in my use case would force me to create another class to serve as a comparator for my objects, at which point I might as well wrap them and add __lt__. Furthermore, I believe this is more in-line with similar standard functions in other languages such as C++ (std::sort), Java (PriorityQueue) or Rust (slice.sort_by). |
I did, thanks! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: