bpo-4356: Add key function support to the bisect module (GH-20556)
rhettinger authored and adorilson committed Mar 11, 2021
1 parent 845d765 commit ac8ef7f
Showing 7 changed files with 333 additions and 93 deletions.
118 changes: 90 additions & 28 deletions Doc/library/bisect.rst
@@ -21,7 +21,7 @@ example of the algorithm (the boundary conditions are already right!).
The following functions are provided:


.. function:: bisect_left(a, x, lo=0, hi=len(a))
.. function:: bisect_left(a, x, lo=0, hi=len(a), *, key=None)

Locate the insertion point for *x* in *a* to maintain sorted order.
The parameters *lo* and *hi* may be used to specify a subset of the list
@@ -31,39 +31,106 @@ The following functions are provided:
parameter to ``list.insert()`` assuming that *a* is already sorted.

The returned insertion point *i* partitions the array *a* into two halves so
that ``all(val < x for val in a[lo:i])`` for the left side and
``all(val >= x for val in a[i:hi])`` for the right side.
that ``all(val < x for val in a[lo : i])`` for the left side and
``all(val >= x for val in a[i : hi])`` for the right side.

.. function:: bisect_right(a, x, lo=0, hi=len(a))
*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

.. versionchanged:: 3.10
Added the *key* parameter.


.. function:: bisect_right(a, x, lo=0, hi=len(a), *, key=None)
bisect(a, x, lo=0, hi=len(a))

Similar to :func:`bisect_left`, but returns an insertion point which comes
after (to the right of) any existing entries of *x* in *a*.

The returned insertion point *i* partitions the array *a* into two halves so
that ``all(val <= x for val in a[lo:i])`` for the left side and
``all(val > x for val in a[i:hi])`` for the right side.
that ``all(val <= x for val in a[lo : i])`` for the left side and
``all(val > x for val in a[i : hi])`` for the right side.

*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

.. versionchanged:: 3.10
Added the *key* parameter.


.. function:: insort_left(a, x, lo=0, hi=len(a))
.. function:: insort_left(a, x, lo=0, hi=len(a), *, key=None)

Insert *x* in *a* in sorted order. This is equivalent to
``a.insert(bisect.bisect_left(a, x, lo, hi), x)`` assuming that *a* is
already sorted. Keep in mind that the O(log n) search is dominated by
the slow O(n) insertion step.
Insert *x* in *a* in sorted order.

.. function:: insort_right(a, x, lo=0, hi=len(a))
*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

This function first runs :func:`bisect_left` to locate an insertion point.
Next, it runs the :meth:`insert` method on *a* to insert *x* at the
appropriate position to maintain sort order.

Keep in mind that the ``O(log n)`` search is dominated by the slow ``O(n)``
insertion step.

.. versionchanged:: 3.10
Added the *key* parameter.


.. function:: insort_right(a, x, lo=0, hi=len(a), *, key=None)
insort(a, x, lo=0, hi=len(a))

Similar to :func:`insort_left`, but inserting *x* in *a* after any existing
entries of *x*.

*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

This function first runs :func:`bisect_right` to locate an insertion point.
Next, it runs the :meth:`insert` method on *a* to insert *x* at the
appropriate position to maintain sort order.

Keep in mind that the ``O(log n)`` search is dominated by the slow ``O(n)``
insertion step.

.. versionchanged:: 3.10
Added the *key* parameter.


Performance Notes
-----------------

When writing time-sensitive code using *bisect()* and *insort()*, keep these
thoughts in mind:

* Bisection is effective for searching ranges of values.
For locating specific values, dictionaries are more performant.

* The *insort()* functions are ``O(n)`` because the logarithmic search step
is dominated by the linear time insertion step.

* The search functions are stateless and discard key function results after
they are used. Consequently, if the search functions are used in a loop,
the key function may be called again and again on the same array elements.
If the key function isn't fast, consider wrapping it with
:func:`functools.cache` to avoid duplicate computations. Alternatively,
consider searching an array of precomputed keys to locate the insertion
point (as shown in the examples section below).

.. seealso::

`SortedCollection recipe
<https://code.activestate.com/recipes/577197-sortedcollection/>`_ that uses
bisect to build a full-featured collection class with straight-forward search
methods and support for a key-function. The keys are precomputed to save
unnecessary calls to the key function during searches.
* `Sorted Collections
<http://www.grantjenks.com/docs/sortedcollections/>`_ is a high performance
module that uses *bisect* to manage sorted collections of data.

* The `SortedCollection recipe
<https://code.activestate.com/recipes/577197-sortedcollection/>`_ uses
bisect to build a full-featured collection class with straight-forward search
methods and support for a key-function. The keys are precomputed to save
unnecessary calls to the key function during searches.


Searching Sorted Lists
@@ -110,8 +177,8 @@ lists::
raise ValueError


Other Examples
--------------
Examples
--------

.. _bisect-example:

@@ -127,17 +194,12 @@ a 'B', and so on::
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Unlike the :func:`sorted` function, it does not make sense for the :func:`bisect`
functions to have *key* or *reversed* arguments because that would lead to an
inefficient design (successive calls to bisect functions would not "remember"
all of the previous key lookups).

Instead, it is better to search a list of precomputed keys to find the index
of the record in question::
One technique to avoid repeated calls to a key function is to search a list of
precomputed keys to find the index of a record::

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data] # precomputed list of keys
>>> data.sort(key=lambda r: r[1]) # Or use operator.itemgetter(1).
>>> keys = [r[1] for r in data] # Precompute a list of keys.
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
2 changes: 0 additions & 2 deletions Doc/tools/susp-ignored.csv
@@ -111,8 +111,6 @@ howto/urllib2,,:password,"""joe:password@example.com"""
library/ast,,:upper,lower:upper
library/ast,,:step,lower:upper:step
library/audioop,,:ipos,"# factor = audioop.findfactor(in_test[ipos*2:ipos*2+len(out_test)],"
library/bisect,32,:hi,all(val >= x for val in a[i:hi])
library/bisect,42,:hi,all(val > x for val in a[i:hi])
library/configparser,,:home,my_dir: ${Common:home_dir}/twosheds
library/configparser,,:option,${section:option}
library/configparser,,:path,python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
66 changes: 48 additions & 18 deletions Lib/bisect.py
@@ -1,18 +1,22 @@
"""Bisection algorithms."""

def insort_right(a, x, lo=0, hi=None):

def insort_right(a, x, lo=0, hi=None, *, key=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the right of the rightmost x.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""

lo = bisect_right(a, x, lo, hi)
if key is None:
lo = bisect_right(a, x, lo, hi)
else:
lo = bisect_right(a, key(x), lo, hi, key=key)
a.insert(lo, x)

def bisect_right(a, x, lo=0, hi=None):

def bisect_right(a, x, lo=0, hi=None, *, key=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e <= x, and all e in
@@ -27,14 +31,26 @@ def bisect_right(a, x, lo=0, hi=None):
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
# Use __lt__ to match the logic in list.sort() and in heapq
if x < a[mid]: hi = mid
else: lo = mid+1
# Note, the comparison uses "<" to match the
# __lt__() logic in list.sort() and in heapq.
if key is None:
while lo < hi:
mid = (lo + hi) // 2
if x < a[mid]:
hi = mid
else:
lo = mid + 1
else:
while lo < hi:
mid = (lo + hi) // 2
if x < key(a[mid]):
hi = mid
else:
lo = mid + 1
return lo

def insort_left(a, x, lo=0, hi=None):

def insort_left(a, x, lo=0, hi=None, *, key=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the left of the leftmost x.
@@ -43,11 +59,13 @@ def insort_left(a, x, lo=0, hi=None):
slice of a to be searched.
"""

lo = bisect_left(a, x, lo, hi)
if key is None:
lo = bisect_left(a, x, lo, hi)
else:
lo = bisect_left(a, key(x), lo, hi, key=key)
a.insert(lo, x)


def bisect_left(a, x, lo=0, hi=None):
def bisect_left(a, x, lo=0, hi=None, *, key=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e < x, and all e in
@@ -62,13 +80,25 @@ def bisect_left(a, x, lo=0, hi=None):
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
# Use __lt__ to match the logic in list.sort() and in heapq
if a[mid] < x: lo = mid+1
else: hi = mid
# Note, the comparison uses "<" to match the
# __lt__() logic in list.sort() and in heapq.
if key is None:
while lo < hi:
mid = (lo + hi) // 2
if a[mid] < x:
lo = mid + 1
else:
hi = mid
else:
while lo < hi:
mid = (lo + hi) // 2
if key(a[mid]) < x:
lo = mid + 1
else:
hi = mid
return lo


# Overwrite above definitions with a fast C implementation
try:
from _bisect import *
57 changes: 57 additions & 0 deletions Lib/test/test_bisect.py
@@ -200,6 +200,63 @@ def test_keyword_args(self):
self.module.insort(a=data, x=25, lo=1, hi=3)
self.assertEqual(data, [10, 20, 25, 25, 25, 30, 40, 50])

def test_lookups_with_key_function(self):
mod = self.module

# Invariant: Index with a keyfunc on an array
# should match the index on an array where
# key function has already been applied.

keyfunc = abs
arr = sorted([2, -4, 6, 8, -10], key=keyfunc)
precomputed_arr = list(map(keyfunc, arr))
for x in precomputed_arr:
self.assertEqual(
mod.bisect_left(arr, x, key=keyfunc),
mod.bisect_left(precomputed_arr, x)
)
self.assertEqual(
mod.bisect_right(arr, x, key=keyfunc),
mod.bisect_right(precomputed_arr, x)
)

keyfunc = str.casefold
arr = sorted('aBcDeEfgHhiIiij', key=keyfunc)
precomputed_arr = list(map(keyfunc, arr))
for x in precomputed_arr:
self.assertEqual(
mod.bisect_left(arr, x, key=keyfunc),
mod.bisect_left(precomputed_arr, x)
)
self.assertEqual(
mod.bisect_right(arr, x, key=keyfunc),
mod.bisect_right(precomputed_arr, x)
)

def test_insort(self):
from random import shuffle
mod = self.module

# Invariant: As random elements are inserted in
# a target list, the target list remains sorted.
keyfunc = abs
data = list(range(-10, 11)) + list(range(-20, 20, 2))
shuffle(data)
target = []
for x in data:
mod.insort_left(target, x, key=keyfunc)
self.assertEqual(
sorted(target, key=keyfunc),
target
)
target = []
for x in data:
mod.insort_right(target, x, key=keyfunc)
self.assertEqual(
sorted(target, key=keyfunc),
target
)

class TestBisectPython(TestBisect, unittest.TestCase):
module = py_bisect

@@ -0,0 +1 @@
Add a key function to the bisect module.
