Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make bisect.* functions accept an optional compare function #41873

Closed
mciura mannequin opened this issue Apr 18, 2005 · 7 comments
Closed

Make bisect.* functions accept an optional compare function #41873

mciura mannequin opened this issue Apr 18, 2005 · 7 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@mciura
Copy link
Mannequin

mciura mannequin commented Apr 18, 2005

BPO 1185383
Nosy @rhettinger

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2006-11-22.18:50:09.000>
created_at = <Date 2005-04-18.19:26:50.000>
labels = ['type-feature', 'library']
title = 'Make bisect.* functions accept an optional compare function'
updated_at = <Date 2006-11-22.18:50:09.000>
user = 'https://bugs.python.org/mciura'

bugs.python.org fields:

activity = <Date 2006-11-22.18:50:09.000>
actor = 'minmax'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2005-04-18.19:26:50.000>
creator = 'mciura'
dependencies = []
files = []
hgrepos = []
issue_num = 1185383
keywords = []
message_count = 7.0
messages = ['54448', '54449', '54450', '54451', '54452', '54453', '54454']
nosy_count = 4.0
nosy_names = ['rhettinger', 'mciura', 'jonaskoelker', 'minmax']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1185383'
versions = []

@mciura
Copy link
Mannequin Author

mciura mannequin commented Apr 18, 2005

The usability of bisect.bisect_right, bisect.bisect_left,
bisect.insort_right and bisect.insort_left would increase
if they accepted an optional less_than function to compare
the keys.

Here is my use case: I have a sorted list of reversed words
and a parallel list of flags associated with these words
(taken from ispell). The list of reversed words is the one
searched; the flags are the result of a search.

Issue #1: Now I cannot use simply a list of tuples
(rev_word,flags) and a custom less_than function that
compares only the first elements of two tuples.

Issue #2: When a word is not found in the list, I'd
like to make
an educated guess as to its flags (this makes more
sense in non-English languages, where e.g. infinitives
have a special ending), using bisect_left and bisect_right:

from bisect import *

less_than = lambda x,y: x[:3]<y[:3]
lp = bisect_left(word_list, given_rev_word, lt=less_than)
rp = bisect_right(word_list, given_rev_word, lt=less_than)
# return the union of flag_list[lp:rp]

An example (given_rev_word = 'abcpqr'):
word_list:
'abbx',
'abcaa', <- lp
'abcdd',
'abcss',
'abdf' <- rp

Currently, the first search could be replaced with
lp = bisect_left(word_list, given_rev_word[:3])
but I can see only non-nice ways to replace the other
search.

Rolling my own class that stores a word and its flags,
with __lt__ depending on some global setting is not
thread-safe, and I find such a solution too heavyweight.

I hope that I expressed myself clearly enough. If not, let
me know, and I'll try to clarify my point.

@mciura mciura mannequin closed this as completed Apr 18, 2005
@mciura mciura mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 18, 2005
@mciura mciura mannequin closed this as completed Apr 18, 2005
@mciura mciura mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 18, 2005
@rhettinger
Copy link
Contributor

Logged In: YES
user_id=80475

A few thoughts:

  • If bisect provided an optional lt argument, it would still
    not be thread-safe. The code for the lt function can call
    arbitrary Python code that runs through the eval-loop and is
    hence subject to a thread-switch.

  • An advantage of the class wrapper approach is that the
    prefix function gets computed just once per word. This is
    not a big deal for the specific case of [:3], but it could
    be a significant benefit for other functions that are
    expensive to compute (because repeated calls to bisect will
    access the lt function more than once).

  • My own approach to the particular use case would be to map
    prefixes to a list of revwords or (revword, flag) pairs:
    d = dict(abb=['abbc'], abc=['abcaa', 'abcdd', 'abcss'],
    abd=['abdf'])
    This gives O(1) lookup time and limits the calls to the
    prefix function to once per word.

Will leave this request open as it may yet be a good idea.
My concern is that it will clutter the code and the API for
only a small benefit.

Also, I'm looking at a more general purpose solution that
would make this feature request moot. This idea is to
create a fast comparison decorator class used like this:

   dwordlist = map(ComparisonDecorator(lambda x: x[:3]),
wordlist)
   lp = bisect_left(dwordlist, given_rev_word)
   rp = bisect_right(dwordlist, given_rev_word)

@jonaskoelker
Copy link
Mannequin

jonaskoelker mannequin commented Apr 28, 2005

Logged In: YES
user_id=1153395

In a similar vein, I'd like to see that a key' keyword argument was added to bisect.* (and perhaps reversed'
too)--it's really annoying to sort something by (not lt)
and not be able to bsearch it. It got added to min/max and
heapq.n(smallest|largest) in 2.5 fwiw ;)

@rhettinger
Copy link
Contributor

Logged In: YES
user_id=80475

Overall, I'm -1 on this RFE.

The comparison to nsmallest() and nlargest() is inaccurate.
They run start-to-finish in one function call. The other
heapq methods do not use key functions because they have to
leave the original data structure unmolested between calls;
hence, there is no ability to limit the key function calls
to one per record.

Likewise, with this request, the key function calls get
wasted. The sort() method calls key() for every record and
tosses the result afterwards. Each subsequect call to
bisect() would need to repeat those calls for a log(N)
subset of the data. Hence, accepting this request would
create an API that encourages a wasteful design.

@jonaskoelker
Copy link
Mannequin

jonaskoelker mannequin commented Jun 10, 2005

Logged In: YES
user_id=1153395

[...] but it could be a significant benefit for other
functions that are expensive to compute (because repeated
calls to bisect will access the lt function more than once).

Each subsequect call to bisect() would need to repeat
those calls for a log(N) subset of the data.

Which is exactly *why* I'm suggesting the key argument: to
save those extra calls to the key function.

Since that sounds counter-intuitive, let me explain: one
sorts (origkey(item), item) pairs, the bisects with
key=itemgetter(0), not calling the expensive origkey.

@rhettinger
Copy link
Contributor

Logged In: YES
user_id=80475

The whole purpose of the key= argument was to avoid having
to build (key, record) tuples. If those are going to be
built anyway, then there is little point to using key=.

Closing this because:

  1. the original use case was better addressed through other
    methods
  2. key= for bisect does not provide the same benefits as it
    does for other functions.
  3. the most recent proposed use is far afield from what key=
    is supposed to do
  4. I think it is bad design and would encourage misguided
    approaches.

@minmax
Copy link
Mannequin

minmax mannequin commented Nov 22, 2006

This really is annoying. If sort() accepts different comparison functions, so should bisect & co.

It makes things that should be easy, very hard.

If you are writting threaded programs, you know that functions passed to sort and bisect have to behave properly.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

1 participant