searchsorted when both arrays are already sorted #47

Open
kwgoodman opened this Issue · 2 comments

kwgoodman
Owner

Should we add a bn.searchsorted(arr1, arr2) function for the special case where both arr1 and arr2 are already sorted?

Here's a prototype: https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/fastss.pyx

Some issues to consider:

  • Add optional order ('left', 'right') input parameter (np.searchsorted calls this side)?
  • Handle NaNs like np.searchsorted does?

For the origin of the idea, see https://groups.google.com/group/bottle-neck/browse_thread/thread/ec37c0e93d6d58cc
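
To make the proposal concrete, here is a minimal pure-Python sketch of the idea, assuming a single merge-style pass over both arrays. The name `searchsorted_sorted` and its `side` keyword are hypothetical; this is not the code from the linked prototype, and it assumes both inputs are sorted ascending and contain no NaNs.

```python
import numpy as np

def searchsorted_sorted(arr1, arr2, side="left"):
    """Insertion indices of arr2 into arr1 (hypothetical helper).

    Assumes arr1 and arr2 are both sorted ascending and NaN-free.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    n = len(arr1)
    out = np.empty(len(arr2), dtype=np.intp)
    i = 0  # position in arr1; never moves backwards because arr2 is sorted
    for j, v in enumerate(arr2):
        if side == "left":
            # advance past every element strictly less than v
            while i < n and arr1[i] < v:
                i += 1
        else:
            # advance past every element less than or equal to v
            while i < n and arr1[i] <= v:
                i += 1
        out[j] = i
    return out
```

Because the position in arr1 only moves forward, the scan does work proportional to len(arr1) + len(arr2), rather than one binary search per element of arr2. A compiled version would be needed for this to pay off in practice.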

Tom Aldcroft

Another issue I thought about is supporting int-type array inputs. Currently only floats are allowed (falls through to np.searchsorted otherwise), and input arrays are cast to float64.
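
One reason the float64 cast matters for int inputs: float64 cannot represent every int64 exactly above 2**53, so distinct integers can collapse to the same float and change the result. A small illustration (the values are picked only to show the effect):

```python
import numpy as np

# Two distinct int64 values just above the float64 exact-integer range.
big = np.array([2**53, 2**53 + 1], dtype=np.int64)
as_float = big.astype(np.float64)

# After the cast both entries hold the same float64 value.
collapsed = bool(as_float[0] == as_float[1])

# Searching for the second element gives different answers per dtype.
idx_int = int(np.searchsorted(big, big[1]))
idx_float = int(np.searchsorted(as_float, as_float[1]))

print(collapsed, idx_int, idx_float)
```

So native int support would have to either avoid the cast or restrict it to values that float64 represents exactly.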

About NaNs: it seems like it would be somewhat uncommon to have a sorted input array with NaNs, since they would all be piled up at the end. Handling NaN in the code would be pretty simple; it's just a question of whether the performance cost of doing this check for each element is worth it. The bottleneck team probably has a pretty good idea of how long NaN-checking takes.

ml31415

Having an "x == x" check (the NaN check) in the code generally does negligible harm to speed, as long as the result of the comparison is always the same. So if there are no NaNs in the data, there is essentially no speed impact, because the processor's branch prediction will always be correct in that case. And if there are NaNs, you would of course want to have that check.
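
As a rough sketch of what that check could look like in a merge-style scan, here is a pure-Python version that orders NaN after every number, which is the ordering np.sort and np.searchsorted use for floats. The names `nan_less` and `searchsorted_sorted_nan` are hypothetical, and only the 'left' behavior is shown.

```python
import numpy as np

def nan_less(a, b):
    """True if a sorts strictly before b, with NaN ordered last."""
    if b != b:           # b is NaN: NaN is the only value not equal to itself
        return a == a    # every non-NaN a sorts before NaN
    return a < b         # evaluates to False when a is NaN

def searchsorted_sorted_nan(arr1, arr2):
    """Left insertion indices; both inputs sorted ascending with NaNs last."""
    n = len(arr1)
    out = np.empty(len(arr2), dtype=np.intp)
    i = 0  # position in arr1; only moves forward
    for j, v in enumerate(arr2):
        while i < n and nan_less(arr1[i], v):
            i += 1
        out[j] = i
    return out
```

On NaN-free data the `b != b` branch always goes the same way, which is the case where branch prediction should make the check nearly free in compiled code.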
