searchsorted when both arrays are already sorted #47

Closed
kwgoodman opened this Issue Jun 4, 2012 · 3 comments

Comments

Projects
None yet
3 participants
@kwgoodman
Owner

kwgoodman commented Jun 4, 2012

Should we add a bn.searchsorted(arr1, arr2) function for the special case where both arr1 and arr2 are already sorted?

Here's a prototype: https://github.com/sot/Ska.Numpy/blob/master/Ska/Numpy/fastss.pyx

Some issues to consider:

  • Add optional order ('left, 'right') input parameter?
  • Handle NaNs like np.searchsorted does?

For the origin of the idea, see https://groups.google.com/group/bottle-neck/browse_thread/thread/ec37c0e93d6d58cc

@taldcroft

This comment has been minimized.

Show comment Hide comment
@taldcroft

taldcroft Jun 4, 2012

Another issue I thought about is supporting int-type array inputs. Currently only floats are allowed (falls through to np.searchsorted otherwise), and input arrays are cast to float64.

About NaNs, it seems like it would be somewhat uncommon to have a sorted input array with NaNs since they would all be piled up at the end. Handling NaN in the code would be pretty simple, it's just a question of whether reducing performance by doing this check for each element is worth it. The bottleneck team probably have a pretty good idea of how long NaN-checking takes.

Another issue I thought about is supporting int-type array inputs. Currently only floats are allowed (falls through to np.searchsorted otherwise), and input arrays are cast to float64.

About NaNs, it seems like it would be somewhat uncommon to have a sorted input array with NaNs since they would all be piled up at the end. Handling NaN in the code would be pretty simple, it's just a question of whether reducing performance by doing this check for each element is worth it. The bottleneck team probably have a pretty good idea of how long NaN-checking takes.

@ml31415

This comment has been minimized.

Show comment Hide comment
@ml31415

ml31415 Mar 5, 2013

Having a "x == x" (the nan-check) in the code does generally only quite neglectable harm to the speed, as long as the result of the comparison is always the same. So if there are no nans in the data, there is no speed impact. The processors jump prediction will always be correct for those cases, and in case there are nans of course you would want to have that check.

ml31415 commented Mar 5, 2013

Having a "x == x" (the nan-check) in the code does generally only quite neglectable harm to the speed, as long as the result of the comparison is always the same. So if there are no nans in the data, there is no speed impact. The processors jump prediction will always be correct for those cases, and in case there are nans of course you would want to have that check.

@kwgoodman

This comment has been minimized.

Show comment Hide comment
@kwgoodman

kwgoodman May 21, 2016

Owner

This is a neat idea but I'm closing for now.

Owner

kwgoodman commented May 21, 2016

This is a neat idea but I'm closing for now.

@kwgoodman kwgoodman closed this May 21, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment