
Improve searchsorted #3107

Merged
merged 5 commits into numpy:master on Aug 17, 2013

Conversation

MrBago
Contributor

@MrBago MrBago commented Mar 1, 2013

This pull request allows numpy.searchsorted to search non-contiguous arrays without making a copy, and it allows numpy.searchsorted to search arrays sorted in descending order. This is my first numpy pull request, so let me know if I'm doing something wrong in terms of the code or in terms of requesting a PR.
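Concretely, the strided case being fixed looks like this (a small sketch, assuming a numpy with this change applied):

```python
import numpy as np

# A strided view: still sorted, but not contiguous in memory.
a = np.arange(20)[::2]
assert not a.flags['C_CONTIGUOUS']

# With this change the strided data is searched directly, instead of
# first being copied into a contiguous temporary buffer.
idx = a.searchsorted(7)
print(idx)  # 4
```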

@@ -1505,15 +1505,22 @@
     char *parr = PyArray_DATA(arr);
     char *pkey = PyArray_DATA(key);
     npy_intp *pret = (npy_intp *)PyArray_DATA(ret);
-    int elsize = PyArray_DESCR(arr)->elsize;
+    int elsize = PyArray_DESCR(key)->elsize;
Member

What's the reason for this change?

Contributor Author

Mostly for clarity. After these changes, elsize is only used to increment the pkey pointer. So I figured just to be consistent I should look up elsize from the key array (even though both arrays should be of the same type).

Member

Oh, we still assume that the key array is contiguous? Is there a reason for that, then? :-)

Contributor Author

Two reasons. First, my naivete: because the key array could have any shape and any strides, I didn't know if there was an easy way to handle that in this context.

The second is that copying the key array didn't seem to me to be as big of a problem. searchsorted is linear in the size of the key array, so making a temporary copy adds a relatively small constant multiplier to the run time. Because searchsorted is log(N) in the size of the array being searched, making a linear-time copy of that array can dominate the run time for large arrays.

Member

Can use PyArray_ITEMSIZE to get elsize.

Member

Heh, and I usually have big arrays of keys and small arrays to look them up in. But in general I agree that the lookup times will dominate the key retrieval time.

Contributor Author

Is there a way to step through an array with arbitrary strides that I could use here for the key array? We could at least stop copying read-only arrays; as @njsmith noted, NPY_ARRAY_DEFAULT checks for writeable arrays. I could include that in this pull request.

Member

Using nditer would be the way to go for that, but it would add some overhead.
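For reference, stepping through an arbitrarily strided array from Python looks like this with np.nditer (the C-level NpyIter API is analogous):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)[:, ::2]   # non-contiguous view
assert not a.flags['C_CONTIGUOUS']

# nditer steps through the elements whatever the shape and strides,
# without copying the array first.
vals = [int(x) for x in np.nditer(a, order='C')]
print(vals)  # [0, 2, 4, 6, 8, 10]
```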

@MrBago
Contributor Author

MrBago commented May 17, 2013

I've updated the pull request so that searchsorted now takes a sortorder keyword arg instead of auto-detecting the sort order. The keyword arg prevents the ambiguity that would otherwise arise when the user's intentions cannot be automatically determined. I've copied the machinery for the side keyword to add the sortorder keyword; please let me know if you think there is a cleaner way to do this.

@charris
Member

charris commented May 18, 2013

I'm a bit concerned that searching non-contiguous arrays might be slower than making a copy in some cases due to cache effects, at least for medium-sized arrays. Have you done any benchmarks? Also, what use case do you have in mind for arrays in descending order? Just wondering how common they might be in practice.

@juliantaylor
Contributor

I've never used searchsorted, but can't you just make a view with the stride sign reversed and subtract the result from the shape to get descending order?

 np.asarray(d.shape) - d[::-1].searchsorted([1, 3]) - 1

Does the sort order need to be a string? There are only two orders, so one could use reverse=True/False like Python does.
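For the record, the reversed-view trick works out to something like the following sketch; the exact offset depends on which insertion side you want, which is part of the motivation for an explicit keyword:

```python
import numpy as np

d = np.array([9, 7, 5, 3, 1])   # sorted descending
asc = d[::-1]                   # ascending view, no copy

keys = np.array([5, 4])
# Left-style insertion points that keep d descending, recovered from a
# right-side search on the reversed view.
idx = len(d) - asc.searchsorted(keys, side='right')
print(idx)  # [2 3]

# Sanity check: inserting each key at its index keeps d descending.
for i, k in zip(idx, keys):
    assert np.all(np.diff(np.insert(d, i, k)) <= 0)
```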

@MrBago
Contributor Author

MrBago commented May 31, 2013

@juliantaylor That's how this whole thing kind of started. The current implementation in master will copy d when you do d[::-1].searchsorted([1, 3]) (unless d happens to have a stride of -1). Once I dug into the code to figure out what was going on, I noticed that modifying it to search descending arrays was trivial, so I thought I'd give it a go.

@charris Here is a quick summary of my use case. I tried searching my array the way @juliantaylor suggested, but was surprised that searching an array for a single key is ~10 times slower when the array is not contiguous. Once I figured out what was going on, it wasn't hard to come up with a solution. I just did something like this to avoid making many copies:

def dist_from_right(array, value):
    return len(array) - array.searchsorted(value)

array = make_array()
search_array = array[::-1].copy(order='C')

while something:
    ...
    dist_from_right(search_array, value)
    ...
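A self-contained version of the workaround above (the descending input array here is a hypothetical stand-in for make_array() and the loop):

```python
import numpy as np

def dist_from_right(arr, value):
    # Distance from the right end of the ascending copy, i.e. the
    # insertion index measured from the left of the original
    # descending array.
    return len(arr) - arr.searchsorted(value)

# Hypothetical stand-in for make_array(): data arriving in
# descending order.
array = np.array([9, 7, 5, 3, 1])

# Reverse and copy once, up front, so the repeated lookups hit a
# contiguous buffer instead of forcing a copy inside every call.
search_array = array[::-1].copy(order='C')

for value in (4, 8):
    print(dist_from_right(search_array, value))  # 3, then 1
```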

The above code is kind of ugly, but it works. You're right to be concerned about cache effects; I've put a few benchmarks into a gist here: https://gist.github.com/MrBago/7d832248499596356039. These are the results I get on my machine:

master:
contiguous, one key:  0.0932681560516
non-contiguous, one key:  0.780670881271
large stride, one key:  2.27084803581
contiguous, many keys:  1.05450797081
non-contiguous, many keys:  1.07213401794
large stride, many keys:  1.1610429287

improve_searchsorted:
contiguous, one key:  0.0903561115265
non-contiguous, one key:  0.0925250053406
large stride, one key:  0.0927810668945
contiguous, many keys:  1.05225515366
non-contiguous, many keys:  1.06229496002
large stride, many keys:  1.43884396553

With a sufficiently large number of keys and a sufficiently large stride on the array, it's faster to copy before searching.

At this point I've become somewhat ambivalent about this change. I think the 'descending' keyword feels kind of icky. We could just merge the non-copy part of the PR, but there's the trade-off between my use case and the one that @charris brought up. We could also just merge the descending part (maybe those should have been separate pull requests). Anyway, let me know what you guys want to do with this.

@juliantaylor
Contributor

So far as I can see, this pull fixes the issue with the reversed-array copy. Do we still need the sortorder addition then?

It could at least be split into two pull requests, as the strided support seems like a plus in all cases to me, whereas the new API is debatable.

@juliantaylor
Contributor

@MrBago do you still want to follow up on this?

@MrBago
Contributor Author

MrBago commented Jul 10, 2013

I think the sortorder change is probably not necessary; I can break off the array-copy code and make a separate pull request for that if you guys think it's an improvement.

@seberg
Member

seberg commented Jul 10, 2013

The no-copy stuff seems like a very good idea to me. Now charris is right, of course: in the case of a huge needle and a small haystack it makes sense to do the copy anyway. I see you already did some timings; if you have some time, maybe it would be possible to define some simple heuristic for switching between modes when the needle is much larger than the haystack? At some point the copy will not be noticeable anyway...

@MrBago
Contributor Author

MrBago commented Jul 12, 2013

Simple heuristic? How about size(needle) > size(haystack)? Give me a few days to see if I can think of anything better and write a few benchmarks, then I'll make a new pull request. And since we're on the topic, can you guys think of any good reason to ever copy the haystack? Would iterating over the haystack with the NpyIter machinery be a worthwhile addition to the code? If so, I could also try and have a go at that.

@seberg
Member

seberg commented Jul 12, 2013

Good enough. It might be more something like size_needle * log(size_haystack) > constant * size_haystack, but it probably doesn't matter much, since I bet the speed difference is negligible in most cases anyway. However, it would be nice to have the fast extreme case.
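Sketched in Python (the constant c here is a made-up placeholder standing in for exactly the platform-dependent part):

```python
import math

def should_copy_haystack(n_needle, n_haystack, c=4.0):
    # Copy the haystack to a contiguous buffer only when the total
    # search work, n_needle * log2(n_haystack), dwarfs the
    # O(n_haystack) cost of making the copy.
    return n_needle * math.log2(n_haystack) > c * n_haystack

print(should_copy_haystack(10**6, 10**3))  # True: many keys, small haystack
print(should_copy_haystack(1, 10**6))      # False: one key, huge haystack
```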

@MrBago
Contributor Author

MrBago commented Jul 12, 2013

Any thoughts on using NpyIter to loop over the needle? (In my last post I said haystack; I meant needle.) Each item in the needle only gets used once, so I can't think of a situation where it would be advantageous to copy it.

@seberg
Member

seberg commented Jul 12, 2013

Ah, missed that part. It might add a slight overhead for very small arrays, but overall it should make sense. You could then have the iterator handle the dtype conversion and allocation of the result array as well, so it may even make the code more concise. Personally I'd like to see more of the new iterator being used, so if you'd like to have a look at it, great. But it probably doesn't make a big difference, so whichever you like is fine.

@MrBago MrBago closed this Aug 4, 2013
@charris
Member

charris commented Aug 4, 2013

What was the reason for the closure?

@MrBago
Contributor Author

MrBago commented Aug 4, 2013

I closed this because I accidentally updated this branch while I was still playing with stuff, I'm reopening it now.

Sorry it took me a while to get to this, but here is the current status. This pull request, as is, allows haystack.searchsorted(needle) to search a strided haystack when the needle is smaller than the haystack. If the needle is larger than the haystack, the latter is copied for better cache performance. I spent some time seeing if there was a better way to determine where to switch between the two behaviors, and I found that the interaction between the sizes of the haystack and needle, the strides, and the datatypes was pretty complex (not to mention that any constants involved will likely be platform dependent). I'm open to implementing another switching rule, but I can't think of a better one.

I also spent some time implementing searchsorted with NpyIter, but wasn't sure whether to include that here because it pretty much doubles the overhead of searchsorted for small arrays. The performance for large arrays, as far as I can tell, is unchanged. I can easily update this pull request if you guys think it's worth the change despite the hit to small arrays. Or I can open a new PR so we can discuss it separately. The changes are below if anyone wants to take a look.

MrBago/numpy@improve_searchsorted...use_npyiter_in_searchsorted

@MrBago MrBago reopened this Aug 4, 2013
@charris
Member

charris commented Aug 16, 2013

The drop of performance in small arrays is something we should look into. Sounds like it is best to leave that change out for the time being.

@njsmith @seberg Are you good to merge this?

NPY_ARRAY_DEFAULT | NPY_ARRAY_NOTSWAPPED,
NULL);
if (ap2 == NULL) {
if (PyArray_SIZE(ap2) > PyArray_SIZE(op1)) {
Contributor

I assume this is the switching logic for cache performance; this needs an in-source comment on why it's done.

@juliantaylor
Contributor

The searching uses a function pointer for the comparison; has anyone tested how much impact this indirection has compared to an inline compare for basic types?
It may be worthwhile to look at that for 1.9.

@charris
Member

charris commented Aug 16, 2013

@juliantaylor BTW, you now have commit rights. If you think that's like having bedbugs, let me know and I'll get rid of them.

@charris
Member

charris commented Aug 16, 2013

@MrBago Almost there, I think. The 1.8.0 branch is Aug 18.

@MrBago
Contributor Author

MrBago commented Aug 16, 2013

@juliantaylor I updated the comments, let me know if there is anything else. @charris Aug 18th is very soon, will this make it into 1.8?

@charris
Member

charris commented Aug 16, 2013

Ping travisbot.

@MrBago I think your chances are good.

@charris charris closed this Aug 16, 2013
@charris charris reopened this Aug 16, 2013
@juliantaylor
Contributor

Can you please rebase against master? That simplifies bisection with the submodules, which were added after this pull was opened.

NPY_ARRAY_DEFAULT | NPY_ARRAY_NOTSWAPPED,
NULL);
if (ap2 == NULL) {
/* Even though the array to be searched does not need to be contiguous,
Contributor

the numpy multiline comment style is

/*
 * comment
 */

I think it would be easier if the text described the code lines, since the variables are not named very well (maybe rename them to needle, haystack?), e.g.:

If the needle (ap2) is larger than the haystack (op1), we copy the haystack to a contiguous array for improved cache utilization.

@juliantaylor
Contributor

I found this odd behavior via a google search:

In [1]: import numpy as np
In [2]: d = np.arange(10001, dtype=np.uint64)
In [3]: %timeit np.searchsorted(d, 3583)
10000 loops, best of 3: 124 µs per loop
In [4]: %timeit np.searchsorted(d, np.uint64(3583))
10000 loops, best of 3: 19.2 µs per loop

It appears the Python integer causes the haystack to be copied to a double array even though it would fit into uint64.
Do you want to have a go at fixing that too?
Maybe after this PR has been merged.

@seberg
Member

seberg commented Aug 17, 2013

@juliantaylor, certainly nothing for this PR, but I don't think there is anything to be done there. I did not check, but it should make an np.int_, which is to be expected. Otherwise the conversion function would have to do value inspection, decide that it can be a uint too, and know that a uint would be preferable.

@seberg
Member

seberg commented Aug 17, 2013

Ah, so I guess it is cast to float because that is the only safe type in which to compare the two. It may be possible to do something for the scalar case, since np.result_type() actually suggests casting to uint then (by inspecting the integer value). Not sure it is worth trying to do anything, since this kind of mechanism only works for scalars anyway.
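The promotion being described can be checked directly (a quick illustration, not part of the PR):

```python
import numpy as np

d = np.arange(10001, dtype=np.uint64)

# Promoting the dtypes alone gives float64, which is why the uint64
# haystack ends up cast/copied for a plain Python int key:
print(np.promote_types(np.uint64, np.int_))  # float64

# For the scalar key itself, result_type settles on uint64 instead:
print(np.result_type(d, 3583))               # uint64
```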

@charris
Member

charris commented Aug 17, 2013

@MrBago If you do rebase, you should update your master first.

@charris
Member

charris commented Aug 17, 2013

OK, in it goes. Thanks.

charris added a commit that referenced this pull request Aug 17, 2013
@charris charris merged commit aeb13af into numpy:master Aug 17, 2013