Improve searchsorted #3107
@@ -1505,15 +1505,22 @@
     char *parr = PyArray_DATA(arr);
     char *pkey = PyArray_DATA(key);
     npy_intp *pret = (npy_intp *)PyArray_DATA(ret);
-    int elsize = PyArray_DESCR(arr)->elsize;
+    int elsize = PyArray_DESCR(key)->elsize;
What's the reason for this change?
Mostly for clarity. After these changes, elsize is only used to increment the pkey pointer. So I figured just to be consistent I should look up elsize from the key array (even though both arrays should be of the same type).
Oh, we still assume that the key array is contiguous? Is there a reason for that, then? :-)
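The contiguity assumption matters because advancing a pointer by `elsize` only reaches the next element when the stride equals the item size. A minimal NumPy sketch of that relationship (the array names are illustrative and do not come from the patch):

```python
import numpy as np

keys = np.arange(10, dtype=np.int64)
strided_keys = keys[::2]  # a view onto every second element, not a copy

# Contiguous: stepping by itemsize bytes lands on the next element.
contiguous_step_ok = keys.strides[0] == keys.itemsize

# Strided view: elements are 2 * itemsize bytes apart, so stepping by
# itemsize would read the elements that the slice skipped.
strided_step_ok = strided_keys.strides[0] == strided_keys.itemsize

print(contiguous_step_ok, strided_step_ok)
```

This is why code that steps by `elsize` has to either require a contiguous key array or read the stride explicitly.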
Two reasons. First, my naivete: because the key array could have any shape and any strides, I didn't know if there was an easy way to handle that in this context.
The second is that it didn't seem to me that copying the key array was as big of a problem. searchsorted is linear in the size of the key array, so making a temporary copy is a relatively small constant multiplier to the run time. Because searchsorted is log(N) in the size of the array being searched, making a linear-time copy of that array can dominate the run time for large arrays.
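The argument above can be made concrete with a rough cost model. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it counts comparisons and element moves while ignoring cache effects and constant factors.

```python
import math

def search_cost(n_haystack, n_keys):
    """Rough count of comparisons for n_keys binary searches."""
    return n_keys * max(1, math.ceil(math.log2(max(n_haystack, 2))))

def copy_cost(n_elements):
    """Rough count of element moves for one temporary copy."""
    return n_elements

n_haystack, n_keys = 1_000_000, 10
print(search_cost(n_haystack, n_keys))  # about 10 * 20 comparisons
print(copy_cost(n_haystack))            # 1,000,000 element moves
print(copy_cost(n_keys))                # 10 element moves
```

Under this model, copying the keys never costs more than the search itself, whereas copying a large searched array can cost far more than the lookups it serves.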
Can use PyArray_ITEMSIZE to get elsize.
Heh, and I usually have big arrays of keys and small arrays to look them up in. But in general I agree that the lookup times will dominate the key retrieval time.
Is there a way to step through an array with arbitrary strides that I could use here for the key array? We could at least stop copying read-only arrays; as @njsmith noted, NPY_ARRAY_DEFAULT checks for writeable arrays. I could include that in this pull request.
On Mar 1, 2013 5:38 PM, "Charles Harris" wrote:
> In numpy/core/src/multiarray/item_selection.c:
> -    int elsize = PyArray_DESCR(arr)->elsize;
> +    int elsize = PyArray_DESCR(key)->elsize;
>
> Heh, and I usually have big arrays of keys and small arrays to look them up in. But in general I agree that the lookup times will dominate the key retrieval time.
Using nditer would be the way to go for that, but it would add some overhead.
I've updated the pull request so that searchsorted now takes a sortorder keyword arg instead of auto-detecting the sort order. The keyword arg prevents the ambiguity that would otherwise arise when the user's intentions cannot be automatically determined. I've copied the machinery for the side keyword to add the sortorder keyword; please let me know if you think there is a cleaner way to do this.
I'm a bit concerned that searching non-contiguous arrays might be slower than making a copy in some cases due to cache effects, at least for medium-sized arrays. Have you done any benchmarks? Also, what use case do you have in mind for arrays in descending order? Just wondering how common they might be in practice.
I never used searchsorted, but can't you just make a view with the stride sign reversed and subtract the result from the shape to get descending order?
Does the sort order need to be a string?
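The reversed-view idea can be sketched as follows. The helper name `searchsorted_descending` is hypothetical and not part of NumPy; the one subtlety it illustrates is that reversing the array also swaps the meaning of `side`.

```python
import numpy as np

def searchsorted_descending(desc, values, side="left"):
    """Insertion indices that keep `desc` sorted in descending order."""
    asc_view = desc[::-1]  # negative stride, shares memory with desc
    # Reversing the array also swaps which side ties land on.
    flipped = "right" if side == "left" else "left"
    return len(desc) - np.searchsorted(asc_view, values, side=flipped)

desc = np.array([9, 7, 5, 3, 1])
idx = searchsorted_descending(desc, [6, 5, 0])
print(idx)
```

Whether the reversed view is searched in place or copied internally depends on the NumPy version in use.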
@juliantaylor That's how this whole thing kind of started. The current implementation in master will copy

@charris Here is a quick summary of my use case. I tried searching my array the way @juliantaylor suggested, but was surprised that searching an array for a single key is ~10 times slower when the array is not contiguous. Once I figured out what was going on, it wasn't hard to come up with a solution. I just did something like this to avoid many copies:

    def dist_from_right(array, value):
        return len(array) - array.searchsorted(value)

    array = make_array()
    search_array = array[::-1].copy(order='C')
    while something:
        ...
        dist_from_right(search_array, value)
        ...

The above code is kind of ugly, but it works. You're right to be concerned about cache effects. I've put a few benchmarks into a gist here: https://gist.github.com/MrBago/7d832248499596356039. These are the results I get on my machine:
With a sufficiently large number of keys and a sufficiently large stride on the array, it's faster to copy before searching. At this point I've become somewhat ambivalent about this change. I think the 'descending' keyword feels kind of icky. We could just merge the non-copy part of the PR, but there's the trade-off between my use case and the one that @charris brought up. We could also just merge the descending part (maybe those should have been separate pull requests). Anyway, let me know what you guys want to do with this.
As far as I can see, this pull request fixes the issue with the reversed array copy. It could at least be split into two pull requests, as the strided support seems like a plus in all cases to me, whereas the new API is debatable.
@MrBago do you still want to follow up on this?
I think the searchsorted change is probably not necessary. I can break off the array copy code and make a separate pull request for that if you guys think it's an improvement.
The no-copy stuff seems like a very good idea to me. Now charris is right, of course: in the case of a huge needle and a small haystack it makes sense to do the copy anyway. I see you already did some timings. If you have some time, maybe it would be possible to define some simple heuristic for switching between modes when the needle is much larger than the haystack? At some point the copy will not be noticeable anyway...
Simple heuristic? How about
Good enough. It might be more something like
Any thoughts on using NpyIter to loop over the needle? (In my last post I said haystack; I meant needle.) Each item in the needle only gets used once, so I can't think of a situation where it would be advantageous to copy it.
Ah, missed that part. It might add a slight overhead for very small arrays, but overall it should make sense. You could then have the iterator handle the dtype conversion and allocation of the result array as well, so it may even make the code more concise. So if you'd like to have a look at it, personally I like to see more of the new iterator being used. But it probably doesn't make a big difference, so whichever you like is great.
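The discussion concerns the C-level NpyIter, but the same idea can be sketched with its Python counterpart, `np.nditer`. The function name `searchsorted_nditer` is hypothetical, and the per-element Python loop is far slower than the C version would be. The sketch only shows the iterator walking a strided needle and allocating the result array itself.

```python
import numpy as np

def searchsorted_nditer(haystack, needle):
    """Look up each key of `needle` without first copying `needle`."""
    needle = np.asarray(needle)
    it = np.nditer(
        [needle, None],
        op_flags=[["readonly"], ["writeonly", "allocate"]],
        op_dtypes=[needle.dtype, np.intp],
    )
    with it:
        for key, out in it:
            # The iterator follows the needle's strides; no copy is made.
            out[...] = np.searchsorted(haystack, key)
        return it.operands[1]  # result array allocated by the iterator

haystack = np.array([1, 3, 5, 7, 9])
needle = np.arange(12).reshape(3, 4)[:, ::2]  # non-contiguous keys
result = searchsorted_nditer(haystack, needle)
print(result)
```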
What was the reason for the closure?
I closed this because I accidentally updated this branch while I was still playing with stuff; I'm reopening it now. Sorry it took me a while to get to this, but here is the current status. This pull request as is allows I also spent some time implementing searchsorted with NpyIter, but wasn't sure whether to include that here because it pretty much doubles the overhead of searchsorted for small arrays. The performance for large arrays, as far as I can tell, is unchanged. I can easily update this pull request if you guys think it's worth the change despite the hit to small arrays. Or I can open a new PR so we can discuss it separately. The changes are below if anyone wants to take a look. MrBago/numpy@improve_searchsorted...use_npyiter_in_searchsorted
NPY_ARRAY_DEFAULT | NPY_ARRAY_NOTSWAPPED,
NULL);
if (ap2 == NULL) {
if (PyArray_SIZE(ap2) > PyArray_SIZE(op1)) {
I assume this is the switching logic for cache performance. This needs an in-source comment explaining why it's done.
The searching uses a function pointer for the comparison. Has anyone tested how much this indirection costs compared to an inline compare for basic types?
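The switching logic under review can be mirrored in Python as a rough sketch. The wrapper name `searchsorted_heuristic` is hypothetical. The size comparison follows the `PyArray_SIZE(ap2) > PyArray_SIZE(op1)` test in the diff, and the cache-hit rationale in the comments is the assumption raised in this thread, not a measured result.

```python
import numpy as np

def searchsorted_heuristic(haystack, needle):
    """Copy a strided haystack only when there are more keys than elements."""
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    if needle.size > haystack.size:
        # Many lookups: one contiguous copy should pay for itself in cache hits.
        haystack = np.ascontiguousarray(haystack)
    # Few lookups: search the strided data in place and skip the copy.
    return np.searchsorted(haystack, needle)

strided = np.arange(0, 100)[::10]  # 10 elements: 0, 10, ..., 90
print(searchsorted_heuristic(strided, [25]))
print(searchsorted_heuristic(strided, np.arange(50)))
```

Both branches return the same indices; only the memory access pattern differs.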
@juliantaylor BTW, you now have commit rights. If you think that's like having bedbugs, let me know and I'll get rid of them.
@MrBago Almost there, I think. The 1.8.0 branch is Aug 18.
@juliantaylor I updated the comments, let me know if there is anything else. @charris Aug 18th is very soon, will this make it into 1.8?
Ping travibot. @MrBago I think your chances are good.
Can you please rebase against master? That simplifies bisection with the submodules, which were added after this pull request was opened.
NPY_ARRAY_DEFAULT | NPY_ARRAY_NOTSWAPPED,
NULL);
if (ap2 == NULL) {
/* Even though the array to be searched does not need to be contiguous,
The numpy multiline comment style is
/*
 * comment
 */
I think it would be easier if the text described the code lines, as the variables are not named very well (maybe rename to needle, haystack?), e.g.:
If the needle (ap2) is larger than the haystack (op1), we copy the haystack to a contiguous array for improved cache utilization.
I found this odd behavior via a Google search:
It appears the Python integer causes the haystack to be copied to a double array even if it would fit into uint64.
@juliantaylor, certainly nothing for this PR, but I don't think there is anything to be done there. I did not check, but it should make an
Ah, so I guess it is cast to float because it is the only safe type to compare the two. It may be possible to do something for the scalar case, since
@MrBago If you do rebase, you should update your master first.
OK, in it goes. Thanks.
This pull request allows numpy.searchsorted to search non-contiguous arrays without making a copy, and it allows numpy.searchsorted to search arrays sorted in descending order. This is my first numpy pull request, so let me know if I'm doing something wrong in terms of the code or in terms of requesting a PR.