
Improve searchsorted #3107

Merged
merged 5 commits into numpy:master on Aug 17, 2013

Conversation

MrBago
Contributor

@MrBago MrBago commented Mar 1, 2013

This pull request allows numpy.searchsorted to search non-contiguous arrays without making a copy, and it allows numpy.searchsorted to search arrays sorted in descending order. This is my first numpy pull request, so let me know if I'm doing something wrong in terms of the code or in terms of requesting a PR.
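Concretely, the strided case being fixed looks like this (a small sketch, assuming a numpy with this change applied):

```python
import numpy as np

# A strided view: still sorted, but not contiguous in memory.
a = np.arange(20)[::2]
assert not a.flags['C_CONTIGUOUS']

# With this change the strided data is searched directly, instead of
# first being copied into a contiguous temporary buffer.
idx = a.searchsorted(7)
print(idx)  # 4
```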

@@ -1505,15 +1505,22 @@
     char *parr = PyArray_DATA(arr);
     char *pkey = PyArray_DATA(key);
     npy_intp *pret = (npy_intp *)PyArray_DATA(ret);
-    int elsize = PyArray_DESCR(arr)->elsize;
+    int elsize = PyArray_DESCR(key)->elsize;
Member

What's the reason for this change?

Contributor Author

Mostly for clarity. After these changes, elsize is only used to increment the pkey pointer. So I figured just to be consistent I should look up elsize from the key array (even though both arrays should be of the same type).

Member

Oh, we still assume that the key array is contiguous? Is there a reason for that, then? :-)

Contributor Author

Two reasons. First, my naivete: because the key array could have any shape and any strides, I didn't know if there was an easy way to handle that in this context.

The second is that copying the key array didn't seem to me to be as big of a problem. searchsorted is linear in the size of the key array, so making a temporary copy adds a relatively small constant multiplier to the run time. Because searchsorted is log(N) in the size of the array being searched, making a linear-time copy of that array can dominate the run time for large arrays.

Member

Can use PyArray_ITEMSIZE to get elsize.

Member

Heh, and I usually have big arrays of keys and small arrays to look them up in. But in general I agree that the lookup times will dominate the key retrieval time.

Contributor Author

Is there a way to step through an array with arbitrary strides that I could use here for the key array? We could at least stop copying read-only arrays; as @njsmith noted, NPY_ARRAY_DEFAULT checks for writeable arrays. I could include that in this pull request.

Member

Using nditer would be the way to go for that, but it would add some overhead.
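For reference, stepping through an arbitrarily strided array from Python looks like this with np.nditer (the C-level NpyIter API is analogous):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)[:, ::2]   # non-contiguous view
assert not a.flags['C_CONTIGUOUS']

# nditer steps through the elements whatever the shape and strides,
# without copying the array first.
vals = [int(x) for x in np.nditer(a, order='C')]
print(vals)  # [0, 2, 4, 6, 8, 10]
```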

@MrBago
Contributor Author

MrBago commented May 17, 2013

I've updated the pull request so that searchsorted now takes a sortorder keyword arg instead of auto-detecting the sort order. The keyword arg prevents the ambiguity that would otherwise arise when the user's intentions cannot be automatically determined. I've copied the machinery for the side keyword to add the sortorder keyword; please let me know if you think there is a cleaner way to do this.

@charris
Member

charris commented May 18, 2013

I'm a bit concerned that searching non-contiguous arrays might be slower than making a copy in some cases due to cache effects, at least for medium-sized arrays. Have you done any benchmarks? Also, what use case do you have in mind for arrays in descending order? Just wondering how common they might be in practice.

@juliantaylor
Contributor

I've never used searchsorted, but can't you just make a view with the stride sign reversed and subtract the result from the shape to get descending order?

 np.asarray(d.shape) - d[::-1].searchsorted([1, 3]) - 1

Does the sort order need to be a string? There are only two orders, so one could use reverse=True/False like Python does.
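For the record, the reversed-view trick works out to something like the following sketch; the exact offset depends on which insertion side you want, which is part of the motivation for an explicit keyword:

```python
import numpy as np

d = np.array([9, 7, 5, 3, 1])   # sorted descending
asc = d[::-1]                   # ascending view, no copy

keys = np.array([5, 4])
# Left-style insertion points that keep d descending, recovered from a
# right-side search on the reversed view.
idx = len(d) - asc.searchsorted(keys, side='right')
print(idx)  # [2 3]

# Sanity check: inserting each key at its index keeps d descending.
for i, k in zip(idx, keys):
    assert np.all(np.diff(np.insert(d, i, k)) <= 0)
```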

@MrBago
Contributor Author

MrBago commented May 31, 2013

@juliantaylor That's how this whole thing kind of started. The current implementation in master will copy d when you do d[::-1].searchsorted([1, 3]) (unless d happens to have a stride of -1). Once I dug into the code to figure out what was going on, I noticed that modifying it to search descending arrays was trivial, so I thought I'd give it a go.

@charris Here is a quick summary of my use case. I tried searching my array the way @juliantaylor suggested, but was surprised that searching an array for a single key is ~10 times slower when the array is not contiguous. Once I figured out what was going on, it wasn't hard to come up with a solution. I just did something like this to avoid making many copies:

def dist_from_right(array, value):
    return len(array) - array.searchsorted(value)

array = make_array()
search_array = array[::-1].copy(order='C')

while something:
    ...
    dist_from_right(search_array, value)
    ...
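A self-contained version of the workaround above (the descending input array here is a hypothetical stand-in for make_array() and the loop):

```python
import numpy as np

def dist_from_right(arr, value):
    # Distance from the right end of the ascending copy, i.e. the
    # insertion index measured from the left of the original
    # descending array.
    return len(arr) - arr.searchsorted(value)

# Hypothetical stand-in for make_array(): data arriving in
# descending order.
array = np.array([9, 7, 5, 3, 1])

# Reverse and copy once, up front, so the repeated lookups hit a
# contiguous buffer instead of forcing a copy inside every call.
search_array = array[::-1].copy(order='C')

for value in (4, 8):
    print(dist_from_right(search_array, value))  # 3, then 1
```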

The above code is kind of ugly, but it works. You're right to be concerned about cache effects; I've put a few benchmarks into a gist here: https://gist.github.com/MrBago/7d832248499596356039. These are the results I get on my machine:

master:
contiguous, one key:  0.0932681560516
non-contiguous, one key:  0.780670881271
large stride, one key:  2.27084803581
contiguous, many keys:  1.05450797081
non-contiguous, many keys:  1.07213401794
large stride, many keys:  1.1610429287

improve_searchsorted:
contiguous, one key:  0.0903561115265
non-contiguous, one key:  0.0925250053406
large stride, one key:  0.0927810668945
contiguous, many keys:  1.05225515366
non-contiguous, many keys:  1.06229496002
large stride, many keys:  1.43884396553

With a sufficiently large number of keys and a sufficiently large stride on the array, it's faster to copy before searching.

At this point I've become somewhat ambivalent about this change. I think the 'descending' keyword feels kind of icky. We could just merge the non-copy part of the PR, but there's the trade-off between my use case and the one that @charris brought up. We could also just merge the descending part (maybe those should have been separate pull requests). Anyway, let me know what you guys want to do with this.

@juliantaylor
Contributor

So far as I can see, this pull fixes the issue with the reversed-array copy. Do we still need the sortorder addition then?

It could at least be split into two pull requests, as the strided support seems like a plus in all cases to me, whereas the new API is debatable.

@juliantaylor
Contributor

@MrBago do you still want to follow up on this?

@MrBago
Contributor Author

MrBago commented Jul 10, 2013

I think the sortorder change is probably not necessary; I can break off the array-copy code and make a separate pull request for that if you guys think it's an improvement.

@seberg
Member

seberg commented Jul 10, 2013

The no-copy stuff seems like a very good idea to me. Now charris is right, of course: in the case of a huge needle and a small haystack it makes sense to do the copy anyway. I see you already did some timings; if you have some time, maybe it would be possible to define some simple heuristic for switching between modes when the needle is much larger than the haystack? At some point the copy will not be noticeable anyway...

@MrBago
Contributor Author

MrBago commented Jul 12, 2013

Simple heuristic? How about size(needle) > size(haystack)? Give me a few days to see if I can think of anything better and write a few benchmarks, then I'll make a new pull request. And since we're on the topic, can you guys think of any good reason to ever copy the haystack? Would iterating over the haystack with the NpyIter machinery be a worthwhile addition to the code? If so, I could also try and have a go at that.

@seberg
Member

seberg commented Jul 12, 2013

Good enough. It might be more something like size_needle * log(size_haystack) > constant * size_haystack, but it probably doesn't matter much, since I bet the speed difference is negligible in most cases anyway. However, it would be nice to have the fast extreme case.
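Sketched in Python (the constant c here is a made-up placeholder standing in for exactly the platform-dependent part):

```python
import math

def should_copy_haystack(n_needle, n_haystack, c=4.0):
    # Copy the haystack to a contiguous buffer only when the total
    # search work, n_needle * log2(n_haystack), dwarfs the
    # O(n_haystack) cost of making the copy.
    return n_needle * math.log2(n_haystack) > c * n_haystack

print(should_copy_haystack(10**6, 10**3))  # True: many keys, small haystack
print(should_copy_haystack(1, 10**6))      # False: one key, huge haystack
```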

@MrBago
Contributor Author

MrBago commented Jul 12, 2013

Any thoughts on using NpyIter to loop over the needle? (In my last post I said haystack; I meant needle.) Each item in the needle only gets used once, so I can't think of a situation where it would be advantageous to copy it.

@seberg
Member

seberg commented Jul 12, 2013

Ah, missed that part. It might add a slight overhead for very small arrays, but overall it should make sense. You could then have the iterator handle the dtype conversion and allocation of the result array as well, so it may even make the code more concise. Personally I'd like to see more of the new iterator being used, so if you'd like to have a look at it, great. But it probably doesn't make a big difference, so whichever you like is fine.

@MrBago MrBago closed this Aug 4, 2013
@charris
Member

charris commented Aug 4, 2013

What was the reason for the closure?

@MrBago
Contributor Author

MrBago commented Aug 4, 2013

I closed this because I accidentally updated this branch while I was still playing with stuff, I'm reopening it now.

Sorry it took me a while to get to this, but here is the current status. This pull request, as is, allows haystack.searchsorted(needle) to search a strided haystack when the needle is smaller than the haystack. If the needle is larger than the haystack, the latter is copied for better cache performance. I spent some time seeing if there was a better way to determine where to switch between the two behaviors, and I found that the interaction between the sizes of the haystack and needle, the strides, and the datatypes was pretty complex (not to mention that any constants involved will likely be platform dependent). I'm open to implementing another switching rule, but I can't think of a better one.

I also spent some time implementing searchsorted with NpyIter, but wasn't sure whether to include that here because it pretty much doubles the overhead of searchsorted for small arrays. The performance for large arrays, as far as I can tell, is unchanged. I can easily update this pull request if you guys think it's worth the change despite the hit to small arrays. Or I can open a new PR so we can discuss it separately. The changes are below if anyone wants to take a look.

MrBago/numpy@improve_searchsorted...use_npyiter_in_searchsorted

@MrBago MrBago reopened this Aug 4, 2013
@charris
Member

charris commented Aug 16, 2013

The drop of performance in small arrays is something we should look into. Sounds like it is best to leave that change out for the time being.

@njsmith @seberg Are you good to merge this?

NPY_ARRAY_DEFAULT | NPY_ARRAY_NOTSWAPPED,
NULL);
if (ap2 == NULL) {
if (PyArray_SIZE(ap2) > PyArray_SIZE(op1)) {
Contributor

I assume this is the switching logic for cache performance; this needs an in-source comment on why it's done.

@juliantaylor
Contributor

The searching uses a function pointer for the comparison; has anyone tested how much impact this indirection has compared to an inline compare for basic types?
It may be worthwhile to look at that for 1.9.

@charris
Member

charris commented Aug 16, 2013

@juliantaylor BTW, you now have commit rights. If you think that's like having bedbugs, let me know and I'll get rid of them.

@charris
Member

charris commented Aug 16, 2013

@MrBago Almost there, I think. The 1.8.0 branch is Aug 18.

@MrBago
Contributor Author

MrBago commented Aug 16, 2013

@juliantaylor I updated the comments, let me know if there is anything else. @charris Aug 18th is very soon, will this make it into 1.8?

@charris
Member

charris commented Aug 16, 2013

Ping travisbot.

@MrBago I think your chances are good.

@charris charris closed this Aug 16, 2013
@charris charris reopened this Aug 16, 2013
@juliantaylor
Contributor

Can you please rebase against master? That simplifies bisection with the submodules, which were added after this pull was opened.

NPY_ARRAY_DEFAULT | NPY_ARRAY_NOTSWAPPED,
NULL);
if (ap2 == NULL) {
/* Even though the array to be searched does not need to be contiguous,
Contributor

the numpy multiline comment style is

/*
 * comment
 */

I think it would be easier if the text described the code lines, since the variables are not named very well (maybe rename them to needle, haystack?), e.g.:

If the needle (ap2) is larger than the haystack (op1), we copy the haystack to a contiguous array for improved cache utilization.

@juliantaylor
Contributor

I found this odd behavior via a google search:

In [1]: import numpy as np
In [2]: d = np.arange(10001, dtype=np.uint64)
In [3]: %timeit np.searchsorted(d, 3583)
10000 loops, best of 3: 124 µs per loop
In [4]: %timeit np.searchsorted(d, np.uint64(3583))
10000 loops, best of 3: 19.2 µs per loop

It appears the Python integer causes the haystack to be copied to a double array even though it would fit into uint64.
Do you want to have a go at fixing that too?
Maybe after this PR has been merged.

@seberg
Member

seberg commented Aug 17, 2013

@juliantaylor, certainly nothing for this PR, but I don't think there is anything to be done there. I did not check, but it should make an np.int_, which is to be expected. Otherwise the conversion function would have to do value inspection, decide that it can be a uint too, and know that a uint would be preferable.

@seberg
Member

seberg commented Aug 17, 2013

Ah, so I guess it is cast to float because that is the only safe type in which to compare the two. It may be possible to do something for the scalar case, since np.result_type() actually suggests casting to uint then (by inspecting the integer value). Not sure it is worth trying to do anything, since this kind of mechanism only works for scalars anyway.
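The promotion being described can be checked directly (a quick illustration, not part of the PR):

```python
import numpy as np

d = np.arange(10001, dtype=np.uint64)

# Promoting the dtypes alone gives float64, which is why the uint64
# haystack ends up cast/copied for a plain Python int key:
print(np.promote_types(np.uint64, np.int_))  # float64

# For the scalar key itself, result_type settles on uint64 instead:
print(np.result_type(d, 3583))               # uint64
```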

@charris
Member

charris commented Aug 17, 2013

@MrBago If you do rebase, you should update your master first.

@charris
Member

charris commented Aug 17, 2013

OK, in it goes. Thanks.

charris added a commit that referenced this pull request Aug 17, 2013
@charris charris merged commit aeb13af into numpy:master Aug 17, 2013