Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider support for vectorized binary search with sorted needles #691

Open
jaredhoberock opened this issue Jan 28, 2013 · 8 comments
Open
Labels
thrust For all items related to Thrust.

Comments

@jaredhoberock
Copy link

When the needles are sorted, we can significantly accelerate the search.

To indicate that the needles are sorted, we can receive an additional Compare parameter:

template<typename ForwardIterator, typename InputIterator, typename OutputIterator, typename Compare1, typename Compare2>
  OutputIterator
    lower_bound(ForwardIterator haystack_first,
                ForwardIterator haystack_last,
                InputIterator needles_first,
                InputIterator needles_last,
                OutputIterator result,
                Compare1 haystack_comp,
                Compare2 needles_comp);
@jaredhoberock
Copy link
Author

The above trick won't work because both the haystack and the needles need to be sorted wrt to the same compare (like, say, merge).

@jaredhoberock
Copy link
Author

It seems difficult to expose this functionality without inventing a new name for it.

boost::flat_map's constructor uses a tag to indicate that a range is sorted for a similar problem:

template<typename InputIterator> 
  flat_map(ordered_unique_range_t, InputIterator, InputIterator, 
           const Pred & = Pred(), 
           const allocator_type & = allocator_type());

D's standard library has no binary search functions analogous to C++ (or functions for doing many searches in bulk like Thrust). Instead, it has a single find function and relies on ranges being wrapped in special SortedRange wrappers:

auto a = [ 1, 2, 3, 42, 52, 64 ];
auto r = assumeSorted(a);
assert(r.canFind(3));
assert(!r.canFind(32));

It doesn't seem to have a bulk "find these needles in this haystack" algorithm.

@jaredhoberock
Copy link
Author

The structure of this problem is very similar to the set operations: both consume two sorted lists and produce a single result. In particular, this operation is close to set_intersection in some ways. However, it doesn't seem possible to express this operation with a lowering to an existing set operation.

Moreover, it seems likely that this algorithm often would be applied in contexts where the data would also be consumed by the set algorithms.

How about set_lower_bound:

template<typename InputIterator1, typename InputIterator2, typename OutputIterator, typename Compare>
  OutputIterator
    set_lower_bound(InputIterator1 first1, InputIterator last1,
                    InputIterator2 first2, InputIterator last2,
                    OutputIterator result,
                    Compare comp);

In this formulation, the [first1,last1) range would be the "needles" range: for each element e in the range [first1,last1), this algorithm writes to the result the lowest position i in the range [first2,last2) such that e could be inserted without disturbing the ordering.

On the other hand, if the result was used to actually do the hypothetical insertion, it would not necessarily be equivalent to the output of any of the standard set algorithms. For example, the insertion would not produce the result of set_union (due to special effort required by the multiset semantics), or even merge (because multiple needles would collide at the same location) without some additional effort.

@jaredhoberock
Copy link
Author

There's also sorted_lower_bound:

template<typename ForwardIterator, typename InputIterator, typename OutputIterator, typename Compare1>
  OutputIterator
    sorted_lower_bound(ForwardIterator haystack_first,
                       ForwardIterator haystack_last,
                       InputIterator needles_first,
                       InputIterator needles_last,
                       OutputIterator result,
                       Compare comp);

I don't feel like this name really captures what's going on here.

@jaredhoberock
Copy link
Author

Here are some ideas from Michael:

My immediate reaction is that a new name is warranted.  Some ideas:

all_lower_bounds -- because we're finding several

simultaneous_lower_bounds -- find them all at once; but this is verbose

each_lower_bound -- yet another variant of that idea

partition_with -- we're using the "needles" to partition the "hay stack"

merge_rank -- we're computing the position of all elements of one sorted sequence in the other.  Like sort, merge can be decomposed into compute ranks + permute.  This is the rank step of that process.

@jaredhoberock
Copy link
Author

lower_merge_rank and upper_merge_rank might be a reasonable compromise, though the merge aspect of the name seems like it would be a red herring.

@andrewcorrigan
Copy link
Contributor

My code uses lower_bound and upper_bound with needles sorted all the time, so I'm definitely excited to hear that there might be further performance to be gained. For what it's worth, I think that lower_merge_rank/upper_merge_rank make sense given the explanation you quoted above.

[off-topic] I really like the search functions using arguments named using 'needles' and 'haystack' Are you considering using those names in the Thrust source code?

@jaredhoberock
Copy link
Author

I came across the "needles" and "haystack" nomenclature in The D Programming Language. Dunno where it originated. We'd probably use it if we add this algorithm.

kshitij12345 referenced this issue in kshitij12345/thrust Mar 24, 2022
…cent_difference

Port adjacent difference into CUB
@jrhemstad jrhemstad added the thrust For all items related to Thrust. label Feb 22, 2023
@jarmak-nv jarmak-nv transferred this issue from NVIDIA/thrust Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
thrust For all items related to Thrust.
Projects
Status: Todo
Development

No branches or pull requests

3 participants