Add k-way merge adaptor. #97
    I::Item: PartialOrd
{
    fn partial_cmp(&self, other: &NonEmpty<I>) -> Option<Ordering> {
        self.head.partial_cmp(&other.head).map(Ordering::reverse)
Instead of mapping reverse, I'd just use other.head.partial_cmp(&self.head) here. Simpler and less noise.
What's more important is that it should implement all of lt, le, gt and ge. Implementing the specific comparison operators should have a noticeable effect in benchmarks. BinaryHeap uses >, >= and <=, it looks like (we just impl all).
A comment somewhere that this type implements comparisons reversed to be used in a min heap would be good.
Hey, this is interesting. We want this in itertools. @frankmcsherry talked to me about the same algorithm, but here you are with the first PR, and that's completely OK. I'm just wondering if we can glean some tricks from his implementation in https://github.com/frankmcsherry/differential-dataflow/blob/master/src/iterators/merge.rs. The main trick is that it doesn't use the binary heap at all, so that it can be a bit more efficient. But we don't need to do that optimization now. I don't think there is anything in BinaryHeap that lets us do something similar. I guess the quickcheck tests are still not working? I should fix that. This thing definitely deserves a quickcheck test.
Cool. Sorry about cutting in line here; I was extra careful to check for open pull requests and issues. @frankmcsherry's implementation does look similar, but might manage to cut a few corners here and there, although it does implement half of a heap somewhere in there. I would be interested in seeing the difference in performance. Ideally, I would like to be able to get around the … The quickcheck tests are indeed broken; I'll get to work on your remarks now.
I'm fixing quickcheck; it's going to be without quickcheck_macros, since syntax extensions break too often and that's annoying.
OK. I have tried to address your remarks in the last two commits.
Great. Did you see the idea about implementing lt, le, gt, ge? I think it makes a difference.
Nope, skipped right past it. I have added explicit implementations, but I do not observe a significant change in the benchmark results.
Oh 😞. I'm a bit surprised; it might make a difference for other element types. Thank you anyway.
I want to merge this; I don't have time today, but I'll get to it. In fact, I would remove the kmerge method from the Itertools trait (this is not an operation on a single iterator, imo). I would also fix the bounds on Clone to not use NonEmpty at all.
Yeah, I did not like NonEmpty showing up in the public interface the way it did. Looks like I simply gave up too soon when the compiler would not stop complaining about missing impls. You motivated me to take another stab at it and, lo and behold, NonEmpty is gone from the bounds and can now be made private.
I had an idea for the binary heap in an unrelated algorithm. Since … This PR is completely fine, though.
I disagree about … That way, one can think of …
@pczarn Your concern about moving around instances of … I am not entirely sure what increase-key and decrease-key operations are (I have not had a lot of formal CS training), but I think I have accidentally used them in my experiments. I will write more in a separate reply.
@bluss So, I have been looking at the McSherry implementation, and the main difference seems to be that it does not pop the largest element, modify it and push it back (incurring a … Since I cannot get at the guts of …
    pub fn pop_push<F>(&mut self, f: F)
        where F: FnOnce(Option<T>) -> Option<T>;
    pub fn pop_push_back<F>(&mut self, f: F)
        where F: FnOnce(T) -> Option<T>;
They pass the top element of the heap to … Using this new API cuts run time in half in the benchmarks. How best to proceed with this? Is this API something that should be made part of the …
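The pop_push idea above (update the top element in place instead of a pop followed by a push) can be approximated with std's BinaryHeap::peek_mut, which hands out a PeekMut guard on the top element; if the element is modified, heap order is restored when the guard is dropped, at O(log n) cost. PeekMut::pop removes the top element instead. This is a sketch with that std API, not the pop_push API proposed in the thread; the function name merge_sorted and the (value, source, position) entry layout are hypothetical, and Reverse turns the max-heap into a min-heap.

```rust
use std::cmp::Reverse;
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

/// Hypothetical helper: merge already-sorted vectors in ascending order.
/// The heap holds one Reverse((value, source index, position)) entry per
/// input that still has items left.
fn merge_sorted(inputs: &[Vec<i32>]) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    for (src, v) in inputs.iter().enumerate() {
        if let Some(&first) = v.first() {
            heap.push(Reverse((first, src, 0usize)));
        }
    }

    let mut out = Vec::new();
    while let Some(mut top) = heap.peek_mut() {
        let Reverse((value, src, pos)) = *top;
        out.push(value);
        match inputs[src].get(pos + 1) {
            // Overwrite the top entry in place; dropping `top` at the end of
            // the loop body restores heap order, instead of pop + push.
            Some(&next) => *top = Reverse((next, src, pos + 1)),
            // This input is exhausted: remove its entry from the heap.
            None => {
                PeekMut::pop(top);
            }
        }
    }
    out
}
```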
It's cool, we don't need to do any optimizations now (#97 (comment)), as long as we have an API that permits them later (which we have). The indirection suggestion is pretty interesting, but McSherry's improvement is the most important part. Until now I've preferred not to depend on any special-case data structures in itertools. If we include it, I imagine we'd use a very stripped-down version of the binary heap.
Are there any open questions left here, or is getting this merged just a matter of you finding the time to do it?
There aren't; I was just wondering where the contains-rs discussion would go.
Heh, wherever it is going, it is not going there fast.
I'm sorry that it's already been a week; I haven't put in as much time as I used to. We'll merge this, then I can push my quickcheck fix too.
No need to feel sorry. I assume we both do this in our free time. It was not my intention to rush you; I just wanted to make sure I had not missed one of your suggestions again.
Thank you! Issue #98 is the follow-up issue for future improvements.
Merges an arbitrary number of iterators in ascending order.
Uses std's BinaryHeap to decide which iterator to take from next. This seems quite heavyweight: two-way merge benchmarks take roughly ten times longer than the dedicated two-way merge adaptor. Profiling identifies BinaryHeap's sift_up as the hot spot.
Not completely sure about the interface, the double use of IntoIterator in the free-standing function in particular.
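To make the "double use of IntoIterator" concrete, here is a minimal sketch of a free-standing k-way merge built on std's BinaryHeap with a pop-then-push step, as the description outlines. The names kmerge and KMerge mirror the thread, but the struct layout is an assumption, not the PR's code: the iterators live in a Vec and the heap holds only Reverse((head, index)) pairs, so Reverse makes the max-heap pop the smallest head and sifting moves small values rather than whole iterators. It assumes Ord items rather than PartialOrd.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical k-way merge adaptor. `iters[i]` holds the remaining items of
/// input i; the heap stores at most one (head, i) entry per input.
struct KMerge<I: Iterator> {
    iters: Vec<I>,
    heap: BinaryHeap<Reverse<(I::Item, usize)>>,
}

/// Free-standing constructor: `iterable` yields values that are themselves
/// IntoIterator, hence IntoIterator appears twice in the bounds.
fn kmerge<I>(iterable: I) -> KMerge<<I::Item as IntoIterator>::IntoIter>
where
    I: IntoIterator,
    I::Item: IntoIterator,
    <I::Item as IntoIterator>::Item: Ord,
{
    let mut iters = Vec::new();
    let mut heap = BinaryHeap::new();
    for (index, inner) in iterable.into_iter().enumerate() {
        let mut it = inner.into_iter();
        // Only non-empty inputs get a heap entry, keyed by their first item.
        if let Some(head) = it.next() {
            heap.push(Reverse((head, index)));
        }
        iters.push(it);
    }
    KMerge { iters, heap }
}

impl<I> Iterator for KMerge<I>
where
    I: Iterator,
    I::Item: Ord,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Take the smallest head, then refill from the input it came from.
        let Reverse((head, index)) = self.heap.pop()?;
        if let Some(next) = self.iters[index].next() {
            self.heap.push(Reverse((next, index)));
        }
        Some(head)
    }
}
```

Because ties are broken by input index, equal items come out in the order of the inputs they came from. Each call to next pays for a pop and a push, which is the cost the thread's pop_push discussion is trying to avoid.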