feat(storage): implement order aware merge iterator #1925
Conversation
…in HummockIterator
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #1925      +/-   ##
==========================================
+ Coverage   70.61%   70.90%   +0.28%
==========================================
  Files         611      619       +8
  Lines       79994    80229     +235
==========================================
+ Hits        56491    56889     +398
+ Misses      23503    23340     -163
==========================================
```
```rust
.partition_point(|table| match Self::Direction::direction() {
    DirectionEnum::Forward => {
```
Can we ensure that this `match` can be resolved at compile time?
We may add `const fn` to the `direction()` function.
It seems the function in the trait cannot be made `const`. We may fall back to const generics. You may need something like this @wenym1
I have added `#[inline(always)]` on `direction()`. Can this possibly be resolved at compile time?
Not sure about that...
I just ran a simple bench on this discussion. It seems that dispatching on the enum returned from the function call, regardless of whether it is `#[inline]` or not, gets slightly better performance than const dispatch. The bench output is as follows:

```
const              time: [507.05 ps 508.36 ps 509.67 ps]
enum not inline    time: [502.60 ps 503.98 ps 505.55 ps]
enum inline        time: [504.40 ps 505.58 ps 506.67 ps]
```

The code is attached below.
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

enum Num {
    One,
    Two,
}

trait GetEnum {
    fn get_num() -> Num;
}

struct GetEnumNotInline;

impl GetEnum for GetEnumNotInline {
    fn get_num() -> Num {
        Num::One
    }
}

struct GetNumInline;

impl GetEnum for GetNumInline {
    #[inline(always)]
    fn get_num() -> Num {
        Num::One
    }
}

fn const_dispatch<const NUM: usize>() {
    let mut sum = 0;
    for _ in 0..black_box(1000000) {
        match NUM {
            1 => sum += 1,
            2 => sum += 2,
            _ => unreachable!(),
        }
    }
    assert_eq!(sum, 1000000);
}

fn enum_dispatch<T: GetEnum>() {
    let mut sum = 0;
    for _ in 0..black_box(1000000) {
        match T::get_num() {
            Num::One => sum += 1,
            Num::Two => sum += 2,
        }
    }
    assert_eq!(sum, 1000000);
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("const", |b| b.iter(const_dispatch::<1>));
    c.bench_function("enum not inline", |b| b.iter(enum_dispatch::<GetEnumNotInline>));
    c.bench_function("enum inline", |b| b.iter(enum_dispatch::<GetNumInline>));
}
```
Seems okay 🤣. https://godbolt.org/z/xnzc8Pr9z
Having two sets of iterators (Ordered, Unordered) looks a little bit like over-design to me. I think we can, from now on, assign an extra id to all iterators in the merge iterator by default: e.g., if unordered, assign all ids = 0; otherwise, assign ordered ids.
```rust
let top_node = self.heap.pop().expect("no inner iter");
let mut popped_nodes = vec![];

// Take all nodes with the same current key as the top_node out of the heap.
```
Why do we need to take all nodes out? I believe we can simply pop them by the combination of key + extra data.
Here we just pop the nodes with the same current key as the top node, not all nodes.
Yes, but why do we need to do this? I think the heap by default supports handling equal keys?
The key stored in the heap is `(iter_current_key, order_index)`, so that if multiple iterators have the same `iter_current_key`, the iterator with the smallest `order_index` appears at the top.
We only need to take out the nodes at the heap top that share the top node's `iter_current_key`, and from the heap's point of view those keys are all different, since their order indices differ.
Can we emit the same key (with the same epoch) multiple times to the upper `UserKeyIterator`? Then we can write it in the same way as before.
So I don't see how this optimization could benefit. The current `OrderedMergeIterator` will by default take the top node out so that the second topmost node becomes visible, and only then do we check whether we have advanced all the iterators with the same current key. This is why I separated the `next` of the ordered and unordered versions: taking the top node out and re-pushing it back to the heap increases the cost, and for the unordered version it is unnecessary to bear this cost. Now with this optimization, neither the ordered nor the unordered version has to take the top node out, so they can share the same logic, and the performance of the unordered version will stay almost the same.
> Now with this optimization, neither the ordered nor the unordered version has to take the top node out, so they can share the same logic, and the performance of the unordered version will stay almost the same.

Okay, I totally agree on this point. But consider the correctness issue:

> Also MergeIterator should emit all keys of its underlying iterators. Otherwise data will be lost when doing compaction.

The pseudo code above directly compares keys (`if heap_top.iter.key() != top_key { break; }`), which means that for a given key, only its latest epoch will be emitted. This will cause the compactor to lose data: the compactor uses `MergeIterator`, and it requires that all keys within `watermark..latest` are retained.
> The pseudo code above directly compares keys

Here I mean the full key (user_key, epoch).
... so if we want to unify the logic, I'd prefer to emit all keys in both the ordered and unordered cases. But I'm not sure whether this would affect the benchmark result. I'm okay with the current implementation; I just wanted to follow up on this conversation, which I randomly recalled just now 🤣
> Here I mean the full key (user_key, epoch).

Then there's no problem 🤓
```diff
@@ -88,22 +90,122 @@ impl<'a, const DIRECTION: usize> MergeIteratorInner<'a, DIRECTION> {
         self.heap = self
             .unused_iters
-            .drain_filter(|i| i.is_valid())
-            .map(Node)
+            .drain_filter(|i| i.iter.is_valid())
             .collect();
     }
 }

+#[async_trait]
```
Note that `async_trait` itself may also lead to dynamic dispatch. We may investigate this later.

The difference is not only at the iterator id side.
LGTM
What's changed and what's your intention?

In this PR, we have 3 main changes.

- Implement an order aware merge iterator for future spill-to-disk support.
- Add direction information to `HummockIterator` so that we do not have to care about aligning the direction of iterators passed into `MergeIterator`, `ConcatIterator` and any other iterators built from existing iterators; instead, the compiler can do the check for us.

Instead of reimplementing a new merge iterator from scratch, or copy-pasting most of the code of the original `MergeIteratorInner`, we implement it based on the current `MergeIteratorInner`. The order aware merge iterator shares most of its logic with the original merge iterator, except that, first, it compares the iterators with an additional order index, and second, it has a different `next` implementation. For the first, we add a new field `extra_info` of generic type `NE` (which stands for `Node Extra`), and this new field is used as a tie breaker when two iterators have the same current key. The `NE` for the original merge iterator is `()`, and for the order aware merge iterator it is `usize`, which stores the order index. For the second, we introduce a new trait `MergeIteratorNext`; the two merge iterators implement their logic separately, and the `next` of `MergeIteratorInner` statically dispatches the logic.

Checklist
Refer to a related PR or issue link (optional)
#1842