
feat(storage): implement order aware merge iterator #1925

Merged — wenym1 merged 10 commits into main from yiming/support-spill-to-disk on Apr 19, 2022

Conversation

@wenym1 (Contributor) commented Apr 19, 2022

What's changed and what's your intention?

In this PR, we have 3 main changes.

  • Implement an order-aware merge iterator for future spill-to-disk support.
  • Enforce marking the direction of HummockIterator in the type system, so that we no longer have to manually align the directions of iterators passed into MergeIterator, ConcatIterator, or any other iterator built from existing iterators; the compiler can do the check for us.
  • Add a simple benchmark covering the original merge iterator and the order-aware merge iterator.

Instead of reimplementing a new merge iterator from scratch, or copy-pasting most of the code of the original MergeIteratorInner, we build it on top of the current MergeIteratorInner. The order-aware merge iterator shares most of its logic with the original merge iterator, with two exceptions: first, it compares iterators with an additional order index, and second, it has a different next implementation. For the first, we add a new field extra_info of generic type NE (short for Node Extra), which serves as a tie breaker when two iterators have the same current key. NE is () for the original merge iterator, and usize for the order-aware one, storing the order index. For the second, we introduce a new trait MergeIteratorNext; the two merge iterators implement their next logic separately, and the next of MergeIteratorInner statically dispatches to the right implementation.
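A minimal sketch of the tie-breaking idea (the struct layout and key type below are simplified stand-ins, not the actual merge_inner.rs definitions):

use std::cmp::Ordering;

// NE = "Node Extra": () for the plain merge iterator, usize (the order
// index) for the order-aware one.
struct Node<NE: Ord> {
    current_key: Vec<u8>,
    extra_info: NE,
}

impl<NE: Ord> PartialEq for Node<NE> {
    fn eq(&self, other: &Self) -> bool {
        self.current_key == other.current_key && self.extra_info == other.extra_info
    }
}

impl<NE: Ord> Eq for Node<NE> {}

impl<NE: Ord> PartialOrd for Node<NE> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<NE: Ord> Ord for Node<NE> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compare by the current key first; extra_info is the tie
        // breaker, so for the order-aware iterator the iterator with
        // the smallest order index wins among equal keys.
        self.current_key
            .cmp(&other.current_key)
            .then_with(|| self.extra_info.cmp(&other.extra_info))
    }
}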

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

#1842

@wenym1 requested review from hzxa21 and BugenZhao — April 19, 2022 04:12
@CLAassistant commented Apr 19, 2022

CLA assistant check
All committers have signed the CLA.

@skyzh self-requested a review — April 19, 2022 04:13
codecov bot commented Apr 19, 2022

Codecov Report

Merging #1925 (b67271f) into main (b2ada4b) will increase coverage by 0.28%.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##             main    #1925      +/-   ##
==========================================
+ Coverage   70.61%   70.90%   +0.28%     
==========================================
  Files         611      619       +8     
  Lines       79994    80229     +235     
==========================================
+ Hits        56491    56889     +398     
+ Misses      23503    23340     -163     
Flag Coverage Δ
rust 70.90% <90.00%> (+0.28%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/storage/src/hummock/iterator/concat.rs 92.26% <ø> (ø)
src/storage/src/hummock/mod.rs 89.00% <ø> (ø)
...ge/src/hummock/sstable/reverse_sstable_iterator.rs 95.80% <ø> (ø)
...rc/storage/src/hummock/sstable/sstable_iterator.rs 93.05% <ø> (-1.32%) ⬇️
src/storage/src/hummock/state_store.rs 70.50% <62.85%> (-0.46%) ⬇️
src/storage/src/hummock/iterator/merge_inner.rs 83.78% <81.60%> (-4.46%) ⬇️
src/storage/src/hummock/iterator/merge.rs 95.77% <95.36%> (+1.55%) ⬆️
src/storage/src/hummock/compactor.rs 67.44% <100.00%> (ø)
src/storage/src/hummock/iterator/concat_inner.rs 100.00% <100.00%> (+1.75%) ⬆️
src/storage/src/hummock/iterator/mod.rs 100.00% <100.00%> (ø)
... and 95 more


@BugenZhao requested a review from lmatz — April 19, 2022 04:34
Comment on lines +106 to +107
.partition_point(|table| match Self::Direction::direction() {
DirectionEnum::Forward => {
Member

Can we ensure that this match can be resolved at compile time?

Contributor

Maybe add const fn to the direction() function.

Member

It seems a function in a trait cannot be made const. We may fall back to const generics. You may need something like this @wenym1

https://github.com/singularity-data/risingwave/blob/4f31c18e99cf0bfe87f958047b8f8aa59dbf9cce/src/stream/src/executor/hash_join.rs#L35-L46
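
For reference, the const-generic fallback pattern looks roughly like this (the constant names and the comparison helper are illustrative, not actual code from the PR):

// Illustrative constants for the two directions.
pub const FORWARD: usize = 0;
pub const BACKWARD: usize = 1;

// With DIRECTION as a const generic parameter, each monomorphized
// instance matches on a compile-time constant, so the dead arm is
// trivially eliminated by the compiler.
fn is_before<const DIRECTION: usize>(a: &[u8], b: &[u8]) -> bool {
    match DIRECTION {
        FORWARD => a < b,
        BACKWARD => a > b,
        _ => unreachable!(),
    }
}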

Contributor Author

I have added #[inline(always)] on direction(). Can this possibly be resolved at compile time?

Member

Not sure about that...

Contributor Author

Just ran a simple bench on this discussion. It seems that dispatching on the enum returned from the function call, regardless of whether it is inlined or not, performs slightly better than const dispatch.

The bench output is as follows.

const                   time:   [507.05 ps 508.36 ps 509.67 ps]                   

enum not inline         time:   [502.60 ps 503.98 ps 505.55 ps]                             

enum inline             time:   [504.40 ps 505.58 ps 506.67 ps]     

Code is attached below.

use criterion::{black_box, criterion_group, criterion_main, Criterion};

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

enum Num {
    One,
    Two,
}

trait GetEnum {
    fn get_num() -> Num;
}

struct GetEnumNotInline;

impl GetEnum for GetEnumNotInline {
    fn get_num() -> Num {
        Num::One
    }
}

struct GetNumInline;

impl GetEnum for GetNumInline {
    #[inline(always)]
    fn get_num() -> Num {
        Num::One
    }
}

// Dispatch on a const generic parameter: the matched value is known at
// compile time for each monomorphized instance.
fn const_dispatch<const NUM: usize>() {
    let mut sum = 0;
    for _ in 0..black_box(1000000) {
        match NUM {
            1 => sum += 1,
            2 => sum += 2,
            _ => unreachable!(),
        }
    }
    assert_eq!(sum, 1000000);
}

// Dispatch on the enum returned from a (possibly inlined) trait function.
fn enum_dispatch<T: GetEnum>() {
    let mut sum = 0;
    for _ in 0..black_box(1000000) {
        match T::get_num() {
            Num::One => sum += 1,
            Num::Two => sum += 2,
        }
    }
    assert_eq!(sum, 1000000);
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("const", |b| b.iter(const_dispatch::<1>));

    c.bench_function("enum not inline", |b| {
        b.iter(enum_dispatch::<GetEnumNotInline>)
    });

    c.bench_function("enum inline", |b| b.iter(enum_dispatch::<GetNumInline>));
}

src/storage/src/hummock/iterator/merge_inner.rs — outdated review thread (resolved)

@skyzh (Contributor) left a comment

Having two sets of iterators (Ordered, Unordered) looks a little bit like over-design to me. I think we can, from now on, assign an extra id to all iterators in the merge iterator by default; e.g., if unordered, assign all ids = 0; otherwise, assign ordered ids.

src/storage/src/hummock/iterator/merge_inner.rs — outdated review thread (resolved)

let top_node = self.heap.pop().expect("no inner iter");
let mut popped_nodes = vec![];

// Take all nodes with the same current key as the top_node out of the heap.
Contributor

Why do we need to take all nodes out? I believe we can simply pop them by the combination of key + extra data.

Contributor Author

Here we just pop the nodes with the same current key as the top node, not all nodes.

Contributor

Yes, but why do we need to do this? I think the heap by default supports handling equal keys?

Contributor Author

The key stored in the heap is (iter_current_key, order_index), so that when multiple iterators have the same iter_current_key, the iterator with the smallest order_index appears at the top.

We only need to take out the nodes at the top of the heap that share the same iter_current_key; from the heap's point of view, those entries are all different.

Contributor

Can we emit the same key (with same epoch) multiple times to the upper UserKeyIterator? Then we can write in the same way as before.

Contributor Author

At first I didn't see how this optimization could benefit.

The current OrderedMergeIterator will by default take the top node out so that the second topmost node becomes visible, and only then do we check whether we have advanced all the iterators with the same current key. This is why I separated the next of the ordered and unordered versions: taking the top node out and re-pushing it back into the heap adds cost, and for the unordered version there is no reason to bear it. Now with this optimization, for both the ordered and unordered versions we don't have to take out the top node, so they can share the same logic, and the performance of the unordered version will be almost the same.
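
For reference, a rough sketch of the step being described: take every node whose key equals the top node's key out of the heap, advance them, and push the still-valid ones back (names and types here are illustrative, not the actual merge_inner.rs code):

use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Illustrative node: deriving Ord gives the (key, order_index) ordering
// described above, with order_index as the tie breaker.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Node {
    key: Vec<u8>,       // the full key, i.e. (user_key, epoch)
    order_index: usize, // tie breaker
}

fn next_ordered(heap: &mut BinaryHeap<Reverse<Node>>) {
    let top = heap.pop().expect("no inner iter");
    let top_key = top.0.key.clone();
    let mut popped = vec![top];
    // The heap orders entries by (key, order_index), so nodes with the
    // same key are distinct entries; drain every one matching the top key.
    while let Some(peek) = heap.peek() {
        if peek.0.key == top_key {
            popped.push(heap.pop().unwrap());
        } else {
            break;
        }
    }
    for _node in popped {
        // Advance the underlying iterator here (e.g. node.iter.next().await)
        // and push the node back into the heap if it is still valid.
    }
}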

Contributor

Now with this optimization, for both the ordered and unordered versions we don't have to take out the top node, so they can share the same logic, and the performance of the unordered version will be almost the same.

Okay, I totally agree on this point. But consider the correctness issue:

Also MergeIterator should emit all keys of its underlying iterators. Otherwise data will be lost when doing compaction.

The pseudocode above directly compares keys -- if heap_top.iter.key() != top_key { break; } -- which means that for a given key, only its latest epoch will be emitted.

This will cause the compactor to lose data. The compactor uses MergeIterator, and it requires that all keys within watermark..latest are retained.

Contributor Author

The pseudocode above directly compares keys

Here I mean the full key (user_key, epoch).

Contributor

... so if we want to unify the logic, I'd prefer to emit all keys in both the ordered and unordered cases. But I'm not sure whether this would affect the benchmark results. I'm okay with the current implementation; I just wanted to follow up on this conversation, which I randomly recalled just now 🤣

Contributor

Here I mean the full key (user_key, epoch).

Then there's no problem 🤓

src/storage/src/hummock/iterator/merge_inner.rs — 6 more outdated review threads (resolved)
@@ -88,22 +90,122 @@ impl<'a, const DIRECTION: usize> MergeIteratorInner<'a, DIRECTION> {

  self.heap = self
      .unused_iters
-     .drain_filter(|i| i.is_valid())
-     .map(Node)
+     .drain_filter(|i| i.iter.is_valid())
      .collect();
}
}

#[async_trait]
Member

Note that async_trait itself may also lead to dynamic dispatch. We may investigate this later.
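
For context, a simplified illustration of the concern: #[async_trait] roughly desugars an async trait method into one returning a boxed trait-object future, so each call goes through a vtable.

use std::future::Future;
use std::pin::Pin;

// Roughly what an #[async_trait] method desugars to: the returned
// future is boxed and dispatched dynamically.
trait AsyncNext {
    fn next<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = ()> + Send + 'a>>;
}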

@wenym1 (Contributor Author) commented Apr 19, 2022

Having two sets of iterators (Ordered, Unordered) looks a little bit like over-design to me. I think we can, from now on, assign an extra id to all iterators in the merge iterator by default; e.g., if unordered, assign all ids = 0; otherwise, assign ordered ids.

The difference is not only in the iterator ids. The next logic is also different. Of course we could use the order-aware version's next as the general logic, but the performance would suffer, since it does more unnecessary comparisons in the unordered scenario. The added bench shows the order-aware version is about 25% slower than the unordered version.

@BugenZhao (Member) left a comment

LGTM

@wenym1 wenym1 merged commit 470a18a into main Apr 19, 2022
@wenym1 wenym1 deleted the yiming/support-spill-to-disk branch April 19, 2022 10:06