feat(storage): implement order aware merge iterator #1925
Conversation
…in HummockIterator
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #1925      +/-   ##
==========================================
+ Coverage   70.61%   70.90%   +0.28%
==========================================
  Files         611      619       +8
  Lines       79994    80229     +235
==========================================
+ Hits        56491    56889     +398
+ Misses      23503    23340     -163
==========================================
```
```rust
.partition_point(|table| match Self::Direction::direction() {
    DirectionEnum::Forward => {
```
Can we ensure that this `match` can be resolved at compile time?
We may add `const fn` to the `direction()` function.
It seems the function in the trait cannot be made `const`. We may fall back to const generics. You may need something like this @wenym1
I have added `#[inline(always)]` on `direction()`. Can this possibly be resolved at compile time?
Not sure about that...
I just ran a simple bench on this discussion. It seems that dispatching on the enum returned from the function call, regardless of whether it is `#[inline]` or not, gets slightly better performance than const dispatch. The bench output is as follows:

```
const              time: [507.05 ps 508.36 ps 509.67 ps]
enum not inline    time: [502.60 ps 503.98 ps 505.55 ps]
enum inline        time: [504.40 ps 505.58 ps 506.67 ps]
```

The code is attached below.
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

enum Num {
    One,
    Two,
}

trait GetEnum {
    fn get_num() -> Num;
}

struct GetEnumNotInline;

impl GetEnum for GetEnumNotInline {
    fn get_num() -> Num {
        Num::One
    }
}

struct GetNumInline;

impl GetEnum for GetNumInline {
    #[inline(always)]
    fn get_num() -> Num {
        Num::One
    }
}

fn const_dispatch<const NUM: usize>() {
    let mut sum = 0;
    for _ in 0..black_box(1000000) {
        match NUM {
            1 => sum += 1,
            2 => sum += 2,
            _ => unreachable!(),
        }
    }
    assert_eq!(sum, 1000000);
}

fn enum_dispatch<T: GetEnum>() {
    let mut sum = 0;
    for _ in 0..black_box(1000000) {
        match T::get_num() {
            Num::One => sum += 1,
            Num::Two => sum += 2,
        }
    }
    assert_eq!(sum, 1000000);
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("const", |b| b.iter(const_dispatch::<1>));
    c.bench_function("enum not inline", |b| b.iter(enum_dispatch::<GetEnumNotInline>));
    c.bench_function("enum inline", |b| b.iter(enum_dispatch::<GetNumInline>));
}
```
Seems okay 🤣. https://godbolt.org/z/xnzc8Pr9z
Having two sets of iterators (Ordered, Unordered) looks a little bit like over-design to me. I think we can, from now on, assign an extra id to all iterators in the merge iterator by default: e.g., if unordered, assign all ids = 0; otherwise, assign ordered ids.
```rust
let top_node = self.heap.pop().expect("no inner iter");
let mut popped_nodes = vec![];

// Take all nodes with the same current key as the top_node out of the heap.
```
Why do we need to take all nodes out? I believe we can simply pop them by the combination of key + extra data.
Here we just pop the nodes with the same current key as the top node, not all nodes.
Yes, but why do we need to do this? I think the heap by default supports handling equal keys?
The key stored in the heap is `(iter_current_key, order_index)`, so that if multiple iterators have the same `iter_current_key`, the iterator with the smallest `order_index` appears at the top.
We only need to take out the nodes at the heap top that share the top node's `iter_current_key`, and from the heap's point of view those keys are all different, since their order indices differ.
Can we emit the same key (with the same epoch) multiple times to the upper `UserKeyIterator`? Then we can write it in the same way as before.
So I don't see how this optimization could benefit. The current `OrderedMergeIterator` will by default take the top node out so that the second topmost node becomes visible, and only then do we check whether we have advanced all the iterators with the same current key. This is why I separated the `next` of the ordered and unordered versions: taking the top node out and re-pushing it back to the heap increases the cost, and for the unordered version it is unnecessary to bear this cost. Now with this optimization, neither the ordered nor the unordered version has to take the top node out, so they can share the same logic, and the performance of the unordered version will stay almost the same.
> Now with this optimization, neither the ordered nor the unordered version has to take the top node out, so they can share the same logic, and the performance of the unordered version will stay almost the same.

Okay, I totally agree on this point. But consider the correctness issue:

> Also MergeIterator should emit all keys of its underlying iterators. Otherwise data will be lost when doing compaction.

The pseudo code above directly compares keys (`if heap_top.iter.key() != top_key { break; }`), which means that for a given key, only its latest epoch will be emitted. This will cause the compactor to lose data: the compactor uses `MergeIterator`, and it requires that all keys within `watermark..latest` are retained.
> The pseudo code above directly compares keys

Here I mean the full key (user_key, epoch).
... so if we want to unify the logic, I'd prefer to emit all keys in both the ordered and unordered cases. But I'm not sure whether this would affect the benchmark result. I'm okay with the current implementation; I just wanted to follow up on this conversation, which I randomly recalled just now 🤣
> Here I mean the full key (user_key, epoch).

Then there's no problem 🤓
```diff
@@ -88,22 +90,122 @@ impl<'a, const DIRECTION: usize> MergeIteratorInner<'a, DIRECTION> {
         self.heap = self
             .unused_iters
-            .drain_filter(|i| i.is_valid())
-            .map(Node)
+            .drain_filter(|i| i.iter.is_valid())
             .collect();
     }
 }

+#[async_trait]
```
Note that `async_trait` itself may also lead to dynamic dispatch. We may investigate this later.

The difference is not only at the iterator id side.
LGTM
What's changed and what's your intention?

In this PR, we have 3 main changes.

- Implement an order aware merge iterator for future spill-to-disk support.
- Add direction information to `HummockIterator` so that we do not have to care about aligning the direction of iterators passed into `MergeIterator`, `ConcatIterator` and any other iterators built from existing iterators; instead, the compiler can do the check for us.

Instead of reimplementing a new merge iterator from scratch, or copy-pasting most of the code of the original `MergeIteratorInner`, we implement it based on the current `MergeIteratorInner`. The order aware merge iterator shares most of its logic with the original merge iterator, except that, first, it compares the iterators with an additional order index, and second, it has a different `next` implementation. For the first, we add a new field `extra_info` of generic type `NE` (which stands for `Node Extra`), and this new field is used as a tie breaker when two iterators have the same current key. The `NE` for the original merge iterator is `()`, and for the order aware merge iterator it is `usize`, which stores the order index. For the second, we introduce a new trait `MergeIteratorNext`; the two merge iterators implement their logic separately, and the `next` of `MergeIteratorInner` statically dispatches the logic.

Checklist
Refer to a related PR or issue link (optional)
#1842