Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(array): refactor StreamChunk & introduce record iterator #1916

Merged
merged 10 commits into from Apr 19, 2022

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Apr 18, 2022

What's changed and what's your intention?

In this PR, we refactor StreamChunk with ops and data_chunk instead of unzipping the data chunk into parts, so that we can reuse some functionalities of DataChunk.

https://github.com/singularity-data/risingwave/blob/1228f9272c6e7bcce9db86f7f7c982a1bc5032e0/src/common/src/array/stream_chunk.rs#L72-L79

Then, we remove the StreamChunkRowRef introduced in #1813 and provide a record iterator based on DataChunkRowRef whose item is...

https://github.com/singularity-data/risingwave/blob/1cc6400140e40843de46f79f2a575da6989a206d/src/common/src/array/stream_chunk_iter.rs#L74-L82

Some executors can be benefited from this record-style interface. For now, we keep the row interface for compatibility.

Benchmark

test array::stream_chunk_iter::tests::bench_record_iterator                 ... bench:      45,728 ns/iter (+/- 1,268)
test array::stream_chunk_iter::tests::bench_rows_iterator                   ... bench:      39,444 ns/iter (+/- 865)
test array::stream_chunk_iter::tests::bench_rows_iterator_from_records      ... bench:      73,246 ns/iter (+/- 2,349)
test array::stream_chunk_iter::tests::bench_rows_iterator_vec_of_datum_refs ... bench:     254,524 ns/iter (+/- 11,879)

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@codecov
Copy link

codecov bot commented Apr 18, 2022

Codecov Report

Merging #1916 (59d2eb0) into main (cdfac58) will increase coverage by 0.25%.
The diff coverage is 51.61%.

@@            Coverage Diff             @@
##             main    #1916      +/-   ##
==========================================
+ Coverage   70.62%   70.88%   +0.25%     
==========================================
  Files         611      617       +6     
  Lines       80064    79990      -74     
==========================================
+ Hits        56543    56697     +154     
+ Misses      23521    23293     -228     
Flag Coverage Δ
rust 70.88% <51.61%> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/src/lib.rs 100.00% <ø> (ø)
src/common/src/test_utils/test_stream_chunk.rs 40.45% <7.46%> (-28.38%) ⬇️
src/common/src/array/stream_chunk_iter.rs 57.54% <54.25%> (-41.24%) ⬇️
src/common/src/array/data_chunk_iter.rs 86.75% <100.00%> (+7.62%) ⬆️
src/common/src/array/stream_chunk.rs 89.07% <100.00%> (+5.03%) ⬆️
src/stream/src/executor/debug/update_check.rs 87.35% <100.00%> (ø)
src/stream/src/executor_v2/hop_window.rs 80.55% <100.00%> (ø)
src/stream/src/executor_v2/top_n.rs 90.50% <100.00%> (-0.06%) ⬇️
src/stream/src/executor_v2/top_n_appendonly.rs 93.26% <100.00%> (-0.06%) ⬇️
src/batch/src/executor/stream_scan.rs 0.00% <0.00%> (-45.14%) ⬇️
... and 55 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao requested a review from st1page April 19, 2022 05:17
@yuhao-su
Copy link
Contributor

LGTM!

Copy link
Contributor

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM.

And a question: what's the benefit from doing so?

src/common/src/array/stream_chunk_iter.rs Show resolved Hide resolved
src/common/src/array/stream_chunk.rs Outdated Show resolved Hide resolved
@BugenZhao
Copy link
Member Author

And a question: what's the benefit from doing so?

Actually the main purpose of this PR is to reuse the row-level iterator of data chunk in #1905 for stream chunk. For record, some executors may use this interface to handle "update" more elegantly, but in the future. 😁

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao requested a review from fuyufjh April 19, 2022 06:46
@BugenZhao BugenZhao enabled auto-merge (squash) April 19, 2022 06:58
@BugenZhao BugenZhao merged commit ecb8d95 into main Apr 19, 2022
@BugenZhao BugenZhao deleted the bz/stream-record-ref branch April 19, 2022 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants