
feat(streaming): Support hash based parallelized chain node #1846

Merged
10 commits merged into main, Apr 15, 2022

Conversation

@zbzbw zbzbw (Contributor) commented Apr 14, 2022

What's changed and what's your intention?

This PR implements the parallelized chain node in a straightforward and naive way:

  1. Changed the chain node's distribution and assigned each chain one upstream materialize actor.
  2. For the batch query node, we split the data into multiple parts by hashing, so each executor still scans the whole table (see the sketch below).

We should change the batch query node to scan the table by range once we have figured out a good way to split the table into partitions.
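For illustration, here is a minimal sketch of the hash-based split described in item 2. It is illustrative only: the names (parallelism, actor_index, pk_hash) are assumptions for this sketch, not the PR's actual code, which hashes the pk columns with CRC32FastBuilder inside the batch query executor.

/// Keep a row iff its pk hash falls into this executor's bucket.
/// Every parallel executor still scans the whole table and drops the rest.
fn keep_row(pk_hash: u64, parallelism: u64, actor_index: u64) -> bool {
    pk_hash % parallelism == actor_index
}

/// Split pre-hashed rows among `parallelism` executors; this executor keeps only its share.
fn split_rows(rows: Vec<(u64, String)>, parallelism: u64, actor_index: u64) -> Vec<(u64, String)> {
    rows.into_iter()
        .filter(|(pk_hash, _)| keep_row(*pk_hash, parallelism, actor_index))
        .collect()
}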

P.S. The dashboard currently has a small issue when resolving mv-on-mv.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

One step toward #619

Bowen Zhou added 2 commits April 14, 2022 14:42
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
zbzbw and others added 2 commits April 14, 2022 18:37
@zbzbw zbzbw (Contributor, Author) commented Apr 14, 2022

Some illustrations.

create table t1 (v1 int not null, v2 int not null, v3 int not null);
create materialized view mv1 as select a.v1 as av1, b.v1 as bv1 from t1 a, t1 b where a.v2<>b.v2 and a.v3=b.v3;

(Two screenshots of the resulting dashboard stream graphs.)

@zbzbw zbzbw requested a review from TennyZhuang April 14, 2022 10:57
@zbzbw zbzbw marked this pull request as ready for review April 14, 2022 10:57
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
@codecov codecov bot commented Apr 14, 2022

Codecov Report

Merging #1846 (e488bf5) into main (c14ebe4) will decrease coverage by 0.02%.
The diff coverage is 40.65%.

@@            Coverage Diff             @@
##             main    #1846      +/-   ##
==========================================
- Coverage   70.86%   70.84%   -0.03%     
==========================================
  Files         611      611              
  Lines       79591    79667      +76     
==========================================
+ Hits        56403    56440      +37     
- Misses      23188    23227      +39     
Flag Coverage Δ
rust 70.84% <40.65%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/meta/src/barrier/command.rs 58.87% <ø> (+0.54%) ⬆️
src/meta/src/rpc/service/ddl_service.rs 0.00% <0.00%> (ø)
src/meta/src/rpc/service/stream_service.rs 0.00% <0.00%> (ø)
src/meta/src/stream/stream_manager.rs 72.50% <ø> (ø)
src/stream/src/executor/batch_query.rs 0.00% <0.00%> (ø)
src/stream/src/executor_v2/v1_compat.rs 37.44% <0.00%> (-1.12%) ⬇️
src/meta/src/stream/graph/stream_graph.rs 62.36% <21.42%> (-4.31%) ⬇️
src/meta/src/stream/fragmenter.rs 83.11% <80.00%> (+0.16%) ⬆️
src/stream/src/executor_v2/batch_query.rs 85.71% <95.65%> (+3.74%) ⬆️
...ntend/src/optimizer/plan_node/stream_table_scan.rs 95.00% <100.00%> (+0.08%) ⬆️
... and 6 more


@zbzbw zbzbw (Contributor, Author) commented Apr 14, 2022

Seems we have too many logs now 😢

@skyzh skyzh (Contributor) left a comment

Not sure if this implementation is correct. Would you please elaborate:

  • What distribution is the chain following?
  • What are the distributions of BatchPlanNode and MergeNode, separately? (Is BatchPlanNode really using Distribution::HashShard(logical.base.pk_indices.clone()) as its distribution?)
  • What's the distribution of chain's dispatcher? How is it determined?

.get_hash_values(self.info.pk_indices.as_ref(), CRC32FastBuilder)
.unwrap();
let n = data_chunk.cardinality();
let (columns, _visibility) = data_chunk.into_parts();
Contributor:

Can we ensure that data_chunk's visibility is None?

@BugenZhao BugenZhao (Member) Apr 15, 2022

Concerned about this as well. By the way, there are also some other executors ignoring the visibility. :(

Contributor (Author):

AFAIK, collect_data_chunk in CellBasedTableRowIter will always return a chunk with None visibility.

By the way, there are also some other executors ignoring the visibility. :(

Added a compact call in execute_inner.

Contributor:

Please add an assert that visibility is None.
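For illustration, a toy, self-contained sketch of the assert-visibility-is-None pattern being requested. The DataChunk below is a stand-in type with assumed fields and methods, not the actual RisingWave API.

// Toy stand-in for the real chunk type; fields and methods are assumptions.
struct DataChunk {
    columns: Vec<Vec<i32>>,
    visibility: Option<Vec<bool>>,
}

impl DataChunk {
    fn into_parts(self) -> (Vec<Vec<i32>>, Option<Vec<bool>>) {
        (self.columns, self.visibility)
    }
}

fn split_compacted_chunk(chunk: DataChunk) -> Vec<Vec<i32>> {
    let (columns, visibility) = chunk.into_parts();
    // The table scan is expected to yield compacted chunks, so fail loudly otherwise.
    assert!(visibility.is_none(), "expected a compacted chunk (visibility == None)");
    columns
}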


@@ -42,7 +42,7 @@ impl StreamTableScan {
             ctx,
             logical.schema().clone(),
             logical.base.pk_indices.clone(),
-            Distribution::Single,
+            Distribution::HashShard(logical.base.pk_indices.clone()),
@BugenZhao BugenZhao (Member) Apr 15, 2022

Could you please also change the distribution in the Java frontend? I'm afraid the current e2e results cannot cover some cases.

Contributor (Author):

I've added a workaround for the Java frontend in the fragmenter (because I'm not familiar with the Java part 😅). Will remove this after we deprecate the Java frontend.

@zbzbw zbzbw marked this pull request as draft April 15, 2022 03:12
@zbzbw zbzbw (Contributor, Author) commented Apr 15, 2022

I probably didn't catch up with the multi-dispatcher part 😢; will reopen it later.

@BugenZhao BugenZhao (Member) commented:

According to the dashboard graph, there's still a hash dispatcher after each chain, so it seems the distribution of Chain is not used. The final solution should be either inserting exchanges after the BatchQuery or using the partition scan to ensure the same distribution between the chain and the scan. You may check this doc for more details.

Anyway, as long as there's an exchange after the chain, the result will be correct.

@skyzh skyzh (Contributor) commented Apr 15, 2022

I probably didn't catch up with the multi-dispatcher part 😢; will reopen it later.

Multi-dispatcher can definitely help the implementation of this PR, but it's not well-tested yet -- at least the compute node doesn't support multi-dispatcher.

A possible approach is to always follow the distribution of the upstream materialize executor. The dispatcher for them can then be "broadcast", and the downstream needs a shuffle after the table scan.
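A minimal sketch of that "broadcast, then shuffle downstream" idea, assuming a toy channel-based setup (the channel types and names are illustrative, not RisingWave's dispatcher implementation):

use std::sync::mpsc::Sender;

/// The upstream actor clones every chunk to all downstream chain actors;
/// each chain actor then filters/shuffles by hash on its own side.
fn broadcast<T: Clone>(chunk: T, downstreams: &[Sender<T>]) {
    for tx in downstreams {
        // Ignore send errors from disconnected receivers in this sketch.
        let _ = tx.send(chunk.clone());
    }
}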

Bowen Zhou added 2 commits April 15, 2022 12:04
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
@yezizp2012 yezizp2012 (Contributor) commented Apr 15, 2022

You can force the chain to be a singleton in the StreamManagerService on the meta node (these requests come from the Java frontend), which avoids modifying the Java frontend. This way the workload is minimal and the Java e2e tests still pass.

@fuyufjh fuyufjh (Contributor) commented Apr 15, 2022

We should change the batch query node to scan the table by range once we have figured out a good way to split the table into partitions.

Please just use consistent hashing to partition the batch query. See more in Proposal: Use Consistent Hash Across the System.

Also, in this way, no exchange would be needed for Chain.
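For illustration, a hedged sketch of consistent-hash partitioning with virtual nodes (the vnode count, hasher, and names below are assumptions for this sketch, not the proposal's actual design): each actor owns a set of vnodes, and a row belongs to an actor iff its pk hash maps to one of those vnodes. Unlike a plain modulo split, ownership can be rebalanced by moving vnodes without rehashing everything.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

const VNODE_COUNT: u64 = 2048; // assumed vnode count, for illustration only

/// Map a primary key to a virtual node.
fn vnode_of<K: Hash>(pk: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    pk.hash(&mut hasher);
    hasher.finish() % VNODE_COUNT
}

/// Keep only the rows whose vnode is owned by this actor.
fn filter_owned_rows<K: Hash>(rows: Vec<(K, String)>, owned: &HashSet<u64>) -> Vec<(K, String)> {
    rows.into_iter()
        .filter(|(pk, _)| owned.contains(&vnode_of(pk)))
        .collect()
}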

Bowen Zhou and others added 3 commits April 15, 2022 15:36
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
fmt
Signed-off-by: Bowen Zhou <bowenzhou@singularity-data.com>
@zbzbw zbzbw marked this pull request as ready for review April 15, 2022 08:25
@skyzh skyzh (Contributor) commented Apr 15, 2022

Also, in this way, no exchange would be needed for Chain.

No, we still need it.

  • We cannot guarantee that the upstream has the same actor number as the chain actors.
  • Even if we have the same actors, we currently don't guarantee that the materialize node uses hash distribution by the hash of the materialize key. Currently, it simply follows the input's distribution.

https://github.com/singularity-data/risingwave/blob/7f9911a8603b06014bdb46efee55766f49ea5064/src/frontend/src/optimizer/plan_node/stream_materialize.rs#L55

@skyzh skyzh (Contributor) commented Apr 15, 2022

Also, the materialize stream node is created after enforcing distribution. We need to refactor the create-MV optimization process to make everything work.

https://github.com/singularity-data/risingwave/blob/7f9911a8603b06014bdb46efee55766f49ea5064/src/frontend/src/optimizer/mod.rs#L200-L206

@yezizp2012 yezizp2012 (Contributor) commented:

After offline discussion, we will merge this PR first to make it runnable. After that, @zbzbw will try to refine the batch query scan logic to adopt the consistent-hashing distribution of the depended-on mv.

@fuyufjh fuyufjh (Contributor) commented Apr 15, 2022

We cannot guarantee that the upstream has the same actor number as the chain actors.

Agree.

Even if we have the same actors, we currently don't guarantee that the materialize node uses hash distribution by the hash of the materialize key. Currently, it simply follows the input's distribution.

Hmmmm... It should be guaranteed, I think.

@fuyufjh fuyufjh (Contributor) commented Apr 15, 2022

After offline discussion, we will merge this PR first to make it runnable. After that, @zbzbw will try to refine the batch query scan logic to adopt the consistent-hashing distribution of the depended-on mv.

For others not in the discussion:

The stream merged from the batch query and the upstream looks weird to me because it's under a weird distribution: neither the streaming distribution nor the batch one.

We may refine this later by letting the batch query scan data with the same distribution as the upstream Materialize. In this way, the distribution can be expressed as Hash(xxx) in the optimizer, and the optimizer can further determine whether it needs another Exchange.
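A hedged sketch of that optimizer decision, under the assumption that both sides report a Distribution property (the enum and function below are illustrative, not the actual planner code): an Exchange is only inserted when the provided distribution differs from the required one.

#[derive(Clone, PartialEq, Eq, Debug)]
enum Distribution {
    Single,
    HashShard(Vec<usize>), // column indices used for hashing
}

/// Insert an Exchange only when the provided distribution does not satisfy the required one.
fn needs_exchange(provided: &Distribution, required: &Distribution) -> bool {
    provided != required
}

fn main() {
    // If the scan/chain already provides Hash on the same keys the downstream requires,
    // no extra Exchange is needed.
    let provided = Distribution::HashShard(vec![0]);
    let required = Distribution::HashShard(vec![0]);
    println!("insert exchange: {}", needs_exchange(&provided, &required));
}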

@yezizp2012 yezizp2012 (Contributor) commented Apr 15, 2022

  • Even if we have the same actors, we currently don't guarantee that the materialize node uses hash distribution by the hash of the materialize key. Currently, it simply follows the input's distribution.

https://github.com/singularity-data/risingwave/blob/7f9911a8603b06014bdb46efee55766f49ea5064/src/frontend/src/optimizer/plan_node/stream_materialize.rs#L55

That's what I was concerned about before too. 🤔 So we chose to add an exchange right after the chain in the current implementation.

@skyzh skyzh (Contributor) commented Apr 15, 2022

merge?

@zbzbw zbzbw merged commit ae1c231 into main Apr 15, 2022
@zbzbw zbzbw deleted the zbw/parallel-chain-v2 branch April 15, 2022 08:47
@MrCroxx MrCroxx mentioned this pull request Jul 22, 2022