Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimizer): refactor multijoin(part 1) #3351

Merged
merged 11 commits into from
Jun 23, 2022

Conversation

st1page
Copy link
Contributor

@st1page st1page commented Jun 20, 2022

What's changed and what's your intention?

this refactor is for #3013. if the multi join will not only be a short-lived operator, we should implement more methods like prune_col pk_derive and clone_with_inputs to like it can be used in the heuristic optimizer.

  • add output_indices field for the logicalMultiJoin node and derive the schema and pk in the LogicalMultiJoin::new also calculate some mappings which might be used in other methods like prune_col and rewrite_for_stream.
  • because the above part makes the LogicalMultiJoin::new more expensive, change the merge multijoin method. implement a multijoin builder which travels the plan tree with Post-Order and build the multijoin

this PR does not change the join-reorder algorithm but some join order in tpch plan is changed, I am still investing what happened.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

@codecov
Copy link

codecov bot commented Jun 20, 2022

Codecov Report

Merging #3351 (166a37b) into main (0536069) will increase coverage by 0.00%.
The diff coverage is 95.60%.

@@           Coverage Diff           @@
##             main    #3351   +/-   ##
=======================================
  Coverage   73.59%   73.59%           
=======================================
  Files         766      765    -1     
  Lines      104914   104937   +23     
=======================================
+ Hits        77210    77231   +21     
- Misses      27704    27706    +2     
Flag Coverage Δ
rust 73.59% <95.60%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/optimizer/plan_node/mod.rs 97.91% <ø> (ø)
...tend/src/optimizer/plan_node/logical_multi_join.rs 84.34% <94.90%> (+8.26%) ⬆️
src/frontend/src/optimizer/mod.rs 94.17% <100.00%> (ø)
...rontend/src/optimizer/plan_node/logical_project.rs 98.97% <100.00%> (+0.03%) ⬆️
src/frontend/src/optimizer/rule/merge_multijoin.rs 100.00% <100.00%> (ø)
...c/frontend/src/optimizer/rule/reorder_multijoin.rs 100.00% <100.00%> (ø)
src/frontend/src/utils/connected_components.rs 96.42% <0.00%> (-2.39%) ⬇️
src/meta/src/hummock/mock_hummock_meta_client.rs 40.56% <0.00%> (-0.95%) ⬇️
src/storage/src/hummock/local_version_manager.rs 84.37% <0.00%> (-0.16%) ⬇️
... and 1 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@@ -35,6 +38,18 @@ pub struct LogicalMultiJoin {
pub base: PlanBase,
inputs: Vec<PlanRef>,
on: Condition,
output_indices: Vec<usize>,
// XXX(st1page): these fields will be used in prune_col and pk_derive soon.
#[allow(unused)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect(dead_code)

ExprImpl::InputRef(input_ref) => Some(input_ref.index),
_ => None,
})
.collect::<Option<Vec<_>>>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case will it return None? Shall we report error instead of returning None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all project node is a "projection" operation (all expressions are input_ref), and the None case is general and I will add some comments here.

@st1page st1page requested a review from jon-chuang June 22, 2022 04:44
@skyzh
Copy link
Contributor

skyzh commented Jun 22, 2022

can merge?

@chinawch007
Copy link

Excuse me. I think MultiJoinRule transforms some node about join into multi-join node. Is this a correct understanding?

@st1page
Copy link
Contributor Author

st1page commented Jun 23, 2022

can merge?

I still do not get why the tpch plan is changed... but the plan is correct and e2e has passed so I will update the branch and if ci is ok, I will merge

@st1page
Copy link
Contributor Author

st1page commented Jun 23, 2022

Excuse me. I think MultiJoinRule transforms some node about join into multi-join node. Is this a correct understanding?

Yes, It is for join reorder in our system now. And later we will see if it can help in streaming optimization phase.

@skyzh
Copy link
Contributor

skyzh commented Jun 23, 2022

IIRC mostly join order changes?

@st1page
Copy link
Contributor Author

st1page commented Jun 23, 2022

IIRC mostly join order changes?

All. and the join order does not go worse. It is strange but harmless.

@st1page st1page enabled auto-merge (squash) June 23, 2022 03:08
@skyzh skyzh added the mergify/can-merge Indicates that the PR can be added to the merge queue label Jun 23, 2022
@skyzh skyzh disabled auto-merge June 23, 2022 03:11
@mergify mergify bot merged commit 4514615 into main Jun 23, 2022
@mergify mergify bot deleted the sts/opt_multi_join_output_index branch June 23, 2022 03:11
@chinawch007
Copy link

Excuse me. I think MultiJoinRule transforms some node about join into multi-join node. Is this a correct understanding?

Yes, It is for join reorder in our system now. And later we will see if it can help in streaming optimization phase.

Thanks for answering me. I saw the comments in merge_multijoin.rs, so does MultiJoinRule reallly merges some nodes to one on logical plan tree?

@st1page
Copy link
Contributor Author

st1page commented Jun 23, 2022

does MultiJoinRule reallly merges some nodes to one on logical plan tree

yes, it rewrite the plan tree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants