feat(frontend): unnesting arbitrary subquery. #2880

Merged 19 commits into main from lkg/hyper on Jul 5, 2022

Conversation

likg227
Contributor

@likg227 likg227 commented May 27, 2022

What's changed and what's your intention?

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • Summarize your change (mandatory)

This PR is still under development: some known bugs have not been fixed yet and some code still needs to be refined. Reviews and comments are welcome nevertheless.

This PR takes a more flexible approach to unnesting subqueries, following this paper. Its contributions are:

  1. Translate the original Apply into a Join of the corresponding type plus a new Cross Apply, using a new trait ExprVisitorMut to mutate expressions, and introduce a DAG for correctness.
  2. Add ApplyAgg, ApplyFilter, ApplyProj and ApplyScan to push the Apply down. Specifically, when the Apply's right input is a Scan or a Join, ApplyScan replaces the Apply with a Join or other operators, so it may deserve a better name later.
  3. Add rewrite_agg, rewrite_proj, rewrite_join and rewrite_scan to handle the Apply's left input when we can't eliminate the DAG.
  • How does this PR work? Need a brief introduction for the changed logic (optional)

It will first translate the Apply in the planner and then use these four rules in optimizer to remove the apply.

  • Describe any limitations of the current code (optional)
  1. We really need a lot more comments and docs to make the code readable.
  2. Add a new traversal approach for applying the four rules above, as the existing traversal is not good enough to apply them perfectly.
  3. Refine some code using existing utils.
  4. Fix the already-known bugs and pass all the tests. This test passes locally (./risedev d) but fails on CI (ci-3cn-1fe). cc @yuhao-su.
  5. Think more carefully about the subtle and tricky parts of subquery handling.
  • Add the 'user-facing changes' label if your PR contains changes that are visible to users (optional)
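
For readers new to the idea, the push-down described in the summary above can be sketched on a toy plan tree. This is only an illustration under simplifying assumptions: the `Plan` enum and `push_down_apply` are hypothetical names, not the planner's real types or rules, and the sketch covers just two cases (a filter under the Apply, in the spirit of ApplyFilter, and a scan under the Apply, in the spirit of ApplyScan).

```rust
// Toy plan tree. These types are hypothetical and far simpler than the
// real plan nodes touched by this PR.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(&'static str),
    Filter { input: Box<Plan>, predicate: &'static str },
    Apply { left: Box<Plan>, right: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Push an Apply down through its right input until it can be replaced.
fn push_down_apply(plan: Plan) -> Plan {
    match plan {
        Plan::Apply { left, right } => match *right {
            // Apply over Filter: evaluate the filter above the Apply and
            // keep pushing the Apply towards the leaves.
            Plan::Filter { input, predicate } => Plan::Filter {
                input: Box::new(push_down_apply(Plan::Apply { left, right: input })),
                predicate,
            },
            // Apply over Scan: nothing correlated is left on the right
            // side, so the Apply can become an ordinary join.
            scan @ Plan::Scan(_) => Plan::Join { left, right: Box::new(scan) },
            // Anything else is left untouched in this sketch.
            other => Plan::Apply { left, right: Box::new(other) },
        },
        other => other,
    }
}
```

With `Apply(t1, Filter(t2, "t1.a = t2.b"))` as input, the sketch returns `Filter(Join(t1, t2), "t1.a = t2.b")`, so no Apply remains in the plan.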

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

close #2768

@likg227 likg227 changed the title feat(frontend): unnesting arbitrary subquery. [WIP]feat(frontend): unnesting arbitrary subquery. May 28, 2022
@likg227 likg227 changed the title [WIP]feat(frontend): unnesting arbitrary subquery. [WIP] feat(frontend): unnesting arbitrary subquery. May 28, 2022
Contributor

@github-actions github-actions bot left a comment

license-eye has totally checked 811 files.

Valid: 806, Invalid: 4, Ignored: 1, Fixed: 0
Invalid files:
  • src/frontend/src/optimizer/rule/apply_agg.rs
  • src/frontend/src/optimizer/rule/apply_filter.rs
  • src/frontend/src/optimizer/rule/apply_proj.rs
  • src/frontend/src/optimizer/rule/apply_scan.rs

@likg227 likg227 force-pushed the lkg/hyper branch 3 times, most recently from 2610ed7 to a46e457 Compare May 29, 2022 11:21
@codecov

codecov bot commented May 29, 2022

Codecov Report

Merging #2880 (8b6bc8d) into main (c71493a) will increase coverage by 0.06%.
The diff coverage is 94.13%.

@@            Coverage Diff             @@
##             main    #2880      +/-   ##
==========================================
+ Coverage   74.35%   74.41%   +0.06%     
==========================================
  Files         776      780       +4     
  Lines      110210   110634     +424     
==========================================
+ Hits        81942    82333     +391     
- Misses      28268    28301      +33     
Flag Coverage Δ
rust 74.41% <94.13%> (+0.06%) ⬆️

Impacted Files Coverage Δ
src/frontend/src/expr/correlated_input_ref.rs 82.14% <0.00%> (-9.86%) ⬇️
src/frontend/src/optimizer/rule/translate_apply.rs 88.00% <88.00%> (ø)
src/frontend/src/optimizer/rule/apply_scan.rs 93.84% <93.84%> (ø)
src/frontend/src/binder/query.rs 99.03% <100.00%> (+0.03%) ⬆️
src/frontend/src/binder/select.rs 96.18% <100.00%> (+0.34%) ⬆️
src/frontend/src/binder/set_expr.rs 64.51% <100.00%> (+2.97%) ⬆️
src/frontend/src/expr/mod.rs 85.99% <100.00%> (-2.19%) ⬇️
src/frontend/src/expr/subquery.rs 55.88% <100.00%> (+9.45%) ⬆️
src/frontend/src/optimizer/mod.rs 94.56% <100.00%> (+0.39%) ⬆️
.../frontend/src/optimizer/plan_node/logical_apply.rs 79.78% <100.00%> (+6.71%) ⬆️
... and 16 more

@likg227 likg227 marked this pull request as ready for review May 30, 2022 08:02
@likg227 likg227 changed the title [WIP] feat(frontend): unnesting arbitrary subquery. feat(frontend): unnesting arbitrary subquery. May 30, 2022
@skyzh skyzh added the user-facing-changes Contains changes that are visible to users label May 30, 2022
@fuyufjh
Contributor

fuyufjh commented May 31, 2022

Generally LGTM

LogicalFilter { predicate: ($1 > $3) }
LogicalApply { type: LeftOuter, on: true }
LogicalFilter { predicate: ($1 > $6) }
LogicalJoin { type: LeftOuter, on: ($2 = $3) AND ($1 = $4) AND ($2 = $5) }
Contributor

Why is the logical plan changed? I thought unnesting should happen on optimizer_logical_plan.

Contributor

The logical planner is also updated: instead of generating an Apply with an outer type and a condition, it now generates only a condition-less inner Apply.

@@ -23,7 +23,7 @@ use super::{Expr, ExprImpl};
 pub struct AggCall {
     agg_kind: AggKind,
     return_type: DataType,
-    inputs: Vec<ExprImpl>,
+    pub inputs: Vec<ExprImpl>,
Contributor

Maybe input_mut()? If we make the field public, code editors will start suggesting .input and .input() at the same time ...
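
A minimal sketch of what such an accessor could look like, assuming the field stays private. `ExprImpl` here is a stand-in type, and the method name `inputs_mut` is a hypothetical choice, not something taken from the codebase.

```rust
// Stand-in for the real expression type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct ExprImpl(i32);

struct AggCall {
    // The field stays private; callers go through the accessors below.
    inputs: Vec<ExprImpl>,
}

impl AggCall {
    fn new(inputs: Vec<ExprImpl>) -> Self {
        Self { inputs }
    }

    /// Read-only view of the inputs.
    fn inputs(&self) -> &[ExprImpl] {
        &self.inputs
    }

    /// Mutable view for rewriters. Returning a slice lets callers change
    /// elements in place but not add or remove inputs.
    fn inputs_mut(&mut self) -> &mut [ExprImpl] {
        &mut self.inputs
    }
}
```

A rewriter would then iterate over `call.inputs_mut()` and mutate each expression, and editors would only ever suggest the two methods.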

Comment on lines 29 to 31
pub index: usize,
pub data_type: DataType,
pub depth: usize,
Contributor

Seems only needed by get_and_change_correlated_input_ref to set index. Given it is a relatively lightweight leaf node, what about *correlated_input_ref = CorrelatedInputRef::new(...) without exposing the fields?
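
Roughly what that suggestion amounts to, as a self-contained sketch. The struct mirrors the three fields quoted above, but the getters, the `DataType` stand-in and the `retarget` helper are assumptions made for illustration only.

```rust
// Stand-in for the real data type enum (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Varchar,
}

#[derive(Debug, Clone, PartialEq)]
struct CorrelatedInputRef {
    // All fields stay private.
    index: usize,
    data_type: DataType,
    depth: usize,
}

impl CorrelatedInputRef {
    fn new(index: usize, data_type: DataType, depth: usize) -> Self {
        Self { index, data_type, depth }
    }
    fn index(&self) -> usize {
        self.index
    }
    fn data_type(&self) -> DataType {
        self.data_type.clone()
    }
    fn depth(&self) -> usize {
        self.depth
    }
}

/// Point a correlated reference at a new column by replacing the whole
/// leaf node instead of writing to its `index` field.
fn retarget(r: &mut CorrelatedInputRef, new_index: usize) {
    *r = CorrelatedInputRef::new(new_index, r.data_type(), r.depth());
}
```

Since the node is a small leaf, rebuilding it is cheap, and the struct keeps control over how its fields change.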

.for_each(|expr| self.mut_expr(expr))
}
fn mut_literal(&mut self, _: &mut Literal) {}
fn change_input_ref(&mut self, _: &mut InputRef) {}
Contributor

change_input_ref -> mut_input_ref
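
For illustration, a mutating visitor with the consistent `mut_*` naming might look like the sketch below. The `Expr` enum is a toy stand-in and `ShiftInputRefs` is a hypothetical visitor; only the method names `mut_expr`, `mut_literal` and `mut_input_ref` come from this thread.

```rust
// Toy expression tree (hypothetical, much smaller than the real ExprImpl).
#[derive(Debug, PartialEq)]
enum Expr {
    InputRef(usize),
    Literal(i64),
    FunctionCall(Vec<Expr>),
}

/// Visitor that may mutate expressions in place. Every hook has a default,
/// so an implementor overrides only the node kinds it cares about.
trait ExprVisitorMut {
    fn mut_expr(&mut self, expr: &mut Expr) {
        match expr {
            Expr::InputRef(index) => self.mut_input_ref(index),
            Expr::Literal(value) => self.mut_literal(value),
            Expr::FunctionCall(args) => self.mut_function_call(args),
        }
    }
    fn mut_function_call(&mut self, args: &mut Vec<Expr>) {
        args.iter_mut().for_each(|expr| self.mut_expr(expr))
    }
    fn mut_literal(&mut self, _: &mut i64) {}
    fn mut_input_ref(&mut self, _: &mut usize) {}
}

/// Example visitor: shift every input reference by a fixed offset.
struct ShiftInputRefs {
    offset: usize,
}

impl ExprVisitorMut for ShiftInputRefs {
    fn mut_input_ref(&mut self, index: &mut usize) {
        *index += self.offset;
    }
}
```

Running `ShiftInputRefs { offset: 3 }` over a call with arguments `$1` and the literal `7` rewrites the reference to `$4` and leaves the literal alone.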

Comment on lines 76 to 81
// Use `rewrite_agg`, `rewrite_project`, `rewrite_join` and `rewrite_scan` to remove
// useless columns.
let left = apply.left();
let new_left = left
.as_logical_agg()
.map(|agg| agg.rewrite_agg().unwrap())
Contributor

What about just calling agg.prune_col(0..agg.apply_left_len)? It seems rewrite_agg, rewrite_project, rewrite_join and rewrite_scan are similar to prune_col and may be buggy.

Contributor Author

No. If one child of a join outputs no columns, we should eliminate the join and return its other child. This behavior is necessary in this case, but it is not the job of prune_col. Besides, we can't do this JoinElimination in the general case, because the result would not match PG.
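
To make the difference from plain column pruning concrete, here is a toy sketch of the behavior described in this reply. The `Plan` enum and the `rewrite` function are hypothetical and do not reproduce rewrite_join or prune_col; the sketch assumes join output is the left columns followed by the right columns, and that the required columns are listed left side first.

```rust
// Toy plan tree (hypothetical). A scan lists the table columns it outputs;
// a join outputs its left columns followed by its right columns.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: &'static str, columns: Vec<usize> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

impl Plan {
    fn output_len(&self) -> usize {
        match self {
            Plan::Scan { columns, .. } => columns.len(),
            Plan::Join { left, right } => left.output_len() + right.output_len(),
        }
    }
}

/// Keep only the `required` output columns. Unlike plain pruning, a join
/// side that would output no column is dropped and the other side is
/// returned directly.
fn rewrite(plan: Plan, required: &[usize]) -> Plan {
    match plan {
        Plan::Scan { table, columns } => Plan::Scan {
            table,
            columns: required.iter().map(|&i| columns[i]).collect(),
        },
        Plan::Join { left, right } => {
            let left_len = left.output_len();
            // Split the required columns between the two inputs.
            let left_req: Vec<usize> =
                required.iter().copied().filter(|&i| i < left_len).collect();
            let right_req: Vec<usize> = required
                .iter()
                .copied()
                .filter(|&i| i >= left_len)
                .map(|i| i - left_len)
                .collect();
            if right_req.is_empty() {
                // The right side contributes nothing: eliminate the join.
                rewrite(*left, &left_req)
            } else if left_req.is_empty() {
                // The left side contributes nothing: eliminate the join.
                rewrite(*right, &right_req)
            } else {
                Plan::Join {
                    left: Box::new(rewrite(*left, &left_req)),
                    right: Box::new(rewrite(*right, &right_req)),
                }
            }
        }
    }
}
```

Dropping a join side changes how many rows come out in general, which fits the point above that this elimination is only acceptable in this specific case and not as a general rule.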

Contributor

Had an "offline" sync with @likg227 on this and there are 2 ways to proceed:

  1. Reusing prune_col does produce a different result set than rewrite_agg, and is not the original intention of planning AggDistinct-Project. But it might still be correct for the overall query. We need to carefully revisit the definition of "Domain" and validate the correctness of this approach.

  2. If we do keep the current behavior, it is important to refactor and reuse prune_col as much as possible. Duplicating logic can be error prone and hard to maintain - the current impl of rewrite_scan and rewrite_join is already buggy in some cases, and prune_col already went thru several fixes (#863 #1205 #2328 #2347 #2363 #2889)

Contributor

@st1page st1page left a comment

LGTM

Contributor

@fuyufjh fuyufjh left a comment

LGTM. Decided to approve this as suggested by @st1page

@likg227 likg227 added mergify/can-merge Indicates that the PR can be added to the merge queue and removed mergify/can-merge Indicates that the PR can be added to the merge queue labels Jul 4, 2022
Contributor

@st1page st1page left a comment

LGTM

@likg227 likg227 added the mergify/can-merge Indicates that the PR can be added to the merge queue label Jul 5, 2022
@mergify
Contributor

mergify bot commented Jul 5, 2022

Hey @likg227, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by commenting with @mergifyio requeue. More details can be found on the Queue: Embarked in merge train check-run.

@mergify
Contributor

mergify bot commented Jul 5, 2022

Hey @likg227, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by commenting with @mergifyio requeue. More details can be found on the Queue: Embarked in merge train check-run.

@likg227
Contributor Author

likg227 commented Jul 5, 2022

@Mergifyio requeue

@mergify
Contributor

mergify bot commented Jul 5, 2022

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

@mergify mergify bot merged commit 8c31de0 into main Jul 5, 2022
@mergify mergify bot deleted the lkg/hyper branch July 5, 2022 05:29
@hengm3467
Contributor

@likg227 I wasn't able to identify the user-facing changes in this PR. Could you please provide a summary of the changes that are visible to users? Thanks!

@likg227
Contributor Author

likg227 commented Jul 18, 2022

@hengm3467 Hi. This PR changes the plan whenever the input SQL query contains a correlated subquery, and users can see the new plan via explain.

nasnoisaac pushed a commit to nasnoisaac/risingwave that referenced this pull request Aug 9, 2022
* translate apply.

* apply_agg
apply_filter
apply_proj
apply_scan

* add optimization to eliminate domain.

* fix bug.

* fix some bugs.

* query plan fix.

* small fix.

* remove failed tests.

* add comments.

* split unnesting into two-phase.

* refine code and add comments.

* add project to ensure mapping.

* rewrite on and rewrite predicate in ApplyFilter.

* small fix.

* add comments.

* dummy commit.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/feature user-facing-changes Contains changes that are visible to users
Development

Successfully merging this pull request may close these issues.

bug: unexpected dedup during decorrelation
7 participants