-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): unnesting arbitrary subquery. #2880
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has totally checked 811 files.
Valid | Invalid | Ignored | Fixed |
---|---|---|---|
806 | 4 | 1 | 0 |
Click to see the invalid file list
- src/frontend/src/optimizer/rule/apply_agg.rs
- src/frontend/src/optimizer/rule/apply_filter.rs
- src/frontend/src/optimizer/rule/apply_proj.rs
- src/frontend/src/optimizer/rule/apply_scan.rs
2610ed7
to
a46e457
Compare
Codecov Report
@@ Coverage Diff @@
## main #2880 +/- ##
==========================================
+ Coverage 74.35% 74.41% +0.06%
==========================================
Files 776 780 +4
Lines 110210 110634 +424
==========================================
+ Hits 81942 82333 +391
- Misses 28268 28301 +33
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
Generally LGTM |
LogicalFilter { predicate: ($1 > $3) } | ||
LogicalApply { type: LeftOuter, on: true } | ||
LogicalFilter { predicate: ($1 > $6) } | ||
LogicalJoin { type: LeftOuter, on: ($2 = $3) AND ($1 = $4) AND ($2 = $5) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the logical plan is changed? I thought unnesting should happen on optimizer_logical_plan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logical planner is also updated from generating Apply with outer and condition to only generating condition-less inner Apply.
src/frontend/src/expr/agg_call.rs
Outdated
@@ -23,7 +23,7 @@ use super::{Expr, ExprImpl}; | |||
pub struct AggCall { | |||
agg_kind: AggKind, | |||
return_type: DataType, | |||
inputs: Vec<ExprImpl>, | |||
pub inputs: Vec<ExprImpl>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe input_mut()
? If we make the field public, code editors will start suggesting .input
and .input()
at the same time ...
pub index: usize, | ||
pub data_type: DataType, | ||
pub depth: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems only needed by get_and_change_correlated_input_ref
to set index. Given it is a relatively lightweight leaf node, what about *correlated_input_ref = CorrelatedInputRef::new(...)
without exposing the fields?
.for_each(|expr| self.mut_expr(expr)) | ||
} | ||
fn mut_literal(&mut self, _: &mut Literal) {} | ||
fn change_input_ref(&mut self, _: &mut InputRef) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change_input_ref
-> mut_input_ref
// Use `rewrite_agg`, `rewrite_project`, `rewrite_join` and `rewrite_scan` to remove | ||
// useless columns. | ||
let left = apply.left(); | ||
let new_left = left | ||
.as_logical_agg() | ||
.map(|agg| agg.rewrite_agg().unwrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just call agg.prune_col(0..agg.apply_left_len)
? It seems rewrite_agg
, rewrite_project
, rewrite_join
and rewrite_scan
are similar to prune_col and may be buggy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If one child of join output no column, we should eliminate this join and return its another child. This behavior is necessary in this case, but it's not the duty of prune_col. Besides, we can't do this JoinElimination in common case because of the unalignment with PG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had an "offline" sync with @likg227 on this and there are 2 ways to proceed:
Reusing prune_col
does produce a different result set than rewrite_agg
, and is not the original intention of planning AggDistinct-Project
. But it might still be correct for the overall query. We need to carefully revisit the definition of "Domain" and validate the correctness of this approach.
If we do keep the current behavior, it is important to refactor and reuse prune_col
as much as possible. Duplicating logic can be error prone and hard to maintain - the current impl of rewrite_scan
and rewrite_join
is already buggy in some cases, and prune_col
already went thru several fixes (#863 #1205 #2328 #2347 #2363 #2889)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Decided to approve this as suggested by @st1page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hey @likg227, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by commenting with |
Hey @likg227, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by commenting with |
@Mergifyio requeue |
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically |
@likg227 I wasn't able to identify the user-facing changes in this PR. Could you please provide a summary of the changes that are visible to users? Thanks! |
@hengm3467 Hi. This pr will change the plan when input sql query has correlated subquery, and user can see it via |
* translate apply. * apply_agg apply_filter apply_proj apply_scan * add optimization to eliminate domain. * fix bug. * fix some bugs. * query plan fix. * small fix. * remove failed tests. * add comments. * split unnesting into two-phase. * refine code and add comments. * add project to ensure mapping. * rewrite on and rewrite predicate in ApplyFilter. * small fix. * add comments. * dummy commit. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
What's changed and what's your intention?
Please explain IN DETAIL what the changes are in this PR and why they are needed:
This pr is still under development, because there are some bugs that have not been fixed and some code still needs to be refined. Nevertheless, welcome to review this pr and comment.
This pr tries to use a more flexible approach to unnest subquery according to this paper. Here's the contributions of this pr:
It will first translate the Apply in the planner and then use these four rules in optimizer to remove the apply.
We really need to a lot of comments and docs to make it readable.Refine some code using existing utils.Fix already know bugs and pass all the tests.This test passes locally(./risedev d) but fails on CI(ci-3cn-1fe) cc@yuhao-su.Think more carefully about the subtle and tricky places of subquery.Checklist
I have added necessary unit tests and integration testsRefer to a related PR or issue link (optional)
close #2768