Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(optimizer): refactor logicalAgg::create #3533

Merged
merged 15 commits into from
Jul 2, 2022

Conversation

st1page
Copy link
Contributor

@st1page st1page commented Jun 29, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • add the LogicalProjectBuilder for dedup the expression and build a project
  • refactor and clean the LogicalAgg::create

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

btw, it was for distinct agg rewriting, but now I think maybe we can do the distinct rewriting as a rule?

@st1page st1page requested a review from likg227 June 29, 2022 02:50
@codecov
Copy link

codecov bot commented Jun 29, 2022

Codecov Report

Merging #3533 (36e5413) into main (9d99aa5) will increase coverage by 0.01%.
The diff coverage is 92.70%.

@@            Coverage Diff             @@
##             main    #3533      +/-   ##
==========================================
+ Coverage   74.29%   74.30%   +0.01%     
==========================================
  Files         773      773              
  Lines      109399   109438      +39     
==========================================
+ Hits        81273    81323      +50     
+ Misses      28126    28115      -11     
Flag Coverage Δ
rust 74.30% <92.70%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/optimizer/plan_node/mod.rs 97.91% <ø> (ø)
...rontend/src/optimizer/plan_node/logical_project.rs 98.07% <85.00%> (-0.90%) ⬇️
...rc/frontend/src/optimizer/plan_node/logical_agg.rs 92.88% <93.93%> (-0.05%) ⬇️
src/frontend/src/utils/column_index_mapping.rs 79.64% <100.00%> (+0.94%) ⬆️
src/meta/src/manager/id.rs 95.50% <0.00%> (-0.57%) ⬇️
src/frontend/src/expr/utils.rs 98.99% <0.00%> (-0.26%) ⬇️
src/storage/src/hummock/local_version_manager.rs 81.08% <0.00%> (+1.23%) ⬆️
src/meta/src/hummock/mock_hummock_meta_client.rs 41.50% <0.00%> (+5.66%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@likg227 likg227 requested a review from xiangjinwu June 29, 2022 03:56
struct LogicalAggBuilder {
/// the builder of the input Project
input_proj_builder: LogicalProjectBuilder,
/// the column [0..input_group_key_num) in the project's output is the group key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment may not be true when there are duplicates in group_exprs, and can cause a panic:

with t(v1) as (values (1), (2), (1)) select v1 from t group by v1, v1;

More details in a prior discussion:
#1492 (comment)

(tbh I have not read the code yet. Just tested this case and got a panic. Maybe the root cause is another issue, as we should have added a regression test on duplicate group_exprs ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because the colprune can not handle the plan with same group key and I have fixed it. Also add some regression test.

@st1page st1page changed the title refactor(optimizer): refactor logicalAgg::create to prepare for distinct refactor(optimizer): refactor logicalAgg::create Jun 29, 2022
@st1page st1page requested a review from xiangjinwu July 1, 2022 07:10
Copy link
Contributor

@likg227 likg227 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/frontend/src/optimizer/plan_node/logical_project.rs Outdated Show resolved Hide resolved
/// having clause.
struct LogicalAggBuilder {
/// the builder of the input Project
input_proj_builder: LogicalProjectBuilder,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
input_proj_builder: LogicalProjectBuilder,
proj_builder: LogicalProjectBuilder,

/// the builder of the input Project
input_proj_builder: LogicalProjectBuilder,
/// the column [0..input_group_key_num) in the project's output is the group key
input_group_key_num: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
input_group_key_num: usize,
group_key_num: usize,

st1page and others added 2 commits July 1, 2022 15:52
@st1page st1page added component/frontend Protocol, parsing, binder. mergify/can-merge Indicates that the PR can be added to the merge queue and removed mergify/can-merge Indicates that the PR can be added to the merge queue labels Jul 1, 2022
@skyzh
Copy link
Contributor

skyzh commented Jul 2, 2022

@mergify requeue

@mergify
Copy link
Contributor

mergify bot commented Jul 2, 2022

requeue

❌ This pull request head commit has not been previously disembarked from queue.

@mergify mergify bot merged commit d187c37 into main Jul 2, 2022
@mergify mergify bot deleted the sts/optimizer_refactor_logical_agg_prepare_for_distinct branch July 2, 2022 10:22
Comment on lines +362 to +363
LogicalProject { exprs: [$1, $2, $0, $3] }
LogicalAgg { group_keys: [0, 1, 1], agg_calls: [min($2), max($2)] }
Copy link
Contributor

@xiangjinwu xiangjinwu Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still incorrect. The proj exprs should be [$1, $3, $0, $4], or we need to dedup group_keys to [0, 1].

Repro:

create table t(v1 int, v2 int, v3 int) with ('appendonly' = false);
insert into t values (1, 2, 3);
select v2, min(v1) as min_v1, v3, max(v1) as max_v1 from t group by v3, v2, v2;

Actual incorrect output:

 v2 | min_v1 | v3 | max_v1 
----+--------+----+--------
  2 |      2 |  3 |      1
(1 row)

Or even panic:

explain select v2, sum(v1) from t group by v3, v2, v2;

huangjw806 pushed a commit that referenced this pull request Jul 5, 2022
* add logical project builder for the expression dedup

* refactor logicalAgg::create with logical builder

* add clippy fix

* add comments

* add planner test for dup group key

* fix when exist duplicate group key

* fix prune col for logical agg with dup group key

* conflict

* clippy fix

* Update src/frontend/src/optimizer/plan_node/logical_project.rs

Co-authored-by: Kaige Li <55606560+likg227@users.noreply.github.com>

Co-authored-by: Kaige Li <55606560+likg227@users.noreply.github.com>
Co-authored-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
nasnoisaac pushed a commit to nasnoisaac/risingwave that referenced this pull request Aug 9, 2022
* add logical project builder for the expression dedup

* refactor logicalAgg::create with logical builder

* add clippy fix

* add comments

* add planner test for dup group key

* fix when exist duplicate group key

* fix prune col for logical agg with dup group key

* conflict

* clippy fix

* Update src/frontend/src/optimizer/plan_node/logical_project.rs

Co-authored-by: Kaige Li <55606560+likg227@users.noreply.github.com>

Co-authored-by: Kaige Li <55606560+likg227@users.noreply.github.com>
Co-authored-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/frontend Protocol, parsing, binder. mergify/can-merge Indicates that the PR can be added to the merge queue type/refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants