fix(optimizer): consider impure expression in predicate push down of Project #9133

Merged
17 commits merged into main from sts/fix_consider_volatility_when_predicate_push_down
Apr 18, 2023

@st1page st1page commented Apr 12, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

dev=> create function gcd(int, int) returns int language python as gcd using link 'http://localhost:8815';
dev=> explain with cte as ( select gcd(a,b) as gcd, a+b as plus from t) select * from cte where plus>0 and gcd =1 and (plus>gcd); 
 BatchExchange { order: [], dist: Single }
 └─BatchFilter { predicate: ($expr1 = 1:Int32) AND ($expr2 > $expr1) }
   └─BatchProject { exprs: [UserDefinedFunction { args: [$0, $1], catalog: FunctionCatalog { id: FunctionId(0), name: "gcd", owner: 1, kind: Scalar, arg_types: [Int32, Int32], return_type: Int32, language: "python", identifier: "gcd", link: "http://localhost:8815" } } as $expr1, (t.a + t.b) as $expr2] }
     └─BatchFilter { predicate: ((t.a + t.b) > 0:Int32) }
       └─BatchScan { table: t, columns: [a, b] }
(5 rows)
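
The plan above keeps the two conjuncts that mention gcd in the upper BatchFilter and only pushes plus>0 below the BatchProject. A minimal sketch of that decision, assuming a toy `Expr` type and a hypothetical `push_down_through_project` helper (neither is RisingWave's real API; the `pure` flag stands in for whatever purity analysis the optimizer uses):

```rust
/// Toy expression tree for illustration only; not RisingWave's `ExprImpl`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// Reference to the i-th column of the operator's input.
    InputRef(usize),
    Literal(i32),
    /// A function call; `pure` is false for calls such as a UDF.
    Call { name: &'static str, pure: bool, args: Vec<Expr> },
}

/// Convenience constructor for a call expression.
fn call(name: &'static str, pure: bool, args: Vec<Expr>) -> Expr {
    Expr::Call { name, pure, args }
}

impl Expr {
    /// An expression is pure only if every call inside it is pure.
    fn is_pure(&self) -> bool {
        match self {
            Expr::InputRef(_) | Expr::Literal(_) => true,
            Expr::Call { pure, args, .. } => *pure && args.iter().all(Expr::is_pure),
        }
    }

    /// Collect the indices of all input columns this expression reads.
    fn collect_input_refs(&self, out: &mut Vec<usize>) {
        match self {
            Expr::InputRef(i) => out.push(*i),
            Expr::Literal(_) => {}
            Expr::Call { args, .. } => {
                for a in args {
                    a.collect_input_refs(out);
                }
            }
        }
    }

    /// Rewrite the expression in terms of the project's input by replacing
    /// each `InputRef(i)` with `project_exprs[i]`.
    fn substitute(&self, project_exprs: &[Expr]) -> Expr {
        match self {
            Expr::InputRef(i) => project_exprs[*i].clone(),
            Expr::Literal(v) => Expr::Literal(*v),
            Expr::Call { name, pure, args } => Expr::Call {
                name: *name,
                pure: *pure,
                args: args.iter().map(|a| a.substitute(project_exprs)).collect(),
            },
        }
    }
}

/// Split filter conjuncts into (pushed below the project, kept above it).
/// A conjunct is pushed only if every project expression it reads is pure;
/// otherwise inlining it would evaluate the impure call a second time.
fn push_down_through_project(
    conjuncts: Vec<Expr>,
    project_exprs: &[Expr],
) -> (Vec<Expr>, Vec<Expr>) {
    let mut pushed = Vec::new();
    let mut kept = Vec::new();
    for c in conjuncts {
        let mut refs = Vec::new();
        c.collect_input_refs(&mut refs);
        if refs.iter().all(|&i| project_exprs[i].is_pure()) {
            pushed.push(c.substitute(project_exprs));
        } else {
            kept.push(c);
        }
    }
    (pushed, kept)
}
```

With the project `[gcd(a, b), a + b]` and the conjuncts `$1 > 0`, `$0 = 1`, `$1 > $0`, this sketch pushes only `(a + b) > 0` and keeps the other two above the project, which mirrors the plan shown.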

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@github-actions github-actions bot added the type/fix Bug fix label Apr 12, 2023
@st1page

st1page commented Apr 12, 2023

Can we have some kind of mock UDF server to test UDF plans in the planner test? @wangrunji0408


@github-actions github-actions bot left a comment


license-eye has totally checked 3233 files.

Valid  Invalid  Ignored  Fixed
1494   1        1738     0

Invalid file list:
  • src/frontend/src/expr/volatility.rs

@st1page st1page marked this pull request as draft April 12, 2023 10:29
@wangrunji0408

wangrunji0408 commented Apr 12, 2023

Can we have some kind of mock UDF server to test UDF plans in the planner test? @wangrunji0408

Perhaps we can disable the service availability check on create-function in planner test. 🤔

@codecov

codecov bot commented Apr 12, 2023

Codecov Report

Merging #9133 (507c92f) into main (6485fee) will decrease coverage by 0.01%.
The diff coverage is 92.00%.

@@            Coverage Diff             @@
##             main    #9133      +/-   ##
==========================================
- Coverage   70.78%   70.78%   -0.01%     
==========================================
  Files        1207     1208       +1     
  Lines      201326   201386      +60     
==========================================
+ Hits       142508   142545      +37     
- Misses      58818    58841      +23     
Flag   Coverage Δ
rust   70.78% <92.00%> (-0.01%) ⬇️


Impacted Files                                          Coverage Δ
src/frontend/src/expr/function_call.rs                  88.62% <ø> (-0.12%) ⬇️
src/frontend/src/expr/mod.rs                            80.03% <50.00%> (-0.30%) ⬇️
src/frontend/src/expr/pure.rs                           94.73% <94.73%> (ø)
...c/frontend/src/optimizer/plan_node/logical_join.rs   89.60% <100.00%> (-0.06%) ⬇️
...rontend/src/optimizer/plan_node/logical_project.rs   97.93% <100.00%> (+0.05%) ⬆️

... and 7 files with indirect coverage changes


@st1page st1page changed the title fix(optimizer): consider volatile expression in predicate push down of Project fix(optimizer): consider impure expression in predicate push down of Project Apr 12, 2023
@st1page st1page marked this pull request as ready for review April 12, 2023 12:01

@xxchan xxchan left a comment


Oh my - this is scary. Are there any other optimizations we should be careful about?

@st1page

st1page commented Apr 13, 2023

Oh my - this is scary. Are there any other optimizations we should be careful about?

I am not sure 🥵 I think we can only find them case by case for now. Do you have any ideas for handling impure functions in the optimizer? @chenzl25

@st1page st1page requested a review from jon-chuang April 13, 2023 08:52
@chenzl25

Oh my - this is scary. Are there any other optimizations we should be careful about?

I am not sure 🥵 I think we can only find them case by case for now. Do you have any ideas for handling impure functions in the optimizer? @chenzl25

Another concern is ApplyProjectTransposeRule where impure functions could be used in the subquery.

@xxchan

xxchan commented Apr 13, 2023

I came up with another one: ProjectMerge, which inlines computation and is the reverse of ColumnDedup/CSE. It increases the number of calls, while the latter reduces it. 😅

dev=> explain create materialized view mv as  select y, y as y2 from (select abs(x) as y from t);
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [y, y2, t._row_id(hidden)], stream_key: [t._row_id], pk_columns: [t._row_id], pk_conflict: "NoCheck" }
 └─StreamProject { exprs: [Abs(t.x) as $expr1, Abs(t.x) as $expr2, t._row_id] }
   └─StreamTableScan { table: t, columns: [x, _row_id] }
(3 rows)

@chenzl25

chenzl25 commented Apr 14, 2023

I came up with another one: ProjectMerge, which inlines computation and is the reverse of ColumnDedup/CSE. It increases the number of calls, while the latter reduces it. 😅

dev=> explain create materialized view mv as  select y, y as y2 from (select abs(x) as y from t);
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [y, y2, t._row_id(hidden)], stream_key: [t._row_id], pk_columns: [t._row_id], pk_conflict: "NoCheck" }
 └─StreamProject { exprs: [Abs(t.x) as $expr1, Abs(t.x) as $expr2, t._row_id] }
   └─StreamTableScan { table: t, columns: [x, _row_id] }
(3 rows)

Indeed. ProjectMergeRule is supposed to reduce the number of projects to improve performance, but it can end up adding more expressions. Some fixes are needed.
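
To make the concern concrete, here is a rough sketch of how inlining one project into another can raise the call count. The `Expr` type and the helpers `merge_projects`, `total_calls`, and `would_duplicate_calls` are hypothetical names for illustration, not the actual rule implementation; the last one is just one possible guard, not a fix anyone has agreed on.

```rust
/// Toy expression tree for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// Reference to the i-th column of the operator's input.
    InputRef(usize),
    /// A function call with its arguments.
    Call(&'static str, Vec<Expr>),
}

/// Replace each `InputRef(i)` in `e` with `inner[i]`.
fn substitute(e: &Expr, inner: &[Expr]) -> Expr {
    match e {
        Expr::InputRef(i) => inner[*i].clone(),
        Expr::Call(name, args) => {
            Expr::Call(*name, args.iter().map(|a| substitute(a, inner)).collect())
        }
    }
}

/// Merge an outer project over an inner project by inlining.
fn merge_projects(outer: &[Expr], inner: &[Expr]) -> Vec<Expr> {
    outer.iter().map(|e| substitute(e, inner)).collect()
}

/// Number of function calls in a single expression.
fn count_calls(e: &Expr) -> usize {
    match e {
        Expr::InputRef(_) => 0,
        Expr::Call(_, args) => 1 + args.iter().map(count_calls).sum::<usize>(),
    }
}

/// Number of function calls evaluated per row by a list of expressions.
fn total_calls(exprs: &[Expr]) -> usize {
    exprs.iter().map(count_calls).sum()
}

/// Count how often each input column is referenced.
fn count_refs(e: &Expr, counts: &mut [usize]) {
    match e {
        Expr::InputRef(i) => counts[*i] += 1,
        Expr::Call(_, args) => {
            for a in args {
                count_refs(a, counts);
            }
        }
    }
}

/// Hypothetical guard: merging duplicates work if some inner expression that
/// contains a call is referenced more than once by the outer project.
fn would_duplicate_calls(outer: &[Expr], inner: &[Expr]) -> bool {
    let mut counts = vec![0usize; inner.len()];
    for e in outer {
        count_refs(e, &mut counts);
    }
    inner
        .iter()
        .zip(&counts)
        .any(|(e, &n)| n > 1 && count_calls(e) > 0)
}
```

For the example above, the inner project is `[abs(x)]` and the outer project is `[$0, $0]`. Before merging there is one call per row; after `merge_projects` there are two, and `would_duplicate_calls` returns true for that pair.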

@st1page

st1page commented Apr 14, 2023

How about merging this PR first and we can solve other issues later? @chenzl25 @xxchan

@chenzl25

How about merging this PR first and we can solve other issues later? @chenzl25 @xxchan

Sure

@st1page st1page added this pull request to the merge queue Apr 18, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 18, 2023
@st1page st1page enabled auto-merge April 18, 2023 05:41
@st1page st1page added this pull request to the merge queue Apr 18, 2023
Merged via the queue into main with commit cd31047 Apr 18, 2023
@st1page st1page deleted the sts/fix_consider_volatility_when_predicate_push_down branch April 18, 2023 06:05

5 participants