Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimizer): Common sub-plan detection. #7865

Merged
merged 28 commits into from Feb 23, 2023
Merged

feat(optimizer): Common sub-plan detection. #7865

merged 28 commits into from Feb 23, 2023

Conversation

wsx-ucb
Copy link
Contributor

@wsx-ucb wsx-ucb commented Feb 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Attempt to resolve #7360.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

#6955

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 2745 files.

Valid Invalid Ignored Fixed
1299 1 1445 0
Click to see the invalid file list
  • src/frontend/src/optimizer/plan_node/merge_eq_nodes.rs

@wsx-ucb
Copy link
Contributor Author

wsx-ucb commented Feb 13, 2023

The current design requires PlanRef to implement Eq + Hash.
The code currently does not compile, and there are some issues I would like to discuss:

  1. Simply requiring Eq + Hash on PlanNode violates the object safety rules of Rust.
    As a workaround, DynEq and DynHash are introduced as auxillary traits to achive object safety.
  2. Implementing Eq + Hash for all plan nodes requires deriving those traits for almost all data structure involved.
    But some data structures use HashMap, which has no implementation of Hash, and I restore to BTreeMap when possible to workaround the issue.
  3. The Share nodes seems to insist on using RefCell, which entails interior mutability and has no implementation of Hash as well.
    Are there any reason why such usage is encourage in the otherwise immutable plan node design?

@chenzl25
Copy link
Contributor

  1. The Share nodes seems to insist on using RefCell, which entails interior mutability and has no implementation of Hash as well.
    Are there any reason why such usage is encourage in the otherwise immutable plan node design?

We have a method clone_with_inputs for each plan node, while for Share node, if we clone a new one, it will break DAG into Tree again. To work around , we use replace_input instead.

@wsx-ucb wsx-ucb marked this pull request as ready for review February 20, 2023 05:45
@chenzl25 chenzl25 changed the title [WIP]: Common subplan detection. feat(optimizer): Common sub-plan detection. Feb 20, 2023
chenzl25 and others added 3 commits February 20, 2023 20:49
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Merging #7865 (ef76ff7) into main (b39ae68) will increase coverage by 0.01%.
The diff coverage is 77.46%.

@@            Coverage Diff             @@
##             main    #7865      +/-   ##
==========================================
+ Coverage   71.62%   71.64%   +0.01%     
==========================================
  Files        1131     1132       +1     
  Lines      182002   182188     +186     
==========================================
+ Hits       130361   130529     +168     
- Misses      51641    51659      +18     
Flag Coverage Δ
rust 71.64% <77.46%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/lib.rs 44.00% <ø> (ø)
...c/frontend/src/optimizer/plan_node/batch_delete.rs 78.37% <0.00%> (ø)
...frontend/src/optimizer/plan_node/batch_exchange.rs 70.21% <0.00%> (ø)
...c/frontend/src/optimizer/plan_node/batch_expand.rs 48.83% <0.00%> (ø)
...c/frontend/src/optimizer/plan_node/batch_filter.rs 98.07% <0.00%> (ø)
...ontend/src/optimizer/plan_node/batch_group_topn.rs 65.21% <0.00%> (ø)
...frontend/src/optimizer/plan_node/batch_hash_agg.rs 80.80% <0.00%> (ø)
...rontend/src/optimizer/plan_node/batch_hash_join.rs 91.20% <0.00%> (ø)
...ontend/src/optimizer/plan_node/batch_hop_window.rs 66.15% <0.00%> (ø)
...c/frontend/src/optimizer/plan_node/batch_insert.rs 51.02% <0.00%> (ø)
... and 117 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

| | ├─LogicalScan { table: partsupp, columns: [partsupp.ps_partkey, partsupp.ps_suppkey, partsupp.ps_availqty, partsupp.ps_supplycost] }
| | └─LogicalScan { table: supplier, columns: [supplier.s_suppkey, supplier.s_nationkey] }
| └─LogicalScan { table: nation, output_columns: [nation.n_nationkey], required_columns: [nation.n_nationkey, nation.n_name], predicate: (nation.n_name = 'ARGENTINA':Varchar) }
| └─LogicalShare { id = 373 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPCH Q11 also benefits from this PR.

Comment on lines 208 to 213
optimized_logical_plan: |
LogicalValues { rows: [[1:Int32], [1:Int32]], schema: Schema { fields: [1:Int32:Int32] } }
LogicalUnion { all: true }
├─LogicalShare { id = 90 }
| └─LogicalValues { rows: [[1:Int32]], schema: Schema { fields: [1:Int32:Int32] } }
└─LogicalShare { id = 90 }
└─LogicalValues { rows: [[1:Int32]], schema: Schema { fields: [1:Int32:Int32] } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values is not worth to share.

└─StreamNow { output: [now] }
└─StreamProject { exprs: [now], watermark_columns: [now] }
└─StreamShare { id = 94 }
└─StreamNow { output: [now] }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't share Now as well right?

Copy link
Contributor

@chenzl25 chenzl25 Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can share the common sub-plan with the following properties:

  1. include at least one table scan/ source.

@chenzl25
Copy link
Contributor

chenzl25 commented Feb 22, 2023

Another interesting observation on nexmark q5. We can share more than just HOP window.

explain (trace) create materialized view v as SELECT AuctionBids.auction, AuctionBids.num FROM (
      SELECT
        bid.auction,
        count(*) AS num,
        window_start AS starttime
      FROM
        HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
      GROUP BY
        window_start,
        bid.auction
    ) AS AuctionBids
    JOIN (
      SELECT
        max(CountBids.num) AS maxn,
        CountBids.starttime_c
      FROM (
        SELECT
          count(*) AS num,
          window_start AS starttime_c
        FROM HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
        GROUP BY
          bid.auction,
          window_start
      ) AS CountBids
      GROUP BY
        CountBids.starttime_c
    ) AS MaxBids
    ON AuctionBids.starttime = MaxBids.starttime_c AND AuctionBids.num >= MaxBids.maxn;

This snippet can be shared as well. we just need to reorder its group by to ensure they are in the same order.

SELECT
        bid.auction,
        count(*) AS num,
        window_start AS starttime
      FROM
        HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
      GROUP BY
        window_start,
        bid.auction

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work!

@chenzl25 chenzl25 added the mergify/can-merge Indicates that the PR can be added to the merge queue label Feb 23, 2023
@mergify
Copy link
Contributor

mergify bot commented Feb 23, 2023

Hey @wsx-ucb, this pull request failed to merge and has been dequeued from the merge train. If you believe your PR failed in the merge train because of a flaky test, requeue it by clicking "Update branch" or pushing an empty commit with git commit --allow-empty -m "rerun" && git push.

@lmatz
Copy link
Contributor

lmatz commented Feb 23, 2023

The failure should have nothing to do with this PR. It's a known issue of Sqlsmith fuzzy testing.
Could someone force merge it after conflicts are resolved?

@chenzl25
Copy link
Contributor

The failure should have nothing to do with this PR. It's a known issue of Sqlsmith fuzzy testing. Could someone force merge it after conflicts are resolved?

Too many conflicts. 🥵

@mergify mergify bot merged commit 13b88a5 into main Feb 23, 2023
@mergify mergify bot deleted the wsx/shared-view branch February 23, 2023 11:26
@chenzl25 chenzl25 mentioned this pull request Feb 26, 2023
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve common sub-plan detection
4 participants