feat(batch): split plan into fragments for local execution mode #3032

Merged
lmatz merged 3 commits into main from lz/local on Jun 13, 2022

Conversation

lmatz
Contributor

@lmatz lmatz commented Jun 7, 2022

What's changed and what's your intention?

Split plans into fragments that can be executed in local execution mode.

Move some test cases into .part so that both distributed and local modes can use them.

More test cases and the remaining support will be added in future PRs.
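
For context, here is a minimal, self-contained sketch of what "splitting a plan into fragments" means here. It is illustrative only, not the actual code in src/frontend/src/scheduler/plan_fragmenter.rs: the plan tree is cut at every Exchange node, each subtree below a cut becomes a separately schedulable fragment, and the part above it stays on the frontend for local execution.

// Illustrative sketch only; the real fragmenter uses its own plan and fragment types.
#[derive(Debug)]
enum PlanNode {
    Filter(Box<PlanNode>),
    Exchange(Box<PlanNode>),
    TableScan(String),
}

#[derive(Debug)]
struct Fragment {
    root: PlanNode,
}

// Cut the plan at every Exchange: the subtree below it becomes a schedulable
// fragment, and a placeholder marks where the fragment's results flow back in.
fn split(node: PlanNode, fragments: &mut Vec<Fragment>) -> PlanNode {
    match node {
        PlanNode::Exchange(child) => {
            let child = split(*child, fragments);
            fragments.push(Fragment { root: child });
            PlanNode::TableScan(format!("exchange#{}", fragments.len() - 1))
        }
        PlanNode::Filter(child) => PlanNode::Filter(Box::new(split(*child, fragments))),
        leaf @ PlanNode::TableScan(_) => leaf,
    }
}

fn main() {
    // A simple point-query-like plan: Filter -> Exchange -> TableScan.
    let plan = PlanNode::Filter(Box::new(PlanNode::Exchange(Box::new(
        PlanNode::TableScan("t1".into()),
    ))));
    let mut fragments = Vec::new();
    let local_root = split(plan, &mut fragments);
    assert_eq!(fragments.len(), 1);
    println!("runs on the frontend: {local_root:?}");
    println!("scheduled fragments: {fragments:?}");
}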

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#2978

@codecov

codecov bot commented Jun 7, 2022

Codecov Report

Merging #3032 (80bb184) into main (2aab325) will decrease coverage by 0.05%.
The diff coverage is 4.00%.

@@            Coverage Diff             @@
##             main    #3032      +/-   ##
==========================================
- Coverage   73.72%   73.66%   -0.06%     
==========================================
  Files         739      739              
  Lines      101713   101792      +79     
==========================================
+ Hits        74985    74986       +1     
- Misses      26728    26806      +78     
Flag | Coverage Δ
rust | 73.66% <4.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files | Coverage Δ
src/batch/src/rpc/service/task_service.rs | 0.00% <0.00%> (ø)
src/frontend/src/handler/query.rs | 0.00% <0.00%> (ø)
src/frontend/src/scheduler/local.rs | 0.00% <0.00%> (ø)
src/frontend/src/scheduler/plan_fragmenter.rs | 93.92% <0.00%> (-0.72%) ⬇️
src/frontend/src/scheduler/task_context.rs | 0.00% <0.00%> (ø)
src/frontend/src/session.rs | 45.47% <33.33%> (-0.21%) ⬇️
src/frontend/src/optimizer/mod.rs | 94.17% <66.66%> (-0.38%) ⬇️
src/meta/src/hummock/mock_hummock_meta_client.rs | 42.39% <0.00%> (-1.09%) ⬇️
src/meta/src/barrier/mod.rs | 69.13% <0.00%> (-0.33%) ⬇️
src/storage/src/hummock/local_version_manager.rs | 84.12% <0.00%> (-0.16%) ⬇️


} else {
    // We should only have one child stage of the root stage for now.
    assert_eq!(second_stage_id.len(), 1);
    let second_stage_id = second_stage_id.iter().next().unwrap();
Contributor

The root stage may have more than one child stage. Consider the following SQL:
select * from t1, t2 where t1.a = t2.a. In this case the plan looks like

              HashJoin
             /        \
      Exchange        Exchange
          |               |
     TableScan        TableScan

You can refer to https://github.com/singularity-data/risingwave/blob/408e9fb5249b12b1b457287adc4deba13c301f18/src/frontend/src/scheduler/distributed/stage.rs#L354 for the mapping between child plan fragments and exchange ids.
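
A minimal sketch of the generalization being suggested, using made-up names (StageId, ExchangeId, PlanFragment here are illustrative, not the scheduler's real types): instead of asserting a single child stage, iterate over all child stages of the root and build an exchange-to-fragment mapping.

use std::collections::HashMap;

// Illustrative type aliases; the real scheduler defines its own identifiers.
type StageId = u32;
type ExchangeId = u64;

#[derive(Debug)]
struct PlanFragment {
    stage_id: StageId,
}

// Map each exchange under the root to the fragment produced by the
// corresponding child stage, instead of assuming there is exactly one.
fn map_exchanges(child_stages: &[(ExchangeId, StageId)]) -> HashMap<ExchangeId, PlanFragment> {
    child_stages
        .iter()
        .map(|&(exchange_id, stage_id)| (exchange_id, PlanFragment { stage_id }))
        .collect()
}

fn main() {
    // A hash join under the root yields two exchanges, hence two child stages.
    let sources = map_exchanges(&[(1, 10), (2, 11)]);
    assert_eq!(sources.len(), 2);
    println!("{sources:?}");
}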

Contributor Author

@lmatz lmatz Jun 8, 2022

I thought select * from t1, t2 where t1.a = t2.a was not considered a point query and would therefore be executed in the normal distributed mode instead of local execution mode. 🤔

The example from the quip doc:
SELECT pk, t1.a, t1.fk, t2.b FROM t1, t2 WHERE t1.fk = t2.pk AND t1.pk = 114514
is a point query because t1.pk = 114514 is specified, so it uses a lookup join instead of a hash join.

Contributor

Choosing the appropriate plan should be the optimizer's responsibility, and the scheduler should not make such strong assumptions about the plan. The local execution mode is optimized for point queries, but that doesn't mean the user can't execute a hash join or sort-merge join.
The example SELECT pk, t1.a, t1.fk, t2.b FROM t1, t2 WHERE t1.fk = t2.pk AND t1.pk = 114514 picks a lookup join only when t2.pk is a primary key or has an index on it, and chooses a hash join when there is no index.
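
A tiny sketch of the rule described above, with made-up names rather than the optimizer's actual code: a lookup join is chosen only when the inner side's join key is a primary key or indexed; otherwise the plan falls back to a hash join.

#[derive(Debug, PartialEq)]
enum JoinStrategy {
    LookupJoin,
    HashJoin,
}

// Inner join key indexed or a primary key => lookup join; otherwise hash join.
fn choose_join(inner_key_indexed_or_pk: bool) -> JoinStrategy {
    if inner_key_indexed_or_pk {
        JoinStrategy::LookupJoin
    } else {
        JoinStrategy::HashJoin
    }
}

fn main() {
    assert_eq!(choose_join(true), JoinStrategy::LookupJoin);
    assert_eq!(choose_join(false), JoinStrategy::HashJoin);
}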

@lmatz lmatz force-pushed the lz/local branch 2 times, most recently from 5e7bfa4 to a355dfd on June 11, 2022 at 11:52
@lmatz
Contributor Author

lmatz commented Jun 11, 2022

Added two more tests for local mode, i.e. join and range_scan.
join has two stages on the same level below the root stage.

@lmatz
Contributor Author

lmatz commented Jun 11, 2022

Will support MergeExchange in a separate PR.

@lmatz lmatz requested a review from liurenjie1024 June 13, 2022 04:12
@lmatz lmatz enabled auto-merge (squash) June 13, 2022 09:59
@lmatz lmatz merged commit 3bca26e into main Jun 13, 2022
@lmatz lmatz deleted the lz/local branch June 13, 2022 10:05
@chinawch007

Excuse me, what's the meaning of local execution mode?

@lmatz
Contributor Author

lmatz commented Jun 25, 2022

Excuse me, what's the meaning of local execution mode?

Some operators are executed directly on the frontend node instead of on the compute nodes.
The downstream stages of an exchange operator are scheduled by the exchange operator itself instead of by the scheduler on the frontend node.

Only queries with certain (relatively simple) execution plans are classified as local-execution-mode queries. These queries typically run in a few dozen milliseconds, so reducing the number of RPCs becomes profitable.
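
A rough sketch of the classification idea described above. The fields and threshold here are made up for illustration and are not RisingWave's actual logic: simple plans whose scans are point lookups run in local mode, everything else goes through the distributed scheduler.

#[derive(Debug, PartialEq)]
enum QueryMode {
    Local,
    Distributed,
}

struct PlanSummary {
    // Number of stages produced by the fragmenter.
    num_stages: usize,
    // Whether every scan is a point lookup on a primary key or index.
    all_point_lookups: bool,
}

// Simple, short-running plans are worth running from the frontend directly,
// because the scheduler RPCs saved dominate their total latency.
fn choose_mode(plan: &PlanSummary) -> QueryMode {
    if plan.all_point_lookups && plan.num_stages <= 2 {
        QueryMode::Local
    } else {
        QueryMode::Distributed
    }
}

fn main() {
    let point_query = PlanSummary { num_stages: 2, all_point_lookups: true };
    assert_eq!(choose_mode(&point_query), QueryMode::Local);

    let big_join = PlanSummary { num_stages: 5, all_point_lookups: false };
    assert_eq!(choose_mode(&big_join), QueryMode::Distributed);
}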
