New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto Execution Mode Selection #10
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
--- | ||
feature: Auto Execution Mode Selection | ||
authors: | ||
- "DylanChen" | ||
start_date: "2022/11/03" | ||
--- | ||
|
||
# Auto Execution Mode Selection | ||
|
||
## Summary | ||
|
||
This RFC introduces an auto execution mode selection feature for user's batch workloads so that OLTP queries can run in local mode and OLAP queries can run in distributed mode. As a result, users will experience a lower latency automatically. | ||
|
||
## Motivation | ||
|
||
Currently, our system exists 2 kind of execution mode: local and distributed. We want to run OLTP queries in local execution mode, while OLAP queries in distributed execution mode. However we leave the choice to users by `set query_mode = [local | distributed]` which in some way makes our distributed execution useless because most users don't know or understand what those options exactly mean. As a database, we have the ability and should make a good choice for users to reduce their tuning works. | ||
|
||
## Design | ||
|
||
Since we don't have any statistics about table/mv, we are unable to calculate a meaningful cost for queries, so we focus on the oltp workload read pattern to decide what queries should run in local mode. | ||
|
||
### Query should run in local mode: | ||
|
||
- Single table primary key lookup. we can check our physical plan to ensure whether there is only one table scan in the plan tree and this table scan contains a point lookup scan range. | ||
- Single table two side bounded range scan. we can check our physical plan to ensure whether there is only one table scan in the plan tree and this table scan contains a two side bounded scan range. | ||
- Single table Index lookup. we can check our logical plan and physical plan to ensure there is only one table scan in the logical plan tree and there is a lookup join(non covering index) or there is a table scan in the physical plan tree(covering index). | ||
- Select without any table scan. Such as `select 1`. | ||
- Single table with limit n. If n is small, we can also allow them run in local mode. | ||
- Two table IndexNestedLoopJoin. Such as `select * from A join B on A.k = B.k where A.id = 1` If B.k has an index and A side input is proved to be small. | ||
|
||
|
||
|
||
### Query should run in distributed mode | ||
|
||
- Any other plans not exists in above section. | ||
- Example: | ||
- Single Table Scan have no scan range which means it is a full table scan. | ||
- Multi join complex queries like TPCH querys. | ||
|
||
### How to verify | ||
|
||
We expect all queries of Sysbench can run in local mode automatically and all queries of TPCH can run in distributed mode automatically. | ||
|
||
Example of Sysbench query | ||
``` | ||
local stmt_defs = { | ||
point_selects = { | ||
"SELECT c FROM sbtest%u WHERE id=?", | ||
t.INT}, | ||
simple_ranges = { | ||
"SELECT c FROM sbtest%u WHERE id BETWEEN ? AND ?", | ||
t.INT, t.INT}, | ||
sum_ranges = { | ||
"SELECT SUM(k) FROM sbtest%u WHERE id BETWEEN ? AND ?", | ||
t.INT, t.INT}, | ||
order_ranges = { | ||
"SELECT c FROM sbtest%u WHERE id BETWEEN ? AND ? ORDER BY c", | ||
t.INT, t.INT}, | ||
distinct_ranges = { | ||
"SELECT DISTINCT c FROM sbtest%u WHERE id BETWEEN ? AND ? ORDER BY c", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This query seems hard to say it's TP or AP unless |
||
t.INT, t.INT}, | ||
index_updates = { | ||
"UPDATE sbtest%u SET k=k+1 WHERE id=?", | ||
t.INT}, | ||
non_index_updates = { | ||
"UPDATE sbtest%u SET c=? WHERE id=?", | ||
{t.CHAR, 120}, t.INT}, | ||
deletes = { | ||
"DELETE FROM sbtest%u WHERE id=?", | ||
t.INT}, | ||
inserts = { | ||
"INSERT INTO sbtest%u (id, k, c, pad) VALUES (?, ?, ?, ?)", | ||
t.INT, t.INT, {t.CHAR, 120}, {t.CHAR, 60}}, | ||
random_points = { | ||
"SELECT id, k, c, pad | ||
FROM sbtest1 | ||
WHERE k IN (%s)" | ||
} | ||
} | ||
``` | ||
|
||
### New Query Mode | ||
|
||
Add a new type of query mode `auto` and use it by default. User can still choose `local` or `distributed` if they want. | ||
|
||
|
||
## Unresolved questions | ||
|
||
There is no guarantee of how much IO or rows being fetched in local mode. | ||
|
||
Think about two side bounded range scan. We don't know how much rows will return. | ||
|
||
Think about index point lookup if this is a poor index with low cardinality. | ||
|
||
## Future possibilities | ||
|
||
If we have a cost model in the future, we can decide what execution mode should used for the query by its cost. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another case is expression only sql, for example
select 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually,
select
any expression without tables?