feat(optimizer, storage): pushdown range-filter to storage #786

Merged: 17 commits into main from wrj/decouple-storage-expr on Jul 15, 2023

@wangrunji0408 wangrunji0408 commented Jul 9, 2023

Range-filter scan is already supported in storage (#589), but the query engine doesn't make use of it yet. This PR adds an optimization rule that pushes range predicates from filters down to scans. To identify range predicates on primary keys (e.g. id > 1), it introduces a range analysis, which extracts a KeyRange from =, >, >=, <, <= comparisons and from the logical and node. The KeyRange can then be passed to storage.
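
A rough sketch of what such a range analysis could look like, for illustration only. The `Expr` and `Op` types and the `analyze` function are hypothetical stand-ins (the planner uses its own expression nodes), the key is assumed to be an `i64` primary key, and `KeyRange` is reduced to two bounds. The `and` case follows the snippet quoted in the review below: it combines two ranges only when at most one of them is bounded on each side, and gives up otherwise.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// Hypothetical, simplified `KeyRange`: bounds on an integer primary key.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyRange {
    pub start: Bound<i64>,
    pub end: Bound<i64>,
}

/// Comparison operators the analysis understands.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Eq,
    Gt,
    GtEq,
    Lt,
    LtEq,
}

/// Hypothetical predicate tree; `Cmp(op, v)` means `key <op> v`.
#[derive(Debug, Clone)]
pub enum Expr {
    Cmp(Op, i64),
    And(Box<Expr>, Box<Expr>),
    /// Any predicate the analysis cannot turn into a range.
    Other,
}

/// Returns the key range implied by `expr`, or `None` if it cannot be derived.
pub fn analyze(expr: &Expr) -> Option<KeyRange> {
    match expr {
        Expr::Cmp(op, v) => {
            let (start, end) = match op {
                Op::Eq => (Included(*v), Included(*v)),
                Op::Gt => (Excluded(*v), Unbounded),
                Op::GtEq => (Included(*v), Unbounded),
                Op::Lt => (Unbounded, Excluded(*v)),
                Op::LtEq => (Unbounded, Included(*v)),
            };
            Some(KeyRange { start, end })
        }
        Expr::And(a, b) => {
            let (ra, rb) = (analyze(a)?, analyze(b)?);
            // Combine only when at most one operand is bounded on each side.
            let start = match (&ra.start, &rb.start) {
                (Unbounded, s) | (s, Unbounded) => s.clone(),
                _ => return None,
            };
            let end = match (&ra.end, &rb.end) {
                (Unbounded, s) | (s, Unbounded) => s.clone(),
                _ => return None,
            };
            Some(KeyRange { start, end })
        }
        Expr::Other => None,
    }
}
```

With this sketch, `id > 1 and id <= 10` yields the range `(1, 10]`, while `id < 3 and id < 5` yields `None` because both operands bound the same side.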

Another goal of this PR is to decouple storage from the v1 query engine. Storage currently depends on BoundExpr from v1 to filter data internally, but we don't expect storage to support general filters, only range filters on keys. So this PR removes the general filter from storage and adds a KeyRange for range filters. After that, the v1 engine can be removed completely.

A simple test on the TPC-H dataset:

# disable filter scan rule
> select count(O_ORDERKEY) from orders where O_ORDERKEY < 10;
+---+
| 7 |
+---+
in 0.018s

# enable filter scan rule
> select count(O_ORDERKEY) from orders where O_ORDERKEY < 10;
+---+
| 7 |
+---+
in 0.002s

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Member

@skyzh skyzh left a comment

LGTM and thanks for the PR! It seems that SQLLogicTest needs to be updated with make apply_planner_test. Also it seems that we pass Option<KeyRange> to the filter but the default value for that is true? Shall we convert it to None somewhere so that we can avoid unnecessary expression evaluation?

let end = match (&ra.end, &rb.end) {
    (Bound::Unbounded, s) | (s, Bound::Unbounded) => s.clone(),
    _ => return None,
};
Member

What if both ranges are bounded on the same side (e.g. a < 3 && a < 5)? In that case, we should not return None here if there is no filter executor above the scan-filter?

Member Author

Yes, it will return None. I didn't merge the bounds here because I thought it would be a bit complicated. I expect these cases to be handled by rewrite rules later. For example:

(and (< ?a ?v1) (< ?a ?v2)) => (< ?a (min ?v1 ?v2))
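
For comparison, here is a minimal sketch of merging two upper bounds directly, which is the alternative the review comment asks about. This is not what the PR does: the function name `tighter_end` is hypothetical and the key is assumed to be an `i64`. It picks the more restrictive of the two bounds, and when an inclusive and an exclusive bound share the same value, the exclusive one wins.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// Returns the more restrictive of two upper bounds on an integer key.
pub fn tighter_end(a: Bound<i64>, b: Bound<i64>) -> Bound<i64> {
    match (a, b) {
        // An unbounded side imposes no restriction, so keep the other one.
        (Unbounded, s) | (s, Unbounded) => s,
        (Included(x), Included(y)) => Included(x.min(y)),
        (Excluded(x), Excluded(y)) => Excluded(x.min(y)),
        // Mixed case: `key < e` is tighter than `key <= i` whenever e <= i.
        (Included(i), Excluded(e)) | (Excluded(e), Included(i)) => {
            if e <= i {
                Excluded(e)
            } else {
                Included(i)
            }
        }
    }
}
```

For `a < 3 && a < 5` this gives `Excluded(3)`, the same result as the `min` rewrite rule. A lower-bound version would be symmetric, using `max` instead of `min`.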

- if min.is_some() {
-     min.unwrap().min(ROWSET_MAX_OUTPUT)
+ if let Some(min) = min {
+     min.min(ROWSET_MAX_OUTPUT)
Member

Should this be if let Some(ref mut min) = min and *min = min.min(xxx)?

Member Author

The whole if expression will be assigned to fetch_size. I think it is fine.
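
To illustrate the point, a small sketch of an `if let` used as an expression. Only the `if let` branch appears in the diff above; the `else` branch, the wrapper function `fetch_size`, the parameter name `hint`, and the value of `ROWSET_MAX_OUTPUT` are assumptions made so the example compiles.

```rust
/// Hypothetical cap; the real constant's value is not shown in this thread.
const ROWSET_MAX_OUTPUT: usize = 2048;

/// `hint` stands in for the `min` variable in the diff.
pub fn fetch_size(hint: Option<usize>) -> usize {
    // The `if let` is an expression: the value of whichever branch runs
    // is bound to `fetch_size`, so no mutation of `hint` is needed.
    let fetch_size = if let Some(min) = hint {
        min.min(ROWSET_MAX_OUTPUT)
    } else {
        // Assumed fallback when no hint is given.
        ROWSET_MAX_OUTPUT
    };
    fetch_size
}
```

Because the result is a new value rather than an in-place update, binding `min` by value is enough and `ref mut` is not required.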

Comment on lines +86 to +87
// required by range-filter scan rule
record_first_key: true,
Member Author

It looks like this feature is now required by range-filter scan. Should we make it mandatory, or will we support range-filter scan without it? @skyzh

);

let snapshot = if is_sorted {
assert!(opts.filter.is_none(), "MemTxn doesn't support filter scan");
Member Author

Do we plan to support range-filter scan (storing rows in key order) in memory storage?

@wangrunji0408
Member Author

Also it seems that we pass Option<KeyRange> to the filter but the default value for that is true? Shall we convert it to None somewhere so that we can avoid unnecessary expression evaluation?

I've changed the default value to null so that it's more intuitive. In fact, no evaluation happens in either case: the Option<KeyRange> is simply taken out of the filter node when building the executor.
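
A minimal sketch of that idea, under stated assumptions: `ScanFilter`, `into_key_range`, and `scan` are hypothetical names, `KeyRange` is reduced to two `i64` bounds, and a `BTreeMap` stands in for key-ordered storage. The filter slot is converted to an `Option<KeyRange>` once, when the scan is set up, and nothing is evaluated per row.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical, simplified `KeyRange`: bounds on an integer primary key.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyRange {
    pub start: Bound<i64>,
    pub end: Bound<i64>,
}

/// Hypothetical filter slot of a scan plan node; `Null` means no range filter.
pub enum ScanFilter {
    Null,
    Range(KeyRange),
}

impl ScanFilter {
    /// Extracts the range once, at executor-build time.
    pub fn into_key_range(self) -> Option<KeyRange> {
        match self {
            ScanFilter::Null => None,
            ScanFilter::Range(r) => Some(r),
        }
    }
}

/// Returns the keys visited by the scan over a key-ordered table.
pub fn scan(table: &BTreeMap<i64, String>, filter: ScanFilter) -> Vec<i64> {
    match filter.into_key_range() {
        // Range scan: only keys inside the bounds are visited.
        // Note: `BTreeMap::range` panics if the start is greater than the end.
        Some(r) => table.range((r.start, r.end)).map(|(k, _)| *k).collect(),
        // No filter: full scan.
        None => table.keys().copied().collect(),
    }
}
```

In this sketch, a filter of `Null` costs nothing beyond the `match`, and a range filter lets the ordered map skip straight to the matching keys.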

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Jul 15, 2023
Merged via the queue into main with commit a0882cd Jul 15, 2023
4 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/decouple-storage-expr branch July 15, 2023 10:21