Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add input ref resolver #158

Merged
merged 2 commits into from
Nov 22, 2021
Merged

feat: add input ref resolver #158

merged 2 commits into from
Nov 22, 2021

Conversation

a9QrX3Lu
Copy link
Contributor

No description provided.

Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation looks good to me, and I'll give my approval, as long as it works. But from my perspective, this InputRefResolver doesn't seem very clear in its functionalities.

After reviewing this PR, I came up with an idea. In fact, InputRefResolver is a phase in logical planner that rewrites the BoundExpr to produce a logical plan. I think that the process could be vaguely split into two phases: walk through the expr tree, and produce a plan.

Basically, InputRefResolver should have its internal states, scan_column_list and aggr_list. When walking through the expr tree, the InputRefResolver will add the columns that need to be scanned into the scan_column_list, and produce AggCall based on the position of each column in scan_column_list. That should be exactly the same as what you have done in previous binder, except that we will know exactly which position of column in DataChunk will be read in this phase.

Anyway, after fixing the _agg_calls comment, this PR can be merged. Let's discuss how to do this better later (or maybe just leave it here and refactor it when we need to).

Good work!

src/binder/expression/mod.rs Outdated Show resolved Hide resolved
for expr in group_by.iter_mut() {
self.bind_column_idx_for_expr(&mut expr.kind);
}

let mut has_agg = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has_agg looks fine to me for now, but I think it's better to handle this in logical plan. From my perspective,has_agg decides whether to use an AggExecutor in logical plan or physical plan. Let's leave it here, and decide where to put it later.

use itertools::Itertools;

/// Transform expr referring to input chunk into `BoundInputRef`
fn transform_expr(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an idea for transform_expr. I think this function should directly output a plan, instead of simply rewriting the expr. The transform_expr function walks through the BoundExpr tree. When it finds an AggCall, it simply turns the child to an AggExecutor and resolve the InputRefs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and when resolving InputRefs, we definitely need your prior work: a select_list and an aggregation_list. When we find a column to scan, we add that column to select_list, which will be directly feed into the SeqScanExecutor. e.g. If select_list is ["x", "y", "z"], the SeqScanExecutor will take these 3 columns as scan targets. And when we need to do aggregation, we find the corresponding position of the column. e.g. we need to do count(x), we find "x" is the first element in select_list, and we use InputRef(0) for this agg call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and in this case, bindings is select_list, and agg_calls is my aggregation_list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above might not be a complete idea. We may discuss this later.

src/physical_planner/input_ref_resolver.rs Outdated Show resolved Hide resolved
----
24 25.5
# query IR
# select sum(v1+v2),sum(v1+v3) from t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why this statement would fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's a cast error. Will fix this later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skyzh
Copy link
Member

skyzh commented Nov 21, 2021

@MingjiHan99 would you please take a closer look and see if this looks good to you?

Copy link
Member

@wangrunji0408 wangrunji0408 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Btw, as Mr. Chi mentioned, I also feel that the InputRefResolver can be refactored into a PlanRewriter, where the Vec<ColumnRefId> can be a state of the resolver. It seems that DuckDB is in this style.

src/logical_planner/select.rs Show resolved Hide resolved
src/physical_planner/input_ref_resolver.rs Show resolved Hide resolved
src/physical_planner/input_ref_resolver.rs Show resolved Hide resolved
src/physical_planner/input_ref_resolver.rs Show resolved Hide resolved
@MingjiHan99
Copy link
Collaborator

LGTM.
With the new input ref resolver, we can support more complicated cases including subqueries in the future. @skyzh @wangrunji0408 @pleiadesian

@skyzh skyzh merged commit 87d3512 into main Nov 22, 2021
@skyzh skyzh deleted the wzl-inputref branch November 22, 2021 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

planner: convert ColumnRef to InputRef in physical planner
4 participants