-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add input ref resolver #158
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation looks good to me, and I'll give my approval, as long as it works. But from my perspective, this InputRefResolver
doesn't seem very clear in its functionalities.
After reviewing this PR, I came up with an idea. In fact, InputRefResolver
is a phase in logical planner that rewrites the BoundExpr to produce a logical plan. I think that the process could be vaguely split into two phases: walk through the expr tree, and produce a plan.
Basically, InputRefResolver
should have its internal states, scan_column_list
and aggr_list
. When walking through the expr tree, the InputRefResolver
will add the columns that need to be scanned into the scan_column_list
, and produce AggCall
based on the position of each column in scan_column_list
. That should be exactly the same as what you have done in previous binder, except that we will know exactly which position of column in DataChunk will be read in this phase.
Anyway, after fixing the _agg_calls
comment, this PR can be merged. Let's discuss how to do this better later (or maybe just leave it here and refactor it when we need to).
Good work!
for expr in group_by.iter_mut() { | ||
self.bind_column_idx_for_expr(&mut expr.kind); | ||
} | ||
|
||
let mut has_agg = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has_agg
looks fine to me for now, but I think it's better to handle this in logical plan. From my perspective,has_agg
decides whether to use an AggExecutor
in logical plan or physical plan. Let's leave it here, and decide where to put it later.
use itertools::Itertools; | ||
|
||
/// Transform expr referring to input chunk into `BoundInputRef` | ||
fn transform_expr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea for transform_expr
. I think this function should directly output a plan, instead of simply rewriting the expr. The transform_expr
function walks through the BoundExpr
tree. When it finds an AggCall
, it simply turns the child to an AggExecutor
and resolve the InputRef
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and when resolving InputRef
s, we definitely need your prior work: a select_list
and an aggregation_list
. When we find a column to scan, we add that column to select_list
, which will be directly feed into the SeqScanExecutor
. e.g. If select_list
is ["x", "y", "z"]
, the SeqScanExecutor will take these 3 columns as scan targets. And when we need to do aggregation, we find the corresponding position of the column. e.g. we need to do count(x)
, we find "x" is the first element in select_list
, and we use InputRef(0)
for this agg call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and in this case, bindings
is select_list
, and agg_calls
is my aggregation_list
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above might not be a complete idea. We may discuss this later.
---- | ||
24 25.5 | ||
# query IR | ||
# select sum(v1+v2),sum(v1+v3) from t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why this statement would fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it's a cast error. Will fix this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MingjiHan99 would you please take a closer look and see if this looks good to you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Btw, as Mr. Chi mentioned, I also feel that the InputRefResolver
can be refactored into a PlanRewriter
, where the Vec<ColumnRefId>
can be a state of the resolver. It seems that DuckDB is in this style.
LGTM. |
No description provided.