Similar to ColPrunable, which prunes top down based on output columns, we should have a trait DuplicateColPrunable which prunes bottom up based on duplicate columns originating in various operators.
This idea has been proposed several times (references: ?) but has not been implemented. However, the ExprRewritable machinery now makes this problem easier to solve.
Performance: Removing duplicate agg calls has been shown to improve performance: perf: nexmark q17 #7351 (comment). Removing duplicate columns can merge agg calls that are falsely treated as distinct due to aliasing, and is expected to reduce unnecessary work in other operators as well. Furthermore, we can save storage.
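To illustrate how de-aliasing can merge agg calls, here is a minimal, self-contained sketch. `AggCall` and `dealias_and_merge` are hypothetical names made up for this illustration, not the optimizer's actual types. It assumes the input has already been pruned and handed us a (possibly non-injective) mapping from old input columns to new ones.

```rust
use std::collections::HashMap;

/// Toy agg call: an aggregate kind applied to one input column (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct AggCall {
    kind: &'static str,
    input: usize,
}

/// Rewrite each call's input through `input_mapping`, then merge calls that
/// became identical. Returns the merged calls and, for every old call
/// position, the position of the merged call that replaces it.
fn dealias_and_merge(calls: &[AggCall], input_mapping: &[usize]) -> (Vec<AggCall>, Vec<usize>) {
    let mut merged: Vec<AggCall> = Vec::new();
    let mut seen: HashMap<AggCall, usize> = HashMap::new();
    let mut positions = Vec::with_capacity(calls.len());
    for call in calls {
        let rewritten = AggCall {
            kind: call.kind,
            input: input_mapping[call.input],
        };
        // First occurrence gets a new slot; later duplicates reuse it.
        let pos = *seen.entry(rewritten.clone()).or_insert_with(|| {
            merged.push(rewritten);
            merged.len() - 1
        });
        positions.push(pos);
    }
    (merged, positions)
}
```

With calls `count($0)`, `count($1)`, `sum($1)` and an input mapping `[0, 0]` (column 1 was an alias of column 0), the two `count` calls collapse into one, and the returned positions are themselves a non-injective mapping for the parent.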
Notes:
Duplicate columns can originate in:
- various operators' output_indices
- Project operator equal exprs
- eq keys of inner joins
- duplicate agg calls (While aggs may no longer be a source after perf(agg): reuse existing agg calls while building LogicalAgg #8200, pruning duplicate columns in Agg's input may still cause us to handle duplicate agg calls after inputs have been de-aliased. See below.)
- naively adding the stream key
Whenever we deduplicate, we will pass col_index_mapping to the parent. The col_index_mapping need not be injective: several old column indices can map to the same new column. The parent operator needs to rewrite all its expressions through this mapping and adapt to the changed input schema.
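As a rough sketch of what such a mapping looks like, the helper below deduplicates an operator's output indices and records where each old output position ends up. `dedup_output_indices` is a hypothetical name, and the mapping is represented as a plain `Vec<usize>` rather than the real `ColIndexMapping` type.

```rust
/// Hypothetical helper: given `output_indices` that may repeat an input
/// column, return the deduplicated indices plus a mapping from each old
/// output position to its new output position.
fn dedup_output_indices(output_indices: &[usize]) -> (Vec<usize>, Vec<usize>) {
    let mut deduped: Vec<usize> = Vec::new();
    let mut mapping: Vec<usize> = Vec::with_capacity(output_indices.len());
    for &col in output_indices {
        // Reuse the slot of an earlier occurrence, otherwise append a new one.
        let new_pos = match deduped.iter().position(|&c| c == col) {
            Some(pos) => pos,
            None => {
                deduped.push(col);
                deduped.len() - 1
            }
        };
        mapping.push(new_pos);
    }
    (deduped, mapping)
}
```

For `output_indices = [0, 0]` this yields the deduplicated indices `[0]` and the mapping `[0, 0]`: both old output columns now refer to the single remaining column, which is exactly the non-injective case the parent has to handle.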
    MaterializedView { output: [0, 1] }
      Join { .., output: [0, 0] }
    ->
    /* In this case, we prefer duplicate expressions to always be absorbed by Project */
    MaterializedView { output: [0, 1] }
      Project { exprs: [$0, $0] }
        Join { .., output: [0] }
->
Implementation details:
In order to rewrite expressions' InputRef to the alias, we will use ExprRewritable to walk all the expressions in a given parent operator.
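The walk itself can be pictured with a toy expression tree. The `Expr` enum and `rewrite_input_refs` below are illustrative stand-ins, not the actual `ExprRewritable` or `InputRef` definitions. The sketch assumes the mapping is total, i.e. every referenced old column has a new index.

```rust
/// Toy expression tree (hypothetical; stands in for the optimizer's expressions).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    InputRef(usize),
    Literal(i64),
    Call(String, Vec<Expr>),
}

/// Recursively replace every `InputRef(i)` with `InputRef(mapping[i])`.
/// The mapping may be non-injective, so distinct refs can become equal.
fn rewrite_input_refs(expr: Expr, mapping: &[usize]) -> Expr {
    match expr {
        Expr::InputRef(i) => Expr::InputRef(mapping[i]),
        Expr::Literal(v) => Expr::Literal(v),
        Expr::Call(name, args) => Expr::Call(
            name,
            args.into_iter()
                .map(|arg| rewrite_input_refs(arg, mapping))
                .collect(),
        ),
    }
}
```

Rewriting `add($0, $1)` through the mapping `[0, 0]` gives `add($0, $0)`. After such a rewrite, expressions that only differed by an alias become structurally equal, which is what lets the parent detect new duplicates.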
Since const_eval can reveal new expressions that are equivalent, perhaps we should run this optimization pass after const_eval.
Ease of Implementation
The generic implementation is expected to be simple:

    impl DuplicateColPrunable for Logical_ {
        fn prune_duplicate_cols(self) -> (PlanRef, ColIndexMapping) {
            // Prune the inputs first (bottom up); `map` may be non-injective.
            let (new_input, map) = self.inputs.prune_duplicate_cols();
            let plan = self.rewrite_with_inputs(new_input, map.clone());
            let new_plan = plan.rewrite_exprs(&mut InputRefRewriter::new(map));
            // Now that we have propagated both schema changes in inputs as well as
            // de-aliased any input cols, we check if we have any duplicate cols that can be removed.
            if let Some((rewritten, mapping)) = new_plan.rewrite_without_duplicates() {
                (rewritten, mapping)
            } else {
                // `identity` stands for the identity mapping over `new_plan`'s columns.
                (new_plan, identity)
            }
        }
    }

Alternatively, to enable Logical_ to handle rewriting input with a non-injective mapping more generally, we could move plan.rewrite_exprs(&mut InputRefRewriter::new(map)); into rewrite_with_inputs.
Questions:
How do we deal with the LogicalShare operator?
A: We need to propagate the result of .prune_duplicate_cols() to each parent simultaneously, so that each parent can propagate pruning of duplicated cols. We can cache this result in a DuplicateColPruningContext, similar to ColPruningContext.
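One way to picture the caching idea is the sketch below. Only the name `DuplicateColPruningContext` comes from the proposal above. Its fields, the `PlanId` key, the `Pruned` result type, and the `prune_shared` method are assumptions made for illustration, and the real context would cache a plan and a `ColIndexMapping` rather than plain vectors.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a shared plan node.
type PlanId = u32;

/// Toy pruning result: deduplicated output indices plus the old-to-new mapping.
#[derive(Debug, Clone, PartialEq)]
struct Pruned {
    output_indices: Vec<usize>,
    mapping: Vec<usize>,
}

/// Sketch of a context that memoizes the pruning result per shared node.
#[derive(Default)]
struct DuplicateColPruningContext {
    cache: HashMap<PlanId, Pruned>,
    computed: usize, // how many times pruning was actually performed
}

impl DuplicateColPruningContext {
    /// Every parent of the shared node calls this; only the first call prunes.
    fn prune_shared(&mut self, id: PlanId, output_indices: &[usize]) -> Pruned {
        if let Some(hit) = self.cache.get(&id) {
            return hit.clone();
        }
        self.computed += 1;
        let mut deduped: Vec<usize> = Vec::new();
        let mut mapping = Vec::with_capacity(output_indices.len());
        for &col in output_indices {
            let pos = match deduped.iter().position(|&c| c == col) {
                Some(p) => p,
                None => {
                    deduped.push(col);
                    deduped.len() - 1
                }
            };
            mapping.push(pos);
        }
        let result = Pruned { output_indices: deduped, mapping };
        self.cache.insert(id, result.clone());
        result
    }
}
```

Because each parent receives the same cached result, all of them see an identical pruned schema and mapping for the shared subtree, and each can continue propagating the pruning upward on its own.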
Motivations
… ColIndexMapping for LogicalJoin #8267, fix(optimizer): handle non-injective changes of logical_rewrite_for_stream #8269) do not fix the root cause, which is properly handling non-injective col mappings and efficiently dealing with aliasing.
Examples:
- Prune based on duplicate output_indices
- Project can also be pruned
- Pruning eq keys in inner joins
- Pruning agg calls falsely considered unique
- Handling of Materialized View