-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Fix ReplaceAliasingEvalWithProject in case of shadowing #137025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
c077983
c8221e3
52e122f
ddea63d
f7880e6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| pr: 137025 | ||
| summary: Fix `ReplaceAliasingEvalWithProject` in case of shadowing | ||
| area: ES|QL | ||
| type: bug | ||
| issues: | ||
| - 137019 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,7 +20,11 @@ | |
| import org.elasticsearch.xpack.esql.rule.Rule; | ||
|
|
||
| import java.util.ArrayList; | ||
| import java.util.HashSet; | ||
| import java.util.List; | ||
| import java.util.Set; | ||
|
|
||
| import static org.elasticsearch.xpack.esql.optimizer.rules.logical.TemporaryNameUtils.locallyUniqueTemporaryName; | ||
|
|
||
| /** | ||
| * Replace aliasing evals (eval x=a) with a projection which can be further combined / simplified. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we require project on top? It seems we are missing a lot of optimization opportunities. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The example above should be optimized to just this, no project needed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This rule doesn't propagate shadowed, internal columns from an eval. We could do that! But I don't think we do. (Which should be simplified by some other rule to
Great question! That rule is super old, and I don't recall why we don't trigger it always. My hunch is that we wanted it mostly to combine the aliases from the eval with downstream projections. But we could profit from propagating the aliases more generally. That said, on its own, there is no performance difference between a simple alias in an eval vs. in a projection. Both are cheap! They just incRef the underlying block. (Unless that block is sent over the wire. But that could also be tackled on the serialization level.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I would expect us to get to salary = ((salary+1)+1)+1 and then fold the constants in evals eventually. You don't have to address it in this PR, it seems like a bigger change |
||
|
|
@@ -53,47 +57,63 @@ public LogicalPlan apply(LogicalPlan logicalPlan) { | |
| private LogicalPlan rule(Eval eval) { | ||
| LogicalPlan plan = eval; | ||
|
|
||
| // holds simple aliases such as b = a, c = b, d = c | ||
| AttributeMap.Builder<Expression> basicAliasesBuilder = AttributeMap.builder(); | ||
| // same as above but keeps the original expression | ||
| AttributeMap.Builder<NamedExpression> basicAliasSourcesBuilder = AttributeMap.builder(); | ||
| // Mostly, holds simple aliases from the eval, such as b = a, c = b, d = c, so we can resolve them in subsequent eval fields | ||
| AttributeMap.Builder<Expression> renamesToPropagate = AttributeMap.builder(); | ||
| // the aliases for the final projection - mostly, same as above but holds the final aliases rather than the original attributes | ||
| AttributeMap.Builder<NamedExpression> projectionAliases = AttributeMap.builder(); | ||
| // The names of attributes that are required to perform the aliases in a subsequent projection - if the next eval field | ||
| // shadows one of these names, the subsequent projection won't work, so we need to perform a temporary rename. | ||
| Set<String> namesRequiredForProjectionAliases = new HashSet<>(); | ||
|
|
||
| List<Alias> keptFields = new ArrayList<>(); | ||
| List<Alias> newEvalFields = new ArrayList<>(); | ||
|
|
||
| var fields = eval.fields(); | ||
| for (int i = 0, size = fields.size(); i < size; i++) { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Optional, but since changing this method "sufficiently"...: |
||
| Alias field = fields.get(i); | ||
| Expression child = field.child(); | ||
| var attribute = field.toAttribute(); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's move this in the "basic renaming" branch. |
||
| // put the aliases in a separate map to separate the underlying resolve from other aliases | ||
| if (child instanceof Attribute) { | ||
| basicAliasesBuilder.put(attribute, child); | ||
| basicAliasSourcesBuilder.put(attribute, field); | ||
| // propagate all previous aliases into the current field | ||
| field = (Alias) field.transformUp(e -> renamesToPropagate.build().resolve(e, e)); | ||
| Expression child = field.child(); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Optional: can be inlined. |
||
|
|
||
| if (child instanceof Attribute renamedAttribute) { | ||
| // Basic renaming - let's do that in the subsequent projection | ||
| renamesToPropagate.put(attribute, renamedAttribute); | ||
| projectionAliases.put(attribute, field); | ||
| namesRequiredForProjectionAliases.add(renamedAttribute.name()); | ||
| } else { | ||
| // be lazy and start replacing name aliases only if needed | ||
| if (basicAliasesBuilder.build().size() > 0) { | ||
| // update the child through the field | ||
| field = (Alias) field.transformUp(e -> basicAliasesBuilder.build().resolve(e, e)); | ||
| // not a basic renaming, needs to remain in the eval | ||
|
|
||
| // The field may shadow one of the attributes that we will need to correctly perform the subsequent projection. | ||
| // If so, rename it in the eval! | ||
| if (namesRequiredForProjectionAliases.contains(field.name())) { | ||
| Alias newField = new Alias(field.source(), locallyUniqueTemporaryName(field.name()), field.child(), null, true); | ||
| Alias reRenamedField = new Alias(field.source(), field.name(), newField.toAttribute(), field.id(), field.synthetic()); | ||
| projectionAliases.put(field.toAttribute(), reRenamedField); | ||
| // the renaming also needs to be propagated to eval fields to the right | ||
| renamesToPropagate.put(field.toAttribute(), newField.toAttribute()); | ||
|
|
||
| field = newField; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Optional: for legibility, I'd reflow this into an |
||
| } | ||
| keptFields.add(field); | ||
|
|
||
| newEvalFields.add(field); | ||
| } | ||
| } | ||
|
|
||
| // at least one alias encountered, move it into a project | ||
| if (basicAliasesBuilder.build().size() > 0) { | ||
| if (renamesToPropagate.build().size() > 0) { | ||
| // preserve the eval output (takes care of shadowing and order) but replace the basic aliases | ||
| List<NamedExpression> projections = new ArrayList<>(eval.output()); | ||
| var basicAliasSources = basicAliasSourcesBuilder.build(); | ||
| var projectionAliasesMap = projectionAliases.build(); | ||
| // replace the removed aliases with their initial definition - however use the output to preserve the shadowing | ||
| for (int i = projections.size() - 1; i >= 0; i--) { | ||
| NamedExpression project = projections.get(i); | ||
| projections.set(i, basicAliasSources.getOrDefault(project, project)); | ||
| projections.set(i, projectionAliasesMap.getOrDefault(project, project)); | ||
| } | ||
|
|
||
| LogicalPlan child = eval.child(); | ||
| if (keptFields.size() > 0) { | ||
| if (newEvalFields.size() > 0) { | ||
| // replace the eval with just the kept fields | ||
| child = new Eval(eval.source(), eval.child(), keptFields); | ||
| child = new Eval(eval.source(), eval.child(), newEvalFields); | ||
| } | ||
| // put the projection in place | ||
| plan = new Project(eval.source(), child, projections); | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we throw if multiple are found? Just to make the testing less "exciting", in case this method ends up being used a plan with multiple sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the multiple case will likely happen in case of lookup join if the name is both in the main index and the lookup index. We should test this