-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Fix wrong attribute shadowing in pushdown rules #105650
ESQL: Fix wrong attribute shadowing in pushdown rules #105650
Conversation
d2fe543
to
74ec036
Compare
74ec036
to
726fee1
Compare
I tried an alternative solution as well which turned |
@@ -145,6 +145,19 @@ full_name:keyword | emp_no:keyword | b:keyword | |||
Bezalel Simmel | Bezalel | Simmel | |||
; | |||
|
|||
overwriteNameAfterSort#[skip:-8.13.0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should backport this bugfix to 8.13, so I'm setting the skip labels accordingly.
This is unique and should avoid name clashes better.
Also thanks to @luigidellaquila for suggesting this approach, which updates the attributes in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alex-spies, looks pretty good.
I left only one comment about possible improvements on tests
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - the code is duplicated 3 times so please encapsulate it.
Add more tests also some with references sort x | eval y = x, x = y
, chained to guarantee the nature of the plan before the rule kicks in.
if (existingAlias == null) { | ||
String tempName = SubstituteSurrogates.rawTemporaryName(attr.name(), "temp_name", (int) attr.id().toLong()); | ||
|
||
Alias tempNameForShadowedAttr = new Alias(Source.EMPTY, tempName, attr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve the source (useful to see where this came from) and set the synthetic parameter to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this should be synthetic. In particular, it should not be computed on a data node, that would just send duplicate columns over the wire.
However, currently this does not just work; if I mark these aliases as synth, the ProjectAwayColumns
physical optimizer rule prunes them, but I get an NPE because now they are seemingly computed nowhere. (Happens precisely here.)
This requires additional investigation to get right. How about I create a separate issue for that? For the time being, I'll mark it as TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please raise an issue to double check the behaviour in ProjectAwayColumns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we go: #105821
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
|
||
AttributeMap<Alias> aliasesForShadowedOrderByAttrs = nonShadowedOrders.replacedAttributes; | ||
@SuppressWarnings("unchecked") | ||
List<Order> newOrder = (List<Order>) (Object) nonShadowedOrders.rewrittenExpressions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(List<Order>) (Object)
? Everything is an object - if anything cast the object to List directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Casting directly does not compile (Inconvertible types
), but I can make this slightly less ugly by using List<?>
instead of Object
.
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
if (aliasesForShadowedOrderByAttrs.isEmpty() == false) { | ||
List<Alias> newAliases = aliasesForShadowedOrderByAttrs.values().stream().toList(); | ||
|
||
LogicalPlan plan = new Eval(Source.EMPTY, orderBy.child(), newAliases); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the current node source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the source of the OrderBy
is the best option - here, we're renaming some attributes for it. (Current node source contains the already overwritten/shadowed attributes, instead.)
LogicalPlan plan = new Eval(Source.EMPTY, orderBy.child(), newAliases); | ||
plan = eval.replaceChild(plan); | ||
plan = new OrderBy(orderBy.source(), plan, newOrder); | ||
plan = new EsqlProject(Source.EMPTY, plan, eval.output()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Project directly and preserve the source.
AttributeSet generates = new AttributeSet(Expressions.asAttributes(enrichFields)); | ||
// In case of aliases we generate both the alias and its target. | ||
// E.g. in ENRICH policy ON field WITH alias = enrich_field | ||
// we generate both `alias` and `enrich_field`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks incorrect as enrich_field is hidden behind alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's weird that we have to do this. I found a place in the physical optimizer that needs to handle Enrich in a similarly weird way.
To avoid cementing this weirdness, I'll remove this from the PR, at the cost of having to disable the dependency checker again. I'll follow this up with another PR, after investigating the weird handling of attributes in Enrich.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raise an issue for the Enrich problem to make sure it doesn't get lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There we go! #105807
x-pack/plugin/ql/src/main/java/org/elasticsearch/xpack/ql/expression/NameId.java
Outdated
Show resolved
Hide resolved
I'll add a CSV test for this; the added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your remarks, @costin and @luigidellaquila ! Should be ready for the next round, now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - the improvements are visible. LGTM
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @alex-spies, I've created a changelog YAML for you. |
Fix elastic#105434 Fixes accidental shadowing when pushing down `GROK`/`DISSECT`, `EVAL` or `ENRICH` past a `SORT`. Example for how this works: ``` ... | SORT x | EVAL x = y ... pushing this down just like that would be incorrect as x is used in the SORT, so we turn this essentially into ... | EVAL $$x = x | EVAL x = y | SORT $$x | DROP $$x ... ``` The same logic is applied to `GROK`/`DISSECT` and `ENRICH`. This allows to re-enable the dependency checker (after fixing a small bug in it when handling `ENRICH`).
💚 Backport successful
|
#105808) * ESQL: Fix wrong attribute shadowing in pushdown rules (#105650) Fix #105434 Fixes accidental shadowing when pushing down `GROK`/`DISSECT`, `EVAL` or `ENRICH` past a `SORT`. Example for how this works: ``` ... | SORT x | EVAL x = y ... pushing this down just like that would be incorrect as x is used in the SORT, so we turn this essentially into ... | EVAL $$x = x | EVAL x = y | SORT $$x | DROP $$x ... ``` The same logic is applied to `GROK`/`DISSECT` and `ENRICH`. This allows to re-enable the dependency checker (after fixing a small bug in it when handling `ENRICH`). * Make OptimizerRules compile again
Fix #105434
Fixes accidental shadowing when pushing down
GROK
/DISSECT
,EVAL
orENRICH
past aSORT
.Example for how this works:
The same logic is applied to
GROK
/DISSECT
andENRICH
.This allows to re-enable the dependency checker (after fixing a small bug in it when handling
ENRICH
).