ESQL: Fix wrong attribute shadowing in pushdown rules #105650

Merged

elasticsearchmachine merged 24 commits into elastic:main from alex-spies:fix-pushdownregexeval-shadowing

Feb 26, 2024

Contributor

alex-spies commented Feb 20, 2024

Fixes accidental shadowing when pushing down GROK/DISSECT, EVAL or ENRICH past a SORT.

Example for how this works:

...
| SORT x
| EVAL x = y
...

Pushing the EVAL down past the SORT as-is would be incorrect, because x is used in the SORT and would be overwritten before sorting, so the rule essentially turns this into:

...
| EVAL $$x = x
| EVAL x = y
| SORT $$x
| DROP $$x
...

The same logic is applied to GROK/DISSECT and ENRICH.
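
The rewrite described above can be sketched on a toy model. This is a minimal illustration, not the Elasticsearch implementation: the `Step` record and the `pushDownEvalPastSort` helper are hypothetical stand-ins for the real plan nodes, the `$$` prefix simply mirrors the example above, and only a single-key SORT followed by a single-assignment EVAL is covered.

```java
import java.util.ArrayList;
import java.util.List;

public class ShadowingSketch {
    // Hypothetical toy plan step: a command name plus its argument text.
    record Step(String command, String args) {
        @Override
        public String toString() {
            return "| " + command + " " + args;
        }
    }

    /**
     * Rewrites "SORT sortKey | EVAL target = expr" so that the EVAL runs before the SORT.
     * If the EVAL overwrites the sort key, the old value is first saved under a
     * temporary name, the SORT uses that name, and the name is dropped afterwards.
     */
    static List<Step> pushDownEvalPastSort(String sortKey, String target, String expr) {
        List<Step> plan = new ArrayList<>();
        if (target.equals(sortKey) == false) {
            // No shadowing: the two commands can simply be swapped.
            plan.add(new Step("EVAL", target + " = " + expr));
            plan.add(new Step("SORT", sortKey));
            return plan;
        }
        String temp = "$$" + sortKey; // assumed temp-name scheme, for illustration only
        plan.add(new Step("EVAL", temp + " = " + sortKey)); // save the value the SORT needs
        plan.add(new Step("EVAL", target + " = " + expr));  // the pushed-down EVAL
        plan.add(new Step("SORT", temp));                   // sort by the saved value
        plan.add(new Step("DROP", temp));                   // hide the temporary column
        return plan;
    }

    public static void main(String[] args) {
        pushDownEvalPastSort("x", "x", "y").forEach(System.out::println);
    }
}
```

Running `main` prints the four-step plan from the example. When the EVAL target differs from the sort key, the helper falls back to a plain swap, which is the case the pushdown handled correctly before this fix.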

This allows re-enabling the dependency checker (after fixing a small bug in its handling of ENRICH).

elasticsearchmachine added the v8.14.0 label


          Add reproducing csv tests

cdf6059

alex-spies force-pushed the fix-pushdownregexeval-shadowing branch 3 times, most recently from d2fe543 to 74ec036 Compare

February 20, 2024 14:13

alex-spies added 4 commits

February 20, 2024 16:25


          Fix shadowing when pushing down eval/regex/enrich

71ccfc3


          Add tests

0cea3f7


          Re-enable+fix dependency check

7f7a91c


          Cleanup

726fee1

alex-spies force-pushed the fix-pushdownregexeval-shadowing branch from 74ec036 to 726fee1 Compare

February 20, 2024 15:30

Contributor Author

alex-spies commented Feb 20, 2024

I also tried an alternative solution that turned SORT x | EVAL x = y into EVAL $$x = y | SORT x | RENAME $$x AS x. That does not carry over easily to DISSECT: turning DISSECT some_field "%{x} ..." into DISSECT some_field "%{$$x} ..." requires rewriting the DISSECT pattern and its references, which appeared quite finicky.


          Update skip labels for tests

e00db6f

alex-spies commented

x-pack/plugin/esql/qa/testFixtures/src/main/resources/dissect.csv-spec

@@ -145,6 +145,19 @@ full_name:keyword | emp_no:keyword | b:keyword
               Bezalel Simmel    | Bezalel        | Simmel
               ;
+              overwriteNameAfterSort#[skip:-8.13.0]

Contributor Author

alex-spies Feb 20, 2024

I think we should backport this bugfix to 8.13, so I'm setting the skip labels accordingly.

alex-spies added v8.13.1 backport labels

costin added the :Analytics/ES|QL label

alex-spies added 5 commits

February 21, 2024 12:21


          Generalize unit test to DISSECT, GROK and ENRICH

abdeee3


          WIP csv tests

1cbc32d


          Merge remote-tracking branch 'upstream/main' into fix-pushdownregexeval-shadowing

65d05f3


          Add more csv tests

33e3d1b


          Use nameId in temp name for renamed attributes

d61e0cd

This is unique and should avoid name clashes better.
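
To illustrate why a unique id in the temporary name helps, here is a minimal sketch. Everything in it is hypothetical: the `Attribute` record, the counter-based id, and the `$$name$temp_name$id` format are assumptions made for illustration and are not taken from the Elasticsearch code.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TempNameSketch {
    // Hypothetical id source: every attribute gets a fresh, never-reused id.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    // Hypothetical attribute: a user-visible name plus a unique id.
    record Attribute(String name, long id) {
        static Attribute named(String name) {
            return new Attribute(name, NEXT_ID.incrementAndGet());
        }
    }

    // Name-only scheme: two attributes called "x" map to the same temp name.
    static String naiveTempName(Attribute attr) {
        return "$$" + attr.name();
    }

    // Id-based scheme (assumed format): the unique id keeps the names apart.
    static String uniqueTempName(Attribute attr) {
        return "$$" + attr.name() + "$temp_name$" + attr.id();
    }

    public static void main(String[] args) {
        Attribute first = Attribute.named("x");
        Attribute second = Attribute.named("x"); // same name, different attribute
        System.out.println(naiveTempName(first).equals(naiveTempName(second)));
        System.out.println(uniqueTempName(first).equals(uniqueTempName(second)));
    }
}
```

With the name-only scheme the two temporary names collide, while the id-based scheme keeps them distinct even when a query redefines the same column name several times.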

alex-spies marked this pull request as ready for review

February 21, 2024 17:57


          Update LogicalPlan in test javadoc

b33f667

Contributor Author

alex-spies commented Feb 21, 2024

Also, thanks to @luigidellaquila for suggesting this approach, which updates the attributes in the SORT rather than trying to update the references created by EVAL/GROK/DISSECT/ENRICH.

luigidellaquila reviewed

Contributor

luigidellaquila left a comment

Thanks @alex-spies, looks pretty good.
I left only one comment, about possible improvements to the tests.

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java Outdated

x-pack/plugin/esql/qa/testFixtures/src/main/resources/eval.csv-spec Outdated

astefan requested review from costin and astefan

February 22, 2024 15:55

costin requested changes

Member

costin left a comment

Looks good - the code is duplicated 3 times so please encapsulate it.
Also add more tests, including some with chained references (sort x | eval y = x, x = y), to guarantee the nature of the plan before the rule kicks in.

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java Outdated

+                                  if (existingAlias == null) {
+                                      String tempName = SubstituteSurrogates.rawTemporaryName(attr.name(), "temp_name", (int) attr.id().toLong());
+                                      Alias tempNameForShadowedAttr = new Alias(Source.EMPTY, tempName, attr);

Member

costin Feb 22, 2024

Preserve the source (useful to see where this came from) and set the synthetic parameter to true.

Contributor Author

alex-spies Feb 23, 2024

I agree this should be synthetic. In particular, it should not be computed on a data node, as that would just send duplicate columns over the wire.

However, this does not currently just work: if I mark these aliases as synthetic, the ProjectAwayColumns physical optimizer rule prunes them, but I get an NPE because they are now seemingly computed nowhere. (Happens precisely here.)

This requires additional investigation to get right. How about I create a separate issue for that? For the time being, I'll mark it as TODO.

Member

costin Feb 23, 2024

Please raise an issue to double check the behaviour in ProjectAwayColumns.

Contributor Author

alex-spies Feb 27, 2024

Here we go: #105821

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java Outdated

+                              AttributeMap<Alias> aliasesForShadowedOrderByAttrs = nonShadowedOrders.replacedAttributes;
+                              @SuppressWarnings("unchecked")
+                              List<Order> newOrder = (List<Order>) (Object) nonShadowedOrders.rewrittenExpressions;

Member

costin Feb 22, 2024

(List<Order>) (Object)? Everything is an object - if anything cast the object to List directly.

Contributor Author

alex-spies Feb 23, 2024

Casting directly does not compile (Inconvertible types), but I can make this slightly less ugly by using List<?> instead of Object.
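
The casting options discussed in this thread can be compared in a small self-contained sketch. The `Expression` and `Order` classes below are hypothetical stand-ins for the real plan types, and `asOrders` is an illustrative helper, not code from the PR.

```java
import java.util.List;

public class CastSketch {
    // Hypothetical stand-ins for the expression types discussed above.
    static class Expression {}
    static class Order extends Expression {}

    @SuppressWarnings("unchecked")
    static List<Order> asOrders(List<Expression> rewritten) {
        // return (List<Order>) rewritten;          // does not compile: inconvertible types
        // return (List<Order>) (Object) rewritten; // compiles, but hides that a List is involved
        return (List<Order>) (List<?>) rewritten;   // unchecked cast via the wildcard type
    }

    public static void main(String[] args) {
        List<Expression> rewritten = List.of(new Order(), new Order());
        List<Order> orders = asOrders(rewritten);
        System.out.println(orders.size());
    }
}
```

The cast through `List<?>` is still unchecked, so it is only safe when the caller knows that every element really is an `Order`; otherwise a `ClassCastException` surfaces later, at the point where an element is read.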

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java Outdated

+                              if (aliasesForShadowedOrderByAttrs.isEmpty() == false) {
+                                  List<Alias> newAliases = aliasesForShadowedOrderByAttrs.values().stream().toList();
+                                  LogicalPlan plan = new Eval(Source.EMPTY, orderBy.child(), newAliases);

Member

costin Feb 22, 2024

Keep the current node source.

Contributor Author

alex-spies Feb 23, 2024

I think using the source of the OrderBy is the best option - here, we're renaming some attributes for it. (Current node source contains the already overwritten/shadowed attributes, instead.)

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java Outdated

+                                  LogicalPlan plan = new Eval(Source.EMPTY, orderBy.child(), newAliases);
+                                  plan = eval.replaceChild(plan);
+                                  plan = new OrderBy(orderBy.source(), plan, newOrder);
+                                  plan = new EsqlProject(Source.EMPTY, plan, eval.output());

Member

costin Feb 22, 2024

Use Project directly and preserve the source.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/OptimizerRules.java Outdated

+                              AttributeSet generates = new AttributeSet(Expressions.asAttributes(enrichFields));
+                              // In case of aliases we generate both the alias and its target.
+                              // E.g. in ENRICH policy ON field WITH alias = enrich_field
+                              // we generate both `alias` and `enrich_field`.

Member

costin Feb 22, 2024

This looks incorrect as enrich_field is hidden behind alias.

Contributor Author

alex-spies Feb 23, 2024

Yeah, it's weird that we have to do this. I found a place in the physical optimizer that needs to handle Enrich in a similarly weird way.

To avoid cementing this weirdness, I'll remove this from the PR, at the cost of having to disable the dependency checker again. I'll follow this up with another PR, after investigating the weird handling of attributes in Enrich.

Member

costin Feb 23, 2024

Raise an issue for the Enrich problem to make sure it doesn't get lost.

Contributor Author

alex-spies Feb 26, 2024

There we go! #105807

x-pack/plugin/ql/src/main/java/org/elasticsearch/xpack/ql/expression/NameId.java Outdated

alex-spies added 2 commits

February 23, 2024 11:50


          Merge remote-tracking branch 'upstream/main' into fix-pushdownregexeval-shadowing

7c2fbdb


          Add test case without KEEP at the end

1d88d42

alex-spies added 3 commits

February 23, 2024 12:17


          Fix typo

a948b88


          Remove NameId.toLong

a616e9d


          Factor 3 PushDown rules into single helper

5f2fe3a

Contributor Author

alex-spies commented Feb 23, 2024

Also add more tests, including some with chained references (sort x | eval y = x, x = y), to guarantee the nature of the plan before the rule kicks in.

I'll add a CSV test for this; the added LogicalPlanOptimizerTests.testPushdownWithOverwrittenName already does something like this, albeit inside more complicated expressions for good measure (SORT 13*(emp_no+salary) ASC, -salary DESC | EVAL emp_no = 3*emp_no, salary = -2*emp_no-salary).

alex-spies added 5 commits

February 23, 2024 14:33


          Add csv test for shadowing by chained evaluations

2e67337


          Preserve Source, use computeIfAbsent

382bb57


          Some more cleanup

6db2cf2


          Disable logical DEPENDENCY_CHECK again

ce864c3


          Spotless

278c0d4

alex-spies requested a review from costin

February 23, 2024 16:59

alex-spies commented

Contributor Author

alex-spies left a comment

Thanks for your remarks, @costin and @luigidellaquila! This should be ready for the next round now :)

costin approved these changes

Member

costin left a comment

Thanks - the improvements are visible. LGTM

alex-spies added auto-backport-and-merge and removed backport labels

elasticsearchmachine added the Team:Analytics label

Collaborator

elasticsearchmachine commented Feb 26, 2024

Pinging @elastic/es-analytical-engine (Team:Analytics)

alex-spies added the >bug label


          Update docs/changelog/105650.yaml

04fb597

Collaborator

elasticsearchmachine commented Feb 26, 2024

Hi @alex-spies, I've created a changelog YAML for you.

alex-spies added the auto-merge label


          Merge remote-tracking branch 'upstream/main' into fix-pushdownregexeval-shadowing

5dc8163

elasticsearchmachine merged commit d507072 into elastic:main

14 checks passed

alex-spies deleted the fix-pushdownregexeval-shadowing branch

February 26, 2024 11:38

alex-spies mentioned this pull request

[8.13] ESQL: Fix wrong attribute shadowing in pushdown rules (#105650) #105808

Merged

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request


          ESQL: Fix wrong attribute shadowing in pushdown rules (elastic#105650)

3ef53dc

Fix elastic#105434

Collaborator

elasticsearchmachine commented Feb 26, 2024

💚 Backport successful

Status	Branch	Result
✅	8.13

elasticsearchmachine pushed a commit that referenced this pull request


          [8.13] ESQL: Fix wrong attribute shadowing in pushdown rules (#105650) (#105808)

a7ef700

* ESQL: Fix wrong attribute shadowing in pushdown rules (#105650)

Fix #105434

* Make OptimizerRules compile again

astefan mentioned this pull request

ESQL: allow sorting by expressions and not only regular fields #107158

Merged

astefan mentioned this pull request

ESQL: sorting on union types creates columns with the same name #109916

Open
