ESQL: Fix wrong attribute shadowing in pushdown rules #105650

Merged

Conversation

Contributor

@alex-spies alex-spies commented Feb 20, 2024

Fix #105434

Fixes accidental shadowing when pushing down GROK/DISSECT, EVAL or ENRICH past a SORT.

Example for how this works:

...
| SORT x
| EVAL x = y
...

Pushing the EVAL down as-is would be incorrect because x is used in the SORT, so we essentially turn this into:

...
| EVAL $$x = x
| EVAL x = y
| SORT $$x
| DROP $$x
...

The same logic is applied to GROK/DISSECT and ENRICH.

This allows us to re-enable the dependency checker (after fixing a small bug in how it handles ENRICH).
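To make the shadowing problem concrete, here is a minimal Java sketch that simulates the three plans on a tiny in-memory table. It is an illustration only, not the optimizer code: rows are plain maps, the helper names (`eval`, `sort`, `drop`, `column`) are hypothetical, and the temporary column is simply called `$$x` as in the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShadowingDemo {
    // Three rows with columns x and y; a stand-in for real data.
    static List<Map<String, Integer>> rows() {
        List<Map<String, Integer>> rows = new ArrayList<>();
        int[][] data = { { 3, 10 }, { 1, 30 }, { 2, 20 } };
        for (int[] d : data) {
            Map<String, Integer> row = new HashMap<>();
            row.put("x", d[0]);
            row.put("y", d[1]);
            rows.add(row);
        }
        return rows;
    }

    // EVAL target = source: overwrites (shadows) target in every row.
    static void eval(List<Map<String, Integer>> rows, String target, String source) {
        for (Map<String, Integer> row : rows) {
            row.put(target, row.get(source));
        }
    }

    // SORT column (ascending).
    static void sort(List<Map<String, Integer>> rows, String column) {
        rows.sort(Comparator.comparingInt((Map<String, Integer> row) -> row.get(column)));
    }

    // DROP column.
    static void drop(List<Map<String, Integer>> rows, String column) {
        for (Map<String, Integer> row : rows) {
            row.remove(column);
        }
    }

    static List<Integer> column(List<Map<String, Integer>> rows, String name) {
        List<Integer> values = new ArrayList<>();
        for (Map<String, Integer> row : rows) {
            values.add(row.get(name));
        }
        return values;
    }

    // SORT x | EVAL x = y
    static List<Integer> original() {
        List<Map<String, Integer>> rows = rows();
        sort(rows, "x");
        eval(rows, "x", "y");
        return column(rows, "x");
    }

    // EVAL x = y | SORT x  (incorrect: sorts by the overwritten x)
    static List<Integer> naivePushdown() {
        List<Map<String, Integer>> rows = rows();
        eval(rows, "x", "y");
        sort(rows, "x");
        return column(rows, "x");
    }

    // EVAL $$x = x | EVAL x = y | SORT $$x | DROP $$x
    static List<Integer> safePushdown() {
        List<Map<String, Integer>> rows = rows();
        eval(rows, "$$x", "x");
        eval(rows, "x", "y");
        sort(rows, "$$x");
        drop(rows, "$$x");
        return column(rows, "x");
    }

    public static void main(String[] args) {
        System.out.println("original:       " + original());      // [30, 20, 10]
        System.out.println("naive pushdown: " + naivePushdown()); // [10, 20, 30]
        System.out.println("safe pushdown:  " + safePushdown());  // [30, 20, 10]
    }
}
```

The naive pushdown changes the row order because the SORT now sees the new value of x, while the variant with the temporary column reproduces the original result.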

@alex-spies alex-spies force-pushed the fix-pushdownregexeval-shadowing branch 3 times, most recently from d2fe543 to 74ec036 Compare February 20, 2024 14:13
@alex-spies alex-spies force-pushed the fix-pushdownregexeval-shadowing branch from 74ec036 to 726fee1 Compare February 20, 2024 15:30
@alex-spies
Contributor Author

I also tried an alternative solution that turned SORT x | EVAL x = y into EVAL $$x = y | SORT x | RENAME $$x AS x, but that doesn't work easily for DISSECT: turning DISSECT some_field "%{x} ..." into DISSECT some_field "%{$$x} ..." requires changing the DISSECT pattern and its references, which appeared quite finicky.
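As a rough illustration of this alternative, the Java sketch below simulates EVAL $$x = y | SORT x | RENAME $$x AS x on in-memory rows, and then shows why retargeting a DISSECT key is less straightforward. Everything here is hypothetical: `renameDissectKey` is a deliberately naive string replacement, and the `%{+x}` append form is assumed from the general dissect pattern syntax rather than taken from this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AlternativeRewriteDemo {
    // Simulates EVAL $$x = y | SORT x | RENAME $$x AS x on rows given as {x, y} pairs.
    static List<Integer> alternative(int[][] data) {
        List<Map<String, Integer>> rows = new ArrayList<>();
        for (int[] d : data) {
            Map<String, Integer> row = new HashMap<>();
            row.put("x", d[0]);
            row.put("y", d[1]);
            row.put("$$x", d[1]); // EVAL $$x = y
            rows.add(row);
        }
        // SORT x: the original x is still intact at this point.
        rows.sort(Comparator.comparingInt((Map<String, Integer> row) -> row.get("x")));
        List<Integer> result = new ArrayList<>();
        for (Map<String, Integer> row : rows) {
            row.put("x", row.remove("$$x")); // RENAME $$x AS x
            result.add(row.get("x"));
        }
        return result;
    }

    // Naive attempt to retarget a DISSECT output key inside the pattern string.
    // It only handles the plain %{key} form and silently misses keys with modifiers.
    static String renameDissectKey(String pattern, String from, String to) {
        return pattern.replace("%{" + from + "}", "%{" + to + "}");
    }

    public static void main(String[] args) {
        System.out.println(alternative(new int[][] { { 3, 10 }, { 1, 30 }, { 2, 20 } })); // [30, 20, 10]
        System.out.println(renameDissectKey("%{x} %{z}", "x", "$$x"));  // %{$$x} %{z}
        System.out.println(renameDissectKey("%{+x} %{z}", "x", "$$x")); // %{+x} %{z} (key missed)
    }
}
```

For EVAL the renamed column is explicit in the plan, so the rewrite is mechanical. For DISSECT the output names live inside the pattern string, so a rename has to parse and rewrite the pattern itself.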

@@ -145,6 +145,19 @@ full_name:keyword | emp_no:keyword | b:keyword
Bezalel Simmel | Bezalel | Simmel
;

overwriteNameAfterSort#[skip:-8.13.0]
Contributor Author

I think we should backport this bugfix to 8.13, so I'm setting the skip labels accordingly.

@alex-spies alex-spies marked this pull request as ready for review February 21, 2024 17:57
@alex-spies
Contributor Author

Also thanks to @luigidellaquila for suggesting this approach, which updates the attributes in the SORT, over trying to update the references created by EVAL/GROK/DISSECT/ENRICH.

Contributor

@luigidellaquila luigidellaquila left a comment

Thanks @alex-spies, looks pretty good.
I left only one comment about possible improvements to the tests.

Member

@costin costin left a comment

Looks good - the code is duplicated 3 times, so please encapsulate it.
Also add more tests, some with references (sort x | eval y = x, x = y) and chained, to guarantee the nature of the plan before the rule kicks in.

if (existingAlias == null) {
String tempName = SubstituteSurrogates.rawTemporaryName(attr.name(), "temp_name", (int) attr.id().toLong());

Alias tempNameForShadowedAttr = new Alias(Source.EMPTY, tempName, attr);
Member

Preserve the source (useful to see where this came from) and set the synthetic parameter to true.

Contributor Author

I agree this should be synthetic. In particular, it should not be computed on a data node; that would just send duplicate columns over the wire.

However, this does not currently work out of the box: if I mark these aliases as synthetic, the ProjectAwayColumns physical optimizer rule prunes them, and I then get an NPE because they are seemingly computed nowhere. (Happens precisely here.)

This requires additional investigation to get right. How about I create a separate issue for that? For the time being, I'll mark it as TODO.

Member

Please raise an issue to double check the behaviour in ProjectAwayColumns.

Contributor Author

Here we go: #105821


AttributeMap<Alias> aliasesForShadowedOrderByAttrs = nonShadowedOrders.replacedAttributes;
@SuppressWarnings("unchecked")
List<Order> newOrder = (List<Order>) (Object) nonShadowedOrders.rewrittenExpressions;
Member

(List<Order>) (Object)? Everything is an Object; if anything, cast the object to List directly.

Contributor Author

Casting directly does not compile (Inconvertible types), but I can make this slightly less ugly by using List<?> instead of Object.
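The compile error and the workaround can be reproduced in isolation. The sketch below uses hypothetical stand-in classes `Expression` and `Order` (not the Elasticsearch types) to show that a direct cast between two different parameterizations of `List` is rejected, while going through the wildcard type `List<?>` compiles with an unchecked warning.

```java
import java.util.ArrayList;
import java.util.List;

class CastDemo {
    // Hypothetical stand-ins for the expression types discussed in this thread.
    static class Expression {}

    static class Order extends Expression {}

    @SuppressWarnings("unchecked")
    static List<Order> asOrders(List<Expression> expressions) {
        // return (List<Order>) expressions; // does not compile: inconvertible types
        // Casting via List<?> is permitted, but it is an unchecked cast:
        // the element types are not verified here.
        return (List<Order>) (List<?>) expressions;
    }

    public static void main(String[] args) {
        List<Expression> expressions = new ArrayList<>();
        expressions.add(new Order());
        List<Order> orders = asOrders(expressions);
        System.out.println(orders.size()); // 1
    }
}
```

Because the cast is unchecked, a non-`Order` element would only fail later with a `ClassCastException` when it is read as an `Order`, so this is only safe when the caller knows every element really is an `Order`.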

if (aliasesForShadowedOrderByAttrs.isEmpty() == false) {
List<Alias> newAliases = aliasesForShadowedOrderByAttrs.values().stream().toList();

LogicalPlan plan = new Eval(Source.EMPTY, orderBy.child(), newAliases);
Member

Keep the current node source.

Contributor Author

I think using the source of the OrderBy is the best option, since we're renaming some of its attributes here. (The current node's source instead contains the already overwritten/shadowed attributes.)

LogicalPlan plan = new Eval(Source.EMPTY, orderBy.child(), newAliases);
plan = eval.replaceChild(plan);
plan = new OrderBy(orderBy.source(), plan, newOrder);
plan = new EsqlProject(Source.EMPTY, plan, eval.output());
Member

Use Project directly and preserve the source.
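The order in which the new plan is assembled in the snippet above (temporary aliases, then the pushed-down node, then the rewritten sort, then a projection) can be sketched on a toy plan tree. The node types below are hypothetical, heavily simplified records, not the Elasticsearch classes: a sort has a single key column, an eval has a single target, source information is omitted, and the temporary name is just `$$` plus the column name.

```java
import java.util.List;

class PushdownSketch {
    // Hypothetical, simplified plan nodes. Each node wraps its child.
    interface Plan {}

    record Relation(String name) implements Plan {}

    record Eval(Plan child, String target, String expr) implements Plan {}

    record OrderBy(Plan child, String sortKey) implements Plan {}

    record Project(Plan child, List<String> keep) implements Plan {}

    // Moves an Eval that sits on top of an OrderBy below that OrderBy.
    // evalOutput lists the columns the original Eval exposed.
    static Plan pushDownEval(Eval eval, List<String> evalOutput) {
        if (!(eval.child() instanceof OrderBy)) {
            return eval; // nothing to push past
        }
        OrderBy orderBy = (OrderBy) eval.child();
        if (!eval.target().equals(orderBy.sortKey())) {
            // No shadowing: simply swap the two nodes.
            return new OrderBy(new Eval(orderBy.child(), eval.target(), eval.expr()), orderBy.sortKey());
        }
        String temp = "$$" + orderBy.sortKey();
        // 1. Preserve the sort key under a temporary name.
        Plan plan = new Eval(orderBy.child(), temp, orderBy.sortKey());
        // 2. Re-attach the original Eval on top of the alias Eval.
        plan = new Eval(plan, eval.target(), eval.expr());
        // 3. Sort by the temporary name instead of the shadowed one.
        plan = new OrderBy(plan, temp);
        // 4. Project back to the original output, which drops the temporary column.
        return new Project(plan, evalOutput);
    }

    public static void main(String[] args) {
        Eval eval = new Eval(new OrderBy(new Relation("t"), "x"), "x", "y");
        System.out.println(pushDownEval(eval, List.of("x", "y")));
    }
}
```

The final projection plays the role of DROP $$x: it restores exactly the columns the original Eval produced, so the temporary name never leaks to later commands.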

AttributeSet generates = new AttributeSet(Expressions.asAttributes(enrichFields));
// In case of aliases we generate both the alias and its target.
// E.g. in ENRICH policy ON field WITH alias = enrich_field
// we generate both `alias` and `enrich_field`.
Member

This looks incorrect as enrich_field is hidden behind alias.

Contributor Author

Yeah, it's weird that we have to do this. I found a place in the physical optimizer that needs to handle Enrich in a similarly weird way.

To avoid cementing this weirdness, I'll remove this from the PR, at the cost of having to disable the dependency checker again. I'll follow this up with another PR, after investigating the weird handling of attributes in Enrich.

Member

Raise an issue for the Enrich problem to make sure it doesn't get lost.

Contributor Author

There we go! #105807

@alex-spies
Contributor Author

Also add more tests, some with references (sort x | eval y = x, x = y) and chained, to guarantee the nature of the plan before the rule kicks in.

I'll add a CSV test for this; the added LogicalPlanOptimizerTests.testPushdownWithOverwrittenName already does something like this, albeit inside more complicated expressions for good measure (SORT 13*(emp_no+salary) ASC, -salary DESC | EVAL emp_no = 3*emp_no, salary = -2*emp_no-salary).

Contributor Author

@alex-spies alex-spies left a comment

Thanks for your remarks, @costin and @luigidellaquila! This should be ready for the next round now :)

Member

@costin costin left a comment

Thanks - the improvements are visible. LGTM

@alex-spies alex-spies added the auto-backport-and-merge label and removed the backport label Feb 26, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label Feb 26, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies alex-spies added the auto-merge label Feb 26, 2024
@elasticsearchmachine elasticsearchmachine merged commit d507072 into elastic:main Feb 26, 2024
14 checks passed
@alex-spies alex-spies deleted the fix-pushdownregexeval-shadowing branch February 26, 2024 11:38
alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Feb 26, 2024
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.13

elasticsearchmachine pushed a commit that referenced this pull request Feb 26, 2024
#105808)

* ESQL: Fix wrong attribute shadowing in pushdown rules (#105650)

* Make OptimizerRules compile again
Labels
:Analytics/ES|QL (AKA ESQL)
auto-backport-and-merge (Automatically create backport pull requests and merge when ready)
auto-merge (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
>bug
Team:Analytics (Meta label for analytical engine team: ESQL/Aggs/Geo)
v8.13.1
v8.14.0
Development

Successfully merging this pull request may close these issues.

ESQL: PushdownRegexExtract doesn't take into account shadow attributes
4 participants