
ESQL: Fix missing refs due to pruning renamed grouping columns #107328

Merged: 8 commits, merged Apr 17, 2024

alex-spies (Contributor)

Fix #107083, more specifically this remaining case.
Close #107166.

In some cases, CombineProjections prunes an EVAL that only renames a grouping column, but the groupings keep referring to the renamed column. For example:

...
| EVAL x = y, foo = bar
| STATS min(foo) BY x

// This is optimized to something equivalent to:

...
// EVAL pruned as it only renames vars
| STATS `min(foo)` = min(bar) BY x

Notice that x is still in the groupings, even though the EVAL that defined it was pruned. Confusingly, such queries still executed correctly in most cases.
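To make the failure mode concrete, here is a minimal sketch of a missing-reference check on a toy plan model in which each node is reduced to the column names it outputs. The class `MissingRefsSketch` and its method are hypothetical, not the ESQL optimizer's API, and attributes are matched by name for simplicity, whereas the real planner tracks attribute identities. Once the renaming EVAL is gone, the child of STATS only outputs `y` and `bar`, so a grouping on `x` dangles until the rename is propagated.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model: columns are identified by name only.
// This is NOT the ESQL plan API; it only illustrates the dangling grouping.
final class MissingRefsSketch {

    // Returns every grouping that the child plan does not output.
    static List<String> missingReferences(List<String> groupings, Set<String> childOutput) {
        List<String> missing = new ArrayList<>();
        for (String grouping : groupings) {
            if (!childOutput.contains(grouping)) {
                missing.add(grouping);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Output of the source once `EVAL x = y, foo = bar` has been pruned.
        Set<String> childOutput = new LinkedHashSet<>(List.of("y", "bar"));

        // Before the fix: the groupings still refer to the renamed column.
        List<String> groupingsBeforeFix = List.of("x");
        // After propagating the rename x = y into the groupings.
        List<String> groupingsAfterFix = List.of("y");

        System.out.println(missingReferences(groupingsBeforeFix, childOutput)); // [x]
        System.out.println(missingReferences(groupingsAfterFix, childOutput));  // []
    }
}
```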

This PR propagates the renaming x = y into the groupings. To remain correct, we must avoid propagating the same reference attribute into the groupings twice (otherwise layout errors occur), so this PR also prunes duplicate groupings:

...
| EVAL x = y, z = y
| STATS BY x, z

// This becomes something equivalent to

...
| STATS BY x = y, z = y
// and the corresponding Aggregate plan only has y in its groupings,
// and `x = y` and `z = y` in its aggregates.
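The two steps described above (resolving renames in the groupings, then dropping duplicates) can be sketched as follows. This is a minimal illustration under assumed toy types: columns are plain strings, `aliases` maps each renamed column to its source (the `x = y` and `z = y` from the lower EVAL), and the helpers `resolveAndDedupeGroupings` and `groupingAliases` are hypothetical, not the PR's `combineUpperGroupingsAndLowerProjections`. A `LinkedHashSet` keeps the first occurrence of each resolved column, so grouping order is preserved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of resolving renamed groupings and removing duplicates.
final class GroupingDedupSketch {

    // Replace each grouping by the column it renames (if any), keeping the
    // first occurrence of each resolved column and dropping later duplicates.
    static List<String> resolveAndDedupeGroupings(List<String> upperGroupings, Map<String, String> aliases) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String grouping : upperGroupings) {
            // Similar in spirit to aliases.resolve(attr, attr): fall back to the column itself.
            resolved.add(aliases.getOrDefault(grouping, grouping));
        }
        return new ArrayList<>(resolved);
    }

    // The original names survive as aggregates that re-alias the resolved column, e.g. "x = y".
    static List<String> groupingAliases(List<String> upperGroupings, Map<String, String> aliases) {
        List<String> out = new ArrayList<>();
        for (String grouping : upperGroupings) {
            String source = aliases.getOrDefault(grouping, grouping);
            out.add(source.equals(grouping) ? grouping : grouping + " = " + source);
        }
        return out;
    }

    public static void main(String[] args) {
        // Lower projection: EVAL x = y, z = y
        Map<String, String> aliases = new LinkedHashMap<>();
        aliases.put("x", "y");
        aliases.put("z", "y");

        List<String> upper = List.of("x", "z"); // STATS BY x, z
        System.out.println(resolveAndDedupeGroupings(upper, aliases)); // [y]
        System.out.println(groupingAliases(upper, aliases));           // [x = y, z = y]
    }
}
```

The sketch does not model the layout itself; it only shows why, after resolution, `y` would otherwise appear twice in the groupings.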

alex-spies requested a review from costin on April 10, 2024 17:08
elasticsearchmachine added the Team:Analytics label (meta label for the analytical engine team: ESQL/Aggs/Geo)
elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine (Collaborator)

Hi @alex-spies, I've created a changelog YAML for you.

alex-spies (Contributor Author)

@elasticmachine update branch

alex-spies added the auto-backport label (automatically create backport pull requests when merged) on Apr 11, 2024
costin (Member) left a comment

Good catch.

@@ -432,7 +432,7 @@ public void testCombineProjectionWithAggregationAndEval() {
var limit = as(plan, Limit.class);
var agg = as(limit.child(), Aggregate.class);
assertThat(Expressions.names(agg.aggregates()), contains("s", "last_name", "first_name", "k"));
-assertThat(Expressions.names(agg.groupings()), contains("last_name", "first_name", "k"));
+assertThat(Expressions.names(agg.groupings()), contains("last_name", "first_name"));
}
Member

👍

for (Expression expr : upperGroupings) {
// All substitutions happen before; groupings must be attributes at this point.
Attribute attr = (Attribute) expr;
replaced.add((Attribute) aliases.resolve(attr, attr));
Member

This will lead to a ClassCastException (CCE): what about c = a + 1? If there's always an attribute inside the alias, make the aliases an AttributeMap<Attribute>.

Contributor Author

This shouldn't cause a CCE in practice, but I can use an AttributeMap<Attribute>. (I'd still need to cast the expressions under the projection's aliases to attributes, then.)

(I'd prefer not having to cast at all, but our Aggs/Projections don't adequately represent the fact that we have attributes here, not general expressions.)
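Costin's suggestion can be illustrated with a small sketch on toy stand-ins for the ql types. The nested `Expression`, `Attribute` and `Add` classes and the helpers `renamesOnly` and `resolve` are hypothetical, not the real `AttributeMap` API; a plain `Map<Attribute, Attribute>` plays the role of `AttributeMap<Attribute>`. Only pure renames such as `x = y` enter that map, so a computed alias like `c = a + 1` is never cast to `Attribute`, whereas a blind cast of its child throws a `ClassCastException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model; the nested types only mimic the shape of ql's Expression/Attribute.
final class AliasMapSketch {

    interface Expression {}

    // Compared by identity (no equals override), loosely mirroring attribute ids.
    static final class Attribute implements Expression {
        final String name;
        Attribute(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // A computed, non-attribute expression such as `a + 1`.
    static final class Add implements Expression {
        final Expression left;
        final int right;
        Add(Expression left, int right) { this.left = left; this.right = right; }
        @Override public String toString() { return left + " + " + right; }
    }

    // Keep only aliases whose child is an attribute (pure renames such as `x = y`);
    // computed aliases such as `c = a + 1` are skipped instead of being cast.
    static Map<Attribute, Attribute> renamesOnly(Map<Attribute, Expression> aliases) {
        Map<Attribute, Attribute> renames = new LinkedHashMap<>();
        for (Map.Entry<Attribute, Expression> entry : aliases.entrySet()) {
            if (entry.getValue() instanceof Attribute) {
                renames.put(entry.getKey(), (Attribute) entry.getValue());
            }
        }
        return renames;
    }

    // Counterpart of aliases.resolve(attr, attr): the attribute itself if it is not a rename.
    static Attribute resolve(Map<Attribute, Attribute> renames, Attribute attr) {
        return renames.getOrDefault(attr, attr);
    }

    public static void main(String[] args) {
        Attribute a = new Attribute("a"), c = new Attribute("c"), x = new Attribute("x"), y = new Attribute("y");
        Map<Attribute, Expression> aliases = new LinkedHashMap<>();
        aliases.put(x, y);             // EVAL x = y
        aliases.put(c, new Add(a, 1)); // EVAL c = a + 1

        try {
            // A blind cast of every alias child fails on `a + 1`.
            Attribute bad = (Attribute) aliases.get(c);
            System.out.println("unexpected: " + bad);
        } catch (ClassCastException expected) {
            System.out.println("ClassCastException for c = " + aliases.get(c));
        }

        Map<Attribute, Attribute> renames = renamesOnly(aliases);
        System.out.println(resolve(renames, x)); // y
        System.out.println(resolve(renames, c)); // c: not a rename, so it is left as-is
    }
}
```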

@@ -482,6 +487,27 @@ private static List<NamedExpression> combineProjections(
return replaced;
}

private static List<Expression> combineUpperGroupingsAndLowerProjections(
Member

This method looks quite similar to combineProjections(), which is then reused inside projectAggregations().
Have you looked into reusing them on groupings?

Contributor Author

I initially wanted to reuse this, but I need to make some assumptions (in the form of casting expressions) that only hold if the upper expressions are groupings. I could probably still combine the two, but that might become confusing to read.

costin requested a review from fang-xing-esql and removed the request for luigidellaquila on April 15, 2024 07:01
alex-spies requested a review from costin on April 16, 2024 14:41
bpintea (Contributor) left a comment

Lgtm.

a.source(),
p.child(),
// After substitutions groupings can only contain attributes.
combineUpperGroupingsAndLowerProjections((List<? extends Attribute>) (List<?>) a.groupings(), p.projections()),
Contributor

Suggested change
-combineUpperGroupingsAndLowerProjections((List<? extends Attribute>) (List<?>) a.groupings(), p.projections()),
+combineUpperGroupingsAndLowerProjections((List<? extends Attribute>) a.groupings(), p.projections()),

Contributor

Ah, sorry, this suggestion is wrong.
Why the double cast, though, instead of calling the combining function with a List<Expression> and casting downstream, since groupings are expected to be attributes? That would be the same cast as for the projection list.

Contributor Author

Yeah, the double casting is ugly indeed, but necessary as long as we have a list.

The initial version cast further downstream, but I moved the cast up and required Attribute instead of Expression per Costin's request.

If our assumptions about the type turn out to be wrong, it will blow up no matter what, and in roughly the same spot. IMHO, the proper way to improve this is to bake the invariant that groupings are attributes into our types, but that requires a refactoring that is currently not possible because we rely on the ql project.

Contributor Author

Never mind: I now explicitly cast each grouping, per Andrei's suggestion.
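The trade-off between the unchecked double cast and casting each grouping can be seen in a standalone sketch. The nested types and the helpers `doubleCast` and `castEach` are hypothetical stand-ins, not the PR's code. The double cast through `List<?>` compiles with an unchecked warning and inspects nothing, so a non-attribute grouping only fails later, wherever an element is read as an `Attribute`; the per-element cast fails while the list is being built, next to the offending grouping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy types; they only mirror the shape of the ql Expression/Attribute hierarchy.
final class GroupingCastSketch {

    interface Expression {}

    static final class Attribute implements Expression {
        final String name;
        Attribute(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Literal implements Expression {
        final Object value;
        Literal(Object value) { this.value = value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    // Unchecked double cast: no element is inspected, so the list may be "polluted".
    @SuppressWarnings("unchecked")
    static List<? extends Attribute> doubleCast(List<Expression> groupings) {
        return (List<? extends Attribute>) (List<?>) groupings;
    }

    // Explicit per-element cast into a new list.
    static List<Attribute> castEach(List<Expression> groupings) {
        List<Attribute> attributes = new ArrayList<>(groupings.size());
        for (Expression grouping : groupings) {
            // Throws ClassCastException right here if the grouping is not an Attribute.
            attributes.add((Attribute) grouping);
        }
        return attributes;
    }

    public static void main(String[] args) {
        List<Expression> groupings = List.of(new Attribute("x"), new Literal(42));

        List<? extends Attribute> unchecked = doubleCast(groupings); // no error yet
        System.out.println("double cast returned " + unchecked.size() + " elements");
        try {
            Attribute second = unchecked.get(1); // the compiler inserts a checkcast here
            System.out.println("unexpected: " + second);
        } catch (ClassCastException e) {
            System.out.println("double cast failed late, on read");
        }

        try {
            castEach(groupings);
        } catch (ClassCastException e) {
            System.out.println("per-element cast failed early, while building the list");
        }
    }
}
```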

astefan (Contributor) left a comment

Left a suggestion. Otherwise LGTM

alex-spies and others added 2 commits April 17, 2024 15:07
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
alex-spies merged commit adaa476 into elastic:main on Apr 17, 2024
14 checks passed
alex-spies deleted the esql-fix-missing-refs-in-aggs branch on April 17, 2024 14:04
elasticsearchmachine (Collaborator)

💔 Backport failed

Branch: 8.13. Result: commit could not be cherry-picked due to conflicts.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 107328

alex-spies (Contributor Author)

💚 All backports created successfully

Branch: 8.13

Questions? Please refer to the Backport tool documentation.

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Apr 18, 2024
…ic#107328)

Sometimes, CombineProjections does not correctly update an aggregation's groupings when combining with a preceding projection.
Fix this by resolving any aliases used in the groupings and de-duplicating them.

---------

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit adaa476)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 18, 2024
…) (#107599)

Sometimes, CombineProjections does not correctly update an aggregation's groupings when combining with a preceding projection.
Fix this by resolving any aliases used in the groupings and de-duplicating them.

---------

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit adaa476)

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Labels: :Analytics/ES|QL, auto-backport, backport pending, >bug, Team:Analytics, v8.13.3, v8.14.0
Successfully merging this pull request may close these issues:
- ESQL: Prune duplicate groups hidden behind aliases
- DROP of EVAL planning issues
6 participants