-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[ES|QL] prune columns when using fork #137907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] prune columns when using fork #137907
Conversation
|
Hi @pmpailis, I've created a changelog YAML for you. |
708291d to
52893e6
Compare
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
run elasticsearch-ci/part-1 |
carlosdelest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Very good testing! 🔝
I have two questions regarding the pruning branches and the need to have a specific AttributeSet for ignoring IDs.
| * This is useful for Fork plans, where branches may have different Attribute IDs but share a common output schema, | ||
| * allowing equality checks of used attributes based on their names. | ||
| */ | ||
| public static class ForkBuilder extends Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you want to avoid IDs, I think you can reuse Attribute.IdIgnoringWrapper.
See here for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking into that, but I think that it introduced too much clutter, as it needed to be bidirectional (and preferred to keep PruneColumns as untouched as possible).
Will take another look though as I could very well be missing something, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for a new builder, was to take advantage of pruneUnusedAndAddReferences so as to reuse as much code as possible through the existing check in
if (used.contains(attr)) ...
I'm looking into whether we can change the attributes initially passed to the builder and avoid this new instance.
...in/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneColumnsInForkBranches.java
Outdated
Show resolved
Hide resolved
...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
Show resolved
Hide resolved
…ng functionality in PruneColumns
carlosdelest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM - though I suggest that @ioanatia takes a look at it before merging as she's the FORK expert 👍
| if (childrenChanged) { | ||
| return new Fork(fork.source(), newChildren, prunedForkAttrs); | ||
| } else if (forkOutputChanged) { | ||
| return new Fork(fork.source(), fork.children(), prunedForkAttrs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the scenario where prunedForkAttrs is empty, i.e. when we might prune all columns, should we maybe also drop fork in favor of an empty local relation?
E.g. query
from test
| keep id
| fork
(where true | eval x = 1)
(where true | eval y = 2)
| eval a = 3
| drop x
| drop y
| drop id
| drop _fork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking also into this PR, i guess it would probably be best to output for all rows, instead of just returning empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's the same than the PR you linked - you can double check with @luigidellaquila
| private static LogicalPlan pruneColumnsInFork(Fork fork, AttributeSet.Builder used) { | ||
| // prune the output attributes of fork based on usage from the rest of the plan | ||
| // should exit early for UnionAll | ||
| if (fork instanceof UnionAll) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this because we have not yet tested this with UnionAll? I think it's perfectly fine to have this working initially just for FORK and later on to make sure it works for UnionAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, there were some tests failing that were specific to UnionAll, and I was planning to create an issue to add support for UnionAll in a follow-up. Can spend some more time looking into this, now that the code is at a good-enough state, but imo this shouldn't be a blocker (which is why there is this early exit here).
| assertThat(secondBranchProject.projections().size(), equalTo(3)); | ||
| var evalInSecondBranch = as(secondBranchProject.child(), Eval.class); | ||
| var limitInSecondBranch = as(evalInSecondBranch.child(), Limit.class); | ||
| var localRelation = as(limitInSecondBranch.child(), LocalRelation.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for the branch that is using STATS and we see here it's being replaced with a LocalRelation.
AFAICS this is the only test that is using STATS.
Can we get another simple test that is checking how we trim the columns from aggregates:
from employees
| fork (stats a = max(salary), b = min(salary) | KEEP b, a | EVAL x = 1 )
(stats a = max(salary), b = min(salary))
| KEEP a
we should see that the resulted plan still has Aggregate logical plan in the branches, but that they have been trimmed down to only compute the a column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test to account for the above in b2e44bc
| case Aggregate agg -> pruneColumnsInAggregate(agg, usedAttrs, false); | ||
| case InlineJoin inj -> pruneColumnsInInlineJoinRight(inj, usedAttrs, recheck); | ||
| case Eval eval -> pruneColumnsInEval(eval, usedAttrs, recheck); | ||
| case Project project -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use pruneColumnsInProject directly here?
I get that we need to map the fork output attributes to the sub plan attributes.
maybe we could do that outside of the pruneSubPlan such that we don't require to pass forkOutput as an argument to pruneSubPlan, and have usedAttrs represent not Fork output attributes, but output attributes of the subplan? would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a really nice suggestion, thanks @ioanatia ! taking a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ioanatia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work! 🚀
In this PR we add support for column pruning for
FORKbranches.The main idea is that we apply pruning in two steps:
FORK's output based on the actual needed attributes inPruneColumnsrule.FORKplan independently throughpruneSubPlan, with the same used params as base (i.e.FORK's output). We then proceed to compute separately any additional needed params for each branch and keep/remove plans as per usual inPruneColumns.Closes #136365