Skip to content

Conversation

@pmpailis
Copy link
Contributor

@pmpailis pmpailis commented Nov 11, 2025

In this PR we add support for column pruning for FORK branches.

The main idea is that we apply pruning in two steps:

  • First, we limit FORK's output based on the actual needed attributes in PruneColumns rule.
  • Then, based on these attributes, we proceed to prune each branch of the FORK plan independently through pruneSubPlan, with the same used params as base (i.e. FORK's output). We then proceed to compute separately any additional needed params for each branch and keep/remove plans as per usual in PruneColumns.

Closes #136365

@pmpailis pmpailis added :Search Relevance/ES|QL Search functionality in ES|QL >bug labels Nov 26, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @pmpailis, I've created a changelog YAML for you.

@pmpailis pmpailis force-pushed the fix_136365_prune_columns_when_fork branch from 708291d to 52893e6 Compare November 26, 2025 09:22
@pmpailis pmpailis marked this pull request as ready for review November 26, 2025 10:56
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Nov 26, 2025
@pmpailis
Copy link
Contributor Author

run elasticsearch-ci/part-1

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Very good testing! 🔝

I have two questions regarding the pruning branches and the need to have a specific AttributeSet for ignoring IDs.

* This is useful for Fork plans, where branches may have different Attribute IDs but share a common output schema,
* allowing equality checks of used attributes based on their names.
*/
public static class ForkBuilder extends Builder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you want to avoid IDs, I think you can reuse Attribute.IdIgnoringWrapper.

See here for an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking into that, but I think that it introduced too much clutter, as it needed to be bidirectional (and preferred to keep PruneColumns as untouched as possible).

Will take another look though as I could very well be missing something, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for a new builder, was to take advantage of pruneUnusedAndAddReferences so as to reuse as much code as possible through the existing check in

   if (used.contains(attr)) ...

I'm looking into whether we can change the attributes initially passed to the builder and avoid this new instance.

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM - though I suggest that @ioanatia takes a look at it before merging as she's the FORK expert 👍

if (childrenChanged) {
return new Fork(fork.source(), newChildren, prunedForkAttrs);
} else if (forkOutputChanged) {
return new Fork(fork.source(), fork.children(), prunedForkAttrs);
Copy link
Contributor Author

@pmpailis pmpailis Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scenario where prunedForkAttrs is empty, i.e. when we might prune all columns, should we maybe also drop fork in favor of an empty local relation?

E.g. query

       from test
        | keep id
        | fork
            (where true | eval x = 1)
            (where true | eval y = 2)
        | eval a = 3
        | drop x
        | drop y
        | drop id
        | drop _fork

Copy link
Contributor Author

@pmpailis pmpailis Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking also into this PR, i guess it would probably be best to output for all rows, instead of just returning empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's the same than the PR you linked - you can double check with @luigidellaquila

private static LogicalPlan pruneColumnsInFork(Fork fork, AttributeSet.Builder used) {
// prune the output attributes of fork based on usage from the rest of the plan
// should exit early for UnionAll
if (fork instanceof UnionAll) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this because we have not yet tested this with UnionAll? I think it's perfectly fine to have this working initially just for FORK and later on to make sure it works for UnionAll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, there were some tests failing that were specific to UnionAll, and I was planning to create an issue to add support for UnionAll in a follow-up. Can spend some more time looking into this, now that the code is at a good-enough state, but imo this shouldn't be a blocker (which is why there is this early exit here).

assertThat(secondBranchProject.projections().size(), equalTo(3));
var evalInSecondBranch = as(secondBranchProject.child(), Eval.class);
var limitInSecondBranch = as(evalInSecondBranch.child(), Limit.class);
var localRelation = as(limitInSecondBranch.child(), LocalRelation.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for the branch that is using STATS and we see here it's being replaced with a LocalRelation.
AFAICS this is the only test that is using STATS.

Can we get another simple test that is checking how we trim the columns from aggregates:

            from employees
            | fork (stats a = max(salary), b = min(salary) | KEEP b, a | EVAL x = 1 )
                   (stats a = max(salary), b = min(salary))
            | KEEP a

we should see that the resulted plan still has Aggregate logical plan in the branches, but that they have been trimmed down to only compute the a column

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test to account for the above in b2e44bc

case Aggregate agg -> pruneColumnsInAggregate(agg, usedAttrs, false);
case InlineJoin inj -> pruneColumnsInInlineJoinRight(inj, usedAttrs, recheck);
case Eval eval -> pruneColumnsInEval(eval, usedAttrs, recheck);
case Project project -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use pruneColumnsInProject directly here?
I get that we need to map the fork output attributes to the sub plan attributes.
maybe we could do that outside of the pruneSubPlan such that we don't require to pass forkOutput as an argument to pruneSubPlan, and have usedAttrs represent not Fork output attributes, but output attributes of the subplan? would that work?

Copy link
Contributor Author

@pmpailis pmpailis Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a really nice suggestion, thanks @ioanatia ! taking a look!

Copy link
Contributor Author

@pmpailis pmpailis Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a take in this in cc68e74 . As part of the refactoring I ended up removing ForkBuilder and pruneSubPlan altogether. @ioanatia lmk what you think!

Copy link
Contributor

@ioanatia ioanatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! 🚀

@pmpailis pmpailis merged commit 80748a4 into elastic:main Dec 2, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Search Relevance/ES|QL Search functionality in ES|QL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL: PruneColumns fails to prune columns when forked

4 participants