ESQL: Enable pushing down LOOKUP JOIN past Project #127776

alex-spies · 2025-05-06T18:22:04Z

Closes #119082

Assume a lookup index with fields language_code, lookup_field. We want to push down a LOOKUP JOIN past an upstream Project, like so:

FROM main_index | KEEP other_field1, language_code | LOOKUP JOIN lookup_index ON language_code 

->

\_Join[LEFT,[language_code]]
  |_Project[[other_field1, language_code]]
  |  \_EsRelation[main_index][language_code, other_field1, other_field2]
  \_EsRelation[lookup_index][LOOKUP][language_code, lookup_field]

Move the Project up from the Join's left hand branch ->

Project[[other_field1, language_code, lookup_field]]
  \_Join[LEFT,[language_code]]
    |_EsRelation[main_index][language_code, other_field1, other_field2]
    \_EsRelation[lookup_index][LOOKUP][language_code, lookup_field]
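
The rewrite above can be sketched on a toy plan model. This is a minimal illustration, not the actual optimizer rule: `Plan`, `Relation`, `Project`, `LookupJoin`, and `pullUpProject` are hypothetical stand-ins for the real ES|QL classes, attributes are reduced to plain names, and the sketch conservatively bails out whenever a lookup field name also exists in the main relation (the name-conflict case is discussed further down).

```java
import java.util.ArrayList;
import java.util.List;

public class ProjectPullUpSketch {
    // Toy plan nodes; hypothetical, NOT the real ES|QL plan classes.
    interface Plan {
        List<String> output();
    }

    record Relation(String index, List<String> fields) implements Plan {
        public List<String> output() {
            return fields;
        }
    }

    record Project(Plan child, List<String> keep) implements Plan {
        public List<String> output() {
            return keep;
        }
    }

    record LookupJoin(Plan left, Relation lookup, String joinKey) implements Plan {
        // Fields the join adds: all lookup fields except the join key.
        List<String> addedFields() {
            List<String> added = new ArrayList<>(lookup.fields());
            added.remove(joinKey);
            return added;
        }

        // Left output with the added fields appended; same-named left fields are shadowed.
        public List<String> output() {
            List<String> out = new ArrayList<>(left.output());
            out.removeAll(addedFields());
            out.addAll(addedFields());
            return out;
        }
    }

    // Join(Project(child)) -> Project(Join(child)), only when no names conflict.
    static Plan pullUpProject(LookupJoin join) {
        if (!(join.left() instanceof Project project)) {
            return join;
        }
        List<String> added = join.addedFields();
        for (String name : added) {
            if (project.child().output().contains(name)) {
                return join; // conflict: this sketch gives up instead of renaming
            }
        }
        List<String> keep = new ArrayList<>(project.keep());
        keep.addAll(added); // the new Project must also pass the lookup fields through
        return new Project(new LookupJoin(project.child(), join.lookup(), join.joinKey()), keep);
    }

    public static void main(String[] args) {
        Relation mainIndex = new Relation("main_index", List.of("language_code", "other_field1", "other_field2"));
        Relation lookupIndex = new Relation("lookup_index", List.of("language_code", "lookup_field"));
        LookupJoin join = new LookupJoin(
            new Project(mainIndex, List.of("other_field1", "language_code")), lookupIndex, "language_code");
        System.out.println(pullUpProject(join).output());
    }
}
```

The rewritten plan exposes the same output columns as the original one; only the position of the Project changes.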

Pulling up the Project allows us to combine it with other Projects downstream, which may eliminate some lookup fields entirely. An example is the query from #119082:

FROM test
| KEEP languages, emp_no
| EVAL language_code = languages
| LOOKUP JOIN languages_lookup ON language_code
| RENAME language_name AS foo              <- the lookup field is later dropped and shouldn't be loaded at all!
| LOOKUP JOIN languages_lookup ON language_code
| DROP foo

Avoiding early Projects also lets us perform field extractions later: a Project ahead of the LOOKUP JOIN otherwise causes InsertFieldExtraction to load every field that we need from the main index before the LOOKUP JOIN runs.

As with any pushdown optimization, we have to deal with name conflicts: LOOKUP JOIN shadows any upstream attribute that has the same name as one of the lookup fields; in this regard, it behaves like ENRICH or EVAL.

Example: Assume the field lookup_field occurs both in lookup_index and in main_index:

FROM main_index | RENAME lookup_field AS ln | LOOKUP JOIN lookup_index ON language_code

\_Join[LEFT,[language_code]]
  |_Project[[language_code, lookup_field AS ln]]
  |  \_EsRelation[main_index][language_code, lookup_field]
  \_EsRelation[lookup_index][LOOKUP][language_code, lookup_field]

Try to move up the Project as before:

Project[[language_code, lookup_field AS ln]]  ⚡! The original lookup_field from main_index got shadowed!
  \_Join[LEFT,[language_code]]
    |_EsRelation[main_index][language_code, lookup_field]
    \_EsRelation[lookup_index][LOOKUP][language_code, lookup_field]

There are 2 ways to deal with this:

1. Leave a partial Project or Eval upstream from the Join to rename conflicting attributes to some arbitrary names, then rename them to the desired names in the new Project that we place downstream from the Join.
2. Change the names of the attributes that LOOKUP JOIN adds.

Option 1 is not ideal, because the renaming before the LOOKUP JOIN can still trigger field extractions. This PR thus goes with option 2, which is also the approach our other pushdown rules take.

To implement option 2, we leverage the fact that LOOKUP JOIN essentially behaves like ENRICH: we can represent a LOOKUP JOIN as a unary plan node by wrapping it in a dedicated class, and then apply the same pushdown logic as for ENRICH, EVAL, etc.

This requires that the (field) attributes that a LOOKUP JOIN adds to the plan can be renamed to arbitrary names, rather than using the physical field names. Ideally, we'd just use temporary qualifiers for this, but this mechanism doesn't exist yet. But! We already have field attributes with arbitrary attribute names and use them for union-typed fields; so we can do the same here and simply rename the field attributes of the EsRelation that represents the lookup index (without actually renaming the corresponding physical fields they refer to).
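
A minimal sketch of the renaming idea, under stated assumptions: `FieldAttr`, `Renamed`, and `renameConflicting` are hypothetical names, and the `$$...$temp_N` naming scheme is made up for illustration. Only the attribute name changes; the physical field name stays untouched, and a restore map records what the downstream Project has to alias back.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LookupRenameSketch {
    // Toy attribute: `name` is what the plan refers to, `fieldName` is the physical field.
    record FieldAttr(String name, String fieldName) {}

    // Renamed lookup attributes plus a map temporaryName -> originalName for the downstream Project.
    record Renamed(List<FieldAttr> lookupAttrs, Map<String, String> restore) {}

    static Renamed renameConflicting(List<FieldAttr> lookupAttrs, Set<String> leftNames) {
        List<FieldAttr> renamed = new ArrayList<>();
        Map<String, String> restore = new LinkedHashMap<>();
        int counter = 0;
        for (FieldAttr attr : lookupAttrs) {
            if (leftNames.contains(attr.name())) {
                // Hypothetical temporary name; a real implementation must guarantee uniqueness.
                String temp = "$$" + attr.name() + "$temp_" + counter++;
                renamed.add(new FieldAttr(temp, attr.fieldName())); // physical field unchanged
                restore.put(temp, attr.name());
            } else {
                renamed.add(attr);
            }
        }
        return new Renamed(renamed, restore);
    }

    public static void main(String[] args) {
        List<FieldAttr> lookup = List.of(new FieldAttr("lookup_field", "lookup_field"));
        Renamed result = renameConflicting(lookup, Set.of("language_code", "lookup_field"));
        System.out.println(result.lookupAttrs());
        System.out.println(result.restore());
    }
}
```

With the lookup attribute renamed, the join no longer shadows `lookup_field` from the main index, so the pulled-up Project can still express `lookup_field AS ln` and additionally alias the temporary name back to `lookup_field`.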

For this to work, we need to make sure that the compute code of LOOKUP JOIN doesn't rely on FieldAttribute#name (the potentially arbitrary attribute name) but rather on FieldAttribute#fieldName (the name of the physical field). There are some places in the code where we don't use #fieldName yet; these are bugs (and won't work with union types!) and need to be fixed and backported before the bwc tests of this PR can truly pass. This is related to #127521.
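
To illustrate why the distinction matters, here is a toy sketch. The `FieldAttr` record and both `load...` helpers are hypothetical, and a `Map` stands in for a document of the lookup index. Once an attribute carries a temporary name, only the physical field name still finds the value.

```java
import java.util.Map;

public class FieldNameSketch {
    // Toy attribute: `name` may be arbitrary after renaming, `fieldName` is the physical field.
    record FieldAttr(String name, String fieldName) {}

    // Correct: address the document by the physical field name.
    static Object loadByFieldName(Map<String, Object> doc, FieldAttr attr) {
        return doc.get(attr.fieldName());
    }

    // Buggy variant: works only as long as the attribute was never renamed.
    static Object loadByAttributeName(Map<String, Object> doc, FieldAttr attr) {
        return doc.get(attr.name());
    }

    public static void main(String[] args) {
        Map<String, Object> lookupDoc = Map.of("language_name", "English");
        FieldAttr renamed = new FieldAttr("$$language_name$temp_0", "language_name");
        System.out.println(loadByFieldName(lookupDoc, renamed));
        System.out.println(loadByAttributeName(lookupDoc, renamed));
    }
}
```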

alex-spies · 2025-05-06T18:23:01Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java

        List<ValuesSourceReaderOperator.FieldInfo> fields = new ArrayList<>(extractFields.size());
        for (NamedExpression extractField : extractFields) {
+            String physicalName = extractField instanceof FieldAttribute fa ? fa.fieldName()
+                : extractField instanceof Alias a ? ((NamedExpression) a.child()).name()


Needs a comment: the alias and reference attribute cases are only relevant for ENRICH.

alex-spies · 2025-05-06T18:24:35Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

Needs a bunch of additional tests + updating the expectations of the tests inside here.

alex-spies · 2025-05-06T18:26:46Z

...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java

+            // TODO: This probably also led to bugs for LOOKUP JOIN on a union typed field, let's add a test.
+            this(match.exactAttribute().fieldName(), input.channel(), input.type());


The diff touches multiple places that should have used field names but used attribute names instead.

To make this PR cleaner, I think we should have a separate PR just with these fixes + corresponding tests. This should also address #127521.

alex-spies · 2025-06-16T09:37:33Z

This approach would require that we can rename the lookup attributes that LOOKUP JOIN adds to the plan. This is not possible before 8.18.3 (will become possible only with #129355), and thus bwc between 8.18.0-8.18.2 and 8.19 would be broken; the same holds for bwc between 9.0.0-9.0.2 and 9.1.
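
The compatibility constraint can be written down as a version check. This is only a sketch of the reasoning in this comment, not code from the PR: `Version`, `canRenameLookupAttributes`, and `safeToRenameOnAll` are hypothetical, and the 9.0.3 threshold is inferred from the "9.0.0-9.0.2" range mentioned above.

```java
import java.util.List;

public class LookupRenameVersionGate {
    record Version(int major, int minor, int patch) implements Comparable<Version> {
        @Override
        public int compareTo(Version other) {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(patch, other.patch);
        }
    }

    // Thresholds taken from the comment above; 9.0.3 is an inference, not a stated fact.
    static final Version FIRST_FIXED_8X = new Version(8, 18, 3);
    static final Version FIRST_FIXED_9X = new Version(9, 0, 3);

    // True if a node on this version can handle lookup attributes with arbitrary names.
    static boolean canRenameLookupAttributes(Version version) {
        if (version.major() <= 8) {
            return version.compareTo(FIRST_FIXED_8X) >= 0;
        }
        return version.compareTo(FIRST_FIXED_9X) >= 0;
    }

    // Renaming is only safe if every node that may execute the join understands it.
    static boolean safeToRenameOnAll(List<Version> nodeVersions) {
        return nodeVersions.stream().allMatch(LookupRenameVersionGate::canRenameLookupAttributes);
    }

    public static void main(String[] args) {
        System.out.println(safeToRenameOnAll(List.of(new Version(8, 19, 0), new Version(8, 18, 1))));
    }
}
```

A mixed cluster with an 8.18.1 node next to an 8.19 node is exactly the case where the optimization would break, which is why the planner would need to know the versions of the participating nodes.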

alex-spies added 4 commits May 2, 2025 19:19

Sketch solution

e400761

Sketch it out some more

958659a

Start another approach

d1961d0

Implement the optimization and add a csv test

d15f5e4

elasticsearchmachine added the v9.1.0 label May 6, 2025

alex-spies commented May 6, 2025

alex-spies added 2 commits May 7, 2025 09:33

Update required capability for test

f810bea

Merge remote-tracking branch 'upstream/main' into pushdown-lu-join-past-project

2e93947

alex-spies mentioned this pull request May 7, 2025

ESQL: LOOKUP JOIN push down optimizations #119082

Closed

alex-spies mentioned this pull request Jun 12, 2025

ESQL: Fix some more usages of field attribute names in LOOKUP JOIN #129355

Merged

alex-spies closed this Jun 16, 2025

alex-spies mentioned this pull request Jun 16, 2025

ESQL: Pushdown Lookup Join past Project #129503

Merged

alex-spies mentioned this pull request Jul 11, 2025

ESQL: Make query planning/optimization aware of the version of nodes/clusters #131108

Open
