Skip to content

Conversation

luigidellaquila
Copy link
Contributor

@luigidellaquila luigidellaquila commented Aug 22, 2025

Fixes: #120272

Fixing queries where all the columns from the original index are projected out.
The expected outcome is that the number of rows is preserved

This means that a query like

FROM idx
| LIMIT 3
| KEEP foo
| DROP foo             // drop all the columns
| EVAL bar = 1

that now returns nothing

bar
----

will start returning values:

bar
----
1
1
1

This will also apply to aggregations, eg.

FROM idx
| KEEP foo
| DROP foo             // drop all the columns
| EVAL bar = 1
| LIMIT 3
| STATS count = count(*)

that now returns 0, will start returning the actual count of the rows (3 in this case)

I don't know if it's breaking (IMHO it's a bug, but it still changes the behavior).

TODO

  • need to add all the queries from the original issue to CSV tests

@elasticsearchmachine
Copy link
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.

new PruneRedundantSortClauses(),
new PruneLeftJoinOnNullMatchingField()
new PruneLeftJoinOnNullMatchingField(),
new PruneEmptyAggregates()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this because our aggs don't know how to deal with no grouping and no aggs at the same time.

// there cannot be an empty list of fields, we'll ask the simplest and lightest one instead: _index
return new PreAnalysisResult(enrichResolution, IndexResolver.INDEX_METADATA_FIELD, wildcardJoinIndices);
} else {
fieldNames.add(MetadataAttribute.INDEX);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think it would be more correct to add this field after FieldNameUtils::withSubfields is called on the next line. _index.* doesn't actually make sense, as we know there is only on _index metadata field.

@luigidellaquila luigidellaquila marked this pull request as ready for review August 25, 2025 15:06
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
However, I would wait for Andrei/Alex to check it, as the _index fix was initially discarded in favor of a more complete approach when the issue was opened.
This PR covers more areas than the initial approach though, and initially this wasn't a common issue

@astefan astefan requested a review from alex-spies August 26, 2025 13:40
@alex-spies alex-spies requested review from astefan and removed request for alex-spies September 30, 2025 08:15
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good. Some comments:

  1. from employees | keep salary | eval c = 1 | drop c, salary

returns

{
    "took": 29,
    "is_partial": false,
    "documents_found": 100,
    "values_loaded": 0,
    "columns": [],
    "values": [
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
		....

This, to me at least (as a regular user of ESQL), seems surprising. This represents a bunch of "empty" virtual rows so to speak. This seems like an implementation detail that leaks into UX, I would have expected the "no columns and Xrows" use case to be handled in a more user friendly way. This also saves some KBs of comm data if potentially this is performed on a large data set.

Conceptually speaking, we are introducing the notion of "nothing/no columns + X rows", but the impact of this concept in the UX is a bit too intrusive imho.

  1. looking at the logs of this query
ProjectExec[[<all-fields-projected>{r$}#204]]
\_EvalExec[[null[NULL] AS <all-fields-projected>#204]]
  \_EsQueryExec[employees], indexMode[standard], [_doc...

This "no columns + X rows" is represented as "" and it is a query that reaches ES in a particular way. I am wondering if this couldn't be, more performant, be implemented as a count, meaning we would ask from ES a count of rows and, potentially, on the coordinator node to "expand" this result in a "count" number of "empty" rows.

But this can regarded as food for thought for the future.

  1. I would add the following test, as well. This is, essentially, an empty result set but done with, also, dropping all columns:

from employees | keep salary | eval c = 1 | drop c, salary | where false

  1. another test(s) to add are those related to inline stats because it is also using aggregates (which were changed as part of this PR).

A simple query, which adds stuff to rows, not reduces them (like stats does) is
from employees | keep salary | drop salary | eval x = 1 | inline stats count(), max(x) | limit 3

Another one with by as well: from employees | keep salary | drop salary | eval x = 1, y = 1 | inline stats count(), max(x) by y | limit 3

  1. there is also the story around drop * which is not allowed atm.
    Should we allow it now, that we can handle a "drop all" scenario? Again, something to be put in a separate issue and have a discussion maybe about it.

  2. I would also add something that goes to more than one index:

from employees* | keep salary | drop salary | eval x = 1 | stats count()

  1. [LATER EDIT] Some tests that use fork would help:

from employees* | keep emp_no | drop emp_no | fork (stats c=count()) (stats c=count())
from employees* | keep emp_no | drop emp_no | fork (stats c=count()) (stats c=count()) | drop _fork*, c
from employees* | keep emp_no | drop emp_no | fork (stats c=count()) (stats c=count()) | drop _fork*, c | stats c=count()

blocks[i++] = column.values();
}
LocalSupplier supplier = LocalSupplier.of(blocks);
LocalSupplier supplier = LocalSupplier.of(blocks.length > 0 ? new Page(blocks) : new Page(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we could integrate the logic "if the blocks size is 0 then new Page(0) otherwise new Page(blocks)" somehow with LocalSupplier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want this in general, in some cases we want no blocks but new Page(N)

fields.forEach(f -> values.add(f.child().fold(context.foldCtx())));
var blocks = BlockUtils.fromListRow(PlannerUtils.NON_BREAKING_BLOCK_FACTORY, values);
return new LocalRelation(row.source(), row.output(), new CopyingLocalSupplier(blocks));
return new LocalRelation(row.source(), row.output(), new CopyingLocalSupplier(blocks.length == 0 ? new Page(0) : new Page(blocks)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about this common logic where a Page is built this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above


public CopyingLocalSupplier(Block[] blocks) {
delegate = new ImmediateLocalSupplier(blocks);
public CopyingLocalSupplier(Page page) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc of this class needs an update given the change of the constructor?


import java.util.List;

public final class PruneEmptyAggregates extends OptimizerRules.OptimizerRule<Aggregate> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help to have a javadoc on this rule. I mean, it is obvious what it does, but it would be helpful to read about the situations that lead to an Aggregate with no aggregates and no groupings.

// for the moment support pushing count just for one field
List<EsStatsQueryExec.Stat> stats = tuple.v2();
if (stats.size() > 1) {
if (stats.size() != 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What lead to this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because now all the stats can be pruned, so also the case with size == 0 has to be taken into consideration

// there cannot be an empty list of fields, we'll ask the simplest and lightest one instead: _index
return new PreAnalysisResult(enrichResolution, IndexResolver.INDEX_METADATA_FIELD, wildcardJoinIndices);
} else {
fieldNames.add(MetadataAttribute.INDEX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think it would be more correct to add this field after FieldNameUtils::withSubfields is called on the next line. _index.* doesn't actually make sense, as we know there is only on _index metadata field.

ChangePointGenerator.INSTANCE,
DissectGenerator.INSTANCE,
DropGenerator.INSTANCE,
DropAllGenerator.INSTANCE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

@luigidellaquila
Copy link
Contributor Author

Thanks for the review @astefan, I just pushed a commit that incorporate your suggestions.

Please let me know if you have further remarks or if you think it's worth discussing it in a wider audience.

When it's done, I'll open two follow-up issues, one to allow DROP * and one for possible optimizations.

@astefan astefan self-requested a review October 2, 2025 11:29
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @luigidellaquila

@luigidellaquila luigidellaquila merged commit a399e29 into elastic:main Oct 2, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: No rows with JOIN/ENRICH/SAMPLE + EVAL + KEEP
4 participants