-
Notifications
You must be signed in to change notification settings - Fork 25.5k
ES|QL: Fix wrong pruning of plans with no output columns #133405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Fix wrong pruning of plans with no output columns #133405
Conversation
Hi @luigidellaquila, I've created a changelog YAML for you. |
new PruneRedundantSortClauses(), | ||
new PruneLeftJoinOnNullMatchingField() | ||
new PruneLeftJoinOnNullMatchingField(), | ||
new PruneEmptyAggregates() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this because our aggs don't know how to deal with no grouping and no aggs at the same time.
// there cannot be an empty list of fields, we'll ask the simplest and lightest one instead: _index | ||
return new PreAnalysisResult(enrichResolution, IndexResolver.INDEX_METADATA_FIELD, wildcardJoinIndices); | ||
} else { | ||
fieldNames.add(MetadataAttribute.INDEX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I think it would be more correct to add this field after FieldNameUtils::withSubfields
is called on the next line. _index.*
doesn't actually make sense, as we know there is only on _index
metadata field.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
However, I would wait for Andrei/Alex to check it, as the _index
fix was initially discarded in favor of a more complete approach when the issue was opened.
This PR covers more areas than the initial approach though, and initially this wasn't a common issue
x-pack/plugin/esql/qa/testFixtures/src/main/resources/drop.csv-spec
Outdated
Show resolved
Hide resolved
...c/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterLookupJoinIT.java
Show resolved
Hide resolved
…to esql/fix_no_columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good. Some comments:
from employees | keep salary | eval c = 1 | drop c, salary
returns
{
"took": 29,
"is_partial": false,
"documents_found": 100,
"values_loaded": 0,
"columns": [],
"values": [
[],
[],
[],
[],
[],
[],
[],
[],
[],
....
This, to me at least (as a regular user of ESQL), seems surprising. This represents a bunch of "empty" virtual rows so to speak. This seems like an implementation detail that leaks into UX, I would have expected the "no columns and Xrows" use case to be handled in a more user friendly way. This also saves some KBs of comm data if potentially this is performed on a large data set.
Conceptually speaking, we are introducing the notion of "nothing/no columns + X rows", but the impact of this concept in the UX is a bit too intrusive imho.
- looking at the logs of this query
ProjectExec[[<all-fields-projected>{r$}#204]]
\_EvalExec[[null[NULL] AS <all-fields-projected>#204]]
\_EsQueryExec[employees], indexMode[standard], [_doc...
This "no columns + X rows" is represented as "" and it is a query that reaches ES in a particular way. I am wondering if this couldn't be, more performant, be implemented as a count
, meaning we would ask from ES a count of rows and, potentially, on the coordinator node to "expand" this result in a "count" number of "empty" rows.
But this can regarded as food for thought for the future.
- I would add the following test, as well. This is, essentially, an empty result set but done with, also, dropping all columns:
from employees | keep salary | eval c = 1 | drop c, salary | where false
- another test(s) to add are those related to
inline stats
because it is also using aggregates (which were changed as part of this PR).
A simple query, which adds stuff to rows, not reduces them (like stats
does) is
from employees | keep salary | drop salary | eval x = 1 | inline stats count(), max(x) | limit 3
Another one with by
as well: from employees | keep salary | drop salary | eval x = 1, y = 1 | inline stats count(), max(x) by y | limit 3
-
there is also the story around
drop *
which is not allowed atm.
Should we allow it now, that we can handle a "drop all" scenario? Again, something to be put in a separate issue and have a discussion maybe about it. -
I would also add something that goes to more than one index:
from employees* | keep salary | drop salary | eval x = 1 | stats count()
- [LATER EDIT] Some tests that use
fork
would help:
from employees* | keep emp_no | drop emp_no | fork (stats c=count()) (stats c=count())
from employees* | keep emp_no | drop emp_no | fork (stats c=count()) (stats c=count()) | drop _fork*, c
from employees* | keep emp_no | drop emp_no | fork (stats c=count()) (stats c=count()) | drop _fork*, c | stats c=count()
blocks[i++] = column.values(); | ||
} | ||
LocalSupplier supplier = LocalSupplier.of(blocks); | ||
LocalSupplier supplier = LocalSupplier.of(blocks.length > 0 ? new Page(blocks) : new Page(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we could integrate the logic "if the blocks size is 0 then new Page(0)
otherwise new Page(blocks)
" somehow with LocalSupplier
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want this in general, in some cases we want no blocks but new Page(N)
fields.forEach(f -> values.add(f.child().fold(context.foldCtx()))); | ||
var blocks = BlockUtils.fromListRow(PlannerUtils.NON_BREAKING_BLOCK_FACTORY, values); | ||
return new LocalRelation(row.source(), row.output(), new CopyingLocalSupplier(blocks)); | ||
return new LocalRelation(row.source(), row.output(), new CopyingLocalSupplier(blocks.length == 0 ? new Page(0) : new Page(blocks))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about this common logic where a Page is built this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
|
||
public CopyingLocalSupplier(Block[] blocks) { | ||
delegate = new ImmediateLocalSupplier(blocks); | ||
public CopyingLocalSupplier(Page page) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The javadoc of this class needs an update given the change of the constructor?
|
||
import java.util.List; | ||
|
||
public final class PruneEmptyAggregates extends OptimizerRules.OptimizerRule<Aggregate> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help to have a javadoc on this rule. I mean, it is obvious what it does, but it would be helpful to read about the situations that lead to an Aggregate with no aggregates and no groupings.
// for the moment support pushing count just for one field | ||
List<EsStatsQueryExec.Stat> stats = tuple.v2(); | ||
if (stats.size() > 1) { | ||
if (stats.size() != 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What lead to this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because now all the stats can be pruned, so also the case with size == 0 has to be taken into consideration
// there cannot be an empty list of fields, we'll ask the simplest and lightest one instead: _index | ||
return new PreAnalysisResult(enrichResolution, IndexResolver.INDEX_METADATA_FIELD, wildcardJoinIndices); | ||
} else { | ||
fieldNames.add(MetadataAttribute.INDEX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I think it would be more correct to add this field after FieldNameUtils::withSubfields
is called on the next line. _index.*
doesn't actually make sense, as we know there is only on _index
metadata field.
ChangePointGenerator.INSTANCE, | ||
DissectGenerator.INSTANCE, | ||
DropGenerator.INSTANCE, | ||
DropAllGenerator.INSTANCE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it!
Thanks for the review @astefan, I just pushed a commit that incorporate your suggestions. Please let me know if you have further remarks or if you think it's worth discussing it in a wider audience. When it's done, I'll open two follow-up issues, one to allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @luigidellaquila
Fixes: #120272
Fixing queries where all the columns from the original index are projected out.
The expected outcome is that the number of rows is preserved
This means that a query like
that now returns nothing
will start returning values:
This will also apply to aggregations, eg.
that now returns
0
, will start returning the actual count of the rows (3
in this case)I don't know if it's breaking (IMHO it's a bug, but it still changes the behavior).
TODO