ESQL: Limit when we push topn to lucene #134497

nik9000 · 2025-09-10T19:23:06Z

Right now we push all topn operations to lucene if possible. But Lucene was not written to handle a topn of 100,000. It's very fast, but it allocates more memory than we'd like. This limits the size of the topns that we push to lucene to 10,000, which is the default window size limit. For anything larger we'll run a regular lucene scan with our own in-engine topn instead. That's designed to scan huge numbers of documents. It doesn't have the nice min_competitive optimization. But it tracks memory very well.
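For readers unfamiliar with the idea of an in-engine topn, here is a minimal sketch of a bounded top-N over a stream of scanned values. It is an illustration only: `BoundedTopNSketch` and `topN` are hypothetical names, it sorts plain `Long` values in ascending order, and it is not the actual ESQL operator. The point it tries to show is that memory stays proportional to N no matter how many documents are scanned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not the ESQL TopN operator. */
final class BoundedTopNSketch {

    /** Returns the n smallest values in ascending order, keeping at most n values in memory. */
    static List<Long> topN(Iterable<Long> scanned, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the head is the worst value we are currently keeping.
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (Long value : scanned) {
            if (heap.size() < n) {
                heap.add(value);
            } else if (value < heap.peek()) {
                // The new value beats the worst kept value, so swap it in.
                heap.poll();
                heap.add(value);
            }
        }
        List<Long> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(List.of(5L, 1L, 9L, 3L, 7L), 3)); // prints [1, 3, 5]
    }
}
```

Note that this sketch still visits every scanned value. That matches the trade-off described above: nothing here skips non-competitive documents the way min_competitive does, but the heap never grows past n entries.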

elasticsearchmachine · 2025-09-10T19:23:31Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2025-09-10T19:23:31Z

Hi @nik9000, I've created a changelog YAML for you.

dnhatn

LGTM, thanks Nik!

alex-spies

Only quickly skimmed the change to PushTopNToSource. Looks alright to me.

alex-spies · 2025-09-11T08:55:54Z

.../main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushTopNToSource.java

+        if (child instanceof EvalExec evalExec
+            && evalExec.child() instanceof EsQueryExec queryExec
+            && queryExec.canPushSorts()
+            && canPushLimit(topNExec, physicalSettings)) {


Looks right to me, but I think @craigtaverner should have a look, too. I don't remember if it's bad when we somehow cannot push a WHERE dist < 10 | EVAL ST_DISTANCE = (...) to Lucene. Craig, we don't end up with un-executable queries, they're just slow when we cannot push, right?

craigtaverner

Correct. Failing to push down ST_DISTANCE will not cause the query to fail; it will just run slowly. And as I understand it, this PR will only block pushdown for extremely large LIMIT values, which are an extreme case anyway.

alex-spies · 2025-09-11T08:56:00Z

.../main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushTopNToSource.java

            && queryExec.canPushSorts()
-            && canPushDownOrders(topNExec.order(), lucenePushdownPredicates)) {
+            && canPushDownOrders(topNExec.order(), lucenePushdownPredicates)
+            && canPushLimit(topNExec, physicalSettings)) {


Looks correct.

craigtaverner

LGTM. I also ran some local ST_DISTANCE benchmarks on this branch and saw no changes, which is what I would expect since we don't benchmark very large SORT/LIMIT. But at least it rules out that this change inadvertently breaks ST_DISTANCE pushdown for the cases we've benchmarked.

nik9000 · 2025-09-11T12:40:48Z

> But at least it rules out that this change inadvertently breaks ST_DISTANCE pushdown for the cases we've benchmarked.

The thing they broke at first was that they removed the lucene sort for extremely large limits but didn't retain the engine sort. I added some more tests to catch that.

But, great news on the benchmark!

Thanks for looking friends. I'll try and merge this.
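To make the failure mode mentioned in this comment concrete, here is a rough sketch of the shape such a rule can take. Everything in it is hypothetical: `Plan`, `Source`, and `TopN` are stand-ins invented for this example, not the real `TopNExec` and `EsQueryExec` classes. The property it tries to show is that when the limit is too large to push, the rule must return the plan unchanged so the engine-side topn is still there.

```java
/** Hypothetical plan nodes for illustration; the real rule works on TopNExec and EsQueryExec. */
final class PushTopNSketch {

    interface Plan {}

    /** Stand-in for a source node; sortPushed says whether the source itself sorts and limits. */
    record Source(boolean sortPushed, int pushedLimit) implements Plan {}

    /** Stand-in for an engine-side topn sitting on top of a child plan. */
    record TopN(Plan child, int limit) implements Plan {}

    /** Pushes the topn into the source only when the limit is small enough. */
    static Plan apply(Plan plan, int luceneTopNLimit) {
        if (plan instanceof TopN topN
            && topN.child() instanceof Source source
            && source.sortPushed() == false
            && topN.limit() <= luceneTopNLimit) {
            // Small limit: the source takes over sorting and limiting.
            return new Source(true, topN.limit());
        }
        // Large limit (or unsupported shape): keep the engine-side TopN untouched.
        return plan;
    }
}
```

A test in the spirit of the ones described above would then check both branches: a limit of 100 turns into a sorted source, and a limit of 100,000 comes back as the very same `TopN` node.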

idegtiarenko · 2025-09-17T06:33:10Z

.../main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushTopNToSource.java


+    private static boolean canPushLimit(TopNExec topn, PhysicalSettings physicalSettings) {
+        return topn.limit() instanceof Literal l && ((Number) l.value()).intValue() <= physicalSettings.luceneTopNLimit();
+    }


NIT: this could be simplified with

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/Foldables.java

Lines 99 to 107 in 2a1176a

public static Integer limitValue(Expression limitField, String sourceText) {
    if (limitField instanceof Literal literal) {
        Object value = literal.value();
        if (value instanceof Integer intValue) {
            return intValue;
        }
    }
    throw new EsqlIllegalArgumentException(format(null, "Limit value must be an integer in [{}], found [{}]", sourceText, limitField));
}
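As a side note on the two styles of check, the sketch below contrasts a cast-to-`Number` comparison with an `Integer`-only comparison. It is a simplified stand-in: `LimitCheckSketch`, `lenient`, and `strict` are hypothetical names, both methods return `false` for unexpected types instead of throwing, and it assumes a literal could carry a `Long`, which may never happen for ESQL limits in practice.

```java
/** Hypothetical illustration of two ways to compare a literal limit against a maximum. */
final class LimitCheckSketch {

    /** Roughly the shape of the cast-based check: any Number is narrowed to int first. */
    static boolean lenient(Object literalValue, int max) {
        return literalValue instanceof Number n && n.intValue() <= max;
    }

    /** Roughly the shape of the Integer-only check: other numeric types are rejected. */
    static boolean strict(Object literalValue, int max) {
        return literalValue instanceof Integer i && i <= max;
    }

    public static void main(String[] args) {
        long wide = 4_294_967_297L; // 2^32 + 1, which narrows to 1 as an int
        System.out.println(lenient(wide, 10_000)); // prints true
        System.out.println(strict(wide, 10_000));  // prints false
    }
}
```

Under that assumption, the `Integer`-only form avoids silently narrowing a wide value, which is one reason reusing a helper that insists on an `Integer` can be the tidier option.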

idegtiarenko · 2025-09-17T06:34:05Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/PhysicalSettings.java

+    public static final Setting<Integer> LUCENE_TOPN_LIMIT = Setting.intSetting(
+        "esql.lucene_topn_limit",
+        IndexSettings.MAX_RESULT_WINDOW_SETTING.getDefault(Settings.EMPTY),
+        -1,


We do not limit by default?

Uh, just realized IndexSettings.MAX_RESULT_WINDOW_SETTING.getDefault(Settings.EMPTY) is the default value.

nik9000

Right! We default to the window's default.

nik9000 requested a review from dnhatn September 10, 2025 19:23

nik9000 added the >bug, :Analytics/ES|QL (AKA ESQL), and v9.2.0 labels Sep 10, 2025

elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Sep 10, 2025

Update docs/changelog/134497.yaml

1ee7f0a

Merge branch 'main' into limit_lucene_pushdown

59f663a

dnhatn approved these changes Sep 10, 2025

nik9000 added 4 commits September 10, 2025 16:37

Merge branch 'main' into limit_lucene_pushdown

0da61e0

Merge remote-tracking branch 'nik9000/limit_lucene_pushdown' into lim…

b894cb7

…it_lucene_pushdown

Oh no

387d49d

Move turn off

18e2f0a

alex-spies reviewed Sep 11, 2025

craigtaverner approved these changes Sep 11, 2025

Merge branch 'main' into limit_lucene_pushdown

665df91

nik9000 enabled auto-merge (squash) September 11, 2025 12:41

nik9000 added 6 commits September 12, 2025 14:43

Merge branch 'main' into limit_lucene_pushdown

b7184b0

fix

6fa73c3

Merge remote-tracking branch 'nik9000/limit_lucene_pushdown' into lim…

cfa0bdc

…it_lucene_pushdown

Merge branch 'main' into limit_lucene_pushdown

b31bbae

Merge branch 'main' into limit_lucene_pushdown

f18dfa1

Merge branch 'main' into limit_lucene_pushdown

07e37f3

idegtiarenko reviewed Sep 17, 2025

nik9000 added 2 commits September 17, 2025 14:40

Merge branch 'main' into limit_lucene_pushdown

a9bd848

Better way

c6d05da

nik9000 added 6 commits September 17, 2025 14:58

Merge remote-tracking branch 'nik9000/limit_lucene_pushdown' into lim…

e829fa4

…it_lucene_pushdown

Merge branch 'main' into limit_lucene_pushdown

b7d4d0d

serverless

82b4f25

Merge branch 'main' into limit_lucene_pushdown

e63d68a

one more

e4bc310

Please stop

bfc240a

nik9000 merged commit eb51dc2 into elastic:main Sep 22, 2025
34 checks passed

przemekwitek mentioned this pull request Sep 23, 2025

[CI] TopNOperatorTests testSimpleWithCranky failing #135224

Closed
