Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Nov 13, 2025

Fuses the MV_MIN and MV_MAX functions to block loading for a few types. And documents the fusion process so folks doing similar work have a recipe to follow. Like LENGTH, this is not often going to be a super hot function but it is an "obvious" one to fuse because it reduces the data we load and it's fairly easy to implement on top of the loaders we already have. Queries like:

FROM test | STATS SUM(MV_MAX(i))

over ten million docs go from ~84ms to ~56ms. So, like, 33% improvement.

nik9000 added 8 commits June 24, 2022 14:13
Adds REST tests for the `percentiles_bucket` pipeline bucket
aggregation. This gives us forwards and backwards compatibility tests
for these aggs as well as mixed version cluster tests for these aggs.

Relates to elastic#26220
@nik9000 nik9000 marked this pull request as draft November 13, 2025 15:28
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@nik9000
Copy link
Member Author

nik9000 commented Nov 13, 2025

I've flipped this to draft because I haven't done a full survey of the csv-spec functions for coverage around this issue.

@nik9000 nik9000 marked this pull request as ready for review November 24, 2025 18:48
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Copy link
Member Author

nik9000 commented Nov 25, 2025

I've removed this from draft because I've added a bunch more MV_MIN and MV_MAX tests. We could really use more around half_float, short, float, and byte which I added as a note to #137679.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this works and it gives a very good speedup. I left just one comment, and someone else should review too. But other than that LGTM

- class: org.elasticsearch.xpack.esql.heap_attack.HeapAttackLookupJoinIT
method: testLookupExplosionBigString
issue: https://github.com/elastic/elasticsearch/issues/138510
- class: org.elasticsearch.xpack.esql.qa.single_node.GenerativeForkIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be muted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sorry. Had a pending comment explaining it that I hadn't posted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those queries are now "optimized incorrectly". I've got one fix for the rule in #138544 so I'm going to get that and this in and work on that query. I'm calling this mute another thing that block release of the field fusion work.

issue: https://github.com/elastic/elasticsearch/issues/138510
- class: org.elasticsearch.xpack.esql.qa.single_node.GenerativeForkIT
method: test {csv-spec:inlinestats.MvMinMvExpand}
issue: https://github.com/elastic/elasticsearch/issues/137679
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tracked in the meta-issue blocking deployment of pushing functions to field loads: #137679

@nik9000 nik9000 enabled auto-merge (squash) November 25, 2025 16:39
Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very very nice docs. Thank you.

* Interface for loading data in a block shape. Instances of this class
* must be immutable and thread safe.
* Loads values from a chunk of lucene documents into a "Block" for the compute engine.
* <p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My past self would have loved this. Thanks for the comments, will help my future self when my present self forgets about this.

* <li>{@code STATS} or another function - these <strong>won't</strong> use your new code</li>
* </ul>
* <p>
* It's <strong>fairly</strong> likely we already have tests for all these cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about FORK, LOOKUP JOIN and subqueries? Are we expecting them to work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that, once we fix FORK, LOOKUP JOIN, and subqueries, the same solution will apply to pushing all functions. So we won't need a test for every single function. We'll need some with a couple of representative functions. But I wrote this from the perspective of someone writing a new fused expression a few weeks from now. That's..... aspirational!

Comment on lines +140 to +141
* "fused" into field loading using {@link BlockLoaderExpression}. Functions like {@code V_COSINE}
* can use the vector search index to compute the result. Functions like {@code MV_MIN} can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not exactly true - we avoid loading the vector into blocks by calculating just the single float coming out of a similarity function. The function is not calculated as part of the data structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought that was the plan and hadn't looked super close. I remember Ben saying something about being able to push down to lucene somehow if we knew the doc ids were in ascending order.

@nik9000 nik9000 disabled auto-merge November 25, 2025 18:14
@nik9000 nik9000 enabled auto-merge (squash) November 25, 2025 18:18
didn't mean to have this in this PR
@nik9000 nik9000 merged commit e6c5dcc into elastic:main Nov 25, 2025
34 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
Adds REST tests for the `percentiles_bucket` pipeline bucket
aggregation. This gives us forwards and backwards compatibility tests
for these aggs as well as mixed version cluster tests for these aggs.

Relates to elastic#26220
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants