
ComputeService should not acquire and hold all SearchContexts at once #103666

Closed
dnhatn opened this issue Dec 21, 2023 · 13 comments · Fixed by #104832
Assignees
Labels
:Analytics/ES|QL AKA ESQL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@dnhatn
Member

dnhatn commented Dec 21, 2023

If an ESQL query hits a thousand shards, the ComputeService acquires and holds the search contexts for all of them at once. This can consume a massive number of file descriptors and a lot of memory. I think we should rework the ComputeService to handle shards incrementally.

@dnhatn dnhatn added the :Analytics/ES|QL AKA ESQL label Dec 21, 2023
@elasticsearchmachine elasticsearchmachine added the Team:QL (Deprecated) Meta label for query languages team label Dec 21, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@nik9000
Member

nik9000 commented Dec 26, 2023

I had a look right before Christmas and it's a little tangled. There are two tricky things I've found so far:

  1. We use the IndexReader for a bunch of optimizations very very early.
  2. We have no good way for the Lucene sources to signal that they've produced the last block for a shard, and no way to pick that signal up in the values reader operator and the ordinals agg operator.

So far, those are the two things I can think of. I think we really do want something around delaying building the field-loading infrastructure until we see a document from a segment. We may even want this "start/stop" segment signaling too.

I wonder if we might limit the number of shards we process in a single "go" on each local node. Like, maybe only grab 64 shards at a time? Grab the first index and dump all of its shards into a list; grab the second one and, if the list is still <=64, dump its shards in too. Something like that (see the sketch below). Then we're only optimizing over 64 shards at a time, and the work stealing won't be as good. But that feels simpler.

Not that we shouldn't do this too. We totally should.
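
A rough sketch of that batching idea, assuming hypothetical names (ShardBatcher, ShardRef) rather than the real ComputeService types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only. Walk the indices in iteration order, dumping each index's
// shards into the current batch; once a batch has reached the cap, start a new one.
// Because whole indices are added at a time, a batch can end up larger than the cap,
// matching the "if the list is still <=64 then dump its shards in too" description.
final class ShardBatcher {
    private static final int BATCH_SIZE = 64;

    record ShardRef(String index, int shardId) {}

    static List<List<ShardRef>> batches(Map<String, Integer> shardCountsByIndex) {
        List<List<ShardRef>> batches = new ArrayList<>();
        List<ShardRef> current = new ArrayList<>();
        for (Map.Entry<String, Integer> index : shardCountsByIndex.entrySet()) {
            if (current.size() >= BATCH_SIZE) {
                batches.add(current);          // seal the current batch before the next index
                current = new ArrayList<>();
            }
            for (int shard = 0; shard < index.getValue(); shard++) {
                current.add(new ShardRef(index.getKey(), shard));
            }
        }
        if (current.isEmpty() == false) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each local node would then plan and execute only one such batch at a time.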

@dnhatn
Member Author

dnhatn commented Dec 26, 2023

We have no good way for the Lucene sources to signal that they've produced the last block for a shard, and no way to pick that signal up in the values reader operator and the ordinals agg operator.

In each driver, we execute shards sequentially, one by one, regardless of the data_partitioning setting. This means the current shard can be considered finished once an operator encounters a new shard.
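
A minimal sketch of how an operator could act on that signal; Page, ShardState, and ShardStateFactory are illustrative stand-ins, not the actual ES|QL operator classes. The operator tracks the shard ordinal carried on incoming pages and releases the per-shard field-loading state as soon as a page from a different shard arrives:

```java
// Illustrative sketch only: per-shard state is opened lazily and closed on shard change.
final class PerShardReleasingOperator {
    interface Page { int shardOrdinal(); }
    interface ShardState extends AutoCloseable { @Override void close(); }
    interface ShardStateFactory { ShardState open(int shardOrdinal); }

    private final ShardStateFactory factory;
    private int currentShard = -1;
    private ShardState currentState = null;

    PerShardReleasingOperator(ShardStateFactory factory) {
        this.factory = factory;
    }

    void process(Page page) {
        int shard = page.shardOrdinal();
        if (shard != currentShard) {
            // The driver runs shards sequentially, so a new shard ordinal implies the
            // previous shard has produced its last page and its resources can be freed.
            if (currentState != null) {
                currentState.close();
            }
            currentShard = shard;
            currentState = factory.open(shard);
        }
        // ... load values for this page using currentState ...
    }

    void finish() {
        if (currentState != null) {
            currentState.close();
            currentState = null;
        }
    }
}
```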

@nik9000
Member

nik9000 commented Dec 26, 2023

Right. So we only need one shard's worth of field loading infrastructure open at a time. I'm not sure that's true after a topn, but I think it's mostly true.

@dnhatn
Member Author

dnhatn commented Dec 26, 2023

Ah, that's a good point. I missed that.

@dnhatn
Member Author

dnhatn commented Dec 26, 2023

I'm not sure that's true after a topn, but I think it's mostly true.

I think the proposal to batch 64 shards also encounters this problem.

@nik9000
Member

nik9000 commented Dec 26, 2023

I think the proposal to batch 64 shards also encounters this problem

If we did batches of 64 shards, we'd plan each batch separately. But, yeah, it'd hit the same problem.

I think it's ok - we can limit the number of open shards for fetching. The way you propose should work ok. It's just not as clean as "there is only one".

I've spent today running down a vector leak but I'll have a look at the values reading stuff tomorrow.

@wchaparro wchaparro added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@dnhatn
Member Author

dnhatn commented Jan 21, 2024

@nik9000 I have thought a bit more about this issue. I believe your proposal to batch 64 shards at a time is the best solution. Since we don't currently perform a per-node reduction, the change should be pretty contained in the DataNodeRequestHandler. I don't think we need to worry about the pauses between these batches. Even if we introduce a per-node reduction in the future, there's no need to perform it across all these batches at once, as the performance doesn't matter much in this context. I also think one of us should work on this. WDYT?
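
A rough sketch of running those batches strictly one after another, so only one batch's search contexts are open at a time; the names here (SequentialBatchRunner, BatchExecutor, Listener) are hypothetical, not the actual DataNodeRequestHandler code:

```java
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: a generic "run batches one by one" driver. The next batch
// is started from the completion callback of the previous one, so batches never overlap.
// (A real implementation would also guard against unbounded recursion when a batch
// completes synchronously.)
final class SequentialBatchRunner<S> {
    interface Listener { void onDone(Exception failure); }            // null failure == success
    interface BatchExecutor<T> { void run(List<T> batch, Listener listener); }

    private final BatchExecutor<S> executor;

    SequentialBatchRunner(BatchExecutor<S> executor) {
        this.executor = executor;
    }

    void runAll(Iterator<List<S>> batches, Listener listener) {
        if (batches.hasNext() == false) {
            listener.onDone(null);             // all batches finished
            return;
        }
        executor.run(batches.next(), failure -> {
            if (failure != null) {
                listener.onDone(failure);      // surface the first failing batch
            } else {
                runAll(batches, listener);     // only now start the next batch
            }
        });
    }
}
```

The next batch is only kicked off from the completion callback of the previous one, which is where the pauses between batches mentioned above come from.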

@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member

nik9000 commented Jan 26, 2024

Sorry, I hadn't noticed your comment! I did look at not batching, but we really do rely on these things being open, and it'd take a bit of surgery to change. Batching is, like you say, contained.

Yeah, I think if it's "just" the batching then one of us can handle it. And we should do it soon.

@dnhatn
Member Author

dnhatn commented Jan 27, 2024

Thanks Nik! I have opened #104832.

dnhatn added a commit that referenced this issue Jan 30, 2024
Today, we allow ESQL to execute against an unlimited number of shards
concurrently on each node. This can lead to cases where we open and hold
too many shards, which translates into too many open file descriptors and
too much memory used for FieldInfos in ValuesSourceReaderOperator.

This change limits the number of concurrent shards to 10 per node. This
number was chosen based on the _search API, which limits it to 5. Besides
the primary reason stated above, this change has other implications:

- We might execute fewer shards for queries with only a LIMIT, leading to
  scenarios where we execute only some high-priority shards and then stop.
- For now, we don't have a partial reduce at the node level, but if we
  introduce one in the future, it might not be as efficient as executing
  all shards at the same time.
- There are pauses between batches because batches are executed
  sequentially, one by one.

However, I believe the performance of queries executing against many
shards (after can_match) is less important than resiliency.

Closes #103666
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Jan 30, 2024
dnhatn added a commit that referenced this issue Jan 30, 2024
Backport of #104832
benwtrent added a commit that referenced this issue Jan 31, 2024
* Limit concurrent shards per node for ESQL (#104832)