Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented Nov 21, 2025

This change sets up infrastructure to store split original and concrete indices after resolution.
This information is required in ComputeService during the query execution and can not be derived there in case of CPS. When running in flat mode this information would be supplied separately (in a followup pr).

private final IndexMode indexMode;
// the below data structures are only needed on coordinator and are transient for now
private final transient Map<String, List<String>> originalIndices; // keyed by cluster alias
private final transient Map<String, List<String>> concreteIndices; // keyed by cluster alias
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping them transient causes a lot of serialization test failures.
I prefer to keep them serialized for now and find a way to make them transient later.

@idegtiarenko idegtiarenko marked this pull request as ready for review November 24, 2025 10:23
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Nov 24, 2025
@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Nov 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @idegtiarenko, I did a first pass and it looks good in general.
I left a couple of comments, see below.
Apart from that, let's see what the CI says and then I think we can proceed

}

public Set<String> concreteIndices() {
public Set<String> concreteQualifiedIndices() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Will these all be qualified also in CPS? I mean, will local indices have an _origin: prefix? I'm just asking to learn, but I think concreteQualifiedIndices is a good method name even if they won't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe local indices will not have _origin prefix, only remote will have prefixes. I will confirm this once we are able to run CPS query.

originalIndices = in.readMapOfLists(StreamInput::readString);
concreteIndices = in.readMapOfLists(StreamInput::readString);
} else {
originalIndices = Map.of();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scares me a bit, but I think it's fine.

Will a remote need this information to execute the plan fragment?
Looking at the code I'd say no.

One day, when we start supporting nested sub-queries and maybe multi-step coordination (ie. when the remote cluster will become a "secondary" master, potentially sending a fragment to a third cluster), the primary master will still be unable to coordinate such queries if it's on an older version, so we'll always be sure that we have the two maps.

My conclusion is that it's all good, but I wanted to write it down, in case it rings a bell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Remotes/data nodes do not need this information, at least today. I will add a note that this information is only available in recent versions.

@idegtiarenko idegtiarenko merged commit f4d969d into elastic:main Nov 24, 2025
35 checks passed
@idegtiarenko idegtiarenko deleted the store_split_indices branch November 24, 2025 13:21
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 24, 2025
…-json

* upstream/main: (247 commits)
  Mute org.elasticsearch.xpack.inference.integration.SemanticTextIndexOptionsIT testValidateIndexOptionsWithBasicLicense elastic#138513
  Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackLookupJoinIT testLookupExplosionBigString elastic#138510
  This shouldn't be zero (elastic#138501)
  sum of empty histogram is now null (elastic#138378)
  Test ES|QL bfloat16 support (elastic#138499)
  Fix exception handling in S3 `compareAndExchangeRegister` (elastic#138488)
  Mute org.elasticsearch.xpack.exponentialhistogram.ExponentialHistogramFieldMapperTests testFormattedDocValues elastic#138504
  Mute org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT test {yaml=ingest_geoip/60_ip_location_databases/Test adding, getting, and removing ip location databases} elastic#138502
  ESQL: Refactor HeapAttackIT (elastic#138432)
  [Inference API] Add ElasticInferenceServiceDenseTextEmbeddingsServiceSettings to InferenceNamedWriteablesProvider (elastic#138484)
  Store split indices (elastic#138396)
  ES|QL Update CHUNK to support chunking_settings as optional argument (elastic#138123)
  Extract common blob-update logic in `S3HttpHandler` (elastic#138490)
  Cleanup esql request building api (elastic#138398)
  Round sum and avg in exponential_histogram CSV tests (elastic#138472)
  ESQL: load exponential_histogram total count as double instead of long (elastic#138417)
  [SIMD] Use fixed width native types for better Java interoperability (elastic#138429)
  Do not use Min or Max as Top's surrogate when there is an outputField (elastic#138380)
  ES|QL: Fix generative tests (elastic#138478)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants