-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Store split indices #138396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store split indices #138396
Conversation
6bca32e to
83f03f3
Compare
83f03f3 to
2a92340
Compare
| private final IndexMode indexMode; | ||
| // the below data structures are only needed on coordinator and are transient for now | ||
| private final transient Map<String, List<String>> originalIndices; // keyed by cluster alias | ||
| private final transient Map<String, List<String>> concreteIndices; // keyed by cluster alias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping them transient causes a lot of serialization test failures.
I prefer to keep them serialized for now and find a way to make them transient later.
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
luigidellaquila
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @idegtiarenko, I did a first pass and it looks good in general.
I left a couple of comments, see below.
Apart from that, let's see what the CI says and then I think we can proceed
| } | ||
|
|
||
| public Set<String> concreteIndices() { | ||
| public Set<String> concreteQualifiedIndices() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Will these all be qualified also in CPS? I mean, will local indices have an _origin: prefix? I'm just asking to learn, but I think concreteQualifiedIndices is a good method name even if they won't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe local indices will not have _origin prefix, only remote will have prefixes. I will confirm this once we are able to run CPS query.
| originalIndices = in.readMapOfLists(StreamInput::readString); | ||
| concreteIndices = in.readMapOfLists(StreamInput::readString); | ||
| } else { | ||
| originalIndices = Map.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scares me a bit, but I think it's fine.
Will a remote need this information to execute the plan fragment?
Looking at the code I'd say no.
One day, when we start supporting nested sub-queries and maybe multi-step coordination (ie. when the remote cluster will become a "secondary" master, potentially sending a fragment to a third cluster), the primary master will still be unable to coordinate such queries if it's on an older version, so we'll always be sure that we have the two maps.
My conclusion is that it's all good, but I wanted to write it down, in case it rings a bell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Remotes/data nodes do not need this information, at least today. I will add a note that this information is only available in recent versions.
…-json * upstream/main: (247 commits) Mute org.elasticsearch.xpack.inference.integration.SemanticTextIndexOptionsIT testValidateIndexOptionsWithBasicLicense elastic#138513 Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackLookupJoinIT testLookupExplosionBigString elastic#138510 This shouldn't be zero (elastic#138501) sum of empty histogram is now null (elastic#138378) Test ES|QL bfloat16 support (elastic#138499) Fix exception handling in S3 `compareAndExchangeRegister` (elastic#138488) Mute org.elasticsearch.xpack.exponentialhistogram.ExponentialHistogramFieldMapperTests testFormattedDocValues elastic#138504 Mute org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT test {yaml=ingest_geoip/60_ip_location_databases/Test adding, getting, and removing ip location databases} elastic#138502 ESQL: Refactor HeapAttackIT (elastic#138432) [Inference API] Add ElasticInferenceServiceDenseTextEmbeddingsServiceSettings to InferenceNamedWriteablesProvider (elastic#138484) Store split indices (elastic#138396) ES|QL Update CHUNK to support chunking_settings as optional argument (elastic#138123) Extract common blob-update logic in `S3HttpHandler` (elastic#138490) Cleanup esql request building api (elastic#138398) Round sum and avg in exponential_histogram CSV tests (elastic#138472) ESQL: load exponential_histogram total count as double instead of long (elastic#138417) [SIMD] Use fixed width native types for better Java interoperability (elastic#138429) Do not use Min or Max as Top's surrogate when there is an outputField (elastic#138380) ES|QL: Fix generative tests (elastic#138478) Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480 ...
This change sets up infrastructure to store split original and concrete indices after resolution.
This information is required in
ComputeServiceduring the query execution and can not be derived there in case of CPS. When running in flat mode this information would be supplied separately (in a followup pr).