Batch open-indices cluster state updates #83760
Conversation
These aren't used from tests or other code, so they might as well be private.
Mostly a bunch of ceremony around moving the innermost openIndices() call to a custom ClusterStateTaskExecutor. For now this does the simplest thing possible and 'just' pulls all the indices from all the requests and gloms them together (well, with de-duplication) into one big openIndices call.
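The "glom together with de-duplication" step can be sketched roughly as below. This is a minimal, hypothetical illustration only: `OpenIndicesTask` and `collectIndices` are made-up stand-ins, and the real executor operates on `ClusterStateTaskExecutor` tasks and `Index` objects rather than plain strings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a batched task: each open-indices request
// carries the list of index names it wants opened.
record OpenIndicesTask(List<String> indices) {}

public class BatchedOpenIndices {
    // Pull the indices from every task in the batch into one de-duplicated
    // set (preserving first-seen order) so that a single openIndices() call
    // can service the whole batch.
    static Set<String> collectIndices(List<OpenIndicesTask> tasks) {
        Set<String> all = new LinkedHashSet<>();
        for (OpenIndicesTask task : tasks) {
            all.addAll(task.indices());
        }
        return all;
    }
}
```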
Looks good, I left only small comments.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexStateService.java (outdated review threads, resolved)
FWIW I'm undecided about the ShardLimitValidator question. IMO it's a bit of a bug that an automatic close-and-reopen goes through this state where shards don't count towards the limit for a brief period and can therefore fail to reopen like this. We know we're reopening them soon, so we should keep their spot in the cluster reserved and fail other things instead.
Looks great, Joe, just one addition to David's points.
FWIW I'm undecided about the ShardLimitValidator question but I do lean towards the simple fail-everything solution proposed here.
++ good enough for now IMO
Force-pushed from e21727b to 1df639c
plus it parallels what we already have for "indices closed".
Pinging @elastic/es-data-management (Team:Data Management)
LGTM
Hi @joegallo, I've created a changelog YAML for you.
Related to #81627 and #83432
The crux of this is that it processes the whole batch all at once -- either every task succeeds or every task fails. If we want to do better than that, then we'll need to be more clever about `shardLimitValidator.validateShardLimit(currentState, indices)`. At present, it takes a cluster state, and we want to avoid creating new intermediate cluster states inside this batching executor.

Here's a scenario where the difference matters. Imagine three requests to open indices: 3 indices for the first request, 2 for the second, and 1 for the third (all with 1 primary and 0 replicas), and suppose the `shardLimitValidator` thinks we have space for four more shards. Prior to this PR, the first and third requests would have succeeded, while the second would have failed. With this batching, all of them could fail (if they are executed as a single batch).

If we want to avoid that, we could run internal batching, but then we're generating intermediate cluster states just to pass them off to the `shardLimitValidator`, and at that point we might as well go back to using `AckedClusterStateUpdateTask`. Or we could rewrite the `shardLimitValidator` to accept something cheaper that we could build/have throughout. Yet another approach (my personal favorite) would be to invoke the `shardLimitValidator` multiple times against the various subsets of possible tasks to execute (e.g. if the current cluster state accepts the shards for the first task's indices (yes), will it accept the first and second tasks' (no)? how about the first and third (yes)?).
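That last subset-based approach can be sketched as a greedy per-task check. This is a hypothetical illustration, not the real `ShardLimitValidator` API: `fitsWithinLimit` and `acceptTasks` are made-up names, and shard counts are reduced to plain integers.

```java
import java.util.ArrayList;
import java.util.List;

public class PerTaskShardValidation {
    // Hypothetical cheap stand-in for ShardLimitValidator: checks whether a
    // task's shards still fit in the remaining capacity.
    static boolean fitsWithinLimit(int shardsSoFar, int taskShards, int capacity) {
        return shardsSoFar + taskShards <= capacity;
    }

    // Accept tasks one at a time: a task is accepted only if the shards of
    // the tasks already accepted plus its own still fit. With tasks needing
    // 3, 2, and 1 shards and capacity for 4 more, this accepts the first
    // task, rejects the second (3 + 2 > 4), and accepts the third (3 + 1 <= 4),
    // matching the pre-batching per-request behavior described above.
    static List<Boolean> acceptTasks(List<Integer> shardsPerTask, int capacity) {
        List<Boolean> accepted = new ArrayList<>();
        int used = 0;
        for (int shards : shardsPerTask) {
            if (fitsWithinLimit(used, shards, capacity)) {
                accepted.add(true);
                used += shards;
            } else {
                accepted.add(false);
            }
        }
        return accepted;
    }
}
```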