Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch open-indices cluster state updates #83760

Merged
merged 13 commits into from Feb 10, 2022

Conversation

joegallo
Copy link
Contributor

@joegallo joegallo commented Feb 9, 2022

Related to #81627 and #83432

The crux of this is that it processes the whole batch all at once -- every task succeeds or fails. If we want to do better than that, then we'll need to be more clever about shardLimitValidator.validateShardLimit(currentState, indices). At present, it takes a cluster state, and we want to avoid creating new intermediate cluster states inside this batching executor.

Here's a scenario where the difference here matters. Let's imagine three requests to open indices: 3 indices for the first request, 2 for the second, and 1 for the third (all have 1 primary, 0 replicas). The shardLimitValidator thinks we have space for four more shards. Prior to this PR, the first and third requests would have succeeded, while the second would have failed. With this batching, we could have all of them fail (if they are executed as a single batch).

If we want to avoid that, we could run internal batching, but then we're generating intermediate cluster states just to pass them off to the shardLimitValidator and then we might as well go back to using AckedClusterStateUpdateTask. Or we could rewrite the shardLimitValidator to accept something cheaper that we could build/have throughout.

Yet another approach (my personal favorite) would be to invoke the shardLimitValidator multiple times against the various subsets of possible tasks to execute (e.g. if the current cluster state accepts the shards for the first task's indices (yes), will it accept the first and second tasks' (no), how about the first and third (yes)).

These aren't used from tests or other code, so they might as well be
private.
Mostly a bunch of ceremony around moving the innermost openIndices()
call to a custom ClusterStateTaskExecutor. For now this does the
simplest thing possible and 'just' pulls all the indices from all the
requests and gloms them together (well, with de-duplication) into one
big openIndices call.
@joegallo joegallo added :Data Management/Indices APIs APIs to create and manage indices and templates v8.2.0 labels Feb 9, 2022
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left only small comments.

@DaveCTurner
Copy link
Contributor

FWIW I'm undecided about the ShardLimitValidator question but I do lean towards the simple fail-everything solution proposed here. The limits are supposed to be unreasonably high and in the context of ILM we close-and-open fairly fast so it's hard to hit a limit on open without already having breached it before closing. And some opens are going to have to fail if you hit the limits.

IMO it's a bit of a bug that an automatic close-and-reopen goes through this state where shards don't count towards the limit for a brief period and can therefore fail to reopen like this. We know we're reopening them soon, we should keep their spot in the cluster reserved and fail other things instead.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great Joe just one addition to David's points.

FWIW I'm undecided about the ShardLimitValidator question but I do lean towards the simple fail-everything solution proposed here.

++ good enough for now IMO

@joegallo joegallo marked this pull request as ready for review February 10, 2022 14:27
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 10, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elasticsearchmachine
Copy link
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

@joegallo joegallo changed the title Batch open-indices Batch open-indices cluster state updates Feb 10, 2022
@joegallo joegallo merged commit 2f9ceaa into elastic:master Feb 10, 2022
@joegallo joegallo deleted the batch-open-indices branch February 10, 2022 17:30
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Feb 12, 2022
tlrx pushed a commit to tlrx/elasticsearch that referenced this pull request Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >enhancement Team:Data Management Meta label for data/management team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants