Limit the number of concurrent shard snapshots #89826
Pinging @elastic/es-distributed (Team:Distributed)
I am not entirely sure I understand all the details here, but after also checking the linked SDH, I have the impression that the following two changes would potentially make this much less severe: @tlrx what do you think? Or do you mean that regardless of the size of a
I think we have to be careful here with a limit because it could easily break large clusters at a random point when it is reached. The issue really isn't so much total shards anyway, is it? The issue is shards per node, right? Also, note that #89619 will massively reduce the amount of memory consumed by the internal snapshot data structure, and will actually help speed up snapshotting many shards to some degree (the amount of CPU and time it takes to fully serialize the data structure for a large number of shards is out of control at the moment, but disappears in the many-shard benchmarks with the diffing implementation). Also, I wonder whether we even still need this once we have #65318 (Joe's aware of it and on it now, as far as I understand)?
I agree we should be careful when defining this limit, which could also be defined as a max. number of shard snapshots per data node (see the sketch after this comment). Note that I'm proposing to reject new snapshots that would bring the total number of shard snapshots above the limit (I'm not proposing to fail already queued/started shard snapshots).
I agree. My most recent experience was with a master+data node with a snapshot thread pool of
I do agree this is a good improvement, but my fear is that without a limit on shard snapshots a cluster would still be at risk of having shard snapshots queued faster than it can process them, only at a different scale? And #65318 solves the SLM side of things, but people using other tools to create snapshots won't be protected unless Elasticsearch rejects the snapshot creation itself.
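To make the per-data-node variant mentioned above concrete, here is a minimal sketch of the "reject the new snapshot only" semantics. The class name, the per-node cap, and the way shard snapshots are attributed to nodes are assumptions for illustration and do not reflect the actual SnapshotsService code.

```java
import java.util.Map;

// Hypothetical sketch: not the real Elasticsearch implementation, only the
// per-data-node cap idea discussed in this thread.
public class PerNodeShardSnapshotLimiter {

    private final int maxShardSnapshotsPerNode; // illustrative per-node cap

    public PerNodeShardSnapshotLimiter(int maxShardSnapshotsPerNode) {
        this.maxShardSnapshotsPerNode = maxShardSnapshotsPerNode;
    }

    // Returns false if adding the new snapshot's shards would push any data
    // node over the cap. Only the new snapshot is rejected; shard snapshots
    // that are already queued or started are left untouched.
    public boolean canAccept(Map<String, Integer> inProgressShardsPerNode, Map<String, Integer> newShardsPerNode) {
        for (Map.Entry<String, Integer> entry : newShardsPerNode.entrySet()) {
            int current = inProgressShardsPerNode.getOrDefault(entry.getKey(), 0);
            if (current + entry.getValue() > maxShardSnapshotsPerNode) {
                return false;
            }
        }
        return true;
    }
}
```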
Maybe, but then it's really on the user to stop queuing up snapshots, IMO. It seems to me from benchmarking the diffing that, at least memory-wise, we survive a lot with just the diffing fix, and that might be good enough?
I'm inclined to agree with Armin here. If they are pushing too hard then eventually they'll hit the concurrent-operations limit, so as long as the cluster can cope with that many concurrent operations I'm not sure we need another limit. Also, in these cases many of the shard snapshots will be effectively no-ops (except for some metadata work, I guess). Perhaps we can work harder to deduplicate these things to save on the metadata work too? AIUI in this case it didn't work to cancel the ongoing snapshots because of all the enqueued work. If so, can we do something to address that?
Yeah, we could add more batching here at the data-node level for sure. It's part of the plan I had around #89019. Once that refactoring has landed we could make the NOOP snapshots a lot more efficient.
Description
Since #56911 we can create (or delete) snapshots concurrently, and we are limited to 1000 operations at a time. But we don't have any limit on the number of shards these snapshots can contain, and in a cluster with many shards this can end up with hundreds of thousands of shards waiting to be snapshotted.
I think we could introduce a limit on the maximum number of shards a cluster can snapshot at a time and reject any new snapshot creation that would cause this limit to be exceeded (without adding it to the cluster state as a new snapshot-in-progress entry), as roughly sketched below.
This would also serve as a cheap back-pressure mechanism in case aggressive SLM policies are creating new snapshots faster than the cluster can snapshot the shards.
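As a rough illustration of the proposed check (the class, the method, and the idea of a setting like snapshot.max_concurrent_shard_snapshots are made up for this sketch, not actual Elasticsearch settings or APIs), the limit would be evaluated before the new snapshot-in-progress entry is added to the cluster state:

```java
import java.util.List;

// Minimal hypothetical sketch of a cluster-wide cap on in-flight shard
// snapshots; the real change would live in the snapshot creation path on the
// master node.
public class ClusterShardSnapshotLimiter {

    // Illustrative value for a hypothetical setting such as
    // "snapshot.max_concurrent_shard_snapshots".
    private final int maxConcurrentShardSnapshots;

    public ClusterShardSnapshotLimiter(int maxConcurrentShardSnapshots) {
        this.maxConcurrentShardSnapshots = maxConcurrentShardSnapshots;
    }

    // Called before the new snapshot is added to the cluster state: if the
    // shard snapshots already queued or running, plus the shards of the new
    // snapshot, would exceed the limit, the creation request is rejected.
    public void checkCanCreate(List<Integer> shardsPerInProgressSnapshot, int newSnapshotShardCount) {
        int inFlight = shardsPerInProgressSnapshot.stream().mapToInt(Integer::intValue).sum();
        if (inFlight + newSnapshotShardCount > maxConcurrentShardSnapshots) {
            throw new IllegalStateException(
                "cannot create snapshot: it would bring the number of in-flight shard snapshots to "
                    + (inFlight + newSnapshotShardCount)
                    + ", above the limit of "
                    + maxConcurrentShardSnapshots
            );
        }
    }
}
```

This also gives the back-pressure behaviour described above for free: an aggressive SLM policy (or any other snapshot client) gets an error back instead of silently growing the queue of pending shard snapshots.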